Bulk Importing to MongoDB: Handling Millions of Products
This is Part 3 of the "Building a Scalable, Faceted Online Marketplace" series. Read the Introduction here.
Why Bulk Import?
Once you have generated massive product datasets, the next step is to get them into a database for further processing and search indexing. MongoDB is a great choice for flexible, document-based storage and is well-suited for high-volume imports.
The Tool: bulk_import.js
- Streams NDJSON or gzipped product files
- Batches inserts for memory efficiency
- Handles errors and retries
- Supports a configurable batch size and MongoDB URI (see the sketch after this list)
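The real script's internals may differ, but here's a minimal sketch of what such an import loop can look like. The flag names match the usage example below; the `products` collection name, the helper structure, and the retry policy are illustrative assumptions, not the actual bulk_import.js source:

```js
// Minimal sketch of a streaming NDJSON import loop -- not the actual
// bulk_import.js source. Assumes Node 18+, the official "mongodb" driver,
// one JSON document per line, and a "products" collection.
const fs = require("fs");
const zlib = require("zlib");
const readline = require("readline");
const { MongoClient } = require("mongodb");

async function importFile(file, uri, batchSize) {
  const client = new MongoClient(uri);
  await client.connect();
  // client.db() with no argument uses the database named in the URI
  const col = client.db().collection("products");

  // Stream the file; pipe through gunzip only when it is gzipped
  let input = fs.createReadStream(file);
  if (file.endsWith(".gz")) input = input.pipe(zlib.createGunzip());
  const lines = readline.createInterface({ input, crlfDelay: Infinity });

  let batch = [];
  let total = 0;
  for await (const line of lines) {
    if (!line.trim()) continue; // skip blank lines
    batch.push(JSON.parse(line));
    if (batch.length >= batchSize) {
      total += await insertWithRetry(col, batch);
      batch = [];
    }
  }
  if (batch.length > 0) total += await insertWithRetry(col, batch); // final partial batch

  await client.close();
  console.log(`Imported ${total} documents from ${file}`);
}

// Insert one batch, retrying with simple backoff. ordered:false keeps
// inserting past individual bad documents; a production script would
// inspect err.writeErrors to tell duplicates from transient failures.
async function insertWithRetry(col, docs, attempts = 3) {
  for (let attempt = 1; ; attempt++) {
    try {
      const res = await col.insertMany(docs, { ordered: false });
      return res.insertedCount;
    } catch (err) {
      if (attempt >= attempts) throw err;
      await new Promise((resolve) => setTimeout(resolve, 1000 * attempt));
    }
  }
}
```

Because the file is consumed line by line, memory use stays roughly proportional to one batch of documents, no matter how large the input file is.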
Example Usage
```bash
npm run import -- \
  --file ./out/products.computers.00000.ndjson.gz \
  --uri "mongodb://127.0.0.1:27017/online-marketplace" \
  --batch 10000
```
Everything after the standalone `--` is forwarded by npm to the script itself instead of being treated as npm's own flags.
Key Features
- Streaming Input: Handles huge files without loading all data into memory
- Batch Inserts: One database round trip per batch instead of per document keeps throughput high
- Flexible Config: Choose the input file, batch size, and MongoDB URI from the command line (parsing sketch below)
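For the config flags, Node's built-in `parseArgs` from `node:util` is one way to wire this up. This is a sketch, not necessarily how bulk_import.js parses its arguments, and the defaults shown are assumptions:

```js
// Sketch of flag parsing with node:util's parseArgs (available since Node 18.3).
// Flag names mirror the usage example above; the defaults are assumptions.
const { parseArgs } = require("node:util");

const { values } = parseArgs({
  options: {
    file:  { type: "string" }, // path to a .ndjson or .ndjson.gz file
    uri:   { type: "string" }, // MongoDB connection string
    batch: { type: "string" }, // parseArgs yields strings; converted below
  },
});

if (!values.file) {
  console.error("Usage: node bulk_import.js --file <path> [--uri <uri>] [--batch <n>]");
  process.exit(1);
}

const uri = values.uri ?? "mongodb://127.0.0.1:27017/online-marketplace"; // assumed default
const batchSize = Number(values.batch ?? 10000); // assumed default
```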
Real-World Tips
- Create indexes on key fields (e.g., productId, category) for the queries you'll run later; building them after the import finishes keeps inserts fast (see the snippet after this list)
- Monitor MongoDB memory and disk usage during import
- Keep source files gzipped: on-the-fly decompression is usually cheaper than the extra disk I/O and network transfer of uncompressed files
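As one way to create those post-import indexes from the same Node.js driver, here's a short sketch; the `products` collection name and the unique constraint on productId are assumptions:

```js
// Sketch: create post-import indexes with the official MongoDB Node.js driver.
// Collection name and the unique constraint on productId are assumptions.
const { MongoClient } = require("mongodb");

async function createIndexes(uri) {
  const client = new MongoClient(uri);
  await client.connect();
  const col = client.db().collection("products");

  await col.createIndex({ productId: 1 }, { unique: true }); // dedupe key
  await col.createIndex({ category: 1 });                    // facet/filter field

  await client.close();
}

createIndexes("mongodb://127.0.0.1:27017/online-marketplace").catch(console.error);
```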
Next up: Migrating Products to Elasticsearch: Powering Faceted Search at Scale
In the next article, we'll move our data from MongoDB to Elasticsearch, unlocking powerful faceted search and analytics!