Product Data Generation: Simulating a Real Marketplace

This is Part 2 of the "Building a Scalable, Faceted Online Marketplace" series. Read the Introduction here.


Why Simulate Product Data?

Before you can build a search engine or a polished UI, you need data, and lots of it. Realistic, large-scale product data is the foundation for testing scalability, search, and filtering. In this article, we'll build tools to generate hundreds of thousands of products, each with dynamic, category-specific attributes (as on marketplaces like Amazon or Meesho).

The Approach

Key Features

Example: Generating 200,000 Computer Products

npm run generate -- --category=Computers --count=200000 --out=./out --shard-size=50000 --format=ndjson --gzip --csv

With --count=200000 and --shard-size=50000, this command creates four shards of 50,000 products each, written as gzipped NDJSON files and ready for import.
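To make the shard arithmetic concrete, here is a minimal sketch of how a count could be split into shards and how one shard could be serialized as gzipped NDJSON. The function names planShards and toGzippedNdjson are hypothetical and are not taken from the actual script, whose internals may differ; only Node's built-in zlib module is used.

```javascript
// Sketch only: planShards and toGzippedNdjson are illustrative names,
// not functions from the real generator script.
const zlib = require('node:zlib');

// Split `count` products into shards of at most `shardSize` products each.
function planShards(count, shardSize) {
  const shards = [];
  for (let start = 0; start < count; start += shardSize) {
    shards.push({
      index: shards.length,
      start,
      size: Math.min(shardSize, count - start), // last shard may be smaller
    });
  }
  return shards;
}

// Serialize records as NDJSON (one JSON object per line), then gzip the text.
function toGzippedNdjson(records) {
  const ndjson = records.map((r) => JSON.stringify(r)).join('\n') + '\n';
  return zlib.gzipSync(ndjson);
}

const shards = planShards(200000, 50000);
console.log(shards.length); // 4
```

For counts in the hundreds of thousands, a real implementation would more likely pipe records through zlib.createGzip() into a file stream instead of building each shard as one string in memory; the synchronous version above just keeps the example short.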

Code Walkthrough

The main script is product-backend/generate_product_stream_per_category.cjs. It uses category definitions and attribute pools from facets.json and facets_values.json.
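As a rough illustration of the idea, the sketch below builds products by picking one value per attribute from a pool. The data shapes, the made-up attribute values, and the names makeProduct and generateProducts are all assumptions for this example; the real facets.json and facets_values.json may be structured differently.

```javascript
// Hypothetical shapes standing in for facets.json and facets_values.json.
const facets = { Computers: ['brand', 'ram', 'storage'] };
const facetValues = {
  brand: ['Acme', 'Globex'], // made-up brand names
  ram: ['8 GB', '16 GB', '32 GB'],
  storage: ['256 GB SSD', '512 GB SSD', '1 TB SSD'],
};

// Small seeded PRNG (mulberry32) so the same seed reproduces the same data.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // float in [0, 1)
  };
}

// Pick one element of `pool` using the supplied random source.
function pick(pool, rng) {
  return pool[Math.floor(rng() * pool.length)];
}

// Build one product with the attributes defined for its category.
function makeProduct(category, id, rng) {
  const attributes = {};
  for (const name of facets[category]) {
    attributes[name] = pick(facetValues[name], rng);
  }
  return { id, category, attributes };
}

// Yield products lazily so large counts never sit in memory all at once.
function* generateProducts(category, count, seed = 1) {
  const rng = mulberry32(seed);
  for (let i = 0; i < count; i++) {
    yield makeProduct(category, `${category}-${i + 1}`, rng);
  }
}

for (const product of generateProducts('Computers', 3)) {
  console.log(JSON.stringify(product));
}
```

Using a generator function keeps memory use flat regardless of count, and the seeded random source makes a dataset reproducible across runs, which helps when comparing import or search results later in the series.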

Real-World Tips

What's Next?

In the next article, we'll cover how to efficiently bulk import this data into MongoDB, handling millions of records with batching and error handling. (Note: We'll use MongoDB for storage, not for faceted search.)


Next up: Bulk Importing to MongoDB: Handling Millions of Products


Continue the series to see how we bring this data to life in a real database, ready for further processing and search indexing!