Initialize Faker for realistic data generation

Blog | January 30, 2026

Here's a comprehensive solution for generating fake product data with realistic attributes:

Fake Product Data Generator

import random
from faker import Faker
import pandas as pd
fake = Faker()
# Product categories
CATEGORIES = [
    "Electronics", "Clothing", "Home & Kitchen", "Sports", "Books",
    "Beauty", "Toys", "Automotive", "Health", "Garden"
]
# Product names with realistic patterns
PRODUCT_NAMES = [
    "ProMax Wireless Headphones", "Organic Cotton T-Shirt", "Stainless Steel Water Bottle",
    "Yoga Mat Premium", "Python Programming Guide", "Anti-Aging Serum", "LEGO City Set",
    "Car Floor Mats", "Protein Powder Isolate", "LED Grow Light"
]
# Adjectives for product descriptions
ADJECTIVES = [
    "Premium", "Professional", "Lightweight", "Durable", "Eco-Friendly",
    "Innovative", "Portable", "Waterproof", "Smart", "Luxurious"
]
# Nouns for product descriptions
NOUNS = [
    "Design", "Technology", "Material", "System", "Collection",
    "Series", "Kit", "Device", "Solution", "Edition"
]
def generate_product(id):
    """Generate a single fake product with realistic attributes"""
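    # CATEGORIES and PRODUCT_NAMES are index-aligned (one name per category),
    # so a shared index keeps the product name consistent with its category
    idx = random.randrange(len(CATEGORIES))
    category = CATEGORIES[idx]
    name = PRODUCT_NAMES[idx]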
    # Generate realistic price based on category
    price_ranges = {
        "Electronics": (50, 1500),
        "Clothing": (10, 200),
        "Home & Kitchen": (15, 300),
        "Sports": (20, 500),
        "Books": (5, 50),
        "Beauty": (10, 200),
        "Toys": (10, 100),
        "Automotive": (30, 800),
        "Health": (15, 300),
        "Garden": (20, 400)
    }
    min_price, max_price = price_ranges.get(category, (10, 100))
    price = round(random.uniform(min_price, max_price), 2)
    # Generate stock status (90% in stock)
    in_stock = random.choices([True, False], weights=[90, 10])[0]
    # Generate realistic description
    adjective = random.choice(ADJECTIVES)
    noun = random.choice(NOUNS)
    description = f"{adjective} {noun} design for {category.lower()} enthusiasts. "
    description += fake.paragraph(nb_sentences=2) + " "
    description += fake.sentence() + " " + fake.sentence()
    return {
        "id": id,
        "name": name,
        "category": category,
        "price": price,
        "in_stock": in_stock,
        "description": description,
        "sku": f"{category[:3].upper()}-{id:04d}",
        "weight": f"{random.randint(100, 5000)}g",
        "brand": fake.company(),
        "rating": round(random.uniform(3.5, 5.0), 1),
        "reviews": random.randint(5, 500)
    }
def generate_products(n=100):
    """Generate n fake products"""
    return [generate_product(i) for i in range(1, n+1)]
# Example usage
if __name__ == "__main__":
    # Generate 50 products
    products = generate_products(50)
    # Convert to DataFrame for better visualization
    df = pd.DataFrame(products)
    # Display first 5 products
    print("Sample Fake Products:")
    print(df.head().to_string())
    # Save to CSV
    df.to_csv("fake_products.csv", index=False)
    print(f"\nGenerated {len(products)} products saved to fake_products.csv")
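
Because the script never seeds its random sources, every run produces a different dataset. For repeatable test fixtures, the generators can be seeded before any data is produced. The sketch below shows the idea with the standard library only; SEED and sample_prices are hypothetical names, not part of the script above. With Faker installed, calling Faker.seed(SEED) alongside random.seed(SEED) should make the Faker-generated text repeatable as well.

```python
import random

SEED = 42  # hypothetical value; any fixed integer works


def sample_prices(n, low=10, high=100):
    """Draw n prices the same way generate_product does (uniform, 2 decimals)."""
    return [round(random.uniform(low, high), 2) for _ in range(n)]


random.seed(SEED)
first_run = sample_prices(5)

random.seed(SEED)  # re-seeding restarts the same pseudo-random sequence
second_run = sample_prices(5)

print(first_run == second_run)  # True: identical seeds give identical data
```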

Key Features:

Realistic Attributes:
- Category-specific pricing ranges
- 90% in-stock probability
- SKU codes with category prefixes
- Weight measurements in grams
- Brand names from Faker
- Star ratings (3.5-5.0)
- Review counts (5-500)
Natural Descriptions:
- Combines structured patterns with Faker's natural language generation
- Includes category-specific context
- Multi-sentence descriptions
Data Quality:
- Unique IDs and SKUs
- Properly formatted prices
- Consistent data types
- CSV export capability
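
The data-quality points above can be checked rather than assumed. The following is a minimal sketch of such a check, assuming products are plain dictionaries with the id, sku, price, and rating keys used by the generator; find_quality_issues is a hypothetical helper, not part of the original script.

```python
from collections import Counter


def find_quality_issues(products):
    """Return a list of human-readable problems found in the product dicts."""
    issues = []
    # IDs and SKUs are expected to be unique across the whole dataset
    for field in ("id", "sku"):
        counts = Counter(p[field] for p in products)
        dupes = [value for value, count in counts.items() if count > 1]
        if dupes:
            issues.append(f"duplicate {field}: {dupes}")
    for p in products:
        # Prices should be positive and carry at most two decimal places
        if p["price"] <= 0 or round(p["price"], 2) != p["price"]:
            issues.append(f"bad price for id {p['id']}: {p['price']}")
        # Ratings are generated in the 3.5-5.0 range
        if not 3.5 <= p["rating"] <= 5.0:
            issues.append(f"rating out of range for id {p['id']}: {p['rating']}")
    return issues


# Deliberately broken sample: a repeated SKU and a three-decimal price
sample = [
    {"id": 1, "sku": "ELE-0001", "price": 899.0, "rating": 4.5},
    {"id": 2, "sku": "ELE-0001", "price": 24.999, "rating": 4.2},
]
for issue in find_quality_issues(sample):
    print(issue)
```

Passing the output of generate_products() to this function is expected to return an empty list.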

Sample Output (illustrative; values differ on each run, and descriptions are truncated here for width):

Sample Fake Products:
   id                          name        category   price  in_stock  \
0   1    ProMax Wireless Headphones     Electronics  899.00      True
1   2        Organic Cotton T-Shirt        Clothing   24.99      True
2   3  Stainless Steel Water Bottle  Home & Kitchen   19.99     False
3   4              Yoga Mat Premium          Sports   49.99      True
4   5      Python Programming Guide           Books   29.99      True

                                         description       sku  weight  \
0  Premium Technology design for electronics enth...  ELE-0001   3200g
1   Lightweight Material design for clothing enth...  CLO-0002    180g
2  Eco-Friendly System design for home & kitchen ...  HOM-0003    850g
3     Innovative Design design for sports enthusi...  SPO-0004   1200g
4    Professional Solution design for books enthu...  BOO-0005    450g

            brand  rating  reviews
0   TechCorp Inc.     4.5      312
1   GreenWear Co.     4.2       45
2  HomeEssentials     3.8      120
3     FitLife Pro     4.7      203
4  BookSmart Ltd.     4.1       87

Generated 50 products saved to fake_products.csv

Requirements:

pip install faker pandas

This solution provides:

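- Product attributes that vary by category (price ranges and SKU prefixes)
- Multi-sentence descriptions built from Faker text
- Consistent field formats and data types (the script performs no separate validation step)
- CSV export via pandas
- A dataset size controlled by the n argument of generate_products
- The same schema for every generated product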
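
For very large datasets, holding every product in a list and then in a DataFrame may use more memory than necessary. One option, sketched here with the standard library only, is to yield rows lazily and write each one as it is produced. iter_rows and write_csv are hypothetical helpers, and the two-field row is a stand-in for the full product dictionary.

```python
import csv
import io
import random


def iter_rows(n):
    """Yield simplified product rows one at a time instead of building a list."""
    for product_id in range(1, n + 1):
        yield {
            "id": product_id,
            "price": round(random.uniform(10, 100), 2),
        }


def write_csv(rows, handle):
    """Stream rows to an open text handle; return the number of rows written."""
    writer = csv.DictWriter(handle, fieldnames=["id", "price"])
    writer.writeheader()
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count


# StringIO stands in for open("fake_products.csv", "w", newline="")
buffer = io.StringIO()
written = write_csv(iter_rows(1000), buffer)
print(written)
```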

The generator uses Faker for brand names and description text, and Python's random module for everything else: prices are drawn uniformly from a per-category range, and the in-stock flag is weighted 90/10. The output is a list of plain dictionaries with a consistent schema, suitable for populating testing or development environments.
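
The in-stock probability can be confirmed empirically. This short sketch repeats the same random.choices call used in generate_product many times and measures the share of True results; the seed and the number of draws are arbitrary choices made for illustration.

```python
import random

random.seed(0)  # fixed seed so the check is repeatable

draws = 10_000
# True counts as 1 when summed, so this tallies the in-stock outcomes
in_stock_count = sum(
    random.choices([True, False], weights=[90, 10])[0] for _ in range(draws)
)
share = in_stock_count / draws
print(f"in-stock share: {share:.3f}")
```

The printed share should land close to 0.90, with small deviations that shrink as the number of draws grows.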
