MongoDB Pipeline Interview Questions

Introduction

This guide covers essential MongoDB pipeline operations and aggregation concepts commonly asked in technical interviews. Each question includes detailed answers and practical examples.

1. What are MongoDB pipeline stages and how do they work? (Medium)

Pipeline stages are the building blocks of the MongoDB aggregation framework:

  • $match: Filters documents
  • $group: Groups documents by a specified expression
  • $sort: Sorts documents
  • $project: Reshapes documents (includes, excludes, or computes fields)
  • $limit: Limits the number of documents

Pipeline Stage Order:

  • Stages execute sequentially
  • The output of one stage becomes the input to the next
  • Stage order affects performance
  • Pipelines can often be reordered for efficiency
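The sequential flow above can be sketched in plain JavaScript, with each stage as a function whose output feeds the next. This is only an illustration of the data flow (the `orders` array is invented for the demo), not how MongoDB executes pipelines internally:

```javascript
// Sample documents standing in for a collection (invented for the demo)
const orders = [
  { customerId: "c1", status: "completed", total: 50 },
  { customerId: "c2", status: "pending",   total: 30 },
  { customerId: "c1", status: "completed", total: 70 },
];

// $match: filter first, so later stages see fewer documents
const matched = orders.filter(o => o.status === "completed");

// $group: accumulate per customerId, like { $sum: "$total" }
const grouped = Object.values(
  matched.reduce((acc, o) => {
    acc[o.customerId] ??= { _id: o.customerId, totalAmount: 0 };
    acc[o.customerId].totalAmount += o.total;
    return acc;
  }, {})
);
// grouped now holds one result document per completed customer
```

Because $match runs first, the hypothetical $group stage only ever sees the two completed orders, which is exactly why stage order matters for performance.
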

2. How do you optimize pipeline performance? (Hard)

Pipeline optimization involves several strategies:

1. Stage Order Optimization
// Optimized pipeline order
db.orders.aggregate([
    // Early filtering reduces documents
    { $match: { status: "completed" } },
    // Index usage for sorting
    { $sort: { orderDate: -1 } },
    // Group after filtering
    { $group: {
        _id: "$customerId",
        totalOrders: { $sum: 1 },
        totalAmount: { $sum: "$total" }
    }},
    // Final projection
    { $project: {
        customerId: "$_id",
        totalOrders: 1,
        totalAmount: 1,
        _id: 0
    }}
])

2. Memory Optimization
// Using allowDiskUse for large datasets
db.orders.aggregate([
    { $match: { status: "completed" } },
    { $group: {
        _id: "$customerId",
        orders: { $push: "$$ROOT" }
    }},
    { $sort: { "_id": 1 } }
], { allowDiskUse: true })

// Using $facet for parallel processing
db.orders.aggregate([
    { $facet: {
        "totalOrders": [
            { $match: { status: "completed" } },
            { $count: "count" }
        ],
        "topCustomers": [
            { $match: { status: "completed" } },
            { $group: {
                _id: "$customerId",
                total: { $sum: "$total" }
            }},
            { $sort: { total: -1 } },
            { $limit: 10 }
        ]
    }}
])
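Conceptually, $facet feeds the same input documents into each sub-pipeline and returns a single document with one field per facet. A rough JavaScript sketch of that shape (sample data invented for the demo):

```javascript
const orders = [
  { customerId: "c1", status: "completed", total: 100 },
  { customerId: "c2", status: "completed", total: 40 },
  { customerId: "c1", status: "cancelled", total: 10 },
];

// Each facet runs over the same input and yields its own result array
const faceted = {
  totalOrders: [
    { count: orders.filter(o => o.status === "completed").length },
  ],
  topCustomers: Object.values(
    orders
      .filter(o => o.status === "completed")
      .reduce((acc, o) => {
        acc[o.customerId] ??= { _id: o.customerId, total: 0 };
        acc[o.customerId].total += o.total;
        return acc;
      }, {})
  )
    .sort((a, b) => b.total - a.total)
    .slice(0, 10), // like { $limit: 10 }
};
```

The payoff is one round trip: both summaries come back in a single result document instead of two separate aggregate calls.
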

3. What are advanced pipeline operations? (Hard)

Advanced pipeline operations include complex data transformations:

1. Complex Grouping and Calculations
// Advanced grouping with multiple accumulators
db.orders.aggregate([
    { $match: { status: "completed" } },
    { $group: {
        _id: {
            year: { $year: "$orderDate" },
            month: { $month: "$orderDate" }
        },
        totalSales: { $sum: "$total" },
        avgOrderValue: { $avg: "$total" },
        minOrder: { $min: "$total" },
        maxOrder: { $max: "$total" },
        orderCount: { $sum: 1 }
    }},
    { $sort: { "_id.year": -1, "_id.month": -1 } }
])
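The multi-accumulator $group above can be mirrored in JavaScript to make the semantics concrete: one pass over the documents, updating several running values per year/month bucket (sample data invented for the demo):

```javascript
const orders = [
  { orderDate: new Date("2024-01-05"), total: 100 },
  { orderDate: new Date("2024-01-20"), total: 60 },
  { orderDate: new Date("2024-02-03"), total: 90 },
];

// Group by { year, month } and maintain several accumulators at once,
// mirroring $sum, $min, $max, and the { $sum: 1 } document counter
const byMonth = Object.values(
  orders.reduce((acc, o) => {
    const key = `${o.orderDate.getUTCFullYear()}-${o.orderDate.getUTCMonth() + 1}`;
    const g = (acc[key] ??= {
      _id: key, totalSales: 0, orderCount: 0,
      minOrder: Infinity, maxOrder: -Infinity,
    });
    g.totalSales += o.total;
    g.orderCount += 1;
    g.minOrder = Math.min(g.minOrder, o.total);
    g.maxOrder = Math.max(g.maxOrder, o.total);
    return acc;
  }, {})
).map(g => ({ ...g, avgOrderValue: g.totalSales / g.orderCount })); // like $avg
```

Note that $avg falls out of the other accumulators (sum divided by count), which is effectively how it is computed here.
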

2. Data Transformation
// Complex data transformation
db.orders.aggregate([
    { $match: { status: "completed" } },
    { $lookup: {
        from: "customers",
        localField: "customerId",
        foreignField: "_id",
        as: "customerInfo"
    }},
    { $unwind: "$customerInfo" },
    { $project: {
        orderId: 1,
        total: 1,
        customerName: "$customerInfo.name",
        customerEmail: "$customerInfo.email",
        orderDate: 1,
        items: {
            $map: {
                input: "$items",
                as: "item",
                in: {
                    name: "$$item.name",
                    quantity: "$$item.quantity",
                    price: "$$item.price",
                    subtotal: { $multiply: ["$$item.quantity", "$$item.price"] }
                }
            }
        }
    }}
])
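In relational terms, $lookup is a left outer join and $map is a per-element transform. A minimal in-memory sketch of the same shape (collections and field values invented for the demo):

```javascript
const customers = [{ _id: "c1", name: "Ada", email: "ada@example.com" }];
const orders = [{
  orderId: 1,
  customerId: "c1",
  items: [{ name: "pen", quantity: 2, price: 3 }],
}];

// $lookup: join via an index (a Map) on the foreign field
const customersById = new Map(customers.map(c => [c._id, c]));

const joined = orders.map(o => {
  // $unwind of a one-element lookup array reduces to a single object
  const customerInfo = customersById.get(o.customerId);
  return {
    orderId: o.orderId,
    customerName: customerInfo.name,
    customerEmail: customerInfo.email,
    // $map: compute a subtotal per item, like the $multiply expression
    items: o.items.map(it => ({ ...it, subtotal: it.quantity * it.price })),
  };
});
```

The sketch glosses over what $unwind does when the lookup array is empty or has several matches; in the real pipeline, an empty array drops the document unless preserveNullAndEmptyArrays is set.
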

4. How do you handle large datasets in pipelines? (Hard)

Handling large datasets requires specific strategies:

Large Dataset Strategies:
  • Use allowDiskUse option
  • Implement proper indexing
  • Optimize stage order
  • Use $facet for parallel processing
  • Implement pagination

Implementation Examples
// Pagination with large datasets
db.orders.aggregate([
    { $match: { status: "completed" } },
    { $sort: { orderDate: -1 } },
    { $skip: 1000 },
    { $limit: 100 }
], { allowDiskUse: true })
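The $skip/$limit arithmetic generalizes to any page: a helper like the one below (a hypothetical name, not a MongoDB API) maps a page number to the values used above:

```javascript
// Translate a (page, pageSize) pair into $skip / $limit values
function pageToSkipLimit(page, pageSize) {
  return { skip: (page - 1) * pageSize, limit: pageSize };
}

// Page 11 at 100 documents per page matches the pipeline above:
// { $skip: 1000 }, { $limit: 100 }
const { skip, limit } = pageToSkipLimit(11, 100);
```

Keep in mind that large $skip values still walk past the skipped documents; for very deep pagination, range-based ("keyset") pagination on an indexed sort field scales better.
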

// Parallel processing with $facet
db.orders.aggregate([
    { $facet: {
        "dailyStats": [
            { $match: { status: "completed" } },
            { $group: {
                _id: { $dateToString: { format: "%Y-%m-%d", date: "$orderDate" } },
                total: { $sum: "$total" },
                count: { $sum: 1 }
            }}
        ],
        "customerStats": [
            { $match: { status: "completed" } },
            { $group: {
                _id: "$customerId",
                total: { $sum: "$total" },
                orders: { $sum: 1 }
            }},
            { $sort: { total: -1 } },
            { $limit: 10 }
        ]
    }}
], { allowDiskUse: true })

5. What are the best practices for pipeline operations? (Hard)

Follow these best practices for efficient pipeline operations:

1. Performance Optimization
// Optimize pipeline with proper indexes
db.orders.createIndex({ status: 1, orderDate: -1 })

// Covered query pattern: project only fields in the { status, orderDate } index, excluding _id
db.orders.aggregate([
    { $match: { status: "completed" } },
    { $project: {
        status: 1,
        orderDate: 1,
        _id: 0
    }}
])

// Guard against missing or null fields with $ifNull
db.orders.aggregate([
    { $match: { status: "completed" } },
    { $group: {
        _id: "$customerId",
        total: { $sum: { $ifNull: ["$total", 0] } }
    }}
])

2. Pipeline Design
// Modular pipeline design
const matchStage = { $match: { status: "completed" } }
const groupStage = {
    $group: {
        _id: "$customerId",
        total: { $sum: "$total" }
    }
}
const sortStage = { $sort: { total: -1 } }

db.orders.aggregate([
    matchStage,
    groupStage,
    sortStage
])

// Using $expr for complex conditions
db.orders.aggregate([
    { $match: {
        $expr: {
            $and: [
                { $eq: ["$status", "completed"] },
                { $gte: ["$total", 100] }
            ]
        }
    }}
])
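The $expr condition above reduces to an ordinary predicate over a single document, which is a useful way to reason about it (sample documents invented for the demo):

```javascript
// The $expr above is equivalent to this per-document predicate
const matchesExpr = (doc) => doc.status === "completed" && doc.total >= 100;

const docs = [
  { status: "completed", total: 150 },
  { status: "completed", total: 50 },
  { status: "pending",   total: 200 },
];

const matched = docs.filter(matchesExpr);
```

The real advantage of $expr is that aggregation expressions can compare two fields of the same document (e.g. `{ $gt: ["$spent", "$budget"] }`), which plain query operators cannot express.
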

Next Steps

Continue your MongoDB interview preparation by running these pipelines against sample data and reviewing related topics such as indexing strategies and schema design.