MongoDB Aggregation Framework

What is the Aggregation Framework?

The MongoDB Aggregation Framework processes and analyzes data inside the database. Documents from a collection pass through a pipeline of stages; each stage transforms its input and hands the result to the next stage.
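
To make the pipeline idea concrete, here is a minimal sketch in plain JavaScript that needs no database. It models a stage as a function from an array of documents to a new array. The names runPipeline, matchStage and limitStage, and the sample documents, are illustrative assumptions and not MongoDB APIs.

```javascript
// Hypothetical model: a stage is a function from an array of documents to a new array.
const matchStage = (predicate) => (docs) => docs.filter(predicate);
const limitStage = (n) => (docs) => docs.slice(0, n);

// Feed the output of each stage into the next one, in order.
function runPipeline(docs, stages) {
    return stages.reduce((current, stage) => stage(current), docs);
}

// Made-up sample data for illustration only.
const sampleOrders = [
    { _id: 1, status: "completed", amount: 40 },
    { _id: 2, status: "pending", amount: 15 },
    { _id: 3, status: "completed", amount: 25 },
];

const pipelineResult = runPipeline(sampleOrders, [
    matchStage((doc) => doc.status === "completed"),
    limitStage(1),
]);

console.log(pipelineResult); // one document: the order with _id 1
```

The order of the stages matters: swapping the two stages here would limit first and filter second, which can return a different result.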

MongoDB Aggregation Pipeline

[Figure: Aggregation Pipeline Flow]

Pipeline Stages

$match Stage

Filters the documents and passes only those that match the specified condition to the next stage. The condition uses the same query syntax as find().

db.orders.aggregate([
    { $match: { status: "completed" } }
])

$group Stage

Groups documents by a specified expression and applies accumulator expressions to each group.

db.orders.aggregate([
    { $group: {
        _id: "$customer_id",
        total_amount: { $sum: "$amount" },
        order_count: { $sum: 1 }
    }}
])
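
As a rough illustration of what this $group stage computes, the sketch below does the same bookkeeping in plain JavaScript over made-up documents. The function name groupByCustomer and the sample data are assumptions for illustration; MongoDB performs this work on the server, and it does not guarantee the order of the groups it returns.

```javascript
// Made-up sample documents for illustration only.
const sampleGroupOrders = [
    { customer_id: "c1", amount: 20 },
    { customer_id: "c2", amount: 5 },
    { customer_id: "c1", amount: 30 },
];

// Hypothetical helper mirroring the $group stage above.
function groupByCustomer(docs) {
    const groups = new Map();
    for (const doc of docs) {
        const key = doc.customer_id; // _id: "$customer_id"
        if (!groups.has(key)) {
            groups.set(key, { _id: key, total_amount: 0, order_count: 0 });
        }
        const group = groups.get(key);
        group.total_amount += doc.amount; // { $sum: "$amount" }
        group.order_count += 1;           // { $sum: 1 }
    }
    return [...groups.values()];
}

const groupedTotals = groupByCustomer(sampleGroupOrders);
console.log(groupedTotals);
// c1: total_amount 50, order_count 2; c2: total_amount 5, order_count 1
```

Note that { $sum: 1 } adds one per document, which is why it works as a counter.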

$sort Stage

Sorts all input documents and returns them in sorted order.

db.orders.aggregate([
    // Sort by the order amount, largest first (-1 = descending, 1 = ascending)
    { $sort: { amount: -1 } }
])
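
A sort key has to exist in the documents that reach the $sort stage. A computed field such as total_amount only exists after the $group stage that produces it, so that stage has to come first. The sketch below writes such a pipeline as a plain array so it can be inspected without a database; the variable name and the _id tie-breaker are illustrative choices.

```javascript
const totalsByCustomerPipeline = [
    // First compute total_amount per customer...
    { $group: { _id: "$customer_id", total_amount: { $sum: "$amount" } } },
    // ...then sort on it: -1 is descending, 1 is ascending.
    // The secondary key _id breaks ties so the order is deterministic.
    { $sort: { total_amount: -1, _id: 1 } },
];

// In mongosh, assuming an `orders` collection with customer_id and amount fields:
//   db.orders.aggregate(totalsByCustomerPipeline)
console.log(JSON.stringify(totalsByCustomerPipeline, null, 2));
```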

$project Stage

Reshapes each document by adding new fields or removing existing ones.

db.orders.aggregate([
    { $project: {
        customer_name: 1,
        order_date: 1,
        total: { $multiply: ["$price", "$quantity"] }
    }}
])
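
The effect of this $project stage on a single document can be sketched in plain JavaScript. The sample document is made up for illustration. Fields that are not listed (here, status, price and quantity) are dropped, while _id is kept by default unless the projection sets _id: 0.

```javascript
// Made-up sample document for illustration only.
const sampleOrderLines = [
    {
        _id: 1,
        customer_name: "Ada",
        order_date: "2024-01-05",
        price: 4,
        quantity: 3,
        status: "completed",
    },
];

// Rough equivalent of the $project stage above.
const projectedOrders = sampleOrderLines.map((doc) => ({
    _id: doc._id,                     // included by default
    customer_name: doc.customer_name, // customer_name: 1
    order_date: doc.order_date,       // order_date: 1
    total: doc.price * doc.quantity,  // { $multiply: ["$price", "$quantity"] }
}));

console.log(projectedOrders[0].total); // 12
```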

Advanced Aggregation

Complex Pipeline Example

Combining multiple stages for complex data analysis:

db.orders.aggregate([
    // Match completed orders
    { $match: { status: "completed" } },
    
    // Group by customer and calculate metrics
    { $group: {
        _id: "$customer_id",
        total_spent: { $sum: "$amount" },
        order_count: { $sum: 1 },
        avg_order_value: { $avg: "$amount" }
    }},
    
    // Sort by total spent
    { $sort: { total_spent: -1 } },
    
    // Limit to top 10 customers
    { $limit: 10 }
])

Array Operations

Working with arrays in aggregation:

db.products.aggregate([
    { $unwind: "$categories" },
    { $group: {
        _id: "$categories",
        product_count: { $sum: 1 },
        avg_price: { $avg: "$price" }
    }}
])
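
The following plain JavaScript sketch mimics what this pipeline does to a few made-up products. $unwind emits one copy of the document per array element, with the array field replaced by that element, and by default drops documents whose array is empty. The sketch assumes every document has a categories array; the variable names are illustrative.

```javascript
// Made-up sample documents for illustration only.
const sampleProducts = [
    { name: "Lamp", price: 30, categories: ["home", "lighting"] },
    { name: "Desk", price: 120, categories: ["home"] },
    { name: "Mystery box", price: 10, categories: [] },
];

// $unwind: one output document per array element; an empty array yields nothing.
const unwound = sampleProducts.flatMap((doc) =>
    doc.categories.map((category) => ({ ...doc, categories: category }))
);

// $group by the unwound value, tracking a count and a running price sum.
const categoryTotals = new Map();
for (const doc of unwound) {
    const entry = categoryTotals.get(doc.categories) || { product_count: 0, price_sum: 0 };
    entry.product_count += 1;
    entry.price_sum += doc.price;
    categoryTotals.set(doc.categories, entry);
}

const categoryStats = [...categoryTotals].map(([category, entry]) => ({
    _id: category,
    product_count: entry.product_count,
    avg_price: entry.price_sum / entry.product_count, // { $avg: "$price" }
}));

console.log(categoryStats);
// home: product_count 2, avg_price 75; lighting: product_count 1, avg_price 30
```

A product that belongs to several categories is counted once in each of them, so the per-category counts can add up to more than the number of products.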

Performance Optimization

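  • Create indexes on the fields used by $match and $sort, and put those stages at the start of the pipeline, where they can use the indexes
  • Filter with $match as early as possible so that later stages process fewer documents
  • Use $limit (after any $sort) to cap the number of documents passed to later stages
  • Watch memory use in $group and $sort: each stage is subject to a memory limit (100 megabytes by default), and the allowDiskUse option lets large stages write temporary files to disk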
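
These points can be combined in one pipeline. The sketch below is one possible arrangement, not a definitive recipe: the index on status, the variable names and the orders collection are assumptions for illustration. The pipeline and options are plain objects, and the mongosh calls that would use them appear as comments.

```javascript
const reportPipeline = [
    // Filter first so later stages see fewer documents; can use an index on status.
    { $match: { status: "completed" } },
    { $group: { _id: "$customer_id", total_spent: { $sum: "$amount" } } },
    { $sort: { total_spent: -1 } },
    // Cap the result after sorting.
    { $limit: 10 },
];

// Let memory-heavy stages such as $group and $sort spill to disk if needed.
const reportOptions = { allowDiskUse: true };

// In mongosh, assuming an `orders` collection exists:
//   db.orders.createIndex({ status: 1 })
//   db.orders.aggregate(reportPipeline, reportOptions)
//   db.orders.explain("executionStats").aggregate(reportPipeline)
console.log(reportPipeline.map((stage) => Object.keys(stage)[0]).join(" -> "));
// $match -> $group -> $sort -> $limit
```

The explain output is the place to check whether the initial $match actually used the index.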

Next Steps

Now that you understand the Aggregation Framework, you can explore: