MongoDB Sharding Interview Questions

Introduction

This guide covers essential MongoDB sharding concepts commonly asked in technical interviews. Each question includes detailed answers and practical examples.

Medium

1. What is MongoDB sharding and why is it important?

MongoDB sharding is a method for distributing data across multiple machines to support deployments with very large data sets and high throughput operations. Key benefits include:

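  • Horizontal Scaling: add shards to grow capacity instead of upgrading a single server
  • Improved Query Performance: queries that include the shard key are routed only to the shards that hold matching data
  • Increased Storage Capacity: each shard stores only a subset of the data
  • Better Resource Utilization: reads and writes are spread across the shards
  • High Availability: with config servers and shards deployed as replica sets, the cluster can keep serving partial reads and writes when a shard is unavailable

Sharding Components:

  • Shard Servers: each shard holds a subset of the sharded data
  • Config Servers: store the cluster metadata, including which chunks live on which shard
  • Mongos Router: the query router that applications connect to; it forwards operations to the right shards
  • Shard Key: the indexed field or fields that determine how documents are distributed
  • Chunks: contiguous ranges of shard key values assigned to a shard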
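
To make the relationship between shard key, chunks, and the router concrete, here is a minimal sketch in plain JavaScript of how a router could pick the shard for a given key. The chunk table, the shard names, and the `routeToShard` function are hypothetical illustrations, not MongoDB internals; a real mongos reads chunk metadata from the config servers.

```javascript
// Hypothetical chunk table: each chunk covers [min, max) of the shard key
// and lives on exactly one shard. Infinity stands in for MinKey/MaxKey.
const chunks = [
    { min: -Infinity, max: 1000, shard: "shardA" },
    { min: 1000, max: 5000, shard: "shardB" },
    { min: 5000, max: Infinity, shard: "shardA" },
];

// Return the shard that owns the chunk containing the given shard key value.
function routeToShard(chunkTable, keyValue) {
    const chunk = chunkTable.find(c => keyValue >= c.min && keyValue < c.max);
    if (!chunk) {
        throw new Error("no chunk covers key " + keyValue);
    }
    return chunk.shard;
}

console.log(routeToShard(chunks, 42));    // shardA
console.log(routeToShard(chunks, 1000));  // shardB (min is inclusive)
console.log(routeToShard(chunks, 9999));  // shardA
```

A query that does not include the shard key cannot be routed this way and has to be sent to every shard.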
Hard

2. How do you set up and configure sharding?

Sharding setup and configuration:

1. Basic Configuration
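# Start a config server (config servers must run as a replica set)
mongod --configsvr --replSet configReplSet --dbpath /data/configdb --port 27019

# Start shard servers (each shard is a replica set; the set names here are examples)
mongod --shardsvr --replSet shard1ReplSet --dbpath /data/shard1 --port 27018
mongod --shardsvr --replSet shard2ReplSet --dbpath /data/shard2 --port 27020

# Before continuing, connect to each mongod with mongosh and run rs.initiate()

# Start mongos router (listens on port 27017 by default, so no shard may use that port)
mongos --configdb configReplSet/localhost:27019

// The remaining commands run in mongosh, connected to mongos

// Add shards to cluster
sh.addShard("shard1ReplSet/localhost:27018")
sh.addShard("shard2ReplSet/localhost:27020")

// Enable sharding for database
sh.enableSharding("myapp")

// Shard collection
sh.shardCollection("myapp.users", { "userId": "hashed" })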
2. Advanced Configuration
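// Configure shard zones (use the shard names reported by sh.status())
sh.addShardToZone("shard0", "zone1")
sh.addShardToZone("shard1", "zone2")

// Define zone ranges (lower bound inclusive, upper bound exclusive).
// MinKey to MaxKey pins the whole collection to zone1; narrow the
// bounds to divide data between zones.
sh.updateZoneKeyRange(
    "myapp.users",
    { "userId": MinKey },
    { "userId": MaxKey },
    "zone1"
)

// Configure balancer
sh.setBalancerState(false)  // Disable balancer
sh.stopBalancer()           // Stop balancer, e.g. before maintenance
sh.startBalancer()          // Start balancer again afterwards

// Set chunk size in MB
use config
db.settings.updateOne(
    { _id: "chunksize" },
    { $set: { value: 64 } },
    { upsert: true }
)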
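
The balancer commands above switch a background process on and off; the sketch below illustrates the kind of decision that process makes. It is a deliberately simplified model in plain JavaScript that balances by chunk count. The `planMigration` function, the shard names, and the threshold are assumptions for illustration, not MongoDB's actual balancer policy.

```javascript
// Simplified model: propose one chunk migration when the gap between the
// most and least loaded shard reaches a threshold. Assumption: load is
// measured by chunk count, which is not how every MongoDB version balances.
function planMigration(chunkCounts, threshold) {
    const entries = Object.entries(chunkCounts);
    if (entries.length < 2) return null;
    entries.sort((a, b) => a[1] - b[1]);
    const [leastShard, leastCount] = entries[0];
    const [mostShard, mostCount] = entries[entries.length - 1];
    if (mostCount - leastCount < threshold) return null;
    return { from: mostShard, to: leastShard };
}

console.log(planMigration({ shardA: 12, shardB: 4, shardC: 7 }, 2));
// { from: 'shardA', to: 'shardB' }
console.log(planMigration({ shardA: 5, shardB: 5 }, 2)); // null
```

While the balancer is disabled no such migrations are proposed, which is why it is usually re-enabled once maintenance is finished.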
Hard

3. How do you choose and implement a shard key?

Shard key selection and implementation:

1. Shard Key Selection
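// Analyze data distribution
db.users.aggregate([
    { $group: { _id: "$userId", count: { $sum: 1 } } },
    { $sort: { count: -1 } }
])

// Check cardinality (distinct() returns every value, so use it on small collections or samples)
db.users.distinct("userId").length

// Monitor chunk distribution
sh.status()

// Check chunk splits (chunk metadata lives in the config database;
// MongoDB 5.0+ identifies the collection by uuid instead of ns)
db.getSiblingDB("config").chunks.find({ ns: "myapp.users" })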
2. Shard Key Implementation
// Compound shard key
sh.shardCollection("myapp.orders", 
    { "customerId": 1, "orderDate": 1 })

// Hashed shard key
sh.shardCollection("myapp.users", 
    { "userId": "hashed" })

// Range-based sharding
sh.shardCollection("myapp.products", 
    { "category": 1, "price": 1 })

// Backfill a shard key field from userId.
// The update must be an aggregation pipeline (MongoDB 4.2+) so that
// "$userId" is read as a field reference, not stored as a literal string.
db.users.updateMany(
    { shardKey: { $exists: false } },
    [ { $set: { shardKey: "$userId" } } ]
)
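
The aggregation and distinct() checks above measure two properties of a candidate key: cardinality and how evenly its values are spread. As a rough sketch of that reasoning, the plain JavaScript below computes both from an in-memory sample. The `summarizeCandidateKey` function and the sample documents are hypothetical; on a real cluster the numbers would come from the collection itself.

```javascript
// Summarize how well a candidate field would work as a shard key, using an
// in-memory sample of documents instead of a live collection.
function summarizeCandidateKey(docs, field) {
    const counts = new Map();
    for (const doc of docs) {
        const value = doc[field];
        counts.set(value, (counts.get(value) || 0) + 1);
    }
    const maxCount = Math.max(0, ...counts.values());
    return {
        cardinality: counts.size,
        // Share of documents holding the single most frequent value.
        topValueShare: docs.length ? maxCount / docs.length : 0,
    };
}

// Made-up sample data for illustration.
const sample = [
    { userId: 1, country: "US" },
    { userId: 2, country: "US" },
    { userId: 3, country: "US" },
    { userId: 4, country: "DE" },
];

console.log(summarizeCandidateKey(sample, "userId"));
// { cardinality: 4, topValueShare: 0.25 }
console.log(summarizeCandidateKey(sample, "country"));
// { cardinality: 2, topValueShare: 0.75 }
```

In this sample `country` is the weaker candidate: it has only two distinct values, and three quarters of the documents share one of them, so most of the data would land in a single chunk range.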
Hard

4. How do you manage and monitor a sharded cluster?

Sharded cluster management and monitoring:

1. Cluster Management
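// Check cluster status
sh.status()

// Monitor balancer
sh.getBalancerState()
sh.isBalancerRunning()

// Check chunk distribution (chunk metadata lives in the config database;
// MongoDB 5.0+ identifies the collection by uuid instead of ns)
db.getSiblingDB("config").chunks.find({ ns: "myapp.users" })

// Move the chunk that contains userId 1000 (assumes a ranged shard key on userId)
sh.moveChunk("myapp.users",
    { userId: 1000 },
    "shard1")

// Split chunks
sh.splitAt("myapp.users",
    { userId: 5000 })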
2. Performance Monitoring
// Check operation distribution
db.currentOp().inprog.forEach(function(op) {
    if (op.shard) {
        printjson(op);
    }
})

// Monitor chunk splits (recorded in config.changelog)
db.getSiblingDB("config").changelog.find({
    what: "split",
    ns: "myapp.users"
})

// Check balancer rounds (recorded in config.actionlog)
db.getSiblingDB("config").actionlog.find({
    what: "balancer.round"
})

// Monitor shard utilization
db.adminCommand({ serverStatus: 1 })
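
The chunk queries above return one document per chunk, which is hard to read at a glance. The sketch below shows one way to reduce such a result to a per-shard summary. It is plain JavaScript over made-up data that only assumes each chunk document has a `shard` field; `chunksPerShard` and `imbalanceRatio` are hypothetical helper names.

```javascript
// Count chunks per shard from an array shaped like config.chunks documents.
// Only the `shard` field is used; the sample data below is made up.
function chunksPerShard(chunkDocs) {
    const counts = {};
    for (const chunk of chunkDocs) {
        counts[chunk.shard] = (counts[chunk.shard] || 0) + 1;
    }
    return counts;
}

// Ratio between the busiest and quietest shard; 1 means perfectly even.
function imbalanceRatio(counts) {
    const values = Object.values(counts);
    if (values.length === 0) return 1;
    return Math.max(...values) / Math.min(...values);
}

const sampleChunks = [
    { shard: "shardA" }, { shard: "shardA" }, { shard: "shardA" },
    { shard: "shardB" },
];
const perShard = chunksPerShard(sampleChunks);
console.log(perShard);                 // { shardA: 3, shardB: 1 }
console.log(imbalanceRatio(perShard)); // 3
```

Shards that own no chunks at all do not appear in the input, so they would need to be added with a count of zero before the ratio is meaningful.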
Hard

5. What are the sharding best practices?

Follow these sharding best practices:

1. Design Best Practices
// Optimal shard key selection
function analyzeShardKey(coll, key) {
    // Check cardinality
    const cardinality = db[coll].distinct(key).length;
    
    // Check distribution
    const distribution = db[coll].aggregate([
        { $group: { _id: "$" + key, count: { $sum: 1 } } },
        { $sort: { count: -1 } }
    ]).toArray();
    
    // Check query patterns (requires the database profiler to be enabled)
    const queries = db.system.profile.find({
        ns: db.getName() + "." + coll
    }).toArray();
    
    return {
        cardinality,
        distribution,
        queries
    };
}

// Implement shard key
db.users.createIndex({ userId: 1 })
sh.shardCollection("myapp.users", { userId: 1 })
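
One design consideration behind the choice between a ranged key such as `{ userId: 1 }` and a hashed key is how each handles monotonically increasing values. The simulation below is a sketch of that effect in plain JavaScript. The three-shard layout, the chunk boundaries, and the multiplicative hash are all assumptions made for illustration; MongoDB's hashed indexes use a different hash function.

```javascript
// Hypothetical 3-shard cluster. Ranged chunks were split for existing keys;
// the last chunk runs to "MaxKey" (Infinity here).
const rangedChunks = [
    { min: -Infinity, max: 1000, shard: 0 },
    { min: 1000, max: 2000, shard: 1 },
    { min: 2000, max: Infinity, shard: 2 },
];

function rangedShard(key) {
    return rangedChunks.find(c => key >= c.min && key < c.max).shard;
}

// Stand-in hash (Knuth multiplicative). MongoDB's hashed index uses a
// different function; only the scattering effect matters here.
function hashedShard(key, shardCount) {
    const hash = Math.imul(key, 2654435761) >>> 0;
    return hash % shardCount;
}

// Insert 1000 monotonically increasing keys and count writes per shard.
function simulateInserts(pickShard) {
    const writes = [0, 0, 0];
    for (let key = 5000; key < 6000; key++) {
        writes[pickShard(key)]++;
    }
    return writes;
}

const rangedWrites = simulateInserts(rangedShard);
const hashedWrites = simulateInserts(key => hashedShard(key, 3));
console.log("ranged:", rangedWrites); // ranged: [ 0, 0, 1000 ]
console.log("hashed:", hashedWrites); // spread across all three shards
```

With the ranged key every new insert lands in the last chunk, so one shard takes all the write load. The hashed key spreads the writes, at the cost of turning range queries on `userId` into scatter-gather operations.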
2. Operational Best Practices
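// Regular maintenance
function maintainShardedCluster() {
    const configDB = db.getSiblingDB("config");

    // Check cluster health (sh.status() prints its report)
    sh.status();

    // Monitor chunk distribution
    // (MongoDB 5.0+ identifies the collection by uuid instead of ns)
    const chunks = configDB.chunks.find({
        ns: "myapp.users"
    }).toArray();

    // Check balancer status
    const balancerState = sh.getBalancerState();

    // Monitor performance
    const performance = db.adminCommand({
        serverStatus: 1
    });

    // Take action based on metrics
    if (chunks.length > 1000) {
        // Splitting would only add more chunks, so flag the count for review
        print("myapp.users has " + chunks.length + " chunks; review chunk size and shard key");
    }

    if (!balancerState) {
        sh.startBalancer();
    }

    return { chunkCount: chunks.length, balancerState, performance };
}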
