Written by nikhil Dec 15, 2024

Mastering MongoDB Sharding: Ultimate Guide to Setting Up a Scalable Cluster

A comprehensive guide to setting up and managing a MongoDB sharded cluster for scalable applications using docker.

mongo sharding database distributed availability docker

Introduction

MongoDB sharding enables horizontal scaling by distributing data across multiple servers. This guide provides a step-by-step process to configure and manage a MongoDB sharded cluster using Docker.

Step 1: Create a Docker Network

All containers must be on the same network for communication. Create a Docker network using:

docker network create mongo-shard-cluster

MongoDB Sharding Architecture: Building Blocks

Before diving into the setup process, let’s explore the core components of a sharded MongoDB architecture:

Routers (mongos):
Routers act as query routers. They are responsible for interfacing with applications, routing queries to the appropriate shards, and consolidating results. Routers maintain no data but are critical for query distribution.
Config Servers:
Config servers store metadata and configuration details about the cluster, such as shard locations and chunk distribution. They must run as a replica set to ensure high availability. A replica set consists of multiple nodes (typically three), providing redundancy and failover. If the primary node goes down, one of the secondary nodes automatically becomes the primary to ensure uninterrupted operation.
Shards:
Shards are the actual data storage units in a sharded cluster. Each shard is itself a replica set consisting of three nodes: one primary and two secondary. The primary node handles all write operations and replicates data to the secondary nodes. This setup ensures data redundancy and fault tolerance.
Data is distributed across shards based on the shard key, and queries are directed to the relevant shard(s) by the routers.

These components work together to ensure scalability, fault tolerance, and efficient query execution in a distributed setup.

Step 2: Setup Config Servers

Config servers store metadata about the cluster using the following script

# Start Config Servers
docker run -d --net mongo-shard-cluster --name config-svr-1 -p 27101:27017 mongo:4.4 mongod --port 27017 --configsvr --replSet config-svr-replica-set
docker run -d --net mongo-shard-cluster --name config-svr-2 -p 27102:27017 mongo:4.4 mongod --port 27017 --configsvr --replSet config-svr-replica-set
docker run -d --net mongo-shard-cluster --name config-svr-3 -p 27103:27017 mongo:4.4 mongod --port 27017 --configsvr --replSet config-svr-replica-set

# Wait for containers to start
sleep 5

# Initiate Config Server Replica Set
docker exec -it config-svr-1 mongo --eval '
  rs.initiate({
    _id: "config-svr-replica-set",
    configsvr: true,
    members: [
      { _id: 0, host: "config-svr-1:27017" },
      { _id: 1, host: "config-svr-2:27017" },
      { _id: 2, host: "config-svr-3:27017" }
    ]
  });'

Step 3: Setup Shards

# Function to create a shard replica set
create_shard_replica_set() {
  local shard_id=$1

  # Start shard nodes
  docker run -d --net mongo-shard-cluster --name shard-${shard_id}-node-a -p $((27110 + shard_id * 10 + 1)):27017 mongo:4.4 mongod --port 27017 --shardsvr --replSet shard-${shard_id}-replica-set
  docker run -d --net mongo-shard-cluster --name shard-${shard_id}-node-b -p $((27110 + shard_id * 10 + 2)):27017 mongo:4.4 mongod --port 27017 --shardsvr --replSet shard-${shard_id}-replica-set
  docker run -d --net mongo-shard-cluster --name shard-${shard_id}-node-c -p $((27110 + shard_id * 10 + 3)):27017 mongo:4.4 mongod --port 27017 --shardsvr --replSet shard-${shard_id}-replica-set

  # Wait for containers to start
  sleep 5

  # Initiate Shard Replica Set
  docker exec -it shard-${shard_id}-node-a mongo --eval "
    rs.initiate({
      _id: 'shard-${shard_id}-replica-set',
      members: [
        { _id: 0, host: 'shard-${shard_id}-node-a:27017' },
        { _id: 1, host: 'shard-${shard_id}-node-b:27017' },
        { _id: 2, host: 'shard-${shard_id}-node-c:27017' }
      ]
    });"
}

# Create three shard replica sets
create_shard_replica_set 1
create_shard_replica_set 2
create_shard_replica_set 3

The above script creates 3 shard-replica set each with 3 nodes. One of nodes acts as primary and the other 2 act as secondary. By default, primary node is used for write operations.

Step 4: Setup Routers

Routers serve as the interface between the application and the sharded cluster.

# Start Router Instances
docker run -d --net mongo-shard-cluster --name router-1 -p 27151:27017 mongo:4.4 mongos \
--port 27017 \
--configdb config-svr-replica-set/config-svr-1:27017,config-svr-2:27017,config-svr-3:27017 \
--bind_ip_all

docker run -d --net mongo-shard-cluster --name router-2 -p 27152:27017 mongo:4.4 mongos \
--port 27017 \
--configdb config-svr-replica-set/config-svr-1:27017,config-svr-2:27017,config-svr-3:27017 \
--bind_ip_all

docker run -d --net mongo-shard-cluster --name router-3 -p 27153:27017 mongo:4.4 mongos \
--port 27017 \
--configdb config-svr-replica-set/config-svr-1:27017,config-svr-2:27017,config-svr-3:27017 \
--bind_ip_all

# Wait for routers to start
sleep 5

Step 5: Connect Shards to the Cluster

Connect each shard to the routers using the following script:

# Add shards to the cluster via Router-1
docker exec -it router-1 mongo --eval "sh.addShard('shard-1-replica-set/shard-1-node-a:27017');"
docker exec -it router-1 mongo --eval "sh.addShard('shard-2-replica-set/shard-2-node-a:27017');"
docker exec -it router-1 mongo --eval "sh.addShard('shard-3-replica-set/shard-3-node-a:27017');"

# Initialize the bank database
docker exec -it router-1 mongo --eval "use bank;"

# Enable sharding for the 'bank' database
docker exec -it router-1 mongo --eval "sh.enableSharding('bank');"

# Create collections
docker exec -it router-1 mongo --eval "use bank; db.createCollection('users');"
docker exec -it router-1 mongo --eval "use bank; db.createCollection('transactions');"

# Enable balancer and start it
docker exec -it router-1 mongo --eval "sh.setBalancerState(true);"
docker exec -it router-1 mongo --eval "sh.startBalancer();"

# Shard the 'users' collection on the hashed 'user_id' field
docker exec -it router-1 mongo --eval "sh.shardCollection('bank.users', { user_id: 'hashed' });"

# Shard the 'transactions' collection on the hashed 'transaction_id' field
docker exec -it router-1 mongo --eval "sh.shardCollection('bank.transactions', { _id: 'hashed' });"

The script also setups a sample database “bank” with “users” and “transactions” collection.

In the script, we specify MongoDB to shard the collections on a specific key: “user_id” for users collection and “transaction_id” for transactions collection.

Step 6: Verify Cluster Status

Check the status of your cluster:

Connect to the terminal of any of the routers using:

docker exec -it router-1 mongo

sh.status();

Conclusion

You now have a fully functional MongoDB sharded cluster! Monitor the cluster and the balancer to ensure optimal performance.

Please take a look at the code repository that fully sets up MongoDB Sharded Cluster. Also has the code to start a simple banking application.

https://github.com/NikhilBayyavarapu/CSE512_Final_Project

Additional Configuration Resources

https://www.MongoDB.com/docs/manual/core/indexes/index-types/index-hashed/

https://www.MongoDB.com/docs/manual/core/sharded-cluster-query-router/

https://www.mongodb.com/docs/manual/reference/write-concern/

https://www.mongodb.com/docs/manual/reference/read-concern/