Learning MongoDB

MongoDB Sharding

Sharding is a method of distributing data across multiple servers so that a database can scale beyond the capacity of a single machine. It allows MongoDB to handle large datasets and high-throughput operations by partitioning a collection's data so that each server, called a shard, holds only a subset of it.

Key Concepts

  1. Shard:

    • Each shard contains a subset of the data.
    • Each shard is normally deployed as a replica set on its own servers, which spreads the load and keeps a redundant copy of that shard's data.
  2. Shard Key:

    • A field or a set of fields in a document that determines how the data is distributed across the shards.
    • The choice of shard key is crucial for efficient sharding and balanced data distribution.
  3. Config Server:

    • Stores metadata and configuration settings for the sharded cluster.
    • Contains the mapping of which data ranges are stored on which shards.
  4. Query Router (mongos):

    • Acts as an interface between the application and the sharded cluster.
    • Routes queries to the appropriate shards based on the shard key.
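
To make the roles of the config metadata and the query router more concrete, here is a minimal sketch in plain JavaScript. It is only a model of the idea, not MongoDB's implementation: the `chunks` array, the shard names, the `userId` field, and both function names are assumptions made up for illustration. Each chunk is taken to cover a range that includes its lower bound and excludes its upper bound.

```javascript
// Hypothetical chunk metadata, similar in spirit to what config servers track.
// Each chunk covers the shard key range [min, max).
const chunks = [
  { min: -Infinity, max: 1000, shard: "shardA" },
  { min: 1000, max: 5000, shard: "shardB" },
  { min: 5000, max: Infinity, shard: "shardA" },
];

// Find the shard that owns a single shard key value.
function shardForKey(chunkMap, key) {
  const chunk = chunkMap.find((c) => key >= c.min && key < c.max);
  return chunk.shard;
}

// Decide which shards a query must visit. An equality match on the shard key
// can be sent to one shard; any other filter goes to every shard.
function targetShards(chunkMap, filter, shardKeyField) {
  if (shardKeyField in filter) {
    return [shardForKey(chunkMap, filter[shardKeyField])];
  }
  return [...new Set(chunkMap.map((c) => c.shard))];
}

console.log(targetShards(chunks, { userId: 1200 }, "userId")); // targeted query
console.log(targetShards(chunks, { name: "Ada" }, "userId")); // broadcast query
```

The second call shows why shard key choice matters: a filter that does not mention the shard key gives the router nothing to narrow down, so every shard has to be asked.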

How Sharding Works

  1. Data Distribution:

    • Data is divided into chunks, each covering a contiguous range of shard key values.
    • Chunks are distributed across multiple shards to balance the load and ensure that no single shard becomes a bottleneck.
  2. Query Routing:

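    • The mongos instance determines which shard or shards contain the data needed for a query. When the query filter includes the shard key, mongos can target only the shards that own the matching ranges; otherwise it broadcasts the query to all shards.
    • It then routes the query to those shards and combines the results before returning them to the client.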
  3. Rebalancing:

    • A background process called the balancer automatically evens out the data across shards as the dataset grows.
    • It moves chunks between shards to maintain an even distribution and prevent any shard from becoming overloaded.
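
The rebalancing idea can be sketched as a small simulation. This is a deliberately simplified model, not the algorithm MongoDB runs: the `rebalance` function, the chunk-count metric, and the `threshold` parameter are assumptions for illustration, and the real balancer applies its own thresholds and metrics.

```javascript
// Simplified balancer sketch: repeatedly move one chunk from the shard with
// the most chunks to the shard with the fewest, until the gap between them
// is within the threshold. Returns the new counts and the list of moves.
function rebalance(chunkCounts, threshold = 1) {
  const limit = Math.max(1, threshold); // a gap below 1 could never settle
  const counts = { ...chunkCounts }; // work on a copy, leave the input alone
  const migrations = [];
  while (true) {
    const shards = Object.keys(counts).sort((a, b) => counts[a] - counts[b]);
    const least = shards[0];
    const most = shards[shards.length - 1];
    if (counts[most] - counts[least] <= limit) break;
    counts[most] -= 1;
    counts[least] += 1;
    migrations.push({ from: most, to: least });
  }
  return { counts, migrations };
}

const result = rebalance({ shardA: 8, shardB: 2, shardC: 2 });
console.log(result.counts);
console.log(result.migrations);
```

Moving one chunk per step mirrors the point made above: balancing is incremental, so the cluster keeps serving traffic while the distribution gradually evens out.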

Benefits of Sharding

  1. Scalability:

    • Allows the database to scale horizontally by adding more shards to handle increased data and traffic.
    • Each shard can handle a portion of the total data and queries.
  2. High Throughput:

    • Distributes the load across multiple shards, so each server handles only a share of the reads and writes.
    • Enables the system to handle more read and write operations simultaneously.
  3. Fault Tolerance:

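    • If one shard goes down, the data on the other shards remains available, although queries that need the failed shard cannot complete.
    • Deploying each shard as a replica set lets a secondary take over when a primary fails, which increases the overall availability and reliability of the database.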

Setting Up a Sharded Cluster

  1. Start Config Servers:

    • Launch three config servers to store the cluster metadata. Each mongod runs in the foreground, so start each one in its own terminal (or add --fork with --logpath), and create the data directories first.
    mongod --configsvr --replSet configReplSet --port 27019 --dbpath /data/config1
    mongod --configsvr --replSet configReplSet --port 27020 --dbpath /data/config2
    mongod --configsvr --replSet configReplSet --port 27021 --dbpath /data/config3
  2. Initiate Config Server Replica Set:

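    • Connect to one config server, then run the rs commands at the shell prompt. On MongoDB 6.0 and later, use mongosh in place of the legacy mongo shell in this and the following steps.
    mongo --port 27019
    rs.initiate()
    rs.add("localhost:27020")
    rs.add("localhost:27021")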
  3. Start Shard Servers:

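    • Launch the mongod processes for a shard. The two below are members of the same replica set, shardReplSet1, so together they form a single shard; repeat with a different replica set name, ports, and data paths for each additional shard.
    mongod --shardsvr --replSet shardReplSet1 --port 27022 --dbpath /data/shard1
    mongod --shardsvr --replSet shardReplSet1 --port 27023 --dbpath /data/shard2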
  4. Initiate Shard Replica Set:

    mongo --port 27022
    rs.initiate()
    rs.add("localhost:27023")
  5. Start Mongos Instances:

    • Launch a mongos instance to route queries, pointing it at the config server replica set.
    mongos --configdb configReplSet/localhost:27019,localhost:27020,localhost:27021 --port 27017
  6. Add Shards to the Cluster:

    mongo --port 27017
    sh.addShard("shardReplSet1/localhost:27022,localhost:27023")
  7. Enable Sharding for a Database:

    sh.enableSharding("myDatabase")
  8. Shard a Collection:

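    • Choose a shard key and shard the collection. Here shardKey stands for the name of the field you choose; the value 1 requests ranged sharding, and "hashed" requests hashed sharding instead.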
    sh.shardCollection("myDatabase.myCollection", {shardKey: 1})
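
Steps 2 and 4 initiate each replica set with no arguments and then add members one at a time. An alternative is to pass rs.initiate() a configuration document that lists every member up front, which also lets you state the host names explicitly. The sketch below builds such a document in plain JavaScript; `buildReplSetConfig` is a hypothetical helper written for this example, not a shell function, and the host list simply reuses the ports from the steps above.

```javascript
// Hypothetical helper (not part of the MongoDB shell): build the configuration
// document that rs.initiate() accepts, with one entry per replica set member.
function buildReplSetConfig(name, hosts, { configsvr = false } = {}) {
  const config = {
    _id: name, // must match the --replSet value used to start mongod
    members: hosts.map((host, index) => ({ _id: index, host })),
  };
  if (configsvr) config.configsvr = true; // marks a config server replica set
  return config;
}

const configRs = buildReplSetConfig(
  "configReplSet",
  ["localhost:27019", "localhost:27020", "localhost:27021"],
  { configsvr: true }
);
const shardRs = buildReplSetConfig("shardReplSet1", [
  "localhost:27022",
  "localhost:27023",
]);

console.log(JSON.stringify(configRs, null, 2));
console.log(JSON.stringify(shardRs, null, 2));
// In a shell connected to port 27019 you could then run rs.initiate(configRs),
// and in a shell connected to port 27022, rs.initiate(shardRs).
```

Because the shell is a JavaScript environment, the same function can be pasted into it directly before calling rs.initiate().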

Sharding in MongoDB is essential for handling large datasets and high traffic loads. By distributing data across multiple shards, you can achieve horizontal scalability, high throughput, and fault tolerance. Setting up a sharded cluster involves configuring config servers, shard servers, and mongos instances, along with selecting appropriate shard keys to ensure efficient data distribution.
