# MongoDB and Docker – How to create and configure a replica set

Creating a replica set in MongoDB requires several steps that need to be performed accurately. Taking advantage of Docker's capabilities, you can automate the whole process. We are going to find out step by step how to configure the various components of our project.

Installing and configuring a database often takes hours of work. Finding a configuration that does not compromise other services while still ensuring a high level of efficiency and security is not always easy. When it comes to configuring a MongoDB replica set, the job can be even harder and full of pitfalls. Moreover, after a failure or even a blackout of the servers, the automatic restart of the databases may hold some unpleasant surprises.

As we saw in the articles "Introduction to Docker" and "Docker Compose – how to orchestrate different containers", using Docker and Docker Compose allows us to create a highly reliable virtual environment for both development and production. In this article we’ll look at how to create a MongoDB installation configured as a replica set by leveraging Docker Compose.

## Workspace configuration

Before starting, you should verify that you have all the necessary software installed. In particular, you must have both Docker and Docker Compose installed on your machine. You can find all the information on how to install and properly configure your PC in the article Introduction to Docker. It is not necessary to have MongoDB installed. MongoDB instances will, in fact, be created within Docker.

Let’s go ahead and create a folder for our tutorial that we will call mongo_example. Inside this folder we create a file called docker-compose.yml that will have the following content.

    version: '3'

    services:
      mongodb1:
        image: mongo:4
        restart: always
        container_name: mongodb1
        volumes:
          - mongodata1:/data/db
        expose:
          - "27017"
        entrypoint: [ "/usr/bin/mongod", "--replSet", "rsmongo", "--bind_ip_all" ]

      mongodb2:
        image: mongo:4
        restart: always
        container_name: mongodb2
        volumes:
          - mongodata2:/data/db
        expose:
          - "27017"
        entrypoint: [ "/usr/bin/mongod", "--replSet", "rsmongo", "--bind_ip_all" ]

      mongodb3:
        image: mongo:4
        restart: always
        container_name: mongodb3
        volumes:
          - mongodata3:/data/db
        expose:
          - "27017"
        entrypoint: [ "/usr/bin/mongod", "--replSet", "rsmongo", "--bind_ip_all" ]

    volumes:
      mongodata1:
      mongodata2:
      mongodata3:

In the Docker Compose file we have thus defined three services, each based on version 4 of the MongoDB image (mongo:4). Each service has a name and a dedicated volume for saving data. To make the services talk to each other we exposed the default MongoDB port using the expose option. It is also possible to map the port of each container to a host port using the ports option.

### Attention

The ports mapped to the host by the various services must all be different. Moreover, if on the machine there is already an installation of MongoDB that uses the default port (27017), you must carefully choose the mapping. If there is a conflict, the service will not be started.
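
The uniqueness rule above can be checked mechanically before starting the stack. The following is only a sketch: findHostPortConflicts is a hypothetical helper (it is not part of Docker or MongoDB), it assumes the mappings are written as "host:container" strings as in the ports option, and the host ports 27018 and 27019 are purely illustrative.

```javascript
// Hypothetical helper: report host ports that are claimed by more than one service.
// `mappings` maps a service name to its list of "hostPort:containerPort" strings
// (an optional leading host IP, as in "127.0.0.1:27018:27017", is also accepted).
function findHostPortConflicts(mappings) {
  const owners = new Map(); // host port -> services that publish it
  for (const [service, ports] of Object.entries(mappings)) {
    for (const mapping of ports) {
      const parts = mapping.split(":");
      if (parts.length < 2) continue; // container port only: no fixed host port
      const hostPort = parts[parts.length - 2];
      if (!owners.has(hostPort)) owners.set(hostPort, []);
      owners.get(hostPort).push(service);
    }
  }
  // Keep only the host ports that have more than one owner.
  return [...owners.entries()]
    .filter(([, services]) => services.length > 1)
    .map(([hostPort, services]) => ({ hostPort, services }));
}

// Illustrative mappings: every container keeps 27017 internally,
// but each instance gets its own port on the host.
const okMappings = {
  mongodb1: ["27017:27017"],
  mongodb2: ["27018:27017"],
  mongodb3: ["27019:27017"],
};
console.log(findHostPortConflicts(okMappings)); // empty list: no conflicts
```

The check only compares the mappings with each other; a port already used by a local MongoDB installation still has to be verified on the host itself.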

To start the various MongoDB instances in replica set mode we use the entrypoint option. In this way we specify the command that is executed every time the container starts. In particular, we added options to define the name of the replica set, rsmongo (--replSet option), and to accept requests from any IP address (--bind_ip_all option).

At this point our environment is almost ready. The only thing missing is the configuration of the replica set. Before we can do that, however, the services must be running. We start them with the following command.

    $ docker-compose up

You can also run it in detached mode using the -d option. To understand how a replica set works behind the scenes, it is worth reading the long output that each container prints. If instead the services have been launched in detached mode, you can verify that they are active using the following command.

    $ docker ps

In this case you should see output similar to this.

    CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS               NAMES
    0e5fa683450d        mongo:4             "/usr/bin/mongod --r…"   8 seconds ago       Up 3 seconds        27017/tcp           mongodb3
    8a2568914450        mongo:4             "/usr/bin/mongod --r…"   8 seconds ago       Up 3 seconds        27017/tcp           mongodb2
    7ad6132bb37d        mongo:4             "/usr/bin/mongod --r…"   8 seconds ago       Up 4 seconds        27017/tcp           mongodb1

## Replica set configuration

The MongoDB instances are now working and configured to belong to the rsmongo replica set. However, the replica set is not yet configured and therefore no node has been elected primary. To configure the replica set we need to open the shell of a MongoDB instance. In order to do this we can use the following command.

    $ docker-compose exec mongodb1 mongo

This will open the mongo shell of the mongodb1 instance. The choice of the instance is arbitrary. Inside the shell we will provide the configuration of the replica set. Since the MongoDB shell is based on JavaScript, we can define a configuration variable that will then be passed to the rs.initiate() command. So let’s go ahead and define an rsconf variable as follows.

    rsconf = {
        _id : "rsmongo",
        members: [
            {
                "_id": 0,
                "host": "mongodb1:27017",
                "priority": 4
            },
            {
                "_id": 1,
                "host": "mongodb2:27017",
                "priority": 2
            },
            {
                "_id": 2,
                "host": "mongodb3:27017",
                "priority": 1
            }
        ]
    }


As you can see, the _id of the document is the name of the replica set, rsmongo, while the members array contains the description of each node that will belong to the replica set. Each node, represented by an embedded document, is characterized by a numeric _id and by its host. For the host, since each instance runs as a Docker service, we use the name of the container followed by the port on which the service is listening. This is because the IP address assigned to each container cannot be known in advance. It is Docker's task to resolve these names and route the traffic appropriately. We have also added a priority property to each member of the replica set. Although it is not required, this information allows us to influence the election of the primary. In fact, having given the highest priority to the mongodb1 node, we can expect it to be elected primary whenever it is available and up to date.
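
When the number of nodes changes, the same document can be generated instead of typed by hand. This is a minimal sketch, not the article's method: buildReplicaSetConfig is a hypothetical helper, and it assumes that a missing priority should fall back to 1 (the MongoDB default) and that priorities must lie between 0 and 1000.

```javascript
// Hypothetical helper: build a replica set configuration document.
// `nodes` is a list of { host, priority } objects; priority is optional.
function buildReplicaSetConfig(name, nodes) {
  const seenHosts = new Set();
  const members = nodes.map((node, index) => {
    if (seenHosts.has(node.host)) {
      throw new Error(`duplicate host: ${node.host}`);
    }
    seenHosts.add(node.host);
    // Assumption: fall back to MongoDB's default priority of 1.
    const priority = node.priority === undefined ? 1 : node.priority;
    if (priority < 0 || priority > 1000) {
      throw new Error(`priority out of range for ${node.host}: ${priority}`);
    }
    // The member _id is simply the position in the list.
    return { _id: index, host: node.host, priority: priority };
  });
  return { _id: name, members: members };
}

// Reproduces the rsconf document of this article.
const rsconfSketch = buildReplicaSetConfig("rsmongo", [
  { host: "mongodb1:27017", priority: 4 },
  { host: "mongodb2:27017", priority: 2 },
  { host: "mongodb3:27017", priority: 1 },
]);
console.log(JSON.stringify(rsconfSketch, null, 2));
```

Rejecting duplicate hosts early gives a clearer error than letting the initialization fail later.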

To initialize the replica set you only need to pass this variable to the rs.initiate() command as shown below.

    > rs.initiate(rsconf);

At this point the command prompt changes to show the name of the replica set followed by the node type (PRIMARY or SECONDARY). If we are connected to the highest priority node, it will be labeled SECONDARY at first. This should not surprise us: it takes a few seconds before the primary is elected. Simply by pressing the Enter key after a while we will see that this node has become PRIMARY.
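
Instead of pressing Enter repeatedly, the wait for the election can be scripted. The sketch below assumes that the status document lists its nodes under members, each with a name and a stateStr field, as rs.status() does. waitForPrimary is a hypothetical helper; the one-second pause and the limit of 30 attempts are arbitrary choices.

```javascript
// Hypothetical helper: poll the replica set status until a member reports PRIMARY.
// getStatus: function returning an rs.status()-like document
// pause:     function that blocks for the given number of milliseconds
function waitForPrimary(getStatus, pause, maxAttempts = 30) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const members = getStatus().members || [];
    const primary = members.find((m) => m.stateStr === "PRIMARY");
    if (primary) return primary.name;
    pause(1000); // wait one second before asking again
  }
  return null; // no primary was seen within the allowed attempts
}

// In the mongo shell this could be called as:
//   waitForPrimary(() => rs.status(), sleep)

// Simulated run outside the shell: the node reports SECONDARY twice, then PRIMARY.
const simulatedStates = ["SECONDARY", "SECONDARY", "PRIMARY"];
const simulatedStatus = () => ({
  members: [{ name: "mongodb1:27017", stateStr: simulatedStates.shift() }],
});
console.log(waitForPrimary(simulatedStatus, () => {})); // mongodb1:27017
```

Passing the status and pause functions as arguments keeps the helper usable both inside the shell and in a plain JavaScript runtime.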

We can use the following command to verify the configuration of the replica set we just created.

    rsmongo:PRIMARY> rs.conf()

The output will be similar to the one below.

    {
        "_id" : "rsmongo",
        "version" : 1,
        "term" : 1,
        "protocolVersion" : NumberLong(1),
        "writeConcernMajorityJournalDefault" : true,
        "members" : [
            {
                "_id" : 0,
                "host" : "mongodb1:27017",
                "arbiterOnly" : false,
                "buildIndexes" : true,
                "hidden" : false,
                "priority" : 4,
                "tags" : {

                },
                "slaveDelay" : NumberLong(0)
            },
            {
                "_id" : 1,
                "host" : "mongodb2:27017",
                "arbiterOnly" : false,
                "buildIndexes" : true,
                "hidden" : false,
                "priority" : 2,
                "tags" : {

                },
                "slaveDelay" : NumberLong(0)
            },
            {
                "_id" : 2,
                "host" : "mongodb3:27017",
                "arbiterOnly" : false,
                "buildIndexes" : true,
                "hidden" : false,
                "priority" : 1,
                "tags" : {

                },
                "slaveDelay" : NumberLong(0)
            }
        ],
        "settings" : {
            "chainingAllowed" : true,
            "heartbeatIntervalMillis" : 2000,
            "heartbeatTimeoutSecs" : 10,
            "electionTimeoutMillis" : 10000,
            "catchUpTimeoutMillis" : -1,
            "catchUpTakeoverDelayMillis" : 30000,
            "getLastErrorModes" : {

            },
            "getLastErrorDefaults" : {
                "w" : 1,
                "wtimeout" : 0
            },
            "replicaSetId" : ObjectId("60180880996f5407158e79e8")
        }
    }


As you can see, MongoDB reports all the information related to the replica set configuration. Members have more options than those provided in the configuration defined above. The additional values are defaults that can be changed during configuration. The same applies to all parameters in the settings attribute.

If, on the other hand, we want to see the status of the replica set, that is, the information contained in the heartbeat packets sent by the other members of the replica set and received by a node, we can type the following command.

    rsmongo:PRIMARY> rs.status()

The output will show a long list of information. To understand the meaning of each item we refer you to the official documentation.
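
Since the status document is long, it can help to reduce it to one line per member. The following is a sketch under the assumption that each entry of members carries the name, stateStr and health fields, as in the rs.status() output. summarizeStatus is a hypothetical helper and the sample document is hand-written for illustration, not captured from a real run.

```javascript
// Hypothetical helper: condense an rs.status()-like document into one line per member.
function summarizeStatus(status) {
  return (status.members || []).map((member) => {
    // health is 1 when the member is reachable and 0 otherwise.
    const health = member.health === 1 ? "up" : "down";
    return `${member.name} ${member.stateStr} ${health}`;
  });
}

// In the mongo shell this could be called as:
//   summarizeStatus(rs.status()).forEach((line) => print(line))

// Hand-written sample with only the fields the helper reads.
const sampleStatus = {
  set: "rsmongo",
  members: [
    { name: "mongodb1:27017", stateStr: "PRIMARY", health: 1 },
    { name: "mongodb2:27017", stateStr: "SECONDARY", health: 1 },
    { name: "mongodb3:27017", stateStr: "SECONDARY", health: 1 },
  ],
};
summarizeStatus(sampleStatus).forEach((line) => console.log(line));
```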

Now the replica set is working and ready to be used. Remember, however, that all write operations must be performed on the primary, while to read from a secondary you must explicitly enable secondary reads, for example by typing the following command in its shell.

    rsmongo:SECONDARY> rs.secondaryOk()
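
Applications usually express the same choice through the connection string rather than through the shell. The sketch below builds such a string with the standard replicaSet and readPreference URI options. replicaSetUri is a hypothetical helper, and the host names are the container names used in this article, so the resulting URI is assumed to be used from inside the same Docker network.

```javascript
// Read preference modes accepted in a MongoDB connection string.
const READ_PREFERENCES = [
  "primary",
  "primaryPreferred",
  "secondary",
  "secondaryPreferred",
  "nearest",
];

// Hypothetical helper: build a replica set connection string.
function replicaSetUri(hosts, replicaSet, readPreference = "primary") {
  if (hosts.length === 0) {
    throw new Error("at least one host is required");
  }
  if (!READ_PREFERENCES.includes(readPreference)) {
    throw new Error(`unknown read preference: ${readPreference}`);
  }
  // URLSearchParams keeps the insertion order and escapes the values.
  const params = new URLSearchParams({ replicaSet, readPreference });
  return `mongodb://${hosts.join(",")}/?${params.toString()}`;
}

// Writes still go to the primary; reads prefer a secondary when one is available.
const uri = replicaSetUri(
  ["mongodb1:27017", "mongodb2:27017", "mongodb3:27017"],
  "rsmongo",
  "secondaryPreferred"
);
console.log(uri);
```

Listing all three members lets a driver discover the current primary even when one of the nodes is down.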

## Insight: automating replica set configuration

As we saw earlier, it is possible to create a MongoDB replica set using Docker Compose. However, in the example shown, it was necessary to connect to a MongoDB instance and execute the commands to initialize the replica set.

This manual procedure, which must be carried out when the services are created, reduces the advantages of a Docker-based architecture. In fact, every time the project is installed on a machine we have to repeat it, with the risk of introducing errors. How can we automate this aspect as well? Let’s see it together!

First we need to create a new service whose task is to configure the replica set. So we create a folder called mongo-setup and inside it we define a Dockerfile.

The Dockerfile will rely on the mongo image to have the client with which to connect to the other instances of the replica set. It will also copy inside the container the file with MongoDB shell commands for configuring the replica set (called mongo-setup.js) and a bash script (mongo-setup.sh) to forward the commands to a mongo instance.

Since the configuration command must be executed when at least one MongoDB instance is ready, we will use the wait-for-it script. There are other tools to synchronize the execution of the various containers. You can find some suggestions in the Docker documentation.

The command launched when the container starts will therefore be wait-for-it, with a MongoDB container as its parameter, followed by the bash script for initialization. The Dockerfile is shown below.

    FROM mongo:4
    RUN mkdir /config
    WORKDIR /config
    COPY wait-for-it.sh .
    COPY mongo-setup.js .
    COPY mongo-setup.sh .
    RUN chmod +x /config/wait-for-it.sh
    RUN chmod +x /config/mongo-setup.sh
    CMD [ "bash", "-c", "/config/wait-for-it.sh mongodb1:27017 -- /config/mongo-setup.sh" ]



The mongo-setup.js configuration file will contain the rsconf variable seen earlier as well as the rs.initiate(rsconf) statement.
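
As a rough sketch of what that file could look like, based only on the description above: the configuration document is the one from the manual procedure, while the typeof guard and the printjson call are additions made for this example and are not taken from the original project.

```javascript
// mongo-setup.js (sketch): the same configuration document used in the manual procedure.
var rsconf = {
  _id: "rsmongo",
  members: [
    { _id: 0, host: "mongodb1:27017", priority: 4 },
    { _id: 1, host: "mongodb2:27017", priority: 2 },
    { _id: 2, host: "mongodb3:27017", priority: 1 },
  ],
};

// `rs` only exists inside the mongo shell. The guard is an addition for this sketch:
// it lets the file be loaded elsewhere, for example to inspect rsconf, without failing.
if (typeof rs !== "undefined") {
  // Print the server's reply so that it shows up in the container log.
  printjson(rs.initiate(rsconf));
}
```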

The bash script, instead, checks whether the replica set has already been initialized by testing for the existence of a marker file. If initialization is required, it connects to the MongoDB instance and passes it the mongo-setup.js script. Finally, it creates the file to indicate that the initialization has taken place. The full code is shown below.

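    #!/usr/bin/env bash

    # The flag file lives on a dedicated volume, so it survives container restarts.
    if [ ! -f /data/mongo-init.flag ]; then
        echo "Init replicaset"
        # Run the configuration script against the first instance.
        mongo mongodb://mongodb1:27017 mongo-setup.js
        # Record that the initialization has been done.
        touch /data/mongo-init.flag
    else
        echo "Replicaset already initialized"
    fi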
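
The flag file can get out of sync with the databases, for example if its volume is removed while the data volumes are kept. A possible complement is to ask MongoDB itself whether the replica set exists. This sketch assumes that, in the mongo shell shipped with the mongo:4 image, rs.status() on an uninitialized node returns an error document with ok equal to 0 and codeName "NotYetInitialized" rather than throwing; other shells may behave differently. needsInitiate is a hypothetical helper.

```javascript
// Hypothetical helper: decide from an rs.status()-like document
// whether the replica set still has to be initiated.
function needsInitiate(status) {
  // Assumption: an uninitialized node answers with ok: 0 and this code name.
  return status.ok === 0 && status.codeName === "NotYetInitialized";
}

// In mongo-setup.js this could guard the initialization:
//   if (needsInitiate(rs.status())) { rs.initiate(rsconf); }

// Hand-written sample documents, reduced to the fields the helper reads.
console.log(needsInitiate({ ok: 0, code: 94, codeName: "NotYetInitialized" })); // true
console.log(needsInitiate({ ok: 1, set: "rsmongo", members: [] })); // false
```

With a check of this kind, running the setup container again against an already configured replica set does nothing.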


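Last but not least, the docker-compose file. Compared to the one seen above, it is enough to add the new service and the associated volume, which keeps track of the initialization state of the replica set. The mongod commands in this version also pass --wiredTigerCacheSizeGB 1, which limits the WiredTiger cache of each instance to 1 GB. The content of the file is listed below.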

    services:
      mongodb1:
        image: mongo:4
        restart: always
        container_name: mongodb1
        volumes:
          - mongodata1:/data/db
        expose:
          - "27017"
        entrypoint: [ "/usr/bin/mongod", "--replSet", "rsmongo", "--bind_ip_all", "--wiredTigerCacheSizeGB", "1" ]

      mongodb2:
        image: mongo:4
        restart: always
        container_name: mongodb2
        volumes:
          - mongodata2:/data/db
        expose:
          - "27017"
        entrypoint: [ "/usr/bin/mongod", "--replSet", "rsmongo", "--bind_ip_all", "--wiredTigerCacheSizeGB", "1" ]

      mongodb3:
        image: mongo:4
        restart: always
        container_name: mongodb3
        volumes:
          - mongodata3:/data/db
        expose:
          - "27017"
        entrypoint: [ "/usr/bin/mongod", "--replSet", "rsmongo", "--bind_ip_all", "--wiredTigerCacheSizeGB", "1" ]

      mongosetup:
        image: "mongo-setup"
        build: "./mongo-setup"
        container_name: "mongosetup"
        depends_on:
          - mongodb1
        volumes:
          - mongostatus:/data/

    volumes:
      mongodata1:
      mongodata2:
      mongodata3:
      mongostatus:


The whole project is available on GitHub.

### 7 Responses

1. Cole says:

   This does not work, I am unable to connect, always getting:

   "error": "NotYetInitialized: Cannot use non-local read concern until replica set is finished initializing."

   1. The replica set takes a while (at least 30 seconds on average) to be initialized the first time. If you have trouble, you can remove the volumes and retry. Do not stop the container the first time after the replica set is initialized.

2. Andrew says:

   I can see it running in Docker, but I can’t connect to it externally. Not sure why.

   1. To connect to the replica set you need to modify the docker-compose file so that the ports of the MongoDB instances are mapped to the physical ports of your machine (it may also be sufficient to map only the port of the primary instance). To do this, remove the "expose" parameter in the docker-compose file for each instance and add "ports", specifying which port you want to use (e.g. ports: - "27017:27017"). Otherwise you can connect using the command docker-compose exec [mongodbinstance] mongo, replacing mongodbinstance with the instance you want to connect to.