MongoDB 5: the new features

MongoDB is the most widely used NoSQL database in the world. Its continued growth is driven by the steady development of new features. Version 5, released in July 2021, introduced several notable ones. In this article we look at those that are most relevant and most useful in day-to-day work.

MongoDB is the world’s most widely used document-based NoSQL database and, in recent years, has become a viable alternative to relational databases. It ranks first among NoSQL databases and fifth among all databases.

What makes it so special that many companies have chosen it over more traditional relational databases? The answer lies in its features. MongoDB is schemaless: it has no fixed data schema definition, which shortens software development time. On top of that, each release adds features that extend its ability to handle increasingly complex data in different application contexts. If you are curious about how to use MongoDB efficiently through modeling patterns, we recommend the book Design with MongoDB: Best models for applications.

In this article we will analyze the most important new features of the latest version released on July 13, 2021, namely MongoDB 5.

Time series collections

MongoDB 5 introduces time series collections, which efficiently store sequences of measurements taken over a period of time. This feature makes MongoDB 5 well suited to the Internet of Things (IoT) field. Compared with standard collections, this type of collection improves query efficiency and reduces disk usage for data and secondary indexes.

Time series collections behave like standard collections, so you insert and query data in the same way as for any other collection. Internally, MongoDB treats these collections as writable non-materialized views on internal collections that automatically organize time series data in a storage format optimized at insertion time.

Queries on time series collections benefit from the optimized internal storage format, returning results faster.

Commands

Creating a Time Series Collection

You must explicitly declare a collection as a time series collection when you create it with the db.createCollection() command. You cannot transform an existing collection into this type. Below is an example of creating a collection for time series.

db.createCollection(
    "weather",
    {
       timeseries: {
          timeField: "timestamp",
          metaField: "metadata",
          granularity: "hours"
       },
       expireAfterSeconds: 86400
    }
) 

During creation you can specify the following parameters.

Parameter | Type | Description
timeseries.timeField | string | Required. The name of the field that contains the date in each document in the time series. Documents in a time series collection must have a valid BSON date as the value for the timeField.
timeseries.metaField | string | Optional. The name of the field that contains metadata in each time series document. The metadata in the specified field should be data used to label a unique set of documents and should rarely or never change. The name of the specified field cannot be _id or the same as timeseries.timeField. The field cannot be of type array.
timeseries.granularity | string | Optional. Possible values are "seconds", "minutes" and "hours". The default granularity is "seconds". To improve performance, set the value that most closely matches the time interval between the measurements you want to store. If you specify timeseries.metaField, consider the time interval between consecutive measurements that have the same unique value for the metaField field, i.e. those that come from the same source. Otherwise, consider the time interval between all measurements that will be included in the collection.
expireAfterSeconds | number | Optional. Enables automatic deletion of documents in the collection by specifying the number of seconds after which the documents expire. MongoDB, using a Time To Live (TTL) index type, automatically deletes expired documents.
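The rules in the table above can be sketched as a small pre-flight check written in plain JavaScript. This is only an illustration: checkTimeSeriesOptions is a hypothetical helper, not part of MongoDB or of any driver, and it covers only the constraints listed in this article.

```javascript
// Hypothetical helper (not a MongoDB API): checks an options object
// against the parameter rules described in the table above.
const GRANULARITIES = ["seconds", "minutes", "hours"];

function checkTimeSeriesOptions(options) {
  const errors = [];
  const ts = options.timeseries ?? {};

  // timeField is the only required parameter.
  if (typeof ts.timeField !== "string" || ts.timeField.length === 0) {
    errors.push("timeseries.timeField is required");
  }

  // metaField is optional, but cannot be _id or the same as timeField.
  if (ts.metaField !== undefined &&
      (ts.metaField === "_id" || ts.metaField === ts.timeField)) {
    errors.push("timeseries.metaField cannot be _id or equal to timeField");
  }

  // granularity is optional and defaults to "seconds".
  const granularity = ts.granularity ?? "seconds";
  if (!GRANULARITIES.includes(granularity)) {
    errors.push("timeseries.granularity must be seconds, minutes or hours");
  }

  // expireAfterSeconds is optional and must be a number when present.
  if (options.expireAfterSeconds !== undefined &&
      typeof options.expireAfterSeconds !== "number") {
    errors.push("expireAfterSeconds must be a number");
  }

  return errors;
}

// Usage: the options from the createCollection() example pass the check.
const problems = checkTimeSeriesOptions({
  timeseries: { timeField: "timestamp", metaField: "metadata", granularity: "hours" },
  expireAfterSeconds: 86400
});
console.log(problems.length === 0 ? "options look valid" : problems);
```

A check like this only catches mistakes early on the client side; the server remains the authority on which options are accepted.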

Insert measurements

Each inserted document must contain a single measurement. To insert a single document, use the db.collection.insertOne() method. To insert several documents at once, use the insertMany() method as shown below.

// metadata is an embedded document: the metaField cannot be an array
db.weather.insertMany([{
   "metadata": {"sensorId": 5578, "type": "temperature"},
   "timestamp": ISODate("2021-05-18T00:00:00.000Z"),
   "temp": 12
}, {
   "metadata": {"sensorId": 5578, "type": "temperature"},
   "timestamp": ISODate("2021-05-18T04:00:00.000Z"),
   "temp": 11
}])
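In practice a sensor often delivers several readings in one payload. As a minimal sketch, assuming a made-up payload shape (sensorId, type and a readings array with at and temp fields), the following plain JavaScript turns such a batch into one document per measurement, ready to be passed to insertMany(). The function name toMeasurementDocs is hypothetical.

```javascript
// Hypothetical payload shape and helper, used only for illustration.
// Produces one document per timestamped measurement.
function toMeasurementDocs(batch) {
  return batch.readings.map((reading) => ({
    // Metadata identifies the source and rarely changes.
    metadata: { sensorId: batch.sensorId, type: batch.type },
    // The time field must hold a date, not a string.
    timestamp: new Date(reading.at),
    temp: reading.temp
  }));
}

const measurementDocs = toMeasurementDocs({
  sensorId: 5578,
  type: "temperature",
  readings: [
    { at: "2021-05-18T00:00:00.000Z", temp: 12 },
    { at: "2021-05-18T04:00:00.000Z", temp: 11 }
  ]
});

console.log(measurementDocs.length); // 2
```

In the shell, the resulting array could then be inserted with db.weather.insertMany(measurementDocs).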

Query and aggregation pipeline

To query documents in a time series collection, you use the standard query syntax. For example, to retrieve a document with a given timestamp, run the following query.

db.weather.findOne({
   "timestamp": ISODate("2021-05-18T04:00:00.000Z")
})

You can also use the aggregation pipeline to perform more complex queries. For example, to calculate the average temperature measured on each day, run the following pipeline.

db.weather.aggregate([
   {
      $project: {
         date: {
            $dateToParts: { date: "$timestamp" }
         },
         temp: 1
      }
   },
   {
      $group: {
         _id: {
            date: {
               year: "$date.year",
               month: "$date.month",
               day: "$date.day"
            }
         },
         avgTmp: { $avg: "$temp" }
      }
   }
]) 

The $dateToParts operator extracts the individual parts of the timestamp (year, month, day and so on) and saves them in the date field as an embedded document. This lets you group the measurements by year, month and day. To restrict the results to a specific day, add a $match stage.
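To make the two stages more concrete, here is a rough in-memory equivalent in plain JavaScript. It is a sketch of what the pipeline computes, not of how MongoDB executes it; averageTempByDay and the sample data are made up for the illustration, and the date parts are read in UTC, which is what $dateToParts does when no timezone is given.

```javascript
// Illustrative only: mimics $project with $dateToParts followed by
// $group with $avg, on an in-memory array of documents.
function averageTempByDay(docs) {
  const groups = new Map();
  for (const doc of docs) {
    const t = doc.timestamp;
    // Equivalent of $dateToParts (UTC): split the date into its parts.
    const parts = {
      year: t.getUTCFullYear(),
      month: t.getUTCMonth() + 1, // JavaScript months start at 0
      day: t.getUTCDate()
    };
    // Equivalent of the $group _id: one group per year, month and day.
    const key = `${parts.year}-${parts.month}-${parts.day}`;
    const group = groups.get(key) ?? { _id: { date: parts }, sum: 0, count: 0 };
    group.sum += doc.temp;
    group.count += 1;
    groups.set(key, group);
  }
  // Equivalent of $avg: sum divided by the number of documents.
  return [...groups.values()].map((g) => ({ _id: g._id, avgTmp: g.sum / g.count }));
}

const measurements = [
  { timestamp: new Date("2021-05-18T00:00:00.000Z"), temp: 12 },
  { timestamp: new Date("2021-05-18T04:00:00.000Z"), temp: 11 },
  { timestamp: new Date("2021-05-19T00:00:00.000Z"), temp: 9 }
];

const dailyAverages = averageTempByDay(measurements);
console.log(dailyAverages);
```

With this sample data the first group covers 18 May 2021 with an average of 11.5, and the second covers 19 May 2021 with an average of 9.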

Aggregation pipeline

Several new features have been added to the aggregation pipeline. Besides index-based improvements to some operators, the most significant changes from previous versions are described below.

New operators

MongoDB 5 introduces the new aggregation pipeline operators listed below.

Operator | Description
$count | $count (aggregation accumulator) provides a count of all documents when used in the existing $group pipeline stage and in the new $setWindowFields stage of MongoDB 5.0.
$dateAdd | Increments a Date object by a specified number of time units.
$dateDiff | Returns the difference between two dates.
$dateSubtract | Decrements a Date object by a specified number of time units.
$dateTrunc | Truncates a date.
$getField | Returns the value of a specified field from a document. You can use $getField to retrieve the value of fields with names that contain dots (.) or begin with a dollar sign ($).
$sampleRate | Probabilistically selects documents from a pipeline at a given rate.
$setField | Adds, updates, or removes a specified field in a document. You can use $setField to add, update, or remove fields with names that contain dots (.) or begin with a dollar sign ($).
$rand | Generates a random float value between 0 and 1 each time it is executed. The new $sampleRate operator is based on $rand.
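The relationship between $sampleRate and $rand can be sketched in plain JavaScript: each document is kept when a fresh random number falls below the requested rate. This is an approximation of the idea, not MongoDB's implementation; the sampleRate function below and its injectable random parameter are assumptions made so that the example can be run deterministically.

```javascript
// Illustrative only: keeps each document with probability `rate`.
// `random` plays the role of $rand and defaults to Math.random.
function sampleRate(docs, rate, random = Math.random) {
  return docs.filter(() => random() < rate);
}

// Deterministic demo: a fixed sequence stands in for the random source.
const fixedValues = [0.1, 0.9, 0.4, 0.6];
let nextIndex = 0;
const picked = sampleRate(["a", "b", "c", "d"], 0.5, () => fixedValues[nextIndex++]);

console.log(picked); // [ 'a', 'c' ]
```

Because every document is tested independently, the number of selected documents varies from run to run when a real random source is used; only the expected proportion is fixed by the rate.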

Window operator

MongoDB 5.0 introduces the $setWindowFields stage to perform operations on a specified range, called a window, of documents within a collection. The operation returns results based on the chosen window operator.

For example, you can use the $setWindowFields stage to compute:

  • Difference in sales between two documents in a set.
  • Sales rankings.
  • Cumulative sales totals.
  • Analysis of complex time series information without exporting the data to an external database.

For example, you can calculate the cumulative quantity of cake sales for each state with the following command.

db.cakeSales.aggregate( [
   {
      $setWindowFields: {
         partitionBy: "$state",
         sortBy: { orderDate: 1 },
         output: {
            cumulativeQuantityForState: {
               $sum: "$quantity",
               window: {
                  documents: [ "unbounded", "current" ]
               }
            }
         }
      }
   }
] ) 

The partitionBy: "$state" parameter partitions the documents according to the value of the state field. Within each partition the documents are sorted by increasing values of the orderDate field (the oldest orderDate is first). Finally, the stage sets a cumulativeQuantityForState field to calculate the cumulative quantity for each state. The calculation is done using the $sum operator within the document window defined by a lower bound (in this case unbounded) and an upper bound (in the example the current document).
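The three steps described above (partition, sort, accumulate over the window) can be reproduced on an in-memory array in plain JavaScript. This is a sketch of the semantics only, with made-up sample data; cumulativeQuantityByState is a hypothetical name and the code says nothing about how MongoDB evaluates the stage internally.

```javascript
// Illustrative only: mimics $setWindowFields with partitionBy "$state",
// sortBy { orderDate: 1 } and a [ "unbounded", "current" ] document window.
function cumulativeQuantityByState(sales) {
  // 1. Partition the documents by state.
  const partitions = new Map();
  for (const sale of sales) {
    if (!partitions.has(sale.state)) partitions.set(sale.state, []);
    partitions.get(sale.state).push(sale);
  }

  const result = [];
  for (const docs of partitions.values()) {
    // 2. Sort each partition by ascending orderDate (oldest first).
    docs.sort((a, b) => a.orderDate - b.orderDate);

    // 3. The window runs from the first document of the partition up to
    //    the current one, so a running total is enough.
    let runningTotal = 0;
    for (const doc of docs) {
      runningTotal += doc.quantity;
      result.push({ ...doc, cumulativeQuantityForState: runningTotal });
    }
  }
  return result;
}

// Made-up sample data.
const cakeSales = [
  { state: "CA", orderDate: new Date("2021-01-11"), quantity: 145 },
  { state: "WA", orderDate: new Date("2021-01-08"), quantity: 134 },
  { state: "CA", orderDate: new Date("2020-05-18"), quantity: 162 },
  { state: "WA", orderDate: new Date("2020-05-18"), quantity: 104 }
];

const withTotals = cumulativeQuantityByState(cakeSales);
console.log(withTotals.map((d) => `${d.state}: ${d.cumulativeQuantityForState}`));
```

With this data the running totals are 162 and 307 for CA, and 104 and 238 for WA.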

The description of the various parameters of this stage with examples can be found in the official documentation.

New MongoDB shell: mongosh

As of MongoDB version 5, the mongo shell is deprecated and replaced by mongosh. The new shell offers several advantages over the previous version, including:

  • Improved syntax highlighting.
  • Improved command history.
  • Improved logging.

In its first release, mongosh supports only a subset of the mongo shell methods. The full list of currently supported methods is described in detail in the official documentation. To maintain backward compatibility, the methods that mongosh supports use the same syntax as the corresponding methods in the mongo shell.

Conclusions

MongoDB 5 is another step forward for this NoSQL database. With the introduction of time series collections, applying it to IoT becomes much easier. Beyond the features described above, there are other additions and changes that improve the performance and capabilities of the DBMS. You can find the full list of new features in the Release Notes.
