MongoDB 6.0: new features to improve applications

The new version of MongoDB introduces features that make some operations more efficient and increase developer productivity, which makes upgrading to MongoDB 6 an excellent choice.


MongoDB 6.0 was announced at MongoDB World in June 2022. Having been available for more than six months and stabilized by subsequent releases, it can now be considered a mature product. But what features does the new version introduce? The main goal of MongoDB 6.0 is simplification: instead of forcing you to resort to external software or third-party tools, its new features let you develop, iterate, test and release applications faster. The latest version helps developers avoid data silos, confusing architectures, time wasted integrating external technologies, missed SLAs and other missed opportunities, and the need for custom work (such as data export pipelines). Let’s look at the main new features.

Enhanced support for time series

Time series are a fundamental data source for modern applications in many areas, from financial services to e-commerce to the Internet of Things. When collected, processed and analyzed correctly, time series data is a gold mine of information, from user growth to promising revenue areas, helping you grow your business and improve your application.

First introduced in MongoDB 5.0, time series collections offer a way to manage these workloads without adding a niche technology and the complexity that comes with it. Their design had to overcome some of the obstacles that characterize time series data, such as high volume, storage and cost considerations, and gaps in data continuity (caused, for example, by sensor outages). Time series collections are an alternative to bucket modeling, which you can find in the book “Design with MongoDB”; the two are different approaches to the same problem.

Since their introduction, time series collections have been continuously updated and improved through a series of rapid releases. First came sharding for time series collections (5.1), then columnar compression (5.2) to optimize storage space, and finally densification and gap filling (5.3), which make it possible to run some analyses even when some data is missing.

Beginning with version 6.0, time series collections support secondary and compound indexes on measurements, improving read performance and opening up new use cases such as geo-indexing. By linking geographic information to time series data, analysis can be enriched and expanded to include scenarios involving distance and location. For example, you can track temperature fluctuations in refrigerated delivery vehicles or monitor the fuel consumption of vehicles (ships, planes, etc.).
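
As a rough sketch of what this can look like, the helpers below build the options and index specifications for a hypothetical vehicle_readings collection. The collection name, the field names (timestamp, metadata, temperature, location) and the granularity are illustrative assumptions, not something prescribed by MongoDB. The PyMongo calls in the trailing comment assume a reachable MongoDB 6.0 deployment.

```python
from typing import Any


def timeseries_options() -> dict[str, Any]:
    """Options for a time series collection (field names are assumptions)."""
    return {
        "timeField": "timestamp",   # required: when the measurement was taken
        "metaField": "metadata",    # optional: identifies the source, e.g. a vehicle
        "granularity": "minutes",   # rough spacing between measurements
    }


def measurement_indexes() -> list[list[tuple[str, Any]]]:
    """Secondary index specs as lists of (field, direction or type) pairs."""
    return [
        # Compound index: source identifier plus a measured value.
        [("metadata.vehicleId", 1), ("temperature", 1)],
        # Geospatial index on a GeoJSON point stored with each measurement.
        [("location", "2dsphere")],
    ]


# With PyMongo and a running server, these could be applied roughly like this:
#   db.create_collection("vehicle_readings", timeseries=timeseries_options())
#   for keys in measurement_indexes():
#       db["vehicle_readings"].create_index(keys)
```

Keeping the specifications in plain functions makes them easy to review and reuse across environments before they touch a real cluster.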

Query performance and sorting operations have also been improved. For example, MongoDB can now efficiently return the last data point in a series instead of scanning the entire collection, which makes reads faster. Clustered and secondary indexes can also be used to efficiently perform sorting operations on time fields and metadata.
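
A "last data point" query is commonly written as a sort followed by a group that keeps the first document per source. The sketch below builds such a pipeline; the field names metadata.vehicleId, timestamp and temperature are assumptions, and whether the server can avoid a full scan depends on the indexes that exist on the collection.

```python
from typing import Any


def last_point_pipeline(meta_key: str = "metadata.vehicleId",
                        time_field: str = "timestamp") -> list[dict[str, Any]]:
    """Return the most recent measurement for each source."""
    return [
        # Newest measurement first within each source. Key order matters
        # here, and Python dicts preserve insertion order.
        {"$sort": {meta_key: 1, time_field: -1}},
        # After the sort, $first picks the newest document of each group.
        {"$group": {
            "_id": f"${meta_key}",
            "lastTimestamp": {"$first": f"${time_field}"},
            "lastTemperature": {"$first": "$temperature"},
        }},
    ]


# Possible usage with PyMongo (collection name is hypothetical):
#   for doc in db["vehicle_readings"].aggregate(last_point_pipeline()):
#       print(doc)
```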

Support for building event-driven architectures

With the advent of applications such as Uber, users expect real-time, event-driven experiences such as activity feeds, notifications, or recommendation systems. But moving at the speed of the real world is not easy, because the application must identify and act quickly on changes in data.

Introduced in MongoDB 3.6, change streams provide an API to broadcast any change to a database, cluster, or collection without the high overhead of having to query the entire system. In this way, the application can react automatically, generating an in-app message notification related to the event of interest or creating a pipeline to index new logs as they are generated.

MongoDB 6.0 enriches change streams with new features. It is now possible to retrieve the state of a document both before and after a change, allowing you to send updated versions of entire documents, reference deleted documents, and more. In addition, change streams now support data definition language (DDL) operations, such as creating or dropping collections and indexes.
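
The following sketch shows one way these options might be wired together. The collMod document enables pre- and post-image recording on a collection, and the second helper collects keyword arguments for PyMongo's watch() method. The parameter names are assumed to match PyMongo 4.3 or later, and the orders collection is hypothetical.

```python
from typing import Any


def enable_pre_and_post_images(collection: str) -> dict[str, Any]:
    """collMod command that turns on pre- and post-image recording."""
    return {
        "collMod": collection,
        "changeStreamPreAndPostImages": {"enabled": True},
    }


def watch_options() -> dict[str, Any]:
    """Keyword arguments for watch() (names assumed from PyMongo 4.3+)."""
    return {
        # Document state after the change, if the server still has it.
        "full_document": "whenAvailable",
        # Document state before the change, if the server still has it.
        "full_document_before_change": "whenAvailable",
        # Also report DDL events such as index creation.
        "show_expanded_events": True,
    }


# Possible usage with PyMongo against a replica set or sharded cluster:
#   db.command(enable_pre_and_post_images("orders"))
#   with db["orders"].watch(**watch_options()) as stream:
#       for event in stream:
#           print(event["operationType"], event.get("fullDocumentBeforeChange"))
```

Using "whenAvailable" rather than "required" is a deliberate choice in this sketch: the stream keeps working even when an image has already expired, at the cost of sometimes receiving no image.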

New queries

Aggregation pipelines allow users to process multiple documents and return aggregated statistics and data in order to perform even complex analyses to extract the information of interest.

MongoDB 6.0 adds functionality to two key operators, $lookup and $graphLookup, improving joins and graph traversals, respectively. Both $lookup and $graphLookup now offer full support for sharded deployments.

The performance of $lookup has also been improved. For example, if there is an index on the foreign key and only a few documents are matched, $lookup can return results up to 10 times faster than in the previous version. If a larger number of documents are matched, $lookup will be about twice as fast as in previous versions. If no indexes are available (and the join is for exploratory or ad hoc queries), $lookup can be as much as a hundred times faster.
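
For readers who have not used these two stages, here is a minimal sketch of each. The collections (orders, customers, employees) and field names are assumptions made up for the example. The $lookup joins on the _id field of the foreign collection, which always has an index.

```python
from typing import Any


def orders_with_customers() -> list[dict[str, Any]]:
    """Join each order with its customer document."""
    return [
        {"$lookup": {
            "from": "customers",         # foreign collection
            "localField": "customerId",  # field in the orders documents
            "foreignField": "_id",       # indexed key in customers
            "as": "customer",            # output array field
        }},
    ]


def reporting_chain() -> list[dict[str, Any]]:
    """Walk the manager hierarchy upward from each employee."""
    return [
        {"$graphLookup": {
            "from": "employees",
            "startWith": "$managerId",
            "connectFromField": "managerId",
            "connectToField": "_id",
            "as": "reportingChain",
        }},
    ]


# Possible usage with PyMongo:
#   db["orders"].aggregate(orders_with_customers())
#   db["employees"].aggregate(reporting_chain())
```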

The snapshot read concern and its optional atClusterTime parameter allow applications to run complex analytic queries on a globally and transactionally consistent snapshot of operational data. Even if the data changes underneath you, MongoDB maintains point-in-time consistency for the query results returned to users.

These point-in-time analytic queries can span multiple shards with large distributed datasets. By routing these queries to secondaries, you can isolate analytic workloads from transactional queries served by the same cluster, thus avoiding slow, fragile and expensive ETL.
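
As an illustration, a snapshot read can be expressed as a raw find command carrying a readConcern document. The helper below is only a sketch: the collection and filter are hypothetical, and at_cluster_time is expected to be a BSON Timestamp obtained elsewhere (for example from an earlier operation), which this example does not construct.

```python
from typing import Any, Optional


def snapshot_find_command(collection: str,
                          filter_doc: dict[str, Any],
                          at_cluster_time: Optional[Any] = None) -> dict[str, Any]:
    """Build a find command that reads from a consistent snapshot."""
    read_concern: dict[str, Any] = {"level": "snapshot"}
    if at_cluster_time is not None:
        # Pin the read to a specific point in time instead of letting
        # the server choose one.
        read_concern["atClusterTime"] = at_cluster_time
    return {
        "find": collection,
        "filter": filter_doc,
        "readConcern": read_concern,
    }


# Possible usage with PyMongo, routing the read to a secondary:
#   from pymongo import ReadPreference
#   db.command(snapshot_find_command("orders", {"status": "shipped"}),
#              read_preference=ReadPreference.SECONDARY)
```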

More operators to improve application efficiency

New operators have been introduced to increase developer productivity by pushing more work to the database, reducing the time needed to write ad hoc code for data manipulation.

For example, you can easily find the most significant values in your data set with operators such as $maxN, $minN or $lastN. In addition, you can use $sortArray to sort the elements of an array directly in an aggregation pipeline.
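
A small sketch of how these operators might be combined is shown here. It assumes a hypothetical collection of game results with playerId, score and playedAt fields, and keeps the n highest, lowest and most recent scores per player.

```python
from typing import Any


def top_scores_pipeline(n: int = 3) -> list[dict[str, Any]]:
    """Per player: n highest, n lowest and n most recent scores."""
    return [
        # $lastN depends on input order, so sort by play time first.
        {"$sort": {"playedAt": 1}},
        {"$group": {
            "_id": "$playerId",
            "highest": {"$maxN": {"input": "$score", "n": n}},
            "lowest": {"$minN": {"input": "$score", "n": n}},
            "latest": {"$lastN": {"input": "$score", "n": n}},
        }},
        # Reorder the most recent scores from highest to lowest.
        {"$set": {
            "latest": {"$sortArray": {"input": "$latest", "sortBy": -1}},
        }},
    ]


# Possible usage with PyMongo (collection name is hypothetical):
#   db["game_results"].aggregate(top_scores_pipeline(n=5))
```

Without these operators, the same result would typically require pulling every score into the application and trimming the lists there.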

More resilient operations

Thanks to replica sets, MongoDB has always been resilient to service interruptions caused by network, software or hardware problems. Initial synchronization is how a member of a replica set obtains a complete copy of the data from an existing member. This is critical for bringing nodes with stale copies of the data up to date, or when adding new nodes to improve resiliency, read scalability, or query latency.

MongoDB 6.0 introduces initial synchronization via file copy, which is up to four times faster than current methods. However, this feature is only available with MongoDB Enterprise Server.

In addition to the improvement in initial synchronization, MongoDB 6.0 also introduces major improvements in sharding, i.e., the mechanism that enables horizontal scalability. The default chunk size for sharded collections is now 128 MB, which means fewer chunk migrations and greater efficiency from both a network and internal overhead perspective at the query routing level. A new configureCollectionBalancing command also allows a collection to be defragmented to reduce the impact of the sharding balancer.
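
As a sketch, the command document for configureCollectionBalancing can be built as follows. The namespace shop.orders is a made-up example, and the command is meant to be sent to the admin database through a mongos router.

```python
from typing import Any


def balancing_command(namespace: str,
                      chunk_size_mb: int = 128,
                      defragment: bool = True) -> dict[str, Any]:
    """Build a configureCollectionBalancing command for one collection."""
    return {
        # Full namespace in the form "<database>.<collection>".
        "configureCollectionBalancing": namespace,
        # Chunk size for this collection, in megabytes.
        "chunkSize": chunk_size_mb,
        # Ask the balancer to defragment the collection.
        "defragmentCollection": defragment,
    }


# Possible usage with PyMongo, connected to a mongos:
#   client.admin.command(balancing_command("shop.orders"))
```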

Increased security and operational efficiency

MongoDB 6.0 includes new features that eliminate the need to choose between data security and operational efficiency.

Since its introduction in 2019, client-side field-level encryption (CSFLE) has helped many organizations manage sensitive information securely, especially as they migrate more of their applications to the public cloud. With MongoDB 6.0, CSFLE supports any KMIP-compliant key management provider. KMIP is a widely adopted industry standard that simplifies the storage, manipulation and management of cryptographic objects such as encryption keys and certificates.

MongoDB’s support for auditing allows administrators to track system activity for multi-user deployments, ensuring accountability for actions taken in the database. Although it is important for auditors to be able to inspect audit logs to evaluate activities, the contents of an audit log must be protected from unauthorized access as they may contain sensitive information. For this reason, the new version allows administrators to compress and encrypt audit events before they are written to disk by leveraging their KMIP-compliant key management system. Encrypting the logs protects the confidentiality and integrity of the events. If logs are propagated through central log management systems or SIEMs, they remain encrypted.

In addition, Queryable Encryption is now available in preview. This pioneering technology enables expressive queries to be run on encrypted data, which is decrypted only when it is returned to the user. The data therefore remains encrypted throughout its lifecycle, and richer queries can be executed efficiently without having to decrypt the data first.

Smoother search

A number of ancillary features have also been introduced to make searching more seamless.

The first is support for Atlas Search facets on sharded collections. Facets allow for quick filtering and counting of results so that users can easily narrow their searches and navigate to the data they need.

Another important new feature is Cluster-to-Cluster Sync, which enables easy migration of data to the cloud, creation of development, test or analysis environments, and support for compliance requirements and audits. Cluster-to-Cluster Sync provides continuous, one-way synchronization of data between two MongoDB clusters in any environment, whether hybrid, Atlas, on-premises or edge. You can also control and monitor the synchronization process in real time, starting, stopping, resuming or even reversing synchronization as needed.

To recap, the new features in MongoDB 6.0 aim to ease development and operations, remove data silos, and eliminate the complexity that accompanies the unnecessary use of separate niche technologies. This means less custom work and troubleshooting, fewer confusing architectures, and more time to focus on your business activities.
