Elasticsearch 8: new features of the new version

Elasticsearch is a NoSQL database used primarily to build search engines. Because it is built on Apache Lucene, it can index text documents properly and run very accurate searches. The new release introduces several new features and improvements over version 7.

Elastic released version 8 of its platform on February 10, 2022, and further releases with new features and bug fixes have followed in the months since. Version 8 is the first major release since April 2019, when version 7 was released. In this article we highlight the new features and changes it introduces; the full release notes are available on the official website.

Backward compatibility

The most important point about version 8.0 is that Elastic has added REST API compatibility with version 7. Version 8.0 changes several REST API requests and responses, but because clients can still ask for the version 7 structures, users can upgrade Elasticsearch to the latest version without having to change their code immediately.

Before upgrading to version 8, you need to upgrade to the latest version 7 release (7.17) and, if your clients still send version 7 requests, enable REST API compatibility through the Accept and Content-Type headers. After reviewing and resolving all critical issues listed in the Upgrade Assistant, you can upgrade to Elasticsearch 8.
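As a minimal sketch, the compatibility headers can be built in Python as shown here. The media type `application/vnd.elasticsearch+json;compatible-with=7` is the one Elastic documents for REST API compatibility; the helper names, the cluster URL, and the index name are hypothetical, and the request is only constructed, never sent.

```python
import urllib.request

# Media type documented by Elastic for REST API compatibility in 8.x.
COMPAT_MEDIA_TYPE = "application/vnd.elasticsearch+json;compatible-with={version}"


def compat_headers(version: int = 7) -> dict:
    """Headers asking an 8.x cluster to accept and return version-7 structures."""
    media_type = COMPAT_MEDIA_TYPE.format(version=version)
    return {"Accept": media_type, "Content-Type": media_type}


def build_search_request(base_url: str, index: str, body: bytes) -> urllib.request.Request:
    """Build (without sending) a search request carrying the compatibility headers."""
    return urllib.request.Request(
        f"{base_url}/{index}/_search",
        data=body,
        headers=compat_headers(),
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical local cluster and index name.
    req = build_search_request("http://localhost:9200", "my-index", b'{"query": {"match_all": {}}}')
    print(req.full_url)
    print(req.header_items())
```

Official clients usually expose this as a setting, so setting the headers by hand mainly matters for custom HTTP code.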

Beware, however, that the compatibility settings are not meant to be a permanent feature of the platform: they were introduced only to make the upgrade to version 8 smoother. Also, once a cluster has been upgraded, downgrading it to a previous version is not supported.

New Features

In this section we will go over the main features introduced in version 8.

kNN search on dense vectors

k-nearest neighbor (kNN) search over vectors can be used to build recommendation engines. Elasticsearch 7.x supported kNN search through the script_score query, which compares the query vector with every matching document. This method returns exact results, but at a high cost in speed and scalability.
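To make the trade-off concrete, here is a plain-Python sketch of what an exact kNN search does conceptually: it scores every document against the query vector and keeps the k best. It is not Elasticsearch code; the `+ 1.0` offset mirrors the usual `cosineSimilarity(...) + 1.0` script_score pattern, which keeps scores non-negative, and the document vectors are made up.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def exact_knn(query: List[float], docs: Dict[str, List[float]], k: int) -> List[Tuple[str, float]]:
    """Brute-force kNN: score every document, then keep the k highest scores."""
    # Every document is visited, which is why cost grows with the dataset size.
    scored = [(doc_id, cosine_similarity(query, vec) + 1.0) for doc_id, vec in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Hypothetical 3-dimensional document vectors.
    docs = {"a": [1.0, 0.0, 0.0], "b": [0.0, 1.0, 0.0], "c": [0.9, 0.1, 0.0]}
    print(exact_knn([1.0, 0.0, 0.0], docs, k=2))
```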

In Elasticsearch 8.0, dense_vector fields can also be indexed for approximate kNN search. This lets users search larger datasets much faster than with script_score, at the cost of results that may be less accurate.

Users can therefore choose the technique that best suits their application.
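As a hedged sketch, the mapping and query for approximate kNN in Elasticsearch 8.0 can be written as Python dictionaries and serialized to JSON. The index name `images`, the field `image-vector`, and the vectors are hypothetical. In 8.0 this search goes through the `_knn_search` endpoint (a technical preview at the time), and later 8.x releases move it into a `knn` option of the regular `_search` API, so check the documentation for your version.

```python
import json

# Hypothetical index mapping; dims must match the length of your embeddings.
mapping = {
    "mappings": {
        "properties": {
            "image-vector": {
                "type": "dense_vector",
                "dims": 3,
                "index": True,            # index the vectors for approximate search
                "similarity": "l2_norm",  # distance metric used to compare vectors
            },
            "title": {"type": "text"},
        }
    }
}

# Body for GET /images/_knn_search in Elasticsearch 8.0.
knn_query = {
    "knn": {
        "field": "image-vector",
        "query_vector": [-5, 9, -12],
        "k": 10,                # neighbors to return
        "num_candidates": 100,  # candidates examined per shard; must be >= k
    },
    "fields": ["title"],
}

if __name__ == "__main__":
    print("PUT /images\n" + json.dumps(mapping, indent=2))
    print("GET /images/_knn_search\n" + json.dumps(knn_query, indent=2))
```

Raising `num_candidates` makes each shard examine more vectors, which generally improves accuracy at the cost of speed; it is the knob that controls the trade-off described above.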

Uploading NLP PyTorch Models

PyTorch is a machine learning framework based on the Torch library, used to build models for applications such as computer vision and natural language processing (NLP). Elasticsearch 8.x lets users upload machine learning models trained in PyTorch and use them for NLP. Once a model in TorchScript format is deployed to a cluster, users can run predictions on incoming data and act on the results. Supported tasks include, for example, text classification, text embedding, and named entity recognition.
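A minimal sketch of calling a deployed model follows; it only builds the request method, path, and JSON body. The model ID `my-ner-model` is hypothetical. The `deployment/_infer` path and the `docs`/`text_field` body reflect our reading of the 8.0 documentation, and later 8.x releases also offer `/_ml/trained_models/<model_id>/_infer`, so verify the path against your version.

```python
import json


def infer_request(model_id: str, texts: list) -> tuple:
    """Return (method, path, body) for an inference call on a deployed NLP model.

    Each input document carries its text in a field named text_field,
    which is the default input field name for deployed models.
    """
    path = f"/_ml/trained_models/{model_id}/deployment/_infer"
    body = {"docs": [{"text_field": text} for text in texts]}
    return "POST", path, body


if __name__ == "__main__":
    # Hypothetical model ID for a named entity recognition model.
    method, path, body = infer_request("my-ner-model", ["Hi my name is Josh and I live in Berlin"])
    print(method, path)
    print(json.dumps(body, indent=2))
```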

Enhancements

Below we review the improvements to existing features introduced in the new version of Elasticsearch.

ECS compliant JSON logs

Elasticsearch uses Apache Log4j 2 to write its JSON logs. In version 8.0, the configuration has been updated to use EcsLayout instead of ESJsonLayout. Logging is not interrupted for existing setups because the previous infrastructure has been kept. However, the switch changes some aspects of the Elasticsearch JSON logs:

  • Stack trace messages are no longer multiline
  • Markers are placed in the tags field instead of in messages
  • Packages are not shortened
  • %node_id and %cluster_id are now separate converters
  • Some fields are renamed: for example, level becomes log.level and component becomes log.logger
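If you have scripts that parse the old JSON logs, the two renames listed above can be handled with a small mapping. This is a hypothetical helper that covers only the fields named in this article, not every difference between the two layouts, and the sample log line is made up.

```python
import json

# Only the renames mentioned above; the real layouts differ in more fields.
LEGACY_TO_ECS = {"level": "log.level", "component": "log.logger"}


def to_ecs_names(legacy_line: str) -> dict:
    """Parse one legacy JSON log line and rename known keys to their ECS names."""
    entry = json.loads(legacy_line)
    # Keys without a known ECS equivalent are kept unchanged.
    return {LEGACY_TO_ECS.get(key, key): value for key, value in entry.items()}


if __name__ == "__main__":
    line = '{"level": "INFO", "component": "org.elasticsearch.node.Node", "message": "started"}'
    print(to_ecs_names(line))
```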

Metricbeat ECS data templates to support earlier log formats

When Metricbeat switches to writing Elasticsearch logs in the ECS-compliant format, the legacy format is no longer supported. To keep supporting the formats of Elasticsearch 7 and earlier, new mappings have been added that index data with the new ECS fields. These mappings include alias fields with the legacy names that point to the corresponding ECS fields. Four new mappings were created, for Metricbeat, Elasticsearch, Kibana, and Logstash logs.
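The alias mechanism can be illustrated with a mapping fragment that stores the data under the ECS name and exposes the legacy name as a field alias (`"type": "alias"` with a `path`). The field names are a hypothetical example, not the actual Metricbeat templates. The helper checks that an alias target exists, since Elasticsearch rejects aliases that point to missing fields.

```python
# Hypothetical mapping fragment: data lives in the ECS field log.level,
# while the legacy name level is an alias resolved at query time.
properties = {
    "log": {"properties": {"level": {"type": "keyword"}}},
    "level": {"type": "alias", "path": "log.level"},
}


def field_exists(props: dict, dotted_path: str) -> bool:
    """Walk nested 'properties' objects to check that a dotted field path exists."""
    node = {"properties": props}
    for part in dotted_path.split("."):
        children = node.get("properties", {})
        if part not in children:
            return False
        node = children[part]
    return True


if __name__ == "__main__":
    alias_target = properties["level"]["path"]
    print(alias_target, field_exists(properties, alias_target))
```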

Upgrade to Lucene version 9

Elasticsearch is built on top of the Lucene library and provides a convenient REST API for interacting with Lucene’s search functions. Elasticsearch 8.0 upgrades to Lucene version 9.

This latest version of Lucene includes new language features for Japanese, Swedish, Serbian, and other languages. In addition, support for high-dimensional numerical vectors in kNN searches and other optimizations have been added. These include faster faceting of taxonomies, faster indexing of multidimensional points, and faster sorting of fields indexed as points.

Breaking changes

With the new version, some features from previous releases are no longer available or supported. Let’s see what they are.

Indices created in Elasticsearch 6

The new version of Elasticsearch starts only if every index in the cluster was created in version 7.0 or later. If you have an index created in an unsupported version, you can use the reindex API to rebuild it in the current format.
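Because version 8 will not start while such indices exist, the reindexing has to happen on the 7.x cluster before the upgrade. A minimal sketch of the `_reindex` request follows; the index names are hypothetical, and the destination index should be created beforehand with the mappings you want.

```python
import json


def reindex_request(source_index: str, dest_index: str) -> tuple:
    """Return (method, path, body) that copies all documents from an old index
    into a new index created by the running 7.x cluster."""
    body = {"source": {"index": source_index}, "dest": {"index": dest_index}}
    return "POST", "/_reindex", body


if __name__ == "__main__":
    # Hypothetical names: an index created in 6.x and its 7.x replacement.
    method, path, body = reindex_request("logs-2018", "logs-2018-reindexed")
    print(method, path)
    print(json.dumps(body, indent=2))
```

After reindexing, the old index can be deleted (or replaced by an alias with its name) so the cluster no longer contains 6.x indices.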

Plugins for external storage are included in Elasticsearch

Snapshot repositories store backups of an Elasticsearch cluster and protect its data: if the cluster is corrupted, the snapshots can be used to recover the data.

In previous versions of Elasticsearch, a dedicated plugin had to be installed for each repository type, including Amazon S3, Google Cloud Storage, and Microsoft Azure Blob Storage. In the new version, these plugins no longer need to be installed because they are included in Elasticsearch by default.
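With the repository types built in, registering a repository comes down to a `PUT /_snapshot/<name>` request. The sketch below builds the body for the three services mentioned; the repository, bucket, and container names are hypothetical, and credentials are configured separately (for example in the Elasticsearch keystore), not in these settings.

```python
import json

# Setting that names the storage location for each built-in repository type.
LOCATION_SETTING = {"s3": "bucket", "gcs": "bucket", "azure": "container"}


def repository_request(repo_name: str, repo_type: str, location: str) -> tuple:
    """Return (method, path, body) that registers a snapshot repository."""
    if repo_type not in LOCATION_SETTING:
        raise ValueError(f"unsupported repository type: {repo_type}")
    body = {"type": repo_type, "settings": {LOCATION_SETTING[repo_type]: location}}
    return "PUT", f"/_snapshot/{repo_name}", body


if __name__ == "__main__":
    # Hypothetical repository and bucket names.
    method, path, body = repository_request("my_s3_backups", "s3", "my-es-snapshots")
    print(method, path)
    print(json.dumps(body, indent=2))
```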

REST API endpoints

Several REST endpoints were changed or removed in version 8.0. Many of these were already deprecated in version 7, so requests that previously returned a deprecation warning now return an error.

Changes include:

  • Removal of xpack from endpoint paths
  • Removal of mapping types from endpoint paths
  • Replacement of nGram and edgeNGram with ngram and edge_ngram, respectively, in token filter requests
  • Replacement of the EQL wildcard function with the like and regex keywords
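For the token filter rename, existing analysis settings can be updated mechanically before they are sent to an 8.x cluster. The helper below is a hypothetical sketch that rewrites the legacy names both in custom filter definitions and in analyzers' filter lists; the sample settings are made up.

```python
import copy

# Legacy token filter names and their 8.x replacements.
RENAMED_FILTERS = {"nGram": "ngram", "edgeNGram": "edge_ngram"}


def modernize_analysis(settings: dict) -> dict:
    """Return a copy of index settings with legacy token filter names replaced."""
    updated = copy.deepcopy(settings)
    analysis = updated.get("analysis", {})
    # Custom filters declared with a legacy type.
    for definition in analysis.get("filter", {}).values():
        if definition.get("type") in RENAMED_FILTERS:
            definition["type"] = RENAMED_FILTERS[definition["type"]]
    # Analyzers that reference a legacy filter by name.
    for analyzer in analysis.get("analyzer", {}).values():
        if "filter" in analyzer:
            analyzer["filter"] = [RENAMED_FILTERS.get(name, name) for name in analyzer["filter"]]
    return updated


if __name__ == "__main__":
    legacy = {
        "analysis": {
            "filter": {"my_ngrams": {"type": "nGram", "min_gram": 2, "max_gram": 3}},
            "analyzer": {"my_analyzer": {"tokenizer": "standard", "filter": ["lowercase", "edgeNGram"]}},
        }
    }
    print(modernize_analysis(legacy))
```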
