Elastic released version 8 of its platform on February 10, 2022. Several follow-up releases have since added new features and bug fixes. Version 8 is the first major revision since April 2019, when version 7 was released. In this article we highlight the new features and changes it introduces. The full release notes are available on the official website.
The most important point about version 8.0 is that Elastic has added a compatibility mode for version 7 REST API requests. Version 8.0 introduced several changes to REST API headers and responses, but because the previous structures are still supported, users can upgrade Elasticsearch to the latest version without having to change their code immediately.
The only requirements for upgrading to version 8 are upgrading to the latest version 7 (7.17) and enabling REST compatibility using Accept and Content-Type headers. After reviewing and resolving all critical issues listed in the Upgrade Assistant, you can upgrade to Elasticsearch 8.
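As a minimal sketch of what enabling REST compatibility looks like on the client side, the snippet below builds the two headers for a request. The helper name compat_headers is illustrative and not part of any Elastic client; the media type is the one Elasticsearch documents for version 7 compatibility.

```python
# Media type asking an 8.x node to accept and return 7.x-style REST payloads.
COMPAT_MEDIA_TYPE = "application/vnd.elasticsearch+json;compatible-with=7"


def compat_headers(has_body: bool = True) -> dict:
    """Build HTTP headers that request REST API compatibility with version 7.

    `compat_headers` is a hypothetical helper name used only in this sketch.
    """
    headers = {"Accept": COMPAT_MEDIA_TYPE}
    if has_body:
        # Content-Type is only needed when the request carries a body.
        headers["Content-Type"] = COMPAT_MEDIA_TYPE
    return headers


if __name__ == "__main__":
    print(compat_headers())
```

The resulting dictionary can be passed to whichever HTTP client the application already uses.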
Beware, however, that the compatibility settings are not intended to become a permanent feature of the platform; they were introduced only to make the upgrade to version 8 smoother. Also, once a cluster has been upgraded to version 8, an error-free downgrade to a previous version is not guaranteed.
In this section we will go over the main features introduced in version 8.
kNN search on dense vectors
Recommendation engines can be implemented using k-nearest neighbor (kNN) vector search. Version 7.x of Elasticsearch supported kNN search through the script_score query. This method returns exact results, but at a high cost in terms of speed and scalability.
In Elasticsearch 8.0, the dense_vector field type, which already existed in 7.x, gained the option to index vectors for approximate kNN search. Approximate kNN searches run on larger datasets more quickly than script_score, but they can return less accurate results.
Therefore, the user can decide which technique to use based on the application context.
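To make the two techniques concrete, here is a sketch of the request bodies involved, written as plain Python dictionaries so that it runs without a cluster. The function names, the field name and the choice of cosine similarity are assumptions made for illustration. The knn body shape shown is the one accepted by the kNN search API in 8.x.

```python
def dense_vector_mapping(field: str, dims: int) -> dict:
    """Mapping for a vector field that is indexed for approximate kNN."""
    return {
        "properties": {
            field: {
                "type": "dense_vector",
                "dims": dims,
                "index": True,           # index the vectors for approximate search
                "similarity": "cosine",  # assumption: cosine similarity fits the data
            }
        }
    }


def exact_knn_query(field: str, query_vector: list) -> dict:
    """Exact kNN: script_score computes the similarity for every matching document."""
    return {
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    # +1.0 keeps the score non-negative, as Elasticsearch requires.
                    "source": f"cosineSimilarity(params.query_vector, '{field}') + 1.0",
                    "params": {"query_vector": query_vector},
                },
            }
        }
    }


def approximate_knn_body(field: str, query_vector: list, k: int, num_candidates: int) -> dict:
    """Approximate kNN: only num_candidates vectors per shard are examined."""
    if num_candidates < k:
        raise ValueError("num_candidates must be at least k")
    return {
        "knn": {
            "field": field,
            "query_vector": query_vector,
            "k": k,
            "num_candidates": num_candidates,
        }
    }


if __name__ == "__main__":
    print(approximate_knn_body("embedding", [0.1, 0.2, 0.3], k=5, num_candidates=50))
```

Raising num_candidates generally improves accuracy at the cost of speed, which is the trade-off the choice between the two techniques comes down to.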
Uploading NLP PyTorch Models
PyTorch is a machine learning framework based on the Torch library and used to create models for applications such as computer vision and natural language processing (NLP). Elasticsearch 8.x allows users to upload machine learning models trained in PyTorch and use them for natural language processing. After deploying a model in TorchScript format to a cluster, users can run inference on incoming data and act on the results. For example, the Elastic Stack supports text classification, text embedding, and named entity recognition.
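One way to run inference on incoming data is an ingest pipeline with an inference processor. The sketch below only builds the pipeline definition as a Python dictionary. The function name, model ID and field names are placeholders, and it assumes the deployed NLP model reads its input from a field called text_field.

```python
def inference_pipeline(model_id: str, source_field: str, target_field: str) -> dict:
    """Build an ingest pipeline definition that runs a deployed model on each document.

    All argument values are supplied by the caller; nothing here refers to a real model.
    """
    return {
        "description": f"Run {model_id} on '{source_field}'",
        "processors": [
            {
                "inference": {
                    "model_id": model_id,
                    "target_field": target_field,
                    # Assumption: the model expects its input under "text_field".
                    "field_map": {source_field: "text_field"},
                }
            }
        ],
    }


if __name__ == "__main__":
    # "my-ner-model", "paragraph" and "ml.ner" are hypothetical names.
    print(inference_pipeline("my-ner-model", "paragraph", "ml.ner"))
```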
Below we review the feature improvements introduced in the new version of Elasticsearch.
ECS compliant JSON logs
Elasticsearch uses Apache’s log4j2 for JSON logs. In version 8.0, the configuration has been updated to use EcsLayout instead of ESJsonLayout. Logging continues without interruption because the previous infrastructure has been maintained. However, switching to the new layout changes the JSON log output in several ways:
- Stacktrace messages will not be multiline
- Markers are placed in the tags field instead of in messages
- Packages are not shortened
- %node_id and %cluster_id are now separate converters
- Some field names change, such as level becoming log.level and component becoming log.logger
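Applications that parse Elasticsearch JSON logs may need to handle both naming schemes during a migration. The following sketch normalizes a parsed log entry to the ECS names. It covers only the two renames mentioned above, and the helper name is an assumption made for illustration.

```python
# Only the renames named in this article; a real migration would extend this table.
LEGACY_TO_ECS = {
    "level": "log.level",
    "component": "log.logger",
}


def to_ecs_field_names(entry: dict) -> dict:
    """Return a copy of a parsed log entry with legacy keys renamed to ECS keys."""
    return {LEGACY_TO_ECS.get(key, key): value for key, value in entry.items()}


if __name__ == "__main__":
    legacy = {"level": "INFO", "component": "o.e.n.Node", "message": "started"}
    print(to_ecs_field_names(legacy))
```

Entries that already use the ECS names pass through unchanged, so the same function can sit in front of a parser that receives both formats.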
Metricbeat ECS data templates to support earlier log formats
When Metricbeat switches to writing data in ECS-compliant format, it stops supporting the legacy format. To keep the legacy formats of Elasticsearch 7 and earlier usable, new mappings containing the ECS fields have been added for data indexing. These mappings include alias fields for the legacy names that point to the corresponding ECS fields. Four new mappings were created for Metricbeat, Elasticsearch, Kibana, and Logstash logs.
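The alias mechanism can be sketched with Elasticsearch's alias field type, which makes one field name resolve to another field in the same mapping. The field pair used in the example is purely illustrative and is not taken from the actual Metricbeat templates; the function name is also an assumption.

```python
def alias_properties(legacy_to_ecs: dict) -> dict:
    """Build mapping properties in which each legacy name is an alias of an ECS field."""
    return {
        legacy: {"type": "alias", "path": ecs_path}
        for legacy, ecs_path in legacy_to_ecs.items()
    }


if __name__ == "__main__":
    # Hypothetical pair, chosen only to show the shape of the mapping.
    print(alias_properties({"level": "log.level"}))
```

Queries written against the legacy name then run against the ECS field that the alias points to.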
Upgrade to Lucene version 9
Elasticsearch is built on top of the Lucene library and provides a convenient REST API for interacting with Lucene’s search functions. Elasticsearch 8.0 upgrades the underlying library to Lucene 9.
This latest version of Lucene includes new language features for Japanese, Swedish, Serbian, and other languages. In addition, support for high-dimensional numerical vectors in kNN searches and other optimizations have been added. These include faster faceting of taxonomies, faster indexing of multidimensional points, and faster sorting of fields indexed as points.
With the new version some features found in previous releases are no longer available and supported. Let’s find out what they are.
Indices created in Elasticsearch 6
The new version of Elasticsearch starts only if all indices on the cluster were created in version 7.0 or later. Therefore, if you have an index created with an unsupported version, you can use the reindex API to rebuild it before upgrading.
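A reindex copies documents from an old index into a newly created one. The sketch below builds the request for the reindex API as plain data. The helper name and index names are placeholders, and it assumes the request is sent to a 7.x cluster before the upgrade so that the destination index is created by version 7.

```python
def reindex_request(source_index: str, dest_index: str) -> tuple:
    """Return (method, path, body) for copying one index into another."""
    if source_index == dest_index:
        raise ValueError("source and destination must be different indices")
    body = {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
    }
    return "POST", "/_reindex", body


if __name__ == "__main__":
    # "logs-v6" and "logs-v7" are hypothetical index names.
    print(reindex_request("logs-v6", "logs-v7"))
```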
Plugins for external storage are included in Elasticsearch
Snapshot repositories are used to store backups of the Elasticsearch cluster and protect the data. If the Elasticsearch cluster is corrupted, the snapshots can be used to recover the data.
In previous versions of Elasticsearch, a dedicated plugin had to be installed for each repository type, including Amazon S3, Google Cloud Storage, and Microsoft Azure Blob Storage. With the new version, users no longer need to install these plugins because they are bundled with Elasticsearch by default.
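With the repository types built in, registering a repository needs only a request to the snapshot API. The following sketch builds such a request without sending it. The helper name, repository name and bucket name are assumptions, and credentials are left out because they are configured separately.

```python
# Repository types that correspond to the formerly separate plugins.
BUNDLED_REPOSITORY_TYPES = {"s3", "gcs", "azure"}


def snapshot_repository_request(name: str, repo_type: str, settings: dict) -> tuple:
    """Return (method, path, body) for registering a snapshot repository."""
    if repo_type not in BUNDLED_REPOSITORY_TYPES:
        raise ValueError(f"unsupported repository type in this sketch: {repo_type}")
    body = {"type": repo_type, "settings": settings}
    return "PUT", f"/_snapshot/{name}", body


if __name__ == "__main__":
    # "nightly" and "my-backups" are hypothetical names.
    print(snapshot_repository_request("nightly", "s3", {"bucket": "my-backups"}))
```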
REST API endpoints
Several REST endpoints were changed or removed as part of the upgrade to version 8.0. Many of the affected endpoints were already deprecated in version 7, so requests that previously returned a deprecation warning now return an error. The changes include:
- Removing xpack from endpoint paths
- Removing mapping types from endpoint paths
- Replacing nGram and edgeNGram with ngram and edge_ngram, respectively, in token filter requests
- Replacing the EQL wildcard function with the like or regex keywords
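The token filter rename is the easiest of these changes to automate. As a sketch, the function below rewrites the filter types in an analysis settings block before it is sent to a version 8 cluster. The function name and the example filter name are assumptions, and other parts of the settings are left untouched.

```python
import copy

# Old token filter type names and their replacements.
RENAMED_FILTER_TYPES = {"nGram": "ngram", "edgeNGram": "edge_ngram"}


def modernize_token_filters(analysis: dict) -> dict:
    """Return a copy of an analysis settings block with old filter type names replaced."""
    updated = copy.deepcopy(analysis)  # leave the caller's settings unmodified
    for definition in updated.get("filter", {}).values():
        old_type = definition.get("type")
        if old_type in RENAMED_FILTER_TYPES:
            definition["type"] = RENAMED_FILTER_TYPES[old_type]
    return updated


if __name__ == "__main__":
    # "autocomplete" is a hypothetical filter name.
    old = {"filter": {"autocomplete": {"type": "edgeNGram", "min_gram": 1, "max_gram": 10}}}
    print(modernize_token_filters(old))
```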