# Elasticsearch: the aggregation types

Elasticsearch is a widely used NoSQL database for building search engines, thanks to its ability to index text effectively. But it does not stop there. With aggregations, Elasticsearch can also be used to analyze data and extract statistics from large datasets. Let's explore this feature, which underlies many of the visualizations available in Kibana.

In the previous articles (Elasticsearch: use of match queries, Elasticsearch: use of term queries, Elasticsearch: compound query, and Elasticsearch: join and bonus queries) we saw how to query documents stored in an Elasticsearch index. But Elasticsearch is not just for searching structured information or unstructured text. Aggregations let you leverage Elasticsearch’s powerful analytic engine to analyze data and extract statistics.

Use cases for aggregations range from analyzing data in real time in order to trigger an action, to building visualization dashboards in Kibana. In fact, many of the visualizations we used to create interactive dashboards in the articles Kibana: let’s explore data and Kibana: build your own dashboard rely on aggregations.

One of the great strengths of Elasticsearch is its ability to run aggregations over huge datasets, often in milliseconds. Compared to plain queries, however, aggregations consume more CPU cycles and memory. For this reason, they are mainly used for building dashboards or performing complex analyses on data.

In this series of articles we will study, through examples, the various types of aggregation to understand what information and statistics we can extract. In this article we introduce the aggregation syntax and the main aggregation types, which we will analyze in detail in the following articles.

## Aggregations on text fields

By default, Elasticsearch does not support aggregations over a text field. Since text fields are tokenized, an aggregation on a text field must reverse the tokenization process to return to the original string and then formulate an aggregation based on it. This operation consumes a lot of memory and degrades cluster performance.

Although it is possible to enable aggregations on text fields by setting the fielddata parameter to true in the mapping, the aggregations are still based on the tokenized words and not on the raw text.

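It is recommended to keep a raw version of the text field as a keyword sub-field and to run aggregations on that sub-field. With the mapping below, aggregations can be performed on the title.raw field instead of the title field. The mapping also sets fielddata to true on title, which is needed only if you want to aggregate on the tokenized words of title itself: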

    PUT movies
    {
      "mappings": {
        "properties": {
          "title": {
            "type": "text",
            "fielddata": true,
            "fields": {
              "raw": {
                "type": "keyword"
              }
            }
          }
        }
      }
    }
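As a minimal sketch of how the raw sub-field might be used, the request below runs a terms aggregation on title.raw, assuming the movies index has been created with the mapping above and contains some documents. The aggregation name titles and the bucket limit of 5 are illustrative choices, not part of the original example.

```json
GET movies/_search
{
  "size": 0,
  "aggs": {
    "titles": {
      "terms": {
        "field": "title.raw",
        "size": 5
      }
    }
  }
}
```

Because title.raw is a keyword field, each bucket key is a complete, untokenized title rather than a single word.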

## General aggregation structure

The structure of an aggregation query is as follows:

    GET _search
    {
      "size": 0,
      "aggs": {
        "NAME": {
          "AGG_TYPE": {}
        }
      }
    }

If you are only interested in the aggregation results and not in the matching documents, set size to 0 so that no hits are returned.

Any number of aggregations can be defined in the aggs property. Each aggregation is defined by its name and one of the aggregation types supported by Elasticsearch.

The name of the aggregation helps to distinguish the different aggregations in the response. The AGG_TYPE property allows you to specify the type of the aggregation.
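To illustrate how several named aggregations can sit side by side in the aggs property, here is a small sketch. The index my-index and the numeric field price are hypothetical and serve only as placeholders; min and max are standard metric aggregation types.

```json
GET my-index/_search
{
  "size": 0,
  "aggs": {
    "min_price": {
      "min": { "field": "price" }
    },
    "max_price": {
      "max": { "field": "price" }
    }
  }
}
```

In the response, each result appears under the name chosen in the request (here min_price and max_price), which is how the two aggregations are told apart.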

## Sample aggregation

This section uses the sample e-commerce and web log data provided by Kibana. To add the sample data, log into Kibana, choose Home and then Try our sample data. For both the sample e-commerce orders and the sample web logs, choose Add data.

### Example of average calculation

To find the average value of the taxful_total_price field:

    GET kibana_sample_data_ecommerce/_search
    {
      "size": 0,
      "aggs": {
        "avg_taxful_total_price": {
          "avg": {
            "field": "taxful_total_price"
          }
        }
      }
    }

#### Sample response

    {
      "took" : 1,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 4675,
          "relation" : "eq"
        },
        "max_score" : null,
        "hits" : [ ]
      },
      "aggregations" : {
        "avg_taxful_total_price" : {
          "value" : 75.05542864304813
        }
      }
    }

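The aggregations block in the response contains the average value of the taxful_total_price field, reported under the name given to the aggregation in the request (avg_taxful_total_price). The hits array is empty because size was set to 0.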

## Types of aggregations

There are three main types of aggregations:

• Metric aggregations: calculate metrics such as sum, min, max, and avg over numeric fields.
• Bucket aggregations: sort query results into groups based on some criteria.
• Pipeline aggregations: transform the output of one aggregation into an input for another.
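The average computed earlier is a metric aggregation. As a sketch of a bucket aggregation, the request below groups the sample orders with a terms aggregation. It assumes that the Kibana e-commerce sample index exposes a keyword field named day_of_week; if your version of the sample data differs, substitute any keyword field. The aggregation name orders_per_day is an illustrative choice.

```json
GET kibana_sample_data_ecommerce/_search
{
  "size": 0,
  "aggs": {
    "orders_per_day": {
      "terms": {
        "field": "day_of_week"
      }
    }
  }
}
```

Instead of a single value, the response contains a list of buckets, each with a key (the field value) and a doc_count (the number of documents that fell into that group).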

## Nested aggregations

Aggregations within aggregations are called nested aggregations or subaggregations. Not all types of aggregations can contain nested aggregations. Metric aggregations produce simple results that cannot be aggregated further. Bucket aggregations, in contrast, produce groups of documents on which other aggregations can be nested. Complex data analyses can be performed by nesting metric and bucket aggregations within bucket aggregations.

General syntax of a nested aggregation:

    {
      "aggs": {
        "name": {
          "type": {
            "data"
          },
          "aggs": {
            "nested": {
              "type": {
                "data"
              }
            }
          }
        }
      }
    }

The internal aggs keyword starts a new nested aggregation. The syntax of the parent aggregation and the nested aggregation is the same. Nested aggregations are executed in the context of the parent aggregations.
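A concrete instance of this pattern might look like the following sketch, which nests an avg metric inside a terms bucket aggregation on the Kibana e-commerce sample data. The field day_of_week is assumed to exist as a keyword field in that index, and the aggregation names are illustrative.

```json
GET kibana_sample_data_ecommerce/_search
{
  "size": 0,
  "aggs": {
    "orders_per_day": {
      "terms": {
        "field": "day_of_week"
      },
      "aggs": {
        "avg_price_per_day": {
          "avg": {
            "field": "taxful_total_price"
          }
        }
      }
    }
  }
}
```

Each bucket in the response then carries its own avg_price_per_day value, computed only over the documents that belong to that bucket.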

You can also associate aggregations with search queries to narrow down the elements to be analyzed before the aggregation. If you do not add a query, Elasticsearch implicitly uses the match_all query.
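As a sketch of combining a query with an aggregation, the request below first restricts the documents with a range query and then averages only over the matching orders. The threshold of 100 and the aggregation name are arbitrary values chosen for illustration.

```json
GET kibana_sample_data_ecommerce/_search
{
  "size": 0,
  "query": {
    "range": {
      "taxful_total_price": {
        "gte": 100
      }
    }
  },
  "aggs": {
    "avg_price_of_large_orders": {
      "avg": {
        "field": "taxful_total_price"
      }
    }
  }
}
```

The hits.total value in the response reflects the number of documents matched by the query, and the average is computed over that subset rather than over the whole index.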

In subsequent articles we will look at examples of each type of aggregation.
