Elasticsearch: the aggregation types

Elasticsearch is a NoSQL database widely used for building search engines, thanks to its ability to index text effectively. But it does not stop there. With aggregations, Elasticsearch can also be used to analyze data and extract statistics from large volumes of data. Let's learn about this feature, which underlies many of the visualizations used by Kibana.

Reading time: 3 minutes

In the previous articles, Elasticsearch: use of match queries, Elasticsearch: use of term queries, Elasticsearch: compound query, and Elasticsearch: join and bonus queries, we have seen how to query documents saved within an Elasticsearch index. But Elasticsearch is not just for searching structured information or unstructured text. Aggregations allow you to leverage Elasticsearch’s powerful analytic engine to analyze data and extract statistics.

Use cases for aggregations range from analyzing data in real time in order to trigger an action, to building a visualization dashboard in Kibana. In fact, many of the visualizations used to create interactive dashboards, which we have already seen in the articles Kibana: let’s explore data and Kibana: build your own dashboard, rely on aggregations.

The great strength of Elasticsearch is its ability to run aggregations over very large datasets with low latency, often in milliseconds. Compared with plain queries, however, aggregations generally consume more CPU cycles and memory. For this reason, this type of search is mainly used for building dashboards or performing complex analyses on data.

In this series of articles we will study, through examples, the various types of aggregation to understand what information and statistics we can extract. Specifically, in this article we will introduce the syntax of aggregations and their main types, which we will analyze in detail in later articles.

Aggregations on text fields

By default, Elasticsearch does not support aggregations on a text field. Text fields are analyzed and stored as individual tokens in an inverted index, so to aggregate on one Elasticsearch has to build an in-memory structure (fielddata) by reading that index and inverting the term-to-document relationship. This operation consumes a lot of heap memory and can degrade cluster performance.

Although it is possible to enable aggregations on text fields by setting the fielddata parameter to true in the mapping, the aggregations are still based on the tokenized words and not on the raw text.

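It is recommended to keep a raw version of the text field as a keyword sub-field, on which aggregations can be performed. The mapping below defines title.raw as a keyword sub-field of title, so aggregations can be run on title.raw instead of title. It also sets fielddata to true on title, which is only needed if you want to aggregate on the tokenized words as well: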

PUT movies
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "fielddata": true,
        "fields": {
          "raw": {
            "type": "keyword"
          }
        }
      }
    }
  }
}
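
As a minimal sketch of how the raw sub-field is used, the request below runs a terms aggregation (a bucket aggregation that groups documents by the distinct values of a field) on title.raw. It assumes that the movies index defined above already contains some documents; the aggregation name titles is an arbitrary choice.

```
GET movies/_search
{
  "size": 0,
  "aggs": {
    "titles": {
      "terms": {
        "field": "title.raw"
      }
    }
  }
}
```

Each bucket in the response corresponds to one complete title. The same aggregation on title, with fielddata enabled, would instead return one bucket per token.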

General aggregation structure

The structure of an aggregation query is as follows:

GET _search
{
  "size": 0,
  "aggs": {
    "NAME": {
      "AGG_TYPE": {}
    }
  }
}

If you are only interested in the aggregation result and not the query results, you should set the size to 0.

Any number of aggregations can be defined in the aggs property. Each aggregation is defined by its name and one of the aggregation types supported by Elasticsearch.

The name of the aggregation helps to distinguish the different aggregations in the response. The AGG_TYPE property allows you to specify the type of the aggregation.
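
To illustrate several aggregations in one request, here is a small sketch that computes the minimum and the maximum of the same field. The index and field names are taken from the Kibana sample e-commerce data used in the examples later in this article; the names min_price and max_price are arbitrary.

```
GET kibana_sample_data_ecommerce/_search
{
  "size": 0,
  "aggs": {
    "min_price": {
      "min": {
        "field": "taxful_total_price"
      }
    },
    "max_price": {
      "max": {
        "field": "taxful_total_price"
      }
    }
  }
}
```

The response should contain both results in its aggregations block, keyed by the names min_price and max_price.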

Sample aggregation

The examples in this section use the sample e-commerce orders and sample web logs data sets provided by Kibana. To add them, log into Kibana, choose Home and then Try our sample data. For sample e-commerce orders and sample web logs, choose Add data.

Example of average calculation

To find the average value of the taxful_total_price field:

GET kibana_sample_data_ecommerce/_search
{
  "size": 0,
  "aggs": {
    "avg_taxful_total_price": {
      "avg": {
        "field": "taxful_total_price"
      }
    }
  }
}

Sample response

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4675,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "avg_taxful_total_price" : {
      "value" : 75.05542864304813
    }
  }
}

The aggregations block in the response contains the average value of the taxful_total_price field, returned under the name given in the request (avg_taxful_total_price).

Types of aggregations

There are three main types of aggregations:

Metric aggregations: calculate metrics such as sum, min, max, and avg over numeric fields.
Bucket aggregations: sort query results into groups based on some criteria.
Pipeline aggregations: transform the output of one aggregation into an input for another.
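
The three types can appear together in a single request. The following is only a sketch under some assumptions: it presumes that the Kibana sample e-commerce index has a date field called order_date, and that the cluster runs Elasticsearch 7.2 or later (needed for calendar_interval). The aggregation names are arbitrary.

```
GET kibana_sample_data_ecommerce/_search
{
  "size": 0,
  "aggs": {
    "orders_per_day": {
      "date_histogram": {
        "field": "order_date",
        "calendar_interval": "day"
      },
      "aggs": {
        "daily_revenue": {
          "sum": {
            "field": "taxful_total_price"
          }
        }
      }
    },
    "avg_daily_revenue": {
      "avg_bucket": {
        "buckets_path": "orders_per_day>daily_revenue"
      }
    }
  }
}
```

Here date_histogram is the bucket aggregation (one group of orders per day), sum is the metric aggregation computed inside each group, and avg_bucket is the pipeline aggregation: it takes the daily sums as input and returns their average.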

Nested aggregations

Aggregations within aggregations are called nested aggregations or sub-aggregations. Not all types of aggregations allow nested aggregations to be defined. Metric aggregations produce simple values, which cannot contain further aggregations. Bucket aggregations, in contrast, produce groups of documents, and other aggregations can be nested inside each group. Complex data analyses can be performed by nesting metric and bucket aggregations within bucket aggregations.

General syntax of nested aggregation

{
  "aggs": {
    "NAME": {
      "AGG_TYPE": {},
      "aggs": {
        "NESTED_NAME": {
          "AGG_TYPE": {}
        }
      }
    }
  }
}

The internal aggs keyword starts a new nested aggregation. The syntax of the parent aggregation and the nested aggregation is the same. Nested aggregations are executed in the context of the parent aggregations.
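
As a concrete sketch of this syntax, the request below groups the sample e-commerce orders by day of the week and computes the average order value inside each group. The field day_of_week is an assumption about the sample data set (it is expected to be a keyword field); the aggregation names are arbitrary.

```
GET kibana_sample_data_ecommerce/_search
{
  "size": 0,
  "aggs": {
    "orders_by_day": {
      "terms": {
        "field": "day_of_week"
      },
      "aggs": {
        "avg_price": {
          "avg": {
            "field": "taxful_total_price"
          }
        }
      }
    }
  }
}
```

Each bucket returned by orders_by_day should carry its own avg_price value, computed only over the documents that fall into that bucket.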

You can also associate aggregations with search queries to narrow down the elements to be analyzed before the aggregation. If you do not add a query, Elasticsearch implicitly uses the match_all query.
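
For example, a query can be placed next to the aggs property to restrict the documents that the aggregation sees. In this sketch the range query keeps only orders with a taxful_total_price of at least 100 (an arbitrary threshold chosen for illustration), and the average is then computed over those orders only.

```
GET kibana_sample_data_ecommerce/_search
{
  "size": 0,
  "query": {
    "range": {
      "taxful_total_price": {
        "gte": 100
      }
    }
  },
  "aggs": {
    "avg_price_large_orders": {
      "avg": {
        "field": "taxful_total_price"
      }
    }
  }
}
```

In the response, hits.total reports how many documents matched the query, which is also the number of documents the aggregation was computed on.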

We will see in subsequent articles some examples for the various types of aggregations.
