Elastic Stack, content indexing and search

24 Oct 2019

by

Sylvain Chery

Content indexing and search is a growing need in an environment that produces ever more data whose structure varies both by nature and over time. Structured or free-text search within this data is also very important, and the ever-increasing volume calls for a system whose scaling is guaranteed by robust mechanisms. The Elastic stack, historically known by the acronym ELK (Elasticsearch, Logstash, Kibana), answers these needs with open source, highly distributable solutions.

Distributed indexing and search

Setting up an indexing and search infrastructure is a growing need. As users have become accustomed to accessing data through text search (Google, Bing, etc.), demand keeps increasing for new applications, both in companies and on the Web.

To meet this need, an infrastructure must be put in place, made up of the following types of components:

  • An indexing system: analyzes the submitted data to produce a format that allows quick searching, much as the index of a book lets you find content from a keyword.
  • A search engine: looks up data in the index based on more or less structured queries.
  • A visualization interface: makes sense of the data, especially when the volume is large. The visualization tool produces dynamic graphs, visualizations or reports from the data returned by the search engine.
  • Data collectors: a collector is a component that gathers the data to be indexed at its source (a file, a file tree, a database, a network flow, etc.).
  • A distributor (dispatcher): sends the collected data to the indexing system.

These different components can be provided by a single solution, or by the composition of several complementary solutions.
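To make the book-index analogy concrete, here is a minimal sketch of an indexing system and a search engine in Python. It is a toy inverted index, not how Elasticsearch is implemented; the `InvertedIndex` class, the `tokenize` helper and the sample documents are hypothetical names chosen for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Toy index (illustrative only): maps each token to the ids of the documents containing it."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.documents = {}

    def index(self, doc_id, text):
        """Indexing step: analyze a document and record its tokens."""
        self.documents[doc_id] = text
        for token in tokenize(text):
            self.postings[token].add(doc_id)

    def search(self, query):
        """Search step: return the sorted ids of documents containing every query token."""
        tokens = tokenize(query)
        if not tokens:
            return []
        result = set(self.postings.get(tokens[0], set()))
        for token in tokens[1:]:
            result &= self.postings.get(token, set())
        return sorted(result)


if __name__ == "__main__":
    idx = InvertedIndex()
    idx.index(1, "Disk full on server A")
    idx.index(2, "Server B restarted")
    idx.index(3, "Disk replaced on server B")
    print(idx.search("disk server"))
```

The work is done once at indexing time (splitting text into tokens), so that a search only has to intersect small sets of document ids instead of scanning every document.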

Other components can be added to this infrastructure to ensure scalability and high availability, for example:

  • A message broker, which can serve as a buffer between the distributor(s) and the indexing system. This broker should itself be distributed and able to scale with the infrastructure.
  • A filter, which sits between the distributor(s) and the indexing system and makes it possible to delete or enrich the data upstream.
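The roles of these two optional components can be sketched with Python's standard library. This is only an in-process illustration: a real deployment would use a distributed broker, and the `dispatch`, `drain` and `enrich_or_drop` functions, as well as the event fields, are hypothetical names chosen for the example.

```python
import queue


def enrich_or_drop(event):
    """Filter step: drop DEBUG events and enrich the others with an error flag."""
    if event.get("level") == "DEBUG":
        return None  # deleted upstream, never reaches the indexing system
    enriched = dict(event)
    enriched["is_error"] = event.get("level") == "ERROR"
    return enriched


def dispatch(events, buffer):
    """Distributor side: push the collected events into the buffer."""
    for event in events:
        buffer.put(event)


def drain(buffer, indexed):
    """Indexing side: consume buffered events at its own pace, applying the filter."""
    while True:
        try:
            event = buffer.get_nowait()
        except queue.Empty:
            break
        filtered = enrich_or_drop(event)
        if filtered is not None:
            indexed.append(filtered)


if __name__ == "__main__":
    buffer = queue.Queue()  # stands in for the message broker
    dispatch(
        [
            {"level": "INFO", "msg": "started"},
            {"level": "DEBUG", "msg": "tick"},
            {"level": "ERROR", "msg": "disk full"},
        ],
        buffer,
    )
    indexed = []  # stands in for the indexing system
    drain(buffer, indexed)
    print(indexed)
```

Because the distributor only writes to the buffer and the indexing side only reads from it, either side can be slowed down or stopped for a while without the other losing data.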

Elastic stack (ELK) conceptually

The Elastic stack is composed of several solutions. The best known form the acronym ELK, which is popular in the community:

  • Elasticsearch: the central solution. It plays the role of indexing system and search engine, and can also act as a data collector and filter thanks to "Ingest Nodes". The ability to collect and filter data directly with Elasticsearch makes it sufficient on its own for small-scale needs or for lab / development environments.
  • Logstash: the Swiss Army knife. Its main purpose is to collect data and distribute it to indexing systems (Elasticsearch, for example). It accepts many types of data input and can distribute to many systems besides Elasticsearch. It also provides very powerful data filtering capabilities. Its Swiss Army knife nature allows it to be used with or without Elasticsearch. It is often set up as an agent on systems that have access to the data you want to index.
  • Kibana: a visualization application that accesses the APIs of an Elasticsearch node or cluster and allows you to create rich visualizations.

In addition, there are collection components more specialized than Logstash: the Beats. These components are optimized to have a very small footprint on the systems they collect from, and they provide data ready to be used by Logstash or directly by an Elasticsearch cluster.

Elasticsearch, Logstash, Kibana and Beats make up the Elastic stack, which (at the time of writing) has just been released in a consistent version 5.0 that aligns the version numbers of all these solutions.

Equivalent stacks are provided by other vendors, for example InfluxData, whose stack is however more focused on time-series data.

The composable nature of Elastic's solution allows it to be combined with many other compatible tools.

Elastic stack (ELK) technically

Technically, the Elastic stack is composed of distinct solutions that are nevertheless consistent with each other in terms of APIs and data exchange. They do, however, use different technical stacks.

Elasticsearch and Logstash are based on Java (version 8 at least) and therefore require at least a JVM to run.

Kibana embeds a version of Node.js to provide the visualization interface.

The Beats are written in Go to provide executables with as small a footprint as possible.

It is important to install these tools in very controlled environments.

Most of the time, the tools use JSON as the lingua franca for exchanging data (although one can choose to send untransformed data, or data in other formats), and REST APIs to provide the distributed functionality of the search / indexing / monitoring cluster.
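As an illustration of this JSON-over-REST style, the sketch below builds (without sending) a full-text search request against Elasticsearch's `_search` endpoint using a `match` query. The helper name `build_search_request`, the index name `app-logs`, the field `message` and the address `http://localhost:9200` are assumptions made for the example; adapt them to your own cluster.

```python
import json
import urllib.request


def build_search_request(base_url, index, field, text):
    """Build (without sending) a JSON search request for one index."""
    # Query body: full-text "match" on a single field.
    body = {"query": {"match": {field: text}}}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical node address, index and field names.
    req = build_search_request("http://localhost:9200", "app-logs", "message", "timeout")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Passing the resulting request to `urllib.request.urlopen` would send it to a running node; the response is itself a JSON document that can be decoded with `json.loads`.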

Distributed message brokers such as RabbitMQ or Kafka can serve as a buffer between the systems being collected and the indexing / search infrastructure. This is often necessary both to ensure high availability and to facilitate maintenance operations on the Elasticsearch cluster.

Our take

The Elastic stack has reached a milestone with the new version 5.0.

The complete solution is more consistent, very well documented (one of the project's great qualities) and provides a somewhat less fragmented installation experience.

A good first approach is to start by using it for diagnostic tooling on application logs or system metrics, for example.

It is worth noting that Elastic.co also provides support for this stack, as well as a proprietary, paid component, X-Pack, which adds security, alerting, monitoring and reporting components as well as the ability to structure data as graphs and query it on that basis.

The recent acquisition of the company Prelert suggests that the stack will provide more and more predictive analytics capabilities (based on machine learning). One to watch very closely!

Of course, if you too have experience with the Elastic Stack (ELK) or equivalent systems, do not hesitate to contact us!