While working at Booking.com, I was looking for a logging solution that matched the ease of use and power that Graphite gave us for metrics. Reluctant to bring a new technology into production, I talked to co-workers, and one mentioned that they were using ElasticSearch in some front-end systems for search and disambiguation. He had also heard of a few projects using ElasticSearch to store log data.
This began my love-hate-love relationship with ElasticSearch. I've spent the past 8 years working with ElasticSearch professionally and in my spare time. Graphite and ElasticSearch are two projects that change the game in terms of exploring your data. The insights I've gained into system performance, application performance, and system and network security with these tools are unparalleled. Tools like Grafana and Kibana let you visualize your data quickly and beautifully. As a system and security engineer, though, sometimes that isn't enough. I spend most of my day in a terminal and needed something to explore and pivot through the data there.
This is the first part of a multi-part series about a tool I created to make ElasticSearch's powerful search interface more accessible from the terminal. This tool has been essential to nearly every incident I've investigated. It was developed with the help, patience, and amazing ideas of co-workers both at Booking.com and now at Craigslist.
Perl Setup
I'm a Perl programmer. You may have strong feelings about that, but Perl has been good to me. The freedom to write code as beautifully, or as ugly, as I need to get the job done is liberating. I recommend using Perl 5.28 or newer with Perlbrew.
You should be comfortable with the command line, so follow the steps to install Perlbrew from its homepage. After that:
$ perlbrew init
$ perlbrew install -j 8 -n --thread 5.28.2
$ perlbrew switch 5.28.2
$ perlbrew install-cpanm
Now that you have a working, local, user-managed Perl, we'll install the toolset.
$ cpanm App::ElasticSearch::Utilities
The utilities and their dependencies will be installed in your local, user-managed Perl path.
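If the install completes cleanly, the scripts land in your perlbrew-managed bin directory; a quick sanity check (the exact path will vary with your setup):
$ which es-search.pl
/home/you/perl5/perlbrew/perls/perl-5.28.2/bin/es-search.pl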
(Some) Utilities Installed
- es-alias-manager.pl - Alternative to curator for managing aliases for indexes
- es-apply-settings.pl - Applies settings to an index based on the index name and age
- es-copy-index.pl - Copies all documents (or those matching a search) from an index on the same or a different cluster to another index; optionally accepts alternate settings/mappings for the destination index if it's being created
- es-daily-index-maintenance.pl - Alternative to curator for maintaining index life spans
- es-graphite-dynamic.pl - Extracts ElasticSearch performance metrics into Graphite directly or via collectd/diamond
- es-status.pl - A quick "how's the cluster" status overview
- es-storage-overview.pl - Checks how much storage each node and/or index is consuming in the cluster
And finally, the tool I'm going to be talking about: es-search.pl. This is a tool designed with the UNIX philosophy in mind, enabling workflows where the output of one query can be fed into another.
Configuration
To get the most out of the tool, let's set up some defaults to make our command lines shorter. All of the scripts (and, if you're so inclined, all of the App::ElasticSearch::Utilities functions) use this config file to determine how to find, connect to, and talk to your ElasticSearch cluster.
Create a ~/.es-utils.yaml file with something like this:
---
host: localhost
port: 9200
base: syslog
days: 1
timestamp: '@timestamp'
- host - The hostname or IP of the node you'd like to connect to; defaults to localhost
- port - The port to use to connect; defaults to 9200
- base - The default index base name; defaults to logstash
- days - The default number of days to search; defaults to 7
- timestamp - The default name of the field containing the timestamp for logging events; defaults to @timestamp
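Any of these defaults can be overridden per invocation; the command-line flags mirror the config keys (--base and --timestamp appear later in this post; I'm assuming --days follows the same convention):
$ es-search.pl --base access --days 7 --timestamp timestamp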
Index Bases
The idea behind this tool is to make things as simple as possible. If you're like me, you probably use index names to differentiate where shards are allocated and, ultimately, how long shards will live on your cluster. On large indices, where data is of varying interest, I tend to use this pattern:
- I want to index HTTP access logs, so I'll designate the mappings keying off the pattern *-access-*.
- My logs span multiple datacenters, so I'll set allocation rules to keep each datacenter's shards in that datacenter. If my datacenter tag is sfo, I'd set a pattern sfo-* to grab those shards (a template sketch follows this list).
- There may be lower-value data in the logs, like requests for images, CSS, or JavaScript assets. I want these around, but they're 90% of my logging volume and they generally become less interesting more quickly, so I'll want shorter retention rules applied to them. These indexes might include a tag in the index name, *-bulk-*, to make them distinguishable.
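For the datacenter allocation rule above, here's a minimal sketch of an index template (assuming an ElasticSearch 6.x-style _template API and nodes started with a node.attr.dc attribute; the names here are illustrative):
$ curl -s -XPUT 'localhost:9200/_template/sfo-logs' \
    -H 'Content-Type: application/json' -d '{
  "index_patterns": ["sfo-*"],
  "settings": {
    "index.routing.allocation.require.dc": "sfo"
  }
}'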
At the end of this madness I might have a list of indexes like:
| Index Name | Alias | Retention | Content |
|---|---|---|---|
| ams-access-2019.05.19 | access-2019.05.19 | 90d | Normal access logs for `ams` servers |
| ams-access-bulk-2019.05.19 | access-2019.05.19 | 7d | Uninteresting access logs for `ams` servers |
| ams-syslog-2019.05.19 | syslog-2019.05.19 | 90d | Syslog data for `ams` servers |
| sfo-access-2019.05.19 | access-2019.05.19 | 90d | Normal access logs for `sfo` servers |
| sfo-access-bulk-2019.05.19 | access-2019.05.19 | 7d | Uninteresting access logs for `sfo` servers |
| sfo-syslog-2019.05.19 | syslog-2019.05.19 | 90d | Syslog data for `sfo` servers |
If I wanted to search those indexes, I could just use --base access, since all of those index names parse down to that base. If you're not sure which bases es-search.pl thinks you have available, ask it to tell you:
$ es-search.pl --bases
Bases available for search:
access
ams-access
ams-access-bulk
ams-syslog
sfo-access
sfo-access-bulk
sfo-syslog
syslog
# Bases: 8 from a combined 6 indices.
Handling More Than One Index Base with Ease!
That's all fine and good if all of your indexes contain the same document types. That's unlikely, as you should be splitting different document types into separate indices, if not clusters. If you want to work with es-search.pl across all those indexes easily, it will need to know the correct timestamp field for each. To enable per-base timestamp fields, just add a meta section to your ~/.es-utils.yaml file:
---
host: localhost
port: 9200
base: syslog
days: 1
meta:
access:
timestamp: timestamp
ossec:
timestamp: ts
zeek:
timestamp: event_ts
Now es-search.pl and the rest of the utilities will know that when you specify --base zeek, the timestamp field to sort on is event_ts, and you won't need to think about adding --timestamp event_ts to the command line.
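For example, this searches the zeek indexes, sorted on event_ts, with no extra flags:
$ es-search.pl --base zeek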
Seeing Data
Now that we're configured, we can just run:
$ es-search.pl
= Querying Indexes: syslog-2019.05.19
---
action: connect
hostname: janus
message: 'connect from unknown[102.165.34.33]'
proc: smtpd
proc_id: 30775
program: postfix/smtpd
src: unknown
src_ip: 102.165.34.33
tags:
- decoder_syslog
- mail
- postfix
timestamp: 2019-05-19T02:07:34.861416
total_time: 0.004363
<snip>
# Search Parameters:
# {"bool":{}}
# Displaying 20 of 357 in 0.0584328174591064 seconds.
# Indexes (1 of 1) searched: syslog-2019.05.19
Each document's _source is printed to the screen as YAML. This is not the usual use case for es-search.pl, so let's do better. It's also likely that the documents you're viewing don't contain every field available in the index.
Finding the Fields in the Index
When you start working with ElasticSearch indexes, you may not know all the fields available for search. es-search.pl allows you to explore a bit:
$ es-search.pl --base syslog --fields
Fields available for search:
- action
- dev
- dst_geoip.continent
- dst_geoip.country
- dst_geoip.location
- dst_ip
- dst_port
- exe
- file
- hostname
- in_bytes
- message
- out_bytes
- proc
- proc_id
- program
- proto_app
- rec_id
- src
- src_geoip.city
- src_geoip.continent
- src_geoip.country
- src_geoip.location
- src_geoip.postal_code
- src_ip
- src_port
- src_user
- tags
- timestamp
- timing.phase
- timing.seconds
- total_time
# Fields: 32 from a combined 1 indices.
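The list is assembled from the index mappings (or so I'd assume); to see the raw view ElasticSearch itself holds, you can fetch the mapping directly (index name taken from the example above):
$ curl -s 'localhost:9200/syslog-2019.05.19/_mapping?pretty'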
This field list will help you understand what an index contains. Maybe you want to see what's in a particular field? There are two ways: the first with search, the second with aggregations.
Finding Field Values with Search
The simplest and least taxing way to ask ElasticSearch what a field contains is to query the index and return the relevant field. To restrict the results to documents containing the field, we can use the --exists <fieldname> filter. If I want to see the most recent 20 documents where the field proc exists, and display only the proc entry, it's as simple as:
$ es-search.pl --exists proc --show proc
= Querying Indexes: syslog-2019.05.19
timestamp proc
2019-05-19T02:04:06.135686 smtpd
2019-05-19T02:04:06.135786 smtpd
2019-05-19T02:04:05.856884 smtpd
2019-05-19T02:03:46.471311 smtpd
2019-05-19T02:03:46.471352 smtpd
2019-05-19T02:03:46.199116 smtpd
2019-05-19T02:03:37.013022 smtpd
2019-05-19T02:03:37.012866 smtpd
2019-05-19T02:03:36.741711 smtpd
2019-05-19T02:03:18.239108 smtpd
2019-05-19T02:03:18.239135 smtpd
2019-05-19T02:03:17.947805 smtpd
2019-05-19T02:03:07.837098 smtpd
2019-05-19T02:03:07.837133 smtpd
2019-05-19T02:03:07.553645 smtpd
2019-05-19T02:03:07.342514 smtpd
2019-05-19T02:03:07.342686 smtpd
2019-05-19T02:03:07.067929 smtpd
2019-05-19T02:02:57.157830 smtpd
2019-05-19T02:02:57.157612 smtpd
# Search Parameters:
# {"bool":{"must":[{"exists":{"field":"proc"}}]}}
# Displaying 20 of 85 in 0.0445699691772461 seconds.
# Indexes (1 of 1) searched: syslog-2019.05.19
This might not give me the best understanding of what the field is, but already I know that postfix log entries are setting this field.
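For reference, the printed search parameters correspond to a raw _search call roughly like this (a hand-written sketch; the tool also builds the date-range filter and sort for you):
$ curl -s 'localhost:9200/syslog-2019.05.19/_search' \
    -H 'Content-Type: application/json' -d '{
  "query": { "bool": { "must": [ { "exists": { "field": "proc" } } ] } },
  "_source": [ "timestamp", "proc" ],
  "size": 20,
  "sort": [ { "timestamp": "desc" } ]
}'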
Finding Field Values with Aggregations
We can do a lot better by leveraging aggregations in ElasticSearch. To do so, we ask es-search.pl for the top values.
$ es-search.pl --top proc
= Querying Indexes: syslog-2019.05.19
count proc
224 smtpd
27 smtps_smtpd
12 qmgr
9 localsmtp_smtpd
6 cleanup
4 submission_smtpd
3 anvil
3 lmtp
3 pipe
# Search Parameters:
# {"bool":{}}
# Displaying 9 of 693 in 0.00798892974853516 seconds.
# Indexes (1 of 1) searched: syslog-2019.05.19
#
# Totals across batch
#
count proc
224 smtpd
27 smtps_smtpd
12 qmgr
9 localsmtp_smtpd
6 cleanup
4 submission_smtpd
3 anvil
3 pipe
3 lmtp
We now have the top 20 (or fewer, if there aren't 20 in total) values in the proc field.
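Under the hood, --top proc is a standard terms aggregation; a rough hand-written equivalent (not necessarily the tool's literal request body):
$ curl -s 'localhost:9200/syslog-2019.05.19/_search' \
    -H 'Content-Type: application/json' -d '{
  "size": 0,
  "aggs": {
    "proc": { "terms": { "field": "proc", "size": 20 } }
  }
}'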
Putting It Together
It looks like proc is the component piece for postfix syslog data. To be sure, let's ask ElasticSearch for the top programs with the top 10 procs each. Since es-search.pl is designed to make this easy, we type almost exactly that:
$ es-search.pl --top program --with proc:10 --exists proc
= Querying Indexes: syslog-2019.05.19
count program
224 postfix/smtpd terms.proc smtpd 224
35 postfix/smtps/smtpd terms.proc smtps_smtpd 35
12 postfix/qmgr terms.proc qmgr 12
9 postfix/localsmtp/smtpd terms.proc localsmtp_smtpd 9
6 postfix/anvil terms.proc anvil 6
6 postfix/cleanup terms.proc cleanup 6
6 postfix/submission/smtpd terms.proc submission_smtpd 6
3 postfix/lmtp terms.proc lmtp 3
3 postfix/pipe terms.proc pipe 3
# Search Parameters:
# {"bool":{"must":[{"exists":{"field":"proc"}}]}}
# Displaying 9 of 304 in 0.0130970478057861 seconds.
# Indexes (1 of 1) searched: syslog-2019.05.19
Let's break down that query:
- --top program - Top aggregation; infers terms, and uses the value of --size, which defaults to 20
- --with proc:10 - Sub-aggregation; the form is agg_type:field_name:sub_size (a DSL sketch follows this list)
  - agg_type - Defaults to terms and can be omitted, but can also be: significant_terms, max, min, sum, avg, cardinality
  - field_name - Required; the name of the sub-aggregate field
  - sub_size - Defaults to 3
- --exists proc - Filters the entire aggregation to just documents with the proc field
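In raw ElasticSearch DSL, that command builds something roughly like a terms aggregation with a nested terms sub-aggregation; here is a hand-written sketch of the request body (not the tool's literal output):
{
  "size": 0,
  "query": { "bool": { "must": [ { "exists": { "field": "proc" } } ] } },
  "aggs": {
    "program": {
      "terms": { "field": "program", "size": 20 },
      "aggs": {
        "proc": { "terms": { "field": "proc", "size": 10 } }
      }
    }
  }
}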
Wrapping up for now
I think this is a reasonable point to pause. This provides you with enough information to start getting your feet wet with the tool. In the next part, I'll examine building useful queries and how this tool enables pivoting and data exploration.
If you can't wait until next time, run es-search.pl --manual to get an in-depth overview of the available options, or find the man page online:
- GitHub Project Page: reyjrar/es-utils
- es-search.pl man page
- MetaCPAN Project Page: BLHOTSKY/App-ElasticSearch-Utilities