-*- org-mode -*-

* How to parse single line JSON logs for ElasticSearch Cloud
This is going to be a brief blog post, but I wanted to jot down a few things, as solving this "easy" issue took me the better part of 4 hours.
** The Problem
I have a number of weirdly formatted logs that developers would like to be able to search through easily and get insights from. The developers control this log format,
but it's an embedded environment and modifying the format is "non-trivial". I wrote a Perl script that reads in these developer logs and uses regexes to extract
the key fields I'm interested in, transforming them like so (fake data):

#+BEGIN_SRC shell
# Original log line
# LOG LEVEL # DATE & TIME      # FUNCTION NAME/LINE NUMBER # LOG MESSAGE
[DEBUG] 2020/9/10 - 13:59:23 | some_function_name 166: some log message

# PARSED LOG LINE
{"log_level":"Debug","timestamp":"2020-09-10T13:59:23","function_name":"some_function_name","line_number":"166","message":"some log message"}
#+END_SRC
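
The Perl script itself isn't the interesting part, but for illustration, here's a minimal Python sketch of the same transformation. It assumes the exact log format shown above; the regex, function name, and field names are my own illustration, not the actual script:

#+BEGIN_SRC python
import json
import re
from datetime import datetime

# Matches the assumed format:
# [DEBUG] 2020/9/10 - 13:59:23 | some_function_name 166: some log message
LOG_RE = re.compile(
    r"\[(?P<level>\w+)\]\s+"
    r"(?P<date>\d{4}/\d{1,2}/\d{1,2})\s*-\s*(?P<time>\d{2}:\d{2}:\d{2})\s*\|\s*"
    r"(?P<func>\w+)\s+(?P<line>\d+):\s*(?P<msg>.*)"
)

def parse_line(line):
    """Turn one raw log line into a single-line JSON string, or None."""
    m = LOG_RE.match(line)
    if m is None:
        return None  # leave unparseable lines for some fallback path
    ts = datetime.strptime(f"{m['date']} {m['time']}", "%Y/%m/%d %H:%M:%S")
    return json.dumps({
        "log_level": m["level"].capitalize(),
        "timestamp": ts.isoformat(),
        "function_name": m["func"],
        "line_number": m["line"],
        "message": m["msg"],
    }, separators=(",", ":"))

print(parse_line("[DEBUG] 2020/9/10 - 13:59:23 | some_function_name 166: some log message"))
# -> {"log_level":"Debug","timestamp":"2020-09-10T13:59:23","function_name":"some_function_name","line_number":"166","message":"some log message"}
#+END_SRC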

After setting up this log parser and filebeat, I started shipping these logs into a hosted ElasticSearch Cloud instance. To my surprise, the JSON fields were
not indexed, meaning I couldn't perform KQL searches like =timestamp:2020-09*= to get all log lines from that month.

** The Solution
To Elastic's credit, it's actually incredibly simple to get this behavior with filebeat: all I needed to do was add the following to the =/etc/filebeat/filebeat.yml=
file under the =processors= field (this is on filebeat 7.x):

#+BEGIN_SRC yaml
processors:
 - decode_json_fields:
     fields: ["line_number","message","timestamp","function_name","log_level"]
     process_array: false
     max_depth: 1
     target: ""
     overwrite_keys: false
     add_error_key: true
#+END_SRC
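
For context, here's a sketch of how that block might sit in a complete, minimal =filebeat.yml=. The input paths and the =cloud.*= placeholders are assumptions for illustration, not my actual setup:

#+BEGIN_SRC yaml
filebeat.inputs:
  - type: log
    paths:
      - /var/log/myapp/*.log   # wherever the parsed single-line JSON logs land

processors:
  - decode_json_fields:
      fields: ["line_number","message","timestamp","function_name","log_level"]
      process_array: false
      max_depth: 1
      target: ""
      overwrite_keys: false
      add_error_key: true

# Hosted ElasticSearch Cloud credentials (placeholders)
cloud.id: "${CLOUD_ID}"
cloud.auth: "${CLOUD_AUTH}"
#+END_SRC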

The relevant documentation can be found here: https://www.elastic.co/guide/en/beats/filebeat/current/decode-json-fields.html

After creating a new index in ElasticSearch and ingesting logs to this new index, the expected KQL behavior worked.

The reason I'm writing this blog post is that it took me hours to find this documentation; there seem to be a thousand different ways to get this
functionality, each with different caveats or options depending on your use case. I may just be showing my inexperience with ElasticSearch here,
but I decided to write something brief since it took me so long to track down.

Note: This post isn't a knock against Elastic and their products. They solve a complex problem and give users a lot of options for how to manage, index, and
search their data. Given all those options, though, grokking the documentation can become time-consuming, and I wanted to offer a shortcut.