Extract and parse log fields
Log data can come from many sources: kernel messages, system logs, application logs, or standard output from containers. Logs are not always emitted in the format you need, so it's important to be able to parse and extract fields from them as needed.
In this guide we'll demonstrate a few patterns for parsing and extracting fields from log data.
Extracting text with regular expressions
Say you have a structured log event that looks like the following, with a top-level url field. You would like to filter and group these logs based on the incoming UTM source, but that's difficult because the UTM source is not a top-level field.
Input:
method: "GET"
url: "https://hackers.example.com/?utm_source=hackernewsletter&utm_medium=email&utm_term=code"
response_size: 9242
Using the Extract function, we can pull out this utm_source value and populate it as a top-level field in our structured log message.
Add an Extract function to your stream and set the following:
- Field Name: url
- Expression: goregex:[?&]utm_source=(?P<utm_source>[^&]+)
- Destination Field: leave blank
The Go regular expression matches the value of the utm_source URL parameter and records it in the named capture group utm_source. The Extract function then sets a field whose key matches the name of the capture group, utm_source. By leaving the Destination Field blank, we tell it to set this field at the same depth in the event as the url field.
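To see what the expression does on its own, here is a minimal Go sketch using the standard regexp package, which accepts the same syntax as goregex. The program and variable names are ours for illustration; in the pipeline, the Extract function performs this step for you.

```go
package main

import (
	"fmt"
	"regexp"
)

func main() {
	url := "https://hackers.example.com/?utm_source=hackernewsletter&utm_medium=email&utm_term=code"

	// Same expression as in the Extract configuration above.
	re := regexp.MustCompile(`[?&]utm_source=(?P<utm_source>[^&]+)`)

	match := re.FindStringSubmatch(url)
	if match == nil {
		fmt.Println("no utm_source parameter found")
		return
	}

	// Look up the captured value by the group's name, which is the key
	// the Extract function uses for the new field.
	idx := re.SubexpIndex("utm_source")
	fmt.Println(match[idx]) // hackernewsletter
}
```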
Output:
method: "GET"
url: "https://hackers.example.com/?utm_source=hackernewsletter&utm_medium=email&utm_term=code"
response_size: 9242
utm_source: "hackernewsletter"
You can now perform further filtering or downstream grouping using this new top-level field.
Parsing a JSON-encoded field
When you combine structured application logging with a log shipper, you can end up with fields that contain encoded JSON. Take the following example, where the application emitted a structured JSON log but the log shipper double-encoded the event as a JSON string. To pull the values out of the application log, we must parse the enclosed JSON string.
Input:
host: "10.3.4.8"
region: "us-east-1"
msg: "{\"level\": \"error\", \"message\": \"Failed to connect to MySQL\", \"timestamp\":\"2024-08-20 17:30:01+00:00\"}"
We can use the Parse function in our stream to parse the JSON enclosed in msg with the following configuration:
- Field Name: msg
- Format: JSON
- Destination: msg
- Overwrite destination if data already exists: enabled
The Parse function replaces the existing field's string value with the fields parsed from it, so we get the following.
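The transformation is equivalent to unmarshalling the embedded JSON string and replacing the original value with the result. Here is a sketch of that in plain Go, using the example event above; the map and variable names are illustrative, not part of the pipeline.

```go
package main

import (
	"encoding/json"
	"fmt"
)

func main() {
	// The incoming event, with msg holding a JSON-encoded string.
	event := map[string]any{
		"host":   "10.3.4.8",
		"region": "us-east-1",
		"msg":    `{"level": "error", "message": "Failed to connect to MySQL", "timestamp": "2024-08-20 17:30:01+00:00"}`,
	}

	// Parse the JSON string held in msg...
	var parsed map[string]any
	if err := json.Unmarshal([]byte(event["msg"].(string)), &parsed); err != nil {
		fmt.Println("msg is not valid JSON:", err)
		return
	}

	// ...and overwrite the original string with the parsed fields,
	// which is what enabling "Overwrite destination" does.
	event["msg"] = parsed

	out, _ := json.MarshalIndent(event, "", "  ")
	fmt.Println(string(out))
}
```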
Output:
host: "10.3.4.8"
region: "us-east-1"
msg:
message: "Failed to connect to MySQL"
level: "error"
timestamp: "2024-08-20 17:30:01+00:00"