Extract and parse log fields

Log data can come from any number of sources: kernel messages, system logs, application logs, or standard output from containers. Logs aren't always emitted in the format you need, so it's critical to be able to parse and extract fields from them as needed.

In this guide we'll demonstrate a few patterns for parsing and extracting fields from log data.

Extracting text with regular expressions

Say you had a structured log event that looked like the following, with a top-level url field. You would like to filter and group these logs based on the incoming UTM source, but that's difficult because the UTM source is not a top-level field.

Input:

method: "GET"
url: "https://hackers.example.com/?utm_source=hackernewsletter&utm_medium=email&utm_term=code"
response_size: 9242

Using the Extract function, we can pull out the utm_source value and populate it as a top-level field in our structured log message.

Add an Extract function to your stream and set the following:

  • Field Name: url
  • Expression: goregex: [?&]utm_source=(?P<utm_source>[^&]+)
  • Destination Field: leave blank

The Go regular expression here matches the value of the utm_source URL parameter and records it in the named capture group utm_source. The Extract function sets a field whose key matches the named capture group, utm_source. By leaving the destination field empty, we tell it to set this field at the same depth in the event as the url field.

Output:

method: "GET"
url: "https://hackers.example.com/?utm_source=hackernewsletter&utm_medium=email&utm_term=code"
response_size: 9242
utm_source: "hackernewsletter"

You can now perform further filtering or downstream grouping using this new top-level field.

Parsing a JSON encoded field

When you combine structured application logging with a log shipper you can end up with fields that contain encoded JSON. Take the following example where the application emitted a structured JSON log, but the log shipper double-encoded the event as a JSON string. To pull out the values from the application log, we must parse the enclosed JSON string.

Input:

host: "10.3.4.8"
region: "us-east-1"
msg: "{\"level\": \"error\", \"message\": \"Failed to connect to MySQL\", \"timestamp\":\"2024-08-20 17:30:01+00:00\"}"

We can use the Parse function in our stream to parse the enclosed JSON in msg with the following configuration:

  • Field name: msg
  • Format: JSON
  • Destination: msg
  • Overwrite destination if data already exists: enabled

Because overwrite is enabled, Parse replaces the existing string field with the parsed object, so we get the following.

Output:

host: "10.3.4.8"
region: "us-east-1"
msg:
  message: "Failed to connect to MySQL"
  level: "error"
  timestamp: "2024-08-20 17:30:01+00:00"