AWS Cloudtrail

Ingest AWS Cloudtrail events to Streamfold.

The AWS Cloudtrail source continuously pulls Cloudtrail log files from a configured S3 bucket into Streamfold. Each file is read, and its individual events are published to any stream connected to the source. Connecting Cloudtrail to Streamfold is an easy way to reduce the verbosity of Cloudtrail events so that you can focus on what matters.


Overview

Adding an AWS Cloudtrail source to Streamfold requires specifying the AWS S3 bucket that Cloudtrail events are written to and an SQS queue that is configured to receive event notifications from that bucket. Streamfold listens for messages on the SQS queue; a new message arrives whenever a new Cloudtrail file has been written to S3. For each notification, Streamfold reads the file from S3, extracts the individual Cloudtrail events, and emits each one as a new Streamfold event. After successfully reading the file from S3, it deletes the received notification from SQS.
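
The parsing steps in this flow can be sketched in Python as follows. This is a minimal illustration, not Streamfold's implementation: the function names are hypothetical, and it assumes the standard S3 event notification layout (a "Records" array with URL-encoded object keys) and the standard Cloudtrail log file layout (gzip-compressed JSON with a top-level "Records" array). Receiving the SQS message and downloading the object are left out.

```python
import gzip
import json
from urllib.parse import unquote_plus


def s3_objects_from_notification(message_body: str) -> list[tuple[str, str]]:
    """Return the (bucket, key) pairs named in an S3 event notification body."""
    notification = json.loads(message_body)
    objects = []
    # S3 test notifications ("s3:TestEvent") carry no "Records" key.
    for record in notification.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys are URL-encoded inside S3 notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        objects.append((bucket, key))
    return objects


def extract_cloudtrail_events(file_bytes: bytes) -> list[dict]:
    """Split one gzip-compressed Cloudtrail log file into individual events."""
    document = json.loads(gzip.decompress(file_bytes))
    return document.get("Records", [])
```

Each dictionary returned by extract_cloudtrail_events corresponds to one event that would be emitted downstream.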

Source Mode

The AWS Cloudtrail source is a pull-based source. Streamfold will connect to and pull data from the source when it is available, instead of a user pushing data to the Streamfold ingress API.

The AWS Cloudtrail source can be in one of three modes:

  • Paused: No data is pulled from the source and no events are emitted.
  • Test: Data is pulled from the source, but the notification message is not deleted from the SQS queue. Once the visibility timeout has been reached, the notification messages will be visible on the SQS queue again.
  • Active: Data is pulled from the source and when successfully processed, the notification message is deleted from the SQS queue.
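
The difference between the three modes comes down to two decisions: whether data is pulled, and whether the SQS notification is deleted afterwards. A minimal Python sketch of that decision table follows; the SourceMode enum and both helper functions are hypothetical names used only for illustration.

```python
from enum import Enum


class SourceMode(Enum):
    PAUSED = "paused"
    TEST = "test"
    ACTIVE = "active"


def pulls_data(mode: SourceMode) -> bool:
    """Data is pulled in test and active mode, never while paused."""
    return mode in (SourceMode.TEST, SourceMode.ACTIVE)


def deletes_notification(mode: SourceMode, processed_ok: bool) -> bool:
    """Only active mode deletes the SQS message, and only after success."""
    return mode is SourceMode.ACTIVE and processed_ok
```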

Test mode is useful when setting up a new Cloudtrail source for the first time. You can configure a stream and test it with live data without worrying about losing data, because the notifications stay on the queue. However, you may see duplicated events if the visibility timeout expires and messages are redelivered.
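
If duplicates from redelivered notifications matter to a downstream consumer, one option is to filter on the eventID field that each Cloudtrail event carries. The Python sketch below is an assumption about how a consumer might do this, not a Streamfold feature; deduplicate_events is a hypothetical helper, and it keeps every seen ID in memory, so it only suits short test runs.

```python
from typing import Iterable, Iterator


def deduplicate_events(events: Iterable[dict]) -> Iterator[dict]:
    """Yield each event once, using eventID to detect redelivered copies."""
    seen: set[str] = set()
    for event in events:
        event_id = event.get("eventID")
        if event_id is None:
            # Without an ID there is nothing to compare on, so pass it through.
            yield event
            continue
        if event_id in seen:
            continue
        seen.add(event_id)
        yield event
```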

You can sample events from the source when it is in either test or active mode. Any time the mode is changed, a new deployment is made to the data plane.

Configuration

To configure a Cloudtrail source you'll need to have AWS Cloudtrail enabled in your AWS account. Cloudtrail will periodically write trail files out to an S3 bucket in the same account.

Streamfold detects new files in the S3 bucket by listening for notifications on an SQS queue. You'll need to create an SQS queue for those notifications and configure your S3 bucket to send event notifications to that queue.
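
As a rough guide to the AWS side of this setup, the Python sketch below builds the two documents usually involved: an SQS queue policy that lets S3 send messages to the queue, and a bucket notification configuration that publishes object-created events to it. The function names are hypothetical, and the choice of the s3:ObjectCreated:* event type is an assumption; adapt both to your account.

```python
def queue_policy(bucket_name: str, queue_arn: str) -> dict:
    """Queue policy allowing S3 to deliver notifications from one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "s3.amazonaws.com"},
                "Action": "sqs:SendMessage",
                "Resource": queue_arn,
                # Restrict senders to the bucket that holds the trail files.
                "Condition": {
                    "ArnLike": {"aws:SourceArn": f"arn:aws:s3:::{bucket_name}"}
                },
            }
        ],
    }


def bucket_notification_configuration(queue_arn: str) -> dict:
    """Notification configuration sending object-created events to the queue."""
    return {
        "QueueConfigurations": [
            {"QueueArn": queue_arn, "Events": ["s3:ObjectCreated:*"]}
        ]
    }
```

With boto3, the first document would typically be serialized with json.dumps and passed as the "Policy" attribute to the SQS client's set_queue_attributes call, and the second passed as NotificationConfiguration to the S3 client's put_bucket_notification_configuration call. The queue policy has to be in place first, because S3 validates that it can deliver to the destination.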

Once you have that set up, you'll need to specify the following values when creating the source in Streamfold:

  • Bucket name
  • SQS ARN
  • Region
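
Before entering these values, it can help to check that they agree with each other, since an SQS ARN already encodes the region and account ID in the form arn:partition:sqs:region:account-id:queue-name. The Python sketch below shows one way to do that; SqsArn and parse_sqs_arn are hypothetical helpers and not part of Streamfold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SqsArn:
    partition: str
    region: str
    account_id: str
    queue_name: str


def parse_sqs_arn(arn: str) -> SqsArn:
    """Split an SQS ARN into its parts, raising ValueError if malformed."""
    parts = arn.split(":")
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "sqs":
        raise ValueError(f"not an SQS ARN: {arn!r}")
    _, partition, _, region, account_id, queue_name = parts
    # AWS account IDs are 12-digit numbers.
    if not region or not queue_name or not (account_id.isdigit() and len(account_id) == 12):
        raise ValueError(f"incomplete SQS ARN: {arn!r}")
    return SqsArn(partition, region, account_id, queue_name)
```

For example, the region attribute of the parsed ARN can be compared against the Region value you plan to enter.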

After you enter those values in the UI, you will be presented with an AWS IAM policy document. Follow the instructions to create the policy and an IAM role that carries its permissions, so that Streamfold can use the role to access the bucket and queue. After you've created the role, you'll need to enter:

  • ID of the AWS account hosting the bucket and queue
  • Name of the IAM role
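
The policy document shown in the UI is the authoritative one. For orientation only, the Python sketch below builds the kind of permissions that the pull flow described on this page would need: reading objects from the bucket, and receiving and deleting messages on the queue. The function name and the exact set of actions are assumptions; Streamfold's generated policy may differ.

```python
def pull_permissions_policy(bucket_name: str, queue_arn: str) -> dict:
    """Illustrative IAM policy for reading trail files and consuming the queue."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Read the Cloudtrail files referenced by notifications.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            },
            {
                # Receive notifications and delete them once processed.
                "Effect": "Allow",
                "Action": ["sqs:ReceiveMessage", "sqs:DeleteMessage"],
                "Resource": queue_arn,
            },
        ],
    }
```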

Examples

The following is an example of a Cloudtrail event. The eventName field identifies the operation type. You can find more examples in the AWS docs.

{
    "eventVersion": "1.08",
    "userIdentity": {
        "type": "IAMUser",
        "principalId": "AIDA6ON6E4XEGIEXAMPLE",
        "arn": "arn:aws:iam::444455556666:user/Arnav",
        "accountId": "444455556666",
        "accessKeyId": "AKIAI44QH8DHBEXAMPLE",
        "userName": "Arnav",
        "sessionContext": {
            "sessionIssuer": {},
            "webIdFederationData": {},
            "attributes": {
                "creationDate": "2023-07-19T21:11:57Z",
                "mfaAuthenticated": "false"
            }
        }
    },
    "eventTime": "2023-07-19T21:19:22Z",
    "eventSource": "ec2.amazonaws.com",
    "eventName": "CreateKeyPair",
    "awsRegion": "us-east-1",
    "sourceIPAddress": "192.0.2.0",
    "userAgent": "aws-cli/2.13.5 Python/3.11.4 Linux/4.14.255-314-253.539.amzn2.x86_64 exec-env/CloudShell exe/x86_64.amzn.2 prompt/off command/ec2.create-key-pair",
    "requestParameters": {
        "keyName": "my-key",
        "keyType": "rsa",
        "keyFormat": "pem"
    },
    "responseElements": {
        "requestId": "9aa4938f-720f-4f4b-9637-EXAMPLE9a196",
        "keyName": "my-key",
        "keyFingerprint": "1f:51:ae:28:bf:89:e9:d8:1f:25:5d:37:2d:7d:b8:ca:9f:f5:f1:6f",
        "keyPairId": "key-abcd12345eEXAMPLE",
        "keyMaterial": "<sensitiveDataRemoved>"
    },
    "requestID": "9aa4938f-720f-4f4b-9637-EXAMPLE9a196",
    "eventID": "2ae450ff-e72b-4de1-87b0-EXAMPLE5227cb",
    "readOnly": false,
    "eventType": "AwsApiCall",
    "managementEvent": true,
    "recipientAccountId": "444455556666",
    "eventCategory": "Management",
    "tlsDetails": {
        "tlsVersion": "TLSv1.2",
        "cipherSuite": "ECDHE-RSA-AES128-GCM-SHA256",
        "clientProvidedHostHeader": "ec2.us-east-1.amazonaws.com"
    },
    "sessionCredentialFromConsole": "true"
}
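
To illustrate how verbose an event like this is compared with the handful of fields most often needed, the Python sketch below reduces an event to a compact summary. The summarize_event helper and the choice of fields are assumptions made for illustration, not a Streamfold function.

```python
def summarize_event(event: dict) -> dict:
    """Keep only the who, what, when, and where of a Cloudtrail event."""
    identity = event.get("userIdentity", {})
    return {
        "eventTime": event.get("eventTime"),
        "eventSource": event.get("eventSource"),
        "eventName": event.get("eventName"),
        "awsRegion": event.get("awsRegion"),
        "actor": identity.get("arn"),
        "readOnly": event.get("readOnly"),
    }
```

Applied to the example above, the summary would name the CreateKeyPair call on ec2.amazonaws.com made by arn:aws:iam::444455556666:user/Arnav.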

Cloudtrail can be very noisy: trails often contain a large number of events from read-only operations such as Describe* and List* calls. The following example Drop function configuration filters out those events so that you can focus on what matters.

Cloudtrail drop Describe and List calls
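
The matching logic behind such a filter can be sketched in Python as a simple prefix test on eventName. This only illustrates the predicate and is not the syntax of a Streamfold Drop function; should_drop and READ_ONLY_PREFIXES are hypothetical names.

```python
READ_ONLY_PREFIXES = ("Describe", "List")


def should_drop(event: dict) -> bool:
    """Return True for events whose eventName starts with Describe or List."""
    name = event.get("eventName")
    # str.startswith accepts a tuple and matches any of its prefixes.
    return isinstance(name, str) and name.startswith(READ_ONLY_PREFIXES)
```

With this predicate, the CreateKeyPair event shown earlier is kept, while calls such as DescribeInstances or ListBuckets are dropped.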
