Getting GitLab CI/CD data into the Splunk platform


This article describes how to send GitLab CI/CD data from a GitLab pipeline to a Splunk platform HTTP Event Collector (HEC) endpoint.

GitLab continuous integration (CI) data can enable DevOps and DevSecOps use cases by unlocking the potential of static code and dependency scanning, secret detection, integration testing, infrastructure management, and other capabilities. More use cases include:

DevOps
- CI/CD
- Build and test software
- Software bill of materials
- Infrastructure management
DevSecOps
- Code scanning
- Secret detection

This example focuses on dependency scanning data, but the same process can be applied to any data in a GitLab pipeline that you'd like to get into the Splunk platform. This includes infrastructure management logs, CI/CD logs, and secret detection data.

Step 1: GitLab CI variables

Use GitLab CI variables to set your HEC endpoint and HEC token for sending data out of GitLab CI/CD pipelines.

Go to Settings > CI/CD > Variables.
Set the following environment variables:
- SPLUNK_HEC_ENDPOINT: http://<your hec address or ip>:<port>/services/collector/raw (your HEC endpoint URL)
- SPLUNK_HEC_TOKEN: your-splunk-hec-token-goeshere (your HEC token)
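
Before a job posts anything, it can help to confirm that both variables actually reached the job environment. A protected variable, for example, is not exposed to pipelines on unprotected branches. The following is a minimal shell sketch of such a check; the function name check_hec_vars is hypothetical and is not part of GitLab or the Splunk platform.

```shell
# Hypothetical helper: report any HEC variable that is unset or empty.
# Returns 0 when both variables are present, 1 otherwise.
check_hec_vars() {
  missing=0
  for name in SPLUNK_HEC_ENDPOINT SPLUNK_HEC_TOKEN; do
    # Look up the variable whose name is stored in $name (POSIX sh compatible).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "Missing CI/CD variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A job could call check_hec_vars before its first curl command and skip the upload when the check fails.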

Step 2: Pipeline

Use the curl command to send data from a pipeline to the Splunk platform HEC endpoint of your choice. When sending to HEC, you can also define the source and source type for this data.

The following is an example pipeline that runs dependency scanning on a Java repository. This pipeline will:

Run dependency scanning.
Make a Software Bill of Materials (SBOM) manifest.
Create a separate SBOM for Maven dependencies.

stages:
- test
- cleanup

dependency_scanning:
  stage: test
  artifacts:
    reports:
      dependency_scanning: gl-dependency-scanning-report.json
    when: always  
  after_script:
    - 'curl -vvv "${SPLUNK_HEC_ENDPOINT}?source=gitlab_cicd&sourcetype=gitlab_json" -H "Authorization: Splunk ${SPLUNK_HEC_TOKEN}" -d @gl-dependency-scanning-report.json'
    - 'curl -vvv "${SPLUNK_HEC_ENDPOINT}?source=gitlab_cicd&sourcetype=gitlab_json" -H "Authorization: Splunk ${SPLUNK_HEC_TOKEN}" -d @sbom-manifest.json'
    - 'curl -vvv "${SPLUNK_HEC_ENDPOINT}?source=gitlab_cicd&sourcetype=gitlab_json" -H "Authorization: Splunk ${SPLUNK_HEC_TOKEN}"  -d @app/gl-sbom-maven-maven.cdx.json'
    
include:
- template: Jobs/Dependency-Scanning.gitlab-ci.yml
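
The three curl calls in this pipeline differ only in the file they send. As a sketch, the repeated options could be factored into a small shell helper. The function names hec_url and send_to_hec are hypothetical. The sketch also makes two assumptions that differ from the pipeline: it uses --data-binary instead of -d so the file is posted unmodified (-d strips newlines when reading from a file), and it skips files that the job did not produce.

```shell
# Hypothetical helper: build the HEC URL with source and sourcetype
# query parameters. Defaults match the values used in the pipeline.
hec_url() {
  printf '%s?source=%s&sourcetype=%s' \
    "$SPLUNK_HEC_ENDPOINT" "${1:-gitlab_cicd}" "${2:-gitlab_json}"
}

# Hypothetical helper: post one file to HEC.
# --fail makes curl exit non-zero on an HTTP error response.
send_to_hec() {
  file=$1
  if [ ! -f "$file" ]; then
    echo "Skipping $file: not found" >&2
    return 0
  fi
  curl --silent --show-error --fail \
    -H "Authorization: Splunk ${SPLUNK_HEC_TOKEN}" \
    --data-binary "@${file}" \
    "$(hec_url)"
}
```

With these functions available to the job, for example from a script file in the repository, each upload becomes a single call such as send_to_hec gl-dependency-scanning-report.json.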

Step 3: Source type settings

Update your source type gitlab_json to stop events from being truncated. Choose a TRUNCATE value that fits your needs.

[gitlab_json]
DATETIME_CONFIG =
INDEXED_EXTRACTIONS = json
LINE_BREAKER = ([\r\n]+)
NO_BINARY_CHECK = true
category = Custom
pulldown_type = 1
TRUNCATE = 5550000
disabled = false

Each run of the pipeline above creates three events in the Splunk platform: one for the CycloneDX SBOM (CycloneDX is one of the common Software Bill of Materials (SBOM) formats), one for the Maven SBOM, and one for the dependency scanning report.

Step 4: Advanced configuration using jq

Now that data is coming in from GitLab CI/CD, you might still need to add some important context, such as:

Repository Name
GitLab Project Namespace
Pipeline ID
Build ID

This information is available in each pipeline's predefined environment variables. Use jq to add this information to your JSON files before sending them to the Splunk platform.

jq '.  += {"pipeline_id":"'"$CI_PIPELINE_ID"'"} | .  += {"build_id":"'"$CI_CONCURRENT_ID"'"} | .  += {"repository_organization":"'"$CI_PROJECT_NAMESPACE"'"} | .  += {"repository_name":"'"$CI_PROJECT_NAME"'"}' -c input-file.json > output-file.json
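
The command above splices the shell variables directly into the jq program text, which breaks if a value contains a double quote. As an alternative sketch, jq's --arg option passes each value in as a properly escaped string. The function name enrich_report and the jq variable names are hypothetical; the CI variables are the same ones used above.

```shell
# Hypothetical helper: add pipeline context to a JSON report.
# Usage: enrich_report input-file.json output-file.json
enrich_report() {
  jq -c \
    --arg pipeline_id "${CI_PIPELINE_ID:-}" \
    --arg build_id "${CI_CONCURRENT_ID:-}" \
    --arg org "${CI_PROJECT_NAMESPACE:-}" \
    --arg repo "${CI_PROJECT_NAME:-}" \
    '. + {
       pipeline_id: $pipeline_id,
       build_id: $build_id,
       repository_organization: $org,
       repository_name: $repo
     }' \
    "$1" > "$2"
}
```

The output keeps every field of the input object and adds the four context fields, written as one compact line because of -c.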

Your example configuration now looks like:

stages:
- test
- cleanup

dependency_scanning:
  stage: test
  artifacts:
    reports:
      dependency_scanning: gl-dependency-scanning-report.json
    when: always  
  after_script:
    - apk add --update curl jq && rm -rf /var/cache/apk/*
    - jq '.  += {"pipeline_id":"'"$CI_PIPELINE_ID"'"} | .  += {"build_id":"'"$CI_CONCURRENT_ID"'"} | .  += {"repository_organization":"'"$CI_PROJECT_NAMESPACE"'"} | .  += {"repository_name":"'"$CI_PROJECT_NAME"'"}' -c gl-dependency-scanning-report.json > dependency-scanning-report.json
    - 'curl -vvv "${SPLUNK_HEC_ENDPOINT}?source=gitlab_cicd&sourcetype=gitlab_json" -H "Authorization: Splunk ${SPLUNK_HEC_TOKEN}" -d @dependency-scanning-report.json'
    - jq '.  += {"pipeline_id":"'"$CI_PIPELINE_ID"'"} | .  += {"build_id":"'"$CI_CONCURRENT_ID"'"} | .  += {"repository_organization":"'"$CI_PROJECT_NAMESPACE"'"} | .  += {"repository_name":"'"$CI_PROJECT_NAME"'"}' -c sbom-manifest.json > sbom-manifest-enriched.json
    - 'curl -vvv "${SPLUNK_HEC_ENDPOINT}?source=gitlab_cicd&sourcetype=gitlab_json" -H "Authorization: Splunk ${SPLUNK_HEC_TOKEN}" -d @sbom-manifest-enriched.json'
    - jq '.  += {"pipeline_id":"'"$CI_PIPELINE_ID"'"} | .  += {"build_id":"'"$CI_CONCURRENT_ID"'"} | .  += {"repository_organization":"'"$CI_PROJECT_NAMESPACE"'"} | .  += {"repository_name":"'"$CI_PROJECT_NAME"'"}' -c app/gl-sbom-maven-maven.cdx.json > app/sbom-maven-maven.cdx.json
    - 'curl -vvv "${SPLUNK_HEC_ENDPOINT}?source=gitlab_cicd&sourcetype=gitlab_json" -H "Authorization: Splunk ${SPLUNK_HEC_TOKEN}"  -d @app/sbom-maven-maven.cdx.json'

include:
- template: Jobs/Dependency-Scanning.gitlab-ci.yml

You should now see entries for build_id, pipeline_id, repository_name, and repository_organization on your CI/CD events.

Extra Credit with jq

You can also use jq to reduce the amount of parsing work required in the Splunk platform for certain events. In the case of dependency scanning, vulnerabilities are returned in a complex JSON object, and you need commands like spath and mvexpand to work with them as individual reported vulnerabilities. The example jq command below splits the dependency scanning report into individual events, one for each vulnerability (based on .vulnerabilities[]), ready to be sent to the Splunk platform HEC:

jq '{ version: .version, build_id: .build_id, pipeline_id: .pipeline_id, repository_organization: .repository_organization, repository_name: .repository_name, dependency_files: .dependency_files[], scan: .scan, vulnerability: .vulnerabilities[]}' -c dependency-scanning-report.json > gl-dependency-scanning-report.json
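
One thing to be aware of: because .dependency_files[] and .vulnerabilities[] are both iterated inside the same object construction, jq emits one object for every combination of dependency file and vulnerability. If you want exactly one event per vulnerability, a variation like the following sketch keeps dependency_files as an array on each event. The function name split_vulnerabilities is hypothetical, and the input is assumed to be a report already enriched with the context fields from Step 4.

```shell
# Hypothetical helper: write one compact JSON object per vulnerability.
# Usage: split_vulnerabilities enriched-report.json split-report.json
split_vulnerabilities() {
  jq -c '
    . as $report
    | $report.vulnerabilities[]
    | {
        version: $report.version,
        build_id: $report.build_id,
        pipeline_id: $report.pipeline_id,
        repository_organization: $report.repository_organization,
        repository_name: $report.repository_name,
        dependency_files: $report.dependency_files,
        scan: $report.scan,
        vulnerability: .
      }
  ' "$1" > "$2"
}
```

The resulting file holds one JSON object per line. When posting it with curl, --data-binary @file preserves those line breaks, whereas -d @file strips newlines from the file before sending.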
