Getting GitLab CI/CD data into the Splunk platform
This article describes how to send GitLab CI/CD data out of a GitLab pipeline into a Splunk platform HTTP Event Collector (HEC) endpoint.
GitLab continuous integration (CI) data can enable DevOps and DevSecOps use cases by unlocking the potential of static code and dependency scanning, secret detection, integration testing, infrastructure management, and other capabilities. More use cases include:
- DevOps
  - CI/CD
  - Build and test software
  - Software bill of materials
  - Infrastructure management
- DevSecOps
  - Code scanning
  - Secret detection
This example focuses on dependency scanning data, but the same process can be applied to any data in a GitLab pipeline that you'd like to get into the Splunk platform. This includes infrastructure management logs, CI/CD logs, and secret detection data.
Step 1: GitLab CI variables
Use GitLab CI variables to set your HEC endpoint and HEC token for sending data out of GitLab CI/CD pipelines.
- Go to Settings > CI/CD > Variables.
- Set the following environment variables:
  - SPLUNK_HEC_ENDPOINT: http://<your hec address or ip>:<port>/services/collector/raw (your HEC address or IP)
  - SPLUNK_HEC_TOKEN: your-splunk-hec-token-goeshere (your HEC token)
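Because every later step depends on these two variables, it can help to fail early when one of them is missing. The following is a minimal sketch, not part of the original pipeline. The function name require_hec_vars is hypothetical, and it assumes a POSIX shell in the job image.

```shell
# Hypothetical helper (not part of GitLab or Splunk): returns non-zero
# when either HEC variable from Step 1 is unset or empty.
require_hec_vars() {
  missing=0
  for name in SPLUNK_HEC_ENDPOINT SPLUNK_HEC_TOKEN; do
    # Indirect variable lookup that works in plain POSIX sh
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing CI/CD variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Calling require_hec_vars at the start of a script section would stop the section before any curl command runs with an empty endpoint or token.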
Step 2: Pipeline
Use the curl command to send data from a pipeline to the Splunk platform HEC endpoint of your choice. While sending to the HEC, you can also define the source and source type for this data.
The following is an example pipeline that runs dependency scanning on a Java repository. This pipeline will:
- Run dependency scanning.
- Make a Software Bill of Materials (SBOM) manifest.
- Create a separate SBOM for Maven dependencies.
stages:
  - test
  - cleanup

dependency_scanning:
  stage: test
  artifacts:
    reports:
      dependency_scanning: gl-dependency-scanning-report.json
    when: always
  after_script:
    - 'curl -vvv "${SPLUNK_HEC_ENDPOINT}?source=gitlab_cicd&sourcetype=gitlab_json" -H "Authorization: Splunk ${SPLUNK_HEC_TOKEN}" -d @gl-dependency-scanning-report.json'
    - 'curl -vvv "${SPLUNK_HEC_ENDPOINT}?source=gitlab_cicd&sourcetype=gitlab_json" -H "Authorization: Splunk ${SPLUNK_HEC_TOKEN}" -d @sbom-manifest.json'
    - 'curl -vvv "${SPLUNK_HEC_ENDPOINT}?source=gitlab_cicd&sourcetype=gitlab_json" -H "Authorization: Splunk ${SPLUNK_HEC_TOKEN}" -d @app/gl-sbom-maven-maven.cdx.json'

include:
  - template: Jobs/Dependency-Scanning.gitlab-ci.yml
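The three curl commands differ only in the file they send, so the repeated parts can be factored into small shell functions. This is a sketch under stated assumptions: the names hec_url and send_to_hec are hypothetical, and the choice of --fail and --data-binary is a variation on the commands above, not the article's original method. With --fail, curl exits non-zero on an HTTP error response, and --data-binary sends the file without stripping newlines the way -d does.

```shell
# Hypothetical helpers; neither name comes from GitLab or Splunk.

# Build the HEC URL with the source and source type as query parameters.
# $1 = source, $2 = sourcetype
hec_url() {
  printf '%s?source=%s&sourcetype=%s\n' "$SPLUNK_HEC_ENDPOINT" "$1" "$2"
}

# Send one JSON file to the HEC raw endpoint.
# $1 = path to the file
send_to_hec() {
  curl --silent --show-error --fail \
    "$(hec_url gitlab_cicd gitlab_json)" \
    -H "Authorization: Splunk ${SPLUNK_HEC_TOKEN}" \
    --data-binary "@$1"
}
```

With these defined, sending a report becomes send_to_hec gl-dependency-scanning-report.json, and a rejected token or unreachable endpoint surfaces as a failed command instead of passing silently.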
Step 3: Source type settings
Update your source type gitlab_json to stop events from being truncated. Choose a value for TRUNCATE that fits your needs.
[gitlab_json]
DATETIME_CONFIG =
INDEXED_EXTRACTIONS = json
LINE_BREAKER = ([\r\n]+)
NO_BINARY_CHECK = true
category = Custom
pulldown_type = 1
TRUNCATE = 5550000
disabled = false
Each run of the pipeline above creates three events in the Splunk platform: one for the CycloneDX SBOM (CycloneDX is one of the common Software Bill of Materials (SBOM) formats), one for the Maven SBOM, and one for the dependency scanning report.
Step 4: Advanced configuration using jq
Now that you have data coming in from GitLab CI/CD, you might still need to add some important context such as:
- Repository Name
- GitLab Project Namespace
- Pipeline ID
- Build ID
This information is available in each pipeline's predefined environment variables. Use jq to add this information to your JSON files before sending them to the Splunk platform.
jq '. += {"pipeline_id":"'"$CI_PIPELINE_ID"'"} | . += {"build_id":"'"$CI_CONCURRENT_ID"'"} | . += {"repository_organization":"'"$CI_PROJECT_NAMESPACE"'"} | . += {"repository_name":"'"$CI_PROJECT_NAME"'"}' -c input-file.json > output-file.json
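The command above splices shell variables into the jq program through nested quoting, which is easy to break when editing. As an alternative sketch, jq's --arg option passes each value in as a jq string variable, so the filter itself stays in plain single quotes. The wrapper name enrich_report is hypothetical. The field names and CI variables are the same as in the command above.

```shell
# Hypothetical wrapper around jq; adds pipeline context to a JSON report.
# $1 = input JSON file, $2 = output JSON file
enrich_report() {
  # --arg binds each value as a jq string variable, so no shell
  # quoting is needed inside the filter.
  jq -c \
    --arg pipeline_id "${CI_PIPELINE_ID:-}" \
    --arg build_id "${CI_CONCURRENT_ID:-}" \
    --arg repository_organization "${CI_PROJECT_NAMESPACE:-}" \
    --arg repository_name "${CI_PROJECT_NAME:-}" \
    '. + {
       pipeline_id: $pipeline_id,
       build_id: $build_id,
       repository_organization: $repository_organization,
       repository_name: $repository_name
     }' "$1" > "$2"
}
```

For example, enrich_report gl-dependency-scanning-report.json dependency-scanning-report.json should produce the same four added fields as the original command. If the job's own ID is wanted in build_id, GitLab also provides the predefined variable CI_JOB_ID. Whether to use it in place of CI_CONCURRENT_ID depends on what you want to search on.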
Your example configuration now looks like:
stages:
  - test
  - cleanup

dependency_scanning:
  stage: test
  artifacts:
    reports:
      dependency_scanning: gl-dependency-scanning-report.json
    when: always
  after_script:
    - apk add --update curl jq && rm -rf /var/cache/apk/*
    - jq '. += {"pipeline_id":"'"$CI_PIPELINE_ID"'"} | . += {"build_id":"'"$CI_CONCURRENT_ID"'"} | . += {"repository_organization":"'"$CI_PROJECT_NAMESPACE"'"} | . += {"repository_name":"'"$CI_PROJECT_NAME"'"}' -c gl-dependency-scanning-report.json > dependency-scanning-report.json
    - 'curl -vvv "${SPLUNK_HEC_ENDPOINT}?source=gitlab_cicd&sourcetype=gitlab_json" -H "Authorization: Splunk ${SPLUNK_HEC_TOKEN}" -d @dependency-scanning-report.json'
    - jq '. += {"pipeline_id":"'"$CI_PIPELINE_ID"'"} | . += {"build_id":"'"$CI_CONCURRENT_ID"'"} | . += {"repository_organization":"'"$CI_PROJECT_NAMESPACE"'"} | . += {"repository_name":"'"$CI_PROJECT_NAME"'"}' -c sbom-manifest.json > sbom-manifest-enriched.json
    - 'curl -vvv "${SPLUNK_HEC_ENDPOINT}?source=gitlab_cicd&sourcetype=gitlab_json" -H "Authorization: Splunk ${SPLUNK_HEC_TOKEN}" -d @sbom-manifest-enriched.json'
    - jq '. += {"pipeline_id":"'"$CI_PIPELINE_ID"'"} | . += {"build_id":"'"$CI_CONCURRENT_ID"'"} | . += {"repository_organization":"'"$CI_PROJECT_NAMESPACE"'"} | . += {"repository_name":"'"$CI_PROJECT_NAME"'"}' -c app/gl-sbom-maven-maven.cdx.json > app/sbom-maven-maven.cdx.json
    - 'curl -vvv "${SPLUNK_HEC_ENDPOINT}?source=gitlab_cicd&sourcetype=gitlab_json" -H "Authorization: Splunk ${SPLUNK_HEC_TOKEN}" -d @app/sbom-maven-maven.cdx.json'

include:
  - template: Jobs/Dependency-Scanning.gitlab-ci.yml
You should now see entries for build_id, pipeline_id, repository_name, and repository_organization on your CI/CD events.
Extra Credit with jq
Further uses of jq can minimize the amount of work required in the Splunk platform to parse certain events. In the case of dependency scanning, vulnerabilities are returned in a complex JSON object that requires commands like spath and mvexpand before the individual reported vulnerabilities can be used. The example jq command below splits that dependency log into individual events for each vulnerability (based on .vulnerabilities[]) before sending each of those events to the Splunk platform HEC:
jq '{ version: .version, build_id: .build_id, pipeline_id: .pipeline_id, repository_organization: .repository_organization, repository_name: .repository_name, dependency_files: .dependency_files[], scan: .scan, vulnerability: .vulnerabilities[]}' -c dependency-scanning-report.json > gl-dependency-scanning-report.json
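One detail of jq is worth keeping in mind here: when an object construction contains two iterators, such as .dependency_files[] and .vulnerabilities[], jq emits one object for every combination of the two. A report with two dependency files and three vulnerabilities therefore produces six events from the command above. If exactly one event per vulnerability is preferred, a variant like the following sketch iterates only over .vulnerabilities[] and leaves dependency_files out. The function name split_vulnerabilities is hypothetical, and dropping dependency_files is an assumption about what you need in each event.

```shell
# Hypothetical variant: one compact JSON object per vulnerability,
# each carrying the shared report and pipeline context.
# $1 = enriched dependency scanning report, $2 = output file
split_vulnerabilities() {
  jq -c '. as $report
    | .vulnerabilities[]
    | {
        version: $report.version,
        build_id: $report.build_id,
        pipeline_id: $report.pipeline_id,
        repository_organization: $report.repository_organization,
        repository_name: $report.repository_name,
        scan: $report.scan,
        vulnerability: .
      }' "$1" > "$2"
}
```

Because -c writes each object on its own line, the LINE_BREAKER setting from Step 3 should break the posted file into separate events, provided the newlines reach the HEC intact (for example, by posting with curl's --data-binary option).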