Ingesting Google Cloud data into Splunk using command line programs
If you are ingesting Google Cloud data into your environment, you're likely familiar with the Splunk Add-on for Google Cloud Platform and the Splunk Dataflow template. Both of these solutions are great for moving logs via Pub/Sub into Splunk. But there are ways of extracting other non-logging Google Cloud data and quickly shipping it to the Splunk platform using common command-line tools. You can then leverage this data to gain useful insights about your Google Cloud environment.
This article follows the Unix philosophy of "do one thing and do it well" by showing you how to use small single-purpose tools, then how to combine them to accomplish more complex tasks and gain useful insights about your Google Cloud environment.
First, review the prerequisites and the tools you'll need. Then, learn the approach used to retrieve a list of assets in a specified GCP project or organization. You can then use this same approach to:
- Export Google's IAM Recommender findings and send them to a Splunk HEC
- List all the static and reserved IP addresses in a project
- List all the SSL/TLS certificates that have been created for use with Google Cloud Load Balancer and Cloud CDN
- List all the virtual machine instances that have been created within a project
- List all the snapshots of persistent disks that have been created within a project
- List all the network routes that have been created within a project
- List all the firewall rules that have been created within a project
- List all the virtual networks that have been created within a project
- Retrieve VPC flow logs from a GCS bucket using a federated query
Finally, this article shows you how to develop some real-world investigative use cases that leverage this data.
Prerequisites
To follow along with the examples in this article, you need to have the Google Cloud CLI installed, as it provides the gcloud and bq commands. In addition, you need to have both jq and curl installed. These utilities can be downloaded and installed directly or obtained through common package managers such as Homebrew, yum, or apt. Alternatively, you can use an environment such as the Google Cloud Shell, which comes pre-installed with the necessary tools.
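For example, jq and curl can typically be installed with a package manager; exact package names can vary by platform:

# Debian/Ubuntu
sudo apt-get install -y jq curl
# macOS with Homebrew
brew install jq curl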
You will also need a Splunk Cloud Platform or Splunk Enterprise environment configured with an HTTP Event Collector (HEC) token and an index for the data.
- Export the HEC token to the shell environment, as shown below.
  export HEC_TOKEN=<TOKEN>
- You'll also need to set the full HEC URL. For example:
  export HEC_URL=https://<ADDRESS>:8088/services/collector/event
  For more information on how to construct an HEC URL for Splunk Cloud Platform trials, Splunk Cloud Platform on AWS, and Splunk Cloud Platform on GCP, see Set up and use HTTP Event Collector in Splunk Web.
- Set an existing destination index name, as shown below. Some of the commands might contain timestamps that are quite old, so ensure the index lifetime is generous enough to avoid aging out data immediately. You should use a dedicated index for the purposes of this process.
  export SPLUNK_INDEX=gcp-data
- To set curl arguments such as -k or --insecure for untrusted certificate environments, export the CURL_ARGS variable as shown below.
  export CURL_ARGS=-k
- Set an explicit Google Cloud project. You can find a list of projects using the gcloud projects list --format=flattened command. For example:
  gcloud projects list --format=flattened
  ---
  createTime: 2022-01-12T21:34:27.797Z
  lifecycleState: ACTIVE
  name: <REDACTED>
  parent.id: <REDACTED>
  parent.type: organization
  projectId: abcd-123456
  projectNumber: <REDACTED>
  ---
  createTime: 2016-11-29T18:10:06.711Z
  labels.firebase: enabled
  lifecycleState: ACTIVE
  name: <REDACTED>
  projectId: my-project-123456
  projectNumber: <REDACTED>
- Look for the projectId value and set it as an environment variable, as shown below.
  export GCP_PROJECT=<PROJECT_ID>
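Optionally, you can confirm that the HEC endpoint and token work before running the larger examples. The following is a minimal sketch that assumes your deployment exposes the standard HEC health endpoint on the same host and port as the event endpoint; adjust if your environment differs.

# Check that HEC is up (the health endpoint does not require a token)
curl ${CURL_ARGS} "${HEC_URL%/event}/health"
# Send a single test event to confirm the token and index are accepted
curl ${CURL_ARGS} ${HEC_URL} -H "Authorization: Splunk ${HEC_TOKEN}" \
  -d "{\"index\": \"${SPLUNK_INDEX}\", \"sourcetype\": \"gcloud:test\", \"event\": \"HEC connectivity test\"}"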
Tools
The following is a summary of the tools used throughout the examples:
- gcloud is a command-line tool that allows users to manage and interact with GCP resources and services. It is included in the Google Cloud CLI.
- bq allows interacting with BigQuery, which is GCP's fully-managed, serverless data warehouse. It is also included in the Google Cloud CLI.
- jq is like sed but for working with JSON data. It is commonly used to parse, filter, and manipulate JSON data from the command line.
- split breaks a file into smaller files. It is part of the GNU Core Utilities package and is usually available by default on Unix-like systems.
- curl is a command-line tool for transferring data using various protocols, primarily used for making HTTP requests to retrieve or send data to web servers.
Examples
Each of these examples follows roughly the same approach:
- The data is extracted using gcloud.
- The output is parsed and enriched using jq to create a payload suitable for sending to a HEC endpoint.
- After it is formatted for HEC, curl is invoked to deliver the data to the Splunk platform.
- In cases where the output of gcloud could be large, split is introduced to break the data into chunks.
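Before diving into the individual examples, here is a minimal sketch of the general shape most of these pipelines take. The <SERVICE> and <SOURCETYPE> values are placeholders; each example below substitutes its own gcloud subcommand and Splunk sourcetype, and the larger data sets add a split step before curl.

gcloud <SERVICE> list --project=${GCP_PROJECT} --format=json |
jq -c --arg host $(hostname) --arg index ${SPLUNK_INDEX} '.[] | {"host": $host, "source": "gcloud", "sourcetype": "<SOURCETYPE>", "index": $index, "event": .}' |
curl ${CURL_ARGS} ${HEC_URL} -H "Authorization: Splunk ${HEC_TOKEN}" --data-binary @-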
Retrieve a list of assets in a specified GCP project or organization
- The gcloud asset list command can be used to retrieve a list of assets in a specified GCP project or organization. To handle large asset lists, break them into smaller files using the split command before sending them to a Splunk HEC.
  mkdir assets && cd $_; gcloud asset list --project ${GCP_PROJECT} --format=json | jq -c --arg host $(hostname) --arg index ${SPLUNK_INDEX} '{"host": $host, "source": "gcloud", "sourcetype": "google:gcp:asset", "index": $index, "event": .[]}' | split - assets- ; for FILE in *; do echo processing ${FILE}; curl ${CURL_ARGS} ${HEC_URL} -H "Authorization: Splunk ${HEC_TOKEN}" --data-binary @${FILE}; done
- Break the commands into components to better understand each step of the pipeline.
  mkdir assets && cd $_
  This creates a directory called assets and switches into it, assuming creation is successful.
  gcloud asset list --project ${GCP_PROJECT} --format=json
  This invokes gcloud and requests a list of assets. Using the --format=json parameter returns the results as JSON. The results are returned as a list of dictionary objects, where each item in the list is an asset. Here is some example data returned from this command.
  [
    {
      "ancestors": [
        "projects/<REDACTED>"
      ],
      "assetType": "apikeys.googleapis.com/Key",
      "name": "//apikeys.googleapis.com/projects/<REDACTED>/locations/global/keys/<REDACTED>",
      "updateTime": "2022-10-18T09:15:12.026452Z"
    },
    {
      "ancestors": [
        "projects/<REDACTED>"
      ],
      "assetType": "appengine.googleapis.com/Application",
      "name": "//appengine.googleapis.com/apps/<REDACTED>",
      "updateTime": "2022-10-21T02:43:20.551Z"
    },
    ...
    {
      "ancestors": [
        "projects/<REDACTED>"
      ],
      "assetType": "storage.googleapis.com/Bucket",
      "name": "//storage.googleapis.com/us.artifacts.<REDACTED>.appspot.com",
      "updateTime": "2022-10-22T00:35:56.935Z"
    }
  ]
- Pipe this lengthy output into jq, iterate through each item in the JSON list, and output each individual item as a separate, new-line delimited JSON structure.
  jq -c --arg host $(hostname) --arg index ${SPLUNK_INDEX} '{"host": $host, "source": "gcloud", "sourcetype": "google:gcp:asset", "index": $index, "event": .[]}'
  This jq command uses the -c flag to ensure each JSON object appears on a single line. A variable named host is set to the local system hostname. Additionally, a variable named index is set to the Splunk index name you previously set via export in the prerequisites section of this article.
- Provide the scaffolding for a HEC-compliant data structure with a host, source, sourcetype, index, and event field. The .[] portion of the JSON payload tells jq to iterate through the list of items in the input stream and apply the transformation across each item. The final result is a new-line delimited collection of JSON objects, as seen below. You can see each line is a distinct HEC event message and JSON data structure.
  {"host":"cs-<REDACTED>-default","source":"gcloud","sourcetype":"google:gcp:asset","index":"gcp-data","event":{"ancestors":["projects/<REDACTED>"],"assetType":"apikeys.googleapis.com/Key","name":"//apikeys.googleapis.com/projects/<REDACTED>/locations/global/keys/<REDACTED>","updateTime":"2022-10-18T09:15:12.026452Z"}}
  {"host":"cs-<REDACTED>-default","source":"gcloud","sourcetype":"google:gcp:asset","index":"gcp-data","event":{"ancestors":["projects/<REDACTED>"],"assetType":"appengine.googleapis.com/Application","name":"//appengine.googleapis.com/apps/<REDACTED>","updateTime":"2022-10-21T02:43:20.551Z"}}
  ...
  {"host":"cs-<REDACTED>-default","source":"gcloud","sourcetype":"google:gcp:asset","index":"gcp-data","event":{"ancestors":["projects/<REDACTED>"],"assetType":"storage.googleapis.com/Bucket","name":"//storage.googleapis.com/us.artifacts.<REDACTED>.appspot.com","updateTime":"2022-10-22T00:35:56.935Z"}}
splitcommand.split - assets- - By supplying
-as the filename,splitwill read the stdin and output chunks to filenames with names whose prefix begins withassets-. For example:ls -al total 1528 drwxr-xr-x 2 mhite mhite 4096 Jan 31 18:26 . drwxr-xr-x 35 mhite 1001 4096 Jan 31 18:26 .. -rw-r--r-- 1 mhite mhite 407298 Jan 31 18:26 assets-aa -rw-r--r-- 1 mhite mhite 458000 Jan 31 18:26 assets-ab -rw-r--r-- 1 mhite mhite 458000 Jan 31 18:26 assets-ac -rw-r--r-- 1 mhite mhite 226798 Jan 31 18:26 assets-ad - Iterate through each file in the current directory and send the contents as a batch to the HEC endpoint.
  for FILE in *; do echo processing ${FILE}; curl ${CURL_ARGS} ${HEC_URL} -H "Authorization: Splunk ${HEC_TOKEN}" --data-binary @${FILE}; done
  This final series of commands loops through each file, outputs a message to the console indicating the file is being processed, and then invokes curl to POST a new-line delimited batch of event messages to the destination HEC URL.
These steps are the basic recipe for most of the following examples. You can refer back to this initial example for the general approach being used.
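Two optional refinements to this recipe are worth noting. First, split accepts a -l flag to control how many events (lines) land in each chunk, which helps keep each POST under your HEC endpoint's maximum content length. Second, you can have curl report the HTTP status code so that failed batches are easy to spot. Both are sketched below; the chunk size of 5000 lines is an arbitrary example value.

# Limit each chunk to 5000 events instead of split's default of 1000 lines
split -l 5000 - assets-
# Print the HTTP status code returned by HEC for each file
for FILE in *; do
  echo processing ${FILE}
  curl ${CURL_ARGS} -s -o /dev/null -w "%{http_code}\n" ${HEC_URL} \
    -H "Authorization: Splunk ${HEC_TOKEN}" --data-binary @${FILE}
done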
Export Google's IAM Recommender findings and send them to a Splunk HEC
Google's IAM Recommender service analyzes an organization's Identity and Access Management (IAM) policies and recommends actions to help improve the security posture. For example, it can spot issues such as over-privileged roles and users.
Use the following command to export recommender findings and send them to a Splunk HEC:
gcloud recommender insights list --project=${GCP_PROJECT} --insight-type=google.iam.policy.Insight --location=global --format=json |
jq -c --arg host $(hostname) --arg index ${SPLUNK_INDEX} '.[] | {"host": $host, "source": "gcloud", "sourcetype": "google:gcp:recommender:insight", "index": $index, "time": (.lastRefreshTime | fromdateiso8601), "event": .}' |
curl ${CURL_ARGS} ${HEC_URL} -H "Authorization: Splunk ${HEC_TOKEN}" --data-binary @-
Notice the (.lastRefreshTime | fromdateiso8601) portion of the jq command. It reads the lastRefreshTime field from the input stream, converts it from an ISO 8601 timestamp into an epoch timestamp, and assigns the result to the time field of the HEC event.
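To see this conversion in isolation, you can run jq against a standalone timestamp string. For example, midnight UTC on January 1, 2022 corresponds to the epoch value 1640995200:

echo '"2022-01-01T00:00:00Z"' | jq 'fromdateiso8601'
# 1640995200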
List all the static and reserved IP addresses in a project
The command gcloud compute addresses list lists all the static and reserved IP addresses in a project.
gcloud compute addresses list --project=${GCP_PROJECT} --format=json |
jq -c --arg host $(hostname) --arg index ${SPLUNK_INDEX} '.[] | {"host": $host, "source": "gcloud", "sourcetype": "google:gcp:address", "index": $index, "time": (.creationTimestamp | sub("\\.[0-9]{3}"; "") | strptime("%Y-%m-%dT%H:%M:%S%z") | mktime), "event": .}' |
curl ${CURL_ARGS} ${HEC_URL} -H "Authorization: Splunk ${HEC_TOKEN}" --data-binary @-
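The timestamp handling here differs from the previous example because Compute Engine's creationTimestamp values include fractional seconds and a numeric UTC offset (for example, 2022-10-18T09:15:12.026-07:00), which jq's fromdateiso8601 generally does not parse. The sub() call strips the three fractional digits, and strptime/mktime convert the remainder to an epoch value. You can test the first step on its own; the timestamp below is an illustrative sample.

echo '"2022-10-18T09:15:12.026-07:00"' | jq 'sub("\\.[0-9]{3}"; "")'
# "2022-10-18T09:15:12-07:00"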
List all the SSL/TLS certificates that have been created for use with Google Cloud Load Balancer and Cloud CDN
The command gcloud compute ssl-certificates list lists all the SSL/TLS certificates that have been created for use with Google Cloud Load Balancer and Cloud CDN.
gcloud compute ssl-certificates list --project=${GCP_PROJECT} --format=json |
jq -c --arg host $(hostname) --arg index ${SPLUNK_INDEX} '.[] | {"host": $host, "source": "gcloud", "sourcetype": "google:gcp:certificate", "index": $index, "time": (.creationTimestamp | sub("\\.[0-9]{3}"; "") | strptime("%Y-%m-%dT%H:%M:%S%z") | mktime), "event": .}' |
curl ${CURL_ARGS} ${HEC_URL} -H "Authorization: Splunk ${HEC_TOKEN}" --data-binary @-
List all the virtual machine instances that have been created within a project
The command gcloud compute instances list lists all the virtual machine instances that have been created within a project. Because this list can be quite extensive, this example uses the split command again.
mkdir instances && cd $_; gcloud compute instances list --project=${GCP_PROJECT} --format=json |
jq -c --arg host $(hostname) --arg index ${SPLUNK_INDEX} '.[] | {"host": $host, "source": "gcloud", "sourcetype": "google:gcp:instance", "index": $index, "time": (.creationTimestamp | sub("\\.[0-9]{3}"; "") | strptime("%Y-%m-%dT%H:%M:%S%z") | mktime), "event": .}' |
split - instances- ; for FILE in *; do echo processing ${FILE}; curl ${CURL_ARGS} ${HEC_URL} -H "Authorization: Splunk ${HEC_TOKEN}" --data-binary @${FILE}; done
List all the snapshots of persistent disks that have been created within a project
The command gcloud compute snapshots list lists all the snapshots of persistent disks that have been created within a project.
gcloud compute snapshots list --project=${GCP_PROJECT} --format=json |
jq -c --arg host $(hostname) --arg index ${SPLUNK_INDEX} '.[] | {"host": $host, "source": "gcloud", "sourcetype": "google:gcp:snapshot", "index": $index, "time": (.creationTimestamp | sub("\\.[0-9]{3}"; "") | strptime("%Y-%m-%dT%H:%M:%S%z") | mktime), "event": .}' |
curl ${CURL_ARGS} ${HEC_URL} -H "Authorization: Splunk ${HEC_TOKEN}" --data-binary @-
List all the network routes that have been created within a project
The command gcloud compute routes list lists all the network routes that have been created within a project.
gcloud compute routes list --project=${GCP_PROJECT} --format=json |
jq -c --arg host $(hostname) --arg index ${SPLUNK_INDEX} '.[] | {"host": $host, "source": "gcloud", "sourcetype": "google:gcp:route", "index": $index, "time": (.creationTimestamp | sub("\\.[0-9]{3}"; "") | strptime("%Y-%m-%dT%H:%M:%S%z") | mktime), "event": .}' |
curl ${CURL_ARGS} ${HEC_URL} -H "Authorization: Splunk ${HEC_TOKEN}" --data-binary @-
List all the firewall rules that have been created within a project
The command gcloud compute firewall-rules list lists all the firewall rules that have been created within a project.
gcloud compute firewall-rules list --project=${GCP_PROJECT} --format=json |
jq -c --arg host $(hostname) --arg index ${SPLUNK_INDEX} '.[] | {"host": $host, "source": "gcloud", "sourcetype": "google:gcp:firewall", "index": $index, "time": (.creationTimestamp | sub("\\.[0-9]{3}"; "") | strptime("%Y-%m-%dT%H:%M:%S%z") | mktime), "event": .}' |
curl ${CURL_ARGS} ${HEC_URL} -H "Authorization: Splunk ${HEC_TOKEN}" --data-binary @-
List all the virtual networks that have been created within a project
The command gcloud compute networks list lists all the virtual networks that have been created within a project.
gcloud compute networks list --project=${GCP_PROJECT} --format=json |
jq -c --arg host $(hostname) --arg index ${SPLUNK_INDEX} '.[] | {"host": $host, "source": "gcloud", "sourcetype": "google:gcp:network", "index": $index, "time": (.creationTimestamp | sub("\\.[0-9]{3}"; "") | strptime("%Y-%m-%dT%H:%M:%S%z") | mktime), "event": .}' |
curl ${CURL_ARGS} ${HEC_URL} -H "Authorization: Splunk ${HEC_TOKEN}" --data-binary @-
Retrieve VPC flow logs from a GCS bucket using a federated query
BigQuery is a fully-managed, serverless data warehouse that allows you to run SQL-like queries on large datasets. One of the features of BigQuery is the ability to federate queries to other data sources such as S3, GCS, or Azure Blob Storage. The following example shows how to retrieve VPC flow logs from a GCS bucket by way of a federated query.
It's important to note that when you use federated queries, you incur additional costs and latency. Additionally, BigQuery charges on a bytes-scanned model, so only perform this example against a small data set. No one likes surprise cloud bills!
mkdir flow && cd $_; bq query --format=json --nouse_legacy_sql 'SELECT * FROM `mh_bq_test.flows`' |
jq -c --arg host $(hostname) --arg index ${SPLUNK_INDEX} '.[] | {"host": $host, "source": "gcloud", "sourcetype": "google:gcp:vpc:flow", "index": $index, "time": (.timestamp | strptime("%Y-%m-%d %H:%M:%S")| mktime), "event": .}' |
split - flow- ; for FILE in *; do echo processing ${FILE}; curl ${CURL_ARGS} ${HEC_URL} -H "Authorization: Splunk ${HEC_TOKEN}" --data-binary @${FILE}; done
To learn more about setting up external tables in BigQuery, see the Google Cloud documentation.
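As a rough sketch of what that setup can look like, the commands below define an external table over newline-delimited JSON flow log exports in a GCS bucket and attach it to an existing dataset. The bucket path is a placeholder, the dataset and table names simply mirror the query above, and your log export format may differ.

# Build an external table definition from files in a GCS bucket (schema auto-detected)
bq mkdef --autodetect --source_format=NEWLINE_DELIMITED_JSON \
  "gs://<YOUR_BUCKET>/vpc-flows/*.json" > flows_def.json
# Create the external table in an existing dataset
bq mk --external_table_definition=flows_def.json mh_bq_test.flows

Once the external table exists, the bq query in the example above can select from it.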
More ideas for insights from this data
So far, you've seen how single-purpose tools like gcloud, jq, and curl can be used together to bring data from Google Cloud into the Splunk platform. However, the ultimate goal of transferring data to the Splunk platform is to gain insights from it. Let's consider some real-world investigative use cases that leverage this data.
Access a list of virtual machines along with who created each one
Assuming you are also ingesting Google Cloud audit logs, you can enrich the data you collected in the previous examples with related data from those audit logs. The following SPL (Search Processing Language) search achieves this.
(index=gcp-data sourcetype="google:gcp:instance") OR (index="gsa-gcp-log-index" protoPayload.methodName="v1.compute.instances.insert" OR protoPayload.methodName="beta.compute.instances.insert")
| eval resourceName=case(index="gcp-data",mvindex(split('selfLink',"https://www.googleapis.com/compute/v1/"),1),index="gsa-gcp-log-index",'protoPayload.resourceName')
| eval creator=case(index="gsa-gcp-log-index",'protoPayload.authenticationInfo.principalEmail')
| stats values(index) values(creator) by resourceName
| rename values(*) as *
| where index="gcp-data" AND index="gsa-gcp-log-index"
| fields - index
Assuming gsa-gcp-log-index contains audit logs, this query performs the equivalent of an INNER JOIN against virtual machine names from your instance list (sourcetype="google:gcp:instance") and the audit log recording the machine's creation event. After the join is performed, you can display a list of current virtual machines alongside their creator email address.
This query assumes you have audit logs that go back far enough to find the initial API call used to create a machine.

Establish a record of API calls made by an account listed in an IAM Recommender "Permission Usage" finding
The SPL shown below summarizes the API calls made by accounts that appear in IAM Recommender PERMISSIONS_USAGE findings during a designated time frame. Presenting this information alongside the accounts under investigation can give you insight into their purpose or typical behavior.
(index="gcp-data" sourcetype="google:gcp:recommender:insight" insightSubtype=PERMISSIONS_USAGE) OR (index="gsa-gcp-log-index")
| eval account=case(index="gcp-data",mvindex(split('content.member',":"),1),index="gsa-gcp-log-index",'protoPayload.authenticationInfo.principalEmail')
| eval methods=case(index="gsa-gcp-log-index",'protoPayload.methodName')
| stats values(index) values(methods) by account
| rename values(*) as *
| where index="gcp-data" AND index="gsa-gcp-log-index"
| fields - index

Other potential use cases
- Find other interesting gcloud commands with "list" options.
- To facilitate multiple runs over time, create timestamp checkpoint files and compare them against the creationTimestamp fields returned by gcloud to avoid duplicates in an index. A rough sketch of this idea appears after this list.
- Load data into a lookup table for better use within the Splunk platform.
- Find a solution to keep fractional timestamps intact during jq extraction.
- Consider using alternative ingestion methods, like writing new-line delimited JSON to disk and using a Universal Forwarder or OpenTelemetry Agent to send to Splunk, instead of HEC.
- Try these same techniques with other cloud provider CLIs such as aws-cli, az, linode-cli, and doctl.
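As an example of the checkpointing idea mentioned above, the sketch below records the time of the last run and asks gcloud to return only resources created since then. The checkpoint file name is hypothetical, and the --filter expression assumes gcloud's filter syntax for comparing creationTimestamp values.

# Hypothetical checkpoint file holding the timestamp of the previous run
CHECKPOINT_FILE=instances.checkpoint
LAST_RUN=$(cat ${CHECKPOINT_FILE} 2>/dev/null || echo "1970-01-01T00:00:00Z")
# Only list instances created after the previous run
gcloud compute instances list --project=${GCP_PROJECT} --format=json \
  --filter="creationTimestamp>'${LAST_RUN}'"
# Record the current time for the next run
date -u +%Y-%m-%dT%H:%M:%SZ > ${CHECKPOINT_FILE}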
Use the examples as inspiration for your own use cases.
Additional resources
These resources might help you understand and implement this guidance:
- JQ: Reference manual
- JQ: Cheat sheet
- Splunk Help: Format events for HTTP Event Collector

