Avoiding common pitfalls for getting data in

Last updated
Save as PDF
Share
1. Share
2. Tweet
3. Share

Getting data into the Splunk platform is straightforward most of the time, but there are common pitfalls that can be avoided if you know about them. This article discusses three common ones.

Configure the HTTP Event Collector (HEC) input timestamp

The Splunk platform processes data coming in via HEC a little differently than it does a regular file input. This is related to how data is processed by the HEC data processing pipelines and the HEC endpoint used to send data to the Splunk platform.

If you use the raw HEC endpoint, it’ll be easy enough to parse the timestamp fields as you would any other source type (that is, by using the TIME_PREFIX and TIME_FORMAT settings).

When using the event HEC endpoint, pay attention to the placement of the time field for a given event. In this case, if your time field lives in the events sent to the Splunk platform and not in the HEC payload, the Splunk platform might default to using the time of ingestion instead of the time of the event itself. Consider the following HEC payload:

{
  "event": { 
    "message": "Something happened",
    "severity": "INFO”,
    "time": 1437522387
  }
}

The above event will be indexed using the time of ingestion instead of the time of the event. You’ll need to set the time field at the HEC payload level, or use the raw endpoint to ensure ease of timestamp parsing.

Alternatively, you can use INGEST_EVAL to set the _time field. The following is an example of how you might accomplish that for the example event above:

INGEST_EVAL=_time=strptime(replace(_raw,".+time\":\s\"(\d+),"\1"),"%s")

Additional information on the INGEST_EVAL command can be found here.

Share configurations system-wide

When creating custom configurations for your source types, make sure you share them at the appropriate level; otherwise things might not work correctly.

The following screenshot shows the permissions settings in the UI, but you can also set them using the Splunk metadata config files.

Index-time versus search-time field extractions

You can configure the Splunk platform to extract fields at search time or at index time. However, if you configure both, the Splunk platform will extract fields twice, leading to double the values in your search results, as shown here.

To avoid this, make sure you configure field extractions only at search time (which is generally preferable) or only at index time. This will apply on a per-source type basis.

For a given source type, in props.conf, use either:

INDEXED_EXTRACTIONS = JSON
KV_MODE = none

INDEXED_EXTRACTIONS = 
KV_MODE = json

Next steps

These additional Splunk resources might help you get data into your Splunk deployment more easily:

Flowchart: Splunk HTTP Event Collector (HEC) Pipelines
.Conf Talk: Data onboarding: Where do I begin?
Community Thread: Best practices for defining source types
Splunk Lantern: Data Descriptors
Product Tip: Configuring new source types
Product Tip: Getting to know your data
Product Tip: Improving data onboarding with props.conf configurations