Data models are conceptual maps that Splunk Enterprise Security uses to apply a standard set of field names to events that share a logical context, such as:
- Malware: antivirus logs
- Performance: OS metrics like CPU and memory usage
- Authentication: log-on and authorization events
- Network Traffic: network activity
When data is normalized to these models and accelerated to speed up search processing, key/value pairs are stored in time-series index (TSIDX) files, which are essentially summaries of the data. Splunk Enterprise Security depends heavily on these accelerated models. When you use | tstats summariesonly=t in Splunk Enterprise Security searches, you restrict results to accelerated data.
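To make the effect of summariesonly concrete, here is a toy Python model. It is not Splunk's implementation. It assumes each event is either already covered by an acceleration summary or not yet summarized (for example, because it arrived after the last acceleration run), and it mimics a count by source type. With summariesonly=t, only summarized events are counted. With summariesonly=f, tstats also falls back to raw events that the summaries do not cover yet. The source type names are only examples.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    sourcetype: str
    # True if the event falls inside the accelerated (TSIDX) summary range.
    summarized: bool


def tstats_count_by_sourcetype(events: list[Event], summaries_only: bool) -> dict[str, int]:
    """Count events by source type, mimicking `| tstats count ... BY sourcetype`.

    summaries_only=True  -> read only summarized events (like summariesonly=t).
    summaries_only=False -> also include events not yet summarized (like summariesonly=f).
    """
    counts = Counter(
        event.sourcetype
        for event in events
        if event.summarized or not summaries_only
    )
    return dict(counts)


events = [
    Event("pan:traffic", summarized=True),
    Event("pan:traffic", summarized=True),
    Event("cisco:asa", summarized=True),
    Event("cisco:asa", summarized=False),  # arrived after the last acceleration run
]

print(tstats_count_by_sourcetype(events, summaries_only=True))   # {'pan:traffic': 2, 'cisco:asa': 1}
print(tstats_count_by_sourcetype(events, summaries_only=False))  # {'pan:traffic': 2, 'cisco:asa': 2}
```

The gap between the two counts is the data that a summariesonly=t search silently leaves out. This is why acceleration health matters so much for Splunk Enterprise Security searches.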
You want to learn best practices for managing data models correctly to get the best performance and results out of your deployment.
Set up your data models
In Splunk Enterprise Security, go to Configure > CIM Setup to change the data model settings provided by the CIM add-on:
- Index allow list. Improve performance by constraining the indexes that each data model searches. The default is all indexes.
- Tag allow list. Restrict the tag attribute of a data model to specific tag values to improve performance. By default, allow lists use the tags for the child datasets in the data model.
- Acceleration. Enable acceleration for the data model to return results faster for searches, reports, and dashboard panels that reference the data model.
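The index and tag allow lists described above can be pictured with a small Python sketch. It is a toy model, not how Splunk evaluates data model constraints. It assumes that an empty allow list means "no restriction", that events outside the index allow list are never read, and that the tag attribute keeps only allow-listed values. The index names, tags, and the `constrain` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    index: str
    tags: frozenset[str]


def constrain(events, index_allow=frozenset(), tag_allow=frozenset()):
    """Return (index, tags) for each event the data model would consider.

    An empty allow list means "no restriction", matching the default of
    searching all indexes. Events outside the index allow list are skipped
    entirely; the tag attribute keeps only allow-listed tag values.
    """
    kept = []
    for event in events:
        if index_allow and event.index not in index_allow:
            continue
        tags = event.tags & tag_allow if tag_allow else event.tags
        kept.append((event.index, tags))
    return kept


events = [
    Event("wineventlog", frozenset({"authentication", "default", "windows"})),
    Event("linux_secure", frozenset({"authentication", "privileged"})),
    Event("web_proxy", frozenset({"web", "proxy"})),
]

# Default settings: every index is read and every tag value is kept.
print(len(constrain(events)))  # 3

# Hypothetical allow lists for an Authentication-style data model.
restricted = constrain(
    events,
    index_allow={"wineventlog", "linux_secure"},
    tag_allow={"authentication", "default", "privileged"},
)
print(restricted[0][0], sorted(restricted[0][1]))  # wineventlog ['authentication', 'default']
```

Fewer indexes read and fewer tag values carried per event is the source of the performance gain. The trade-off is that anything outside the allow lists no longer reaches the data model.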
Examine data model contents
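Use the tstats command to examine the source types contained in a data model. For example, this search counts the events in the All_Traffic dataset of the Network_Traffic data model by source type:
| tstats count FROM datamodel=Network_Traffic.All_Traffic BY sourcetype
You can also search all events in a data model with the datamodel command, and you can view each data model's size, retention settings, and current refresh status.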
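To run the tstats search above against several data models, it can help to generate the search strings in a script. The following Python sketch does this. The `build_tstats_query` helper is hypothetical, and the root dataset names in the loop are assumptions based on the CIM, so confirm them in your CIM version.

```python
def build_tstats_query(datamodel, dataset=None, by=("sourcetype",),
                       summaries_only=False, function="count"):
    """Build a tstats search string against a data model (or one of its datasets)."""
    target = f"{datamodel}.{dataset}" if dataset else datamodel
    parts = ["| tstats"]
    if summaries_only:
        parts.append("summariesonly=t")  # restrict to accelerated summaries
    parts.append(function)
    parts.append(f"FROM datamodel={target}")
    if by:
        parts.append("BY " + " ".join(by))
    return " ".join(parts)


print(build_tstats_query("Network_Traffic", "All_Traffic"))
# | tstats count FROM datamodel=Network_Traffic.All_Traffic BY sourcetype

# Assumed CIM root dataset names; confirm them in your CIM version.
for dm, ds in [("Authentication", "Authentication"),
               ("Malware", "Malware_Attacks"),
               ("Performance", "All_Performance")]:
    print(build_tstats_query(dm, ds, summaries_only=True))
```

Paste the generated searches into the search bar, or submit them through the REST API or an SDK of your choice.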
Audit data models
To determine which data models are using the most storage or processor time, go to Audit > Data Model Audit.
Most Splunk Enterprise Security correlation searches and dashboard searches are based on accelerated data model events. Use the dashboard requirements matrix to determine which data models support each dashboard. The data model names in the dashboard requirements matrix are linked to the data model’s CIM documentation, which you can use to determine the tags, field names and field values your events must use to be CIM-compliant.
If a panel in a dashboard is missing data, click the panel’s Open in Search link to see which data model is used; this can help you understand why the data is missing. Common causes are:
- The data is not in Splunk: install and enable add-ons to input the data.
- The data is present in Splunk but is not normalized correctly to the CIM, for example because it lacks the tags or field names the data model expects.
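For the second cause above (data present but not normalized), a quick check of a sample event against a data model's requirements often shows the gap. The following Python sketch checks events against the Authentication data model. It assumes an illustrative subset of the requirements: the tag=authentication constraint plus a few common fields. It is not the full CIM field list, so check the data model's CIM documentation for the authoritative tags, fields, and values. The `normalization_problems` helper and the sample events are hypothetical.

```python
REQUIRED_TAG = "authentication"
# Illustrative subset of Authentication fields, not the full CIM list.
EXPECTED_FIELDS = ("action", "app", "dest", "src", "user")


def normalization_problems(event: dict) -> list[str]:
    """List reasons an event might not show up in the Authentication data model."""
    problems = []
    tags = event.get("tag", [])
    if isinstance(tags, str):  # a single tag value
        tags = [tags]
    if REQUIRED_TAG not in tags:
        problems.append(f"missing tag={REQUIRED_TAG}")
    for name in EXPECTED_FIELDS:
        if not event.get(name):
            problems.append(f"missing field {name}")
    return problems


normalized = {"tag": ["authentication", "default"], "action": "success",
              "app": "win:local", "dest": "host1", "src": "10.0.0.5", "user": "alice"}
raw_only = {"tag": "windows", "Account_Name": "alice", "action": "success"}

print(normalization_problems(normalized))  # []
print(normalization_problems(raw_only))
# ['missing tag=authentication', 'missing field app', 'missing field dest',
#  'missing field src', 'missing field user']
```

In the second event, the user name exists only under a vendor field name (Account_Name). An add-on normally fixes this with field aliases, event types, and tags.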
Custom data models
In addition to the data models available as part of the Common Information Model add-on, Splunk Enterprise Security uses the following custom data models:
- Assets and Identities.
- Domain Analysis.
- Incident Management.
- Risk Analysis.
- Threat Intelligence. Splunk Enterprise Security regularly downloads threat intelligence from external and internal sources and parses it into KV store collections whose names end in “_intel”. Threat generation searches use those collections as lookups. By default, these searches run every five minutes and scan for activity related to any of the threat collections. When they find threat matches, they generate events in the threat_activity index, and those events appear in the Threat Intelligence data model. The Threat Activity Detected correlation search scans this data model and creates new notable events for the threat activity.
- User and Entity Behavior Analytics. Splunk User Behavior Analytics (UBA) is a separate solution that extends your ability to detect insider threats. The UBA add-on, Splunk_TA_ueba, is included in the Splunk Enterprise Security installation and lets you:
  - Send threats and anomalies from UBA to ES to adjust risk scores and create notable events
  - Send correlation search results from ES to UBA to be processed for anomalies
  - Retrieve user and device association data from UBA to view it in ES
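The threat intelligence flow in the list above can be modeled in a few lines of Python. This is a toy sketch, not how Splunk Enterprise Security implements it. The collections are in-memory dicts instead of KV store lookups, and threat_activity is a plain list instead of an index. The collection contents, the fields matched against each collection, and the grouping of matches into notable events are all assumptions. The collection names follow the “_intel” naming convention.

```python
from dataclasses import dataclass

# Hypothetical intel collections, named after the "_intel" convention.
# Each maps an indicator value to the feed it came from.
COLLECTIONS = {
    "ip_intel": {"203.0.113.7": "example_ip_feed"},
    "file_intel": {"d41d8cd98f00b204e9800998ecf8427e": "example_hash_feed"},
}

# Hypothetical mapping from each collection to the event fields checked against it.
MATCH_FIELDS = {"ip_intel": ("src", "dest"), "file_intel": ("file_hash",)}


@dataclass(frozen=True)
class ThreatMatch:
    collection: str
    indicator: str
    feed: str


def threat_generation(events, collections):
    """Look up event field values in each intel collection, like a lookup."""
    threat_activity = []
    for event in events:
        for name, fields in MATCH_FIELDS.items():
            intel = collections.get(name, {})
            for field_name in fields:
                value = event.get(field_name)
                if value in intel:
                    threat_activity.append(ThreatMatch(name, value, intel[value]))
    return threat_activity


def threat_activity_detected(threat_activity):
    """Create one notable per (collection, indicator) pair: a simplification."""
    grouped = {}
    for match in threat_activity:
        grouped.setdefault((match.collection, match.indicator), match)
    return [f"Threat activity detected: {c} match on {i}" for c, i in grouped]


events = [
    {"src": "10.0.0.5", "dest": "203.0.113.7"},
    {"file_hash": "d41d8cd98f00b204e9800998ecf8427e"},
    {"src": "203.0.113.7", "dest": "10.0.0.9"},
]
matches = threat_generation(events, COLLECTIONS)
print(len(matches))  # 3
for notable in threat_activity_detected(matches):
    print(notable)
```

The two stages mirror the split in the product. A frequent, cheap matching step writes threat_activity results, and a separate correlation step reads the Threat Intelligence data model to decide what becomes a notable event.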
For more information about these data models, see the Splunk Enterprise Security documentation.
Accelerated data model storage
In addition to index storage requirements, Splunk Enterprise Security requires space for accelerated data models. Make sure you understand the following:
- Acceleration requires approximately 3.4 x (daily input volume) of additional space per year, or more if replicated in an indexer cluster.
- Example: Input volume of 500 GB per day with one year retention = 500 GB * 3.4 = 1700 GB additional space for accelerated data model storage.
- This additional space is distributed across all indexers.
- Example: If there are 5 indexers, 1700 GB / 5 = about 340 GB of additional space is required per indexer.
- The storage volumes allocated for acceleration should be tuned for best performance and replicated if in a cluster.
- By default, acceleration storage is allocated in the same location as the index containing the raw events being accelerated. Use the tstatsHomePath setting in indexes.conf if you need to specify alternate locations for your accelerated storage.
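The sizing rule and the examples in the list above can be wrapped in a small Python calculator. The 3.4 factor per year of retention comes from this article. Two parts are assumptions for illustration: scaling linearly for retention periods other than one year, and the `copies` multiplier for replicated summaries in an indexer cluster. Treat the result as a planning estimate, not a guarantee.

```python
def acceleration_storage_gb(daily_input_gb, indexers, retention_days=365,
                            factor=3.4, copies=1):
    """Estimate extra disk for accelerated data models.

    factor: the ~3.4x rule of thumb per year of retention from this article.
    retention_days: assumption, the estimate is scaled linearly from one year.
    copies: assumption, multiply when summaries are replicated in a cluster.
    Returns (total_gb, per_indexer_gb).
    """
    total = daily_input_gb * factor * (retention_days / 365) * copies
    return total, total / indexers


total, per_indexer = acceleration_storage_gb(500, indexers=5)
print(f"{total:.0f} GB total, {per_indexer:.0f} GB per indexer")
# 1700 GB total, 340 GB per indexer
```

This reproduces the 500 GB per day example: 1700 GB in total, or about 340 GB on each of 5 indexers.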
If you found this article useful and want to advance your skills, Splunk Education offers a 13.5-hour, instructor-led course on administering Splunk Enterprise Security. The hands-on labs in the course will teach you how to:
- Examine how Splunk Enterprise Security functions, including data models, correlation searches, notable events, and dashboards
- Create custom correlation searches
- Customize the Investigation Workbench
- Install or upgrade Splunk Enterprise Security
- Set up inputs using technology add-ons
- Fine-tune Splunk Enterprise Security Global Settings
- Customize risk and configure threat intelligence
See the Splunk Education course catalog for details about this and other Splunk Enterprise Security courses, and to register.