Writing better searches with the Common Information Model
Your organization uses dozens of different software vendors, and the field names and their meanings in the vendors' logs are inconsistent. This variety makes it very difficult to write searches that analyze similar data types when you are looking for security events. You need a way to normalize your data to a common standard and simplify your searches.
Solution
The Splunk Common Information Model (CIM) is a semantic model focused on extracting values from data. It is a taxonomy schema that allows you to map vendor fields to common fields that are the same for every data source in a given domain. Essentially, when you add your data through a supported technical add-on (TA), the add-on acts as a translator from the vendor's language to the CIM's common language. The CIM is implemented as an add-on that you can download from Splunkbase. It provides the following:
- Data models. These are predefined domains of interest, such as endpoint or authentication, that map to your data.
- Data normalization. Each domain has assigned fields and tags to normalize data at search time.
- Datasets. These are specific subsets of the data models, such as privileged authentication, which is a subset of the Authentication data model.
- Data model acceleration. The CIM creates a summary index for data, which speeds up searches when compared to searching across raw data.
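The following minimal Python sketch makes the mapping idea concrete by showing the *effect* of search-time normalization. It is not how a TA is implemented, because TAs do this with Splunk configuration such as field aliases, lookups, and tags. The two vendor sourcetypes and their field names are hypothetical. The target names user, src, and action, the authentication tag, and the action values success and failure come from the CIM Authentication data model.

```python
# Hypothetical per-sourcetype alias tables: vendor field name -> CIM field name.
# The vendor names are invented; user, src, and action are CIM Authentication fields.
FIELD_ALIASES = {
    "vendor_a:auth": {"username": "user", "client_ip": "src", "result": "action"},
    "vendor_b:login": {"acct": "user", "source_address": "src", "outcome": "action"},
}

# Hypothetical vendor outcome strings mapped onto CIM action values.
ACTION_VALUES = {"ok": "success", "SUCCESS": "success", "denied": "failure", "FAIL": "failure"}


def normalize(event: dict) -> dict:
    """Return a new event that uses CIM field names, values, and tags."""
    aliases = FIELD_ALIASES.get(event["sourcetype"], {})
    out = {"sourcetype": event["sourcetype"]}
    for field, value in event.items():
        if field in aliases:
            out[aliases[field]] = value
    if "action" in out:
        # Normalize the value as well as the field name; unknown values pass through.
        out["action"] = ACTION_VALUES.get(out["action"], out["action"])
        # Simplified rule: tag the event so it qualifies for the Authentication data model.
        out["tag"] = {"authentication"}
    return out


events = [
    {"sourcetype": "vendor_a:auth", "username": "alice", "client_ip": "10.0.0.5", "result": "denied"},
    {"sourcetype": "vendor_b:login", "acct": "bob", "source_address": "10.0.0.9", "outcome": "SUCCESS"},
]
normalized = [normalize(e) for e in events]

# One filter on the common field works for both vendors.
failures = [e["user"] for e in normalized if e.get("action") == "failure"]
print(failures)  # ['alice']
```

After both events are normalized, a single filter on action finds the failed login regardless of which vendor produced it.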
How does the CIM help you create better searches?
- Faster. Data model acceleration uses summary indexes, which speed up searches. Searches are also shorter and more efficient, as shown in the table below.
- Easier. With CIM, you can query across multiple source types simultaneously, which is especially helpful if you don't know which source you need.
- More accurate. Searches use fewer operators and fields, which makes them less error prone and less likely to waste analysts' time with false positives.
- Less work. Using common fields across all sources means that you have less content to write and maintain, since you no longer need to account for vendor-specific naming conventions.
- Complete. The CIM covers all data sources that have associated Splunk-supported TAs. To map data correctly to the CIM, the data must be ingested through these add-ons.
- Expanded deployments. In Splunk Enterprise Security, the majority of detections use data model-based searches. If you want to upgrade to Splunk Enterprise Security, you should have the CIM in place.
Sample Search | Without CIM | With CIM |
---|---|---|
Blocked malware search | (sourcetype=symantec:ep:* "Virus found" AND | (tag=malware tag=attack action=blocked) |
Windows process started search | | (tag=processes tag=report action=allowed) |
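The tag-based searches in the table work because every mapped event carries the same tags and field names, whatever its source type. The Python sketch below emulates only the filtering logic of a search such as tag=malware tag=attack action=blocked over events that are already normalized. The sourcetype names and event contents are invented for illustration, and this is not how Splunk executes searches.

```python
def matches(event: dict, required_tags: list, **field_values) -> bool:
    """Emulate a filter like `tag=a tag=b field=value`: all tags present, all fields equal."""
    has_tags = set(required_tags) <= set(event.get("tag", ()))
    return has_tags and all(event.get(k) == v for k, v in field_values.items())


# Normalized events from three hypothetical source types.
events = [
    {"sourcetype": "av_vendor_1", "tag": ["malware", "attack"], "action": "blocked", "signature": "EICAR"},
    {"sourcetype": "av_vendor_2", "tag": ["malware", "attack"], "action": "allowed", "signature": "Trojan.X"},
    {"sourcetype": "auth_vendor", "tag": ["authentication"], "action": "success"},
]

# Equivalent of: tag=malware tag=attack action=blocked
blocked = [e["sourcetype"] for e in events if matches(e, ["malware", "attack"], action="blocked")]
print(blocked)  # ['av_vendor_1']
```

The filter never mentions a vendor, so adding a fourth antivirus product does not require changing the search.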
When will the CIM not improve my searches?
Not all events should be mapped to a data model in the CIM. Here are a few reasons you wouldn't map an event:
- Field extractions at search time have a cost, and not all events fit the data model definitions closely enough to contribute to threat detections. By identifying those unusable events and not mapping them to data models or extracting their fields, you can improve search performance.
- Not all data sources have enough fields that are mappable to a CIM domain. Some data sources are too product-specific.
- Some events lack semantic value. For example, the following are very noisy events that administrators often disable:
  - Sysmon Event ID 7: Image loaded
  - gws_login_verification: actor was presented with login verification
- If a use case is data-source specific or unique to a single product, it isn't applicable to a common model.
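One way to reason about which events to leave unmapped is an explicit exclusion list, checked before any mapping work happens. The Python sketch below is purely conceptual. In Splunk, this decision is expressed in the add-on's configuration, such as which events receive data model tags. The sourcetype names (sysmon, gws) and the event_id field are hypothetical. Sysmon Event ID 1 is process creation, which does carry detection value.

```python
# Hypothetical exclusion list of (sourcetype, event_id) pairs judged too noisy to map.
EXCLUDED = {
    ("sysmon", "7"),                # Sysmon Event ID 7: Image loaded
    ("gws", "login_verification"),  # actor was presented with login verification
}


def should_map(event: dict) -> bool:
    """Return True if the event should go on to field extraction and data model mapping."""
    return (event["sourcetype"], str(event["event_id"])) not in EXCLUDED


events = [
    {"sourcetype": "sysmon", "event_id": 1},  # process creation: worth mapping
    {"sourcetype": "sysmon", "event_id": 7},  # image loaded: skipped
    {"sourcetype": "gws", "event_id": "login_verification"},  # skipped
]
to_map = [e for e in events if should_map(e)]
print(to_map)  # [{'sourcetype': 'sysmon', 'event_id': 1}]
```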
How to test your CIM implementation
You can only improve your searches if the CIM is working correctly. The Pytest Splunk add-on is a dynamic test tool for Splunk technical add-ons. You can download it from GitHub and use it to check the following:
- Each event that is mapped to a data model includes the fields required for that data model. Fields can be required, recommended, or optional.
- Each field has the expected format. For example, it can check that an IP address field contains a valid IPv4 or IPv6 address rather than an arbitrary string.
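The two kinds of checks above can be illustrated with a short, hand-rolled Python function. This is not the Pytest Splunk add-on's API or test logic. It only shows what such checks verify. The required-field set and the set of IP-valued fields are hypothetical. IP validation uses the standard library's ipaddress module, which accepts both IPv4 and IPv6 text forms.

```python
import ipaddress

# Hypothetical rules for one data model: fields that must be present,
# and fields whose values must be IP addresses.
REQUIRED_FIELDS = {"action", "src", "user"}
IP_FIELDS = {"src", "dest"}


def validate(event: dict) -> list:
    """Return a list of problems found in the event; an empty list means it passed."""
    problems = [f"missing required field: {f}" for f in sorted(REQUIRED_FIELDS - set(event))]
    for field in sorted(IP_FIELDS & set(event)):
        try:
            ipaddress.ip_address(event[field])  # accepts IPv4 and IPv6 text forms
        except ValueError:
            problems.append(f"{field} is not an IPv4 or IPv6 address: {event[field]!r}")
    return problems


print(validate({"action": "success", "src": "10.0.0.5", "user": "alice"}))  # []
print(validate({"action": "success", "src": "2001:db8::1", "dest": "workstation-7"}))
```

The second event fails twice: it has no user field, and its dest value is a hostname rather than an IP address.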
Next steps
Now that you understand the basics of the Common Information Model and how it can improve your searches, watch the full demo in this .conf22 talk (Finding Threats Better With Splunk® Common Information Model (CIM) in Your Searches and Custom Add-ons). Then, download the add-on and get started in your deployment.
These additional Splunk resources might help you understand and implement this product tip:
- Splunk Docs: Common Information Model Add-on Manual
- Splunk Add-on: Common Information Model (CIM)
- Pytest Splunk Add-on: Documentation
- Product Tip: Normalizing values to a common field name with the Common Information Model (CIM)