Validating data source integrity
This task analyzes each data input for Splunk User Behavior Analytics (UBA) to ensure the integrity of the log data being ingested and the overall health of the input process itself. Repeat this procedure at every service interval for each data source that is configured for UBA. This procedure is valid as of UBA version 5.3.0.
This article is part of the Splunk User Behavior Analytics Owner's Manual, which describes the recommended ongoing maintenance tasks that the owner of a UBA implementation should ensure are performed to keep their implementation functional. For more maintenance tasks, see the complete manual.
Why is this important?
Splunk User Behavior Analytics is powered by log data ingested from Splunk Enterprise. UBA receives specifically formatted log files from Splunk Enterprise and then translates them into its proprietary databases for machine learning analysis. The consistency of formatting of these log files is critical, as any change to the formatting of ingested logs can negatively impact the effectiveness of the UBA machine learning models.
In addition to the formatting of logs, the overall integrity of the ingested data is important to maintaining a healthy UBA environment. Because UBA relies on Splunk Enterprise for ingest, any change within Splunk Enterprise to details of the log data, such as the index name or source type, can negatively impact the models within UBA and render UBA output meaningless.
Because of this, it is important to regularly check that the data sources for UBA are formatted correctly and working properly to ensure the continued reliability of the UBA system.
Schedule
Every week
Prerequisites
You need an admin account to access the data input settings used in this task.
Notes and warnings
It is important to monitor changes to the Splunk data landscape continuously, including in the periods between integrity validation checks. UBA admins should maintain a continuous dialogue with Splunk Enterprise admins to stay informed of any changes to data housed within Splunk Enterprise that is consumed by UBA.
Procedure
Step 1: Check data source integrity
- On the main page of your UBA environment, in the menu bar at the top right of the page, click Manage, then select Data Sources from the drop-down menu that appears.
- On the Data Sources page, check for any data sources that have a status of 'Failed' in the data sources list. Any data source that has a failed status should be investigated and remediated to ensure the proper function of the UBA cluster.
If you require assistance with this activity, contact your Splunk account team to engage professional services support.
- On the Data Sources page, check for any data sources that have a status of 'Stopped' in the data sources list and perform the following steps for them:
  - Verify that the data source has been stopped intentionally. If it has been intentionally stopped, consider deleting it if it is no longer being used.
  - If the data source has not been stopped intentionally, investigate to determine the cause of the stoppage and then remediate to ensure the proper function of the UBA cluster.
    If you require assistance with this activity, contact your Splunk account team to engage professional services support.
- On the Data Sources page, perform the following steps for each data source that has a status of 'Processing':
  - In the Data Sources table, click the line for the individual data source.
  - On the Data Sources details page, look for a 'Skipped Events' panel. If there is one, calculate the skip ratio for the data source: divide the number in the 'Skipped Events' panel by the number in the 'Events' panel, then multiply the result by 100. If the skip ratio is greater than 30, record the name of the data source for remediation at the end of this activity.
    A skip ratio greater than 30 means that UBA is skipping more than 30 percent of the events for the data source, which might affect the accuracy of detections.
  - Check that the URL in the 'URL' panel points to the correct Splunk instance. This panel should contain the URL of a Splunk search head that can be accessed by the UBA cluster node. If the URL is incorrect, record the name of the data source for remediation at the end of this activity.
    UBA uses a Splunk search head to ingest data. If the incorrect URL is used, data might be missing from UBA.
  - Copy the SPL query presented in the 'Splunk Query' panel and run that search on the Splunk search head presented in the 'URL' panel. Verify that the search returns results from the correct source type, as expected based on the UBA data source name. If the search results are incorrect, record the name of the data source for remediation at the end of this activity.
    UBA runs searches using the SPL noted in the 'Splunk Query' panel. If this query is not correct, data ingestion is affected.
  - Click the back button to return to the main Data Sources page.
- After the previous steps have been completed for all data sources, move on to the next step.
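If you record the values from each data source as you work through this step, a short script can apply the status, skip ratio, and URL checks consistently. The following Python sketch is illustrative only. It assumes that you transcribe the values by hand from the Data Sources pages; the `DataSource` class, its field names, and the `expected_host` parameter are hypothetical and are not part of any UBA API. The SPL query check remains a manual step.

```python
from dataclasses import dataclass
from typing import List
from urllib.parse import urlparse

SKIP_RATIO_THRESHOLD = 30.0  # percent, the threshold used in this procedure


@dataclass
class DataSource:
    """Values transcribed by hand from the UBA Data Sources pages (hypothetical structure)."""
    name: str
    status: str              # 'Processing', 'Stopped', or 'Failed'
    events: int = 0          # number shown in the 'Events' panel
    skipped_events: int = 0  # number shown in the 'Skipped Events' panel, 0 if the panel is absent
    url: str = ""            # value shown in the 'URL' panel


def skip_ratio(source: DataSource) -> float:
    """Return skipped events as a percentage of events (0.0 when there are no events)."""
    if source.events == 0:
        return 0.0
    return source.skipped_events / source.events * 100


def review(source: DataSource, expected_host: str) -> List[str]:
    """Return the reasons, if any, that a data source needs follow-up."""
    reasons = []
    if source.status == "Failed":
        reasons.append("status is Failed: investigate and remediate")
    elif source.status == "Stopped":
        reasons.append("status is Stopped: confirm whether this was intentional")
    elif source.status == "Processing":
        ratio = skip_ratio(source)
        if ratio > SKIP_RATIO_THRESHOLD:
            reasons.append(f"skip ratio {ratio:.1f}% exceeds {SKIP_RATIO_THRESHOLD:.0f}%")
        # Compare only the host name, so the port in the URL panel is ignored.
        if urlparse(source.url).hostname != expected_host:
            reasons.append("URL does not point to the expected search head")
    return reasons
```

For example, with made-up values, `review(DataSource("example_source", "Processing", events=200, skipped_events=80, url="https://sh.example.com:8089"), "sh.example.com")` returns a single reason, because 80 skipped events out of 200 is a skip ratio of 40. Any data source with a non-empty result belongs on your follow-up list.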
Step 2: Check identity collection integrity
- On the Data Sources page, review the list of data sources and find the line that has the words 'HR Data' in the Type column. Click that line.
- On the Data Sources details page for the HR data source, copy the query that is presented in the Query panel and run that search on the Splunk search head presented in the URL panel.
- Review the results of the search. Consult with your HR team to identify an employee who started recently, and verify that the HR record for that employee is present in the search results. If the new employee record is not present, record the name of the HR Data source for remediation.
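When the HR search returns many rows, checking for one employee by eye is error-prone. As a rough sketch, assuming you export the search results to CSV, the following Python function looks for a value in a named column. The column name `employee_id` and the sample values are hypothetical; use whichever field your HR query actually returns.

```python
import csv
import io


def employee_in_results(csv_text: str, column: str, value: str) -> bool:
    """Return True if any row of the exported results has `value` in `column`.

    The comparison ignores case and surrounding whitespace.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    wanted = value.strip().lower()
    return any((row.get(column) or "").strip().lower() == wanted for row in reader)


# Made-up sample standing in for an exported result set.
sample = "employee_id,department\nE1001,Finance\nE1002,Engineering\n"
print(employee_in_results(sample, "employee_id", "e1002"))
print(employee_in_results(sample, "employee_id", "E9999"))
```

With the sample above, the first call finds the record and the second does not. A missing record for a recent starter is the condition that marks the HR Data source for remediation.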
Step 3: Remediation
Delete all data sources identified for remediation. Then, recreate them based on the best practice instructions provided in Get data into Splunk UBA.
If you require assistance with this activity, contact your Splunk account team to engage professional services support.
Next steps
These resources might help you understand and implement this guidance:
- Splunk Docs: Administer Splunk User Behavior Analytics
- Product Tip: Splunk User Behavior Analytics Owner's Manual