Algorithmically generated domain names
Information entropy measures how much randomness is present in a string, and a highly random domain name in a URL is often an indicator of a malicious site. You hypothesize that malware in your network is using randomized domain names to communicate with other malicious infrastructure. You want to see which unusually random domains are being contacted from your network.
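To make the idea of an entropy score concrete, here is a minimal Python sketch of Shannon entropy over the characters of a string. It assumes a base-2 logarithm (bits per character); the function name and the sample domains are made up for illustration, and the URL Toolbox macro used in the search might differ in implementation details.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of text in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    # Sum -p * log2(p) over the relative frequency p of each distinct character.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    # Hypothetical samples: a readable name and a random-looking one.
    for name in ("google.com", "xkqz7v2p9wq4jh3r.com"):
        print(f"{name}: {shannon_entropy(name):.2f}")
```

A string made of one repeated character scores 0, and the score rises as characters become more varied and evenly distributed, which is why random-looking domains tend to score higher than readable ones.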
Required data
Firewall data. This sample search uses Palo Alto Networks data. You can replace this source with any other firewall data used in your organization.
Procedure
Run the following search. You can optimize it by specifying an index and adjusting the time range.
sourcetype=pan:threat url=* | stats count values(src_ip) AS src_ip BY url | eval list="mozilla" | `ut_parse_extended(url, list)` | where length(ut_domain)>10 | `ut_shannon(ut_domain)` | stats count perc90(ut_shannon) AS perc90_sha values(ut_domain) AS domain_samples BY src_ip | search perc90_sha>3.5 | eval domain_samples=mvindex(domain_samples, 1, 5) | sort - count
Search explanation
The table provides an explanation of what each part of this search achieves. You can adjust this query based on the specifics of your environment.
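| Splunk Search | Explanation |
|---|---|
| sourcetype=pan:threat | Search only threat events from Palo Alto Networks data. |
| url=* | Search data with a value in the url field. |
| stats count values(src_ip) AS src_ip BY url | Show the event count for each value in the url field, along with the source IP addresses that requested it. |
| eval list="mozilla" | Use the Mozilla catalog of top-level domains. The list field is passed to the ut_parse_extended macro in the next step. |
| `ut_parse_extended(url, list)` | Parse the URLs based on the Mozilla top-level domain list. The punctuation surrounding a Splunk macro is always a back tick (`), not a single quote ('). |
| `ut_shannon(ut_domain)` | Calculate the Shannon entropy score for the domain (ut_domain) parsed from each URL. |
| stats count perc90(ut_shannon) AS perc90_sha values(ut_domain) AS domain_samples BY src_ip | Count the number of times each source IP address appeared. Calculate the Shannon entropy value at the 90th percentile for the domains requested by each source IP address and display it in a column called perc90_sha. Return the values in the domain (ut_domain) field for each source IP address and display them in a column called domain_samples. |
| search perc90_sha>3.5 | Keep only the source IP addresses whose 90th-percentile entropy score is greater than 3.5. |
| eval domain_samples = mvindex(domain_samples, 1,5) | Return five domains (zero-based index positions 1 through 5) from each multivalue domain_samples field. |
| sort - count | Sort the table with the most commonly occurring source IP address first. |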
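For readers who want to reason about the search outside Splunk, the following Python sketch mirrors its logic on a small in-memory list of events. It is an approximation under stated assumptions, not the URL Toolbox implementation: the hostname stands in for ut_domain (no Mozilla suffix-list parsing), the percentile uses the nearest-rank method (Splunk's perc90 might compute it differently), and all function names and sample events are hypothetical.

```python
import math
from collections import Counter, defaultdict
from urllib.parse import urlsplit


def shannon_entropy(text):
    """Shannon entropy of text in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in Counter(text).values())


def hostname(url):
    """Host part of a URL; an assumed stand-in for ut_domain."""
    if "://" not in url:
        url = "//" + url  # let urlsplit treat a scheme-less value as a network location
    return urlsplit(url).hostname or ""


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile of a non-empty list (assumption, see lead-in)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def suspicious_sources(events, min_len=10, threshold=3.5):
    """events: iterable of (src_ip, url) pairs. Returns rows sorted by count, descending."""
    # Like "stats count values(src_ip) AS src_ip BY url": one record per distinct URL.
    sources_by_url = defaultdict(set)
    for src_ip, url in events:
        sources_by_url[url].add(src_ip)

    scores = defaultdict(list)  # src_ip -> entropy score of each requested domain
    domains = defaultdict(set)  # src_ip -> distinct domains requested
    for url, sources in sources_by_url.items():
        domain = hostname(url)
        if len(domain) <= min_len:  # like "where length(ut_domain)>10"
            continue
        score = shannon_entropy(domain)
        for src_ip in sources:
            scores[src_ip].append(score)
            domains[src_ip].add(domain)

    rows = []
    for src_ip, values in scores.items():
        p90 = percentile_nearest_rank(values, 90)
        if p90 > threshold:  # like "search perc90_sha>3.5"
            rows.append({
                "src_ip": src_ip,
                "count": len(values),
                "perc90_sha": p90,
                # like mvindex(domain_samples, 1, 5): zero-based positions 1 through 5
                "domain_samples": sorted(domains[src_ip])[1:6],
            })
    return sorted(rows, key=lambda row: row["count"], reverse=True)


if __name__ == "__main__":
    sample_events = [  # hypothetical data
        ("10.0.0.5", "xkqz7v2p9wq4jh3r.com/a"),
        ("10.0.0.5", "http://p0o9i8u7y6t5r4e3w2.net/b"),
        ("10.0.0.9", "www.example.com/index.html"),
    ]
    for row in suspicious_sources(sample_events):
        print(row)
```

Because the sample slice starts at position 1, a source IP address with only one qualifying domain ends up with an empty sample list in this sketch.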
Next steps
The results show the source IP addresses on your network that accessed highly random domain names, along with samples of those domains. You can investigate any of the domains, or set up alerts for a certain Shannon entropy threshold so you know when users are accessing sites that have a high probability of being malicious.