Algorithmically generated domain names

Information entropy measures how much randomness is present in a string, and a highly random domain name is often an indicator of a malicious site. You hypothesize that malware in your network is using algorithmically generated domain names to communicate with other malicious infrastructure, and you want to see which unusually random domains the hosts on your network are contacting.
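To make the measure concrete, here is a minimal Python sketch of character-level Shannon entropy. It only illustrates the idea and is not the URL Toolbox implementation; the function name `shannon_entropy` and the sample strings are assumptions chosen for this example.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of text in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    # H = sum(p * log2(1 / p)) over the relative frequency p of each character.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Hypothetical sample strings, not taken from real traffic.
for sample in ("aaaaaaaa", "abababab", "abcdefgh"):
    print(sample, round(shannon_entropy(sample), 3))
# aaaaaaaa 0.0
# abababab 1.0
# abcdefgh 3.0
```

The score rises as the characters become more varied and more evenly distributed, which is why long, random-looking domain names stand out.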

Required data

Firewall data. This sample search uses Palo Alto Networks data. You can replace this source with any other firewall data used in your organization.

Procedure

Run the following search. You can optimize it by specifying an index and adjusting the time range.

sourcetype=pan:threat url=*
| stats count values(src_ip) AS src_ip BY url
| eval list="mozilla"
| `ut_parse_extended(url, list)`
| where len(ut_domain)>10
| `ut_shannon(ut_domain)`
| stats count perc90(ut_shannon) AS perc90_sha values(ut_domain) AS domain_samples BY src_ip
| search perc90_sha>3.5
| eval domain_samples = mvindex(domain_samples, 1,5)
| sort - count

Search explanation

The table provides an explanation of what each part of this search achieves. You can adjust this query based on the specifics of your environment.

Splunk Search	Explanation
`sourcetype=pan:threat`	Search only threat events from Palo Alto Networks data.
`url=*`	Return only events that have a value in the `url` field.
`| stats count values(src_ip) AS src_ip BY url`	Count the events for each URL and list the distinct source IP addresses that requested it in a column called `src_ip`.
`| eval list="mozilla"`	Create a field called `list` with the value `mozilla` so that the URLs are parsed against the Mozilla top-level domain list. The next line in the search (`ut_parse_extended`) requires this field to work.
| `ut_parse_extended(url, list)`	Parse the URLs based on the Mozilla top-level domain list. The punctuation surrounding a Splunk macro is always a back tick (`), not a single quote (').
| `ut_shannon(ut_domain)`	Calculate the Shannon entropy score for each parsed domain (`ut_domain`).
`| stats count perc90(ut_shannon) AS perc90_sha values(ut_domain) AS domain_samples BY src_ip`	Count the URLs associated with each source IP address. Calculate the Shannon entropy value at the 90th percentile for the domains from each source IP address and display it in a column called `perc90_sha`. Return the distinct values of the domain (`ut_domain`) field for each source IP address and display them in a column called `domain_samples`.
`| search perc90_sha>3.5`	Keep only the source IP addresses whose 90th-percentile entropy score is greater than 3.5.
`| eval domain_samples = mvindex(domain_samples, 1,5)`	Return up to five domains from each multivalue `domain_samples` field. Because `mvindex` is zero-based, indexes 1 through 5 return the second through sixth values.
`| sort - count`	Sort the table so that the source IP addresses with the highest count appear first.
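The same flow can be sketched outside Splunk to show how the pieces fit together. The Python below is an illustration under stated assumptions, not a port of URL Toolbox: `naive_domain` simply keeps the last two host labels instead of consulting the Mozilla list, `percentile_nearest_rank` uses a nearest-rank rule that may differ from how Splunk computes `perc90`, and the events are hypothetical.

```python
import math
from collections import Counter, defaultdict
from urllib.parse import urlsplit


def shannon_entropy(text):
    """Character-level Shannon entropy in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    return sum((n / total) * math.log2(total / n) for n in Counter(text).values())


def naive_domain(url):
    """Assumption: approximate the parsed domain as the last two host labels."""
    host = urlsplit(url if "//" in url else "//" + url).hostname or ""
    return ".".join(host.split(".")[-2:])


def percentile_nearest_rank(values, pct):
    """Assumption: nearest-rank percentile over a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def suspicious_sources(events, threshold=3.5, min_len=10):
    """events is an iterable of (src_ip, url) pairs."""
    # Collapse events to distinct URLs, remembering which IPs requested each.
    ips_by_url = defaultdict(set)
    for src_ip, url in events:
        ips_by_url[url].add(src_ip)

    # Score each sufficiently long domain and attribute it to every requesting IP.
    scored_by_ip = defaultdict(list)
    for url, ips in ips_by_url.items():
        domain = naive_domain(url)
        if len(domain) <= min_len:
            continue
        for ip in ips:
            scored_by_ip[ip].append((domain, shannon_entropy(domain)))

    # Keep IPs whose 90th-percentile score exceeds the threshold.
    rows = []
    for ip, scored in scored_by_ip.items():
        p90 = percentile_nearest_rank([score for _, score in scored], 90)
        if p90 > threshold:
            samples = sorted({domain for domain, _ in scored})[:5]
            rows.append({"src_ip": ip, "count": len(scored),
                         "perc90_sha": p90, "domain_samples": samples})
    return sorted(rows, key=lambda row: row["count"], reverse=True)


# Hypothetical events for illustration only.
events = [
    ("10.0.0.5", "http://x7k2q9vbz4mw1p.com/a"),
    ("10.0.0.5", "http://r8t3n6yhd5jc0f.net/b"),
    ("10.0.0.9", "https://www.example.com/index.html"),
]
for row in suspicious_sources(events):
    print(row["src_ip"], row["count"], row["domain_samples"])
```

With these sample events, only 10.0.0.5 is reported, because `example.com` scores well below 3.5.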

Next steps

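The results list the source IP addresses that are contacting highly random domain names, along with sample domains for each address. You can investigate any of the domains, or set up alerts for a certain Shannon entropy threshold so you know when users are accessing sites that are more likely to be malicious. High entropy is an indicator rather than proof, so expect some legitimate domains, such as content delivery network hostnames, in the results.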
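When choosing an alert threshold, it can help to know the ceiling for a given string length: a string of n characters scores at most log2(n) bits, and only when every character is distinct. The short Python sketch below computes the shortest string that can exceed a threshold. The helper names `max_entropy` and `min_length_to_exceed` are hypothetical and exist only for this illustration.

```python
import math


def max_entropy(length: int) -> float:
    """Upper bound on character-level Shannon entropy for a string of this length."""
    return math.log2(length) if length > 0 else 0.0


def min_length_to_exceed(threshold: float) -> int:
    """Smallest string length whose maximum possible entropy is above threshold."""
    length = 1
    while max_entropy(length) <= threshold:
        length += 1
    return length


print(min_length_to_exceed(3.5))  # 12: log2(11) is about 3.46, log2(12) is about 3.58
```

Under this bound, a threshold of 3.5 can only be exceeded by strings of at least 12 characters, so raising the threshold also raises the minimum length of the domains an alert can catch.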

Finally, you might be interested in other processes associated with these use cases: