Splunk Lantern

Detecting data exfiltration activities

When attackers set out to identify and exfiltrate data from a target organization, their attacks involve three main activities: identifying the data, collecting it, and staging it for exfiltration.

  • Identification includes scanning systems and observing user activity.
  • Collection includes the transfer of large amounts of data from various repositories.
  • Staging, or preparation, includes moving data to a central location and compressing it, and optionally encoding or encrypting it.

These searches allow you to detect and monitor suspicious behavior related to these activities.

How to use Splunk software for this use case

► Email files written outside of the Outlook directory

 

To complete this process, your deployment needs to ingest information on filesystem activity from your hosts. This data typically comes from endpoint detection and response (EDR) products, such as Carbon Black, or from other endpoint data sources, such as Sysmon. You should also ensure you are ingesting normalized endpoint data, populating the Filesystem node of the Endpoint data model in the Common Information Model (CIM). For information on installing and using the CIM, see the Common Information Model documentation.

This search detects email files created outside of the normal Outlook directory.

False positives might occur because administrators and users sometimes back up their email data by moving email files into a different folder. This search detects those moves as well.
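
The filter in the search below can be sketched outside of Splunk to check which paths it would flag. This is a minimal Python illustration, not the search itself: the helper names (wildcard_match, is_suspicious) are hypothetical, and case-insensitive matching is an assumption made for the sketch.

```python
import re

# Patterns copied from the WHERE clause of the search; "*" matches any run of characters.
EMAIL_FILE_PATTERNS = ["*.pst", "*.ost"]
EXPECTED_PATH_PATTERNS = [
    r"C:\Users\*\My Documents\Outlook Files\*",
    r"C:\Users\*\AppData\Local\Microsoft\Outlook*",
]


def wildcard_match(pattern, value):
    """Return True if value matches pattern, where "*" is the only wildcard.

    Assumption: matching is case-insensitive.
    """
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, value, flags=re.IGNORECASE | re.DOTALL) is not None


def is_suspicious(file_name, file_path):
    """Flag .pst/.ost files whose path is outside the usual Outlook directories."""
    is_email_file = any(wildcard_match(p, file_name) for p in EMAIL_FILE_PATTERNS)
    in_expected_dir = any(wildcard_match(p, file_path) for p in EXPECTED_PATH_PATTERNS)
    return is_email_file and not in_expected_dir


if __name__ == "__main__":
    print(is_suspicious("archive.pst", r"C:\Temp\archive.pst"))
    print(is_suspicious(
        "archive.pst",
        r"C:\Users\alice\AppData\Local\Microsoft\Outlook\archive.pst",
    ))
```

A legitimate backup copied to a folder such as C:\Temp is flagged in the same way as a malicious copy, which is why results from this search need review before escalation.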

| tstats summariesonly=false allow_old_summaries=true count, values("Filesystem.file_path") AS file_path, min(_time) AS firstTime, max(_time) AS lastTime FROM datamodel=Endpoint.Filesystem WHERE (("Filesystem.file_name"=*.pst OR "Filesystem.file_name"=*.ost) "Filesystem.file_path"!="C:\\Users\\*\\My Documents\\Outlook Files\\*" "Filesystem.file_path"!="C:\\Users\\*\\AppData\\Local\\Microsoft\\Outlook*") BY "Filesystem.action", "Filesystem.process_id", "Filesystem.file_name", "Filesystem.dest" 
| rename "Filesystem.*" AS "*" 
| convert timeformat="%Y-%m-%dT%H:%M:%S" ctime(firstTime) 
| convert timeformat="%Y-%m-%dT%H:%M:%S" ctime(lastTime)
► Hosts receiving a high volume of network traffic from an email server

To complete this process, your deployment needs to ingest network traffic data. Your email servers must be categorized as "email_server" for the search to work. You should also ensure you are ingesting normalized network traffic data, populating the All_Traffic node of the Network_Traffic data model in the Common Information Model (CIM). For information on installing and using the CIM, see the Common Information Model documentation.

This search looks for an increase in data transferred from your email server to your clients, summed per client (src_ip) per day. Such an increase could indicate that an attacker is collecting data through your email server.

False positives will vary based on how you set the deviation_threshold and minimum_data_samples values. You should adjust these values based on your network traffic to and from your email servers. The deviation_threshold field is a multiplying factor that controls how much variation you're willing to tolerate: a host is flagged only when its traffic exceeds the average by more than deviation_threshold standard deviations. The minimum_data_samples field is the minimum number of data samples (daily totals for a host) required for the statistic to be valid.
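
To see how these two values interact, the thresholding logic of the search below can be sketched in Python. This is an illustration under stated assumptions, not part of the search: the function name is_outlier and its arguments are hypothetical, and the sketch returns False when a host has fewer than two prior daily totals, because a sample standard deviation cannot be computed from a single value.

```python
from statistics import mean, stdev


def is_outlier(today_bytes, source_history, all_samples,
               deviation_threshold=3, minimum_data_samples=4):
    """Mirror the two-baseline test used by the search.

    today_bytes:     today's daily total for one host
    source_history:  that host's daily totals before today
    all_samples:     daily totals for all hosts and all days, today included
    """
    # The search counts every daily bucket for the host, including today's.
    num_data_samples = len(source_history) + 1
    if num_data_samples < minimum_data_samples or len(source_history) < 2:
        return False

    # Baseline across all hosts.
    overall_limit = mean(all_samples) + deviation_threshold * stdev(all_samples)
    # Baseline for this host alone, excluding today.
    source_limit = mean(source_history) + deviation_threshold * stdev(source_history)

    # Both baselines must be exceeded for the host to be reported.
    return today_bytes > overall_limit and today_bytes > source_limit


if __name__ == "__main__":
    history = [100, 110, 90, 105]   # made-up daily totals for one host
    others = [100] * 20             # made-up daily totals for other hosts
    print(is_outlier(5000, history, history + [5000] + others))
    print(is_outlier(120, history, history + [120] + others))
```

Raising deviation_threshold reduces alerts at the cost of missing smaller spikes. Raising minimum_data_samples suppresses alerts for hosts that have too little history to form a meaningful baseline.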

| tstats summariesonly=false allow_old_summaries=true sum("All_Traffic.bytes_in") AS bytes_in FROM datamodel=Network_Traffic WHERE "All_Traffic.dest_category"=email_server BY "All_Traffic.src_ip", _time span=1d 
| rename "All_Traffic.*" AS "*" 
| eventstats avg(bytes_in) AS avg_bytes_in stdev(bytes_in) AS stdev_bytes_in 
| eventstats count AS num_data_samples avg(eval(if(_time < relative_time(now(), "@d"), bytes_in, null))) AS per_source_avg_bytes_in stdev(eval(if(_time < relative_time(now(), "@d"), bytes_in, null))) AS per_source_stdev_bytes_in BY src_ip 
| WHERE ('_time' >= relative_time(now(),"@d")) 
| eval minimum_data_samples=4, deviation_threshold=3 
| WHERE (((bytes_in > ((deviation_threshold * stdev_bytes_in) + avg_bytes_in)) AND (bytes_in > ((deviation_threshold * per_source_stdev_bytes_in) + per_source_avg_bytes_in))) AND (num_data_samples >= minimum_data_samples)) 
| eval num_standard_deviations_away_from_server_average=round((abs((bytes_in - avg_bytes_in)) / stdev_bytes_in),2), num_standard_deviations_away_from_client_average=round((abs((bytes_in - per_source_avg_bytes_in)) / per_source_stdev_bytes_in),2) 
| table src_ip, _time, bytes_in, avg_bytes_in, per_source_avg_bytes_in, num_standard_deviations_away_from_server_average, num_standard_deviations_away_from_client_average
 
► Email servers sending a high volume of traffic to hosts

To complete this process, your deployment needs to ingest network traffic data. Your email servers must be categorized as "email_server" for the search to work. You should also ensure you are ingesting normalized network traffic data, populating the All_Traffic node of the Network_Traffic data model in the Common Information Model (CIM). For information on installing and using the CIM, see the Common Information Model documentation.

This search looks for an increase in data transferred from your email server to your clients, summed per destination host (dest_ip) per day. Such an increase could indicate that a malicious actor is collecting data through your email server.

False positives will vary based on how you set the deviation_threshold and minimum_data_samples values. You should adjust these values based on your network traffic to and from your email servers. The deviation_threshold field is a multiplying factor that controls how much variation you're willing to tolerate: a host is flagged only when its traffic exceeds the average by more than deviation_threshold standard deviations. The minimum_data_samples field is the minimum number of data samples (daily totals for a host) required for the statistic to be valid.
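
When tuning deviation_threshold, it helps to read the num_standard_deviations_away_from_server_average and num_standard_deviations_away_from_client_average fields that the search below reports. The following Python sketch shows the same arithmetic. The function name is hypothetical, and returning None for a zero standard deviation is an assumption of this sketch rather than documented search behavior.

```python
def deviations_from_average(value, average, std_dev):
    """Distance of value from average, in standard deviations, rounded to 2 places."""
    if not std_dev:
        # Assumption: treat a zero or missing standard deviation as "no result".
        return None
    return round(abs(value - average) / std_dev, 2)


if __name__ == "__main__":
    # Made-up numbers: a 5000-byte day against an average of 1000 and a spread of 800.
    print(deviations_from_average(5000, 1000, 800))
```

If the flagged hosts in your environment cluster just above the current threshold, that suggests the threshold is close to normal variation and might need to be raised.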

| tstats summariesonly=false allow_old_summaries=true sum("All_Traffic.bytes_out") AS bytes_out FROM datamodel=Network_Traffic WHERE "All_Traffic.src_category"=email_server BY "All_Traffic.dest_ip", _time span=1d 
| rename "All_Traffic.*" AS "*" 
| eventstats avg(bytes_out) AS avg_bytes_out stdev(bytes_out) AS stdev_bytes_out 
| eventstats count AS num_data_samples avg(eval(if(_time < relative_time(now(), "@d"), bytes_out, null))) AS per_source_avg_bytes_out stdev(eval(if(_time < relative_time(now(), "@d"), bytes_out, null))) AS per_source_stdev_bytes_out BY dest_ip 
| WHERE ('_time' >= relative_time(now(),"@d")) 
| eval minimum_data_samples=4, deviation_threshold=3 
| WHERE (((bytes_out > ((deviation_threshold * stdev_bytes_out) + avg_bytes_out)) AND (bytes_out > ((deviation_threshold * per_source_stdev_bytes_out) + per_source_avg_bytes_out))) AND (num_data_samples >= minimum_data_samples)) 
| eval num_standard_deviations_away_from_server_average=round((abs((bytes_out - avg_bytes_out)) / stdev_bytes_out),2), num_standard_deviations_away_from_client_average=round((abs((bytes_out - per_source_avg_bytes_out)) / per_source_stdev_bytes_out),2) 
| table dest_ip, _time, bytes_out, avg_bytes_out, per_source_avg_bytes_out, num_standard_deviations_away_from_server_average, num_standard_deviations_away_from_client_average
 
Next steps

The content in this article comes from Splunk Enterprise Security (ES). As a Splunk premium security solution, ES solves a wide range of security analytics and operations use cases including continuous security monitoring, advanced threat detection, compliance, incident investigation, forensics and incident response. Splunk ES delivers an end-to-end view of an organization's security posture with flexible investigations, unmatched performance, and the most flexible deployment options offered in the cloud, on-premises, or hybrid deployment models. If you have questions about this use case, see the Security Research team's support options on GitHub.

Still need help with this use case? Most customers have OnDemand Services per their license support plan. Engage the ODS team at OnDemand-Inquires@splunk.com if you require assistance.