Splunk Lantern

Detecting Personally Identifiable Information (PII) in log data

The General Data Protection Regulation (GDPR) is the European Union’s framework for protecting the security and privacy of Personally Identifiable Information (PII). GDPR took effect in May 2018, and it applies to any legal entity that stores, controls, or processes the personal data of individuals in the EU. It focuses on two categories of data:

  1. Personal data, such as an IP address or username
  2. Sensitive personal data, such as biometric or genetic data

Assets and applications that are in scope for GDPR store and process personal data. Storage and processing are generally intentional, but some software inadvertently writes sensitive information to log files, exposing it to anyone who reviews those files and creating a compliance failure. Splunk Enterprise Security itself can also represent a risk under GDPR, because the log data and events it holds might contain PII.

GDPR permits retaining data for “legitimate interest” (as per Article 6), which may allow the retention of log files for security purposes. You therefore need an effective security control to ensure that any data retained for legitimate business interests is protected and handled according to proper data guidance. If your business suffers a personal data breach that affects individuals, those individuals have the right to demand compensation for material and non-material damage caused by the breach. Your business would need to prove that it understood and addressed the risk appropriately and deployed proper countermeasures (as per Article 82 of the GDPR) to protect in-scope data. Demonstrating that you adhered to best practice (that is, that only authorized individuals accessed personal data, and only through proper data controls) can help mitigate the potential impact on your organization.

How to use Splunk Enterprise Security to detect PII

You can use Splunk Enterprise Security use cases to help keep in-scope systems GDPR compliant. Run or schedule the following search to detect PII in log files. You can optimize it by specifying an index and adjusting the time range:

NOT sourcetype=stash 
| `get_integer_seq` 
| lookup luhn_lite_lookup integer_seq OUTPUTNEW pii,pii_clean 
| eval pii_length=len(pii_clean) 
| lookup iin_lookup iin as pii_clean,length as pii_length OUTPUTNEW iin_issuer 
| search iin_issuer=* 
| `get_event_id` 
| fields + event_id,_raw,host,pii,iin_issuer 
| eval pii_hash=sha1(pii)
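
To make the logic of the search easier to follow, here is a minimal Python sketch of the same pipeline: extract candidate digit sequences, keep those that pass a Luhn checksum, match the cleaned number against an issuer table by prefix and length, and hash the match. It assumes, based only on their names, that `luhn_lite_lookup` applies a Luhn check and that `iin_lookup` maps an issuer identification number (IIN) prefix and a length to an issuer. The regular expression, the three-row `IIN_TABLE`, and all function names are hypothetical stand-ins and do not reproduce what Splunk Enterprise Security ships.

```python
import hashlib
import re

# Hypothetical, deliberately tiny issuer table: (IIN prefix, total length) -> issuer.
# The real iin_lookup in Splunk Enterprise Security is far larger; this is illustrative only.
IIN_TABLE = {
    ("4", 16): "Visa",
    ("34", 15): "American Express",
    ("37", 15): "American Express",
}

# Assumed candidate pattern: 13 to 19 digits, optionally separated by single spaces or hyphens.
CANDIDATE = re.compile(r"(?<!\d)\d(?:[ -]?\d){12,18}(?!\d)")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:      # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9      # same as summing the two digits of the product
        total += d
    return total % 10 == 0


def find_issuer(digits: str):
    """Match the cleaned number against the issuer table by prefix and length."""
    for (prefix, length), issuer in IIN_TABLE.items():
        if len(digits) == length and digits.startswith(prefix):
            return issuer
    return None


def scan_event(raw: str):
    """Return one dict per suspected card number found in a raw log event."""
    hits = []
    for match in CANDIDATE.finditer(raw):
        pii = match.group(0)
        pii_clean = re.sub(r"[ -]", "", pii)
        if not luhn_valid(pii_clean):
            continue
        issuer = find_issuer(pii_clean)
        if issuer is None:
            continue
        hits.append({
            "pii": pii,
            "iin_issuer": issuer,
            # Like the search, hash the matched value so it can be referenced without repeating it.
            "pii_hash": hashlib.sha1(pii.encode("utf-8")).hexdigest(),
        })
    return hits
```

Calling `scan_event("paid with 4111 1111 1111 1111")` on the well-known Visa test number returns one hit with `iin_issuer` set to `"Visa"`. A number that fails the checksum, or one that does not match any row of the table, produces no hit, which mirrors the `search iin_issuer=*` step in the search.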

Investigate why and how the identified application transmits PII. For in-house or custom apps, you can mask the data or remove it from logs if it is not needed for critical business purposes.
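
For an in-house Python application, masking at the source could look roughly like the sketch below, which attaches a filter to the standard `logging` module so that card-like digit sequences are redacted before a record is written. The pattern, the `[REDACTED]` placeholder, and the names `mask_card_numbers` and `MaskingFilter` are assumptions made for illustration, and the pattern is intentionally broad: it will also redact other 13 to 19 digit values, such as long identifiers.

```python
import logging
import re

# Assumed pattern: 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_LIKE = re.compile(r"(?<!\d)\d(?:[ -]?\d){12,18}(?!\d)")


def mask_card_numbers(text: str) -> str:
    """Replace every card-like digit sequence with a fixed placeholder."""
    return CARD_LIKE.sub("[REDACTED]", text)


class MaskingFilter(logging.Filter):
    """Logging filter that redacts card-like numbers from each record's message."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Render the message with its arguments first, so values passed as
        # arguments are masked too, then store the masked text on the record.
        record.msg = mask_card_numbers(record.getMessage())
        record.args = ()
        return True  # keep the record; only its content changes
```

Attaching the filter with `logger.addFilter(MaskingFilter())` applies it to everything logged through that logger. Pairing the pattern with a checksum test would reduce false positives, at the cost of letting malformed numbers through unmasked.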

Next steps