Know Your Customer (KYC) standards are used in many Financial Services Industry (FSI) institutions. KYC processes include establishing customer identities, understanding the nature of customers’ activities, validating the legitimacy of customer funds, and assessing customer risk. Regulations in most countries demand KYC implementation to make sure controls, processes, and procedures are in place to identify bad actors and to protect legitimate customers. You want to use Splunk software to simplify and automate some of the processes required to meet KYC requirements.
How to use Splunk software for this use case
Verify the identity
The first step in knowing your customer is to check for synthetic identities or accounts that could be used for money laundering, terrorist financing, or anything else a bad actor might try to accomplish. After collecting Personally Identifiable Information (PII) on your customers and implementing a synthetic identity checker, either off-the-shelf or custom-built, you can use Splunk software to do any of the following:
- Use the Splunk platform to monitor for errors, latency, and other troubleshooting issues with your synthetic identity checker.
- Use Splunk Infrastructure Monitoring and Splunk APM to make sure the infrastructure and transactions for the synthetic accounts run without issues.
- If the synthetic identity checker goes through web server logs, regardless of channel, use the Splunk platform to:
  - Monitor client IPs of the applicant to see if they are within the vicinity of their home address (assuming they are not on a foreign VPN).
  - Check if the client or other household members have tried to open accounts recently with the same FSI.
- Create rules as needed. For example, you can adjust thresholds to help discover new anomalies as new indicators of identity theft and synthetic identities arise.
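As an illustration of the web server log checks above, a search like the following sketch could flag households with an unusual number of recent account-opening attempts. The index, sourcetype, and field names (clientip, household_id) are assumptions, not standard fields; iplocation is the built-in SPL command that derives geographic fields from an IP address.

```
index=web sourcetype=account_opening
| iplocation clientip
| stats count AS attempts values(City) AS applicant_cities BY household_id
| where attempts > 3
```

The threshold of 3 is arbitrary; adjust it as you learn what normal application behavior looks like for your institution.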
Conduct due diligence
Verification of identity goes beyond just knowing that the customer is who they claim to be. Due diligence involves background checking, which establishes the value of the account and the risk level of the customer. For instance, someone who owns a large business or is an elected government leader has a different risk level than an individual not in the public eye who holds a small-value account.
Due diligence workflow logs can be monitored by the Splunk platform for analytics and troubleshooting. As the data is uncovered, you will most likely store it in a local database as a system of record for customer information. You can expose that data to the Splunk platform via lookup capabilities to enrich customer information in further investigations.
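Enriching search results from such a system of record might look like the following sketch, where the lookup name (customer_records) and its fields (customerID, risk_level, account_value) are hypothetical and would be defined against your local database:

```
index=banking sourcetype=transactions
| lookup customer_records customerID OUTPUT risk_level account_value
| where risk_level="high"
```

This pattern keeps the authoritative customer data in your system of record while letting every Splunk investigation benefit from it.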
First time seen for an activity and outlier detection are two excellent data points to monitor.
First time seen could mean:
- the first time the user tried to open an account
- the first time they logged into an account
- the first time they performed a transaction against the account
Each one of these is a noteworthy event when taken in context. For instance, if a prospective customer is applying to create a new account, it is worth checking whether they have applied before and when they first applied. This context matters if, for example, they were rejected for KYC regulation reasons the first time they applied.
Another example is the first time a customer performs a withdrawal against an account that has been dormant since opening: if almost all the money is withdrawn 18 months after the account was opened, the first time seen for a transaction is 18 months after opening, and it appears the customer decided to defund the account. Could this be an account takeover, or a holding position used to further launder money? In either case, first time seen is an important part of KYC.
Let’s get into how Search Processing Language (SPL) can be used to implement this.
Here is some SPL for the first time a customer is seen after opening an account. We treat the last_touched field as the first time they opened an account, and then find which customers took at least six months to perform any action at the bank. This is not necessarily bad behavior, but it adds to the customer's risk score.
index=transactions sourcetype=account_opening
| eval prev_epoch=strptime(last_touched, "%m/%d/%Y %H:%M:%S")
| sort - last_touched
| join customer [ search index=transactions sourcetype=banking ]
| where epoch>relative_time(prev_epoch, "+6mon")
| fields - prev_epoch, balance
| rename accountID AS current_accountID action AS current_action account_type AS current_account_type
| eval current_balance=tostring(round(current_balance, 2),"commas"), other_balance=tostring(round(other_balance, 2),"commas")
| convert timeformat="%m/%d/%Y %H:%M:%S" ctime(epoch) AS current_time
| fields - epoch
This may look a little involved, so let’s break it down.
- The first and second lines gather all account-opening transactions and convert the last_touched field, a human-readable timestamp, into epoch time, the number of seconds since January 1, 1970. It's easier to do timestamp math with integers than with human-readable text.
- Next, we join that data with current customer banking transactions. The where clause does the work for our outlier: it finds all events whose current epoch time is more than six months greater than the account-opening epoch time. This meets our criteria for "first time seen" for a customer who did not touch their account for at least six months after opening.
- The rest of the SPL is just formatting to make the output table prettier: it turns the epoch time back into a human-readable timestamp and converts the amounts involved into rounded, comma-formatted numbers.
Outliers are the other hallmark of continuous monitoring. SPL has a command called eventstats that calculates statistics over all events in the search as a whole without altering the results of the search. This is useful for finding the average and standard deviation of all events in a filtered dataset. Let's use this in an example.
index=payments
| stats avg(amount) AS avg_accountID BY accountID
| eventstats avg(avg_accountID) AS avg_amount stdev(avg_accountID) AS stdev_amount
| where avg_accountID>(3*stdev_amount + avg_amount)
In this search, we first calculate the average payment by account ID and then calculate the average amount and standard deviation of the amount value of all payments in a user selected or saved search selected time range, neither of which are shown here. The outlier is any payment that is greater than the average amount plus 3 times the standard deviation of all payments in the dataset. This is a rather simple way to find outliers in customer behavior, as you can also use an average with static multiplier, moving averages with standard deviation, and a host of other statistical techniques. For more advanced ways to detect outliers, you can use the free Splunk Machine Learning Toolkit (MLTK) and utilize machine learning methods to find outliers including Density Function, Local Outlier Factor, and One Class SVM.
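If you have the MLTK installed, a density-based version of the same check might look like the following sketch. The model name is hypothetical; fit and DensityFunction are MLTK commands, and the threshold parameter controls how rare a value must be to count as an outlier.

```
index=payments
| stats avg(amount) AS avg_amount BY accountID
| fit DensityFunction avg_amount threshold=0.01 into app:payment_outlier_model
```

Once trained, the saved model can score new data in later searches with | apply app:payment_outlier_model, which adds an outlier indicator field to each result.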
For both first time seen and simple outlier detection, hard coding numbers and field names into a search that you’ll use many times doesn't make sense. That’s where Splunk macros become useful. There are macros for first time seen and the outlier detection example on Splunkbase in a bundle called TA For SplunkStart Basic Security Essentials that you can download for free and extract from the macros.conf file.
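To give a flavor of what such a macro looks like, here is a sketch of a generic outlier macro as it might appear in macros.conf. The macro name and arguments here are illustrative, not the exact ones shipped in the add-on:

```
[outlier_by_stdev(2)]
args = field, multiplier
definition = eventstats avg($field$) AS avg_value stdev($field$) AS stdev_value | where $field$ > avg_value + $multiplier$ * stdev_value
```

A search would then invoke it as ... | `outlier_by_stdev(avg_amount, 3)`, keeping the field name and multiplier out of the saved search itself.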
When finding outliers, using eventstats over a total population is not a good idea when the population has different behaviors due to the intrinsic nature of how they perform routine transactions. For instance, one customer may routinely transfer 500 dollars per month via wire transfer, while another may routinely transfer 50,000 dollars per month. None of this behavior is out of the ordinary in respect to what they do on their own, but if you group the two customers together to find an average amount, it is meaningless and heavily skewed towards the bigger transfer.
To get around this with the outlier approach, it may be a good idea to regularly collect transactional data per customer on a daily or weekly basis to get a baseline. Here's an example of what you may get using the collect command to collect data via a scheduled saved search for an average amount transferred, stored in a summary index.
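A scheduled saved search that builds such a baseline might look like the following sketch. The source and summary index names are assumptions; collect is the SPL command that writes results into a summary index.

```
index=payments earliest=-1d@d latest=@d
| stats avg(amount) AS amount count AS transfer_count BY accountID
| collect index=payments_summary
```

Run once a day, this stores one summarized event per account per day, giving each customer their own baseline rather than one skewed population-wide average.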
In this example, you can now use stats to find the average amount transferred for any account ID to get a baseline of previous transactions and compare it to the most recent amount to see if there is an outlier. This allows you to perform continuous monitoring and compare recent transactions to expected behavior for your customer. Every time the customer performs a transaction, a saved search can add the new value to the summary index and also compare the current amount to the historical average for the customer to find an outlier.
Here’s a sample search for this situation that appends the average and standard deviation of the summary index for an account ID to the current transaction, and compares the current payment (average of payments in the current time range) with the historical averages plus a multiplier of the standard deviation.
index=payments accountID="456"
| append [ search index=payments_summary accountID="456" earliest=-1y | stats avg(amount) AS avg_payment stdev(amount) AS stdev_payment ]
| stats avg(amount) AS current_payment values(avg_payment) AS avg_payment values(stdev_payment) AS stdev_payment values(accountID) AS accountID
| where current_payment > avg_payment + (3*stdev_payment)
For efficiency, if you have millions of accounts, it may make more sense to store the summary data of transactions per day or per week for each account ID in the Splunk key-value store, called the KV Store. The KV Store is a general-purpose database that ships with the Splunk platform; it supports create, modify, and delete operations, and its data can be used to enrich other searches.
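Assuming a KV Store-backed lookup named payment_baselines has been defined (in collections.conf and transforms.conf), writing and reading baselines might look like this sketch:

```
index=payments earliest=-7d@d latest=@d
| stats avg(amount) AS avg_payment stdev(amount) AS stdev_payment BY accountID
| outputlookup payment_baselines
```

At search time, | lookup payment_baselines accountID OUTPUT avg_payment stdev_payment then enriches each transaction with the customer's baseline without re-scanning historical events.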
Outliers do not always indicate an issue. Each outlier should have a risk score associated with it that is further evaluated against all risk scores for the customer, such as those developed during due diligence. Accumulating risk scores by account ID reduces false positives and gives you more confidence that behavior is genuinely nefarious. Splunk software can help you apply risk scores to customer information in the following ways:
- Use a lookup to find the due diligence data, which in turn is fed into another lookup to find a numerical weight to multiply the initial risk score. As the search runs, it saves all its information to a risk index and it can initiate alerts, if necessary.
- Automate knowing your customers by continuously monitoring your customer for anomalies and outliers, adding risk scores to the results.
- If you have Splunk Enterprise Security, use the free, supported Splunk App For Fraud Analytics to discover account takeover and account abuse, which are two of the hallmarks of monitoring and protecting your customers. The app uses Splunk Enterprise Security's Risk-Based Alerting (RBA) for accumulated risk scores.
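The lookup-driven risk scoring described above might be sketched as follows. The lookup names, fields, and risk index here are hypothetical placeholders for your own due diligence data and weighting scheme:

```
index=payments
| lookup due_diligence accountID OUTPUT risk_category
| lookup risk_weights risk_category OUTPUT weight
| eval risk_score = base_risk_score * weight
| collect index=risk
```

Scheduled as a saved search, this accumulates weighted risk events per account in the risk index, where alerting can trigger once a customer's total score crosses a threshold.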
Knowing your customer for security and compliance reasons can also help you improve the customer experience and inform customer investment strategies. For example, if a deposit has increased to one hundred times the normal average, not only is this an outlier, but it may be a legitimate time to engage the customer about better returns on their investment, assuming the outlier is benign. Using insights from the Splunk platform in this additional way increases the value you get from your investment.
The content in this use case comes from a previously published blog, one of the thousands of Splunk resources available to help users succeed. These additional Splunk resources might help you understand and implement this specific use case:
- Use Case: Defining and detecting Personally Identifiable Information (PII) in log data
- Use Case: Using modern methods of detecting financial crime
- Blog: Mind the gap! Understanding end-to-end customer journeys to deliver great customer experience
- Blog: A Splunk approach to baselines, statistics and likelihoods on big data
- App: Splunk App For Fraud Analytics
- Add-on: TA For SplunkStart Basic Security Essentials