Your organization uses NetFlow data, which you are ingesting into Splunk. These data show a handful of machines reaching out via HDFS (Hadoop Distributed File System) to your Hadoop cluster, randomly querying for specific files that don't exist.
A Splunk customer ran the following search (scoped to the index or sourcetype that holds the NetFlow data):
protocol=hdfs response=404 | rare filename
The rare command lists the least common values of filename, so file names that only a few hosts ever asked for, and that do not exist, rise to the top, while a file that many hosts fail to find (for example a dataset that was moved) stays buried.
After running this search, the customer found that the machines in question had malware on them and were searching for sensitive documents with names like salary.xls and personal.doc. The customer's endpoint detection and response solution did not detect the malware. Running this search improved their mean time to respond.
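To make the logic of that search concrete, here is an added example in plain Python that is not part of the original page and is not Splunk's own code. It takes constructed, key=value style events (an assumed format; the page only names the fields protocol, response and filename, and src is assumed as the host field), keeps HDFS 404s, ranks the requested file names rarest first the way `rare filename` does, and then lists which hosts asked for the rare ones, so a single host probing for salary.xls and personal.doc stands apart from three ETL hosts all missing the same dataset.

```python
"""Flag hosts that request rare, nonexistent files over HDFS.

Added example, not from the original page or from Splunk. It reproduces in
plain Python the idea behind the search described above: take 404 responses
for protocol=hdfs, rank filenames by how rarely they were requested, then
list which source hosts asked for the rare ones. The event format and the
field name 'src' are assumptions.
"""
from collections import Counter, defaultdict

# Constructed sample events (not real data) in an assumed key=value layout.
SAMPLE_EVENTS = """\
src=10.0.1.15 protocol=hdfs filename=/data/etl/part-00001.parquet response=200
src=10.0.1.15 protocol=hdfs filename=/data/etl/part-00002.parquet response=200
src=10.0.1.22 protocol=hdfs filename=/user/finance/salary.xls response=404
src=10.0.1.22 protocol=hdfs filename=/user/hr/personal.doc response=404
src=10.0.1.22 protocol=hdfs filename=/user/hr/passwords.txt response=404
src=10.0.1.30 protocol=hdfs filename=/data/etl/part-00003.parquet response=404
src=10.0.1.31 protocol=hdfs filename=/data/etl/part-00003.parquet response=404
src=10.0.1.32 protocol=hdfs filename=/data/etl/part-00003.parquet response=404
src=10.0.1.40 protocol=hdfs filename=/user/finance/salary.xls response=404
src=10.0.1.40 protocol=https filename=/index.html response=404
"""


def parse_event(line):
    """Parse one 'key=value key=value ...' line into a dict."""
    fields = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def load_events(text=SAMPLE_EVENTS):
    """Parse every non-empty line of the sample into an event dict."""
    return [parse_event(line) for line in text.splitlines() if line.strip()]


def rare_missing_files(events, protocol="hdfs", response="404"):
    """Return (filename, count) pairs for failed lookups, rarest first.

    Mirrors the behaviour of Splunk's `rare filename`: ascending by count,
    with ties broken by name so the ordering is deterministic.
    """
    counts = Counter(
        e["filename"]
        for e in events
        if e.get("protocol") == protocol and e.get("response") == response
    )
    return sorted(counts.items(), key=lambda kv: (kv[1], kv[0]))


def hosts_probing_rare_files(events, max_count=2, protocol="hdfs", response="404"):
    """Map each source host to the rare missing files it requested.

    A file that many hosts fail to find is usually an operational problem
    and is dropped by the max_count threshold; a nonexistent file that only
    one or two hosts ever looked for is the signal the page describes.
    """
    rare = {
        name
        for name, n in rare_missing_files(events, protocol, response)
        if n <= max_count
    }
    by_host = defaultdict(set)
    for e in events:
        if (
            e.get("protocol") == protocol
            and e.get("response") == response
            and e["filename"] in rare
        ):
            by_host[e["src"]].add(e["filename"])
    return {host: sorted(files) for host, files in by_host.items()}


if __name__ == "__main__":
    evs = load_events()
    print("rarest missing HDFS files:")
    for name, n in rare_missing_files(evs):
        print(f"  {n:3d}  {name}")
    print("hosts asking for rare missing files:")
    for host, files in sorted(hosts_probing_rare_files(evs).items()):
        print(f"  {host} -> {', '.join(files)}")
```

The max_count threshold is the knob to tune: in a large cluster a file that is missing for three or four hosts may still be an artifact of a bad deployment rather than a probe, so raise it until the routine ETL noise drops out and only the hosts reaching for documents they were never meant to read remain.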
These additional Splunk resources might help you understand and implement this use case in your organization: