Splunk Lantern

*Nix hosts with NFS connectivity issues

 

Applications that rely on a directory path to read and write data encounter problems if that path is missing or not functioning correctly. You know that directories mounted from a Network File System (NFS) file share can fail for a variety of reasons, so you want to monitor them.

Procedure

Run the following search. You can optimize it by specifying an index and adjusting the time range.

source="/var/log/messages" nfs ("not responding" OR "still trying")
|rex "server (?<nfs_host>\S+)(\s*(?<message>.*))?"
|table _time host nfs_host message

Search explanation

The table provides an explanation of what each part of this search achieves. You can adjust this query based on the specifics of your environment.

Splunk Search Explanation

source="/var/log/messages" 

Search the global system messages log file. 

nfs ("not responding" OR "still trying")

Get NFS log messages that include the text "not responding" or "still trying".

|rex "server (?<nfs_host>\S+)(\s*(?<message>.*))?"

Use a regular expression to extract the NFS server name into the nfs_host field and the rest of the error message into the message field.

|table _time host nfs_host message

Display the results in a table with columns in the order shown.
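
To make the field extraction concrete, here is a minimal Python sketch of what the rex step captures. It runs outside Splunk and is only an illustration: the sample log line is hypothetical, modeled on typical Linux kernel NFS client messages, and the function name is an assumption. Python writes named groups as (?P<name>...) where rex uses (?<name>...).

```python
import re

# Same pattern as the rex command, with Python's (?P<name>...) group syntax.
NFS_PATTERN = re.compile(r"server (?P<nfs_host>\S+)(\s*(?P<message>.*))?")


def extract_nfs_fields(raw_event):
    """Return (nfs_host, message) from a raw log line, or None if no match."""
    match = NFS_PATTERN.search(raw_event)
    if match is None:
        return None
    return match.group("nfs_host"), match.group("message")


# Hypothetical log line; real events in your environment may look different.
sample = (
    "Sep  3 08:34:49 ip-172-31-27-100 kernel: "
    "nfs: server 172.31.1.157 not responding, timed out"
)
print(extract_nfs_fields(sample))
```

Everything after the first whitespace-delimited token that follows "server" ends up in message, which is why the sample results show values such as "not responding, timed out".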

Next steps

Sample results for this search are shown in the table below. Use the results of this procedure to detect any machines in your environment where an NFS mount can't be reached.  

| _time | client | NFS server | message |
|---|---|---|---|
| 2020-09-03T08:34:49.000-0700 | ip-172-31-27-100 | 172.31.1.157 | not responding, timed out |
| 2020-09-03T08:33:49.000-0700 | ip-172-31-27-100 | 172.31.1.157 | not responding, timed out |
| 2020-09-03T08:32:48.000-0700 | ip-172-31-27-100 | 172.31.1.157 | not responding, timed out |
| 2020-09-03T08:30:47.000-0700 | ip-172-31-27-100 | 172.31.1.157 | not responding, timed out |
| 2020-09-03T08:29:47.000-0700 | ip-172-31-27-100 | 172.31.1.157 | not responding, timed out |

There are two types of mounts, soft and hard. How NFS errors behave, and the messages logged, depend on how the file system is mounted. Generally, soft mounts time out and return an error to the application, while hard mounts keep retrying. Hard mounts are preferred because they protect data more robustly, but processes on the client hang until a response is received. If the clients only read from the NFS mount, as a web server does when serving static content, then a soft mount may be preferable.
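
The message field can hint at which mount type produced an event. The sketch below assumes the usual Linux client wording, where a soft mount logs "timed out" and a hard mount logs "still trying". That mapping is an assumption about the client operating system, and the function name is hypothetical, so verify it against your own logs before relying on it.

```python
def guess_mount_type(message):
    """Guess the mount type from the message field extracted by the search.

    Assumes Linux client wording; other systems may phrase these differently.
    """
    text = message.lower()
    if "still trying" in text:
        return "hard"  # client keeps retrying, so processes block
    if "timed out" in text:
        return "soft"  # request gave up and returned an error
    return "unknown"


print(guess_mount_type("not responding, timed out"))
```

Under that assumption, the sample results shown earlier would all point to a soft mount.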

Errors on the NFS server can range from dependent processes not running to the server being too busy. When troubleshooting, also check the network to make sure that the client can reach the NFS server. Then run a search similar to the one given here, but filter for the NFS server host and look for error states in its raw logs.
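
A quick way to test basic reachability from a client is to attempt a TCP connection to the NFS port. The following is a minimal sketch, assuming the server listens on the standard NFS port 2049 over TCP; the function name and default timeout are illustrative. A successful connection shows only that the port is reachable, not that exports are healthy, and NFSv3 setups also depend on other services such as rpcbind.

```python
import socket


def can_reach_nfs(server, port=2049, timeout=3.0):
    """Return True if a TCP connection to the server's NFS port succeeds."""
    try:
        # create_connection resolves the name and applies the timeout.
        with socket.create_connection((server, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


# Example call, using the server address from the sample results:
#   can_reach_nfs("172.31.1.157")
```

If this returns False from the affected client but True from other hosts, the problem is more likely in the network path than on the NFS server itself.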

Finally, you might be interested in other processes associated with the Maintaining *nix systems use case.