Splunk Lantern

*Nix hosts with NFS connectivity issues

You might want to monitor for Network File System (NFS) connectivity issues when doing the following:

Prerequisites 

In order to execute this procedure in your environment, the following data, services, or apps are required:

Example

Applications that rely on a directory path to read and write data encounter problems if that path is missing or not functioning correctly. Directories mounted from a Network File System (NFS) file share can fail for a variety of reasons, so you want to monitor them.

To optimize the search shown below, you should specify an index and a time range.

  1. Run the following search:
source="/var/log/messages" nfs ("not responding" OR "still trying")
|rex "server (?<nfs_host>\S+)(\s*(?<message>.*))?"
|table _time host nfs_host message
|rename host AS client nfs_host AS "NFS Server"
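
As a sketch of the optimization noted above, the same search could be scoped to one index and a bounded time window. The index name os and the 24-hour window are assumptions for illustration only; substitute the index that holds your system logs and a time range that suits your monitoring interval.

```spl
index=os source="/var/log/messages" earliest=-24h latest=now nfs ("not responding" OR "still trying")
| rex "server (?<nfs_host>\S+)(\s*(?<message>.*))?"
| table _time host nfs_host message
| rename host AS client nfs_host AS "NFS Server"
```

Restricting the index and time range lets Splunk skip data that cannot match, which usually matters more for search speed than any change to the later pipeline stages.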

Search explanation

The table provides an explanation of what each part of this search achieves. You can adjust this query based on the specifics of your environment.

Splunk Search Explanation

source="/var/log/messages" 

Search the global system messages log file. 

nfs ("not responding" OR "still trying")

Get NFS log messages that include the text “not responding” or “still trying”. 

|rex "server (?<nfs_host>\S+)(\s*(?<message>.*))?"

Use a regular expression to capture the NFS server name into the nfs_host field and the remainder of the error text into the message field.

|table _time host nfs_host message

Display the results in a table with columns in the order shown.

|rename host AS client nfs_host AS "NFS Server"

Rename the fields for readability.

Result

Sample results for this search are shown in the table below. Use the results of this procedure to detect any machines in your environment where an NFS mount can't be reached.  

_time                         client            NFS Server    message
2020-09-03T08:34:49.000-0700  ip-172-31-27-100  172.31.1.157  not responding, timed out
2020-09-03T08:33:49.000-0700  ip-172-31-27-100  172.31.1.157  not responding, timed out
2020-09-03T08:32:48.000-0700  ip-172-31-27-100  172.31.1.157  not responding, timed out
2020-09-03T08:30:47.000-0700  ip-172-31-27-100  172.31.1.157  not responding, timed out
2020-09-03T08:29:47.000-0700  ip-172-31-27-100  172.31.1.157  not responding, timed out
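
When one client logs the same message repeatedly, as in the sample results above, a summarized view can be easier to scan. The following is a minimal sketch rather than part of the original procedure: it reuses the same base search and extraction, then counts events per client and NFS server. The field names events, first_seen, and last_seen are illustrative choices.

```spl
source="/var/log/messages" nfs ("not responding" OR "still trying")
| rex "server (?<nfs_host>\S+)(\s*(?<message>.*))?"
| stats count AS events, min(_time) AS first_seen, max(_time) AS last_seen BY host nfs_host
| convert ctime(first_seen) ctime(last_seen)
| rename host AS client nfs_host AS "NFS Server"
```

Each row then represents one client and server pair, with the number of matching messages and the time span over which they were logged.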

There are two types of mounts, soft and hard. How NFS errors behave and which messages are logged depend on the mount type. Generally, soft mounts time out and return an error to the application, while hard mounts keep retrying. Hard mounts are preferred because they protect data more robustly, but processes on the client hang until a response is received. If the clients only read from the NFS mount, as a web server does when serving static content, then a soft mount might be preferable.

Errors on the NFS server can range from dependent processes not running to the server being too busy. When troubleshooting, you should also check the network to make sure that the client can reach the NFS server. Then, run a search similar to the one given here, but filter for the NFS server and look for error states in the raw logs.
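
A follow-up search of the kind described above might look like the sketch below. It assumes the NFS server also forwards its own /var/log/messages to Splunk. The index name os, the host value nfs-server-01, the 24-hour window, and the keyword list are all hypothetical and should be replaced with values that match your environment.

```spl
index=os host="nfs-server-01" source="/var/log/messages" earliest=-24h (error OR fail* OR timeout)
| table _time host _raw
```

Note that client logs might identify the server by IP address, as in the sample results, while the host field on the server's own events might hold a hostname, so the two values do not necessarily match.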
