Monitoring Cisco network devices using gRPC
gRPC is a widely supported protocol for streaming data from network devices that support it, such as Cisco Nexus 9000 series NX-OS devices. It allows you to stream various types of metric data, depending on your device configuration, and it efficiently handles large-scale data transmission, including operation requests and telemetry. Additionally, gRPC with gNMI supports making changes to network devices, offering a modern alternative to older protocols like NETCONF or RESTCONF.
Design considerations for gRPC
Collecting metric data via gRPC can generate large volumes of granular information. To manage this, collect and prepare the data in an intermediate agent before ingesting it into the Splunk platform. Telegraf can be configured to prepare and send this data to the Splunk platform, enabling correlation with other machine data for use cases in security, infrastructure, or service monitoring.
Cisco devices use the YANG data model to define and expose metrics and events, so data modeling in the Splunk platform can be performed on the remote procedure calls that invoke network elements. This model is also used with the NETCONF protocol and is familiar to many network administrators.
For more information on gRPC and related models, see the resources listed in the Next steps section at the end of this article.
Implementing gRPC
This article outlines the steps to configure Cisco devices, Telegraf, and the Splunk platform for gRPC data ingestion. You'll follow these high-level steps:
- Configure gRPC on Cisco devices.
- Configure a Telegraf instance to input Cisco telemetry and connect to the Splunk platform via either the HTTP Event Collector or the Universal Forwarder.
- Use the Splunk platform to correlate data.
Step 1: Configure gRPC on Cisco devices
gRPC is configured on your Cisco Nexus network devices by enabling the dial-out (streaming) telemetry available when using the OpenConfig support in Cisco configuration. If your device supports "Native" mode, it can accept pull requests for data (dial-in), but OpenConfig enables data streaming (dial-out). After creating your Telegraf server component with the appropriate input, you then configure the Cisco devices to send to that Telegraf input.
- Determine if your device has OpenConfig support using the Cisco Feature Navigator tool, as shown in the following screenshot. This example uses the dropdown menus to look at a Cisco 9000 router that has IOS version 7.11.2, where you can see the support details for OpenConfig.
- If OpenConfig is not installed already, download the OpenConfig package from the Cisco repository. For more details on ensuring OpenConfig is installed and functional, see this guidance.
- The OpenConfig package contains an RPM file with a name like “mtx-openconfig-all-2.0.0.0-10.5.1.lib32_64_n9000.rpm”. The exact file name depends on the package and software versions in use; the example below uses an older package on NX-OS 9.3(5). Log in to the device CLI, install the RPM file, and verify the installation:
n9300v-telemetry# install add mtx-openconfig-all-1.0.0.182-9.3.5.lib32_n9000.rpm activate
Adding the patch (/mtx-openconfig-all-1.0.0.182-9.3.5.lib32_n9000.rpm)
[####################] 100%
Install operation 1 completed successfully
Activating the patch (/mtx-openconfig-all-1.0.0.182-9.3.5.lib32_n9000.rpm)
[####################] 100%
Install operation 2 completed successfully
n9300v-telemetry# show version
Active Package(s):
mtx-openconfig-all-1.0.0.182-9.3.5.lib32_n9000
n9300v-telemetry#
- After installing the package on all of the relevant Cisco devices, use show commands to view and verify the gRPC configuration. The feature grpc line in the output confirms that gRPC is enabled:

n9300v-telemetry# show run grpc
!Command: show running-config grpc
!No configuration change since last restart
!Time: Tue Jul 14 16:56:37 2020
version 9.3(5) Bios:version
feature grpc
grpc gnmi max-concurrent-calls 16
grpc use-vrf default
grpc certificate gnmicert
n9300v-telemetry#
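If you run these checks across many devices, the verification can be scripted. The following Python sketch is one possible approach, not part of any Cisco or Splunk tooling: it assumes you have already captured the CLI output as text (how you collect it is up to you), and the function names `openconfig_package_active` and `parse_grpc_config` are hypothetical helpers introduced here for illustration.

```python
import re

# Sample outputs copied from the CLI sessions shown above.
SHOW_VERSION = """n9300v-telemetry# show version
Active Package(s):
mtx-openconfig-all-1.0.0.182-9.3.5.lib32_n9000
n9300v-telemetry#"""

SHOW_RUN_GRPC = """n9300v-telemetry# show run grpc
!Command: show running-config grpc
version 9.3(5) Bios:version
feature grpc
grpc gnmi max-concurrent-calls 16
grpc use-vrf default
grpc certificate gnmicert
n9300v-telemetry#"""


def openconfig_package_active(cli_output):
    """Return True if an mtx-openconfig package follows 'Active Package(s):'."""
    _, marker, tail = cli_output.partition("Active Package(s):")
    if not marker:
        return False
    return re.search(r"\bmtx-openconfig-all-[\w.\-]+", tail) is not None


def parse_grpc_config(cli_output):
    """Collect 'grpc ...' settings and whether 'feature grpc' is present."""
    settings = {"feature_enabled": False}
    for raw in cli_output.splitlines():
        words = raw.split()
        if words == ["feature", "grpc"]:
            settings["feature_enabled"] = True
        elif len(words) >= 3 and words[0] == "grpc":
            # 'grpc use-vrf default' -> key 'use-vrf', value 'default'
            settings[" ".join(words[1:-1])] = words[-1]
    return settings


if __name__ == "__main__":
    print(openconfig_package_active(SHOW_VERSION))
    print(parse_grpc_config(SHOW_RUN_GRPC))
```

The parser treats the last word of each `grpc` line as the value, which fits the three settings in the sample output but may need adjusting for other gRPC options.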
Step 2: Configure Telegraf to input Cisco telemetry and connect to the Splunk platform over HTTP Event Collector (HEC) or Universal Forwarder (UF)
Telegraf is an open source server agent that is used for collecting metrics and events from your Cisco devices. Compiled with the appropriate plugins, Telegraf can subscribe to a Cisco device that will stream telemetry data over gRPC.
First, you'll need to install Telegraf on your server or use an existing instance. Follow the Telegraf installation guide to do this. In brief, the steps you'll perform are:
- Download Telegraf to your server instance.
- Generate a custom configuration file that includes the appropriate inputs configuration from Cisco devices, and outputs configuration to send the data to the Splunk platform.
- Compile the Telegraf binary that is then installed on the server instance.
Within these installation steps, you'll need to use the inputs configuration for Cisco devices, shown in the following screenshot. This configuration has many settings to configure the type of data that is supported, and how it is handled by Telegraf:
Here is a portion of the telegraf.conf file showing the configuration of inputs to gather useful metrics:
[global_tags]

[agent]
  interval = "30s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "30s"
  flush_jitter = "0s"
  precision = ""
  hostname = "cisco"
  omit_hostname = true

[[outputs.http]]
  url = "http://198.18.133.23:8088/services/collector"
  data_format = "splunkmetric"
  splunkmetric_hec_routing = true
  [outputs.http.headers]
    Content-Type = "application/json"
    Authorization = "Splunk abcd1234"

[[outputs.file]]
  files = ["stdout", "/tmp/gnmi.out"]
  rotation_max_size = "5MB"
  rotation_max_archives = 3
  data_format = "json"

[[inputs.gnmi]]
  addresses = ["xr-1:57400","xr-2:57400","xr-5:57400","xr-6:57400","xr-7:57400","xr-8:57400"]
  ## define credentials
  username = "cisco"
  password = "cisco123"
  ## redial in case of failures after
  redial = "10s"

  #*###########################################################################
  #* IF-MIB TO GNMI
  #*###########################################################################
  [[inputs.gnmi.subscription]]
    name = "infra-statistics"
    origin = "Cisco-IOS-XR-infra-statsd-oper"
    path = "infra-statistics/interfaces/interface/latest/generic-counters"
    subscription_mode = "sample"
    sample_interval = "30s"

  [[inputs.gnmi.subscription]]
    name = "if-statistics"
    origin = "Cisco-IOS-XR-pfi-im-cmd-oper"
    path = "interfaces/interface-xr/interface/interface-statistics/full-interface-stats/"
    subscription_mode = "sample"
    sample_interval = "30s"

  #*###########################################################################
  #* CISCO PROCESS MIB TO GNMI
  #*###########################################################################
  [[inputs.gnmi.subscription]]
    name = "CPUMetricData"
    origin = "Cisco-IOS-XR-wdsysmon-fd-oper"
    path = "system-monitoring/cpu-utilization/total-cpu-one-minute"
    subscription_mode = "sample"
    sample_interval = "30s"

  [[inputs.gnmi.subscription]]
    name = "CPUMetricData"
    origin = "Cisco-IOS-XR-wdsysmon-fd-oper"
    path = "system-monitoring/cpu-utilization/total-cpu-five-minute"
    subscription_mode = "sample"
    sample_interval = "30s"

  [[inputs.gnmi.subscription]]
    name = "CPUMetricData"
    origin = "Cisco-IOS-XR-wdsysmon-fd-oper"
    path = "system-monitoring/cpu-utilization/total-cpu-fifteen-minute"
    subscription_mode = "sample"
    sample_interval = "30s"

  #*###########################################################################
  #* BGP4-MIB / CISCO-BGP4-MIB TO GNMI
  #*###########################################################################
  [[inputs.gnmi.subscription]]
    name = "instanceSpecificBGPData-update-messages-in"
    origin = "Cisco-IOS-XR-ipv4-bgp-oper"
    path = "bgp/instances/instance/instance-active/default-vrf/afs/af/neighbor-af-table/neighbor/update-messages-in"
    subscription_mode = "sample"
    sample_interval = "30s"

  [[inputs.gnmi.subscription]]
    name = "instanceSpecificBGPData-update-messages-out"
    origin = "Cisco-IOS-XR-ipv4-bgp-oper"
    path = "bgp/instances/instance/instance-active/default-vrf/afs/af/neighbor-af-table/neighbor/update-messages-out"
    subscription_mode = "sample"
    sample_interval = "30s"

  [[inputs.gnmi.subscription]]
    name = "instanceSpecificBGPData-connection-established-time"
    origin = "Cisco-IOS-XR-ipv4-bgp-oper"
    path = "bgp/instances/instance/instance-active/default-vrf/afs/af/neighbor-af-table/neighbor/connection-established-time"
    subscription_mode = "sample"
    sample_interval = "30s"

  [[inputs.gnmi.subscription]]
    name = "instanceSpecificBGPData-prefixes-advertised"
    origin = "Cisco-IOS-XR-ipv4-bgp-oper"
    path = "bgp/instances/instance/instance-active/default-vrf/afs/af/neighbor-af-table/neighbor/af-data/prefixes-advertised"
    subscription_mode = "sample"
    sample_interval = "30s"

  [[inputs.gnmi.subscription]]
    name = "instanceSpecificBGPData-prefixes-accepted"
    origin = "Cisco-IOS-XR-ipv4-bgp-oper"
    path = "bgp/instances/instance/instance-active/default-vrf/afs/af/neighbor-af-table/neighbor/af-data/prefixes-accepted"
    subscription_mode = "sample"
    sample_interval = "30s"
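Because every subscription stanza in this configuration follows the same five-key pattern, you might prefer to generate the stanzas from a short table rather than maintain them by hand. The following Python sketch shows one way to do that. It is an illustration only: `render_subscription` and `render_all` are hypothetical helper names, and the two rows in `SUBSCRIPTIONS` are copied from the configuration above.

```python
# Rows of (name, origin, path), taken from the telegraf.conf excerpt above.
SUBSCRIPTIONS = [
    ("infra-statistics", "Cisco-IOS-XR-infra-statsd-oper",
     "infra-statistics/interfaces/interface/latest/generic-counters"),
    ("CPUMetricData", "Cisco-IOS-XR-wdsysmon-fd-oper",
     "system-monitoring/cpu-utilization/total-cpu-one-minute"),
]


def render_subscription(name, origin, path, sample_interval="30s"):
    """Render one [[inputs.gnmi.subscription]] stanza as TOML text."""
    lines = [
        "[[inputs.gnmi.subscription]]",
        f'  name = "{name}"',
        f'  origin = "{origin}"',
        f'  path = "{path}"',
        '  subscription_mode = "sample"',
        f'  sample_interval = "{sample_interval}"',
    ]
    return "\n".join(lines)


def render_all(rows, sample_interval="30s"):
    """Render every row, separating stanzas with a blank line."""
    return "\n\n".join(
        render_subscription(*row, sample_interval=sample_interval) for row in rows
    )


if __name__ == "__main__":
    print(render_all(SUBSCRIPTIONS))
```

The generated text can be pasted under the `[[inputs.gnmi]]` section. The sketch does not escape quotes inside values, so it assumes names and paths contain none.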
At this stage, you'll need to decide whether to send metrics to your Splunk platform instance using the HTTP Event Collector (HEC), or using the Universal Forwarder.
Option 1: Use HEC
Use the outputs configuration for HTTP to send metrics to your Splunk platform instance over the HTTP Event Collector, as shown in the following screenshot:
Here is an example telegraf.conf outputs configuration file that sends this data to the Splunk platform over HEC:
[global_tags]
  # dc = "us-east-1" # will tag all metrics with dc=us-east-1
  # rack = "1a"
  ## Environment variables can be used as tags, and throughout the config
  #user = "telegraf"
  index = "main"

[agent]
  interval = "30s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "10s"
  flush_jitter = "0s"
  precision = ""
  debug = false
  quiet = false
  logtarget = "file"
  logfile = "/var/log/telegraf/telegraf.log"
  logfile_rotation_interval = "0d"
  logfile_rotation_max_size = "1MB"
  logfile_rotation_max_archives = 5
  hostname = ""
  omit_hostname = false

[[outputs.http]]
  ## URL is the address to send metrics to
  url = "https://my-splunk-instance:8088/services/collector"
  ## HTTP method, one of: "POST" or "PUT"
  method = "POST"
  # DEV ONLY
  insecure_skip_verify = false
  data_format = "splunkmetric"
  splunkmetric_hec_routing = true
  ## Additional HTTP headers
  [outputs.http.headers]
    Content-Type = "application/json"
    Authorization = "Splunk use-your-splunk-token"
    X-Splunk-Request-Channel = "use-your-splunk-token"
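Before pointing Telegraf at HEC, it can help to confirm that the endpoint and token accept metric events at all. The following Python sketch builds one metric event in the HEC multiple-measurement JSON shape (an `event` of `metric` plus `metric_name:<name>` keys under `fields`) and includes a function that would post it. The helper names, the sample dimension values, and the choice to test with a hand-built event are assumptions made for illustration; this is not what Telegraf runs internally.

```python
import json
import time
import urllib.request


def build_hec_metric(metric_name, value, dimensions, index, host, timestamp=None):
    """Build one HEC metric event; dimensions become plain keys under 'fields'."""
    fields = dict(dimensions)
    fields[f"metric_name:{metric_name}"] = value
    return {
        "time": timestamp if timestamp is not None else time.time(),
        "event": "metric",
        "host": host,
        "index": index,
        "fields": fields,
    }


def send_to_hec(url, token, payload, timeout=10):
    """POST the payload to the HEC endpoint and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Splunk {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # Hypothetical sample values; replace the index, host, and dimensions with your own.
    event = build_hec_metric(
        "infra-statistics.packets_received", 1024,
        {"source": "xr-1", "interface_name": "GigabitEthernet0/0/0/0"},
        index="metrics_data", host="cisco",
    )
    print(json.dumps(event, indent=2))
    # To actually send it, uncomment and supply your own URL and token:
    # send_to_hec("https://my-splunk-instance:8088/services/collector",
    #             "use-your-splunk-token", event)
```

The target index must be a metrics index for the event to be accepted as a metric.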
Option 2: Use the Universal Forwarder
You can configure your Telegraf instance to log the data locally then use the Universal Forwarder (UF) to forward that data to the Splunk platform using traditional UF configuration principles, in the same way you would for other log monitoring.
To do this, use the Telegraf file output plugin. Configure the local file logging output like this:
# Send telegraf metrics to file(s)
[[outputs.file]]
  ## Files to write to, "stdout" is a specially handled file.
  files = ["/tmp/metrics.out"]
  ## Data format to output.
  ## Each data format has its own unique set of configuration options, read
  ## more about them here
  ## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_OUTPUT.md
  data_format = "splunkmetric"
  splunkmetric_hec_routing = false
To prepare the data for ingestion, configure a props.conf file in the same way you would for any other logs gathered with the UF:
[telegraf]
category = Metrics
description = Telegraf Metrics
pulldown_type = 1
DATETIME_CONFIG =
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = true
disabled = false
INDEXED_EXTRACTIONS = json
KV_MODE = none
TIMESTAMP_FIELDS = time
TIME_FORMAT = %s.%3N
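The props.conf settings above expect each line of the output file to be a JSON object with a `time` field holding an epoch timestamp. Before deploying the UF, you could spot-check the file with a short script such as the Python sketch below. The sample line is hypothetical and only approximates what the serializer writes, so inspect your own /tmp/metrics.out for the real field names. `parse_metric_line` is a helper name introduced here.

```python
import json
from datetime import datetime, timezone

# Hypothetical line, shaped like one JSON object per line with an epoch 'time' field.
SAMPLE_LINE = (
    '{"metric_name": "infra-statistics.packets_received", '
    '"_value": 1024, "source": "xr-1", "time": 1594745797.123}'
)


def parse_metric_line(line):
    """Parse one JSON line and convert its epoch 'time' field to a UTC datetime."""
    record = json.loads(line)
    when = datetime.fromtimestamp(float(record["time"]), tz=timezone.utc)
    return record, when


if __name__ == "__main__":
    record, when = parse_metric_line(SAMPLE_LINE)
    print(record["metric_name"], when.isoformat())
```

A line that fails `json.loads` or lacks `time` raises an exception here, which indicates that the file would not match the INDEXED_EXTRACTIONS and TIMESTAMP_FIELDS settings.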
Step 3: Use the Splunk platform to view gRPC data
After your network devices are configured to send data to Telegraf and then to the Splunk platform, you can correlate this data with other important parts of your indexed machine data to allow you to make better decisions and drive actions in your environment.
You can use Search Processing Language (SPL) to analyze the data. Adjust the following SPL to fit your environment:
| mpreview index="metrics_data" | search "metric_name:infra-statistics.packets_received"=*
The following screenshot shows a search on the data gathered from the telegraf.conf file shown previously:
After running this search and expanding an event containing its metric, you'll see this view:
With the metric data residing in a Splunk platform metrics index, you can run SPL queries on this data. The following query calculates the average number of packets received across all your devices over a one-minute interval, allowing you to monitor network performance:
| mstats avg("infra-statistics.packets_received") AS "packets_received" WHERE index="metrics_data" span=1m
This results in a table view:
You can switch to a Visualization view to visualize these metrics in dashboards to gain insights into your network's performance:
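To make the effect of `span=1m` in the averaging query concrete, the following Python sketch reproduces the idea outside the Splunk platform: samples are grouped into fixed one-minute buckets by timestamp and averaged per bucket. It is a simplified model for intuition only, with made-up sample values, and `average_per_span` is a hypothetical function name rather than anything the Splunk platform exposes.

```python
from collections import defaultdict


def average_per_span(samples, span_seconds=60):
    """Average (timestamp, value) samples within fixed-width time buckets."""
    buckets = defaultdict(list)
    for timestamp, value in samples:
        # Align each sample to the start of its bucket, e.g. 90s -> 60s.
        bucket_start = int(timestamp // span_seconds) * span_seconds
        buckets[bucket_start].append(value)
    return {
        start: sum(values) / len(values)
        for start, values in sorted(buckets.items())
    }


if __name__ == "__main__":
    # Made-up samples at a 30-second interval, matching sample_interval = "30s".
    samples = [(0, 10), (30, 20), (60, 40), (90, 60)]
    print(average_per_span(samples))  # prints {0: 15.0, 60: 50.0}
```

With a 30-second sample interval, each one-minute bucket holds about two samples per time series, which is why the averaged value changes more smoothly than the raw data.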
The following query retrieves the same packets-received metric, but as a per-second rate, rounded to two decimal places so the values are easier to read and act on. It also groups the results by source and interface name, which helps identify which device and interface in your Cisco infrastructure is receiving the traffic:
| mstats rate_avg("infra-statistics.packets_received") AS "packets_received/s" WHERE index="metrics_data" BY source, interface_name span=1m | eval "packets_received/s" = round('packets_received/s', 2)
This results in this table view:
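For intuition about the per-second rate, the following Python sketch models the calculation on an accumulating counter. It assumes that the rate for one time series is the change in counter value divided by the elapsed seconds between the earliest and latest samples, and that the result is the average of those per-series rates. Treat this as a simplified reading of `rate_avg` and check the mstats documentation for details such as counter resets. The series keys and values are made up.

```python
def counter_rate(samples):
    """Per-second rate for one time series of (timestamp, counter_value) samples."""
    if len(samples) < 2:
        return None
    ordered = sorted(samples)
    (first_time, first_value), (last_time, last_value) = ordered[0], ordered[-1]
    if last_time == first_time:
        return None
    return (last_value - first_value) / (last_time - first_time)


def average_rate(series_by_key):
    """Average the per-series rates, rounded to two decimals like the eval step."""
    rates = [
        rate
        for rate in (counter_rate(samples) for samples in series_by_key.values())
        if rate is not None
    ]
    return round(sum(rates) / len(rates), 2) if rates else None


if __name__ == "__main__":
    # Made-up counters for two interfaces within the same one-minute span.
    series = {
        ("xr-1", "Gi0/0/0/0"): [(0, 1000), (30, 1600), (60, 2200)],
        ("xr-1", "Gi0/0/0/1"): [(0, 500), (60, 800)],
    }
    print(average_rate(series))  # prints 12.5
```

In the query itself, the BY clause splits results by source and interface name, so each row of the table reflects the rate for that particular grouping rather than a single overall figure.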
As well as developing your own SPL queries, you can use the Analytics tab of your Splunk platform instance to view and visualize this metric data through a graphical interface. In the Analytics tab, identify the index that contains the metric data, then select from the data present in that index. You don't have to know the metric names beforehand.
After selecting the index and dimensions, you are shown all of the possible metrics and can select what you want to visualize:
You are then shown that metric data in a timeseries format without having to develop your own SPL:
Next steps
These resources might help you understand and implement this guidance:
- Cisco Blogs: Which YANG model to use
- Cisco Docs: gNMI configuration guide
- Cisco White Paper: Cisco Nexus 9000 white paper
- GitHub: gNMI specification