
Overview


MID servers can run SNMP queries against target CIs. MID server SNMP is used by multiple applications, including discovery, service mapping, and orchestration. This article covers some of the tools that can be used to troubleshoot MID server SNMP issues.

Troubleshooting Tools


Very often, the issue with an application using SNMP (discovery, orchestration, etc.) is that the SNMP data is returned incompletely or not at all. If the data is returned, the investigation needs to focus on a different area of the application, such as a script include or business rule. Therefore, a good starting point when investigating an issue with an application that depends on SNMP is to check whether the data is collected successfully.

Two of the main reasons an SNMP query may not collect the desired data are:

  1. Invalid SNMP credential
  2. SNMP query timeout

Some of the tools which can be used to confirm whether the data is returned or not are:

  1. MID Server logs
  2. SNMP walk tools
  3. Wireshark

Examples


Review MID Server logs

  1. To get more detailed information in the MID server logs for SNMP queries, add the parameter mid.log.level = debug.
  2. Create the MID server properties com.service_now.mid.probe.SNMP = DEBUG and com.service_now.monitor.snmp = DEBUG.
  3. Reproduce the issue and review the MID server log files. Review the following two docs on how to collect the MID server files:

Two example logs are shown next. The first example is from a successful query where all the OIDs for an SNMP - Classify probe were returned, while the second is from a partially successful query where only a fraction of the OIDs were returned. In the first example, the classify probe was run with the default timeout of 1500 ms. In the second, the probe had the timeout set to 10 ms to simulate a timeout.

Example log showing successful SNMP query:

08/29/18 11:32:52 (911) Worker-Interactive:SNMP Worker starting: SNMP source: 
08/29/18 11:32:52 (926) Worker-Interactive:SNMP DEBUG: Timeout: 1500, Retries: 2
08/29/18 11:32:53 (004) Worker-Interactive:SNMP DEBUG: Using GETBULK
08/29/18 11:32:53 (004) Worker-Interactive:SNMP DEBUG: First attempt of getTable on target: /161, OIDs: [1.3.6.1.2.1.4.20.1.1, 1.3.6.1.2.1.4.20.1.2, 1.3.6.1.2.1.4.20.1.3], max rows: 10
08/29/18 11:32:53 (051) Worker-Interactive:SNMP DEBUG: First attempt of getTable on target: /161, OIDs: [1.3.6.1.4.1.9.9.46.1.3.1.1.3], max rows: 10
08/29/18 11:32:53 (051) Worker-Interactive:SNMP DEBUG: First attempt of getTable on target: /161, OIDs: [1.3.6.1.2.1.2.2.1.1, 1.3.6.1.2.1.2.2.1.2, 1.3.6.1.2.1.2.2.1.3, 1.3.6.1.2.1.2.2.1.6, 1.3.6.1.2.1.2.2.1.7, 1.3.6.1.2.1.2.2.1.8], max rows: 10
08/29/18 11:32:53 (114) Worker-Interactive:SNMP DEBUG: First attempt of getTable on target: /161, OIDs: [1.3.6.1.2.1.47.1.1.1.1.11, 1.3.6.1.2.1.47.1.1.1.1.13, 1.3.6.1.2.1.47.1.1.1.1.2, 1.3.6.1.2.1.47.1.1.1.1.12, 1.3.6.1.2.1.47.1.1.1.1.4], max rows: 10
08/29/18 11:32:53 (161) Worker-Interactive:SNMP DEBUG: First attempt of getTable on target: /161, OIDs: [1.3.6.1.2.1.4.22.1.1, 1.3.6.1.2.1.4.22.1.2, 1.3.6.1.2.1.4.22.1.3], max rows: 10
08/29/18 11:32:53 (161) Worker-Interactive:SNMP DEBUG: First attempt of getTable on target: /161, OIDs: [1.3.6.1.2.1.25.3.2.1.2, 1.3.6.1.2.1.25.3.2.1.3], max rows: 10
08/29/18 11:32:53 (161) Worker-Interactive:SNMP DEBUG: First attempt of getTable on target: /161, OIDs: [1.3.6.1.2.1.43.5.1.1.17], max rows: 10
08/29/18 11:32:53 (176) Worker-Interactive:SNMP DEBUG: Event: GenericScalarMetricEvent
08/29/18 11:32:53 (176) Worker-Interactive:SNMP DEBUG: Event: CheckSessionCanceledEvent, correlator: , sysID: 405c1f5cdb54a7008597d8c75e961967, canceled: false
08/29/18 11:32:53 (176) Worker-Interactive:SNMP Enqueuing: C:\ServiceNow\emprcoeljak\agent\work\monitors\ECCSender\output_0\ecc_queue.405c1f5cdb54a7008597d8c75e961967.xml
08/29/18 11:32:53 (176) Worker-Interactive:SNMP DEBUG: Event: GenericCounterMetricEvent
08/29/18 11:32:53 (192) Worker-Interactive:SNMP DEBUG: ** enqueued C:\ServiceNow\emprcoeljak\agent\work\monitors\ECCSender\output_0\ecc_queue.405c1f5cdb54a7008597d8c75e961967.xml
08/29/18 11:32:53 (192) Worker-Interactive:SNMP DEBUG: Event: MessageProcessedEvent, sysID: 405c1f5cdb54a7008597d8c75e961967
08/29/18 11:32:53 (192) Worker-Interactive:SNMP DEBUG: Event: SendMessageEvent, message: SNMP SNMP - Classify: 61 OIDs 
08/29/18 11:32:53 (192) Worker-Interactive:SNMP Worker completed: SNMP source:  time: 0:00:00.250

Example log showing partially successful SNMP query (requests timing out):

08/30/18 07:29:03 (997) Worker-Interactive:SNMP DEBUG: Timeout: 10, Retries: 2
08/30/18 07:29:03 (997) Worker-Interactive:SNMP DEBUG: Snmp4jSessionFactory: connection created for key SnmpSessionPoolKey[target: &port:161&fixed_cred:&tag:]
08/30/18 07:29:04 (075) Worker-Interactive:SNMP DEBUG: Using GETBULK
08/30/18 07:29:04 (075) Worker-Interactive:SNMP DEBUG: First attempt of getTable on target: /161, OIDs: [1.3.6.1.2.1.4.22.1.1, 1.3.6.1.2.1.4.22.1.2, 1.3.6.1.2.1.4.22.1.3], max rows: 10
08/30/18 07:29:04 (075) Worker-Interactive:SNMP DEBUG: First attempt of getTable on target: /161, OIDs: [1.3.6.1.2.1.4.20.1.1, 1.3.6.1.2.1.4.20.1.2, 1.3.6.1.2.1.4.20.1.3], max rows: 10
08/30/18 07:29:04 (122) Worker-Interactive:SNMP DEBUG: First attempt of getTable failed on target: /161, OIDs: [1.3.6.1.2.1.4.20.1.1, 1.3.6.1.2.1.4.20.1.2, 1.3.6.1.2.1.4.20.1.3], error: Request timed out.
08/30/18 07:29:04 (122) Worker-Interactive:SNMP DEBUG: Second attempt of getTable on target: /161, OIDs: [1.3.6.1.2.1.4.20.1.1, 1.3.6.1.2.1.4.20.1.2, 1.3.6.1.2.1.4.20.1.3], max rows: 5
08/30/18 07:29:04 (169) Worker-Interactive:SNMP DEBUG: Second attempt of getTable failed on target: /161, OIDs: [1.3.6.1.2.1.4.20.1.1, 1.3.6.1.2.1.4.20.1.2, 1.3.6.1.2.1.4.20.1.3], error: Request timed out.
08/30/18 07:29:04 (169) Worker-Interactive:SNMP DEBUG: Third attempt of getTable on target: /161, OIDs: [1.3.6.1.2.1.4.20.1.1, 1.3.6.1.2.1.4.20.1.2, 1.3.6.1.2.1.4.20.1.3], max rows: 5, forcing GETNEXT pdu type
08/30/18 07:29:04 (215) Worker-Interactive:SNMP DEBUG: Event: GenericScalarMetricEvent
08/30/18 07:29:04 (215) Worker-Interactive:SNMP DEBUG: Event: CheckSessionCanceledEvent, correlator: , sysID: 561ea3acdbdca7008597d8c75e96191a, canceled: false
08/30/18 07:29:04 (215) Worker-Interactive:SNMP Enqueuing: C:\ServiceNow\emprcoeljak\agent\work\monitors\ECCSender\output_0\ecc_queue.561ea3acdbdca7008597d8c75e96191a.xml
08/30/18 07:29:04 (215) Worker-Interactive:SNMP DEBUG: Event: GenericCounterMetricEvent
08/30/18 07:29:04 (231) Worker-Interactive:SNMP DEBUG: ** enqueued C:\ServiceNow\emprcoeljak\agent\work\monitors\ECCSender\output_0\ecc_queue.561ea3acdbdca7008597d8c75e96191a.xml
08/30/18 07:29:04 (231) Worker-Interactive:SNMP DEBUG: Event: MessageProcessedEvent, sysID: 561ea3acdbdca7008597d8c75e96191a
08/30/18 07:29:04 (231) Worker-Interactive:SNMP DEBUG: Event: SendMessageEvent, message: SNMP SNMP - Classify: 12 OIDs 
08/30/18 07:29:04 (231) Worker-Interactive:SNMP Worker completed: SNMP source:  time: 0:00:00.218

In the above example, some getTable requests time out because of the low timeout configured, and the probe returns only 12 OIDs instead of the 61 returned by the successful query.
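The comparison between the two logs can also be done programmatically. The following is a minimal Python sketch, assuming the debug log lines keep the format shown in the examples above. The function name summarize_snmp_log and the regular expressions are illustrative and are not part of the MID server.

```python
import re
from collections import Counter

# Matches lines such as:
# "... First attempt of getTable failed on target: /161, OIDs: [...], error: Request timed out."
FAILED_RE = re.compile(
    r"(?P<attempt>\w+) attempt of getTable failed on target: .*?"
    r"OIDs: \[(?P<oids>[^\]]*)\], error: (?P<error>.+)$"
)

# Matches the final summary, e.g. "SendMessageEvent, message: SNMP SNMP - Classify: 12 OIDs"
SUMMARY_RE = re.compile(
    r"SendMessageEvent, message: (?P<probe>.+): (?P<count>\d+) OIDs"
)


def summarize_snmp_log(lines):
    """Count failed getTable attempts per error text and read the OID total."""
    failures = Counter()
    oids_returned = None
    for line in lines:
        line = line.strip()
        failed = FAILED_RE.search(line)
        if failed:
            failures[failed.group("error")] += 1
            continue
        summary = SUMMARY_RE.search(line)
        if summary:
            oids_returned = int(summary.group("count"))
    return {"failures": dict(failures), "oids_returned": oids_returned}
```

Running it over both example logs makes the difference visible at a glance: a query with no failed attempts and a high OID count versus one with timed-out attempts and a low OID count.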

SNMP Walk tool

Using an SNMP walk tool, we can confirm whether the results are returned as expected. If a third-party tool also fails or only partially succeeds in retrieving the OIDs, the problem most likely lies outside the MID server SNMP implementation (for example, in the credential, the device, or the network). If the third-party tool succeeds consistently, the MID server logs need to be reviewed for potential issues. In the following example, a query for OID 1.3.6.1.2.1.1.1 is executed from the MID server host. This OID is sysDescr and returns a description of the device. Note that the commands may differ depending on the SNMP tool used.

The following example uses SnmpWalk.exe with the community string set to "publi", which is incorrect for this device. The correct community string for this example is "public".

C:\SNMPWalk>.\SnmpWalk.exe -r:10.127.212.181 -c:"publi" -os:.1.3.6.1.2.1.1 -op:.1.3.6.1.2.1.1.1.0

%Failed to get value of SNMP variable. Timedout.

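As seen above, no credential failure error is reported. Instead, the query eventually times out, because an SNMP v1/v2c agent typically discards a request with an unknown community string without replying.

In the following example, the community string was corrected to "public".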

C:\SNMPWalk>.\SnmpWalk.exe -r:10.127.212.181 -c:"public" -os:.1.3.6.1.2.1.1 -op:.1.3.6.1.2.1.1.1.0

OID=.1.3.6.1.2.1.1.1.0, Type=OctetString, Value=Linux Linux-Tomcat 3.10.0-327.el7.x86_64 31 SMP Thu Nov 19 22:10:57 UTC 2015 x86_64

As seen above, once the community string was corrected, the sysDescr value was returned (only part of it is shown) instead of the query timing out.

Note: It is important to run the test from the same host where the MID server is installed and with the same credential configuration.
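When the same check is repeated against several devices, the tool output can be classified automatically. The following Python sketch assumes the output format shown in the two SnmpWalk.exe examples above ("OID=..., Type=..., Value=..." for results and a message containing "Timedout" for timeouts). Other SNMP walk tools print different output, and the function name classify_walk_output is hypothetical.

```python
import re

# Result line format taken from the sample output above:
# "OID=<oid>, Type=<type>, Value=<value>"
RESULT_RE = re.compile(
    r"^OID=(?P<oid>[\d.]+), Type=(?P<type>\w+), Value=(?P<value>.*)$"
)


def classify_walk_output(output):
    """Return ("ok", rows), ("timeout", []) or ("unknown", [])."""
    rows = []
    for line in output.splitlines():
        match = RESULT_RE.match(line.strip())
        if match:
            rows.append(match.groupdict())
    if rows:
        return "ok", rows
    if "Timedout" in output:
        # A wrong community string and an unreachable device both end up
        # here, so a timeout alone does not identify the cause.
        return "timeout", []
    return "unknown", []
```

The output text can be captured with, for example, subprocess.run(..., capture_output=True, text=True).stdout, using the same command line as in the examples.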

Network Traffic Monitoring Tool (Wireshark example)

A network traffic monitoring tool helps determine where the issue occurs. For example, we can confirm whether the request packets are sent and whether responses are ever returned.

Setup:

  1. Download and install Wireshark from https://www.wireshark.org/download.html.
  2. Once installed, double-click the application icon to start the application.
  3. Select the interface that will be used to collect traffic (Ethernet in this example).

In the following example we review the traffic for an SNMP query of table mgmt.mib-2.printmib.prtMarkerColorant.prtMarkerColorantTable prtMarkerColorantValue.

We can see from the ecc_queue record what was returned:

Use the display filter "udp && ip.addr == <target_ip>" to show only the UDP traffic, which includes SNMP, exchanged with the target device. (In the screenshot, the target IP was replaced with the loopback IP after the packets were collected.)
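If the filter has to be prepared for many targets, it can be generated. The sketch below is a small Python helper, not part of Wireshark or the MID server. It narrows the filter further to UDP port 161, which is the port shown in the MID server logs earlier in this article; this assumes the device uses that port. The function name snmp_display_filter is hypothetical.

```python
import ipaddress


def snmp_display_filter(target_ip, port=161):
    """Build a Wireshark display filter for SNMP traffic with one device."""
    # ip_address() raises ValueError if target_ip is not a valid address.
    addr = ipaddress.ip_address(target_ip)
    # Wireshark uses different field names for IPv4 and IPv6 addresses.
    field = "ip.addr" if addr.version == 4 else "ipv6.addr"
    return f"udp.port == {port} && {field} == {addr}"


print(snmp_display_filter("10.127.212.181"))
```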

The following screenshot shows data returned by the device in detail for one of the OIDs.

 

Solutions


Confirm Credentials

Incorrect credentials are more often than not the root cause. SNMP v1/v2 is simpler to configure because it only uses the community string. For SNMP v3, confirm that the Username, Authentication Protocol, Authentication Key, Privacy Protocol and Privacy Key all match what is configured on the target device. A third-party SNMP walk tool can also be used to confirm that the credential is correct.
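The SNMP v3 comparison can be treated as a field-by-field checklist. The following Python sketch is illustrative only: the class SnmpV3Credential, the helper names, and the sample values are hypothetical and do not correspond to a ServiceNow API. It also encodes one general SNMP v3 rule: privacy cannot be used without authentication, so the valid security levels are noAuthNoPriv, authNoPriv and authPriv.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class SnmpV3Credential:
    """The five SNMP v3 fields named in this article (hypothetical model)."""
    username: str
    auth_protocol: Optional[str] = None
    auth_key: Optional[str] = None
    priv_protocol: Optional[str] = None
    priv_key: Optional[str] = None


def security_level(cred):
    """Derive the SNMP v3 security level implied by the filled-in fields."""
    has_auth = bool(cred.auth_protocol and cred.auth_key)
    has_priv = bool(cred.priv_protocol and cred.priv_key)
    if has_priv and not has_auth:
        raise ValueError("SNMP v3 privacy requires authentication")
    if has_priv:
        return "authPriv"
    return "authNoPriv" if has_auth else "noAuthNoPriv"


def mismatched_fields(configured, device):
    """Return the names of the fields that differ between the two sides."""
    return [
        f.name
        for f in fields(configured)
        if getattr(configured, f.name) != getattr(device, f.name)
    ]
```

Comparing the credential stored on the instance with the values taken from the device configuration then reduces to checking that mismatched_fields returns an empty list.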

Increase SNMP Timeout

At times, the device may not be able to reply within the configured timeout, or there could be a network issue. In most cases, increasing the timeout increases the chances of retrieving the OIDs. The SNMP timeout can be configured per MID server or directly on a probe.
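When choosing a new timeout, it helps to estimate how long a single request can block before it is reported as failed. The Python sketch below uses the "Timeout" and "Retries" values printed in the debug logs earlier in this article. It assumes that every try waits for the full timeout with no back-off, which is an assumption about the retry behaviour and not something confirmed by the logs.

```python
def worst_case_wait_ms(timeout_ms, retries):
    """Upper bound on the wait for one request: the first try plus each retry.

    Assumes each try waits the full timeout (no back-off); the actual
    MID server behaviour may differ.
    """
    if timeout_ms <= 0 or retries < 0:
        raise ValueError("timeout must be positive and retries non-negative")
    return timeout_ms * (retries + 1)


# Values from the example logs: default timeout and the simulated low timeout.
print(worst_case_wait_ms(1500, 2))  # 4500
print(worst_case_wait_ms(10, 2))    # 30
```

Under this assumption, raising the timeout also multiplies the time a probe can spend on an unresponsive device, so the increase should be moderate.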

View the following documents for the available parameters for probes and MID servers:

Additional Information


Article Information

Last Updated: 2018-08-30 05:39:32
Published: 2018-08-30