Description

Overview


MID Servers can be used to run SNMP queries against target CIs. MID Server SNMP can be used by multiple applications, including Discovery, Service Mapping, and Orchestration. This article covers some of the tools that can be used to troubleshoot MID Server SNMP issues.

Troubleshooting Tools


Very often, the issue with an application using SNMP (discovery, orchestration, etc.) is that the SNMP data is returned incompletely or not at all. If the data is returned, the investigation needs to focus on a different area of the application, such as a script include or business rule. Therefore, a good starting point when investigating an issue with an application that depends on SNMP is to check whether the data is collected successfully.

Two of the main reasons an SNMP query may not collect the desired data are:

  1. Invalid SNMP credential
  2. SNMP query timeout

Some of the tools which can be used to confirm whether the data is returned or not are:

  1. MID Server logs
  2. SNMP walk tools
  3. Wireshark

Examples


Review MID Server logs

  1. To get more detailed information in the MID Server logs for SNMP queries, add the parameter mid.log.level = debug.
  2. Reproduce the issue and review the MID Server log files. Review the following two docs on how to collect the MID Server files:

Two example logs are shown next. The first is from a successful query where all the OIDs for an SNMP - Classify probe were returned; the second is from a partially successful query where only a fraction of the OIDs were returned. In the first example, the Classify probe was run with the default timeout of 1500 ms. In the second, the timeout was set to 10 ms to simulate a timeout.

Example log showing successful SNMP query:

08/29/18 11:32:52 (911) Worker-Interactive:SNMP Worker starting: SNMP source: 
08/29/18 11:32:52 (926) Worker-Interactive:SNMP DEBUG: Timeout: 1500, Retries: 2
08/29/18 11:32:53 (004) Worker-Interactive:SNMP DEBUG: Using GETBULK
08/29/18 11:32:53 (004) Worker-Interactive:SNMP DEBUG: First attempt of getTable on target: /161, OIDs: [1.3.6.1.2.1.4.20.1.1, 1.3.6.1.2.1.4.20.1.2, 1.3.6.1.2.1.4.20.1.3], max rows: 10
08/29/18 11:32:53 (051) Worker-Interactive:SNMP DEBUG: First attempt of getTable on target: /161, OIDs: [1.3.6.1.4.1.9.9.46.1.3.1.1.3], max rows: 10
08/29/18 11:32:53 (051) Worker-Interactive:SNMP DEBUG: First attempt of getTable on target: /161, OIDs: [1.3.6.1.2.1.2.2.1.1, 1.3.6.1.2.1.2.2.1.2, 1.3.6.1.2.1.2.2.1.3, 1.3.6.1.2.1.2.2.1.6, 1.3.6.1.2.1.2.2.1.7, 1.3.6.1.2.1.2.2.1.8], max rows: 10
08/29/18 11:32:53 (114) Worker-Interactive:SNMP DEBUG: First attempt of getTable on target: /161, OIDs: [1.3.6.1.2.1.47.1.1.1.1.11, 1.3.6.1.2.1.47.1.1.1.1.13, 1.3.6.1.2.1.47.1.1.1.1.2, 1.3.6.1.2.1.47.1.1.1.1.12, 1.3.6.1.2.1.47.1.1.1.1.4], max rows: 10
08/29/18 11:32:53 (161) Worker-Interactive:SNMP DEBUG: First attempt of getTable on target: /161, OIDs: [1.3.6.1.2.1.4.22.1.1, 1.3.6.1.2.1.4.22.1.2, 1.3.6.1.2.1.4.22.1.3], max rows: 10
08/29/18 11:32:53 (161) Worker-Interactive:SNMP DEBUG: First attempt of getTable on target: /161, OIDs: [1.3.6.1.2.1.25.3.2.1.2, 1.3.6.1.2.1.25.3.2.1.3], max rows: 10
08/29/18 11:32:53 (161) Worker-Interactive:SNMP DEBUG: First attempt of getTable on target: /161, OIDs: [1.3.6.1.2.1.43.5.1.1.17], max rows: 10
08/29/18 11:32:53 (176) Worker-Interactive:SNMP DEBUG: Event: GenericScalarMetricEvent
08/29/18 11:32:53 (176) Worker-Interactive:SNMP DEBUG: Event: CheckSessionCanceledEvent, correlator: , sysID: 405c1f5cdb54a7008597d8c75e961967, canceled: false
08/29/18 11:32:53 (176) Worker-Interactive:SNMP Enqueuing: C:\ServiceNow\emprcoeljak\agent\work\monitors\ECCSender\output_0\ecc_queue.405c1f5cdb54a7008597d8c75e961967.xml
08/29/18 11:32:53 (176) Worker-Interactive:SNMP DEBUG: Event: GenericCounterMetricEvent
08/29/18 11:32:53 (192) Worker-Interactive:SNMP DEBUG: ** enqueued C:\ServiceNow\emprcoeljak\agent\work\monitors\ECCSender\output_0\ecc_queue.405c1f5cdb54a7008597d8c75e961967.xml
08/29/18 11:32:53 (192) Worker-Interactive:SNMP DEBUG: Event: MessageProcessedEvent, sysID: 405c1f5cdb54a7008597d8c75e961967
08/29/18 11:32:53 (192) Worker-Interactive:SNMP DEBUG: Event: SendMessageEvent, message: SNMP SNMP - Classify: 61 OIDs 
08/29/18 11:32:53 (192) Worker-Interactive:SNMP Worker completed: SNMP source:  time: 0:00:00.250

Example log showing failed SNMP query:

08/30/18 07:29:03 (997) Worker-Interactive:SNMP DEBUG: Timeout: 10, Retries: 2
08/30/18 07:29:03 (997) Worker-Interactive:SNMP DEBUG: Snmp4jSessionFactory: connection created for key SnmpSessionPoolKey[target: &port:161&fixed_cred:&tag:]
08/30/18 07:29:04 (075) Worker-Interactive:SNMP DEBUG: Using GETBULK
08/30/18 07:29:04 (075) Worker-Interactive:SNMP DEBUG: First attempt of getTable on target: /161, OIDs: [1.3.6.1.2.1.4.22.1.1, 1.3.6.1.2.1.4.22.1.2, 1.3.6.1.2.1.4.22.1.3], max rows: 10
08/30/18 07:29:04 (075) Worker-Interactive:SNMP DEBUG: First attempt of getTable on target: /161, OIDs: [1.3.6.1.2.1.4.20.1.1, 1.3.6.1.2.1.4.20.1.2, 1.3.6.1.2.1.4.20.1.3], max rows: 10
08/30/18 07:29:04 (122) Worker-Interactive:SNMP DEBUG: First attempt of getTable failed on target: /161, OIDs: [1.3.6.1.2.1.4.20.1.1, 1.3.6.1.2.1.4.20.1.2, 1.3.6.1.2.1.4.20.1.3], error: Request timed out.
08/30/18 07:29:04 (122) Worker-Interactive:SNMP DEBUG: Second attempt of getTable on target: /161, OIDs: [1.3.6.1.2.1.4.20.1.1, 1.3.6.1.2.1.4.20.1.2, 1.3.6.1.2.1.4.20.1.3], max rows: 5
08/30/18 07:29:04 (169) Worker-Interactive:SNMP DEBUG: Second attempt of getTable failed on target: /161, OIDs: [1.3.6.1.2.1.4.20.1.1, 1.3.6.1.2.1.4.20.1.2, 1.3.6.1.2.1.4.20.1.3], error: Request timed out.
08/30/18 07:29:04 (169) Worker-Interactive:SNMP DEBUG: Third attempt of getTable on target: /161, OIDs: [1.3.6.1.2.1.4.20.1.1, 1.3.6.1.2.1.4.20.1.2, 1.3.6.1.2.1.4.20.1.3], max rows: 5, forcing GETNEXT pdu type
08/30/18 07:29:04 (215) Worker-Interactive:SNMP DEBUG: Event: GenericScalarMetricEvent
08/30/18 07:29:04 (215) Worker-Interactive:SNMP DEBUG: Event: CheckSessionCanceledEvent, correlator: , sysID: 561ea3acdbdca7008597d8c75e96191a, canceled: false
08/30/18 07:29:04 (215) Worker-Interactive:SNMP Enqueuing: C:\ServiceNow\emprcoeljak\agent\work\monitors\ECCSender\output_0\ecc_queue.561ea3acdbdca7008597d8c75e96191a.xml
08/30/18 07:29:04 (215) Worker-Interactive:SNMP DEBUG: Event: GenericCounterMetricEvent
08/30/18 07:29:04 (231) Worker-Interactive:SNMP DEBUG: ** enqueued C:\ServiceNow\emprcoeljak\agent\work\monitors\ECCSender\output_0\ecc_queue.561ea3acdbdca7008597d8c75e96191a.xml
08/30/18 07:29:04 (231) Worker-Interactive:SNMP DEBUG: Event: MessageProcessedEvent, sysID: 561ea3acdbdca7008597d8c75e96191a
08/30/18 07:29:04 (231) Worker-Interactive:SNMP DEBUG: Event: SendMessageEvent, message: SNMP SNMP - Classify: 12 OIDs 
08/30/18 07:29:04 (231) Worker-Interactive:SNMP Worker completed: SNMP source:  time: 0:00:00.218

In the above example, several getTable requests time out because of the low timeout configured, and the probe returns only 12 OIDs instead of the 61 returned by the successful query.
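When the debug log is long, the same comparison can be scripted. The following is a minimal sketch in Python, assuming the log lines have exactly the wording shown in the two examples above; the function name summarize_snmp_log and the regular expressions are illustrative and not part of the MID Server.

```python
import re
from collections import Counter

# Patterns derived only from the log lines shown above (assumption: other
# MID Server versions may word these messages differently).
TIMEOUT_SETTING = re.compile(r"Timeout: (?P<timeout>\d+), Retries: (?P<retries>\d+)")
ATTEMPT_FAILED = re.compile(
    r"(?P<ordinal>\w+) attempt of getTable failed on target: (?P<target>\S*), "
    r"OIDs: \[(?P<oids>[^\]]*)\], error: (?P<error>.*)$"
)
OIDS_RETURNED = re.compile(r"SendMessageEvent, message: .*?: (?P<count>\d+) OIDs")


def summarize_snmp_log(lines):
    """Collect timeout settings, failed getTable attempts, and returned OID counts."""
    summary = {"settings": [], "failures": [], "errors": Counter(), "oid_counts": []}
    for line in lines:
        line = line.rstrip()
        if (m := TIMEOUT_SETTING.search(line)):
            summary["settings"].append((int(m["timeout"]), int(m["retries"])))
        elif (m := ATTEMPT_FAILED.search(line)):
            oids = [oid.strip() for oid in m["oids"].split(",")]
            summary["failures"].append((m["ordinal"], oids, m["error"]))
            summary["errors"][m["error"]] += 1
        elif (m := OIDS_RETURNED.search(line)):
            summary["oid_counts"].append(int(m["count"]))
    return summary


# A few lines copied from the failed-query example above.
SAMPLE_LOG = [
    "08/30/18 07:29:03 (997) Worker-Interactive:SNMP DEBUG: Timeout: 10, Retries: 2",
    "08/30/18 07:29:04 (122) Worker-Interactive:SNMP DEBUG: First attempt of getTable failed on target: /161, "
    "OIDs: [1.3.6.1.2.1.4.20.1.1, 1.3.6.1.2.1.4.20.1.2, 1.3.6.1.2.1.4.20.1.3], error: Request timed out.",
    "08/30/18 07:29:04 (169) Worker-Interactive:SNMP DEBUG: Second attempt of getTable failed on target: /161, "
    "OIDs: [1.3.6.1.2.1.4.20.1.1, 1.3.6.1.2.1.4.20.1.2, 1.3.6.1.2.1.4.20.1.3], error: Request timed out.",
    "08/30/18 07:29:04 (231) Worker-Interactive:SNMP DEBUG: Event: SendMessageEvent, message: SNMP SNMP - Classify: 12 OIDs ",
]

summary = summarize_snmp_log(SAMPLE_LOG)
print("Timeout/retries:", summary["settings"])
print("Failed attempts:", len(summary["failures"]), dict(summary["errors"]))
print("OIDs returned:", summary["oid_counts"])
```

A low OID count combined with "Request timed out." failures points at the timeout rather than the credential.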

SNMP Walk tool

Using a third-party SNMP tool, we can confirm whether the results are returned as expected. If the third-party tool also fails or only partially succeeds in retrieving the OIDs, the issue is unlikely to be in the MID Server SNMP implementation and is more likely related to the device, the credential, or the network. Consistent success with a third-party tool suggests that the MID Server logs need to be reviewed for potential issues. In the following example, a query for OID 1.3.6.1.2.1.1.1 is executed from the MID Server host. This OID is sysDescr and returns a description of the device. Note that the commands may differ depending on the SNMP tool used.

The following example uses SnmpWalk.exe. The credential was set to "publi", which is an incorrect community string for this device. The correct community string for this example is "public".

C:\SNMPWalk>.\SnmpWalk.exe -r:10.127.212.181 -c:"publi" -os:.1.3.6.1.2.1.1 -op:.1.3.6.1.2.1.1.1.0

%Failed to get value of SNMP variable. Timedout.

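As seen above, no credential failure error is reported. Instead, the query eventually times out. With community-based SNMP, an agent typically does not answer a request that carries an unknown community string, so from the client side a wrong credential looks the same as a timeout.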

In the following example, the community string was corrected to "public".

C:\SNMPWalk>.\SnmpWalk.exe -r:10.127.212.181 -c:"public" -os:.1.3.6.1.2.1.1 -op:.1.3.6.1.2.1.1.1.0

OID=.1.3.6.1.2.1.1.1.0, Type=OctetString, Value=Linux Linux-Tomcat 3.10.0-327.el7.x86_64 31 SMP Thu Nov 19 22:10:57 UTC 2015 x86_64

As seen above, once the community string was corrected, the sysDescr value was returned instead of the query timing out (only part of the value is shown).

Note: It is important to run the test from the same host where the MID Server is installed and with the same configuration for the credential.
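If the check has to be repeated for several devices or credentials, the SnmpWalk.exe call can be wrapped in a small script. The sketch below is in Python and assumes only the flags (-r, -c, -os, -op) and the two output formats shown in the examples above; the function names are illustrative, and other versions of the tool may print different messages.

```python
import re
import subprocess

# Row format seen in the successful example: OID=<oid>, Type=<type>, Value=<value>
ROW = re.compile(r"^OID=(?P<oid>[^,]+), Type=(?P<type>[^,]+), Value=(?P<value>.*)$")


def build_snmpwalk_command(exe, target, community, start_oid, stop_oid):
    """Build the argument list using the same flags as the examples above."""
    return [exe, f"-r:{target}", f"-c:{community}", f"-os:{start_oid}", f"-op:{stop_oid}"]


def classify_output(output):
    """Return (status, rows), where status is 'timeout', 'data', or 'unknown'."""
    rows, timed_out = [], False
    for line in output.splitlines():
        line = line.strip()
        match = ROW.match(line)
        if match:
            rows.append(match.groupdict())
        elif line.startswith("%Failed") and "Timedout" in line:
            timed_out = True
    if timed_out:
        return "timeout", rows
    return ("data" if rows else "unknown"), rows


def run_snmpwalk(exe, target, community, start_oid, stop_oid, wait_seconds=120):
    """Run the tool on the MID Server host and classify what it printed."""
    command = build_snmpwalk_command(exe, target, community, start_oid, stop_oid)
    result = subprocess.run(command, capture_output=True, text=True, timeout=wait_seconds)
    return classify_output(result.stdout + result.stderr)


# Classify the two outputs shown above without calling the tool.
print(classify_output("%Failed to get value of SNMP variable. Timedout."))
print(classify_output("OID=.1.3.6.1.2.1.1.1.0, Type=OctetString, Value=Linux Linux-Tomcat"))
```

Because a wrong community string and an unreachable device both end in a timeout, a "timeout" status should be followed by a credential check before the timeout is increased.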

Network Traffic Monitoring Tool (Wireshark example)

A network traffic monitoring tool helps determine where the issue lies. For example, we can confirm whether the request packets are sent and whether responses are ever returned.

Setup:

  1. Download and install Wireshark from https://www.wireshark.org/download.html.
  2. Once installed, double-click the application icon to start the application.
  3. Select the interface that will be used to collect traffic (for example, Ethernet).

In the following example we review the traffic for an SNMP query of table mgmt.mib-2.printmib.prtMarkerColorant.prtMarkerColorantTable prtMarkerColorantValue.

We can see from the ecc_queue record what was returned:

Use the display filter "udp && ip.addr == <target_ip>" to show only the UDP traffic to and from the target device, which includes the SNMP traffic (in the screenshot, the IP was replaced with the loopback IP after the packets were collected).

The following screenshot shows data returned by the device in detail for one of the OIDs.

 

Note: Wireshark has both capture filters and display filters. Per https://wiki.wireshark.org/CaptureFilters, "Capture filters (like tcp port 80) are not to be confused with display filters (like tcp.port == 80). The former are much more limited and are used to reduce the size of a raw packet capture. The latter are used to hide some packets from the packet list. Capture filters are set before starting a packet capture and cannot be modified during the capture. Display filters on the other hand do not have this limitation and you can change them on the fly". For system performance, it can be helpful to set up a capture filter before starting a large packet capture.
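Because the two filter syntaxes are easy to mix up, it can help to generate both from the same inputs. The following Python sketch is illustrative only: the helper names are made up for this example, and the default port of 161 is an assumption based on the "/161" target seen in the MID Server logs.

```python
import ipaddress


def snmp_display_filter(target_ip, port=161):
    """Wireshark display filter (can be changed after the capture).

    With port=None this reproduces the filter used in this article.
    """
    ip = ipaddress.ip_address(target_ip)  # raises ValueError on a malformed address
    field = "ip.addr" if ip.version == 4 else "ipv6.addr"
    protocol = "udp" if port is None else f"udp.port == {port}"
    return f"{protocol} && {field} == {ip}"


def snmp_capture_filter(target_ip, port=161):
    """Capture filter (set before the capture starts; limits what is recorded)."""
    ip = ipaddress.ip_address(target_ip)
    return f"host {ip} and udp port {port}"


print(snmp_display_filter("10.127.212.181", port=None))
print(snmp_display_filter("10.127.212.181"))
print(snmp_capture_filter("10.127.212.181"))
```

The capture filter keeps the capture file small on a busy MID Server host, while the display filter can be refined afterwards without recapturing.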

Decrypt Wireshark SNMPv3


SNMPv3 traffic that uses a privacy protocol is encrypted and needs to be decrypted before it can be reviewed. Note that the following steps only decrypt the packets in memory.

1. Open the captured packets in Wireshark.
2. Go to Edit > Preferences > Protocols.
3. Select SNMP from the protocol list.
4. Click "Edit" next to "Users Table".
5. Click the Add button and enter the following details:
  • Engine ID. The Engine ID can be read from the encrypted capture because this value is not encrypted: open the SNMP packet header and look for the Engine ID string.
  • SNMPv3 username.
  • Authentication model (MD5 | SHA1).
  • Password for the authentication model.
  • Privacy protocol (DES | AES | AES192 | AES256).
  • Privacy password.
6. The packet content should now be decrypted.
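One reason the Engine ID is required in addition to the passwords is that the SNMPv3 User-based Security Model (RFC 3414) does not use a password directly: it derives a key from the password and then localizes that key to the engine ID. The following Python sketch illustrates that derivation for MD5 and SHA1, to show why a wrong Engine ID or password leaves the packets undecrypted. It is an illustration of the RFC 3414 algorithm, not Wireshark's or the MID Server's code, and the function name is made up for this example.

```python
import hashlib

ONE_MEBIBYTE = 1048576  # RFC 3414 hashes exactly this many bytes of repeated password


def password_to_localized_key(password: bytes, engine_id: bytes, algorithm: str = "md5") -> bytes:
    """Derive the localized key for one user on one SNMP engine (RFC 3414, A.2)."""
    if not password:
        raise ValueError("password must not be empty")
    # Step 1: digest of the password repeated until 1,048,576 bytes are produced.
    repeats, remainder = divmod(ONE_MEBIBYTE, len(password))
    intermediate = hashlib.new(algorithm, password * repeats + password[:remainder]).digest()
    # Step 2: localize the intermediate key to the engine ID.
    return hashlib.new(algorithm, intermediate + engine_id + intermediate).digest()


# Sample password and engine ID taken from RFC 3414, appendix A.3.
engine_id = bytes.fromhex("000000000000000000000002")
print(password_to_localized_key(b"maplesyrup", engine_id, "md5").hex())

# The same password with a different engine ID gives a different key.
other_engine = bytes.fromhex("000000000000000000000003")
print(password_to_localized_key(b"maplesyrup", other_engine, "md5").hex())
```

If decryption does not work after filling in the Users Table, comparing the Engine ID in the table with the one in the packet header is a reasonable first check.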

Solutions


Confirm Correct Credentials

Incorrect credentials are, more often than not, the root cause. SNMP v1/v2 is simpler to configure because it only uses the community string. For SNMP v3, confirm that the Username, Authentication Protocol, Authentication Key, Privacy Protocol, and Privacy Key all match what is configured on the target device. A third-party SNMP walk tool can also be used to confirm that the credential is correct.

Increase SNMP Timeout

At times the device may not be able to reply within the configured timeout, or there could be a network issue. In most cases, increasing the timeout increases the chances of retrieving the OIDs. The SNMP timeout can be configured per MID Server or directly on a probe.
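The effect of the timeout can be reasoned about with a toy model. The Python sketch below replays the attempt sequence visible in the failed-query log (GETBULK with max rows 10, GETBULK with max rows 5, then GETNEXT with max rows 5) against a simulated device. The device response times are invented for illustration, and the model is an assumption about the observed behavior, not the MID Server implementation.

```python
# Attempt sequence as seen in the failed-query log: (PDU type, max rows).
ATTEMPTS = [("GETBULK", 10), ("GETBULK", 5), ("GETNEXT", 5)]


def first_successful_attempt(timeout_ms, response_time_ms):
    """Return (attempt number, PDU type, max rows) for the first attempt that is
    answered within the timeout, or None if every attempt times out.

    response_time_ms is a callable (pdu, max_rows) -> simulated response time in ms.
    """
    for number, (pdu, max_rows) in enumerate(ATTEMPTS, start=1):
        if response_time_ms(pdu, max_rows) <= timeout_ms:
            return number, pdu, max_rows
    return None


def slow_device(pdu, max_rows):
    """Hypothetical device: 20 ms base latency plus 3 ms per requested row."""
    return 20 + 3 * max_rows


for timeout in (10, 40, 1500):
    print(timeout, "ms ->", first_successful_attempt(timeout, slow_device))
```

In this model a 10 ms timeout fails on every attempt, a 40 ms timeout only succeeds once the request is reduced to 5 rows, and the default 1500 ms succeeds on the first attempt.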

View the following documents for the available parameters for probes and MID servers:

Context 

Some probes need to use context to collect information when discovering Cisco devices. For example, the probes triggered by or after the SNMP - Switch probe need to pass context information in order to collect information for each VLAN. Without the context, only the default VLAN information would be returned.

Additional Information


Article Information

Last Updated:2019-08-05 05:46:35
Published:2019-08-05