Description

Discovery Sensor jobs encounter many deadlocks when inserting into sys_mutex for "SYSTEM_LOCK:discovery_device_history".

Symptoms:

  1. Usually surfaces as a "Scheduler is overloaded or stuck" incident from monitoring.
  2. Many log lines that look like this:
    2018-07-29 00:00:54 (513) worker.2 worker.2 txid=b0ddb55213a3 Retrying deadlocked statement (retry number 2) while trying to execute INSERT INTO sys_mutex (`sys_id`, `sys_updated_by`, `system_id`, `sys_created_on`, `name`, `sys_mod_count`, `sys_updated_on`, `sys_created_by`) VALUES('65ddb55213a39f44e9593ff18144b0fa', '_MID_User_PROD', 'available', '2018-07-29 07:00:53', 'SYSTEM_LOCK:discovery_device_history:8e3a34d2136f5f4437b3d2f18144b050', 0, '2018-07-29 07:00:53', '_MID_User_PROD') /* disney120, gs:glide.scheduler.worker.2, tx:b0ddb55213a39f44e9593ff18144b0f6 */
  3. Deadlock at DiscoverySensor (line 345)
    where line 345 of the "Script Include" (sys_script_include) named "Discovery Sensor" is as follows:
    g_device.changeDeviceIssueState(deviceState, lastState, currentState, logStateChanges, sourceName, this.getEccQueueId());
  4. [optional symptom]
    The ECC input could be "SNMP - Switch - SpanningTreeTable" or "SNMP - Switch - BridgePortTable"
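
To check how often symptom 2 occurs, the log signature can be matched with a small script. The following is a minimal sketch that runs outside the instance (for example in Node.js). The function name countDeviceHistoryDeadlocks and the sample text are hypothetical; only the matched phrases come from the log line shown in symptom 2.

```javascript
// Hypothetical helper, not part of the platform: counts log lines that report a
// deadlock retry on an INSERT into sys_mutex for the discovery_device_history lock.
function countDeviceHistoryDeadlocks(logText) {
    // No "g" flag, so test() keeps no state between lines.
    var pattern = /Retrying deadlocked statement \(retry number \d+\).*INSERT INTO sys_mutex.*SYSTEM_LOCK:discovery_device_history/;
    return logText.split('\n').filter(function (line) {
        return pattern.test(line);
    }).length;
}

// Sample input: the first line is an abbreviated copy of the log line in symptom 2,
// the second line is made-up filler that must not be counted.
var sample = [
    "2018-07-29 00:00:54 (513) worker.2 worker.2 txid=b0ddb55213a3 Retrying deadlocked statement (retry number 2) while trying to execute INSERT INTO sys_mutex (...) VALUES('...', 'SYSTEM_LOCK:discovery_device_history:8e3a34d2136f5f4437b3d2f18144b050', ...)",
    "2018-07-29 00:00:55 unrelated filler line"
].join('\n');

console.log(countDeviceHistoryDeadlocks(sample)); // prints 1
```

A high count within a short time window is consistent with the contention described here, but the threshold that matters depends on the instance.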

Steps to Reproduce

  1. Discover a network switch
    • where the "SNMP - Switch - Vlan" probe succeeds,
    • but one or more of the "SNMP - Switch - SpanningTreeTable", "SNMP - Switch - BridgePortTable" and "SNMP - Switch - ForwardingTable" probes fail.

The sensor for "SNMP - Switch - Vlan" calls the "Script Include" named "DiscoveryVlanSwitchProcessor", which triggers multiple ECC outputs for "SNMP - Switch - SpanningTreeTable", "SNMP - Switch - BridgePortTable" and "SNMP - Switch - ForwardingTable".

When the SpanningTreeTable, BridgePortTable and ForwardingTable probes come back with errors, their sensors contend for the "DeviceHistory" lock. This is the scenario we reconstructed to reproduce the deadlock.

Workaround

This problem has been fixed. If you are able to upgrade, review the Fixed In or Intended Fix Version fields to determine whether any versions have a planned or permanent fix.

Workaround 1 - Disable the "SNMP - Switch - Vlan" probe from being triggered.

If the ECC input was "SNMP - Switch - SpanningTreeTable" or "SNMP - Switch - BridgePortTable", then stop the "SNMP - Switch - Vlan" probe from being triggered by the "SNMP Classification" for "Standard Network Switch".

The SNMP Classification for "Standard Network Switch" 
https://XXX.service-now.com/discovery_classy_snmp.do?sys_id=b9d852c60ab3015100c74a8e158664ca

Workaround 2 - Modification to "DiscoverySensor" script include.

Workaround to reduce discovery_device_history contention: 

Inside the DiscoverySensor script include, modify the handleError function so that the g_device.changeDeviceIssueState call is guarded by a condition:

    if (this.shouldUpdateDeviceIssueState(errors))
        g_device.changeDeviceIssueState(deviceState, lastState, currentState, logStateChanges, sourceName, this.getEccQueueId());

Then add the 'shouldUpdateDeviceIssueState' function to that script include:

    shouldUpdateDeviceIssueState: function(errors) {
        if (errors.length === 0)
            return false;

        if (errors.length !== 1)
            return true;

        if (errors[0].msg.indexOf('Adding target to blacklist') > -1)
            return false;

        if (errors[0].msg.indexOf('Target is blacklisted') > -1)
            return false;

        return true;
    },
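
To see which error lists still trigger a device issue state update, the predicate above can be tried outside the instance. The following is a minimal sketch: it copies the same logic into a standalone function, and the sample error objects are hypothetical (only the msg property used by the workaround is assumed).

```javascript
// Standalone copy of the workaround's predicate, for experimentation only.
// In the instance it is a method of the DiscoverySensor script include.
function shouldUpdateDeviceIssueState(errors) {
    if (errors.length === 0)
        return false;   // no errors: nothing to record

    if (errors.length !== 1)
        return true;    // several errors: always update

    var msg = errors[0].msg;
    if (msg.indexOf('Adding target to blacklist') > -1)
        return false;   // lone blacklist message: skip the update

    if (msg.indexOf('Target is blacklisted') > -1)
        return false;   // lone blacklist message: skip the update

    return true;        // any other single error: update
}

// Hypothetical inputs; the 'SNMP timeout' text is made up for illustration.
console.log(shouldUpdateDeviceIssueState([]));                                   // false
console.log(shouldUpdateDeviceIssueState([{ msg: 'Target is blacklisted' }]));   // false
console.log(shouldUpdateDeviceIssueState([{ msg: 'SNMP timeout' }]));            // true
console.log(shouldUpdateDeviceIssueState([
    { msg: 'Target is blacklisted' },
    { msg: 'Adding target to blacklist' }
]));                                                                             // true
```

Note that the blacklist messages only suppress the update when they are the sole error. Two or more errors always lead to a changeDeviceIssueState call, even if all of them are blacklist messages.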

Related Problem: PRB1296757

Seen In

There is no data to report.

Fixed In

Madrid

Associated Community Threads

There is no data to report.

Article Information

Last Updated: 2019-07-15 11:38:22
Published: 2019-05-21