Description

Discovery.device.complete events may take a long time to process, which causes the event queue to be backed up.

Release or Environment

All currently supported releases.

Cause

Layer 2 and 3 mapping

The discovery.device.complete event is created when the discovery of a device completes. This event triggers post-discovery processing, which can include creating layer 2 and layer 3 connections as well as custom operations. Creating layer 2 and layer 3 relationships requires a lot of processing because it has to go through several tables.

See the following documents for more information about layer 2 discovery and layer 3 mapping:

Layer 2 discovery creates "IP Connection" and "Connects To" relationships, which can be seen on the BSM. To view these relationships, open the dependency view for one of the CIs and, in the map settings, select "Physical Network Connections" as the "Dependency Type".

The layer 3 relationship provides a logical mapping of the TCP/IP layer for network gear. The mapping loops through the IP addresses of the discovered CI. For each IP address, it searches for router interfaces of classes cmdb_ci_ip_switch and cmdb_ci_ip_router where lo_ip <= IP address <= hi_ip. If a match is found, a relationship is created from the router/switch to the CI.
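
The range test described above can be sketched in plain JavaScript. This is only an illustration of the lo_ip <= IP address <= hi_ip comparison for IPv4 addresses. The function names (ipToNumber, findMatchingInterfaces) and the sample interface objects are hypothetical and are not taken from the platform's own script include.

```javascript
// Convert a dotted IPv4 string to a number so that ranges can be compared numerically.
// Multiplication is used instead of bit shifts to avoid signed 32-bit overflow.
function ipToNumber(ip) {
    var parts = ip.split('.');
    var n = 0;
    for (var i = 0; i < 4; i++) {
        n = n * 256 + parseInt(parts[i], 10);
    }
    return n;
}

// Return the interfaces whose [lo_ip, hi_ip] range contains the given address.
function findMatchingInterfaces(ip, interfaces) {
    var target = ipToNumber(ip);
    return interfaces.filter(function (itf) {
        return ipToNumber(itf.lo_ip) <= target && target <= ipToNumber(itf.hi_ip);
    });
}

// Hypothetical sample data standing in for router and switch interface records.
var routerInterfaces = [
    { name: 'router-a', lo_ip: '10.0.0.0', hi_ip: '10.0.0.255' },
    { name: 'switch-b', lo_ip: '10.0.1.0', hi_ip: '10.0.1.255' }
];

var matches = findMatchingInterfaces('10.0.1.17', routerInterfaces);
console.log(matches.map(function (m) { return m.name; }).join(','));
```

Because this lookup runs once per IP address of the CI, the cost of a single event grows with the number of addresses on that CI.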

Example stack excerpt seen when the issue is due to layer 3 mapping:

org.mozilla.javascript.gen.sys_script_include_0aa00c84ef52210098d5925495c0fb7a_script_3781766._c_anonymous_5(sys_script_include.0aa00c84ef52210098d5925495c0fb7a.script:105)
org.mozilla.javascript.gen.sys_script_include_0aa00c84ef52210098d5925495c0fb7a_script_3781766.call(sys_script_include.0aa00c84ef52210098d5925495c0fb7a.script)
org.mozilla.javascript.ScriptRuntime.doCall2(ScriptRuntime.java:2651)
org.mozilla.javascript.ScriptRuntime.doCall(ScriptRuntime.java:2590)
org.mozilla.javascript.optimizer.OptRuntime.callProp0(OptRuntime.java:85)
org.mozilla.javascript.gen.sys_script_include_0aa00c84ef52210098d5925495c0fb7a_script_3781766._c_anonymous_2(sys_script_include.0aa00c84ef52210098d5925495c0fb7a.script:27)
org.mozilla.javascript.gen.sys_script_include_0aa00c84ef52210098d5925495c0fb7a_script_3781766.call(sys_script_include.0aa00c84ef52210098d5925495c0fb7a.script)
org.mozilla.javascript.ScriptRuntime.doCall2(ScriptRuntime.java:2651)
org.mozilla.javascript.ScriptRuntime.doCall(ScriptRuntime.java:2590)
org.mozilla.javascript.optimizer.OptRuntime.callProp0(OptRuntime.java:85)
org.mozilla.javascript.gen.sysevent_script_action_479933b7ef42210098d5925495c0fb96_script_3473637._c_anonymous_1(sysevent_script_action.479933b7ef42210098d5925495c0fb96.script:18)
org.mozilla.javascript.gen.sysevent_script_action_479933b7ef42210098d5925495c0fb96_script_3473637.call(sysevent_script_action.479933b7ef42210098d5925495c0fb96.script)
org.mozilla.javascript.ScriptRuntime.doCall2(ScriptRuntime.java:2651)
org.mozilla.javascript.ScriptRuntime.doCall(ScriptRuntime.java:2590)
org.mozilla.javascript.optimizer.OptRuntime.call0(OptRuntime.java:23)
org.mozilla.javascript.gen.sysevent_script_action_479933b7ef42210098d5925495c0fb96_script_3473637._c_script_0(sysevent_script_action.479933b7ef42210098d5925495c0fb96.script:1)
org.mozilla.javascript.gen.sysevent_script_action_479933b7ef42210098d5925495c0fb96_script_3473637.call(sysevent_script_action.479933b7ef42210098d5925495c0fb96.script)
org.mozilla.javascript.gen.sysevent_script_action_479933b7ef42210098d5925495c0fb96_script_3473637.exec(sysevent_script_action.479933b7ef42210098d5925495c0fb96.script)

CI with a large number of IP addresses

Some CIs may contain many IP addresses, dozens or more. If a device is discovered via multiple interfaces, each interface triggers a discovery.device.complete event. Because the processing (layer 3 mapping, for example) loops through all IP addresses of the discovered CI for every event, the amount of processing grows roughly with the square of the number of IP addresses when such a CI is discovered via all of its interfaces in the same discovery status.
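
To illustrate the growth described above, here is a minimal sketch of the arithmetic. It assumes one event per interface through which the CI was discovered, and that each event loops once over every IP address of the CI. The function name estimateIpIterations is hypothetical, and the numbers are illustrative rather than measurements.

```javascript
// Rough count of IP address iterations for one CI within one discovery status.
// ipCount: number of IP addresses on the CI
// eventsTriggered: number of discovery.device.complete events raised for that CI
function estimateIpIterations(ipCount, eventsTriggered) {
    return ipCount * eventsTriggered;
}

// When the CI is discovered via every interface, eventsTriggered equals ipCount.
[1, 10, 50].forEach(function (n) {
    console.log(n + ' IP addresses -> ' + estimateIpIterations(n, n) + ' iterations');
});
```

Under these assumptions, a CI with 50 IP addresses that is processed only once needs 50 iterations, while the same CI discovered via all 50 interfaces needs 2500.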

Find CIs which had discovery.device.complete events triggered multiple times

Note: Always test any script in non-production first.

  1. Navigate to Scripts - Background ("System Definition > Scripts - Background").
  2. Run script:
    // Count discovery_device_history records per CI created today.
    var ga = new GlideAggregate('discovery_device_history');
    ga.addAggregate('count', 'cmdb_ci');
    ga.orderByAggregate('count', 'cmdb_ci');
    ga.orderBy('cmdb_ci');
    ga.addQuery('sys_created_on','>=','javascript:gs.beginningOfToday()');
    ga.addQuery('sys_created_on','<=','javascript:gs.endOfToday()');
    ga.addNotNullQuery('cmdb_ci');
    // Keep only CIs with more than 50 device history records.
    ga.addHaving('count','>','50');
    ga.query();
    while (ga.next()) {
        // Print the CI sys_id followed by its record count.
        gs.print(ga.getValue('cmdb_ci') + ' ' + ga.getAggregate('count', 'cmdb_ci'));
    }
  3. The result lists the CIs that had more than 50 device history records created today.
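
For readers less familiar with GlideAggregate, the grouping and threshold logic of the script above can be sketched in plain JavaScript. The function name findCisOverThreshold and the sample records are hypothetical. On an instance the data would come from the discovery_device_history table, and the sort order used here (highest count first) is a choice made for readability, not a claim about how the platform orders its results.

```javascript
// Group device history records by CI and keep the CIs above a threshold.
function findCisOverThreshold(histories, threshold) {
    var counts = {};
    histories.forEach(function (h) {
        if (!h.cmdb_ci) {
            return; // skip records without a CI, like addNotNullQuery('cmdb_ci')
        }
        counts[h.cmdb_ci] = (counts[h.cmdb_ci] || 0) + 1;
    });
    return Object.keys(counts)
        .filter(function (ci) { return counts[ci] > threshold; })
        .map(function (ci) { return { cmdb_ci: ci, count: counts[ci] }; })
        .sort(function (a, b) { return b.count - a.count; });
}

// Hypothetical sample records; real sys_ids are 32-character strings.
var sampleHistories = [
    { cmdb_ci: 'aaa' }, { cmdb_ci: 'aaa' }, { cmdb_ci: 'aaa' },
    { cmdb_ci: 'bbb' },
    { cmdb_ci: '' }
];

findCisOverThreshold(sampleHistories, 2).forEach(function (row) {
    console.log(row.cmdb_ci + ' ' + row.count);
});
```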

Find discovery.device.complete events which took a long time to process

  1. Navigate to the sysevent table ("System Logs > Events").
  2. Filter the events where the "Name" field is discovery.device.complete.
  3. Sort by "Processing Duration" to get the events which took the longest to process.
  4. Event Parm1 is the sys_id of the discovery_device_history record.
  5. Event Parm2 is the sys_id of the discovery_status record.
  6. The cmdb_ci for which the L2 and L3 mapping took a long time to process can be found in the discovery_device_history record.

Resolution

Turn off L2 Mapping:

If the "Physical Network Connections" option of the CI dependency view is not used, layer 2 discovery can be turned off via the system property sa.create_physical_connections.active (set the value to false). The impact is that these relationships are no longer created.

Turn off L3 Mapping:

Often the relationships created by layer 3 mapping are not clear to the team managing the CMDB. Layer 3 mapping can be turned off via the system property glide.discovery.L3_mapping (set the value to false). The impact is that these relationships are no longer created.

Create a dedicated queue for discovery.device.complete

Open a support incident to have our Performance team create a separate event queue only for "discovery.device.complete" events, so that a delay in processing these events does not create a backlog for other system events.

Stop duplicated L2/L3 mapping for the same CI

Update the discovery.device.complete script actions ("System Policy > Events > Script Actions") to check the last_state of the discovery device history record. If the last state is "Identified, ignored extra IP", there should be no need to process the event, because the CI has already been processed via another discovery device history event.

For example, add the following to the script action "Discovery - map device to netgears":

var lastState = current.last_state;
if (lastState == "Identified, ignored extra IP") {
    return;
}

The start of the script action "Discovery - map device to netgears" would then look like:

(function() {
    var ciid = current.cmdb_ci + '';
    if (!ciid)
        return;

    var lastState = current.last_state;
    if (lastState == "Identified, ignored extra IP") {
        return;
    }
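
The effect of this guard can be checked outside the instance with a small plain JavaScript sketch. The shouldMapDevice function and the sample records are hypothetical stand-ins for the script action and for discovery_device_history records. Only the cmdb_ci and last_state checks come from the excerpt above, and the "Created CI" state is used purely as an illustrative value.

```javascript
// Decide whether a device history record still needs L2/L3 mapping.
// Returns false where the script action would return early.
function shouldMapDevice(deviceHistory) {
    var ciid = deviceHistory.cmdb_ci + '';
    if (!ciid) {
        return false; // no CI on the record, nothing to map
    }
    if (deviceHistory.last_state == 'Identified, ignored extra IP') {
        return false; // the CI is already handled by another device history event
    }
    return true;
}

// Hypothetical records for one CI discovered through three interfaces.
var sampleRecords = [
    { cmdb_ci: 'ci-1', last_state: 'Created CI' },
    { cmdb_ci: 'ci-1', last_state: 'Identified, ignored extra IP' },
    { cmdb_ci: 'ci-1', last_state: 'Identified, ignored extra IP' }
];

var toProcess = sampleRecords.filter(shouldMapDevice);
console.log(toProcess.length + ' of ' + sampleRecords.length + ' events would run the mapping');
```

In this sample, only one of the three events for the CI would run the mapping; the other two exit early.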

Apply patches which resolve issues regarding L2 and L3 mapping:

There have been several Known Problems which could cause slowness when processing discovery.device.complete events. Check the following articles to make sure you are not affected by them:

  • PRB1319185: Performance improvements for the layer 2 connections algorithm
  • PRB1309396: Issues with the event queue performance due to slow discovery.device.complete events
  • PRB1371401: Script Include 'DeviceL3Mapping' may trigger slow queries on CMDB
  • PRB1334573: If the macAddress parameter is empty or null, a query will display all ports
  • PRB1397581: Discovery.device.complete events may take a long time to process when CI has large number of IP addresses

Additional Information

Other knowledge articles which relate to discovery.device.complete:

Article Information

Last Updated: 2020-05-28 05:35:44
Published: 2020-05-28