Description


Occasionally an "ASYNC: Discovery - Sensors" scheduler worker job may be identified as the cause of a performance issue on the instance, for example through excessive memory use, database load, or run time. That sensor job runs for most jobs run via a MID Server, which covers a vast range of different kinds of jobs and processing code.

To debug these, it is vital to know which specific ecc_queue input record the job is processing. This KB explains how to find that record, and what it can tell you when searching for possible causes.

Procedure


  1. /stats.do and /threads.do list the Scheduler Workers that are currently running. Save the output of both for later, as it can show how far processing got.
  2. Find the sys_trigger record. While a worker thread is running, there should be a sys_trigger record in the Running state:
    • Name = ASYNC: Discovery - Sensors
    • State = Running
    • Claimed by = <the specific application node that is running the worker thread>
    • Updated on = <the time the worker thread started running>
    • Document key = <the sys_id of the ECC Queue input record>

  3. Find the ECC Queue input record(s):
    • Go to: ECC -> Queue (/ecc_queue_list.do)
    • Filter the list on Sys ID 'is one of', and enter the Document key values from the sys_trigger records

  4. Export those records as XML for later. The export will also include any large payload attachment records.
  5. Analyse the Topic, Name, Source and Payload fields for details of what the job is before coming to any conclusions about what code or known problems may be involved.
  6. If the Topic and Name are not clear, look them up in the Discovery Definition -> Probes [discovery_probes] table. If you remove the default filter, this list includes probes for both Discovery and other features.
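
The two lookups in the procedure above (the running sys_trigger record, then the ECC Queue records by Document key) can also be expressed as URLs with encoded queries. The following is a minimal sketch, not an official tool. It assumes that the Running state is stored as the value 1 on sys_trigger, that the column names are claimed_by, sys_updated_on and document_key, and that the instance exposes the standard Table API path /api/now/table/. The function names and the example instance URL are hypothetical.

```python
from urllib.parse import urlencode

SENSOR_JOB_NAME = "ASYNC: Discovery - Sensors"


def running_sensor_trigger_url(instance_url, running_state="1"):
    """Build a Table API URL listing running Discovery - Sensors triggers.

    Assumption: the Running state is stored as "1" on sys_trigger;
    pass a different value if your instance differs.
    """
    query = f"name={SENSOR_JOB_NAME}^state={running_state}"
    params = {
        "sysparm_query": query,
        # Assumed column names for the fields named in the procedure.
        "sysparm_fields": "name,state,claimed_by,sys_updated_on,document_key",
    }
    return f"{instance_url.rstrip('/')}/api/now/table/sys_trigger?{urlencode(params)}"


def ecc_queue_list_url(instance_url, document_keys):
    """Build an /ecc_queue_list.do URL filtered on Sys ID 'is one of'.

    Empty values and duplicates are dropped; input order is kept.
    """
    cleaned = [key.strip() for key in document_keys if key and key.strip()]
    unique_keys = list(dict.fromkeys(cleaned))
    # "IN" is the encoded-query operator behind the 'is one of' filter.
    query = "sys_idIN" + ",".join(unique_keys)
    params = {"sysparm_query": query}
    return f"{instance_url.rstrip('/')}/ecc_queue_list.do?{urlencode(params)}"
```

For example, `ecc_queue_list_url("https://example.service-now.com", keys)` returns a link that opens the ECC Queue list already filtered to the Document key values collected from the sys_trigger records.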

 

Additional Information on the examples used


The 4 examples in the screenshot are very different jobs. The first 3 are for a Discovery. The agent_correlator field of those records will have the sys_id of a specific record in the Discovery Status table [discovery_status]. That in turn will link to the Discovery Schedule [discovery_schedule] that is running.
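
As a small illustration of following that link, the sketch below turns an agent_correlator value into a URL for the matching Discovery Status record. It assumes the value is a normal 32-character hexadecimal sys_id and returns None otherwise, since non-Discovery inputs may carry something else in that field. The function name and instance URL are hypothetical.

```python
import re

# A sys_id is a 32-character hexadecimal string.
SYS_ID_PATTERN = re.compile(r"[0-9a-f]{32}")


def discovery_status_url(instance_url, agent_correlator):
    """Return a link to the discovery_status record, or None.

    None is returned when agent_correlator is empty or does not look
    like a sys_id, so callers can skip non-Discovery ECC Queue inputs.
    """
    value = (agent_correlator or "").strip().lower()
    if not SYS_ID_PATTERN.fullmatch(value):
        return None
    return f"{instance_url.rstrip('/')}/discovery_status.do?sys_id={value}"
```

From the Discovery Status record that opens, the related Discovery Schedule can then be followed in the form as described above.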

"Shazzam" is the port scanner of Discovery, and this input payload may include the results of scanning 5000 or more target devices on many different ports.

"MultiProbe" could be running one of many different Discovery Multi-Probes, each of which contains several individual probes. In this case the "Name" field tells us it is specifically the ADM probe for Windows, which includes other probes for data on all running processes, and netstat data, which can be huge. This job will end up matching processes to network connections to automatically link CIs together. Huge CMDB tables are involved.

"HorizontalDiscoveryProbe" is what runs any Service Mapping "Patterns" during a normal Discovery schedule. In this case the Name shows it is the Pattern for Windows desktops. Patterns include a lot of individual commands, and the sum of the data can be huge. This could have been for another pattern, perhaps a Network Switch, which can bring vast amounts of data back to process.

The 4th, "RESTProbe", isn't even Discovery. That's an input back from an Outbound REST Message via MID Server job. If there is no custom sensor to set that ECC input as processed before the Discovery sensor runs, and the input has a huge payload, then it could also end up as a long-running Discovery - Sensors job.

All of those have been known to run for a long time and use a lot of memory, and there will be others. In each case specific issues were identified for that particular job, and problem tickets were created and fixed.
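
To triage several exported records at once, the Topic, Name, Source and Payload fields can be summarised with a short script. This is a rough sketch under stated assumptions: it expects the XML export to contain one `<ecc_queue>` element per record with child elements named `sys_id`, `topic`, `name`, `source` and `payload`, and the hint texts only restate the four examples above. The function and constant names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hints restating the four example topics described in this article.
TOPIC_HINTS = {
    "Shazzam": "Discovery port scanner results",
    "MultiProbe": "Discovery Multi-Probe; check Name for the specific probe",
    "HorizontalDiscoveryProbe": "Discovery Pattern; check Name for the specific pattern",
    "RESTProbe": "Outbound REST Message via MID Server, not Discovery",
}
DEFAULT_HINT = "Look up the Topic and Name in [discovery_probes]"


def _field_text(record, tag):
    """Return the text of a child element, or "" if it is missing or empty."""
    node = record.find(tag)
    return (node.text or "") if node is not None else ""


def summarize_ecc_export(xml_text):
    """Summarise exported ecc_queue records, largest inline payload first.

    Note: very large payloads are stored as attachments, so a small
    payload_chars value does not prove the input was small.
    """
    root = ET.fromstring(xml_text)
    summaries = []
    for record in root.iter("ecc_queue"):
        topic = _field_text(record, "topic")
        summaries.append({
            "sys_id": _field_text(record, "sys_id"),
            "topic": topic,
            "name": _field_text(record, "name"),
            "source": _field_text(record, "source"),
            "payload_chars": len(_field_text(record, "payload")),
            "hint": TOPIC_HINTS.get(topic, DEFAULT_HINT),
        })
    summaries.sort(key=lambda item: item["payload_chars"], reverse=True)
    return summaries
```

Sorting by payload size puts the most likely memory-heavy inputs first, but the Topic and Name still need to be checked before drawing any conclusion about the cause.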

Article Information

Last Updated:2019-08-27 06:11:46
Published:2019-08-27