Symptoms


This article helps work out why inputs returned from MID Servers are not being processed. Symptoms include:

  • ECC Queue table [ecc_queue] inputs, which are the results returned by probes, remain in Ready state
  • Which in turn causes:
    • Discovery schedules get stuck and time out
    • Service Maps don't re-discover
    • Workflows get stuck at an Orchestration Activity
    • Events no longer come into Event Management
    • Import Sets never receive the data from the JDBC Data Source
    • MID Server related lists are not updated with Stats such as the Threads list.
    • and there will be others...
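
The first symptom can be checked with a list filter on [ecc_queue]. The following is a minimal sketch that only builds the list URL. The helper name staleInputsUrl and the 10-minute threshold are illustrative, and the field names queue, state and sys_created_on are assumptions to verify against your instance.

```javascript
// Hypothetical helper: builds a list URL for ECC Queue inputs that are
// still in Ready state and were created at least `minutes` minutes ago.
// Field names (queue, state, sys_created_on) are assumptions to verify.
function staleInputsUrl(minutes) {
  var query = [
    'queue=input',
    'state=ready',
    'sys_created_onRELATIVELE@minute@ago@' + minutes
  ].join('^');
  return '/ecc_queue_list.do?sysparm_query=' + query;
}

console.log(staleInputsUrl(10));
```

If that list keeps growing, inputs are arriving but not being picked up, and the causes below are worth checking.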

Release


Any

Causes and Resolutions


The Sensor is inactive

For an input to change state from Ready to Processing to Processed, it needs a Sensor, and that is always implemented as an Insert Business Rule on the ecc_queue table. Most out-of-box features that use MID Servers will have a sensor, including:

Feature                    Business Rule
Discovery Probes           Discovery - Sensors
Orchestration Activities   Automation - Sensors
MID Server system          MID - Heartbeat, MID - Process XMLStats, etc.
Import Sets                JDBCProbeSensor
Event Management           Event Management - Connector
Integrations               ECC Queue Retry Policy

 

Solutions:
Check that the business rule is Active.
Check the Versions related list to see if the Business Rule has been customized, and if so, revert to the out-of-box version.
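
The Active check can also be scripted. This is a sketch only, assuming Business Rules are stored in sys_script with the fields collection, active and name. The function name findInactiveEccRules is hypothetical, and the GlideRecord constructor is passed in as a parameter so the logic can be exercised outside an instance. It lists every inactive rule on ecc_queue, not just sensors, so the result still needs a manual review.

```javascript
// Sketch: list inactive Business Rules defined on the ecc_queue table.
// GlideRecordCtor is injected; on an instance, pass the global GlideRecord.
function findInactiveEccRules(GlideRecordCtor) {
  var names = [];
  var gr = new GlideRecordCtor('sys_script');   // Business Rules table (assumed)
  gr.addQuery('collection', 'ecc_queue');       // rules defined on ecc_queue
  gr.addQuery('active', false);
  gr.query();
  while (gr.next()) {
    names.push(String(gr.getValue('name')));
  }
  return names;
}

// In Scripts - Background (assumes the global GlideRecord and gs objects):
if (typeof GlideRecord !== 'undefined') {
  gs.print(findInactiveEccRules(GlideRecord).join('\n'));
}
```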

There is no Sensor

If there is no sensor the input will remain in Ready state. That may be fine if it is the kind of fire-and-forget outbound integration where nothing needs to record or process the result. It's still best practice to have a sensor, even if all it does is mark the input processed.

For inputs from the RESTMessageV2 or SOAPMessageV2 API, there is no out-of-box generic sensor. If no response is required and the request was sent with executeAsync(), then these too can be ignored, but if the response does need processing then integrations using those APIs should have a sensor.

Solution: Implement a Sensor, so that the MID Server can be used asynchronously without causing performance issues.
ServiceNow KB: Best practices for RESTMessageV2 and SOAPMessageV2 (KB0716391)
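
As an illustration of the simplest possible sensor, one that only marks the input processed, here is a hedged sketch. It assumes it is called from an async insert Business Rule on ecc_queue whose condition already limits it to your integration's inputs, and that the state value is 'processed'. The function name markInputProcessed is hypothetical. A real sensor would normally also read the payload.

```javascript
// Sketch of a sensor body that only marks the input as processed.
// Assumes `current` is the inserted ecc_queue record and that the
// queue/state field values ('input', 'processed') match your instance.
function markInputProcessed(current) {
  if (String(current.getValue('queue')) !== 'input') {
    return false;                       // leave outputs untouched
  }
  current.setValue('state', 'processed');
  current.update();
  return true;
}

// Inside the Business Rule script (assumes the global `current`):
if (typeof current !== 'undefined') {
  markInputProcessed(current);
}
```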

The Scheduler workers are backlogged...

MID Servers insert inputs into the ECC Queue via SOAP, using the instance's API-INT semaphores. So as not to block those semaphores, most sensor business rules run 'async', meaning they run in the Scheduler Worker threads. You can check for a backlog:

  1. Navigate to System Diagnostics.
  2. On the page, find System Overview.
  3. If the scheduler is backed up, Events Pending displays in red.

To see exactly what is queued, look at the ready and running jobs in the Scheduler table [sys_trigger]. The ecc_queue-related ones are named after the sensor business rule, e.g. the "Discovery - Sensors" business rule creates sys_trigger records named "ASYNC: Discovery - Sensors". To check whether those are there:

  1. Navigate to System Scheduler -> Scheduled Jobs
  2. Add a filter for:
    • State is Ready, or State is Running
    • Next action is Today
    • Next action 'relative' on or before 1 minute ago
  3. You'll end up with a list URL something like this: 
    /sys_trigger_list.do?sysparm_query=stateIN0,1^next_action>javascript:gs.endOfYesterday()^next_actionRELATIVELE@minute@ago@1

That will give the backlog of scheduler worker jobs for today. You could filter on 'Name Contains' if you know the sensor.
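
The same list URL can be assembled in a small helper, which makes it easy to add the 'Name Contains' condition. This is a sketch: the function name schedulerBacklogUrl is hypothetical, and the URL is left unencoded to match the example above.

```javascript
// Hypothetical helper: rebuilds the sys_trigger list URL from the filter
// above, optionally narrowed to one sensor with a 'Name contains' condition.
function schedulerBacklogUrl(nameContains) {
  var conditions = [
    'stateIN0,1',                                   // Ready or Running
    'next_action>javascript:gs.endOfYesterday()',   // next action is today
    'next_actionRELATIVELE@minute@ago@1'            // on or before 1 minute ago
  ];
  if (nameContains) {
    conditions.push('nameLIKE' + nameContains);     // Name contains ...
  }
  return '/sys_trigger_list.do?sysparm_query=' + conditions.join('^');
}

console.log(schedulerBacklogUrl('Discovery - Sensors'));
```

Called without an argument, it returns the unfiltered backlog URL shown in step 3.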

If your job is there in Ready state, it has not run yet. Possible causes:

...because the instance is not keeping up with the workload

Scheduler Worker threads are a shared resource, used by all async business rules and scheduled jobs in the platform, and so something completely unrelated to the MID Server jobs could be delaying or blocking processing.

The above list would give you an idea of the Names and age of the scheduler queue. If you filter this for only State = Running jobs, you can see what is currently running in the scheduler workers.

Solution: From the list you may be able to identify some long-running jobs, or a large number of similar quick-running jobs, that have in effect been blocking the queue.

Seeing what is currently running should let you pick out particular jobs and investigate whether they are functioning normally. For example, an unusually large batch update of incidents could have triggered a huge cascade of email notifications to CMDB owners. You could find anything.
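
To spot a large number of similar jobs, it can help to count the queued jobs per name. The following is a sketch only: the function name countJobsByName is hypothetical, the GlideAggregate constructor is passed in as a parameter so the logic can be run outside an instance, and the query reuses the Ready or Running state filter from the list above.

```javascript
// Sketch: count queued scheduler jobs per name, largest group first.
// GlideAggregateCtor is injected; on an instance pass the global GlideAggregate.
function countJobsByName(GlideAggregateCtor) {
  var counts = [];
  var ga = new GlideAggregateCtor('sys_trigger');
  ga.addEncodedQuery('stateIN0,1');      // Ready or Running
  ga.addAggregate('COUNT');
  ga.groupBy('name');
  ga.query();
  while (ga.next()) {
    counts.push({
      name: String(ga.getValue('name')),
      count: parseInt(ga.getAggregate('COUNT'), 10)
    });
  }
  counts.sort(function (a, b) { return b.count - a.count; });
  return counts;
}

// In Scripts - Background (assumes the global GlideAggregate and gs objects):
if (typeof GlideAggregate !== 'undefined') {
  countJobsByName(GlideAggregate).slice(0, 20).forEach(function (row) {
    gs.print(row.count + '  ' + row.name);
  });
}
```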

If the backlog gets particularly bad, you might consider opening a support incident to help you identify the cause of the backlog.

But, if nothing is currently running, it may be...

...because the instance is Paused

Normally only an Upgrade would pause the schedulers, and automatically resume them afterwards.

You can check this by running a background script as an admin user:

  1. Navigate to System Maintenance -> Scripts - Background
  2. In the script box, paste this script:
gs.log(gs.isPaused());
  3. Click Run Script

If that returns "true", then someone or something has paused the workers, which means they won't process jobs.

Solution: Check that the instance is not currently in the middle of an upgrade, and wait for it to finish if it is: Navigate to System Diagnostics -> Upgrade Monitor

If the instance is not in the middle of an upgrade, then you probably need to open a support incident, as something has gone wrong.

Article Information

Last Updated:2019-01-11 06:56:03
Published:2018-11-23