Description

The condition on the "Discovery - Sensors" business rule is the result of the script include method AutomationEccSensorConditions.discovery(), which is documented to "Determine if the given ecc_queue record should be processed by the Discovery sensor processor....returns: true if sensor script should be run, otherwise false". However, it also returns true for non-Discovery inputs, so an "ASYNC: Discovery - Sensors" job is scheduled and run on an unexpected payload, causing performance problems and other side effects.

It returns true for at least:
- SOAPProbe and RESTProbe
- JDBCProbeCompleted
and possibly other probe types.
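A minimal sketch of a stricter condition, for illustration only. It assumes the ecc_queue record's queue, topic and agent_correlator fields are enough to tell Discovery inputs apart, and it models the record as a plain object so the logic runs outside the instance. The function name isDiscoverySensorInput and the rule that Discovery inputs carry a non-blank agent_correlator are hypothetical; this is not the AutomationEccSensorConditions implementation.

```javascript
// Hypothetical sketch: NOT the real AutomationEccSensorConditions.discovery().
// Topics this article names as wrongly passing the Discovery condition.
const NON_DISCOVERY_TOPICS = new Set(["SOAPProbe", "RESTProbe", "JDBCProbeCompleted"]);

/**
 * Decide whether an ecc_queue input (a plain object with queue, topic and
 * agent_correlator fields) should be handed to the Discovery sensor processor.
 */
function isDiscoverySensorInput(ecc) {
  // Only inputs returning from the MID Server are processed by sensors.
  if (ecc.queue !== "input") return false;
  // Reject probe types known not to belong to Discovery.
  if (NON_DISCOVERY_TOPICS.has(ecc.topic)) return false;
  // Assumption: a Discovery input carries a discovery_status sys_id here.
  const correlator = String(ecc.agent_correlator || "").trim();
  return correlator.length > 0;
}
```

Because the article notes the condition may pass "possibly other probe types", a deny-list like this one can still miss cases; an allow-list of genuine Discovery probe topics would be the more robust design, at the cost of maintaining that list.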

Running the sensor unnecessarily has the following side effects:

1/
DiscoverySensorJob.process loads the whole payload into memory, which can block a scheduler worker for a long time and exhaust the available memory of the instance node.
Discovery limits its payloads to 5MB by default, but other probes such as RESTProbe do not, so their payloads can be very large.
Setting the ECC Queue parameter skip_sensor=true does not avoid this, because the payload has to be loaded before the sensor code can check for that value.

2/
DiscoverySensorJob.process calls DiscoveryStatus.updateStatusCompletedCount() with a parameter that is either blank or not a discovery_status record sys_id, causing "WARNING *** Get for non-existent record: discovery_status" warnings in the logs. That code does a .get() with the agent_correlator value without then checking that a valid discovery_status record was found.

3/
ECC Queue inputs for non-Discovery jobs may be set to State="Error" with Error string="No sensor defined".
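For side effect 2/, the missing step is checking the result of the lookup before using the record. A minimal sketch of that guard follows, with the discovery_status table modeled as a Map of sys_id to row so it runs standalone. The names updateCompletedCountSafely and completed are hypothetical, not the DiscoveryStatus API; on the instance, the equivalent check would test the boolean returned by GlideRecord.get().

```javascript
/**
 * Hypothetical sketch: increment a completed counter on a discovery_status
 * row only if the agent_correlator actually identifies an existing row.
 * statusTable: Map of sys_id -> row object; log: function(message).
 * Returns true if a row was updated, false if the update was skipped.
 */
function updateCompletedCountSafely(statusTable, agentCorrelator, log) {
  const id = String(agentCorrelator || "").trim();
  if (!id) {
    // Blank correlator: not a Discovery input, so there is nothing to update.
    log("Skipping status update: blank agent_correlator");
    return false;
  }
  const status = statusTable.get(id); // undefined when no such record exists
  if (status === undefined) {
    log("Skipping status update: no discovery_status " + id);
    return false;
  }
  status.completed = (status.completed || 0) + 1; // hypothetical counter field
  return true;
}
```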

Steps to Reproduce

For the "Get for non-existent record" warnings:

1/ Add a log statement to the DiscoverySensorJob script include, just before the "DiscoveryStatus.updateStatusCompletedCount" line, that logs the ecc_queue record sys_id and agent_correlator value.
2/ Run non-Discovery jobs, such as Data Source imports or SOAPProbe commands, via the MID Server.
3/ Note that the log statements show this Discovery sensor code ran for those non-Discovery jobs and attempted to update a Discovery Status record that did not exist.

The node logs show the following kinds of warnings when the updateStatusCompletedCount function runs:

2017-07-16 21:45:14 (745) worker.3 worker.3 WARNING *** WARNING *** Get for non-existent record: discovery_status:null, initializing
2017-07-16 21:45:14 (746) worker.0 worker.0 WARNING *** WARNING *** Get for non-existent record: discovery_status:null, initializing
2017-07-16 21:45:14 (762) worker.4 worker.4 WARNING *** WARNING *** Get for non-existent record: discovery_status:null, initializing
2017-07-16 21:45:14 (765) worker.1 worker.1 WARNING *** WARNING *** Get for non-existent record: discovery_status:null, initializing
2017-07-16 21:45:35 (082) worker.4 worker.4 WARNING *** WARNING *** Get for non-existent record: discovery_status:null, initializing
2017-07-16 22:00:07 (566) worker.7 worker.7 WARNING *** WARNING *** Get for non-existent record: discovery_status:04d20009db4c8f805827f5b31d9619ce, initializing
2017-07-16 22:00:56 (944) worker.0 worker.0 WARNING *** WARNING *** Get for non-existent record: discovery_status:d3f24009db4c8f805827f5b31d9619d4, initializing
2017-07-17 22:00:06 (583) worker.7 worker.7 WARNING *** WARNING *** Get for non-existent record: discovery_status:e56c8d11db4803c05827f5b31d961942, initializing
2017-07-17 22:01:05 (811) worker.5 worker.5 WARNING *** WARNING *** Get for non-existent record: discovery_status:979c0151db4803c05827f5b31d961941, initializing
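To gauge how often this happens on a node, a small script can scan the node log for these warnings and separate blank (null) correlators from sys_ids that point at no record. A minimal sketch, assuming the log lines keep the format shown above; the function name summarizeMissingStatusWarnings is hypothetical.

```javascript
// Matches the warning text and captures the value after "discovery_status:".
const MISSING_STATUS = /Get for non-existent record: discovery_status:([^,\s]+), initializing/;

/**
 * Summarize "Get for non-existent record: discovery_status" warnings in an
 * array of log lines: how many had a null correlator, and which non-null
 * sys_ids referred to discovery_status records that did not exist.
 */
function summarizeMissingStatusWarnings(lines) {
  const summary = { nullCount: 0, missingIds: [] };
  for (const line of lines) {
    const m = MISSING_STATUS.exec(line);
    if (!m) continue; // unrelated log line
    if (m[1] === "null") summary.nullCount += 1;
    else summary.missingIds.push(m[1]);
  }
  return summary;
}
```

Applied to the nine log lines above, it reports 5 null correlators and 4 sys_ids with no matching discovery_status record.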

For the high memory usage when payloads are large:

1/ Run a REST Message via a MID Server that returns a large payload, e.g. 50MB.
2/ Note that the instance's available memory falls, possibly causing a node restart.
3/ stats.do / threads.do show a long-running "ASYNC: Discovery - Sensors" worker thread that spends a long time in "createProbeResponse(Probe.java:262)".

App node localhost log example for a RESTProbe input with a 50MB payload:

2019-02-13 22:46:03 (160) worker.2 worker.2 txid=966873a6db23 Starting: ASYNC: Discovery - Sensors.f0183762db232300cb717f698c961990, Trigger Type: Once, Priority: 110, Upgrade Safe: false, Repeat:
2019-02-13 22:46:03 (160) worker.2 worker.2 txid=966873a6db23 Name: ASYNC: Discovery - Sensors
2019-02-13 22:46:05 (342) worker.2 worker.2 txid=966873a6db23 [0:00:02.156] Load attachment of: ecc_queue c5f73762db232300cb717f698c96198d, size 51514361
2019-02-13 22:49:17 (605) worker.2 worker.2 txid=966873a6db23 WARNING *** WARNING *** Long Transaction started at 02/13/19 22:46:03.157, Memory at start was 830, Memory is 1,954, SQL count is 24, BR count is now 0.
2019-02-13 22:49:17 (606) worker.2 worker.2 txid=966873a6db23 WARNING *** WARNING *** Transaction cancelled: Available memory is almost depleted
2019-02-13 22:49:17 (620) worker.2 worker.2 txid=966873a6db23 SEVERE *** ERROR *** *** Script: SensorProcessor failed for ECC queue record c5f73762db232300cb717f698c96198d - com.glide.sys.TransactionCancelledException: Transaction cancelled: Available memory is almost depleted
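In this log, the "Load attachment" line is the earliest sign of an oversized input. A minimal sketch that scans node log lines for it and lists ECC Queue inputs whose payload exceeds a limit. It assumes the line format shown above and takes the 5MB Discovery default as 5 * 1024 * 1024 bytes; the function name findLargePayloads is hypothetical.

```javascript
// Assumption: "5MB" interpreted as 5 * 1024 * 1024 bytes.
const DEFAULT_LIMIT_BYTES = 5 * 1024 * 1024;
// Captures the ecc_queue sys_id and the attachment size in bytes.
const LOAD_ATTACHMENT = /Load attachment of: ecc_queue (\w+), size (\d+)/;

/**
 * Return { sysId, bytes } for every "Load attachment of: ecc_queue" log line
 * whose size is above limitBytes.
 */
function findLargePayloads(lines, limitBytes = DEFAULT_LIMIT_BYTES) {
  const large = [];
  for (const line of lines) {
    const m = LOAD_ATTACHMENT.exec(line);
    if (m && Number(m[2]) > limitBytes) {
      large.push({ sysId: m[1], bytes: Number(m[2]) });
    }
  }
  return large;
}
```

For the log above, the input c5f73762db232300cb717f698c96198d is reported with 51514361 bytes (about 49 MiB), roughly ten times the default Discovery limit.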

Stack trace for the above, captured while Probe.java is still parsing the payload XML to initialize SensorProcessor.java, which is the first thing the "ASYNC: Discovery - Sensors" job does when it starts processing:

main,glide.scheduler.worker.2,4,ASYNC: Discovery - Sensors (43543 ms) 
org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanContent(Unknown Source)
org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:121)
com.glide.util.XMLUtil.getDocument(XMLUtil.java:366)
com.glide.util.XMLUtil.load(XMLUtil.java:331)
com.glide.util.XMLUtil.parse(XMLUtil.java:240)
com.glide.util.XMLUtil.parse(XMLUtil.java:224)
com.glide.util.XMLUtil.parse(XMLUtil.java:220)
com.glide.util.XMLUtil.parse(XMLUtil.java:211)
com.snc.core_automation.Probe.setRawPayload(Probe.java:574)
com.snc.core_automation.Probe.createProbeResponse(Probe.java:262)
com.snc.discovery.SensorProcessor.init(SensorProcessor.java:154)
com.snc.discovery.SensorProcessor.<init>(SensorProcessor.java:117)
sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
java.lang.reflect.Constructor.newInstance(Constructor.java:423)
org.mozilla.javascript.MemberBox.newInstance(MemberBox.java:184)
org.mozilla.javascript.NativeJavaClass.constructInternal(NativeJavaClass.java:279)
org.mozilla.javascript.NativeJavaClass.constructSpecific(NativeJavaClass.java:218)
org.mozilla.javascript.NativeJavaClass.construct(NativeJavaClass.java:176)
org.mozilla.javascript.ScriptRuntime.newObject(ScriptRuntime.java:2449)
org.mozilla.javascript.ScriptRuntime.newObjectEx(ScriptRuntime.java:2464)
org.mozilla.javascript.gen.sys_script_include_78dfb2dd536002001f175f43911c087d_script_1265._c_anonymous_2(sys_script_include.78dfb2dd536002001f175f43911c087d.script:12)
org.mozilla.javascript.gen.sys_script_include_78dfb2dd536002001f175f43911c087d_script_1265.call(sys_script_include.78dfb2dd536002001f175f43911c087d.script)
org.mozilla.javascript.ScriptRuntime.doCall2(ScriptRuntime.java:2650)
org.mozilla.javascript.ScriptRuntime.doCall(ScriptRuntime.java:2590)
org.mozilla.javascript.optimizer.OptRuntime.callProp0(OptRuntime.java:85)
org.mozilla.javascript.gen.sys_trigger_f0183762db232300cb717f698c961990_1264._c_script_0(sys_trigger.f0183762db232300cb717f698c961990:2)
org.mozilla.javascript.gen.sys_trigger_f0183762db232300cb717f698c961990_1264.call(sys_trigger.f0183762db232300cb717f698c961990)
org.mozilla.javascript.ContextFactory.doTopCall(ContextFactory.java:563)
org.mozilla.javascript.ScriptRuntime.doTopCall(ScriptRuntime.java:3428)
org.mozilla.javascript.gen.sys_trigger_f0183762db232300cb717f698c961990_1264.call(sys_trigger.f0183762db232300cb717f698c961990)
org.mozilla.javascript.gen.sys_trigger_f0183762db232300cb717f698c961990_1264.exec(sys_trigger.f0183762db232300cb717f698c961990)
com.glide.script.ScriptEvaluator.execute(ScriptEvaluator.java:279)
com.glide.script.ScriptEvaluator.evaluateString(ScriptEvaluator.java:118)
com.glide.script.ScriptEvaluator.evaluateString(ScriptEvaluator.java:82)
com.glide.script.Evaluator.evaluatePossiblePrefixedString(Evaluator.java:204)
com.glide.job.RunScriptJob.evaluateScript(RunScriptJob.java:163)
com.glide.job.RunScriptJob.execute(RunScriptJob.java:84)
com.glide.schedule.JobExecutor.executeJob(JobExecutor.java:103)
com.glide.schedule.JobExecutor.execute(JobExecutor.java:89)
com.glide.schedule.GlideScheduleWorker.executeJob(GlideScheduleWorker.java:244)
com.glide.schedule.GlideScheduleWorker.lambda$process$32(GlideScheduleWorker.java:169)

Workaround

This issue is under review. If you are experiencing this problem, contact ServiceNow Customer Support.

For REST and SOAP Message jobs, non-Discovery ECC Queue inputs being set to State="Error" with Error string="No sensor defined" can usually be avoided by adding the payload parameter skip_sensor=true. This does not stop the payload from being loaded into memory, as described under side effect 1/.
See KB0727028 - Why is State "Error" and "No sensors defined" in the ECC Queue for outbound REST/SOAP Message via MID Server?


Related Problem: PRB1113671

Seen In

There is no data to report.

Fixed In

New York

Associated Community Threads

There is no data to report.

Article Information

Last Updated: 2019-07-26 22:34:25
Published: 2019-02-21