Description

Discovery's Shazzam Probe can hang in the HTTPS scanner while holding a lock, which in turn blocks other MID Server worker threads.

The immediate symptoms are stuck DNS Probes and Shazzam Probes in the MID Server's Worker-Standard threads. Eventually all MID Server threads become blocked and Discovery via that MID Server stalls, so Discovery Schedules are cancelled when they reach their Max Runtime.
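
The blocking pattern described above can be illustrated with a small Python analogy. This is only a sketch and not ServiceNow's code: the names ProbeBase, stuck_scanner, other_probe and simulate are hypothetical, and a shared threading.Lock stands in for the Java class-level lock. One thread takes the shared lock and then waits, with no timeout, for a handshake that never completes, so every other probe that needs the same lock cannot proceed.

```python
import threading


class ProbeBase:
    # Hypothetical stand-in for a lock shared by every probe instance,
    # similar in effect to a Java class-level monitor.
    chunk_lock = threading.Lock()


def stuck_scanner(handshake_done, holding):
    """Take the shared lock, then wait with no timeout for a handshake."""
    with ProbeBase.chunk_lock:
        holding.set()            # signal that the lock is now held
        handshake_done.wait()    # no timeout: waits until someone sets the event


def other_probe(results, name):
    """Try to enter the same locked section as the stuck scanner."""
    # A real blocked thread would wait indefinitely; the timeout is only
    # here so that the demonstration terminates.
    acquired = ProbeBase.chunk_lock.acquire(timeout=0.2)
    results[name] = acquired
    if acquired:
        ProbeBase.chunk_lock.release()


def simulate(n_other=3):
    """Return {probe name: whether it managed to take the shared lock}."""
    handshake_done = threading.Event()
    holding = threading.Event()
    results = {}

    stuck = threading.Thread(
        target=stuck_scanner, args=(handshake_done, holding), daemon=True
    )
    stuck.start()
    holding.wait()               # make sure the scanner owns the lock first

    workers = [
        threading.Thread(target=other_probe, args=(results, f"DNS-{i}"))
        for i in range(n_other)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()

    handshake_done.set()         # release the scanner so the demo ends cleanly
    stuck.join()
    return results


if __name__ == "__main__":
    print(simulate())
```

In the sketch none of the other probes obtains the lock while the scanner is waiting. Bounding the wait, or not holding a shared lock across network I/O, are the usual ways to avoid this pattern; which approach the product fix takes is not stated in this article.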

This product defect affects the following releases:

  • Kingston Patch 10 to Patch 13
  • London Patch 3 to Patch 5
  • Madrid Patch 0
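
For checking many instances against the list above, the ranges can be encoded in a small lookup. This is a minimal sketch: the function name is_affected and the table layout are illustrative, and it assumes that any release family not listed above is treated as not known to be affected by this article.

```python
# Affected patch ranges (inclusive), taken from the list in this article.
AFFECTED_PATCHES = {
    "kingston": (10, 13),
    "london": (3, 5),
    "madrid": (0, 0),
}


def is_affected(family, patch):
    """Return True if the family/patch pair falls in a listed affected range."""
    patch_range = AFFECTED_PATCHES.get(family.strip().lower())
    if patch_range is None:
        # Assumption: families not listed are not covered by this article.
        return False
    low, high = patch_range
    return low <= patch <= high


if __name__ == "__main__":
    for family, patch in [("Kingston", 12), ("London", 6), ("Madrid", 0)]:
        print(family, patch, is_affected(family, patch))
```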

 

Steps to Reproduce

  1. On an affected release, run a large volume of Discovery.
  2. Sooner or later, some Discovery Schedules will be cancelled because they reach the maximum run time.
  3. In the ecc_queue, inputs from the MID Server for that Discovery Schedule will have stopped some time before the cancellation.
  4. In the ecc_queue, the MID Server's queue.stats/XMLStats inputs from before the schedule was cancelled will list at least one Shazzam thread and many DNS threads.
  5. Using the 'Get MID Thread Dump' UI action, or jstack from the command line, will show the BLOCKED threads and the Shazzam job that is causing them.

This is what a stuck Shazzam probe looks like. Note that this thread has taken a lock:

2018/12/11 09:38:26 | "Worker-Standard:Shazzam-bcd0b839139ea340005276d66144b0a5" #340 daemon prio=5 os_prio=0 tid=0x0000000022a68800 nid=0x2020 waiting on condition [0x000000002e21e000]
2018/12/11 09:38:26 | java.lang.Thread.State: WAITING (parking)
2018/12/11 09:38:26 | at sun.misc.Unsafe.park(Native Method)
2018/12/11 09:38:26 | - parking to wait for <0x00000006c62e9a18> (a java.util.concurrent.CountDownLatch$Sync)
2018/12/11 09:38:26 | at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
2018/12/11 09:38:26 | at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
2018/12/11 09:38:26 | at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
2018/12/11 09:38:26 | at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
2018/12/11 09:38:26 | at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
2018/12/11 09:38:26 | at sun.nio.ch.PendingFuture.get(PendingFuture.java:180)
2018/12/11 09:38:26 | at com.service_now.mid.probe.shazzam.scanners.HTTPS.completeHandshake(HTTPS.java:259) <<-- This is where the problem is
2018/12/11 09:38:26 | at com.service_now.mid.probe.shazzam.scanners.HTTPS.nextPhase(HTTPS.java:100)
2018/12/11 09:38:26 | at com.service_now.mid.probe.shazzam.PortScannerEngine.run(PortScannerEngine.java:69)
2018/12/11 09:38:26 | at com.service_now.mid.probe.Shazzam.processChunk(Shazzam.java:169)
2018/12/11 09:38:26 | at com.service_now.mid.probe.ShazzamBase.syncProcessChunk(ShazzamBase.java:71)
2018/12/11 09:38:26 | - locked <0x00000006c3716a48> (a java.lang.Class for com.service_now.mid.probe.ShazzamBase) <<-- This is what blocks the other threads
2018/12/11 09:38:26 | at com.service_now.mid.probe.Shazzam.probe(Shazzam.java:139)
2018/12/11 09:38:26 | at com.service_now.mid.probe.AProbe.process(AProbe.java:96)
2018/12/11 09:38:26 | at com.service_now.mid.queue_worker.AWorker.runWorker(AWorker.java:125)
2018/12/11 09:38:26 | at com.service_now.mid.queue_worker.AWorkerThread.run(AWorkerThread.java:20)
2018/12/11 09:38:26 | at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
2018/12/11 09:38:26 | at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
2018/12/11 09:38:26 | at java.lang.Thread.run(Thread.java:748)

This then causes the DNS threads to become BLOCKED, because they are waiting for that same lock:

2018/12/11 09:38:26 | "Worker-Standard:DNS-c4142eb5db92ab80958e5bc0cf9619ce" #345 daemon prio=5 os_prio=0 tid=0x0000000022a64800 nid=0x2140 waiting for monitor entry [0x000000002e41e000]
2018/12/11 09:38:26 | java.lang.Thread.State: BLOCKED (on object monitor)
2018/12/11 09:38:26 | at com.service_now.mid.probe.ShazzamBase.syncProcessChunk(ShazzamBase.java:70)
2018/12/11 09:38:26 | - waiting to lock <0x00000006c3716a48> (a java.lang.Class for com.service_now.mid.probe.ShazzamBase)
2018/12/11 09:38:26 | at com.service_now.mid.probe.DNS.probe(DNS.java:90)
2018/12/11 09:38:26 | at com.service_now.mid.probe.AProbe.process(AProbe.java:96)
2018/12/11 09:38:26 | at com.service_now.mid.queue_worker.AWorker.runWorker(AWorker.java:125)
2018/12/11 09:38:26 | at com.service_now.mid.queue_worker.AWorkerThread.run(AWorkerThread.java:20)
2018/12/11 09:38:26 | at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
2018/12/11 09:38:26 | at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
2018/12/11 09:38:26 | at java.lang.Thread.run(Thread.java:748)
2018/12/11 09:38:26 |
2018/12/11 09:38:26 | "Worker-Standard:DNS-57c6e2fddb92ab80958e5bc0cf9619f7" #342 daemon prio=5 os_prio=0 tid=0x0000000022a62800 nid=0x1924 waiting for monitor entry [0x000000002e31e000]
2018/12/11 09:38:26 | java.lang.Thread.State: BLOCKED (on object monitor)
2018/12/11 09:38:26 | at com.service_now.mid.probe.ShazzamBase.syncProcessChunk(ShazzamBase.java:70)
2018/12/11 09:38:26 | - waiting to lock <0x00000006c3716a48> (a java.lang.Class for com.service_now.mid.probe.ShazzamBase)
2018/12/11 09:38:26 | at com.service_now.mid.probe.DNS.probe(DNS.java:90)
2018/12/11 09:38:26 | at com.service_now.mid.probe.AProbe.process(AProbe.java:96)
2018/12/11 09:38:26 | at com.service_now.mid.queue_worker.AWorker.runWorker(AWorker.java:125)
2018/12/11 09:38:26 | at com.service_now.mid.queue_worker.AWorkerThread.run(AWorkerThread.java:20)
2018/12/11 09:38:26 | at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
2018/12/11 09:38:26 | at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
2018/12/11 09:38:26 | at java.lang.Thread.run(Thread.java:748)
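
When a thread dump is long, matching the "locked" and "waiting to lock" addresses by eye is tedious. The following Python sketch shows one way to pair them up. It is an illustration rather than a ServiceNow tool: the function name find_lock_contention is hypothetical, the sample is abridged from the dumps above, and it assumes each line may carry the "date time | " wrapper-log prefix seen in this article.

```python
import re
from collections import defaultdict

# Thread header lines start with the quoted thread name.
THREAD_RE = re.compile(r'"([^"]+)"')
# Monitor lines, for example: - locked <0x...>  or  - waiting to lock <0x...>
LOCK_RE = re.compile(r"-\s+(locked|waiting to lock)\s+<(0x[0-9a-fA-F]+)>")


def find_lock_contention(dump_text):
    """Map each contended lock address to (holder thread, [waiting threads])."""
    holders = {}
    waiters = defaultdict(list)
    current = None
    for raw in dump_text.splitlines():
        # Drop a "2018/12/11 09:38:26 | " style prefix if there is one.
        line = raw.split(" | ", 1)[-1].strip()
        if line.startswith('"'):
            match = THREAD_RE.match(line)
            current = match.group(1) if match else None
            continue
        match = LOCK_RE.search(line)
        if match and current:
            kind, address = match.groups()
            if kind == "locked":
                holders[address] = current
            else:
                waiters[address].append(current)
    # Only report locks that at least one thread is waiting for.
    return {addr: (holders.get(addr), names) for addr, names in waiters.items()}


SAMPLE = '''
2018/12/11 09:38:26 | "Worker-Standard:Shazzam-bcd0b839139ea340005276d66144b0a5" #340 daemon prio=5
2018/12/11 09:38:26 | java.lang.Thread.State: WAITING (parking)
2018/12/11 09:38:26 | - parking to wait for <0x00000006c62e9a18> (a java.util.concurrent.CountDownLatch$Sync)
2018/12/11 09:38:26 | - locked <0x00000006c3716a48> (a java.lang.Class for com.service_now.mid.probe.ShazzamBase)
2018/12/11 09:38:26 | "Worker-Standard:DNS-c4142eb5db92ab80958e5bc0cf9619ce" #345 daemon prio=5
2018/12/11 09:38:26 | - waiting to lock <0x00000006c3716a48> (a java.lang.Class for com.service_now.mid.probe.ShazzamBase)
2018/12/11 09:38:26 | "Worker-Standard:DNS-57c6e2fddb92ab80958e5bc0cf9619f7" #342 daemon prio=5
2018/12/11 09:38:26 | - waiting to lock <0x00000006c3716a48> (a java.lang.Class for com.service_now.mid.probe.ShazzamBase)
'''

if __name__ == "__main__":
    for address, (holder, waiting) in find_lock_contention(SAMPLE).items():
        print(address, "held by", holder, "with", len(waiting), "thread(s) waiting")
```

A holder whose name starts with "Worker-Standard:Shazzam" together with several waiting DNS threads on the same address matches the pattern described in this article.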

Workaround

This issue is now fixed. Refer to the 'Fixed In' list at the bottom of this KB article for the patches that contain the fix.

In the meantime, you can implement the following workaround to disable the winrm_ssl port probe. This avoids running the code that gets stuck.

Note: This will prevent Windows servers that only have WinRM enabled from being scanned. If WMI port 135 is also open, the Windows server/computer can still be scanned.

  1. Navigate to Discovery Definition > Port Probes.
  2. Open the record named 'winrm_ssl':
    /discovery_port_probe.do?sys_id=d4a7f3f29fe02300809adecf857fcf44
  3. Uncheck the following fields:
    • Active
    • CIs
    • IPs
  4. Save.
  5. Restart all affected MID Servers used by Discovery.

Related Problem: PRB1320230

Seen In

There is no data to report.

Intended Fix Version

New York

Fixed In

Kingston Patch 14
London Patch 6
Madrid Patch 1

Safe Harbor Statement

This "Intended Fix Version" information is meant to outline ServiceNow's general product direction and should not be relied upon in making a purchasing decision. The information provided here is for information purposes only and may not be incorporated into any contract. It is not a commitment, promise, or legal obligation to deliver any material, code, or functionality. The development, release, and timing of any features or functionality described for our products remains at ServiceNow's sole discretion.

Associated Community Threads

There is no data to report.

Article Information

Last Updated: 2019-05-21 11:43:26
Published: 2019-02-19