Description

A MID Server can fail to start up, with an out-of-memory error shown in its agent log, if the ECC Sender folder contains a very large number of XML files.

Each XML file is a queued result or response that must be sent back to the instance, where it becomes an input record in the ecc_queue table. Every probe writes an XML file for its result, but jobs such as JDBCProbe can produce a huge number of small input files for each output job. Results are generally split into 200 rows per XML payload file, so a large import generates many files. If the MID Server cannot return the inputs to the instance as quickly as they are generated, a backlog builds up. This can be caused by a connection to the instance that is slow relative to the connection to the target server, or by a loss of connection to the instance: in that situation the MID Server continues to run the jobs it has already picked up, and the backlog keeps growing.
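To get a feel for the scale, the arithmetic can be sketched as follows. This is only an estimate based on the 200-rows-per-file figure mentioned above; the function name and the example row count are illustrative assumptions, not anything taken from the MID Server.

```python
ROWS_PER_FILE = 200  # split size quoted in this article; treat it as approximate


def estimated_payload_files(total_rows: int, rows_per_file: int = ROWS_PER_FILE) -> int:
    """Estimate how many XML payload files a single import would produce."""
    if total_rows < 0 or rows_per_file <= 0:
        raise ValueError("total_rows must be >= 0 and rows_per_file must be > 0")
    # Integer ceiling division: a final, partly filled file still counts as a file.
    return -(-total_rows // rows_per_file)
```

For example, estimated_payload_files(5_000_000) gives 25000 files for one job, so a hundred such jobs queued during an outage would leave roughly 2.5 million files waiting to be sent.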

When a MID Server starts up, it starts the ECCSender thread, which begins by listing the files in the .\agent\work\monitors\ECCSender folders so that it can send those earlier results to the instance. With enough files, that listing alone can consume a huge amount of CPU time and memory.
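Before starting the MID Server, it may help to check how many files are waiting. The following is a minimal sketch, not a ServiceNow tool: it assumes the ECCSender folder contains one subfolder per queue, and it uses os.scandir so that entries are read one at a time instead of being loaded into one large list.

```python
import os


def count_queue_files(ecc_sender_dir: str) -> dict:
    """Return {subfolder name: number of files} for each subfolder of ecc_sender_dir."""
    counts = {}
    with os.scandir(ecc_sender_dir) as subfolders:
        for sub in subfolders:
            if not sub.is_dir():
                continue
            file_count = 0
            # Iterate lazily; no full directory listing is held in memory.
            with os.scandir(sub.path) as entries:
                for entry in entries:
                    if entry.is_file():
                        file_count += 1
            counts[sub.name] = file_count
    return counts
```

Called with the path to agent\work\monitors\ECCSender under the MID Server install folder, this returns a count per subfolder. A count in the millions suggests the startup failure described here is likely.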

The MID Server agent log will include something like the following. The "java.io.WinNTFileSystem.canonicalizeWithPrefix" frames come from the per-file checks made while the contents of the folder are enumerated (a bit like 'dir'):

10/03/18 07:13:02 (378) StartupSequencer SEVERE *** ERROR *** java.lang.reflect.InvocationTargetException
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at com.glide.util.ClassUtil.newInstance(ClassUtil.java:170)
at com.service_now.mid.services.Monitors.createMonitor(Monitors.java:225)
at com.service_now.mid.services.Monitors.loadInternalMonitor(Monitors.java:155)
at com.service_now.mid.services.Monitors.loadInternalMonitors(Monitors.java:124)
at com.service_now.mid.services.Monitors.start(Monitors.java:58)
at com.service_now.mid.services.Monitors.onMIDServerEvent(Monitors.java:343)
at com.service_now.mid.services.Events.internalFire(Events.java:102)
at com.service_now.mid.services.Events.fire(Events.java:34)
at com.service_now.mid.services.StartupSequencer.startServices(StartupSequencer.java:201)
at com.service_now.mid.services.StartupSequencer.testsSucceeded(StartupSequencer.java:103)
at com.service_now.mid.services.StartupSequencer.access$100(StartupSequencer.java:53)
at com.service_now.mid.services.StartupSequencer$Starter.run(StartupSequencer.java:305)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.io.WinNTFileSystem.canonicalizeWithPrefix(WinNTFileSystem.java:451)
at java.io.WinNTFileSystem.canonicalize(WinNTFileSystem.java:422)
at java.io.File.getCanonicalPath(File.java:618)
at java.io.FilePermission$1.run(FilePermission.java:215)
at java.io.FilePermission$1.run(FilePermission.java:203)
at java.security.AccessController.doPrivileged(Native Method)
at java.io.FilePermission.init(FilePermission.java:203)
at java.io.FilePermission.<init>(FilePermission.java:277)
at java.lang.SecurityManager.checkRead(SecurityManager.java:888)
at com.service_now.mid.security.MIDSecurityManager.checkRead(MIDSecurityManager.java:31)
at java.io.File.isFile(File.java:877)
at com.service_now.monitor.ECCSenderQueueFile.init(ECCSenderQueueFile.java:49)
at com.service_now.monitor.ECCSenderQueueFile.<init>(ECCSenderQueueFile.java:37)
at com.service_now.monitor.ECCSenderCache.getQueueFiles(ECCSenderCache.java:589)
at com.service_now.monitor.ECCSenderCache.<init>(ECCSenderCache.java:100)
at com.service_now.monitor.ECCSender.<init>(ECCSender.java:84)

Steps to Reproduce

This is not going to be easy to reproduce.

  • One possible way is to create a few million fake XML files in the ECC Sender folder. The content is not important, because the failure occurs while the files are being listed, before any of them are processed. Then start the MID Server.
  • Alternatively, set up a few hundred JDBCProbe import set data source jobs with millions of small rows each, let the MID Server start processing them, and then block the network connection between the MID Server and the instance. Once the ECC Sender folder has grown huge, restart the MID Server.
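The first approach could be scripted roughly as follows. This is a sketch for a disposable test MID Server only. The file name pattern and the placeholder content are made up for illustration, since the article does not state how real ECC Sender files are named.

```python
from pathlib import Path


def create_fake_payloads(folder: str, count: int, prefix: str = "fake_payload_") -> int:
    """Create `count` tiny placeholder XML files in `folder` and return the count."""
    target = Path(folder)
    target.mkdir(parents=True, exist_ok=True)
    for i in range(count):
        # Hypothetical naming scheme; only the sheer number of files matters here.
        (target / f"{prefix}{i:09d}.xml").write_text("<fake/>", encoding="utf-8")
    return count
```

Pointing this at a queue folder such as agent\work\monitors\ECCSender\output_2 with a count of a few million, and then starting the MID Server, should approximate the condition described above. Creating that many files takes a while and uses a noticeable amount of disk space.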

Workaround

The files preventing the MID Server from starting are in the MID Server install folder, under:
\agent\work\monitors\ECCSender\output_2

  1. Rename folder agent\work\monitors\ECCSender\output_2 to agent\work\monitors\ECCSender\output_2_OLD
  2. Create a new empty agent\work\monitors\ECCSender\output_2 folder to replace it
  3. Start the MID Server
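Steps 1 and 2 can be scripted if the same cleanup is needed on several MID Servers. The sketch below assumes the MID Server service is stopped and that the path passed in is the real queue folder; the function name is hypothetical.

```python
from pathlib import Path


def set_aside_queue_folder(queue_dir: str, suffix: str = "_OLD") -> Path:
    """Rename queue_dir to queue_dir + suffix, recreate it empty, return the backup path."""
    queue = Path(queue_dir)
    backup = queue.with_name(queue.name + suffix)
    if backup.exists():
        # Refuse to overwrite an earlier backup; choose a different suffix instead.
        raise FileExistsError(f"{backup} already exists")
    queue.rename(backup)  # step 1: output_2 -> output_2_OLD
    queue.mkdir()         # step 2: new, empty output_2
    return backup
```

A rename within the same volume does not copy the files, so it should complete quickly even when the folder holds millions of entries. Step 3, starting the MID Server, is still done manually.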

The payloads in the renamed folder will not be sent back to the instance, so the jobs that generated those results may need to be run again.

The number in the folder name corresponds to the priority of the job. You may find files in the other priority folders as well.

If the files contain vital data, search the XML file contents to identify the ones that matter, and then move those files selectively back into the new folder.
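One way to do that selective recovery is sketched below. It assumes the files are small text files that can be read as UTF-8, and that a plain substring (for example a table or data source name, chosen by you) is enough to identify the records worth keeping. The function and parameter names are illustrative.

```python
import os
import shutil


def restore_matching_payloads(old_dir: str, live_dir: str, needle: str) -> int:
    """Move XML files whose content contains `needle` from old_dir to live_dir."""
    matches = []
    # Pass 1: scan lazily and remember only the matching file names.
    with os.scandir(old_dir) as entries:
        for entry in entries:
            if not entry.is_file() or not entry.name.lower().endswith(".xml"):
                continue
            with open(entry.path, encoding="utf-8", errors="replace") as fh:
                if needle in fh.read():
                    matches.append(entry.name)
    # Pass 2: move the matches after the scan, so the folder is not changed mid-iteration.
    for name in matches:
        shutil.move(os.path.join(old_dir, name), os.path.join(live_dir, name))
    return len(matches)
```

Moving the files back in small batches, rather than all at once, avoids recreating the original problem in the live folder.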

Note: The system property "glide.mid.max.sender.queue.size" is not a workaround for this. It limits the total size of the data in the ECC Sender folder, not the number of files. When the payloads are very small, as they are for JDBC data with small rows, millions of files can accumulate before the size limit is reached.
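A rough calculation shows why a size limit does not bound the file count. The numbers used here are purely hypothetical: the article does not state the value or unit of the property, nor the typical payload size.

```python
def max_files_under_size_limit(limit_bytes: int, avg_file_bytes: int) -> int:
    """Upper bound on the number of files that fit under a total-size limit."""
    if limit_bytes < 0 or avg_file_bytes <= 0:
        raise ValueError("limit_bytes must be >= 0 and avg_file_bytes must be > 0")
    return limit_bytes // avg_file_bytes
```

With an assumed limit of 1 GiB (1_073_741_824 bytes) and an assumed average payload of 500 bytes, max_files_under_size_limit(1_073_741_824, 500) gives 2147483, so more than two million files would fit before any size limit took effect.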


Related Problem: PRB1316111

Seen In

Fuji Patch 7 Hot Fix 5
Fuji Patch 9 Hot Fix 1

Intended Fix Version

Paris

Safe Harbor Statement

This "Intended Fix Version" information is meant to outline ServiceNow's general product direction and should not be relied upon in making a purchasing decision. The information provided here is for information purposes only and may not be incorporated into any contract. It is not a commitment, promise, or legal obligation to deliver any material, code, or functionality. The development, release, and timing of any features or functionality described for our products remains at ServiceNow's sole discretion.

Article Information

Last Updated: 2019-10-11 13:33:57
Published: 2019-05-24