The MID server can go down because of a thread leak that causes it to run out of memory.
Steps to Reproduce
Unfortunately, there are no definitive steps to reproduce.
- MID server is down and/or out of memory (see status on the MID server record)
- The thread count for the affected MID keeps increasing, and the majority of the threads are HttpClient threads.
- When the MID server is unable to subscribe to the AMB channel, each AMB channel connection failure causes the thread count to increase. (The growing thread count is what exhausts memory on the MID server; it can exceed 7,000 threads.)
- Restart the MID server periodically, or restart it when it goes down.
- Upgrade to a fixed version. (Note that a previous fix was not universally successful.)
- Reach out to Customer Support if you are on a version that does not have any fixes.
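The thread-count symptom described above can be checked from a JVM thread dump of the MID server process (for example, one captured with `jstack`). The following is a minimal sketch, not an official diagnostic tool. It assumes the dump uses the standard `jstack` layout, where each thread entry starts with the thread name in double quotes, and it assumes the leaked threads have "HttpClient" somewhere in their name, as this article describes. The function name `count_threads` and the `marker` parameter are hypothetical.

```python
import re

# Each thread entry in a jstack-style dump starts with the quoted thread name.
THREAD_HEADER = re.compile(r'^"([^"]+)"')


def count_threads(dump_text, marker="HttpClient"):
    """Return (total_threads, threads_whose_name_contains_marker).

    `marker` is an assumption: adjust it to match the thread names
    you actually see in your own dump.
    """
    names = []
    for line in dump_text.splitlines():
        match = THREAD_HEADER.match(line)
        if match:
            names.append(match.group(1))
    matching = sum(1 for name in names if marker in name)
    return len(names), matching
```

Running this against dumps taken some time apart should show whether the HttpClient share of the total keeps growing.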
This issue is under review, and a fix is being worked on. To receive notifications when more information is available, subscribe to this Known Error article by clicking the Subscribe button at the top right of the article. If you are able to upgrade, review the Fixed In field to determine whether any versions have a permanent fix.
How do I confirm the fix?
- Go to the MID Server record for the affected MID and confirm that the number of HTTP threads is not increasing.
- Prior to this fix, the number of HTTP threads would climb to 1,000 or more until the MID server ran out of memory.
- If AMB is able to make a connection successfully, there should be around 7 active HTTP threads.
- This thread leak was exposed by another issue with the AMB client on the MID server, where we were NOT able to successfully subscribe to an AMB channel. Prior to this fix, each reconnect attempt leaked threads, which eventually caused the MID server to run out of memory and crash.
- The fix for this PRB is ONLY to prevent leaking threads. It does not fix the subscription issue.
- When AMB fails to connect, the MID reverts to querying the ecc_queue table to process input/output. This is done at intervals of 40 seconds by default. If there is a need to increase this polling period, it is controlled by the "mid.poll.time" system property.
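The confirmation steps above amount to checking that the HTTP thread count stays flat over time instead of climbing. The following is a minimal sketch of that check, assuming you have noted the HTTP thread count from the MID Server record at several points in time. The function name `http_threads_leaking` and the `tolerance` value are hypothetical. The baseline of 7 comes from this article, but the margin above it is an arbitrary assumption that you should tune for your environment.

```python
def http_threads_leaking(samples, baseline=7, tolerance=20):
    """Return True if the samples look like the leak described in this article.

    samples:   HTTP thread counts taken at successive times, oldest first.
    baseline:  expected count when AMB is connected (about 7 per this article).
    tolerance: assumed margin above the baseline before growth is suspicious.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to see a trend")
    # A leak never releases threads, so the count should not drop between samples.
    never_decreases = all(later >= earlier
                          for earlier, later in zip(samples, samples[1:]))
    grew = samples[-1] > samples[0]
    far_above_baseline = samples[-1] > baseline + tolerance
    return never_decreases and grew and far_above_baseline
```

For example, counts of 120, 480, and 1300 would be flagged, while counts that hover around 7 would not. This is only a heuristic; a count that rises and then falls back is treated as normal activity.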
Related Problem: PRB732813