Description
When the GetMIDInfo scripted SOAP service responds to GetCloudServiceAccountInfo requests from a MID Server, and any cloud_service_account_view record is missing a sa_account_id value, the request hangs in a loop, allocating memory until the instance application node runs out of memory (OOM) and restarts.
The transaction may time out before that happens, returning HTTP status 408.
cloud_service_account_view is a database view that takes its sa_account_id field from the account_id field of the cmdb_ci_cloud_service_account table. Although that field is mandatory in the dictionary, a record can still have an empty value.
This also prevents Cloud Discovery (e.g. AWS) from working, because the cloud service accounts are not synced to the MID Server.
When multiple MID Servers start up at the same time, such as after an instance upgrade when all MID Servers restart to upgrade, this can cause API_INT semaphore exhaustion.
Steps to Reproduce
- Create a cmdb_ci_cloud_service_account record with an empty account_id.
- Either:
  - Start up a MID Server, which makes a GetMIDInfo request for GetCloudServiceAccountInfo as part of startup
  - or run this background script to emulate what the GetMIDInfo scripted SOAP service function GetCloudServiceAccountInfo does when MID Servers make this request:
// With the empty account_id record in place, this call is expected to hang as described above.
var cloudServiceAccountInfoUtil = new CloudServiceAccountInfoUtil();
var doc = cloudServiceAccountInfoUtil.getCloudServiceAccountInfoXML();
App node logs may show a memory watcher alert while the request runs:
glide.memory.watcher SYSTEM URL= /GetMIDInfo.do?SOAP, THREAD= API_INT-thread-5, FG= true, TYPE= 3, STATE= 1, USER= xxxx, TIME= 58,167, MEM= 0, ATTRIBUTES= {X-Transaction-Source=Session-Type=non-interactive, user-agent=internal_soap_client}
/stats.do may show all or most of the available API_INT semaphores running, if there are multiple MID Servers:
/GetMIDInfo.do?SOAP
/threads.do confirms whether the "CloudServiceAccountInfoUtil" script include is running:
com.glide.caller.gen.sys_script_include_392afd336b783010da1e64ed1e44afb9_script.call
When the request times out on the instance side, the MID Server agent log shows:
2023-06-08T15:33:35.417+0000 WARN (StartupSequencer) [HTTPClient:828] Socket timeout: Read timed out
2023-06-08T15:33:35.417+0000 ERROR (StartupSequencer) [InstanceSOAPClient:139] SOAP Request: <SOAP-ENV:Envelope xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:tns="http://www.service-now.com/GetMIDInfo" xmlns:m="http://www.service-now.com" xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
<SOAP-ENV:Body><m:execute><agent xsi:type="xsd:string">9f7fe439871fed902698eb9f8bbb35ff</agent><purpose xsi:type="xsd:string">GetCloudServiceAccountInfo</purpose></m:execute></SOAP-ENV:Body></SOAP-ENV:Envelope>
2023-06-08T15:33:35.417+0000 ERROR (StartupSequencer) [InstanceSOAPClient:139] SOAP Response: Status code=0, Response body=null
2023-06-08T15:33:35.417+0000 ERROR (StartupSequencer) [CloudServiceAccountInfo:385] Failed to retrieve cloud service account information as XML payload from instance
com.snc.midserver.midinfo.api.MIDServerInfoException: GetCloudServiceAccountInfo: non-retryable html response (clientError=Socket timeout)
at com.snc.midserver.midinfo.internal.MIDServerInfoRemote.request(MIDServerInfoRemote.java:154)
at com.snc.midserver.midinfo.internal.MIDServerInfoRemote.getCloudServiceAccountInfo(MIDServerInfoRemote.java:217)
at com.service_now.mid.services.cloud.CloudServiceAccountInfo.load(CloudServiceAccountInfo.java:383)
at com.service_now.mid.services.cloud.CloudServiceAccountInfo.reloadDataFromInstance(CloudServiceAccountInfo.java:108)
at com.service_now.mid.services.cloud.CloudServiceAccountInfo.<init>(CloudServiceAccountInfo.java:64)
at com.service_now.mid.services.cloud.CloudServiceAccountInfo.get(CloudServiceAccountInfo.java:95)
at com.service_now.mid.services.StartupSequencer.startServices(StartupSequencer.java:357)
at com.service_now.mid.services.StartupSequencer.testsSucceeded(StartupSequencer.java:177)
at com.service_now.mid.services.StartupSequencer.startupSequencerRunnable(StartupSequencer.java:741)
at java.base/java.lang.Thread.run(Thread.java:829)
2023-06-08T15:33:35.417+0000 WARN (StartupSequencer) [CloudServiceAccountInfo:112] Was not able to load/reload cloud service account information from the instance.
and the app node log shows:
2023-06-09 02:22:41 (764) API_INT-thread-2 9A156C8A87D3E11434E864A80CBB35E6 txid=b225e8c21f33 tx_pattern_hash=782662456 EXCESSIVE *** End #1162251 /GetMIDInfo.do, user: mid_user, total time: 0:03:01.247,
processing time: 0:03:01.247, CPU time: 0:00:00.015, SQL time: 0:00:00.018 (count: 15), source: 54.170.3.139, type: soap, origin scope: global , method:POST, api_name:SOAP APIs, resource:GetMIDInfo.do, user_id:xxx, response_status:408
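When checking a larger app node log for this symptom, the entry above can be matched programmatically. The following is a minimal sketch, not a supported tool: the function name findTimedOutGetMIDInfo is hypothetical, and the regular expression is inferred only from the single sample entry shown above, so it may need adjusting for other log layouts.

```javascript
// Sketch: find GetMIDInfo transactions that ended with response_status 408.
// Assumption: entries look like the sample above, possibly wrapped over lines.
function findTimedOutGetMIDInfo(logText) {
    // Join wrapped lines so each entry can be matched as a single string.
    var flat = logText.replace(/\r?\n/g, ' ');
    var pattern = /End #(\d+) \/GetMIDInfo\.do, user: ([^,]+), total time: ([\d:.]+),.*?response_status:(\d{3})/g;
    var hits = [];
    var m;
    while ((m = pattern.exec(flat)) !== null) {
        // Keep only transactions that timed out (HTTP 408).
        if (m[4] === '408') {
            hits.push({ transaction: m[1], user: m[2], totalTime: m[3] });
        }
    }
    return hits;
}
```

Each returned object carries the transaction number, the user, and the total time, which is enough to correlate an entry with the Socket timeout reported in the agent log.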
Workaround
This problem is currently under review and targeted to be fixed in a future release. Subscribe to this Known Error article to receive notifications when more information becomes available.
To work around the problem, delete any cmdb_ci_cloud_service_account records that have an empty account_id, or update them with an account_id value. The following list URL shows the affected records:
https://<instance name>.service-now.com/cmdb_ci_cloud_service_account_list.do?sysparm_query=account_idISEMPTY&sysparm_view=
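The same records can also be listed from a background script before deciding whether to delete or update them. This is a minimal sketch under stated assumptions: the function name findAccountsMissingId is hypothetical, and the GlideRecord constructor is passed in as a parameter only so that the function can be exercised outside an instance. It reuses the account_idISEMPTY filter from the URL above and changes no data.

```javascript
// Sketch: list cmdb_ci_cloud_service_account records with an empty account_id.
// GlideRecordCtor is a parameter (an assumption of this sketch); in a
// background script, pass the platform's GlideRecord.
function findAccountsMissingId(GlideRecordCtor) {
    var missing = [];
    var gr = new GlideRecordCtor('cmdb_ci_cloud_service_account');
    gr.addEncodedQuery('account_idISEMPTY'); // same filter as the list URL
    gr.query();
    while (gr.next()) {
        missing.push({
            sys_id: gr.getUniqueValue(),
            name: gr.getDisplayValue()
        });
    }
    return missing;
}

// Background script usage (assumes the global GlideRecord and gs objects):
// findAccountsMissingId(GlideRecord).forEach(function (rec) {
//     gs.info('Empty account_id: ' + rec.name + ' (' + rec.sys_id + ')');
// });
```

An empty result suggests that no cmdb_ci_cloud_service_account record currently matches the condition described in this article.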
Related Problem: PRB1665907