
Description

How does the MID server upgrade process work? Knowing this will help you debug if it goes wrong, and identify exactly where it went wrong.

This article describes the process that takes place when the MID Server auto-upgrades, which should happen immediately after an instance upgrade or patch finishes.

This is based on a Windows host and the Paris release. The Linux process is essentially the same, but uses different temp folder locations and shell scripts instead of batch files.

MID Server Upgrade Process on a Windows host

  1. The Upgrade check is triggered in one of these ways:
    • By the StartupSequencer thread when the MID Server starts up.
    • At the end of an Instance patch or upgrade, when the instance sends a topic=SystemCommand, source=autoUpgrade job to all MID Servers that are UP at the time via the ECC Queue.
      • Other MID Server related plugins may also send Restart requests to the MID Server at this time.1.1
    • Every hour, when the MID Server's "AutoUpgrade.3600" thread runs. This check cannot be disabled.
    • If the "Upgrade MID" related link is clicked on the MID Server form.
  2. "Checking to see if MID server needs to upgrade." will be written to the MID Server agent log, and the instance is queried to find out what version the MID Server should be:
    • The MID Buildstamp will be reported by the Instance App node that the MID Server is connected to. This is derived from the glide.war and glide.war.assigned system properties.2.1,2.2,2.3
    • The "MIDAssignedPackages" Scripted SOAP Service is used for the request. This needs to be the out-of-box version, and Active.
    • The instance stats.do page will also report the MID Buildstamp.
    • If the assigned version is older than the installed version, a downgrade is attempted. In some cases this causes problems: the MID Server is still running the newer code when it starts the downgrade, and the older instance version may not have the code/APIs that the MID Server expects.
    • If MID Server Parameter "mid.pinned.version" is set, then this will override the instance version. 2.4

The MID Server agent log will report something like this. 'Missing' will only include a mid-JRE entry if that is also to be upgraded:

Current packages:
 Installed: [mid-core.madrid-12-18-2018__patch1-hotfix2-03-14-2019_03-20-2019_1304.universal.universal.zip, mid-jre.madrid-12-18-2018__patch1-02-13-2019_02-25-2019_1807.windows.x86-64.zip]
 Assigned: [mid-upgrade.madrid-12-18-2018__patch2-03-20-2019_03-29-2019_1650.universal.universal.zip, mid-core.madrid-12-18-2018__patch2-03-20-2019_03-29-2019_1650.universal.universal.zip]
 Missing: [mid-upgrade.madrid-12-18-2018__patch2-03-20-2019_03-29-2019_1650.universal.universal.zip, mid-core.madrid-12-18-2018__patch2-03-20-2019_03-29-2019_1650.universal.universal.zip]
 Downloaded: []
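
When debugging, it can help to pull these four lists out of the agent log programmatically. The following is a minimal Python sketch, assuming the log lines look like the excerpt above; the function names are hypothetical and not part of any ServiceNow tooling.

```python
import re

# Matches e.g. " Missing: [a.zip, b.zip]" anywhere in a log line.
# The line format is assumed from the agent log excerpt above.
PACKAGE_LINE = re.compile(r"(Installed|Assigned|Missing|Downloaded):\s*\[(.*)\]")


def parse_package_lines(log_text):
    """Return a dict such as {'Installed': [...], 'Missing': [...]}."""
    packages = {}
    for line in log_text.splitlines():
        match = PACKAGE_LINE.search(line)
        if match:
            label, body = match.groups()
            packages[label] = [p.strip() for p in body.split(",") if p.strip()]
    return packages


def package_kind(filename):
    """'mid-core.madrid-...zip' -> 'mid-core' (text before the first dot)."""
    return filename.split(".", 1)[0]


def jre_will_upgrade(packages):
    """True if a mid-jre package is listed under 'Missing'."""
    return any(package_kind(p) == "mid-jre" for p in packages.get("Missing", []))
```

If `jre_will_upgrade` returns True for a log excerpt, the Java Runtime is also going to be replaced during this upgrade.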
  3. "Setting mid status to Upgrading" will be written to the MID Server agent log
    • The MID Server record will be set as Status=Upgrading3.1
  4. "Performing pre-upgrade validation tests" will be written to the MID Server agent log
    • For on-premise instances or isolated network environments it may not be possible to pass these tests, requiring a workaround.4.1
    • A "mid-upgrade...preUpgradeCheck.zip" file is downloaded from https://install.service-now.com and extracted to the TEMP folder.
    • On Windows, this includes a test where a simple PowerShell script is run to check the PowerShell version and user permissions.4.2,4.3,4.7
    • If the signed preUpgradeCheck.zip file fails the certificate validation, the upgrade will fail.4.4
    • Some known configurations that cause upgrades to fail are also checked, including that Application Experience is running4.6.
    • If all is well then "Pre-upgrade validation tests successful. Continuing with upgrade process" is logged, otherwise specific errors or non-blocking warnings will be added to the agent log and to the MID Server Issue table [ecc_agent_issue]. 4.5
      Note: These pre-checks will not be run if MID Server configuration parameter mid.upgrade.run_precheck=false
  5. Download the missing ZIP files
    • mid-core...zip and mid-upgrade...zip will always need downloading, and possibly also mid-jre...zip, if the Java Runtime also needs upgrading. The specific filenames needed are listed under the "Missing: [...]" line of the Current Packages check above.
    • ZIP files are saved in the \agent\package\incoming\ folder.
    • If all were logged as "Package was successfully downloaded" then we continue.
    • If instance system property mid.download.through.instance=true, then ZIP files will be downloaded via the instance, and not directly from install.service-now.com. That should now be set false by default. 5.1
    • If the ZIP file contains a META-INF folder, signatures are checked to make sure the ZIP file has not been tampered with.4.5
    • If the file is incomplete, perhaps due to a socket timeout, the file is deleted and the download is retried. If the maximum number of retries is reached, or there is a problem deleting the file, this needs resolving manually.
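
If a download problem needs resolving manually, a quick check of the files in \agent\package\incoming\ can show which ZIP is incomplete. The following is a minimal Python sketch using only the standard library; `inspect_zip` is a hypothetical helper. It only checks the ZIP structure and CRCs and whether a META-INF folder is present. It does not validate signatures the way the MID Server does.

```python
import zipfile


def inspect_zip(path):
    """Report whether a downloaded ZIP looks complete and carries META-INF."""
    # A truncated download normally lacks the end-of-archive record,
    # so is_zipfile() returns False for it.
    if not zipfile.is_zipfile(path):
        return {"complete": False, "has_meta_inf": False}
    with zipfile.ZipFile(path) as archive:
        names = archive.namelist()
        # testzip() returns the first member with a bad CRC, or None.
        first_bad_member = archive.testzip()
    return {
        "complete": first_bad_member is None,
        "has_meta_inf": any(n.startswith("META-INF/") for n in names),
    }
```

A file reported as not complete is a candidate for deleting by hand so that the next upgrade check can download it again.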
  6. "Upgrading MID server" will be written to the MID Server agent log, once we have everything we need.
    • Extract the ZIP files to the TEMP folder. This is a random folder name like C:\Windows\TEMP\<random 13 digit number>-0\ . Anti-virus/security software can block those temp files, breaking the upgrade6.5.
    • "Stopping MID server. Bootstrapping upgrade." will be written to the MID Server agent log
    • A new Windows service named "ServiceNow Platform Distribution Upgrade (<MID Server name>)" is created, and then started to execute the upgrade binaries that are now in the TEMP folder.6.1 The C:\Windows\TEMP\<random number>-0\<mid buildstamp>\upgrade-wrapper\bin\glide-dist-upgrade.bat file is run to do that.
    • The MID Server service needs a 'logon as' user that is a member of the local Administrators group, or it will not be able to stop/start/create/delete itself and the temporary service.
    • Since Orlando, if the MID Server 'logon as' user is non-admin, the upgrade is started as a normal command prompt process, not as a service6.4.
    • "Setting mid status to Down" is logged at the start of the shutdown process. The wrapper log will log "Stopping the ServiceNow MID Server_xxx service..." at this point.
    • "MIDServer MID Server stopped" is logged, however that does not mean that all threads have been killed or that the JVM has stopped yet. There may still be probes that have not finished, and those still have to end, or may crash with exceptions.
    • During this time wrapper log shows several "Waiting to stop..." logs, and will continue to repeat that every 5 seconds until all running threads/probes have ended.
    • Finally a log of "<-- Wrapper Stopped" in the wrapper log shows the JVM has shut down. There should now be no files locked for the java application, or wrapper service. This will take >2 minutes longer than normal in Madrid6.2 , and if there are other stuck probes this can take 15 minutes or more6.3.
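
To confirm from the wrapper log whether the shutdown really completed, the messages quoted above can be searched for directly. The following is a minimal Python sketch, assuming the wrapper log contains those messages verbatim; `shutdown_status` is a hypothetical helper name, and the 5 second interval is taken from the description above.

```python
def shutdown_status(wrapper_log_text):
    """Summarise the most recent shutdown found in wrapper log text."""
    lines = wrapper_log_text.splitlines()

    # Only look at lines from the last "Stopping the ServiceNow MID Server..."
    # entry onwards; fall back to the whole text if it is not present.
    start = 0
    for index, line in enumerate(lines):
        if "Stopping the ServiceNow MID Server" in line:
            start = index
    recent = lines[start:]

    waiting = sum(1 for line in recent if "Waiting to stop..." in line)
    stopped = any("<-- Wrapper Stopped" in line for line in recent)
    return {
        "wrapper_stopped": stopped,
        "waiting_count": waiting,
        # One "Waiting to stop..." entry is expected roughly every 5 seconds.
        "approx_wait_seconds": waiting * 5,
    }
```

If `wrapper_stopped` is False, the JVM or wrapper may still hold file locks, which matters for the delete and copy steps of the upgrade service.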
  7. Meanwhile, the "ServiceNow Platform Distribution Upgrade..." service has started, and will do the following:
    • This will start immediately after the "Bootstrapping upgrade" log above, before the MID Server has finished shutting down, which may take some minutes to complete.
    • This upgrade service (or process, for non-admin 'logon as' users) waits until the MID Server service has fully shut down, before continuing7.9.
    • Files in the agent\bin and agent\lib folder are deleted from the MID Server installation. It will retry every second if the file is still locked, so the fact the MID Server might still be shutting down should not be an issue, assuming the MID Server does eventually cleanly shut down.
    • If the files are still locked after 10 minutes7.1 the upgrade will fail. The upgrade service stops, and the MID Server is not started, and remains Down. From Paris, a list of currently running processes will be written to the log, which will probably confirm the java and wrapper processes were still running. It does not do a stack dump, or list running services, so when multiple MID Servers are running it is not easy to match the process IDs (PID) to the installs/services from this log.
    • If the agent log shows "MID Server stopped" and "Main.handleStop() after shutdown, OperationalState=UPGRADING" it doesn't mean the JVM and wrapper have actually stopped. You need to also see "<-- Wrapper Stopped" in the wrapper.log to confirm the MID Server has shut down.
    • It is possible that file lock errors happen before the 10 minute timeout, after the MID Server truly has shut down. For example, due to Application Experience7.3, and Anti-Virus software such as Cisco AMP7.4 and Dell SecureWorks Red Cloak7.10. There are other causes not yet nailed down7.5. These other non-MID Server processes momentarily keep a lock on the files as the upgrade service tries to delete them. Code is being added to the MID Server to create Issue records when known causes such as these are identified, and Application Experience is already checked for4.6. Anti-virus deleting suspicious files, such as InjectorService.exe, while the upgrade is also trying to delete them causes exceptions as well.7.11 The upgrade service stops, and the MID Server is not started, and remains Down.
    • If 2 services incorrectly use the same install folder, the copies may fail due to file locks by the other running service.7.2 Checks on MID Server startup should now prevent that.
    • Any errors or warnings will be logged to the glide-dist-upgrade.log, within the TEMP folder. If the upgrade fails during this step, then this file may be the only clue as to what happened.
    • After the deletes are done, "Copying files to MID server installation path" is logged and the new files previously extracted into the "C:\Windows\Temp\<random 13 digit number>-0\agent" folder are copied over the MID Server installation folder. Any existing files will be overwritten, so they must not be locked either. As part of the copy it does "Correcting file permissions for directory", before logging "Finished copying files".
    • If the Java JRE is also upgraded7.6, then the agent\jre folder is deleted and replaced. Customised files within agent/jre such as "cacerts" will be overwritten.7.7
    • "Upgrade complete" is logged
    • A crash or exception around this point could mean no further steps happen. This may be recoverable by allowing the ServiceNow Platform Distribution Upgrade to run again.7.8
    • The log file is copied into the MID Server's wrapper log, in an << UPGRADE LOG BEGIN >>...<< UPGRADE LOG END >> section.
    • The MID Server service is Started.
    • This ServiceNow Platform Distribution Upgrade service shuts itself down.
    • Note: The "Unable to install the XXXX service - The specified service already exists." error will always be seen and should be ignored. The code to Start the service also tries to install the service, just in case, and the fact it is already there causes this error.
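
Because the TEMP folder is later deleted, the copy of the upgrade log inside the wrapper log is often the easiest place to look. The following is a minimal Python sketch that extracts that section and lists suspicious lines, assuming the markers appear exactly as quoted above. The helper names and the keyword list are hypothetical, and the "service already exists" message is skipped because it is expected.

```python
BEGIN_MARKER = "<< UPGRADE LOG BEGIN >>"
END_MARKER = "<< UPGRADE LOG END >>"
# Expected on every upgrade, so not treated as a problem.
IGNORED = "The specified service already exists"
# Hypothetical keyword list; adjust to what the real log contains.
KEYWORDS = ("error", "warn", "exception", "unable")


def extract_upgrade_log(wrapper_log_text):
    """Return the most recent upgrade log section, or None if absent."""
    start = wrapper_log_text.rfind(BEGIN_MARKER)
    if start == -1:
        return None
    start += len(BEGIN_MARKER)
    end = wrapper_log_text.find(END_MARKER, start)
    if end == -1:
        end = len(wrapper_log_text)  # section was cut short
    return wrapper_log_text[start:end].strip()


def find_problems(section):
    """List lines that mention a problem keyword, minus the known-benign one."""
    problems = []
    for line in section.splitlines():
        lowered = line.lower()
        if IGNORED.lower() in lowered:
            continue
        if any(word in lowered for word in KEYWORDS):
            problems.append(line.strip())
    return problems
```

An empty result from `find_problems` together with an "Upgrade complete" line in the section suggests the file replacement itself went through.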
  8. The MID Server Starts
    • The upgrade check on startup should confirm that the MID Server Installed version is now the Assigned version. If not, it will attempt to upgrade again.
    • The previous ServiceNow Platform Distribution Upgrade Windows service is uninstalled, even if it had crashed and not finished.8.1
    • The TEMP folder is deleted, and the glide-dist-upgrade.log file with it.
    • The Instance Certificate will be validated, by checking for revocation with OCSP8.2, and the certificate chain and root certificate are also checked, which can cause problems when self-signed certificates of a proxy/firewall are involved.8.10
    • The Tanuki Wrapper will verify the start parameters, and the Certificates of the wrapper executables are valid.8.3,8.4,8.5
    • Passwords in config.xml may be re-encrypted if security-related code has changed.8.6
    • The PowerShell version of the host is checked.
    • A PowerShell script to enforce stricter Windows file permissions is run8.7. MID Server parameter mid.windows_host.file_permissions.enforce=false disables this.
    • A check is done to make sure the Service name in wrapper-override.conf matches the actual running service name. If it does not, the MID Server shuts down, to avoid the chance of 2 services running for the same install. 8.8,8.9
    • A check is done for any other MID Server records in the instance with the same name.
    • The Version field of the MID Server record [ecc_agent.version] is updated by the "MID - Process XMLStats" Business rule sensor, in response to the topic=queue.stats input sent by the MID Server's StatusMonitor thread, which runs on startup (and every 10 minutes) and gets the version number from the agent/package/meta/mid-core.meta file.
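
To check by hand whether the Installed version now matches the Assigned version, the buildstamps embedded in the package filenames can be compared. The following is a minimal Python sketch. It assumes the filename layout `<package>.<buildstamp>.<os>.<arch>.zip`, inferred only from the sample filenames in the agent log excerpt earlier in this article; the function names are hypothetical.

```python
def buildstamp(filename):
    """Extract the buildstamp from a package filename.

    Assumed layout: <package>.<buildstamp>.<os>.<arch>.zip
    """
    parts = filename.split(".")
    if len(parts) < 5:
        raise ValueError("unexpected package filename: " + filename)
    # Drop the package name (first) and os, arch, 'zip' (last three).
    return ".".join(parts[1:-3])


def core_buildstamp(filenames):
    """Return the buildstamp of the mid-core package in a list, or None."""
    for name in filenames:
        if name.startswith("mid-core."):
            return buildstamp(name)
    return None


def versions_match(installed, assigned):
    """True only when both lists contain mid-core with the same buildstamp."""
    installed_stamp = core_buildstamp(installed)
    assigned_stamp = core_buildstamp(assigned)
    return installed_stamp is not None and installed_stamp == assigned_stamp
```

When `versions_match` is False after the restart, another upgrade attempt is to be expected, as described in the first bullet of this step.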

Additional Information

Footnotes:

Article Information

Last Updated:2020-11-26 08:41:21
Published:2020-11-26