How does the MID Server upgrade process work? Knowing this may help you debug an upgrade that goes wrong.
Below is the process that takes place when the MID Server auto-upgrades, which should happen immediately after an instance upgrade or patch finishes.
This is based on a Windows host, and the Madrid behaviour. The Linux process is generally the same, but with different temp folder locations and using shell scripts instead.
MID Server Upgrade Process on a Windows host
- The Upgrade check is triggered in one of these ways:
- By the StartupSequencer thread when the MID Server starts up.
- At the end of an Instance patch or upgrade, when the instance sends a topic=SystemCommand, source=autoUpgrade job to all MID Servers via the ECC Queue.
- Every hour, when the MID Server's "AutoUpgrade.3600" thread runs.
- If the "Upgrade MID" related link is clicked on the MID Server form.
- "Checking to see if MID server needs to upgrade." will be written to the MID Server agent log, and the instance is queried to find out what version the MID Server should be:
- The MID Buildstamp will be reported by the Instance App node that the MID Server is connected to. This is derived from the glide.war and glide.war.assigned system properties. 2.1, 2.2
- The "MIDAssignedPackages" Scripted SOAP Service is used for the request. This needs to be the out-of-box version, and Active.
- The instance stats.do page will also report the MID Buildstamp.
- If the assigned version is older than the installed version, then a downgrade is attempted. In some cases this will cause problems: the MID Server is still running the newer code when it starts the downgrade, and the older instance version may not have the code/APIs that the newer MID Server expects.
- If MID Server Parameter "mid.pinned.version" is set, then this will override the instance version. 2.3
The MID Server agent log will report something like:
AutoUpgrade.3600 Current packages:
AutoUpgrade.3600 Installed: [mid-core.madrid-12-18-2018__patch1-hotfix2-03-14-2019_03-20-2019_1304.universal.universal.zip, mid-jre.madrid-12-18-2018__patch1-02-13-2019_02-25-2019_1807.windows.x86-64.zip]
AutoUpgrade.3600 Assigned: [mid-upgrade.madrid-12-18-2018__patch2-03-20-2019_03-29-2019_1650.universal.universal.zip, mid-core.madrid-12-18-2018__patch2-03-20-2019_03-29-2019_1650.universal.universal.zip]
AutoUpgrade.3600 Missing: [mid-upgrade.madrid-12-18-2018__patch2-03-20-2019_03-29-2019_1650.universal.universal.zip, mid-core.madrid-12-18-2018__patch2-03-20-2019_03-29-2019_1650.universal.universal.zip]
AutoUpgrade.3600 Downloaded:
MIB Initializer Extended MIBS loaded
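When comparing versions by hand it can help to pull the package lists out of the agent log. The following is a minimal Python sketch, not part of the MID Server itself. It assumes the log lines look like the sample above (a label followed by a bracketed, comma-separated list); the function names are hypothetical.

```python
import re

# Matches entries such as "AutoUpgrade.3600 Missing: [a.zip, b.zip]".
# The layout is assumed from the sample log, not from a documented format.
LINE_RE = re.compile(r"(Installed|Assigned|Missing):\s*\[([^\]]*)\]")


def parse_package_lists(log_text):
    """Return a dict mapping 'Installed', 'Assigned', 'Missing' to file name lists."""
    found = {}
    for label, body in LINE_RE.findall(log_text):
        found[label] = [name.strip() for name in body.split(",") if name.strip()]
    return found


def packages_to_download(lists):
    """Assigned packages that are not already installed, in assigned order."""
    installed = set(lists.get("Installed", []))
    return [name for name in lists.get("Assigned", []) if name not in installed]
```

For the sample above, the result of packages_to_download() should match the "Missing:" list reported by the MID Server; a mismatch would suggest the log excerpt is incomplete.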
- "Setting mid status to Upgrading" will be written to the MID Server agent log, and the MID Server record will be set to Status=Upgrading
- "Performing pre-upgrade validation tests" will be written to the MID Server agent log
- A "mid-upgrade...preUpgradeCheck.zip" file is downloaded from https://install.service-now.com and extracted to the TEMP folder.
- If all is well then "Pre-upgrade validation tests successful. Continuing with upgrade process" is logged. Otherwise, specific errors or non-blocking warnings will be added to the agent log and to the MID Server Issue table [ecc_agent_issue]. 4.1
Note: These pre-checks will not be run if MID Server configuration parameter mid.upgrade.run_precheck=false
- Download the missing ZIP files
- mid-core...zip and mid-upgrade...zip will always need downloading, and possibly also mid-jre...zip, if the Java Runtime also needs upgrading. The specific filenames needed are listed under the "Missing: [...]" line of the Current Packages check above.
- ZIP files are saved in the \agent\package\incoming\ folder.
- If all were logged as "Package was successfully downloaded" then we continue.
- If instance system property mid.download.through.instance=true, then ZIP files will be downloaded via the instance, and not directly from install.service-now.com. This property should now be false by default on new instances. 5.1
- "Upgrading MID server" will be written to the MID Server agent log, once we have everything we need.
- Extract the ZIP files to the TEMP folder. This is a random folder name like C:\Windows\TEMP\<random 13 digit number>-0\
- "Stopping MID server. Bootstrapping upgrade." will be written to the MID Server agent log
- A new Windows service named "ServiceNow Platform Distribution Upgrade (<MID Server name>)" is created, and is then started to execute the upgrade binaries that are now in the TEMP folder. The C:\Windows\TEMP\<random number>-0\<mid buildstamp>\upgrade-wrapper\bin\glide-dist-upgrade.bat file is run to do that.
- The MID Server service needs a 'logon as' user that is a member of the local Administrators group, or it will not be able to stop/start/create/delete itself and the temporary service. (That restriction may be lifted in the Orlando release.)
- "Setting mid status to Down" is logged at the start of the shutdown process. The wrapper log will log "Stopping the ServiceNow MID Server_xxx service..." at this point.
- "MIDServer MID Server stopped" is logged, however that does not mean that all threads have been killed or that the JVM has stopped yet. There may still be probes that have not finished, and those still have to end, or may crash with exceptions.
- During this time the wrapper log shows "Waiting to stop..." entries, repeated every 5 seconds until all running threads/probes have ended.
- Finally, "<-- Wrapper Stopped" in the wrapper log shows that the JVM has shut down. There should now be no files locked by the Java application or the wrapper service. This will take more than 2 minutes longer than normal in Madrid 6.1, and if there are other stuck probes this can take 15 minutes or more.
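To judge how long a shutdown took, the "Waiting to stop..." entries can be counted. This is a rough Python sketch under two assumptions: the wrapper log contains the three messages quoted above as plain substrings, and the entries are 5 seconds apart as described. The function name is hypothetical, and the result is only an estimate.

```python
WAIT_INTERVAL_SECONDS = 5  # spacing of "Waiting to stop..." entries, per the text


def summarize_shutdown(wrapper_lines):
    """Summarise the most recent shutdown found in the wrapper log lines.

    Returns (stopped, waits, approx_seconds). 'stopped' is True only if
    "<-- Wrapper Stopped" was seen after the stop request.
    """
    stopping = False
    stopped = False
    waits = 0
    for line in wrapper_lines:
        if "Stopping the ServiceNow MID Server" in line:
            # A new shutdown begins; forget any earlier one.
            stopping, stopped, waits = True, False, 0
        elif stopping and "Waiting to stop..." in line:
            waits += 1
        elif stopping and "<-- Wrapper Stopped" in line:
            stopped = True
            stopping = False
    return stopped, waits, waits * WAIT_INTERVAL_SECONDS
```

A result with stopped=False means the JVM had not finished shutting down by the end of the log excerpt, which is the situation in which file locks are most likely during the upgrade.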
- Meanwhile, the "ServiceNow Platform Distribution Upgrade..." service starts, and will do the following:
- This will start immediately after the "Bootstrapping upgrade" log above, before the MID Server has finished shutting down, which may take some minutes to complete. This upgrade service does not wait for the MID Server to fully shut down before continuing.
- If there are file locks when deleting or overwriting files, then the operation is retried each second until whatever process holds the file (which may be the MID Server, or something else like anti-virus) lets go of it. Retries continue only up to an overall timeout of 2 or 10 minutes 7.1, after which the upgrade service fails. If that happens, the upgrade will stop and the MID Server will remain down. Manually running glide-dist-upgrade.bat again later may allow it to continue in this situation.
- Any errors or warnings will be logged to the glide-dist-upgrade.log, within the TEMP folder. If the upgrade fails during this step, then this file may remain as the only clue as to what happened.
- Files in the agent\bin and agent\lib folders are deleted from the MID Server installation. The delete is retried if a file is still locked, so the fact that the MID Server might still be shutting down should not be an issue, assuming the MID Server does eventually shut down cleanly. It is possible that another process keeps files locked or hides the files, e.g. Application Experience 7.2, which at the time of writing is still under investigation. 7.3
- "Copying files to MID server installation path" is logged, and the new files previously extracted into the "C:\Windows\Temp\<random 13 digit number>-0\agent" folder are now copied over the MID Server installation folder. Any existing files will be overwritten, so those must not be locked either.
- If the JRE is also upgraded, then files within the agent/jre such as "cacerts" may be overwritten.
- "Upgrade complete" is logged
- The log file is copied into the MID Server's wrapper log, in an << UPGRADE LOG BEGIN >>...<< UPGRADE LOG END >> section.
- The MID Server service is Started.
- This ServiceNow Platform Distribution Upgrade service stops
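The retry behaviour described for locked files can be illustrated with a short Python sketch. This is not the upgrade service's actual code, which runs from the extracted upgrade binaries. It only shows the general pattern of retrying once per second up to an overall timeout. The function name and parameters are hypothetical, and the 120-second default simply reflects the shorter of the two timeouts mentioned above.

```python
import os
import time


def delete_with_retry(path, timeout_seconds=120.0, interval_seconds=1.0,
                      remove=os.remove, sleep=time.sleep, clock=time.monotonic):
    """Try to delete 'path', retrying while it appears to be locked.

    Returns True once the file is gone, False if the overall timeout passes.
    The remove/sleep/clock arguments exist only so the loop can be tested.
    """
    deadline = clock() + timeout_seconds
    while True:
        try:
            remove(path)
            return True
        except FileNotFoundError:
            return True          # already gone, nothing left to do
        except PermissionError:  # how Windows reports a file in use
            if clock() >= deadline:
                return False     # give up; the caller treats this as a failed upgrade step
            sleep(interval_seconds)
```

The point of the overall deadline is that a file held by a process that never exits, such as a scanner or a hung JVM, cannot block the upgrade forever; the cost is that a slow but healthy shutdown can exceed it, which matches the failure described in 7.1.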
- The MID Server Starts
- The upgrade check on startup should confirm that the MID Server Installed version is now the Assigned version. If not, it will attempt to upgrade again.
- The previous ServiceNow Platform Distribution Upgrade Windows service is uninstalled.
- The TEMP folder is deleted.
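Because the TEMP folder is deleted at the end, the copy of the upgrade log inside the wrapper log is often the easiest record to inspect afterwards. A minimal Python sketch for pulling it out follows. It assumes the begin and end markers appear in the wrapper log exactly as quoted above; the function name is hypothetical.

```python
BEGIN_MARKER = "<< UPGRADE LOG BEGIN >>"
END_MARKER = "<< UPGRADE LOG END >>"


def extract_upgrade_sections(wrapper_text):
    """Return one list of lines per complete BEGIN/END pair in the wrapper log."""
    sections = []
    current = None  # None means we are outside an upgrade log section
    for line in wrapper_text.splitlines():
        if BEGIN_MARKER in line:
            current = []
        elif END_MARKER in line and current is not None:
            sections.append(current)
            current = None
        elif current is not None:
            current.append(line)
    return sections
```

A wrapper log that has been through several upgrades yields several sections, with the last one belonging to the most recent upgrade. A section that was started but never closed is not returned, which itself may indicate that the upgrade did not complete.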
2.1 KB0749119 / PRB1344057 MID Server Autoupgrade is not able to handle Instance Rollbacks - MID Server doesn't downgrade again
2.2 KB0697389 MID Servers repeatedly Upgrade and Downgrade between current and previous instance version
2.3 Docs: MID Server Version Selection
4.1 Docs: MID Server pre-upgrade check
5.1 PRB1332088 Adjustments to the MID Server downloading process - Downloading via instance will no longer be the default for new instances (Since LP8, MP3, N)
6.1 KB0754285 / PRB1322060 MID Server Stop or Restart takes over 2 minutes longer due to a JMX thread remaining running - "Shutdown failed: Timed out waiting for signal from JVM"
7.1 KB0749292 / PRB1314105 During the MID upgrade 2 minutes time-out for deleting old files is not enough for some customers
7.2 KB0715612 / PRB1307275 MID Server auto-upgrade will fail if Windows has 'Application Experience' disabled
7.3 KB0715244 / PRB1279578 MID server auto-upgrade fails while copying files from temp to agent folder on Windows even with 'Application Experience' enabled