Notifications

168 views

Description

The aim of this article is to increase knowledge in order to make doing changes safer, and spotting potentially problematic scripts easier, but is all general information that would apply to any ServiceNow user doing scripting.

Contents

  • When would you write a script
  • .get() and if next()
  • Looping through records, using addEncodedQuery and while loop
  • .deleteRecord() and deleteMultiple(), and Cascade delete
  • .setWorkflow() and autoSysFields(), and Class changes
  • .gs.log() and the Source parameter
  • Thread dump and exact Timestamps
  • Run as a scheduled script, splitting the data sets, and limit()
  • Example of running a UI Action or Business Rule as a standalone script
  • Other stuff

Instructions

When might Technical Support write a script?

Rule #1: Technical Support doesn't write code for customers

  • Technical Support doesn't write functional code for Customers. If there is a requirement to enhance, modify or change existing designed behaviour of the out-of-box features, then that is a job for our Developers as part of a planned feature enhancement in which case it will then become supported code for all customers, or a Customer-specific Implementation perhaps with the involvement of a Partner or Professional Services, which would not be supported code.
  • Support won’t customize an Out-of-the-box script for a customer unless it’s a Problem workaround, and in that case, we will aim to make sure it is captured in an update set, and clear instructions are given about reverting it after the fix release.
  • Support will try to identify if an issue with an out-of-box feature is caused by code in a custom script, but will not be responsible for fixing that custom script.
  • Support may correct custom scripts for customers if it is very simple, such as a logical or syntax error in a single line, and they will also aim to explain best practice and why the customisation breaks things, so that Customers know not to do it again.
  • For “how can I do something” questions, Support may give general steps or pseudo-code if we have a good solution idea, but will be clear that implementing the idea is the responsibility of the customer, perhaps with the help of an implementation Partner or Professional Services, who may actually have a better idea.
  • Support will also point out anything found that is not supported or recommended, or that it is likely to break things after upgrades, or is going to need maintaining. Services can help with that work, which is normal once a customer takes ownership of our code or adds their own code alongside it.

If our bug broke customer's data, we should help clean it up

Examples where a script may need writing by Technical Support for running on the Customer instance would be:

  • A Problem in Normalization Data Services caused loads of duplicate Model records to be created, requiring CIs to be pointed back to the original model, and the duplicate models deleting.
  • A Problem in CMDB Table-per-partition (TPP) Migration caused loads of duplicate CIs to be created after the upgrade, due to existing records not being seen by identification queries. Help would be required to identify and delete the duplicate CIs.
  • A Problem left loads of orphaned IP Address CI records with no parent Network Interface (NIC) record.
  • An undocumented Problem Fix added extra Model Category records, causing many unwanted Asset records to be created. Those would need unlinking from the CI, and the Assets deleted without causing the CI to be cascade deleted.

Adding Debugging code, when we think may have a Problem

Examples would be:

  • Adding gs.log() statements for finding answers to questions such as:
    • Did we get to this line?
    • What was a variable value at this point?
    • Was it this update() that changed the value?
  • Commenting lines to see if we avoid the error, or get a different one.
  • Making corrections to see if we have identified the bug in the code

Would writing a custom script be the most efficient solution?

  • How many records are involved?
  • Could it be fixed manually with a couple of hours of work instead?
  • Am I confident that I can write a script myself to do this?
  • Would Development or Support experts need getting involved?
  • Am I digging a hole for myself? See diagram:-

 

The Trouble with .get() and .update()

GlideRecord API:

The scary things are:

.get() may not actually get a record at all

If your script then uses values from that record in later code or queries, then you may be using 'null' rather than the value you assumed you had.

.get() may get more than 1 record

If your script assumes a single record is got, then you may end up doing something unexpected to records you were not intending to touch.

It is possible the whole table is returned, which also has app node memory implications.

.update() may actually insert a new record

If the .get() didn't get a record, or if you dot-walk to a record via a broken reference, then the GlideRecord will still be initialized.

The common warning in the node logs for "Get for non-existent record: ...., initializing " shows this bad situation has occurred.

If your script carries on regardless, without checking, and updated that gliderecord, an insert will happen.

Example: A custom Business Rule running on insert of a CI Relationship record, to update a field of the Parent CI

var parentGr = new GlideRecord('cmdb_ci');
parentGr.get(current.parent.sys_id.toString());
parentGr.u_has_relationship = true;
parentGr.update();

That has been written on the assumption that for a relationship record to exist, surely both the parent and child CI must already exit.

That assumption is false, because the cmdb_rel_ci record may have been inserted by the Discovery “Virtual Computer Check” Business Rule, which runs before insert of the Parent Computer CI.

This will cause a premature Insert of the parent computer CI when .update() is called, with blank values in all the CI fields. When the 'real' insert happens, after all before insert business rules have completed, it will fail due to the sys_id already having been inserted.

This would be slightly better, by testing the .get() did return true.

var parentGr = new GlideRecord(‘cmdb_ci’);
if (parentGr.get(current.parent.sys_id.toString())) {
parentGr.u_has_relationship = true;
parentGr.update();
}

But that's still not foolproof, as we may return more than one record. With a null value it can return the whole table.

Note: The use of ".sys_id.toString()" is also a good idea. JavaScript is supposed to be able to automatically type-cast between different object types but sometimes doesn't get it right. Doing the type conversions yourself is safer. .get() requires a string, and without this you  are actually giving the method a GlideElement object of the reference field or sys_id field.

Querying and using if and .next() instead

This approach is even safer. We do a .query() instead of a .get(). We then use .next() to get the record. We can also use getRowCount() to check the record count is 1 before doing anything.

var parentGr = new GlideRecord(‘cmdb_ci’);
parentGr.addQuery(‘sys_id’, current.parent.sys_id.toString());
parentGr.query();
if (parentGr.next() && parentGr.getRowCount()==1) {
parentGr.u_has_relationship = true;
parentGr.update();
}

Lessons learned:

You may be using a broken reference value, or even a null if previous code hasn’t correctly set up the value first, so:

  1. Never use .get() without checking the result is True.
  2. It is a lot safer to use a query followed by if next.
  3. Count the records returned by the query

 

Looping through records, using addEncodedQuery and while loop

Let's pretend we have a requirement to set all Computer class CIs as Retired if the most recent discovery date is longer ago than 3 months. These 8 records:

We could write this script to do it:

var gr = new GlideRecord(‘cmdb_ci_computer’);
gr.query();
while (gr.next()) {
  if (gr.last_discovered < ‘2019-04-19’) {
  gr.hardware_status = ‘retired’;
  gr.update();
  }
}

Bad example:

At least 7 things are badly wrong with that:

  • CMDB is an extended table. We are including all child tables of Computer, including Server, etc. in this query.
  • We are looping more records than necessary. We are holding in memory in the GlideRecord object records we are not going to be updating.
  • We are updating records that may already be Retired, without first checking if they are already Retired
  • We are assuming last_discovered has a value, but this would include all CIs without values too.
  • By the time we run this, 3 months ago may be a different date.
  • ‘gr’ is not a safe variable name, especially when we have not wrapped this script in a function – see KB0713029 
  • Powerpoint has messed up the apostrophes.

Better example:

We can address all the above points by adding them to our list filter. Then use addEncodedQuery() with the same table/query string as the list, and you can safely loop through exactly the same records as the list. Not only the 8 records we are going to update will be in the GlideRecord object after the query(), making this quicker and use less memory.

var ciGr = new GlideRecord('cmdb_ci_computer');
ciGr.addEncodedQuery('last_discovered<javascript:gs.beginningoflast3months()^hardware_status!=retired^sys_class_name=cmdb_ci_computer^last_discoveredisnotempty');
cigr.query();
while (cigr.next()) {
cigr.hardware_status='retired';
cigr.update();
}

 

deleteRecord() and deleteMultiple(), and Cascade Delete

var ciGr = new GlideRecord('cmdb_ci_computer');
ciGr.addEncodedQuery('<whatever>');
ciGr.query();
while (ciGr.next()) {
ciGr.deleteRecord(); // not .delete() as you might expect, given update() and insert()!
}

Is equivalent to:

var ciGr = new GlideRecord(‘cmdb_ci_computer');
ciGr.addEncodedQuery('<whatever>');
ciGr.query();
ciGr.deleteMultiple();

Don’t forget that both will cause a Cascade Delete, deleting all child records, and clearing of reference values from other records. For a CI that may be all the NICs, Disks, Memory, Serial Numbers, and the linked Asset, plus all that's child record like Costs Lines, plus anything that references it like Affected CIs in Tasks like Incidents and Problems.

Cascade delete happens by looking for any other records that might reference the CI being deleted. Every field referencing the CMDB should have an Index. Deletes can be very slow if there is a missing index, with a huge proportion of the time could be spent checking a rarely used table that doesn't even have any relevant records in. If the delete is slow, check for slow SELECT queries.

 

setWorkflow() and autoSysFields()

Business rules and Engines will run for operations done in scripts, just like any other insert/update/delete.

Those could do unexpected things, and very slow things.

e.g. CI-Asset Synchronisation will run, and so Assets and their parents/children will also be involved.

setWorkflow(false);

Disables the running of business rules that might normally be triggered by subsequent actions.

Workflow Engine, and I believe all other engines, will not run. 

Also turns off auditing.

autoSysFields(false);

Disables the update to the fields sys_updated_by, sys_updated_on, sys_mod_count, sys_created_by, and sys_created_on.

But just for the subsequent action! Just this script’s next .update/delete.

Example:

var ciGr = new GlideRecord('cmdb_ci_computer');
ciGr.addEncodedQuery('<whatever>');
ciGr.query();
while (ciGr.next()) {
ciGr.setWorkflow(false);
ciGr.autoSysFields(false);
ciGr.deleteRecord();
}

Notes:
  • Can’t be used with deleteMultiple()
  • Will still cascade delete and run BRs for those extra updates/deletes.
  • Won’t have any effect if sys_class_name is changed at the same time – see KB0727652

 

gs.log() and the Source parameter

gs.log() has a 2nd parameter for the ‘source’.

 

while (ciGr.next()) {
gs.log(‘Deleting cmdb_ci:’ + ciGr.sys_id, 'CS1234567');
ciGr.deleteRecord();
}

/syslog_list.do can then be filtered on source=CS1234567

 

Thread dump and exact Timestamps

Adding gs.log() statements to several Sensors and Script Includes is often useful, but created times in syslog are only to 1 second, so the sequence is confused.

getTime() will return the milliseconds since 1970, so include that in you message:

gs.log('Now:' + new Date().getTime() + 'Deleting CI: ' + ciGr.sys_id, 'INT1234567');

I then export syslog to Excel, and sort by the timestamp.

You can extract the timestamp after “Now:” with this excel formula:
=MID(C2,FIND("Now:",C2)+4 ,13)

Example use case:

Example of debugging MID Server Heartbeat. Log statements were added to the Sensor BR, and 5m scheduled job.

This ended up giving the proof that 2 app nodes must have system clocks that were out of sync. The input from the heartbeat probe arrives before it was sent, which is impossible if everything is using the same clock.

GlideLog.getStackTrace(new Packages.java.lang.Throwable())

/threads.do will list the thread dump for all threads on the app node.

You can generate a similar thread dump for a thread as it runs on demand, and include it in a gs.log, from any script, such as Business Rule or Script include.

gs.log('Now:' + new Date().getTime() + 'Stack Trace:\n '
  + GlideLog.getStackTrace(new Packages.java.lang.Throwable()),
  'INT1234567');

Great for answering questions like “What code is deleting the network adapters during a discovery run?”

A new Before Delete business rule for table cmdb_ci network_adapter can simply gs.log a stack trace.

Working back down the stack trace, you can see the .delete(), and the lines below that is what’s doing the delete.

 

Run as a scheduled script, splitting the data sets, and limit()

Data Fix scripts that Support write for customers can run for a long time, or timeout.

  • Run the script as a Scheduled Script (/sysauto_script.do)
  • Run=On Demand, and Execute Now.
  • Scheduled to run later out-of-hours
  • Customer can run it themselves after we set it up, so Support doesn’t need the added delay of a Normal Change in HI.

Use .limit(n) to restrict the query to n records at a time, and then run the script several times.

var ciGr = new GlideRecord(‘cmdb_ci_computer’);
ciGr.addEncodedQuery(‘<whatever>’);
ciGr.limit(1000);
ciGr.query();
ciGr.deleteMultiple();
  • Only the first 1000 results of the query will be included.
  • Good for testing on a subset to then allow estimating total runtime
  • Good for confirming the script works!
  • to see if anything unexpected happened, check Rollback context, and deleted records as things may have been updated/cascade deleted in the process. e.g. linked Assets, Affected CIs on Changes etc.

Speed the job up by running multiple threads at once.

  • The Database is a lot quicker than the App Node -> The bottleneck is the scheduler worker thread.
  • Often 4 or more CMDB update/delete scripts can be run in parallel before you start seeing database replication lag
  • e.g. set up 16 scheduled scripts. Run a few to begin with, and after reviewing database performance, start more.

Script 1 of 16

var ciGr = new GlideRecord(‘cmdb_ci_computer’);
ciGr.addQuery(‘sys_id’,’’STARTSWITH,’0’);
ciGr.query();
ciGr.deleteMultiple();

Script 16 of 16

var ciGr = new GlideRecord(‘cmdb_ci_computer’);
ciGr.addQuery(‘sys_id’,’’STARTSWITH,’f’);
ciGr.query();
ciGr.deleteMultiple();

 

Example of running a UI Action or Business Rule as a standalone script

“Get MID thread dump” UI Action

var agent_name = current.name.replace(/'/g, "\\'");
var midmanage = new MIDServerManage();
midmanage.threaddump(agent_name);
action.setRedirectURL(current);
  • ‘current’ is a GlideRecord of the MID Server record, so ecc_agent table, and whatever sys_id this MID Server is.
  • If this had been a BR, current would be whatever record it was running for.
  • We don’t care about the redirect line, which just tells the UI what to load next

We don’t have ‘current’, so get() the record instead:

var midGr = new GlideRecord(‘ecc_agent’);
if (midGr.get(‘d55a7c2fdb813300e1943ecf9d961964’)) {
var agent_name = midGr.name.replace(/'/g, "\\'");
var midmanage = new MIDServerManage();
midmanage.threaddump(agent_name);
}

And why not do it for all ‘Up’ MID Servers?

var midGr = new GlideRecord(‘ecc_agent’);
midGr.addEncodedQuery(‘status=Up’);
midGr.query();
while (midGr.next()) {
var agent_name = midGr.name.replace(/'/g, "\\'");
var midmanage = new MIDServerManage();
midmanage.threaddump(agent_name);
}

And then wrap it in a scheduled job...

 

Example of running a UI Action or Business Rule as a standalone script

We could come back to this MID server tomorrow, and see from the wrapper log exactly what line of code each thread was running, every 10 minutes through the night.

That could identify what long-running threads were doing, or what threads were running during High CPU/memory periods, and which lines of code or loops they were in.

Additional Information

What’s the scripted equivalent of something like:

SELECT name, COUNT(name) AS NumOccurrences FROM cmdb GROUP BY name HAVING ( COUNT(name) > 1 ) ORDER BY NumOccurrences DESC;

Answer: GlideAggregate

See KB0744490 - "Deleting duplicate records from any table" (login required).

Article Information

Last Updated:2019-10-14 01:03:26
Published:2019-10-14