Notifications

4198 views

Description

Troubleshooting Guide | Using the Transaction Logs



Overview


You can use the information in this troubleshooting guide to help track down the cause of an issue by examining the relevant entries in the Transaction Log table.

One key technique for understanding an issue is to track down the historical transaction details related to a reported example. The key to finding transaction details is using the Transaction Logs table [syslog_transaction]. This article describes the features and usage of this table. The details in this table are very rich and might provide enough information to solve the issue. Even if they do not provide enough information to solve an issue, they can provide many leads that you can investigate.

Things you might learn from the Transaction Logs table

Where is the bottleneck?

One key question in any performance issue is "Where is the bottleneck?". Because the transaction logs show you the breakdown of time spent between server, client, SQL, business rules, client scripts, UI Policies, wait time, and more, you can be well on your way to isolating the cause of a given performance issue.

What was the user doing?

By observing the URL field in the transaction logs you can determine what page the user was accessing. Also, in the case of a form submission, there will be parameters included in the transaction URL. Understanding the meaning of URL paths and the parameters included after the "?" can greatly assist in interpreting exactly what was happening in a given transaction. (For more information, see Navigating by URL.)

By observing the User Agent field you can determine the browser and OS that the user was using. This information is very important if the issue is a client-side issue and you need to know how to reproduce the issue. Finding out the exact, detailed steps to reproduce (STR) an issue is an important part of troubleshooting any issue. User agent strings may be difficult to decipher. A tool like UserAgentString.com can help.

By observing the IP address field you can determine the network that originated the request into ServiceNow. This information can be very helpful when diagnosing issues that seem to occur sometimes but not others. For example, the issue might occur when the user is logging on from within their corporate network but not when logging in from home. Or perhaps performance issues are being reported by a certain user group located in the same geographic location. You might be able to prove that transactions coming from certain IP addresses are slower than others.

Where did the issue occur?

Because ServiceNow has a "clustered" architecture with multiple Java nodes, knowing where in the ServiceNow architecture a certain issue was experienced is important. The Session ID and System ID fields are key here. System ID tells you the node on which the issue occurred. Imagine a case where all transactions for one node were affected but no transactions on the other nodes. This piece of information would be critical in isolating the possible cause. Knowing where the issue occurred is also important so you can search through the detailed System Logs (accessed via the Log File Downloader or Log File Browser).


What was happening during a known affected period?

This question often comes up when an issue was experienced by multiple users during a certain window of time but then the issue stopped. You can use the Transaction Log table to determine the transactions or scheduled jobs that were executing during a given time period. Determining this information is trickier than it sounds because the Created field in the Transaction Log table reflects the time the transaction completed.

Take the following example as an illustration: Suppose a period of slowness was reported between 7:15 and 7:45 AM. Further, suppose that the root cause was a transaction that started at 7:10 AM and ended at 9:15 AM. Searching for all transactions created between 7:15 and 7:45 AM would not reveal the causal transaction because the scheduled job didn't finish until 9:15 AM and therefore would have a Created time of 9:15 AM.

To find very long running transactions, you might want to search a broader time frame and include a condition for Response time that is greater than 120,000 milliseconds (or 2 minutes). This type of search should complete within 10 to 20 seconds even if you are searching a 24-hour time span.


How it works


The Transaction Log table [syslog_transaction] captures many of the transactions that come through the ServiceNow product. These transactions include both user transactions (SOAP or actual human users) and background transactions (that is, scheduled jobs). However, it does not capture the following types of transactions:

  • Some asynchronous transactions like AJAX or Angular
  • Some high volume REST transactions
  • Transactions that did not complete. One exception is that cancelled transactions are recorded in some versions of the product (definitely in Fuji).
 Each transaction includes the breakdown of where the time is being spent including response time (time on the server), client network time (total time - (server + client time)), database inclusive time, business rule inclusive time, browser times, and so on. Other details that are tracked include which user was doing which transaction, the node the user was logged in to, the user's source IP address, and even the User Agent string, which indicates the browser and operating system that was used.

About the performance of the Transaction Log table

Because the Transaction Log table is very large, you should always exercise caution when querying it. In most active customer systems, a 24-hour period of time is the maximum span of time that should be queried on the Transaction Log table at once, although you might be able to query more than that if you have other filter criteria that efficiently limit the result set. For this reason, the modules that access the table (System Diagnostics > Transactions, System Diagnostics > Transactions (background), and System Diagnostics > Transactions (all users)) all have a default filter on Created On = Today. However, in some customer systems, querying 24 hours of transactions might be way too much for the system to handle in a reasonable processing time. For that reason, the best practice is to first access the table by URL and use the sysparm_filter_only=true parameter to display only the list filter. From that point, you can specify a very limited query for only the information that you need. Try to specify the shortest time period required using a Between filter paired with some other restrictive filters such as Response time > 1000 or Created by = joe.employee.

Steps to efficiently query for relevant log entries:

  1. Find out the User ID (from the user table) of the person who experienced the issue.

  2. Find out when they experienced it (down to the minute if possible).

  3. Find out what that user was doing when the issue happened.

  4. Log into the instance as an admin user.

  5. Go to <my_instance_name>/syslog_transaction_list.do?sysparm_filter_only=true.

  6. Expand the graphical filter builder for the list and add the following filter to grab all transactions by that user around the reported time:
    [Created] [between] [<some_time_before_event>] and [<some_time_after_event>]
    [Created by] [is] [<user ID of affected_user>]

  7. Click Run.

Definitions of the Fields of the Transaction Log table

If you haven't already done so, click the gear icon on the list to personalize the columns that are displayed in the list. The columns described in this section might be useful to you.

Server Metrics

• Created: (sys_created_on) The moment the transaction was recorded. Note that transactions are recorded at the end of the transaction. Therefore, to determine the start time of a transaction (including scheduled jobs), you must subtract the Response time value from the Created time.

• Created By: (sys_created_by) The User ID (sys_user.user_name) of the user who performed the transaction. Most background/scheduled jobs are run by a special user called "system". Transactions initiated by a user's pre-authentication will be marked by a special system user called "guest".

• Response Time: (response_time) The number of milliseconds spent by the server in fulfilling the transaction. Does not include server time spent prior to a redirect (HTTP 302). Does not include time spent on subsequent, partial page requests (i.e., AJAX) or serving up a resource (CSS, JavaScript, images, etc). Does include wait time from waiting for a semaphore or waiting for a previous transaction on the same session to complete. Therefore, this can be a confusing and inflated metric because some transactions may have 90% or more of their response time spent waiting for another transaction to complete.

• Network Time: (network_time) As of Istanbul, the network time is the time it takes the server to write all the bytes of the request to the output stream - not the time it takes to transmit those bytes across the internet. This is not what most people expect from this measurement. It is much lower than what most people would expect.

• Business Rule Time: (business_rule_time) The number of milliseconds spent executing Business Rules or Script Jobs. Script Jobs includes scheduled script executions and asynchronous Business Rules. This is inclusive of time spent downstream from the first Business Rule called. The timer starts once a Business Rule begins and stops only when that Business Rule is complete, including any time spent waiting on SQL to complete or any other Business Rules or processes that are triggered from the script of the top Business Rule. There are other ways that SQL can get triggered outside of Business Rules, so it is entirely possible that SQL Time may be listed as being greater than Business Rule time; however, you should keep in mind that whatever time spent running SQL was triggered by a Business Rule will be included in the calculation of Business Rule Time.

• SQL Time: (sql_time) The number of milliseconds spent waiting for the request and response to the database. This can be artificially bloated in cases where a resource that is involved in handling the request/response to the database is overloaded (e.g., application server CPU, network bandwidth between app server and db server). One thing to note is that homepages employee multithreading and can execute multiple SQL statements simultaneously. SQL Time is the total of all SQL executed as part of a transaction on the main thread or child threads and therefore SQL time for home.do transactions is often greater than the total transaction processing time.

• Type: (type) The type of transaction (such as form or list).

• Table: (table) The table that was displayed, for example, incident, change_request.

• View: (view_id) The View (sys_ui_view table) that was used for this form/list.

• Session: (session) The Java session ID of the user who was logged in. This ID is useful for identifying log messages output by this same user in the System Logs. For background jobs the Session ID will contain the name of the worker that is executing the job.

• System ID: (system_id) This field includes the name of the node upon which the transaction was executed.

• IP Address: (ip_address) The source IP address of the transaction inbound to ServiceNow.

• User Agent: (user_agent) A user agent string is a unique identifier to identify the browser and operating system that initiated the transaction (see http://www.useragentstring.com/index.php).

Available with Fuji and later:

• Total wait time (total_wait_time): Time in milliseconds that the transaction spent waiting before the server was able to process it.

• Transaction processing time (transaction_processing_time): Server response time, minus wait time. A good way to measure what transactions are actually using all the processing time.

• Session wait time (session_wait_time): Wait time in milliseconds caused by the same user session initiating a second transaction before the first has completed.

• Semaphore wait time (semaphore_wait_time): Wait time in milliseconds caused by all semaphores being in use.

• App Scope (app_scope): The parent scope from which the transaction was executed.

Available with Istanbul and later:

• Transaction Pattern: (transaction_pattern) The unique hash code identifying an anonymized URL - this can be used to link a specific transaction in the logs with the Slow Transactions module (sys_transaction_pattern table).

Client Transaction Timings

Timings that track operations on the client-side (i.e. on the browser or mobile device). Not available for all transaction types. Not available on Safari.

• Client Transaction: (client_transaction) True if the transaction has successfully recorded client timings.

• Client Response Time: (client_response_time) The number of milliseconds between navigationStart and loadEventEnd. In other words, the total time, including server time, between when a page was requested and the HTML load event completed firing. This would include onLoad client scripts and any onChange scripts that fired as a result. In other words, this is inclusive of server time, browser time, javascript and any time spent on the HTTP request or HTTP response. This is the most inclusive timing metric and therefore should be bigger than other times. There are exceptions to this rule, like SQL Time, that can count cumulative time from multiple threads.

• Client Network Time: (client_network_time)
Since at least Kingston and all current later versions (London, Madrid) this reflects a calculated value of the total client_response_time minus what is considered server and browser response times - you may notice slight inconsistencies with this formula for some transactions, but by and large this is accurate.

One item of note is that transactions that are directly preceded by an HTTP 302 redirect will reflect the transaction response time of the previous transaction in their client_network_time. This is the current behavior of the product and there is not a plan to change this behavior [as of Oct 30, 2018]. This is documented in PRB632265. This misattribution of server time as client network time is most commonly seen in transactions in the Service Catalog because all Service Catalog transactions from the point you start to order an item until the point where you hit the order summary page are handled via the service_catalog.do processor and redirected to the appropriate page.

For Fuji and some older versions on Chrome, Firefox and IE9+: The number of milliseconds between requestStart and navigationStart (see diagram below). In other words, client_network_time is the time between initial page request and when the first byte of the payload of the request starts getting uploaded (i.e., only the TCP handshake). This is not what most people expect when they look at this metric. It is much lower than what one might expect.

For browsers that don't support the Performance Timing DOM object, client_network_time is determined by the difference between the point in time when the page started loading and when the page was requested, minus server time.

• Browser time (browser_time): Browser time is measured differently depending on if you are using List V2 or List V3.

[Live V2 and earlier] The number of milliseconds between responseEnd and loadEventEnd. In other words, the time between when the current page initially finished downloading the initial wireframe HTML page and the time when the "load" event and all consequent actions have completed. Includes time to download all the linked resources (e.g., images, CSS, JavaScript). Does NOT include time for scripts to execute!

[List V3 - Available with Helsinki] [Possibly also for all transactions in Jakarta and later, this paragraph of the KB needs revision] In List V3 browser_time tracks the time between when JavaScript first starts to run against the time when the AJAXClientTimings request is sent back to the server. This is a bigger chunk of time than the old way of doing the window.performance.timing events of loadEventEnd - responseEnd. So, do not be surprised when you notice browser_time increase in the metrics after moving to List V3. It may partially reflect an actual slow down in user experience but it certainly also includes a simple change in the way the metric is measured. One confusing impact is that in List V3 it is not uncommon to see browser_time be 2 or 3 times bigger than client_response_time. This is because client_response_time still follows the old model of using the window.performance.timing object.

• Client Script Time: (client_script_time) The number of milliseconds spent executing client scripts.


Additonal Client Timings Details


Starting in Fuji, ServiceNow uses the window.performance object to handle calculations of most client timings. This object is a web standard that is supported by all modern browsers at the time of writing (Oct 14, 2016). Window.performance is supported in IE9 and later. For more information about this window.performance, see the following resources:


The following image shows a quick reference of the timeline of the navigation timing web standard.


Article Information

Last Updated:2020-10-02 11:31:04
Published:2020-10-02