This article details how to trace a long-running WFTimer scheduler job back to the workflow context and the activity that it is currently running. Use these steps when the Timer activity correctly resumed the workflow after the timer expired, but the workflow has continued into the subsequent activities and one of them is running longer than expected.
"WFTimer..." and "SLA breach timer..." scheduler worker jobs
If a workflow includes a Timer or SLA Percentage Timer activity, the following steps occur when that activity runs:
- The workflow engine creates a Scheduled Job record in table [sys_trigger], and sets the Next action time to the end time of the workflow Timer.
- The executing workflow pauses, and the current transaction ends.
- When the Scheduled Job's Next action time arrives, a Scheduler Worker thread resumes the workflow and continues to the next activity or activities in the workflow.
- When the workflow has finished running the subsequent activities and performing any updates as a result of them, the job ends, and the Scheduler Worker thread is freed up.
When the activity is a Timer, the scheduler job's name starts with "WFTimer". When the activity is an SLA Percentage Timer, the name starts with "SLA breach timer" instead. The procedure is identical in both cases, except that for SLA breach timer jobs you can ignore all workflow contexts other than those running for the task_sla table.
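The naming rule above can be sketched as a small lookup. This is only an illustration: the two prefixes come from the text above, while the helper name `classify_scheduler_job` and the `JOB_PREFIXES` mapping are hypothetical and not part of the platform.

```python
from typing import Optional

# Prefixes as described in this article; the mapping itself is illustrative.
JOB_PREFIXES = {
    "WFTimer": "Timer",
    "SLA breach timer": "SLA Percentage Timer",
}


def classify_scheduler_job(job_name: str) -> Optional[str]:
    """Return the workflow activity type implied by a scheduler job name.

    Returns None when the name has neither prefix, meaning the job was
    not created by one of the two workflow timer activities.
    """
    for prefix, activity_type in JOB_PREFIXES.items():
        if job_name.startswith(prefix):
            return activity_type
    return None
```

A job that returns "SLA Percentage Timer" is the case where only contexts running for the task_sla table need to be considered.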
Locating scheduler jobs
Scheduled Jobs - /sys_trigger_list.do
All scheduler jobs that are currently running are listed in sys_trigger with state=running. To locate them, filter the list on that state, then:
- Add the Updated column to the list so that you can see when the job actually started running, which will be a short time after the planned Next action time.
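As a rough sketch, the list filter can also be expressed as a URL with an encoded query. Several things here are assumptions rather than facts from this article: that the stored value of the Running state is `1`, that the job name is held in a field called `name`, and the instance address used when calling the function. Verify them against your own instance before relying on the result.

```python
from urllib.parse import quote


def running_timer_jobs_url(instance_url: str) -> str:
    """Build a sys_trigger list URL filtered to running workflow timer jobs."""
    query = "^".join([
        # Assumption: 1 is the stored value for the Running state.
        "state=1",
        # Assumption: the job name is stored in the "name" field.
        "nameSTARTSWITHWFTimer^ORnameSTARTSWITHSLA breach timer",
    ])
    base = instance_url.rstrip("/")
    # Percent-encode the whole query so "^", "=" and spaces survive in a URL.
    return f"{base}/sys_trigger_list.do?sysparm_query={quote(query, safe='')}"
```

Calling `running_timer_jobs_url("https://example.service-now.com")`, where the host is a placeholder, returns a link that can be pasted into a browser session on that instance.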
Stats - /stats.do
The Scheduler Worker section of the Stats page gives the following information:
- Current job: name
- Job Started: timestamp
At this point it is worth clicking the blue link through to the thread dump, which may provide clues as to which script include or business rule is currently running. For example, the workflow context may be inserting a task, and a custom business rule that runs on insert of the task is what is actually stuck.
Locating Workflow Context
The sys_id in the job name is not useful here. It is the sys_id of the wf_executing record that existed for the Timer activity while the Timer was the currently executing activity. When the job starts, that record is deleted and replaced with a record for the new currently running activity. Use the activity history instead:
- Use the Start timestamp of the job, which matches the End timestamp of the Timer activity in the Workflow Activity History table [wf_history].
- Filter the list to show only activities that have finished and whose workflow activity uses the Timer activity definition.
- Open the link in the Context column to go to the Workflow Context that is currently being run in the Scheduler Worker thread.
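The timestamp match in the steps above can be sketched offline against rows exported from wf_history. This is a minimal illustration, not a platform API: the dictionary keys `activity_definition`, `ended`, and `context` are hypothetical names for the exported columns, and the few seconds of tolerance is an assumption to allow for small differences between the two timestamps.

```python
from datetime import datetime, timedelta

TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def find_timer_history(rows, job_started, tolerance_seconds=5):
    """Return history rows for finished Timer activities that ended at
    (or within the tolerance of) the scheduler job's start timestamp.

    rows: list of dicts with hypothetical keys "activity_definition",
          "ended" and "context".
    job_started: the job's start timestamp as a string in TS_FORMAT.
    """
    started = datetime.strptime(job_started, TS_FORMAT)
    tolerance = timedelta(seconds=tolerance_seconds)
    matches = []
    for row in rows:
        if row.get("activity_definition") != "Timer":
            continue  # only Timer activities are of interest
        if not row.get("ended"):
            continue  # activity has not finished yet
        ended = datetime.strptime(row["ended"], TS_FORMAT)
        if abs(ended - started) <= tolerance:
            matches.append(row)
    return matches
```

The `context` value of a matching row identifies the Workflow Context to open. If more than one row matches, narrowing the tolerance or checking each context's executing activities should separate them.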
Finding the long-running activity in the Workflow Context
The Workflow Context form has a related list for Workflow Executing Activities. In this example, it is an activity called Stupid Script.
- Click the Show Workflow related link to view activities highlighted in green.
- Hover over the icon next to the activity title to see a popup of the scripts that are part of this activity. This is where you are likely to find the problem.
If the workflow context has finished, then you need to consider what records the workflow would be inserting or updating, and what code might still be running as part of that transaction. You can link to a thread dump for the scheduler worker thread from the stats.do page, which may provide clues.
How to fix long-running workflow contexts
It may be correct that your workflow is running for a long time. For example:
- Looping through a large table and making updates
- Performing a complex Orchestration workflow activity
- Waiting for a response from a REST message integration
You can let these run their course. Consider redesigning your scripts if this regularly causes performance issues on the instance. If there is a design issue in the script, you will need to republish the workflow after fixing it.
Other solutions include:
- It is sometimes possible to update the scripts in activities for older unpublished workflow versions, so that existing records do not run into the same problem later in their existing workflow contexts.
- If the workflow context needs to be stopped, clicking the Cancel link on the workflow context form works in most cases.
- If the scheduler job is still running after the workflow context has been canceled, then the transaction needs to be killed manually. This can usually be done through the All Active Transactions module. The URL column shows the name of the job. Select the row and choose Kill. If this method cannot kill the job, then a node restart may be required.