Event Management events are not processed. As a result, no alerts are created from these events, and neither are the incidents that those alerts would create.
The scheduled job "Event Management - process events" has a problem: it is stopped, stuck, or claimed by a passive node. It is also possible that the jobs were not recreated properly after the number of jobs was updated or after an instance upgrade.
To confirm that this is the root cause of events being stuck in the "ready" state:
- Go to "System Scheduler > Scheduled Jobs > Today's Scheduled Jobs".
- Search for jobs like "Event Management - process events".
- Check the "Next action" and the "Claimed by" columns.
- Out of the box (OOB), this job runs every 5 seconds, so "Next action" should be a few seconds from now. If "Next action" is a time in the past, the job is likely stuck.
- If the job was claimed by a passive node, the job is also stuck.
- Lastly, if "Enable multi node event processing = true", confirm that there are (<number_of_jobs_configured> * (1 + <active_worker_nodes>)) jobs.
- Example: an instance with 6 active worker nodes, configured to have 4 jobs processing events per node, would have (4 * (1 + 6)) = 28 jobs.
Note: The 1 above, added to the number of active worker nodes, is because a job is also created for system "Active Nodes".
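The checks above can be sketched in a few lines of Python. This is only an illustration of the arithmetic and the two "stuck" conditions, not code that runs on the instance: the function names are hypothetical, the job fields are passed in by hand rather than read from the scheduler, and the 30-second grace period is an assumption added so that a job that is only momentarily late is not flagged.

```python
from datetime import datetime, timedelta, timezone


def expected_job_count(jobs_per_node: int, active_worker_nodes: int) -> int:
    """Number of jobs expected when multi node event processing is enabled."""
    # The extra 1 accounts for the job created for system "Active Nodes".
    return jobs_per_node * (1 + active_worker_nodes)


def is_job_stuck(next_action: datetime, now: datetime,
                 claimed_by_passive_node: bool,
                 grace: timedelta = timedelta(seconds=30)) -> bool:
    """Apply the "Next action" and "Claimed by" checks to one job row."""
    # Assumption: tolerate a short delay before calling the job overdue.
    overdue = next_action < now - grace
    return overdue or claimed_by_passive_node


now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
print(expected_job_count(4, 6))                                   # 28
print(is_job_stuck(now + timedelta(seconds=3), now, False))       # False
print(is_job_stuck(now - timedelta(minutes=10), now, False))      # True
print(is_job_stuck(now + timedelta(seconds=3), now, True))        # True
```

If the number of jobs listed under "Today's Scheduled Jobs" differs from the computed value, or any job is flagged by either check, the jobs are candidates for being recreated.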
When "Enable multi node event processing = true", multiple event processing jobs are created, and the events are divided into "buckets" that are processed by those jobs. If any one job has a problem, the events in its "bucket" are not processed. This is why, in some cases, some events are processed while others are not.
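The following Python sketch illustrates why a single unhealthy job leaves only part of the events unprocessed. How Event Management actually assigns an event to a bucket is not described here, so the hash-modulo rule below is purely an assumption for illustration, as are the function names and the sample event identifiers.

```python
import zlib


def bucket_for(event_id: str, bucket_count: int) -> int:
    # Hypothetical assignment rule: a stable hash modulo the number of buckets.
    return zlib.crc32(event_id.encode("utf-8")) % bucket_count


def split_by_job_health(event_ids, bucket_count, stuck_buckets):
    """Separate events whose bucket has a healthy job from those that wait."""
    processed, waiting = [], []
    for event_id in event_ids:
        if bucket_for(event_id, bucket_count) in stuck_buckets:
            waiting.append(event_id)    # stays in the "ready" state
        else:
            processed.append(event_id)  # picked up by a healthy job
    return processed, waiting


events = [f"event-{i}" for i in range(20)]
processed, waiting = split_by_job_health(events, bucket_count=4, stuck_buckets={2})
print(len(processed), "processed;", len(waiting), "still in ready state")
```

Under this model, the events that remain in the "ready" state all belong to the bucket whose job is stuck, while events in the other buckets continue to be processed.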
A simple solution is to recreate the jobs as follows:
- Go to "Event Management > Settings > Properties"
- Find the option "Number of scheduled jobs processing events", change it to a number different than the current value, and save.
This recreates the jobs so that they are claimed by an active node. The value can be reverted afterwards.
By default, Event Management processes only events created within the last two days (the current and previous shard). Therefore, even after the jobs are recreated successfully, there may be older events that are not processed. To have Event Management process events on all shards, including events older than two days, set the following property. We recommend reverting to the default behavior after event processing has caught up.
- evt_mgmt.events_processing_all_shards = true
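The effect of this property can be modeled with a small Python sketch. It assumes that one shard corresponds to one calendar day, which is inferred from "last two days (current and previous shard)" and may not match the actual shard boundaries. The function name is hypothetical, and the `all_shards` argument stands in for the property value.

```python
from datetime import date, timedelta


def is_event_in_scope(created_on: date, today: date,
                      all_shards: bool = False) -> bool:
    """Return True if an event would be considered for processing."""
    if all_shards:
        # Models evt_mgmt.events_processing_all_shards = true.
        return True
    # Default behavior: only the current and previous shard,
    # modeled here as today and yesterday (an assumption).
    return created_on >= today - timedelta(days=1)


today = date(2024, 1, 10)
print(is_event_in_scope(date(2024, 1, 9), today))                   # True
print(is_event_in_scope(date(2024, 1, 5), today))                   # False
print(is_event_in_scope(date(2024, 1, 5), today, all_shards=True))  # True
```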
See the following document for a review of the Event Management event process flow: