Calgary Text Indexing | Instance Full Text Re-Indexing after upgrade to Calgary
Full text re-indexing in a Calgary instance is initiated immediately after upgrading, and text indexing works differently in Calgary compared to Berlin and earlier releases. This article collects the information that might aid in solving issues where text index events appear stuck after upgrading to Calgary.
Immediately after the upgrade, a fix job called fix_z_convert_to_calgary_format.xml runs. It schedules a Calgary text search (ts) index upgrade for each individual table, with the upgrades spaced out over several hours following the upgrade.
After an upgrade to Calgary, you will see several scheduled jobs in [sys_trigger] named ASYNC: Index conversion of <table-name-here>. Each of these jobs inserts a text_index.upgrade_calgary event for its table.
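The staggering described above can be pictured with a small Python sketch. It is only an illustration, not the fix job's actual logic: the function name stagger_conversions, the start time, and the fixed gap between jobs are all assumptions, since the article only says the conversions are spaced out over several hours.

```python
from datetime import datetime, timedelta

def stagger_conversions(tables, start, gap):
    """Return (job name, run time) pairs, one per table, spaced `gap` apart.

    Hypothetical helper: the real spacing used by the fix job is not
    documented here; `gap` is an assumed, fixed interval.
    """
    return [
        (f"ASYNC: Index conversion of {table}", start + i * gap)
        for i, table in enumerate(tables)
    ]

if __name__ == "__main__":
    jobs = stagger_conversions(
        ["task", "kb_knowledge"],
        start=datetime(2013, 1, 1, 0, 0),
        gap=timedelta(minutes=30),  # assumed interval
    )
    for name, run_at in jobs:
        print(run_at.isoformat(), name)
```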
The Text index events process job picks up these events whenever they appear and converts the individual table. This is done one table at a time so that regular text search events can be processed between table conversions and ts indexing doesn't fall too far behind.
If the initial conversion is taking a long time at any point (a long-running "Text index events process" job), it is likely rebuilding a table with a very large index, such as kb_knowledge or task.
You can speed up text index event processing by changing the following parallelism property values and then restarting the job or node:
- glide.ts.index.parallelism to 12 from 1
- glide.ts.optimize.parallelism to 5 from 2
This lets text indexing run on up to 12 parallel threads instead of 1, so the re-index completes more quickly, but it comes with the caveat that you risk using up all the DB connections.
Therefore, it is advisable to change these settings outside office hours. If you change them within office hours, use a smaller number of parallel threads (for example, 8 for glide.ts.index.parallelism) and monitor the localhost logs. Revert the properties to their original values as soon as the full text re-indexing is complete.
Also bear in mind that parallelism settings should not be tweaked for shared customers, due to the same risk stated above.
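The guidance above can be summarized as a small decision helper. This is a sketch in Python, not a platform API: the function name suggested_parallelism and its two flags are hypothetical, and keeping glide.ts.optimize.parallelism at 5 during office hours is an assumption, because the article gives a reduced example only for glide.ts.index.parallelism.

```python
# Original (default) values quoted in the article.
DEFAULTS = {
    "glide.ts.index.parallelism": 1,
    "glide.ts.optimize.parallelism": 2,
}

def suggested_parallelism(office_hours: bool, shared: bool) -> dict:
    """Return the property values the article's guidance points to.

    Hypothetical helper for illustration only.
    """
    if shared:
        # Shared customers: do not tweak parallelism at all.
        return dict(DEFAULTS)
    values = {
        "glide.ts.index.parallelism": 12,
        "glide.ts.optimize.parallelism": 5,
    }
    if office_hours:
        # The article's example of a smaller thread count.
        values["glide.ts.index.parallelism"] = 8
        # Assumption: optimize parallelism is left at 5, since no
        # reduced value is given for it.
    return values

if __name__ == "__main__":
    print(suggested_parallelism(office_hours=True, shared=False))
```

Whatever values are applied, the original values in DEFAULTS are the ones to restore once re-indexing has finished.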
You can determine when all indices have been converted by going to the Text Indexes module and verifying that each entry has a State of Ready and a Format of v3. Once this is verified, click the Upgrade to Calgary Text Search UI action to finalize the process; it should run quickly.
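If the Text Indexes list has been exported (for example to CSV or JSON), the check above can be automated along these lines. This is a minimal Python sketch under stated assumptions: the function name pending_indexes and the dictionary keys "table", "state", and "format" are hypothetical and may not match the real column names on your instance.

```python
def pending_indexes(rows):
    """Return the tables whose index is not yet Ready in v3 format.

    `rows` is a list of dicts with assumed keys "table", "state",
    and "format" (hypothetical names for the exported columns).
    """
    return [
        row["table"]
        for row in rows
        if row["state"] != "Ready" or row["format"] != "v3"
    ]

if __name__ == "__main__":
    sample = [
        {"table": "incident", "state": "Ready", "format": "v3"},
        {"table": "kb_knowledge", "state": "Indexing", "format": "v3"},
        {"table": "task", "state": "Ready", "format": "v2"},
    ]
    remaining = pending_indexes(sample)
    if remaining:
        print("Still converting:", ", ".join(remaining))
    else:
        print("All text indexes are Ready and v3.")
```

An empty result means every entry meets both conditions and the UI action can be clicked.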
When the user clicks the Upgrade to Calgary Text Search UI action, another text_index.upgrade_calgary event is logged, targeting all tables whose format is not v3.
If the action is clicked after all the tables have been individually converted, the event will process quickly, because it skips every table that has already been upgraded.
Running this action is necessary in order to set Calgary indexing as the default engine for all new tables, but it is advisable to initiate it only after the full text re-indexing is complete.
After this event is processed, the 'Upgrade to Calgary Text Search' link will disappear, confirming that full text re-indexing is now complete.
NOTE: If you are upgrading to a Calgary version lower than CP3 (Calgary Patch 3), you may encounter a known issue in which text indexing runs the node out of memory, possibly making the instance inaccessible. This is caused by large .doc, .docx, .xls, and .xlsx attachments being indexed and bringing the node or instance down. The issue is fixed in Calgary Patch 3 and higher releases.
To prevent this, you can make the following changes and restart the nodes to provide relief.
As per problem PRB590120
1) Stop indexing on .docx attachments.
Go to the attachment_extractor table on the instance. On the entry for the .docx extension, set it to "inactive".
As per problem PRB584653
2) Create the property glide.ts.index.poi.use_event_extractor and set it to true. This makes the indexer use a document parser that uses less memory, but that parser also skips the metadata and the content of some calculated cells during indexing.