This article provides an overview of how hierarchical scheduled imports and concurrent imports work in the Madrid release.
What are concurrent import sets?
Concurrent import sets are a feature introduced in the Madrid release for imports that take a long time to transform large amounts of data. The feature splits such an import into multiple import sets, and concurrent jobs transform those import sets in parallel. This significantly reduces the transform run time for large imports.
How it works
This section provides a detailed overview of how concurrent imports and hierarchical scheduled imports work.
In London and earlier releases, the scheduled job that executes the parent import also executes the child imports in hierarchical order. For concurrent imports, a child import must start only after the last transform for the parent import completes. For this reason, the way hierarchical scheduled imports are processed has changed.
1. First, an execution context for the scheduled import is created in the sys_execution_context table.
2. Next, an execution plan is generated in the sys_execution_plan table.
Once the execution plan for the main parent scheduled import is created, the system checks whether there are child scheduled imports. If there are, the “Next scheduled import” field is updated with the first child scheduled import (ordered by sys_id), and the system recurses to create execution plans for all child scheduled imports. In the example above, 'Import User Data' is the parent concurrent scheduled import and 'Import kingston user data' is its child concurrent scheduled import.
This way, the parent and all its child scheduled imports share the same execution context, but each has its own execution plan.
NOTE: Up to this point, the procedure is the same for regular and concurrent imports. From here on, the information is specific to concurrent import sets.
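The recursion described in step 2 can be sketched roughly as follows. This is an illustrative model only, not platform code: the ScheduledImport and ExecutionPlan classes, the build_plans function, and the sample sys_id values are hypothetical names chosen for this sketch. It shows one shared context ID, one plan per scheduled import, and the “Next scheduled import” value taken from the first child when ordered by sys_id. How sibling children are chained to one another is not covered by this article and is not modeled here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ScheduledImport:
    # Hypothetical stand-in for a scheduled import record.
    sys_id: str
    name: str
    children: List["ScheduledImport"] = field(default_factory=list)


@dataclass
class ExecutionPlan:
    # Hypothetical stand-in for a row in sys_execution_plan.
    context_id: str
    scheduled_import: str
    next_scheduled_import: Optional[str] = None


def build_plans(imp, context_id, plans=None):
    """Create one plan per import, all sharing the same execution context."""
    if plans is None:
        plans = []
    plan = ExecutionPlan(context_id, imp.name)
    plans.append(plan)
    # Children are ordered by sys_id; the first one becomes "Next scheduled import".
    children = sorted(imp.children, key=lambda c: c.sys_id)
    if children:
        plan.next_scheduled_import = children[0].name
    # Recurse so that every child (and grandchild) gets its own plan.
    for child in children:
        build_plans(child, context_id, plans)
    return plans


child = ScheduledImport("b2", "Import kingston user data")
parent = ScheduledImport("a1", "Import User Data", [child])
for p in build_plans(parent, "ctx-1"):
    print(p)
```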
3. The number of import sets is calculated. Each node has one parent sys_trigger job named "Import Set Transformer", and each parent trigger has 2 child triggers.
For example, if there are 2 nodes, there will be 2 parent triggers and a total of 4 "Import Set Transformer" child triggers.
The number of import sets is the minimum of two values: the number of child triggers whose System ID is neither 'ACTIVE NODES' nor 'ALL NODES', and the value of the system property "glide.scheduled_import.max.concurrent.import_sets", which defaults to 10. Modifying this property is not recommended, as it might cause other performance issues.
One concurrent import set record is created in the sys_concurrent_import_set table. All import sets created for this scheduled import are linked to this concurrent import set.
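The calculation in step 3 amounts to a filtered count followed by a minimum. The sketch below assumes the child triggers are available as a simple list of System ID strings. The function name count_import_sets and the node names are hypothetical; only the property default of 10 and the two excluded System ID values come from this article.

```python
# Default of glide.scheduled_import.max.concurrent.import_sets, per this article.
DEFAULT_MAX_CONCURRENT_IMPORT_SETS = 10
# Child triggers with these System ID values are not counted.
EXCLUDED_SYSTEM_IDS = {"ACTIVE NODES", "ALL NODES"}


def count_import_sets(child_trigger_system_ids,
                      max_concurrent=DEFAULT_MAX_CONCURRENT_IMPORT_SETS):
    """Return min(eligible child triggers, property value)."""
    eligible = [s for s in child_trigger_system_ids
                if s not in EXCLUDED_SYSTEM_IDS]
    return min(len(eligible), max_concurrent)


# 2 nodes with 2 child triggers each: 4 import sets.
two_nodes = ["node1", "node1", "node2", "node2"]
print(count_import_sets(two_nodes))

# 6 nodes with 2 child triggers each: 12 eligible triggers, capped at 10.
six_nodes = [f"node{n}" for n in range(1, 7) for _ in range(2)]
print(count_import_sets(six_nodes))
```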
4. Data is loaded into each of these import sets by Progress Workers, based on the Partition Method selected in the scheduled import record. By default, the system distributes records among the import sets in a round-robin manner.
Users can write a custom script to define a custom partition key. Every row with the same partition key will be part of the same import set.
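The two partitioning behaviors in step 4 can be illustrated with a small sketch. The function names are hypothetical, and the use of a CRC32 hash to map a partition key to an import set is an assumption made for illustration: this article only states that rows with the same key end up in the same import set, not how the platform chooses that import set.

```python
from zlib import crc32


def partition_round_robin(rows, num_import_sets):
    """Default behavior: deal rows out to the import sets in turn."""
    import_sets = [[] for _ in range(num_import_sets)]
    for index, row in enumerate(rows):
        import_sets[index % num_import_sets].append(row)
    return import_sets


def partition_by_key(rows, num_import_sets, key_fn):
    """Custom key: rows sharing a key always land in the same import set.

    Hashing the key is an assumption of this sketch, not documented behavior.
    """
    import_sets = [[] for _ in range(num_import_sets)]
    for row in rows:
        key = str(key_fn(row)).encode("utf-8")
        import_sets[crc32(key) % num_import_sets].append(row)
    return import_sets


rows = [
    {"user": "abel", "dept": "IT"},
    {"user": "beth", "dept": "HR"},
    {"user": "carl", "dept": "IT"},
    {"user": "dana", "dept": "HR"},
    {"user": "evan", "dept": "IT"},
]
print(partition_round_robin(rows, 2))
print(partition_by_key(rows, 2, key_fn=lambda r: r["dept"]))
```

A custom key is useful when related rows must be transformed together, for example when later rows depend on records created by earlier rows with the same key.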
5. Now that the data is loaded in the import sets, Concurrent Import Set Jobs are created in the sys_concurrent_import_set_job table for each import set.
The state for these jobs is set to "Pending" initially.
Since this is a concurrent scheduled import, the system will not initiate the child scheduled import right away. The child import will be triggered only after the last transform completes for the current concurrent import.
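The gating rule in step 5 can be modeled as a check over the job states. In this sketch the ConcurrentImportSetJob class and both function names are hypothetical, and "Complete" is an assumed name for the finished state; this article only names the "Pending" state.

```python
from dataclasses import dataclass


@dataclass
class ConcurrentImportSetJob:
    # Hypothetical stand-in for a row in sys_concurrent_import_set_job.
    import_set: str
    state: str = "Pending"  # jobs start out as "Pending"


def create_jobs(import_sets):
    """Create a job for each loaded import set."""
    return [ConcurrentImportSetJob(name) for name in import_sets]


def child_import_can_start(jobs):
    """The child import is triggered only once the last transform has finished.

    "Complete" is an assumed state name used for this sketch.
    """
    return bool(jobs) and all(job.state == "Complete" for job in jobs)


jobs = create_jobs(["ISET001", "ISET002"])
print(child_import_can_start(jobs))   # both jobs are still pending

jobs[0].state = "Complete"
print(child_import_can_start(jobs))   # one transform is still outstanding

jobs[1].state = "Complete"
print(child_import_can_start(jobs))   # last transform finished
```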
This concludes the loading of data and splitting of the data into multiple import sets. The transformation process for all these import sets is explained in the next section.
Transforming Concurrent Import Sets
This section explains how the data is transformed in concurrent import sets.
The Import Set Transformer job runs every minute and polls the sys_concurrent_import_set_job queue for pending jobs to transform. This job only looks for jobs in the "Pending" state and picks up only 1 of them.
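The polling behavior can be sketched as a function that scans the queue and claims at most one pending job per call. The queue is modeled here as a plain list of dictionaries, the function name pick_next_job is hypothetical, and "Running" is an assumed name for the state a job moves to once it is claimed.

```python
def pick_next_job(queue):
    """Claim at most one job in the "Pending" state; ignore all other states."""
    for job in queue:
        if job["state"] == "Pending":
            job["state"] = "Running"  # assumed state name, not from this article
            return job
    return None  # nothing pending on this poll


queue = [
    {"import_set": "ISET001", "state": "Running"},
    {"import_set": "ISET002", "state": "Pending"},
    {"import_set": "ISET003", "state": "Pending"},
]

# Each call stands in for one run of an Import Set Transformer trigger.
print(pick_next_job(queue))
print(pick_next_job(queue))
print(pick_next_job(queue))
```

Because each run claims a single job, several Import Set Transformer triggers polling the same queue would each take a different import set, which is consistent with the parallel transformation described in this article.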