Task Parallel Computing
There are two primary parallel computing techniques: task parallel and data parallel. Some key questions to ask when deciding whether tasks can be run in parallel:
- Are there dependencies between tasks?
- Does the result of the operations change if you change the execution order?
- Will there be contention for data or some other resource?
- Will we get a return on investment for the overhead of configuring and running in parallel?
- Do we have a significant number of tasks to run? We should have more tasks to run than we have cores.
I created a BatchProcessor solution with a console application called BPRun. This project runs a setup portion, then the parallel tasks, and then the teardown jobs. The tasks to run are specified in corresponding configuration files. I use code similar to this in production for batch processing about 20 DOS .bat jobs. Running the jobs in parallel has resulted in significant time savings.
Setup and Teardown
The setup and teardown portions of the application are run serially: the items in the configuration files are executed one after another. Comments may be placed in the file by prefixing them with //.
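To make the serial phase concrete, here is a minimal Python sketch of the idea. It is not the BPRun code: the function names load_jobs and run_serial are hypothetical, and it assumes one command per line with // comments occupying a whole line.

```python
import subprocess


def load_jobs(text):
    """Return the command lines in a config text, skipping blanks and // comments.

    Assumption: a comment occupies a whole line that starts with //.
    """
    jobs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("//"):
            continue
        jobs.append(line)
    return jobs


def run_serial(jobs):
    """Run each command in order, waiting for one to finish before starting the next."""
    exit_codes = []
    for job in jobs:
        completed = subprocess.run(job, shell=True)
        exit_codes.append(completed.returncode)
    return exit_codes
```

Calling run_serial(load_jobs(config_text)) would run the setup or teardown items one at a time and collect their exit codes.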
The task parallel section executes the items in the corresponding configuration file. As a task finishes, the scheduler picks up and runs the next job on that process. The code that does this is in a corresponding Processor class.
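The scheduling behavior described above can be approximated in Python with a worker pool. This is a sketch under assumptions, not the Processor class itself: run_job and run_parallel are hypothetical names, and the pool size defaults to the number of cores reported by the operating system.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_job(command):
    """Run one command and return it together with its exit code."""
    completed = subprocess.run(command, shell=True)
    return command, completed.returncode


def run_parallel(commands, max_workers=None):
    """Run commands on a pool of workers.

    Each worker takes the next queued command as soon as it finishes
    its current one. Results are returned in completion order.
    """
    workers = max_workers or os.cpu_count() or 1
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(run_job, command) for command in commands]
        for future in as_completed(futures):
            results.append(future.result())
    return results
```

Threads are sufficient here because each worker spends its time waiting on an external process rather than running Python code. Note that the results arrive in completion order, which may differ from the order in the configuration file.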
Tasks with dependencies
What if you have a task that has a dependency? You can chain them. That is, have the first process or task call the second or dependent task upon completion. For instance, you may have a job that does some data prep and then chain it to call a task that does analysis.
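One way to picture chaining is to treat each chain as a single unit of parallel work whose steps run in order. The sketch below is an illustration only: run_chain and run_chains are hypothetical names, and stopping a chain when a step returns a nonzero exit code is my assumption rather than something the batch processor is stated to do.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_chain(commands):
    """Run the commands of one chain in order.

    Assumption: a nonzero exit code stops the chain, so a dependent
    step never runs after a failed prerequisite. Returns the exit
    codes of the steps that actually ran.
    """
    codes = []
    for command in commands:
        code = subprocess.run(command, shell=True).returncode
        codes.append(code)
        if code != 0:
            break
    return codes


def run_chains(chains, max_workers=4):
    """Run independent chains in parallel; steps inside a chain stay serial."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_chain, chains))
```

For the data prep and analysis example, the chain would hold the prep command followed by the analysis command, and other unrelated chains could run alongside it.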
On *nix systems we have the sleep command, which we can use to pause for a given number of seconds. However, this doesn