As shown in the Quickstart tutorial, a job is a workflow of tasks to be executed. Workflows provide the ability to replicate tasks and transfer data between tasks.
In this tutorial, we will first upload an image to the user space, then use the Studio to create a workflow with task replication that processes the image by applying an edge detection algorithm in parallel.
1 Upload the Image File to User Space
Using the Pydio Web Interface, upload this image file to the user space:
Your file will be stored in the user space:
You can also use the ProActive Scheduler REST Interface to upload a file to the user space on the server. Below is an example set of commands that uploads a file through the REST Interface with cURL. Use the login and password you received by e-mail when you first signed up.
    # first we login and retrieve a session id
    $ sessionid=$(curl -d "username=LOGIN&password=PASSWORD" https://trydev.activeeon.com/rest/scheduler/login)
    # then we push the image file into the USERSPACE
    $ curl -H "sessionid:$sessionid" -F "fileName=neptune_triton_01_3000x3000.jpg" -F "fileContent=@neptune_triton_01_3000x3000.jpg;type=image/jpg" https://trydev.activeeon.com/rest/scheduler/dataspace/USERSPACE/
2 Create the Image Processing Workflow
We want to apply a Canny Edge Detector algorithm to neptune_triton_01_3000x3000.jpg, which is too large to be processed on a single machine. So we will cut it into equal parts and process each part separately, in parallel, on a different node.
For that we will use groovy script tasks and the task replication mechanism. A first task will split the image. The following task will be replicated for each part of the image and produce a processed part. Finally, the last task will merge all processed parts into a final image.
Follow these steps to create the workflow:
Open the ProActive Studio and log in using the login and password you received by e-mail when you first signed up.
Fill in the name of the job in the left panel: call it image-processing.
Define the following job variables in the Job Variables section of the left panel by clicking on the Add button of Local Variables:
inputFilename with neptune_triton_01_3000x3000.jpg as value, the name of the image to split
outputFilename with processed.jpg as value, the name of the final processed image
nbParts with 4 as value, the number of parts
Create a new replicate block by dragging and dropping the Replicate block from the Controls into the workspace. This creates a workflow of 3 tasks.
Click on Task1 (Split) and rename it to split-image by filling in the Task Name field in the left panel.
To make the image file accessible to the task, go to the Data Management section in the left panel, click on the Add button of Input Files, then specify neptune_triton_01_3000x3000.jpg as Includes and set the Access Mode to transferFromUserSpace.
Then, in the Execution section in the left panel, select groovy as Script Engine, open the Script Editor, and paste the content of split-image.groovy.
Note that once the image file has been transferred from the user space into the task's local space, it can be accessed through the built-in variable localspace. So, to load the image from the local space, the script contains the following code:
    // Load the image from local space
    File localspaceDir = new File(localspace)
    File imgFile = new File(localspaceDir, imgFilename)
    BufferedImage img = ImageIO.read(imgFile)
Also note that the result variable is a java.util.ArrayList containing the split parts to transmit to the next task.
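As a rough illustration of the splitting step, here is a minimal Java sketch (it can be adapted to a groovy script). It is not the content of split-image.groovy: the class and method names are hypothetical, and it assumes that nbParts is a perfect square so that the image can be cut into a square grid, which the suggested values 4, 9, 16, 25 and 36 hint at but the tutorial does not state.

```java
import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the ProActive API or of split-image.groovy.
final class SplitSketch {

    /** Cuts img into a square grid of nbParts tiles, listed row by row. */
    static List<BufferedImage> split(BufferedImage img, int nbParts) {
        // Assumption: nbParts is a perfect square (4, 9, 16, ...).
        int side = (int) Math.round(Math.sqrt(nbParts));
        if (side < 1 || side * side != nbParts) {
            throw new IllegalArgumentException("nbParts must be a perfect square: " + nbParts);
        }
        int tileW = img.getWidth() / side;
        int tileH = img.getHeight() / side;
        List<BufferedImage> parts = new ArrayList<>();
        for (int row = 0; row < side; row++) {
            for (int col = 0; col < side; col++) {
                // The last column and the last row absorb any remainder pixels.
                int w = (col == side - 1) ? img.getWidth() - col * tileW : tileW;
                int h = (row == side - 1) ? img.getHeight() - row * tileH : tileH;
                // getSubimage returns a view that shares pixel data with img.
                parts.add(img.getSubimage(col * tileW, row * tileH, w, h));
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        // A blank stand-in image; the tutorial loads the real one with ImageIO.read.
        BufferedImage img = new BufferedImage(300, 300, BufferedImage.TYPE_INT_RGB);
        List<BufferedImage> parts = split(img, 4);
        System.out.println(parts.size() + " parts, first one is "
                + parts.get(0).getWidth() + "x" + parts.get(0).getHeight());
    }
}
```

Listing the tiles row by row gives every replicated task a stable index to pick its part from. How the real script packages the parts for transfer between tasks is not shown here.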
Rename Task2 (Process) to process-part. This task will be executed only after the split-image task is finished.
For this task we will use a Java implementation of the Canny Edge Detector written by Tom Gibara. To make the class definition available to our script task, the JAR first has to be available on the machine that hosts the node; we then need to add it to the task's classpath: in the Fork Environment section, put /home/cperMaster/opt/tutorials/tutorials/canny-edge-detector.jar in the additional classpath.
As for the previous task, select groovy as Script Engine and paste the contents of process-part.groovy.
Note that the replication index is provided through the PA_TASK_REPLICATION variable; it is used to get the part of the image to process:
int partIndex = variables.get("PA_TASK_REPLICATION")
The results variable is an array of org.ow2.proactive.scheduler.common.task.TaskResult that contains the results of the parent tasks. Since split-image is the only parent task, the array contains a single element: the ArrayList of split parts. The index of the image part to process is given by the replication index.
The following code is used to process the image part:
    CannyEdgeDetector detector = new com.CannyEdgeDetector()
    detector.setLowThreshold(0.5)
    detector.setHighThreshold(1)
    detector.setSourceImage(partImage)
    detector.process()
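If the Canny JAR is not at hand, the per-part processing can still be tried out with a simpler filter. The sketch below is a plain Sobel gradient-magnitude filter written in Java, used here only as a stand-in: it is not the Canny algorithm and not the code of process-part.groovy, and the class and method names are hypothetical.

```java
import java.awt.image.BufferedImage;

// Hypothetical stand-in for the edge detection step: Sobel, not Canny.
final class EdgeSketch {

    /** Returns a grayscale image whose brightness is the Sobel gradient magnitude of src. */
    static BufferedImage sobel(BufferedImage src) {
        int w = src.getWidth();
        int h = src.getHeight();
        // Convert to gray levels by averaging the three color channels.
        int[][] gray = new int[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int rgb = src.getRGB(x, y);
                gray[y][x] = (((rgb >> 16) & 0xFF) + ((rgb >> 8) & 0xFF) + (rgb & 0xFF)) / 3;
            }
        }
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_GRAY);
        // Border pixels stay black because the 3x3 kernels need all eight neighbours.
        for (int y = 1; y < h - 1; y++) {
            for (int x = 1; x < w - 1; x++) {
                int gx = -gray[y - 1][x - 1] + gray[y - 1][x + 1]
                        - 2 * gray[y][x - 1] + 2 * gray[y][x + 1]
                        - gray[y + 1][x - 1] + gray[y + 1][x + 1];
                int gy = -gray[y - 1][x - 1] - 2 * gray[y - 1][x] - gray[y - 1][x + 1]
                        + gray[y + 1][x - 1] + 2 * gray[y + 1][x] + gray[y + 1][x + 1];
                int magnitude = Math.min(255, (int) Math.hypot(gx, gy));
                out.getRaster().setSample(x, y, 0, magnitude);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Left part black, right part white: the vertical boundary should light up.
        BufferedImage img = new BufferedImage(6, 6, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < 6; y++) {
            for (int x = 3; x < 6; x++) {
                img.setRGB(x, y, 0xFFFFFF);
            }
        }
        BufferedImage edges = sobel(img);
        System.out.println("edge strength at (3,3): " + edges.getRaster().getSample(3, 3, 0));
    }
}
```

Like the detector above, the filter takes one image part and returns one processed part, so it fits the same place in the replicated task.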
To specify how many times the process-part task will be replicated, click on the replicate control and paste the following code into the script section in the left panel:
Rename Task3 (Merge) to merge-parts. This task will merge the processed parts into a final image once all replicated parent tasks are finished.
Paste the contents of merge-parts.groovy.
Note that this time the size of the results array will be equal to 4, the number of replicated tasks. The processed parts are merged into the final image following the same layout that was used to split the image in the split-image task.
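The merge step can be pictured with the following Java sketch. It is a hedged illustration rather than the content of merge-parts.groovy: it assumes the parts arrive in row-by-row order and form a square grid, and the class and method names are hypothetical.

```java
import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the ProActive API or of merge-parts.groovy.
final class MergeSketch {

    /** Reassembles tiles listed row by row in a square grid into one image. */
    static BufferedImage merge(List<BufferedImage> parts) {
        // Assumption: the number of parts is a perfect square.
        int side = (int) Math.round(Math.sqrt(parts.size()));
        if (side < 1 || side * side != parts.size()) {
            throw new IllegalArgumentException("expected a square number of parts: " + parts.size());
        }
        // Total width is the sum over the first row, total height over the first column.
        int width = 0;
        int height = 0;
        for (int col = 0; col < side; col++) {
            width += parts.get(col).getWidth();
        }
        for (int row = 0; row < side; row++) {
            height += parts.get(row * side).getHeight();
        }
        BufferedImage merged = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        int y = 0;
        for (int row = 0; row < side; row++) {
            int x = 0;
            for (int col = 0; col < side; col++) {
                BufferedImage part = parts.get(row * side + col);
                int w = part.getWidth();
                int h = part.getHeight();
                // Copy the tile's pixels to its position in the final image.
                merged.setRGB(x, y, w, h, part.getRGB(0, 0, w, h, null, 0, w), 0, w);
                x += w;
            }
            y += parts.get(row * side).getHeight();
        }
        return merged;
    }

    /** Builds a tile filled with a single color, used only for the demo below. */
    static BufferedImage solid(int w, int h, int rgb) {
        BufferedImage img = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                img.setRGB(x, y, rgb);
            }
        }
        return img;
    }

    public static void main(String[] args) {
        List<BufferedImage> parts = new ArrayList<>();
        parts.add(solid(2, 2, 0xFF0000));
        parts.add(solid(2, 2, 0x00FF00));
        parts.add(solid(2, 2, 0x0000FF));
        parts.add(solid(2, 2, 0xFFFFFF));
        BufferedImage merged = merge(parts);
        System.out.println("merged image is " + merged.getWidth() + "x" + merged.getHeight());
    }
}
```

The offsets are derived from the tiles themselves, so the sketch also works when the last row or column is slightly larger than the others.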
The merge-parts task will produce the final processed.jpg image in its local space, so the image needs to be transferred back into the user space. To do so, go to the Data Management section in the left panel, click on the Add button of Output Files, then specify processed.jpg as Includes and set the Access Mode to transferToUserSpace.
Once your workflow is ready and you are logged in, use the submit button to send your workflow as a job to the Scheduler.
Log in to the Scheduler portal using the login and password you received by e-mail when you first signed up. Your job should appear in the job list panel.
Depending on the available nodes, you can try to execute the workflow with the nbParts variable changed from 4 to 9, 16, 25 or 36. This splits the image into smaller parts, produces more tasks and reduces the computation time of each task.
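To get a feel for these values, the small Java sketch below computes the size of one part for each suggested nbParts. It assumes a square grid and takes the 3000x3000 dimensions from the image file name; the class and method names are hypothetical.

```java
// Hypothetical helper illustrating the arithmetic behind nbParts.
final class PartSizeSketch {

    /** Side length, in pixels, of one tile when a square image is cut into nbParts tiles. */
    static int tileSide(int imageSide, int nbParts) {
        // Assumption: nbParts is a perfect square, giving a square grid.
        int grid = (int) Math.round(Math.sqrt(nbParts));
        if (grid < 1 || grid * grid != nbParts) {
            throw new IllegalArgumentException("nbParts must be a perfect square: " + nbParts);
        }
        return imageSide / grid;
    }

    public static void main(String[] args) {
        int imageSide = 3000; // from the file name neptune_triton_01_3000x3000.jpg
        for (int nbParts : new int[] {4, 9, 16, 25, 36}) {
            int side = tileSide(imageSide, nbParts);
            System.out.println(nbParts + " tasks, each processing " + side + "x" + side + " pixels");
        }
    }
}
```

Under these assumptions the parts measure 1500, 1000, 750, 600 and 500 pixels per side. Whether the job as a whole finishes sooner still depends on how many nodes are free to run the replicated tasks at the same time.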
3 Download the resulting Image File from the User Space
You can also use the ProActive Scheduler REST Interface to download the resulting file from the user space with cURL. Use the login and password you received by e-mail when you first signed up, then request the processed.jpg file:
    # first we login and retrieve a session id
    $ sessionid=$(curl -d "username=LOGIN&password=PASSWORD" https://trydev.activeeon.com/rest/scheduler/login)
    # then we download the processed image from the USERSPACE
    $ curl -k -H "sessionid:$sessionid" https://trydev.activeeon.com/rest/scheduler/dataspace/USERSPACE/processed.jpg > processed.jpg