11. How to use PSMN installations to accelerate the data processing?

At the ENS de Lyon, we have access to the PSMN installations, which give us access to several thousand cores. Whenever possible, we run many short jobs simultaneously to accelerate data processing. Our codes were written for that purpose, and the following documentation shows how it works. All configuration files are specific to the PSMN, but our codes can be used anywhere with a configuration adapted to your own computational system.

Warning

All functions specific to the PSMN are in the folder PSMN of the module. In this folder, you will find one folder for each step of the data processing (CentersFinding, Centers2Rays…), and in each of these folders, the functions specific to that step.

11.1. Center finding

11.1.1. How to run a compiled version of the CenterFinding2D.m?

It is possible to compile the CenterFinding2D.m function in order to run it directly in a terminal, outside a MATLAB instance. This can be useful to run it on a cluster, for instance at the PSMN. The MATLAB function to use for that is submission_center_finding.m.

  1. If you don’t have the compiled files yet (an executable submission_center_finding and a bash script run_submission_center_finding.sh), compile the script submission_center_finding.m by running in a MATLAB terminal:

    mcc -m submission_center_finding.m
    

    An executable file submission_center_finding and a bash file run_submission_center_finding.sh will appear in the same folder.

  2. Modify line 30 of the file run_submission_center_finding.sh to add the path of the executable file, like this:

    eval "/MyPath/submission_center_finding" $args
    
  3. To run it on your machine:

    sh run_submission_center_finding.sh $MCRROOT "ManipName" "CamNum" "FirstFrame" "Nframes" "Th" "Size" "Session_INPUT" "Session_OUTPUT"
    

Warning

Even if some parameters are numbers (integers or floats), you need to pass them as strings by enclosing them in double quotes (").

11.1.2. How to split CentersFinding into many jobs on the PSMN?

If you want to run the function at the PSMN and use parallelisation, use the file submission_CenterFinding.sh:

  1. Change the parameters at the beginning of the script to use your own parameters:

ManipName="MyExperiment"
CamNum=3                                                                #The camera on which you want to find the center
FirstFrame=300                                                  #The first frame (useful if you don't start at one)
Nframes=36000                                                   #The final frame to treat
Th=6500                                                                 #Threshold to detect a part (it has to be tuned with the function CenterFinding.m and with test=true)
Size=5                                                                  #The size of a part (in pixel)
Session_INPUT="/MyWorkspace/"           #The path of the DATA directory, where all the images are
Session_OUTPUT="/MyWorkspace/"          #The path of the PROCESSED_DATA directory, where the centercamk.mat will be saved

CompileFileDir="MyPath/4d-ptv/CenterFinding"                        #The directory where the file "run_submission_center_finding.sh" is
LOG_path="/MyWorkspace/MyExperiment/CenterFinding_LOG"          #log directory (warning: the directory has to be created before launching the code)
OUT_path="/MyWorkspace/MyExperiment/CenterFinding_OUT"          #matlab output (warning: the directory has to be created before launching the code)

  2. Run this script in a terminal:

sh submission_CenterFinding.sh

This will launch a job at the PSMN on the queue PIV. You can check that everything is fine by looking at the file center_camCamNum.log in the LOG directory.
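Because the log and output directories must exist before the script is launched, it can help to create them first. The following is a minimal sketch: the function name prepare_job_dirs is hypothetical and not part of the module, and the paths in the comment are the placeholder values used in the header.

```shell
# Hypothetical helper (not part of the module): create the log and output
# directories that the submission script expects to exist already.
prepare_job_dirs() {
    # $1: log directory, $2: MATLAB output directory
    mkdir -p "$1" "$2"
}

# Example with the placeholder paths used in the header:
# prepare_job_dirs "/MyWorkspace/MyExperiment/CenterFinding_LOG" \
#                  "/MyWorkspace/MyExperiment/CenterFinding_OUT"
```

mkdir -p creates missing parent directories and does not fail if the directories already exist, so the helper can be called before every submission.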

11.2. Center to rays

11.2.1. How to parallelize this step?

It is possible to split the work by camera instead of processing all the cameras one after the other. For this, you can use the function Centers2RaysParallel, which produces one file center_camx.mat per camera, where x is the camera number.

Once you have those files for all the cameras, you can use the function Rays_recombinaison.m, which gathers all the rays into one file rays.mat.

Center2RaysParallel takes 6 arguments:

  • session : structure containing paths of MyPath folders,

  • ManipName : name of the experiment,

  • Calib : calib.mat file,

  • kcam : the number of the cam you want to treat,

  • FirstFrame : the first frame to treat

  • Ttype (optional) : type of the transformation to use. ‘T1’ for linear transformation (default). ‘T3’ for cubic transformation.

Rays_recombinaison.m takes 3 arguments:

  • session : structure containing paths of MyPath folders,

  • ManipName : name of the experiment,

  • camID : list of camera numbers, e.g. [1,2,3] if you have 3 cameras numbered 1, 2 and 3 respectively.

11.2.2. How to run a compiled version of Center2Rays?

It can be useful to run this step at the PSMN and for this you can compile the Matlab function and use the bash submission function. The matlab function to use is submission_Centers2Rays.m.

  1. If you don’t have the compiled files yet (an executable submission_Centers2Rays and a bash script run_submission_Centers2Rays.sh), compile the script submission_Centers2Rays.m by running in a MATLAB terminal:

    mcc -m submission_Centers2Rays.m -a /applis/PSMN/generic/Matlab/R2017b/toolbox/images/images
    
  2. Modify line 30 of the file run_submission_Centers2Rays.sh to add the path of the executable file, like this:

    eval "/MyPath/submission_Centers2Rays" $args

  3. To run it on your machine:

    sh run_submission_Centers2Rays.sh $MCRROOT "$kcam" "$CalibPath" "$ManipName" "$FirstFrame" "$Session_INPUT" "$Session_OUTPUT"
    

11.2.3. How to split this step into many jobs on the PSMN?

If you want to run the function at the PSMN and use parallelisation, once you have completed the steps of the previous section, use the file submission_Centers2Rays.sh:

  1. Change the parameters at the beginning of the script to use your own parameters:

kcam=4          #The camera on which you want to trace the rays
CalibPath="/MyPath/calib.mat"   #The path of the calibration file
ManipName="MyExperiment"
FirstFrame=400
Session_INPUT="/MyWorkspace/"           #The path of the PROCESSED_DATA directory, where the "center_camX.mat" are
Session_OUTPUT="/MyWorkspace/"          #The path of the PROCESSED_DATA directory, where the "rays_camX.mat" will be saved

CompileFileDir="/MyWorkspace/4d-ptv/Center2Rays"        #The directory where the file "submission_Centers2Rays.sh" is
LOG_path="/Xnfs/convection/Stage_EB_2020/Processed_DATA/Ra1.51e10_peudense_6/Centers2Rays_LOG"  #Log directory
OUT_path="/Xnfs/convection/Stage_EB_2020/Processed_DATA/Ra1.51e10_peudense_6/Centers2Rays_OUT"  #Matlab output directory

  2. Run this script for each camera in a terminal:

sh submission_Centers2Rays.sh

  3. Once you have all your rays_camX.mat files, launch the function Rays_recombinaison.m in a MATLAB terminal:

    Rays_recombinaison(session,'MyExperiment',[1 2 3 4])
    

11.3. Matching

11.3.1. How to split the matching into many jobs on the PSMN?

When you track too many particles, running a single job to do the matching for the whole experiment takes far too long. Typically, you may run into trouble beyond 3000 particles per picture. However, the matching step can be done separately for each frame, because it only needs the rays from the current frame. It is therefore possible to split the big job into small ones, each doing the matching for only a few frames. This is very interesting if you have access to a computing center that can provide many cores simultaneously. This is what we call parallelisation.

It is not efficient (or even possible) for many jobs to access a single file, but each job needs rays data to process its frames. That is why we split the rays data into smaller files, so that each job has its own rays data file. This data splitting is done by RaysSavingForParallelMatching.m, which takes 4 arguments:

  • session : Path to the architecture root,

  • ManipName : Name of the experiment folder,

  • camID : List of camera numbers,

  • NbFramePerJobMatching : Number of frames per job. Pay attention: it has to be chosen according to the processing time of one picture, so that each job runs for about 10 min (PSMN requirements).

The RaysSavingForParallelMatching.m function creates a folder Parallel/Matching/Rays/ and saves all the split rays.dat files there.

Note

With the test data, in a MATLAB terminal:

session.input_path = "My4DPTVInstallationPath/Documentation/TestData/";  % My4DPTVInstallationPath has to be adapted !!!
session.output_path = "My4DPTVInstallationPath/Documentation/TestData/";
RaysSavingForParallelMatching(session,"MyExperiment",[1,2,3],10)

This splits the rays.dat file into small files of 10 frames each.

Note

You can also launch this step in parallel at the PSMN by compiling the function RaysSavingForParallelMatching.m and using the bash script submission_RaysSavingForParallelMatching.sh. The procedure is the same as the one described for CenterFinding and Center2Rays. This can be useful if you have several runs to process.

Warning

To run your jobs on the PSMN computers, it is preferable to submit short jobs with a typical runtime of 10 min. The NbFramePerJob parameter is chosen to satisfy this constraint.
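As an illustration of how the number of frames per job can be derived from the 10 min target, here is a minimal sketch. The helper names frames_per_job and job_count are hypothetical, and the sketch assumes that the processing time of one frame has been measured beforehand, in whole seconds.

```shell
# Hypothetical helpers (not part of the module).
# Assumes the processing time per frame was measured beforehand, in whole seconds.

frames_per_job() {
    # $1: seconds needed to process one frame
    # $2: target runtime of a job in seconds (defaults to 600 s = 10 min)
    target=${2:-600}
    n=$(( target / $1 ))
    # Always process at least one frame per job.
    if [ "$n" -lt 1 ]; then
        n=1
    fi
    echo "$n"
}

job_count() {
    # $1: total number of frames, $2: frames per job (rounded up)
    echo $(( ($1 + $2 - 1) / $2 ))
}

# Example: 30 s per frame gives 20 frames per job;
# 36000 frames then require 1800 jobs.
frames_per_job 30
job_count 36000 20
```

The integer division rounds the number of frames down, so a job stays at or under the target runtime, while the job count is rounded up so that no frame is left out.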

Following this method, we generate several hundred jobs: it is definitely not possible to launch them manually. Instead, we create a .sh file that launches all the jobs when it is executed. This file is created by the function ParallelJobsMatching.m, which requires the following arguments:

  • session : Paths to the architecture root

  • STM_path : Path to the STM script

  • ManipName : Name of the folder experiment

  • nframes : Total number of frames in the experiment

  • NbFramePerJobMatching : Number of frames per job. Pay attention: it has to be chosen according to the processing time of one picture, so that each job runs for about 10 min (PSMN requirements).

  • CamMatch : Minimum number of rays to get a match

  • MaxDistance : Maximal authorized distance between rays to consider having a match

  • nx,ny,nz : number of voxels in each direction

  • MaxMatchesPerRay : Maximum number of matches for one ray. 2 to consider particle overlap

  • bminx,bmaxx : x limits of bounding box

  • bminy,bmaxy : y limits of bounding box

  • bminz,bmaxz : z limits of bounding box

  • MinDistMatchperRay : Specifies a volume in which you cannot have another match once you have already found one (this avoids considering several matches for one particle)

  • Queue : Running queue. By default it is equal to ‘PIV’. It is possible to run jobs on monointeldeb128 or monointeldeb48, for example. Run qstat -g c to list all open queues. It is possible to set several queues as “MyFirstQueue,MySecondQueue”.

The function ParallelJobsMatching.m creates a Parallel folder with two subfolders, SH and LOG, which will contain the sh and log files of each job. The log file holds the output of the job and helps you understand what happened in case of errors. The sh file contains all the information needed to run the job properly on a specific queue. This file is very specific to the PSMN.

How to choose queue?

It is possible to see all queues and their availability by running:

qstat -g c

Pay attention: some queues are reserved for multi-processor jobs, which is not our case. Run your jobs only on single-processor queues. When you have lots of jobs, do not hesitate to write to the PSMN staff and ask for more cores.

The function ParallelJobsMatching.m also creates a file <ManipName>-ParallelMatching.sh in the folder session.output_path/Processed_DATA/ManipName (where ManipName is the name of the experiment). To run all jobs on the PSMN, execute this file in a terminal.

Note

With TestData

cd Processed_DATA/MyExperiment/Parallel/
sh MyExperiment-ParallelMatching.sh

What can I do when some jobs fail?

You can rerun only these jobs by running, in the SH folder:

qsub rays_n-m.sh

where n and m are the appropriate integers.
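When several jobs have failed, the resubmission can be scripted. The sketch below is only an illustration: the function resubmit_jobs and the QSUB variable are hypothetical, and the list of failed n-m values has to be collected by hand, for instance from the log files.

```shell
# Hypothetical helper (not part of the module), to be run from the SH folder.
# QSUB can be overridden (for example QSUB=echo) for a dry run without a scheduler.
QSUB="${QSUB:-qsub}"

resubmit_jobs() {
    # Each argument is a string "n-m" matching an existing script rays_n-m.sh.
    for range in "$@"; do
        script="rays_${range}.sh"
        if [ -f "$script" ]; then
            $QSUB "$script"
        else
            echo "missing script: $script" >&2
        fi
    done
}

# Usage, with placeholder values for the failed jobs:
# resubmit_jobs 1-10 41-50
```

Checking that the script exists before submitting avoids sending the scheduler a mistyped name.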

The matching script saves all matching files in the folder Parallel/Rays/.

11.4. Tracking

11.4.1. How to split Tracking into many jobs on the PSMN?

As for the matching, as soon as you track several thousand particles, it becomes impossible to get trajectories in a reasonable time (reasonable meaning a few days). However, the tracking is harder to parallelize because it requires matched points for all frames. So instead of running one job for the tracking, we run several jobs, each tracking particles over only several hundred frames. As before, the precise number of frames processed by each job is determined by the time required to process them. This gives particle trajectories for each set of frames. To get the complete trajectories over the whole experiment, they need to be reconnected, which is done in the following step: the stitching.

The function track3d_psmn.m computes trajectories as before, but it loads several matched files at once. Indeed, tracking is faster than matching, so we can process more frames per job. Besides, it is not possible to predict the next position of a particle when its position is known for fewer than npriormax frames; in that case, we fall back to simple closest-neighbour tracking. It therefore makes no sense to track a small number of frames per job with this code, because we would mostly do closest-neighbour tracking instead of predictive tracking. That is why we load several matched files. The function track3d_psmn.m takes 10 arguments:

  • session : Path to the architecture root

  • ManipName : Name of the experiment

  • NbFramePerJobTracking : Number of frame per job for parallelized matching

  • minframe : number of the first frame to treat

  • maxframe : number of the last frame to treat. Pay attention: minframe and maxframe have to be multiples of NbFramePerJob

  • maxdist : maximum travelled distance between two successive frames

  • lmin : minimum length of a trajectory (number of frames)

  • flag_pred : 1 for predictive tracking, 0 otherwise

  • npriormax : maximum number of prior frames used for predictive tracking

  • flag_conf : 1 for conflict solving, 0 otherwise

Figure: Input and output files of the track3d_psmn.m function.

As previously, trajectories are saved in a file /Parallel/Tracking/Tracks/tracks_{minframe}-{maxframe}.h5. This file can also be read with the readmatches.m function.

It is better to compile the track3d_psmn.m function.

  1. Again, if you don’t have the compiled file yet, compile the function submission_Tracking.m:

    mcc -m submission_Tracking.m
    
  2. Modify line 30 of the file run_submission_Tracking.sh to add the path of the executable file, like this:

    eval "/MyPath/submission_Tracking" $args
    
  3. To run it on your machine:

    sh run_submission_Tracking.sh $MCRROOT "MyExperiment" "NbFramePerJobMatching" "FirstFrame" "LastFrame" "maxdist" "lmin" "flag_pred" "npriormax" "flag_conf" "Session_INPUT" "Session_OUTPUT"
    

Warning

Even if some parameters are numbers (integers or floats), you need to pass them as strings by enclosing them in double quotes (").

If you modify any sub-function called by submission_Tracking.m, you have to compile submission_Tracking.m again for your changes to be taken into account.

To run all jobs simultaneously, use the submission_Tracking.sh file after filling in its header:

NbFramePerJobMatching=20             # Number of frame per job for parallel matching
maxdist=0.4                         # maximum travelled distance between two successive frames
lmin=5                                # minimum trajectory length
npriormax=5                           # number of points used to predict next particle position
manipname="Ra1.51e10_peudense_6"
first=401                               # First frame of the experiment
last=36000                           # Last frame of the experiment
NbFramePerJobTracking=5000             # Number of frame per job for tracking. Has to be a multiple of NbFramePerJobMatching

flag_pred=1                           # To do predictive tracking. If 0 do closest neighbour tracking
flag_conf=1                           # To resolve conflict when two particles belong to the same trajectory. Only the closest is kept

Session_INPUT="/Xnfs/convection/Stage_EB_2020/"     #The path of the PROCESSED_DATA directory, where the files rays_out_ccp.hdf5 are
Session_OUTPUT="/Xnfs/convection/Stage_EB_2020/"        #The path of the PROCESSED_DATA directory, where the track_x_x.hdf5 will be

CompileFileDir="/home/eberna07/Stage_EB_2020/4d-ptv/Tracking3D"     # Directory where the compiled file "run_submission_matlab.sh" is
LOG_path="/Xnfs/convection/Stage_EB_2020/Processed_DATA/Ra1.51e10_peudense_6/Parallel/Tracking/LOG"     #log directory
OUT_path="/Xnfs/convection/Stage_EB_2020/Processed_DATA/Ra1.51e10_peudense_6/Parallel/Tracking/OUT"     #matlab output

Several parameters are very important:

  • first and last are the numbers of the first and last frames of the experiment (minframe and maxframe),

  • NbFramePerJobMatching is the number of frames per job for parallel matching,

  • NbFramePerJobTracking is the number of frames per job for parallel tracking: it has to be a multiple of NbFramePerJobMatching, because each tracking job opens several matching output files until it reaches NbFramePerJobTracking frames. This number has to be chosen according to the computational time. Typically it is equal to several thousand.
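The constraints above can be checked before submitting anything. The following sketch uses two hypothetical helpers, check_multiple and tracking_ranges, which are not part of the module. It assumes that frames are split into consecutive blocks starting at the first frame; the real submission script may split them differently.

```shell
# Hypothetical helpers (not part of the module).

check_multiple() {
    # Succeeds when $1 is a positive multiple of $2.
    [ "$2" -gt 0 ] && [ "$1" -gt 0 ] && [ $(( $1 % $2 )) -eq 0 ]
}

tracking_ranges() {
    # Usage: tracking_ranges first last frames_per_job
    # Prints one "minframe-maxframe" line per tracking job.
    start=$1
    while [ "$start" -le "$2" ]; do
        end=$(( start + $3 - 1 ))
        # The last job may cover fewer frames than the others.
        if [ "$end" -gt "$2" ]; then
            end=$2
        fi
        echo "${start}-${end}"
        start=$(( end + 1 ))
    done
}

# Values taken from the header shown above.
if check_multiple 5000 20; then
    tracking_ranges 401 36000 5000
fi
```

With these values the sketch lists eight blocks, the last one being shorter than the others.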

Once the submission file is completed, you can launch it by opening a terminal in the tracking directory and entering the command

sh submission_Tracking.sh

Note

You can check whether your jobs are running with qstat. If their state is qw, all the CPUs of the queue are busy and your jobs are waiting. Then, if everything is fine, you will see the state r, meaning that the job is running. If you see Eqw, there is a problem; you can get information about it by typing the command qstat -explain E -j followed by the job number. In general, the cause is that the log and out directories you defined have not been created. The exact paths depend on where you are in the folder tree. Note that it is not necessary to parallelize the tracking for the test data, as they are very small; it is presented here only to explain the processing.
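With several hundred jobs, a summary of the states is easier to read than the full listing. The sketch below is hypothetical and not part of the module; it assumes the default tabular output of qstat, with two header lines and the job state in the fifth column.

```shell
# Hypothetical helper: count jobs per state from the output of qstat.
# Assumes two header lines and the job state in the fifth column.
count_job_states() {
    awk 'NR > 2 && NF >= 5 { count[$5]++ }
         END { for (s in count) print s, count[s] }'
}

# Usage on the cluster:
# qstat | count_job_states
```

The order in which the states are printed is not guaranteed, since awk does not define the iteration order of an array.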

11.5. Stitching

11.5.1. How to split Stitching processing into many jobs on the PSMN?

When the tracking has been done by several jobs, it is necessary to reconnect trajectories between successive files before doing the classic stitching. There are one or two steps, depending on the total number of frames of your experiment. The stitching is quite time consuming, so if your experiment has many frames (more than 20000), it is better to do parallel stitching on a reduced number of frames. You will then get several packets of reconnected trajectories, and the second step reconnects these packets into a single one. If your experiment has few frames (less than 10000), it is worth running a single stitching job working on all frames.

If you have 50000 frames per experiment and the tracking was done on packets of 1000 frames, you can run 25 jobs doing stitching on packets of 2000 frames using the Stitching_psmnA.m function, and then reconnect all the packets into one using Stitching_psmnB.m.
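The arithmetic of this example can be reproduced with a short sketch. The helper stitching_plan is hypothetical, and it assumes that each stitching job reads a whole number of tracking files.

```shell
# Hypothetical helper reproducing the arithmetic of the example above.
stitching_plan() {
    # $1: total frames, $2: frames per tracking file, $3: frames per stitching job
    jobs=$(( ($1 + $3 - 1) / $3 ))   # number of stitching jobs (rounded up)
    files=$(( $3 / $2 ))             # tracking files read by each job
    echo "jobs=${jobs} tracking_files_per_job=${files}"
}

stitching_plan 50000 1000 2000
```

For 50000 frames, tracking files of 1000 frames and stitching packets of 2000 frames, this gives 25 jobs reading 2 tracking files each.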

The first step is performed by stitchTracksSides.m, but the user only has to call the Stitching_psmnA.m function, which takes care of everything. This function takes the following arguments:

  • session : session.path contains MyPath,

  • ManipName : Name of the experiment,

  • minframe : First frame to process,

  • maxframe : Last frame to process,

  • NbFramePerJobTracking : Number of frame per tracking job,

  • FileName : Name of the tracks file,

  • dfmax : maximum number of tolerated missing frames to reconnect two trajectories,

  • dxmax : maximum tolerated distance (in norm) between projected point after the first trajectory and the real beginning position of the stitched one,

  • dvmax : maximum tolerated relative velocity difference between the end of the first trajectory and the beginning of the stitched one,

  • lmin : minimum length for a trajectory to be stitched.

Stitching_psmnA reconnects trajectories within a small packet of a few thousand frames. Stitching_psmnB reconnects trajectories between the small packets: to get the trajectories from 2 successive packets, it looks at the last dfmax frames of the first packet and the first dfmax frames of the second packet, and reconnects trajectories only within these frames. Indeed, trajectories were already reconnected everywhere else in the packets by Stitching_psmnA.

This final step can be split into several jobs at the PSMN in order to save time. To do so:

  1. If you don’t have the compiled files yet, compile the function submission_Matlab_Stitching.m:

    mcc -m submission_Matlab_Stitching
    
  2. Modify line 30 of the file run_Submission_Matlab_Stitching.sh to add the path of the executable file, like this:

    eval "/MyPath/Submission_Matlab_Stitching" $args
    
  3. To run it on your machine:

    sh run_Submission_Matlab_Stitching.sh $MCRROOT "ManipName" "FirstFrame" "LastFrame" "dfmax" "dxmax" "dvmax" "lmin" "Session_INPUT" "Session_OUTPUT"
    

Warning

Even if some parameters are numbers (integers or floats), you need to pass them as strings by enclosing them in double quotes (").

As for the Tracking, CenterFinding and Centers2Rays, you can launch several jobs at the PSMN by using submission_Stitching.sh:

  1. Fill in the header of the script to set your parameters:

    ManipName="Ra1.51e10_peudense_6"
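    dfmax=60              # maximum number of tolerated missing frames to reconnect two trajectories
    dxmax=0.5                           # maximum tolerated distance between the projected point after the first trajectory and the beginning of the stitched one
    dvmax=0.35                               # maximum tolerated relative velocity difference between the end of the first trajectory and the beginning of the stitched one
    lmin=5                           # minimum length for a trajectory to be stitched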
    first=35401                               # First frame of the experiment
    last=36000                              # Last frame of the experiment
    NbFramePerJobTracking=600              # Number of frame per job for tracking. Has to be a multiple of NbFramePerJob
    
    Session_INPUT="/Xnfs/convection/Stage_EB_2020/"             #The path of the PROCESSED_DATA directory, where the files track_x_x.h5 are
    Session_OUTPUT="/Xnfs/convection/Stage_EB_2020/"            #The path of the PROCESSED_DATA directory, where the StitchA_x_x.h5 will be
    
    CompileFileDir="/home/eberna07/Stage_EB_2020/4d-ptv/Stitching" #Directory where the file "run_Submission_Matlab_Stitching.sh" is
    LOG_path="/Xnfs/convection/Stage_EB_2020/Processed_DATA/Ra1.51e10_peudense_6/Parallel/Stitching/LOG"        #Log directory
    OUT_path="/Xnfs/convection/Stage_EB_2020/Processed_DATA/Ra1.51e10_peudense_6/Parallel/Stitching/OUT"        #Matlab output directory
    
  2. Open a terminal in the directory Stitching and use the command:

    sh submission_Stitching.sh