AutoFlow
Description
Components
AutoFlow has three main modules: Stack, Queue Manager and Workflow Smith (Provider). The Stack module takes a plain-text file with a workflow description and identifies the tasks to execute, building Batches (with one task or many, depending on the presence or absence of iterative marks). Once the Stack builds the Batches and generates the atomic Tasks (with their dependencies and hardware/software resources), they are transferred to the Queue Manager, which either builds a shell script to execute them or communicates with queue-system software (on a supercomputer/HPC cluster) to submit the entire workflow, delegating its execution to that software. The Workflow Smith (wf_smith or wf_provider) is in charge of supplying and managing the different virtualizations (venv, Anaconda or containers) needed by each task.
How does AutoFlow work?
AutoFlow takes a plain-text template file as input (1), which describes each task to be executed and how the tasks relate to each other so that they run in the correct order. AutoFlow then executes the workflow template within the HPC/supercomputer (2): it identifies each task and creates the folder structure on the storage media. Next, it sends all the tasks to the queue system (3). Finally, the queue system executes each task, taking the dependencies into account to obtain the results successfully.
First, AutoFlow identifies all the tasks and obtains the dependencies between them (1). Then, AutoFlow creates an execution folder with a dedicated subfolder for each task. Each subfolder holds an sh script that contains the code of the task to be executed (2). All the scripts are sent to the queue system to be executed (3). The script execution produces the results for each task (4). The script code assumes that temporary files and result files must be placed in the task subfolder. The input data is taken from another task's subfolder or from an external data source given by the user.
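The invocation described above can be sketched as a single command. This is a hedged sketch: the template file name template.txt is a hypothetical example, and only the -w flag documented later in this guide is used.

```shell
# Parse template.txt, create the execution folder tree with one
# subfolder and sh script per task, and submit everything for execution.
AutoFlow -w template.txt
```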
Basic template syntax
List_dir){
#Initialize: define the environment and run small preparation tasks
? #Separator between initialize and main command
ls > out
}
Show_list){
#Initialize
? #Separator between initialize and main command
cat List_dir)/out
}
An AutoFlow template (orange box) describes each of the tasks to be executed (yellow boxes). Each task must begin with its unique task-name identifier (red text) followed by the ) parenthesis character. Then, the body of the task, with the code to be executed, must be declared between { } characters. This body is marked as grey and purple text: the former is comment text for user orientation and the latter is the code to be executed. The task body is split into two sections by the ? character on its own line. The initialize section, before the ? character, is meant for minor operations that prepare the execution of the task's main software. The section after the ? character is the main command, where the code that executes the task's main software must be written. This separation has NO impact on execution; it is only used to name the task folder and in the parsed code summaries shown by AutoFlow. The initialize section may be empty, but the ? line must ALWAYS be present to declare the main command section.
Finally, the dependencies between tasks are specified implicitly using the task names. As seen in the example template, the Show_list task has in its main command a cat instruction that needs an input path. Instead of specifying an absolute or relative path, we use the task name of the List_dir task with the ) character (blue text) to build the input path. AutoFlow automatically replaces this with the absolute path of the List_dir task folder, which contains the full execution of that task. In this way, the user specifies how the workflow tasks are interconnected and does not need to worry about where each task is executed.
Workflow Execution
Executing basic example
We will execute the previous example using the -v flag to obtain a dry run that describes the full execution.
The dry run parses the workflow template and generates the whole folder tree, with an sh script per task, but the scripts are not executed.
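Assuming the template above is saved as template.txt (a hypothetical file name), the dry run can be sketched as:

```shell
# -v: dry run; the folder tree and the task scripts are generated,
# but no script is actually executed.
AutoFlow -w template.txt -v
```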
Click to see results
List_dir >
ls > out
exec/ls_0000 False
Show_list >
cat exec/ls_0000/out
exec/cat_0000 False
List_dir
AutoFlow parses the workflow template, creates all the folders and task scripts, and shows the resulting task list as it will be executed (including the final absolute paths, represented here by the / character). The red lines show the original task name and the main command that represents each task (a task may contain several operations, but conceptually only one generates the results of interest); they mark the identification of a task, whose attributes follow on the indented lines. The yellow lines show the main command fully parsed, with the paths to the other tasks where needed and other elements such as resources, AutoFlow variables, etc. Note that the initialize command is NOT shown, to keep the task view summarized. The blue lines indicate the folder in which the task will be executed, together with a boolean that indicates whether the task is marked as commented (if it shows True, the task will not be executed, but its results will still be taken into account by other tasks). The green lines (when listed) show the dependencies of the task: the task will not be executed until the tasks it depends on are done, and there is one line with a task name per dependency.
Handling iterative tasks
When we work with workflows, we usually need to repeat one task several times with different parameters, samples, algorithms, etc. In that case, we would need a template like the following to repeat the task:
List_home){
#Initialize
?
ls /home > out
}
List_etc){
#Initialize
?
ls /etc > out
}
List_var){
#Initialize
?
ls /var > out
}
We would write one task for each of the items we are using: home, etc and var (these are the iterable items; as a set, we call them the iterator).
But AutoFlow has specific syntax to handle this situation, avoiding code redundancy (and adapting dynamically if the iterator set changes):
List_[home;etc;var]){
#Initialize
?
ls /(*) > out
}
The iterable items are enumerated in the list structure [home;etc;var] and AutoFlow iterates over them, generating one task per item. The task code (initialize or main command) must contain the (*) expression, which will be replaced with each item. In this way we get a batch of tasks that differ in only one parameter, but each one has its own folder and script. This can be observed when the workflow is executed:
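Conceptually, the expansion that AutoFlow performs on the [home;etc;var] iterator is equivalent to this plain POSIX shell loop (a runnable sketch for illustration, not AutoFlow code):

```shell
# Each item of the iterator produces one task whose (*) mark is
# replaced by the item, e.g. 'ls /home > out' for the 'home' item.
for item in home etc var; do
  echo "List_${item}: ls /${item} > out"
done
```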
Click to see results
List_home >
ls /home > out
exec/ls_0000 False
List_etc >
ls /etc > out
exec/ls_0001 False
List_var >
ls /var > out
exec/ls_0002 False
Working with iterative tasks and dependencies
As we have seen, we can use specific syntax to handle repetitive tasks and perform an iterative task. But what about connecting other tasks to this iterative task? First, it is worth mentioning that there are two possibilities:
In the first case (left), we need to perform a new task for each of the generated iterative tasks; basically, we need to connect a new iterative task to a previous iterative task. In the second case (right), we need a single task that collects the results from all the tasks of the iterative task. Each case has a specific syntax. For iterative-to-iterative tasks:
Show_[home;etc;var]){
#Initialize
?
cat !List_*!/out
}
Here we have the classic iterative task, but it uses !List_*! (the !ITERATIVE_TASK_NAME*! expression) to indicate that we want to connect each previous task to each new one. To connect a set of iterative tasks to a single task, we have the following:
Show){
#Initialize
?
cat !List_!/out
}
Here we have a single task node that uses !List_! (the !ITERATIVE_TASK_NAME! expression) to collect the paths to the specified file in each previous task and insert them into the current task. The following shows the execution of the iterative dependency case:
Click to see results
List_home >
ls /home > out
exec/ls_0000 False
List_etc >
ls /etc > out
exec/ls_0001 False
List_var >
ls /var > out
exec/ls_0002 False
Show_home >
cat exec/ls_0000/out
exec/cat_0000 False
List_home
Show_etc >
cat exec/ls_0001/out
exec/cat_0001 False
List_etc
Show_var >
cat exec/ls_0002/out
exec/cat_0002 False
List_var
Now, we apply this to the iterative-to-single dependency case:
Click to see results
List_home >
ls /home > out
exec/ls_0000 False
List_etc >
ls /etc > out
exec/ls_0001 False
List_var >
ls /var > out
exec/ls_0002 False
Show >
cat exec/ls_0000/out exec/ls_0001/out exec/ls_0002/out
exec/cat_0000 False
List_home
List_etc
List_var
Using static variables in workflow templates
The workflow syntax described previously only allows building workflow templates with hardcoded input. To allow dynamic assignment of input paths or parameters, we can use AutoFlow static variables:
$folder=/var
List_dir){
#Initialize
?
ls $folder > out
}
We can define workflow static variables as $VARIABLE_NAME=value (in red in the example template) and use them anywhere in the template. They are string variables, so they can contain simple parameters, paths, iterators, full task nodes... anything. With the given example, one could think that the problem remains the same, because the variable declaration is hardcoded in the template. This is true, but we can use the -V flag to override the template declaration (or simply to supply the declaration when the template lacks it). The syntax of the -V flag is '$VAR1=value1,$VAR2=value2,..':
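The override shown in the results below could be sketched as (the template file name is a hypothetical example):

```shell
# Override the $folder declaration in the template so the listing
# is performed on /home instead of /var.
AutoFlow -w template.txt -v -V '$folder=/home'
```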
Click to see results
List_dir >
ls /home > out
exec/ls_0000 False
As shown, the ls command is executed on the /home folder instead of the originally declared /var folder.
Special attributes to modify task execution behaviour
There is a set of special characters that, when placed before the task name in its definition, change the task's execution behaviour. These characters are % to mark a task as commented/not executable, ! to avoid creating a subfolder for the task, and & to aggregate several tasks into a single one.
%Show_list){
?
ls /sys > out
}
!listing){
?
ls /etc > out
}
&stats){
?
wc -l listing)/out
}
char_stats){
?
wc listing)/out
}
Click to see results
show_list >
ls /sys > out
exec/ls_0000 True
listing >
ls /etc > out
exec False
stats >
wc -l exec/out
exec/wc_0000 False
listing
char_stats >
wc exec/out
exec/wc_0001 False
listing
In this way, the task show_list has its 'commented/not executable' attribute set to True whereas the rest have it as False. For the listing task, its path is exec instead of exec/ls_0001 because ! was applied. The stats task seems unaffected, but if we check the exec/wc_0000 folder it will be empty, and if we read the exec/wc_0001/char_stats.sh file it will contain the commands of both the stats and char_stats tasks merged together.
Using nested tasks
In some cases, we need to repeat a set of tasks with different parameters. A first approach would be to convert each task of the set into an iterative task with the same iterator. To avoid this redundancy we can use nested tasks, as seen in the following template:
ls_[temp;sys]){
?
scan){
?
pwd > file
}
show_[file;folder]){
?
echo `cat scan)/file` ls_(+) (*)
}
}
In red we find a classical iterative task (ls_), but its body does not contain typical commands. There are two nested nodes: scan and show_ (the latter an iterative task itself). In this case, AutoFlow will create one copy of these tasks for the temp item and another copy for the sys item, both within the ls_ iterative task. To reference this iterator in the desired node we use the task name plus the (+) expression, as in ls_(+). This behaves exactly like the (*) expression, but this form allows several levels of nesting and lets us reference the desired iterator at each location. We can see the template interpretation as follows:
Click to see results
scan_temp >
pwd > file
exec/pwd_0000 False
scan_sys >
pwd > file
exec/pwd_0001 False
show_file_temp >
echo `cat exec/pwd_0000/file` temp file
exec/echo_0000 False
scan_temp
show_folder_temp >
echo `cat exec/pwd_0000/file` temp folder
exec/echo_0001 False
scan_temp
show_file_sys >
echo `cat exec/pwd_0001/file` sys file
exec/echo_0002 False
scan_sys
show_folder_sys >
echo `cat exec/pwd_0001/file` sys folder
exec/echo_0003 False
scan_sys
Regular expressions applied to task dependencies
When we work with nested tasks or in a complex workflow (in which case you will use nested tasks), a problem arises: the iterations and the applied permutations generate a large number of tasks, and we need to select a subset of them to follow the steps of our workflow. In this case, we need to capture specific tasks by name in order to apply the desired operation. For this purpose, regular expressions are very powerful and useful, giving great versatility to our workflow. Remember that with a set of tasks there are two cases: 1) we need to create a new task for each one, or 2) we need to create a single task that takes data from the whole task set. The first case is solved as follows:
Show_list){
?
ls /sys > out
}
listing){
?
ls -lsa /etc > out
}
get_content_[JobRegExp:list:-]){
?
wc -l (*)/out
}
Click to see results
Show_list >
ls /sys > out
exec/ls_0000 False
listing >
ls -lsa /etc > out
exec/ls_0001 False
get_content_Show_list >
wc -l exec/ls_0000/out
exec/wc_0000 False
Show_list
get_content_listing >
wc -l exec/ls_0001/out
exec/wc_0001 False
Show_list
listing
The second case (a set of tasks feeding one single task) is solved as follows:
Show_list){
?
ls /sys > out
}
listing){
?
ls -lsa /etc > out
}
get_content){
?
wc -l !JobRegExp:list:-!/out
}
Click to see results
Show_list >
ls /sys > out
exec/ls_0000 False
listing >
ls -lsa /etc > out
exec/ls_0001 False
get_content >
wc -l exec/ls_0000/out exec/ls_0001/out
exec/wc_0000 False
Show_list
listing
In both cases, the JobRegExp expression defines two fields: one with the string 'list' to be searched in the task names, and a second field set to '-'. The second field is a RegExp to be applied to iterators: if the first field matches an iterative task and the second field is set to a RegExp, that RegExp is applied to select only the tasks whose iterator items match. The purpose is that the main RegExp may match several iterative tasks when you are only interested in one iteration. Imagine that you execute 10 AI models with different values of one parameter (0, 1 and 2) and you need the executions with value 0 to build the ground truth. Then you can use JobRegExp:launchAImodel:0 to capture the desired executions and only get the results for that case.
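The name-matching part of JobRegExp behaves like an ordinary regular-expression search over the task names. This runnable shell sketch (for illustration, not AutoFlow code) shows which of the three task names above the pattern 'list' selects:

```shell
# Both Show_list and listing contain 'list'; get_content does not.
printf '%s\n' Show_list listing get_content | grep list
```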
Advanced capabilities
Merging templates
Another powerful feature of AutoFlow is the possibility of merging several templates, either to reuse them or to split a complex workflow into submodules. In this case we only have to pass several template file paths to the -w flag, separated by commas. AutoFlow parses them in the specified order, but the defined tasks are put together in one workflow. Here we show two templates that could be merged:
List template
List_dir){
# Initialize
?
ls > out
}
Show template
Show){
# Initialize
?
cat $file
}
Of course, we need to specify how to connect these templates, which means setting the dependencies between tasks of different templates. To do so, we use the -V flag to define a variable that acts as the input of the task in the Show template, and we can use it to specify a dependency on a node of the List template.
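A merged execution could be sketched as follows. Note that the file names and the exact value given to $file are assumptions based on the two templates above, not a confirmed AutoFlow invocation:

```shell
# Parse both templates in order and connect them by defining $file,
# used by the Show task, as a path inside the List_dir task.
AutoFlow -w list_template.txt,show_template.txt -v -V '$file=List_dir)/out'
```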
Click to see results
List_dir >
ls > out
exec/ls_0000 False
Show >
cat exec/ls_0000/out
exec/cat_0000 False
List_dir
Handling workflow resources
When we work with supercomputing resources, we need to request the specific resources for our work: a number of CPUs, an amount of memory, a time limit, and the type of computing node that fits our tasks. To set workflow resources, AutoFlow has the -c, -m, -t and -n flags, respectively. Their definitions, as well as other complementary ones, are the following:
AutoFlow -w template_name #Mandatory arguments
#Optional arguments
-c: Number of CPUs needed for each task
-t: Time needed for each task. Format: days-hours:minutes:seconds
-m: RAM memory needed for each task. Format: a number plus standard memory units: 5GB, 4000MB, etc.
-n: Name of a specific system queue (often, computing nodes with specific hardware)
-s: If set, the required CPUs may be allocated across several computing nodes
-u: Maximum number of computing nodes across which to allocate the requested CPUs (per task)
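Putting these flags together, a resource-aware submission could be sketched as (the values and template file name are illustrative):

```shell
# Request 4 CPUs, 8GB of RAM and 2 hours per task on the 'bigmem' queue.
AutoFlow -w template.txt -c 4 -m 8GB -t 0-02:00:00 -n bigmem
```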
This way of defining resources has a limitation: all tasks get the same resources. To overcome this, there is a syntax to specify resources per task:
List_dir_[sys;etc]){
resources: -n bigmem -c 1 -t 7-00:00:00 -m 100gb
?
ls /(*) > out
}
This way, the List_dir_ tasks will have the specified resources, overriding the general resources configured for the workflow. To observe the resource changes, we can inspect the generated script in each task folder and read the commented section in the sh header.
Task execution control
In some cases, we need to execute only a subset of the tasks in the workflow, because we need to update results or because minor errors must be fixed. To do so, in addition to using the % character in the task name within the template, we can use the --white_list and --black_list flags. These flags take string patterns, separated by commas, that are matched against the task names. When the white list is used, the tasks that do NOT match the patterns are marked NOT to be executed. When the black list is used, the tasks that match the patterns are marked NOT to be executed.
algo){
?
echo 'OK'
}
result){
?
echo algo)/file
}
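The two runs whose results appear below could be sketched as (the template file name is a hypothetical example):

```shell
# Keep only tasks matching 'algo'; 'result' is marked NOT to be executed.
AutoFlow -w template.txt -v --white_list algo

# Exclude tasks matching 'algo'; only 'result' remains executable.
AutoFlow -w template.txt -v --black_list algo
```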
In this example, the task 'result' is marked NOT to be executed because the white_list pattern does NOT match 'result'.
Click to see results
algo >
echo 'OK'
exec/echo_0000 False
result >
echo exec/echo_0000/file
exec/echo_0001 True
algo
In this example, the task result is marked to be executed because the black_list pattern matches the 'algo' task and therefore marks 'algo' NOT to be executed. We get the opposite behaviour to the previous case.
Click to see results
algo >
echo 'OK'
exec/echo_0000 True
result >
echo exec/echo_0000/file
exec/echo_0001 False
algo
Advanced AutoFlow variable configuration
In complex workflows we have to deal with a large number of variables that are hard to set on the command line, or that we need to change to different values depending on the analysis. For this reason, the -V flag that sets AutoFlow variables can also take a path to a variable text file. In the following example, the template uses the file_name, attribute1 and attribute2 variables, the last two of which are defined in a file:
result){
?
echo -e "$attribute1\t$attribute2" > $file_name
}
Text file with variable definitions, basic_with_vars.var:
attribute1=exec
attribute2=login
Using var file with a workflow template:
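The combined invocation could be sketched as follows. The exact way of mixing inline values and a var-file path in -V is an assumption; the guide only states that -V accepts both:

```shell
# Define file_name inline and take attribute1/attribute2 from the var file.
AutoFlow -w template.txt -v -V '$file_name=test,basic_with_vars.var'
```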
Click to see results
result >
echo -e "exec\tlogin" > test
exec/echo_0000 False
Handling resource configuration files
When multiple tasks share computational resources, or we need to execute them with different resources within the same workflow, managing resources inline or on the command line can be difficult. To deal with this situation, AutoFlow can use resource files in JSON format (wf_with_res_prof.json) in which we can define named task resource profiles:
{
"resources": {
"test": {
"cpu" : 2,
"mem" : "300GB",
"time" : "7-00:00:00",
"node" : "bigmem"
}
}
}
Then, in the workflow template, the resources line uses the -r flag followed by the name of the resource profile needed by the task:
algo){
resources: -r test
?
echo -e "OK\t[cpu]" > log
}
And finally, when we execute the template with the described resource file, we can observe how the [cpu] placeholder is replaced with the number of CPUs specified in the test profile.
Click to see results
algo >
echo "OK\t2" > log
exec/echo_0000 False
{'cpu': 2, 'mem': '300GB', 'time': '7-00:00:00', 'node': 'bigmem', 'multinode': 0, 'ntask': False, 'additional_job_options': None, 'done': False, 'folder': True, 'buffer': False, 'exec_folder': '/mnt/home/users/pab_001_uma/pedro/dev_py/py_autoflow/tests/cli_examples/exec/echo_0000', 'cpu_asign': 'number', 'virt': 'test_virt', 'virt_type': 'env'}
Handling external workflow dependencies
A workflow executes different software and libraries from programming languages, which must be installed in the operating system. If we execute our workflow on another computer, this software may not be available. For this reason, the resources file can include a virt section that describes the desired method to install the dependencies (mostly through virtualization strategies):
{
"resources": {
"test": {
"cpu" : 2,
"mem" : "300GB",
"time" : "7-00:00:00",
"node" : "bigmem",
"virt" : "test_virt",
"virt_type" : "env"
}
},
"virt": {
"test_virt" : {
"virt_type" : "env",
"venv_opts" : ["--system-site-packages"],
"requirements": ["cowsay"],
"pip_opts": []
}
}
}
Using the previous workflow template, we get the following:
Click to see results
algo >
echo "OK\t2" > log
exec/echo_0000 False
The dry execution does not show changes, but if the generated sh script is inspected, we can see how the loading of the Python virtual environment is included. This environment will contain all the Python libraries specified in the test_virt profile. This virtualization profile is referenced by the test resource profile through the virt key, and the virt_type key specifies whether it is a venv, an Anaconda environment or a Singularity image. The test_virt virtualization profile needs the requirements key, a string vector/list of library names (as pip specifications). We can also pass options for the creation of the venv using the venv_opts key with a vector/list; we use --system-site-packages to install only the libraries that are not already in the system, avoiding redundancy. Finally, when the libraries are installed in the venv with pip, we can pass additional options using the pip_opts key.
flow_logger
Description
Main purpose
The flow_logger function has three main purposes: 1) add a tracking system to the task execution, 2) show the workflow status and 3) manage the re-execution of failed tasks. To do this, flow_logger works as a logging system: each executed task invokes the program at the start and at the end of its execution (the user does not have to take care of this, because AutoFlow adds these commands to the sh script). Each task keeps a log file in which a status signal with its time record is written. The logging system has three different signals: 1) set: the task has been selected for execution and sent to the execution engine, 2) start: the sh script has started to execute, as flow_logger is the first command in the script, and 3) stop: the sh script has finished, as flow_logger is invoked in the last line of the script. Thus, a successful execution of the task must show all three signals. If a task has been executed several times, the last record of each signal is the one selected.
Note that AutoFlow interprets the workflow and launches all the tasks at once, either to the shell or to the queue system. It therefore has no way of knowing the workflow status, and the workflow is managed by the operating system or the queue system. The user must check whether the tasks are executing or not (with a top command in the shell, or by querying the queue system). If at least one task remains in execution or waiting to be executed, we consider the workflow in execution; if no tasks are in execution or waiting, we consider the workflow finished. The lack of certain signals then marks the status of each task as follows: 1) SUCCESSFUL (SUCC): all task signals are detected; this status does not depend on the workflow status, 2) RUNNING/ABORTED (RUN/ABORT): the stop signal is not detected; if the workflow is in execution the task is RUNNING, but if the workflow is finished the task had some kind of error and aborted, and 3) PENDING/NOT EXECUTED (PEND/NOT): neither the start nor the stop signal is detected, so the task was only marked for execution; if the workflow is in execution the task has not been executed yet, but if the workflow is finished the task was never executed. This can be due to the failure of a task it depends on (very likely) or to a hardware problem that made the computer fail the execution.
Execution modes
Workflow logging
First, we will execute a previous template (using -v to generate workflow structure only).
Click to see results
algo >
echo 'OK'
exec/echo_0000 False
result >
echo exec/echo_0000/file
exec/echo_0001 False
algo
It is not shown here, but we have executed flow_logger to set the signals corresponding to the success of both tasks in the workflow. Then we execute the flow_logger command to report the workflow task status. The -e flag is the path to the AutoFlow execution and -w tells flow_logger that the workflow has finished. The -r flag with the argument ALL produces the report for all the workflow tasks; if instead of ALL we use one of the task statuses listed previously, only the tasks with that status are shown. The --raw flag is for debugging and allows this guide to capture the flow_logger output; ignore it for normal flow_logger use.
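The report command could be sketched as (the exec path is illustrative):

```shell
# -e: path to the AutoFlow execution folder
# -w: treat the workflow as finished
# -r ALL: report every task regardless of its status
flow_logger -e exec -w -r ALL
```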
Click to see results
| Status | Folder | Time | Size | Job Name |
|---|---|---|---|---|
| SUCC | echo_0000 | 1 s | 1,5K | algo |
| SUCC | echo_0001 | 1 s | 1,5K | result |
The workflow report is a table with the following columns: 1) Status, which corresponds to the task statuses described in the first section of flow_logger, 2) Folder, the workflow subfolder of the task, 3) Time, the elapsed time of the task execution, 4) Size, how much storage space the task execution occupies, and 5) Job Name, the task name defined in the workflow template. Now, we will configure and run an execution that has not finished:
Click to see results
algo >
echo 'OK'
exec/echo_0000 False
result >
echo exec/echo_0000/file
exec/echo_0001 False
algo
Now, we will execute flow_logger without the -w flag, and we see a running task and a pending task:
Click to see results
| Status | Folder | Time | Size | Job Name |
|---|---|---|---|---|
| RUN | echo_0000 | - | 1,5K | algo |
| PEND | echo_0001 | - | 1,5K | result |
But if we add the -w flag, which tells flow_logger that the workflow has finished, the task statuses change to aborted and not executed.
Click to see results
| Status | Folder | Time | Size | Job Name |
|---|---|---|---|---|
| ABORT | echo_0000 | - | 1,5K | algo |
| NOT | echo_0001 | - | 1,5K | result |
Executing failed tasks
When we execute a workflow, we may find that some tasks have failed or were never launched due to hardware problems. If we replace the -r flag with the boolean -l flag, flow_logger will analyse the workflow execution: all aborted tasks will be re-executed, along with all tasks that depend on them. If there are not-executed tasks (NOT) that do not depend on failed tasks (for instance, because of a system failure), we need to add the -p flag to execute them as well:
This command will execute the aborted and not-executed tasks, and if we then run a flow_logger report command (with -r, without -l or -p), we will obtain the status table shown at the beginning of the previous section.
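The relaunch command could be sketched as (the exec path is illustrative):

```shell
# -l: relaunch aborted tasks and everything that depends on them
# -p: also relaunch NOT tasks that were never executed for other reasons
flow_logger -e exec -w -l -p
```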
How to make controlled task errors
Use bash to execute an exit command if a user condition is not met. TODO
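A minimal sketch of such a controlled error, assuming the check is placed in the main command of a task (the file name results.txt is hypothetical): exiting with a non-zero code would stop the script before the final flow_logger 'stop' call, so the task would later be reported as ABORT.

```shell
# check_condition returns non-zero when the user condition is not met.
check_condition() {
  # Condition: the (hypothetical) result file exists and is non-empty.
  [ -s "$1" ]
}

if ! check_condition results.txt; then
  # In a real task script, you would run 'exit 1' here to abort the task.
  echo "results.txt is missing or empty: task would abort here (exit 1)"
else
  echo "condition met: continuing task"
fi
```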