Typed processes #6368
base: master
✅ Deploy Preview for nextflow-docs-staging ready!
Looking really good. I like the tutorial in particular. I think it's clear and includes the right amount of detail. Also, the order makes sense and it's a good length.
I will take a second pass and nitpick the language. In the meantime, I've added two high-level comments. They are very minor.
This is too big a change compared to the current syntax. I do not support this approach.
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
Updated to use "phase 1" syntax, i.e., support for multiple input channels and tuple inputs.
@bentsherman - I went through the tutorial in detail focusing on the language. I split everything into separate comments to hopefully make it easier to accept/reject.
I found some of the code blocks confusing because, when I went into the example repo, they didn't match what was in the master branch. I'm fine with using the rnaseq-nf example, and a little bit of difference is okay, but if anyone does what I tried to do, it's hard to follow. Can we better align this? Alternatively, can we peel off this example from rnaseq-nf and start building a repo full of examples specifically for the docs? It might give a little more latitude for v1, v2, v3 of tutorials like this and allow better synergy between what is written and what's in the repo. Happy to hear your thoughts.
# Migrating to static types
Nextflow 25.10 introduces the ability to use *static types* in a Nextflow pipeline. This tutorial demonstrates how to migrate to static types using the [rnaseq-nf](https://github.com/nextflow-io/rnaseq-nf) pipeline as an example.
Suggested change: Nextflow 25.10 introduces the ability to use *static types* in Nextflow pipelines. This tutorial demonstrates how to migrate to static types using the [rnaseq-nf](https://github.com/nextflow-io/rnaseq-nf) pipeline as an example.
What do you think about adding a note here that static types are optional?
## Overview
Static types are a way to specify the types of variables in Nextflow code, both to document the code and enable deeper forms of validation. The Nextflow language server can use type annotations to identify type-related errors during development, without needing to run the code.
Suggested change: Static types allow you to specify variable types in Nextflow code for code documentation and deeper validation purposes. The Nextflow language server uses these type annotations to identify type-related errors during development without requiring code execution.
While Nextflow inherited type annotation from Groovy, types could only be specified for functions and local variables, and not for Nextflow-specific concepts such as processes, workflows, and pipeline parameters. Additionally, the Groovy type system is significantly larger and more complex than what is required for Nextflow pipelines.
Suggested change: While Nextflow inherited type annotations from Groovy, types were limited to functions and local variables and couldn't be applied to Nextflow-specific concepts, such as processes, workflows, and pipeline parameters. Additionally, Groovy's type system was significantly larger and more complex than necessary for Nextflow pipelines.
Keeping this paragraph in past tense.
Nextflow 25.10 provides a native way to specify types at every level of a pipeline, from a pipeline parameter to a local variable in a process, using the {ref}`standard types <stdlib-types>` in the Nextflow standard library.
Suggested change: Nextflow 25.10 provides a native way to specify types at every level of a pipeline, from pipeline parameters to local variables in processes, using the {ref}`standard types <stdlib-types>` in the Nextflow standard library.
- Values in the `output:` section can use standard library functions as well as several specialized functions for {ref}`process outputs <process-reference-typed>`. In this case, `tuple()` is the {ref}`standard library function <stdlib-namespaces-global>` (not the `tuple` output qualifier) and `file()` is the process output function (not the standard library function).
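To illustrate the shadowing described in the bullet above, here is a hypothetical Java model of name lookup in a typed `output:` section: process-output functions are checked first, then the standard library. `OutputScope`, `resolve`, and the two lookup tables are assumptions for illustration only; they are not Nextflow classes, and only `file` and `tuple` are modeled.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical model of name lookup in a typed process `output:` section.
// Illustrative only: not how Nextflow resolves names internally,
// and only the two names discussed above are modeled.
class OutputScope {
    // Specialized process-output functions (partial, illustrative)
    static final Map<String, String> OUTPUT_FUNCTIONS =
            Map.of("file", "process output function");

    // Global standard library functions (partial, illustrative)
    static final Map<String, String> STDLIB_FUNCTIONS =
            Map.of("file", "standard library function",
                   "tuple", "standard library function");

    // Output functions are checked first, so they shadow same-named stdlib functions
    static Optional<String> resolve(String name) {
        String outputFn = OUTPUT_FUNCTIONS.get(name);
        if (outputFn != null) {
            return Optional.of(outputFn);
        }
        return Optional.ofNullable(STDLIB_FUNCTIONS.get(name));
    }

    public static void main(String[] args) {
        // Prints one line per name with the function it resolves to
        for (String name : new String[] {"file", "tuple"}) {
            System.out.println(name + " -> " + resolve(name).orElse("unresolved"));
        }
    }
}
```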
:::{note}
The other process sections, such as the directives and the `script:` block, are not shown here because they do not need to be changed. As long as the inputs and outputs declare and reference the same variable names and file patterns, the other process sections will behave the same as before.
Suggested change: Other process sections, such as the directives and the `script:` block, are not shown because they do not require changes. As long as the inputs and outputs declare and reference the same variable names and file patterns, the other process sections will behave the same as before.
}
```
Since the first `path` input was declared with a file pattern, it requires an explicit *stage directive* to stage the file input under a specific alias. You must also declare a variable name for the input, which in the above example is `logs`. The `stageAs` directive specifies that the value of `logs` should be staged using the glob pattern `*`.
Suggested change: When you declare a `path` input with a file pattern, Nextflow requires both a variable name and an explicit *stage directive*. In this example, the input uses the variable name `logs`, and the `stageAs` directive stages the input using the glob pattern `*`.
In this case, the stage directive can actually be omitted because staging a file input as `*` is equivalent to the default behavior. Inputs that are {ref}`collections <stdlib-types-iterable>` of files (e.g., `Bag<Path>`) are also staged by default.
Suggested change: In this case, you can omit the stage directive because `*` matches Nextflow's default staging behavior. File {ref}`collections <stdlib-types-iterable>` like `Bag<Path>` also use default staging.
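A simplified, hypothetical Java sketch of why the `stageAs` directive with `*` can be omitted: both the default and a bare `*` keep the original file name, while a literal pattern stages the file under an alias. The `Staging.stageName` helper and its rules are assumptions for illustration, not Nextflow's staging code, which also handles collections and other wildcard forms.

```java
import java.nio.file.Path;

// Hypothetical, simplified model of the name a single input file is staged under.
// Not Nextflow's implementation: collections, other wildcard forms, and
// subdirectory patterns are deliberately not modeled.
class Staging {
    // A null pattern stands for "no stage directive", i.e. the default behavior.
    static String stageName(Path file, String pattern) {
        String original = file.getFileName().toString();
        if (pattern == null || pattern.equals("*")) {
            // Default staging and a bare `*` both keep the original file name
            return original;
        }
        if (pattern.contains("*") || pattern.contains("?")) {
            throw new IllegalArgumentException("wildcard pattern not modeled: " + pattern);
        }
        // A literal pattern stages the file under that alias
        return pattern;
    }

    public static void main(String[] args) {
        Path input = Path.of("/data/sample_1.fastq");
        System.out.println(stageName(input, null));       // sample_1.fastq
        System.out.println(stageName(input, "*"));        // sample_1.fastq
        System.out.println(stageName(input, "reads.fq")); // reads.fq
    }
}
```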
:::{note}
In the legacy syntax, the `arity` option can be used to specify whether a `path` qualifier expects a single file or collection of files. When using typed inputs and outputs, this behavior is determined by the type, i.e. `Path` vs `Bag<Path>`.
Suggested change: In the legacy syntax, you use the `arity` option to specify whether a `path` qualifier expects a single file or collection of files. When using typed inputs and outputs, the type determines this behavior, i.e., `Path` vs `Bag<Path>`.
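As a rough analogy for how the declared type takes over the role of `arity`, this hypothetical Java sketch validates a value against its declared input type: `Path` accepts exactly one file, `Bag<Path>` accepts a collection of files. `DeclaredType` and `checkArity` are illustrative names, not Nextflow APIs, and the real runtime behavior may differ in edge cases such as empty collections.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical declared input types; names are illustrative, not Nextflow classes.
enum DeclaredType { PATH, BAG_OF_PATH }

class ArityCheck {
    // Returns the files to stage, or throws if the value does not fit the declared type.
    static List<Path> checkArity(DeclaredType type, Object value) {
        switch (type) {
            case PATH:
                // A `Path` input accepts exactly one file
                if (value instanceof Path) {
                    return List.of((Path) value);
                }
                throw new IllegalArgumentException("Path input expects a single file, got: " + value);
            case BAG_OF_PATH:
                // A `Bag<Path>` input accepts a collection of files (empty allowed in this sketch)
                if (value instanceof Collection<?>) {
                    List<Path> files = new ArrayList<>();
                    for (Object item : (Collection<?>) value) {
                        if (!(item instanceof Path)) {
                            throw new IllegalArgumentException("Bag<Path> element is not a file: " + item);
                        }
                        files.add((Path) item);
                    }
                    return files;
                }
                throw new IllegalArgumentException("Bag<Path> input expects a collection, got: " + value);
        }
        throw new AssertionError("unknown type: " + type);
    }

    public static void main(String[] args) {
        System.out.println(checkArity(DeclaredType.PATH, Path.of("a.txt")));
        System.out.println(checkArity(DeclaredType.BAG_OF_PATH, List.of(Path.of("a.txt"), Path.of("b.txt"))));
    }
}
```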
<h4>INDEX</h4>
The `INDEX` process can be migrated using principles already described in the other processes, so it is left as an exercise for the reader.
Suggested change:

Apply the same migration principles from the previous processes to migrate `INDEX`.

## Additional resources

See the following links to learn more about static types:

- {ref}`process-typed-page`
- {ref}`stdlib-types`
- {ref}`syntax-process-typed`
This PR introduces a new process syntax that uses typed inputs and outputs. The existing syntax is still supported.

This PR refactors several large classes, namely `ProcessConfig` and `TaskProcessor`, to better separate concerns and enable a v1 / v2 model for process inputs/outputs. When moving existing code to new files, I tried to change it as little as possible so as not to break anything.

ProcessConfig refactor
The following new classes were spun out of `ProcessConfig`:

- `ProcessConfigV1` / `ProcessConfigV2` extend `ProcessConfig` with the declared inputs / outputs based on legacy (v1) or typed (v2) semantics
- `ProcessDslV1` / `ProcessDslV2` are builder DSLs for legacy / typed process definitions
- `ProcessConfigBuilder` is an adapter for applying process configuration to a process definition
- `ProcessBuilder` is the base builder class used by the above builders

TaskProcessor refactor
The following new classes were spun out of `TaskProcessor`:

- `TaskInputResolver` implements the input file resolution from `makeTaskContextStage2()`
- `TaskOutputResolver` implements the task output resolution logic for typed processes
- `TaskEnvCollector` implements the output env/eval resolution from `collectOutEnvMap()`
- `TaskFileCollector` implements the output file resolution from `collectOutFiles()`
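For context on what output file resolution involves, here is a minimal, hypothetical Java sketch that collects the files in a task work directory matching a glob pattern, using the standard `java.nio.file.PathMatcher`. It is not the code of `TaskFileCollector` or `collectOutFiles()`; the `OutputFiles.collect` name is illustrative, and the real logic handles more cases (for example optional outputs and nested patterns).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch of resolving an output file pattern against a task work directory.
// Not the code of TaskFileCollector; only top-level regular files are considered.
class OutputFiles {
    // Returns regular files directly inside workDir whose names match the glob, sorted by path.
    static List<Path> collect(Path workDir, String glob) {
        PathMatcher matcher = workDir.getFileSystem().getPathMatcher("glob:" + glob);
        try (Stream<Path> entries = Files.list(workDir)) {
            return entries
                    .filter(Files::isRegularFile)
                    .filter(p -> matcher.matches(p.getFileName())) // match on the file name only
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path workDir = Files.createTempDirectory("task");
        Files.createFile(workDir.resolve("report.html"));
        Files.createFile(workDir.resolve("run.log"));
        // Only report.html matches the pattern
        System.out.println(collect(workDir, "*.html"));
    }
}
```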
Typed inputs / outputs

The following new classes implement the new behavior for typed inputs / outputs:

- `ProcessInputs` and `ProcessOutputs` replace `InputsList` and `OutputsList` from the v1 model
- `ProcessInput` and `ProcessOutput` replace all `InParam` and `OutParam` classes from the v1 model
- `ProcessFileInput` and `ProcessFileOutput` replace `FileInParam` and `FileOutParam` in the v1 model

Backwards compatibility
The runtime supports both legacy (v1) and typed (v2) processes by creating the `ProcessDef` with either a `ProcessConfigV1` or a `ProcessConfigV2`. `ProcessDef`, `TaskProcessor`, and `TaskRun` check this type to determine whether to use v1 or v2 semantics. An `instanceof` check is performed at these decision points:

Based on initial work in #4553
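The `instanceof` dispatch described under Backwards compatibility can be sketched in Java as follows. Only the class names `ProcessConfig`, `ProcessConfigV1`, `ProcessConfigV2`, and `ProcessDef` come from the PR description; every member shown is a hypothetical stand-in, and Nextflow's actual classes are written in Groovy.

```java
// Hypothetical stand-ins for the class hierarchy named in the PR description.
// Only the class names come from the PR; all members are illustrative.
abstract class ProcessConfig {
    final String processName;

    ProcessConfig(String processName) {
        this.processName = processName;
    }
}

// Declared inputs / outputs with legacy (v1) semantics
final class ProcessConfigV1 extends ProcessConfig {
    ProcessConfigV1(String processName) { super(processName); }
}

// Declared inputs / outputs with typed (v2) semantics
final class ProcessConfigV2 extends ProcessConfig {
    ProcessConfigV2(String processName) { super(processName); }
}

class ProcessDef {
    private final ProcessConfig config;

    ProcessDef(ProcessConfig config) {
        this.config = config;
    }

    // One decision point: choose v1 or v2 semantics with an instanceof check
    String semantics() {
        if (config instanceof ProcessConfigV2) {
            return "typed (v2)";
        }
        if (config instanceof ProcessConfigV1) {
            return "legacy (v1)";
        }
        throw new IllegalStateException("unknown config type: " + config.getClass());
    }

    public static void main(String[] args) {
        System.out.println(new ProcessDef(new ProcessConfigV1("INDEX")).semantics()); // legacy (v1)
        System.out.println(new ProcessDef(new ProcessConfigV2("INDEX")).semantics()); // typed (v2)
    }
}
```

An alternative design would move the v1/v2 behavior into overridden methods on the config subclasses; the sketch follows the `instanceof` checks the PR describes.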
TODO: