@bentsherman bentsherman commented Aug 27, 2025

This PR introduces a new syntax for processes that uses typed inputs and outputs. The existing syntax is still supported.

This PR refactors several large classes -- namely ProcessConfig and TaskProcessor -- to better separate concerns and enable a v1 / v2 model for process inputs/outputs. When moving existing code to new files, I try to change it as little as possible to avoid breaking anything.

ProcessConfig refactor

The following new classes were spun out of ProcessConfig:

  • ProcessConfigV1 / ProcessConfigV2 extend ProcessConfig with the declared inputs / outputs based on legacy (v1) or typed (v2) semantics

  • ProcessDslV1 / ProcessDslV2 are builder DSLs for legacy / typed process definitions

  • ProcessConfigBuilder is an adapter for applying process configuration to a process definition

  • ProcessBuilder is the base builder class used by the above builders

TaskProcessor refactor

The following new classes were spun out of TaskProcessor:

  • TaskInputResolver implements the input file resolution from makeTaskContextStage2()

  • TaskOutputResolver implements the task output resolution logic for typed processes

  • TaskEnvCollector implements the output env/eval resolution from collectOutEnvMap()

  • TaskFileCollector implements the output file resolution from collectOutFiles()

Typed inputs / outputs

The following new classes implement the new behavior for typed inputs / outputs:

  • ProcessInputs and ProcessOutputs replace InputsList and OutputsList from the v1 model

  • ProcessInput and ProcessOutput replace all InParam and OutParam classes from the v1 model

  • ProcessFileInput and ProcessFileOutput replace FileInParam and FileOutParam in the v1 model

Backwards compatibility

The runtime supports both legacy (v1) and typed (v2) processes by creating the ProcessDef with either a ProcessConfigV1 or ProcessConfigV2.

ProcessDef, TaskProcessor, and TaskRun check this type to determine whether to use v1 or v2 semantics. An instanceof check is performed at these decision points:

if( config instanceof ProcessConfigV1 )
    // use legacy inputs/outputs
if( config instanceof ProcessConfigV2 )
    // use typed inputs/outputs
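
As a rough illustration of the dispatch described above, here is a minimal Java sketch. The classes below are hypothetical stand-ins: the real ProcessConfig, ProcessConfigV1, and ProcessConfigV2 live in the Nextflow codebase (written in Groovy) and their members are not reproduced here. The `semanticsFor` helper and the strings it returns are assumptions made purely for illustration.

```java
// Hypothetical sketch of the v1 / v2 decision point; not the actual Nextflow code.
class ProcessDispatchSketch {

    // Stand-in for the shared base class (assumed to be empty here).
    static class ProcessConfig {}

    // Stand-in for the legacy (v1) process configuration.
    static class ProcessConfigV1 extends ProcessConfig {}

    // Stand-in for the typed (v2) process configuration.
    static class ProcessConfigV2 extends ProcessConfig {}

    // Chooses the semantics from the runtime type of the config,
    // mirroring the instanceof checks shown above.
    static String semanticsFor(ProcessConfig config) {
        if (config instanceof ProcessConfigV1)
            return "legacy";   // use legacy inputs/outputs
        if (config instanceof ProcessConfigV2)
            return "typed";    // use typed inputs/outputs
        // Fail loudly on an unexpected subtype instead of silently doing nothing.
        throw new IllegalArgumentException(
            "Unsupported process config: " + config.getClass().getName());
    }

    public static void main(String[] args) {
        System.out.println(semanticsFor(new ProcessConfigV1()));
        System.out.println(semanticsFor(new ProcessConfigV2()));
    }
}
```

One design note on the sketch: throwing on an unknown subtype is a choice made here for clarity; the PR description does not say how the runtime handles that case.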

Based on initial work in #4553

TODO:

  • update docs
  • update tests
  • add e2e tests

netlify bot commented Aug 27, 2025

Deploy Preview for nextflow-docs-staging ready!

Name Link
🔨 Latest commit af09783
🔍 Latest deploy log https://app.netlify.com/projects/nextflow-docs-staging/deploys/68bf021add01ef000985b2f4
😎 Deploy Preview https://deploy-preview-6368--nextflow-docs-staging.netlify.app

@bentsherman bentsherman changed the title from "Typed processe" to "Typed processes" Aug 27, 2025
@bentsherman bentsherman marked this pull request as ready for review September 1, 2025 17:53
@bentsherman bentsherman requested review from a team as code owners September 1, 2025 17:53
@bentsherman bentsherman added this to the 25.10 milestone Sep 1, 2025

@christopher-hakkaart christopher-hakkaart left a comment

Looking really good. I like the tutorial in particular. I think it's clear and includes the right amount of detail. Also, the order makes sense and it's a good length.

I will take a second pass and nitpick the language. In the meantime, I've added two high-level comments. They are very minor.

@pditommaso pditommaso left a comment

This is too big a change compared to the current syntax. I do not support this approach.

@bentsherman bentsherman force-pushed the typed-processes branch 2 times, most recently from 06e9d56 to 25a80b1 Compare September 3, 2025 17:08
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
@bentsherman

Updated to use "phase 1" syntax, i.e. support for multiple input channels and tuple inputs

Signed-off-by: Ben Sherman <bentshermann@gmail.com>
Signed-off-by: Ben Sherman <bentshermann@gmail.com>

@christopher-hakkaart christopher-hakkaart left a comment

@bentsherman - I went through the tutorial in detail focusing on the language. I split everything into separate comments to hopefully make it easier to accept/reject.

I found some of the code blocks confusing: when I went into the example repo, the code blocks didn't match what was in the master branch. I'm fine with using the rnaseq-nf example, and a little bit of difference is okay, but if anyone does what I tried to do, it's hard to follow. Can we better align this? Alternatively, can we peel this example off from rnaseq-nf and start building a repo full of examples specifically for the docs? It might give a little more latitude for v1, v2, v3 of tutorials like this and allow better synergy between what is written and what's in the repo. Happy to hear your thoughts.


# Migrating to static types

Nextflow 25.10 introduces the ability to use *static types* in a Nextflow pipeline. This tutorial demonstrates how to migrate to static types using the [rnaseq-nf](https://github.com/nextflow-io/rnaseq-nf) pipeline as an example.

Suggested change
Nextflow 25.10 introduces the ability to use *static types* in a Nextflow pipeline. This tutorial demonstrates how to migrate to static types using the [rnaseq-nf](https://github.com/nextflow-io/rnaseq-nf) pipeline as an example.
Nextflow 25.10 introduces the ability to use *static types* in Nextflow pipelines. This tutorial demonstrates how to migrate to static types using the [rnaseq-nf](https://github.com/nextflow-io/rnaseq-nf) pipeline as an example.

What do you think about adding a note here that static types are optional?


## Overview

Static types are a way to specify the types of variables in Nextflow code, both to document the code and enable deeper forms of validation. The Nextflow language server can use type annotations to identify type-related errors during development, without needing to run the code.

Suggested change
Static types are a way to specify the types of variables in Nextflow code, both to document the code and enable deeper forms of validation. The Nextflow language server can use type annotations to identify type-related errors during development, without needing to run the code.
Static types allow you to specify variable types in Nextflow code for code documentation and deeper validation purposes. The Nextflow language server uses these type annotations to identify type-related errors during development without requiring code execution.


While Nextflow inherited type annotation from Groovy, types could only be specified for functions and local variables, and not for Nextflow-specific concepts such as processes, workflows, and pipeline parameters. Additionally, the Groovy type system is significantly larger and more complex than what is required for Nextflow pipelines.

Suggested change
While Nextflow inherited type annotation from Groovy, types could only be specified for functions and local variables, and not for Nextflow-specific concepts such as processes, workflows, and pipeline parameters. Additionally, the Groovy type system is significantly larger and more complex than what is required for Nextflow pipelines.
While Nextflow inherited type annotations from Groovy, types were limited to functions and local variables and couldn't be applied to Nextflow-specific concepts, such as processes, workflows, and pipeline parameters. Additionally, Groovy's type system was significantly larger and more complex than necessary for Nextflow pipelines.

Keeping this paragraph in past tense.


Nextflow 25.10 provides a native way to specify types at every level of a pipeline, from a pipeline parameter to a local variable in a process, using the {ref}`standard types <stdlib-types>` in the Nextflow standard library.

Suggested change
Nextflow 25.10 provides a native way to specify types at every level of a pipeline, from a pipeline parameter to a local variable in a process, using the {ref}`standard types <stdlib-types>` in the Nextflow standard library.
Nextflow 25.10 provides a native way to specify types at every level of a pipeline, from pipeline parameters to local variables in processes, using the {ref}`standard types <stdlib-types>` in the Nextflow standard library.

- Values in the `output:` section can use standard library functions as well as several specialized functions for {ref}`process outputs <process-reference-typed>`. In this case, `tuple()` is the {ref}`standard library function <stdlib-namespaces-global>` (not the `tuple` output qualifier) and `file()` is the process output function (not the standard library function).

:::{note}
The other process sections, such as the directives and the `script:` block, are not shown here because they do not need to be changed. As long as the inputs and outputs declare and reference the same variable names and file patterns, the other process sections will behave the same as before.

Suggested change
The other process sections, such as the directives and the `script:` block, are not shown here because they do not need to be changed. As long as the inputs and outputs declare and reference the same variable names and file patterns, the other process sections will behave the same as before.
Other process sections, such as the directives and the `script:` block, are not shown because they do not require changes. As long as the inputs and outputs declare and reference the same variable names and file patterns, the other process sections will behave the same as before.

}
```

Since the first `path` input was declared with a file pattern, it requires an explicit *stage directive* to stage the file input under a specific alias. You must also declare a variable name for the input, which in the above example is `logs`. The `stageAs` directive specifies that the value of `logs` should be staged using the glob pattern `*`.

Suggested change
Since the first `path` input was declared with a file pattern, it requires an explicit *stage directive* to stage the file input under a specific alias. You must also declare a variable name for the input, which in the above example is `logs`. The `stageAs` directive specifies that the value of `logs` should be staged using the glob pattern `*`.
When you declare a `path` input with a file pattern, Nextflow requires both a variable name and an explicit *stage directive*. In this example, the input uses the variable name `logs`, and the `stageAs` directive stages the input using the glob pattern `*`.


In this case, the stage directive can actually be omitted because staging a file input as `*` is equivalent to the default behavior. Inputs that are {ref}`collections <stdlib-types-iterable>` of files (e.g., `Bag<Path>`) are also staged by default.

Suggested change
In this case, the stage directive can actually be omitted because staging a file input as `*` is equivalent to the default behavior. Inputs that are {ref}`collections <stdlib-types-iterable>` of files (e.g., `Bag<Path>`) are also staged by default.
In this case, you can omit the stage directive because `*` matches Nextflow's default staging behavior. File {ref}`collections <stdlib-types-iterable>` like `Bag<Path>` also use default staging.

:::{note}
In the legacy syntax, the `arity` option can be used to specify whether a `path` qualifier expects a single file or collection of files. When using typed inputs and outputs, this behavior is determined by the type, i.e. `Path` vs `Bag<Path>`.

Suggested change
In the legacy syntax, the `arity` option can be used to specify whether a `path` qualifier expects a single file or collection of files. When using typed inputs and outputs, this behavior is determined by the type, i.e. `Path` vs `Bag<Path>`.
In the legacy syntax, you use the `arity` option to specify whether a `path` qualifier expects a single file or collection of files. When using typed inputs and outputs, the type determines this behavior, i.e., `Path` vs `Bag<Path>`.


<h4>INDEX</h4>

The `INDEX` process can be migrated using principles already described in the other processes, so it is left as an exercise for the reader.
@christopher-hakkaart christopher-hakkaart Sep 9, 2025

Suggested change
The `INDEX` process can be migrated using principles already described in the other processes, so it is left as an exercise for the reader.
Apply the same migration principles from the previous processes to migrate `INDEX`.

## Additional resources

See the following links to learn more about static types:

- {ref}`process-typed-page`
- {ref}`stdlib-types`
- {ref}`syntax-process-typed`
