Dimension order produced by Rechunk is opaque and not controllable; mismatch can cause errors from subsequent ChunksToZarr. #94

@mjwillson

In a pipeline where Rechunk is followed by ChunksToZarr, the dimension order of the variables output by Rechunk may not match that of the template you pass to ChunksToZarr. When it doesn't, you get errors like:

ValueError: variable 'geopotential_quantiles' already exists with different dimension names ('hour', 'dayofyear', 'level', 'latitude', 'longitude', 'quantile') != ('level', 'hour', 'dayofyear', 'latitude', 'longitude', 'quantile'), but changing variable dimensions is not supported by to_zarr().

As far as I can tell, Rechunk doesn't allow you to control the output dimension order (at least not on a per-variable basis, which may be necessary to match a given template). An alternative would be to transpose the output template to match whatever Rechunk is going to produce, but it's hard to predict what that order will be.

As another way around this, it'd be nice if ChunksToZarr could just do the transpose rather than complain if it finds this kind of dimension mismatch (same dimensions in a different order).
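To make the suggestion concrete, here is a minimal sketch of the check I have in mind: accept a mismatch only when the two dimension tuples contain the same names in a different order, and work out the permutation needed to match the template. The helper name `axes_to_match_template` is hypothetical and not part of any existing API. I'm also assuming that the first tuple in the error message above is the order already in the Zarr store (the template) and the second is what Rechunk produced.

```python
from typing import Sequence, Tuple


def axes_to_match_template(
    actual_dims: Sequence[str], template_dims: Sequence[str]
) -> Tuple[int, ...]:
    """Return the axis permutation that reorders actual_dims into template_dims.

    Hypothetical helper. Raises ValueError unless both sequences contain
    exactly the same dimension names, so a genuine mismatch still fails.
    """
    actual = list(actual_dims)
    template = list(template_dims)
    if len(set(actual)) != len(actual) or sorted(actual) != sorted(template):
        raise ValueError(
            "dimensions differ by more than order: "
            f"{tuple(actual)} vs {tuple(template)}"
        )
    # For each template dimension, find its position in the actual order.
    return tuple(actual.index(dim) for dim in template)


# Dimension orders taken from the error message above (assumed roles).
template_dims = ("hour", "dayofyear", "level", "latitude", "longitude", "quantile")
rechunked_dims = ("level", "hour", "dayofyear", "latitude", "longitude", "quantile")

axes = axes_to_match_template(rechunked_dims, template_dims)
```

With xarray objects the permutation itself isn't needed, since `variable.transpose(*template_dims)` reorders by name. The point of the check is only to tell a harmless reordering apart from a real dimension mismatch, which should still raise.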
