DO NOT MERGE: Parametrize all pipeline tests to run with arrow output format #2610

IvoDD · 2025-08-25T09:55:36Z

Reference Issues/PRs

What does this implement or fix?

Any other comments?

Checklist

Checklist for code changes...

Have you updated the relevant docstrings, documentation and copyright notice?
Is this contribution tested against all ArcticDB's features?
Do all exceptions introduced raise appropriate error messages?
Are API changes highlighted in the PR description?
Is the PR labelled as enhancement or bug so it appears in autogenerated release notes?

github-actions · 2025-08-25T09:55:45Z

Label error. Requires exactly 1 of: patch, minor, major. Found:

`NullReducer` code was assuming that `len(slice_and_keys) = len(row_slices_per_column)` when using `dynamic_schema=True`. That is not true if we use projections.

This PR modifies `NullReducer` code to not rely on the slice index and, by preserving a `column_block_offset_` state, avoids an unneeded `log(n)` search for the offset.

E.g. for the following projection our slicing would look like:
```
Given:
TD key 1:
index A
1     1
2     2
TD key 2:
index A B
3     3 1
4     4 2
TD key 3:
index B
5     3
6     4

And we do a projection like `q.apply("C", q["A"] + q["B"])` our slicing would look like:
Slice 1: TD key 1
Slice 2: TD key 2
Slice 3:
index C
3     4
4     6
Slice 4: TD key 3
```
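To make the mismatch concrete, the slicing in the example can be modelled in a few lines of plain Python. This is only an illustrative sketch, not the C++ `NullReducer` code: the `slices` layout, `row_slices_per_column` as a Python function, and `missing_ranges` are all hypothetical names, and row positions are written as zero-based half-open ranges rather than the index values from the example. It shows that the total number of slices differs from the number of row slices any single column has, and that a running per-column offset is enough to find the ranges that need backfilling.

```python
# Hypothetical model of the four slices from the example (not ArcticDB code).
# Each slice covers a half-open row range and carries a set of columns.
slices = [
    {"rows": (0, 2), "columns": {"A"}},       # TD key 1
    {"rows": (2, 4), "columns": {"A", "B"}},  # TD key 2
    {"rows": (2, 4), "columns": {"C"}},       # projected column C
    {"rows": (4, 6), "columns": {"B"}},       # TD key 3
]


def row_slices_per_column(slices):
    """Group the row ranges by column, preserving slice order."""
    per_column = {}
    for s in slices:
        for col in sorted(s["columns"]):
            per_column.setdefault(col, []).append(s["rows"])
    return per_column


def missing_ranges(column_slices, total_rows):
    """Find row ranges a column does not cover, using a running offset.

    The offset is carried from one slice to the next, so no lookup by
    slice index (and no search for the offset) is needed.
    """
    gaps = []
    offset = 0
    for start, end in column_slices:
        if start > offset:
            gaps.append((offset, start))
        offset = end
    if offset < total_rows:
        gaps.append((offset, total_rows))
    return gaps


per_column = row_slices_per_column(slices)
gaps = {col: missing_ranges(rows, total_rows=6) for col, rows in per_column.items()}
```

Here there are four slices in total, but columns `A` and `B` each have two row slices and `C` has one, so indexing per-column state by the global slice index would be wrong.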

When doing aggregation we explicitly default `sum=0` for slices with no underlying values. For arrow this means not setting the validity bitmap in this case and default-initializing the values.

The change includes:
- Small refactor of `NullReducer` to extract the common parts of `reduce` and `finalize` into `backfill_up_to_frame_offset`
- Modification of `Column::default_initialize` to work across several blocks
- Removes the broken `memset` method from `ChunkedBuffer` and instead provides a new `util::initialize` method which can initialize a `ChunkedBuffer` across blocks
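As a rough illustration of what initializing a chunked buffer across blocks involves, here is a minimal Python sketch. It assumes the buffer is simply a list of `bytearray` blocks addressed by one logical byte offset; the `initialize` function below and its signature are hypothetical and do not mirror the actual `util::initialize` or `ChunkedBuffer` API.

```python
def initialize(blocks, offset, size, value=0):
    """Fill `size` bytes starting at logical `offset` with `value`.

    `blocks` is a list of bytearray objects that together form one logical
    buffer (an assumed stand-in for a chunked buffer). The range may start
    in one block and end in another.
    """
    remaining = size
    block_start = 0
    for block in blocks:
        block_end = block_start + len(block)
        if remaining > 0 and offset < block_end:
            # Clamp the requested range to this block's boundaries.
            lo = max(offset, block_start) - block_start
            hi = min(offset + size, block_end) - block_start
            block[lo:hi] = bytes([value]) * (hi - lo)
            remaining -= hi - lo
        block_start = block_end
    if remaining:
        raise ValueError("range extends past the end of the buffer")


# Three 4-byte blocks, all bytes set to 0xff, then zero bytes 2..8 inclusive.
blocks = [bytearray(b"\xff" * 4) for _ in range(3)]
initialize(blocks, offset=2, size=7)
```

A fill that only looked at the block containing `offset` would leave the second and third blocks untouched, which is the kind of problem a cross-block initializer has to avoid.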

IvoDD force-pushed the run-all-pipeline-tests branch 2 times, most recently from d6e7c85 to d7d1877 Compare August 26, 2025 08:15

IvoDD added 7 commits September 12, 2025 10:44

Assertion for backfilling out of range

9de5577

Address review comments

e953a98

- Add comment explaining `byte_blocks_at`
- Remove leftover prints

Correct sparse handling for Aggregation clauses

f80d857

- Makes Aggregation clauses like `Mean` and `Count` respect input column sparsity
- Fixes `CopyToBufferTask` to respect sparsity for arrow
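What "respecting input column sparsity" could mean for the aggregators can be sketched in plain Python, with `None` standing in for a missing value. This is an assumption-laden illustration, not ArcticDB's implementation: the `aggregate` helper is hypothetical, and treating the mean of an empty bucket as missing is an assumption; only the `sum=0` default is stated in this PR.

```python
def aggregate(bucket):
    """Aggregate one bucket of values, ignoring missing (None) entries."""
    present = [v for v in bucket if v is not None]
    return {
        # Sum defaults to 0 when the bucket has no underlying values.
        "sum": sum(present),
        # Count only counts values that are actually present.
        "count": len(present),
        # Assumption: the mean of an empty bucket is reported as missing.
        "mean": sum(present) / len(present) if present else None,
    }


# Three buckets: partially sparse, fully missing, and dense.
buckets = [[1.0, None, 3.0], [None, None], [2.0]]
results = [aggregate(b) for b in buckets]
```

In this sketch the fully missing bucket yields a sum of 0 and a count of 0, while its mean stays missing rather than being computed from placeholder values.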

Abstract common sparse column init logic in a function

0415374

Add resampling aggregation test with missing data

720110a

Also discovered an issue with appending an empty column set. Added an xfail test for it and filed issue 10029194063.

IvoDD force-pushed the run-all-pipeline-tests branch from d7d1877 to 8629499 Compare September 12, 2025 08:14

IvoDD changed the base branch from master to fix-sparse-buffer-processing September 12, 2025 08:15

Parametrize all pipeline tests to run with arrow output format

8df2f0e

IvoDD force-pushed the run-all-pipeline-tests branch from 8629499 to 8df2f0e Compare September 12, 2025 13:38

IvoDD force-pushed the fix-sparse-buffer-processing branch 2 times, most recently from 6570743 to c8a0745 Compare September 17, 2025 09:25

Base automatically changed from fix-sparse-buffer-processing to master September 17, 2025 11:12
