Skip to content

Conversation

IvoDD
Copy link
Collaborator

@IvoDD IvoDD commented Aug 25, 2025

Reference Issues/PRs

What does this implement or fix?

Any other comments?

Checklist

Checklist for code changes...
  • Have you updated the relevant docstrings, documentation and copyright notice?
  • Is this contribution tested against all ArcticDB's features?
  • Do all exceptions introduced raise appropriate error messages?
  • Are API changes highlighted in the PR description?
  • Is the PR labelled as enhancement or bug so it appears in autogenerated release notes?

Copy link

Label error. Requires exactly 1 of: patch, minor, major. Found:

@IvoDD IvoDD force-pushed the run-all-pipeline-tests branch 2 times, most recently from d6e7c85 to d7d1877 Compare August 26, 2025 08:15
`NullReducer` code was assuming that `len(slice_and_keys) = len(row_slices_per_column)` when using `dynamic_schema=True`.
That is not true if we use projections.

This PR modifies `NullReducer` code to not rely on the slice index and
by preserving a `column_block_offset_` state avoids an unneeded `log(n)` search for the offset.

E.g. for the following projection our slicing would look like:
```
Given:

TD key 1:
index A
1     1
2     2

TD key 2:
index A  B
3     3  1
4     4  2

TD key 3:
index B
5     3
6     4

And we do a projection like `q.apply("C", q["A"] + q["B"])` our slicing would look
like:

Slice 1: TD key 1
Slice 2: TD key 2
Slice 3:
index C
3     4
4     6
Slice 4: TD key 3
```
When doing aggregation we explicitly default `sum=0` for slices with no
underlying values.
For arrow this means to not set the validity bitmap in this case and to
default initialize the values.

The change includes:
- Small refactor of `NullReducer` to extract common parts between
  `reduce` and `finalize` in `backfill_up_to_frame_offset`
- Modification of `Column::default_initialize` to work across several
  blocks
- Removes broken `memset` method from `ChunkedBuffer` and instead
  provides a new `util::initialize` method which can initialize a
`ChunkedBuffer` across blocks
- Add comment explaining `byte_blocks_at`
- Remove leftover prints
- Makes Aggregation clauses like `Mean` and `Count` respect input column
  sparsity
- Fixes `CopyToBufferTask` to respect sparsity for arrow
Also discovered an issue with appending an empty column set.
Added an xfail test for it and an issue 10029194063
@IvoDD IvoDD force-pushed the run-all-pipeline-tests branch from d7d1877 to 8629499 Compare September 12, 2025 08:14
@IvoDD IvoDD changed the base branch from master to fix-sparse-buffer-processing September 12, 2025 08:15
@IvoDD IvoDD force-pushed the run-all-pipeline-tests branch from 8629499 to 8df2f0e Compare September 12, 2025 13:38
@IvoDD IvoDD force-pushed the fix-sparse-buffer-processing branch 2 times, most recently from 6570743 to c8a0745 Compare September 17, 2025 09:25
Base automatically changed from fix-sparse-buffer-processing to master September 17, 2025 11:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant