feat: support decimal #114
@CodeRabbit review
✅ Actions performed: review triggered.
Walkthrough
Adds comprehensive decimal type support to the QuestDB Python client, including protocol version 3, binary serialization paths for Pandas/Arrow decimals, an mpdecimal compatibility layer for internal decimal representation, updated type signatures, and corresponding test coverage across multiple integration points.
Sequence Diagram
sequenceDiagram
participant User as User Code
participant API as Buffer/SenderTransaction
participant Parse as Decimal Parser
participant Serialize as Serializer
participant ILP as ILP Payload
User->>API: row(columns={...Decimal...})
API->>Parse: Detect Decimal type
Parse->>Parse: Extract mpd_t from PyDecObject
alt Decimal is NaN/Inf
Parse-->>Serialize: Special handling (empty)
else Decimal is normal
Parse->>Parse: Build unscaled integer from digits
Parse->>Parse: Calculate scale (-exponent)
activate Serialize
Serialize->>Serialize: Encode scale as uint
Serialize->>Serialize: Convert unscaled to big-endian bytes
Serialize->>ILP: Append decimal column data
deactivate Serialize
end
ILP-->>User: Serialized ILP message
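The flow in the diagram can be approximated in pure Python. This is a minimal sketch, not the PR's Cython code: it uses the public Decimal.as_tuple() API instead of reading mpd_t directly, and the function name, the two's-complement byte encoding, and the treatment of positive exponents are assumptions made for illustration. The scale limit of 76 is the one quoted later in the review.

```python
from decimal import Decimal
from typing import Optional, Tuple

MAX_SCALE = 76  # supported scale range quoted in the review (0-76)


def decimal_to_scale_and_bytes(value: Decimal) -> Optional[Tuple[int, bytes]]:
    """Return (scale, unscaled value as big-endian bytes), or None for NaN/Inf.

    Hypothetical helper: the byte layout (signed two's complement, shortest
    length that fits) is an assumption, not taken from the PR.
    """
    if not value.is_finite():
        return None  # NaN and +/-Infinity take the "special handling" branch

    sign, digits, exponent = value.as_tuple()
    unscaled = int("".join(map(str, digits)))  # build the integer from the digits

    if exponent > 0:
        # Assumption: fold a positive exponent into the integer so scale stays >= 0.
        unscaled *= 10 ** exponent
        scale = 0
    else:
        scale = -exponent  # scale is the negated exponent

    if scale > MAX_SCALE:
        raise ValueError(f"scale {scale} exceeds the maximum of {MAX_SCALE}")

    if sign:
        unscaled = -unscaled

    # One extra bit is reserved for the sign, hence "// 8 + 1".
    length = unscaled.bit_length() // 8 + 1
    return scale, unscaled.to_bytes(length, "big", signed=True)
```

For example, Decimal("123.45") has digits 12345 and exponent -2, so the sketch yields scale 2 and the two bytes 0x30 0x39. How the real client signals NaN/Inf on the wire is not shown here.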
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~60 minutes
Rationale: Heterogeneous changes spanning new internal APIs (mpdecimal compatibility layer), protocol version bumps, multi-variant serialization paths (5 decimal source types), updated public signatures, error-code mappings, and cross-file dependencies. Dense logic in decimal conversion (scale handling, binary representation, flag extraction) and multiple interconnected updates to enum dispatch codes and target/source mappings require careful verification of consistency across layers. Substantial test additions validate behavior but add review scope.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/questdb/ingress.pyx (1)
1043-1049: Minor: fix valid-types error message (missing comma concatenates entries)
'datetime.datetime' 'numpy.ndarray' concatenates into one token.
Apply this diff:
- 'TimestampMicros',
- 'datetime.datetime'
- 'numpy.ndarray'))
+ 'TimestampMicros',
+ 'datetime.datetime',
+ 'numpy.ndarray'))
src/questdb/ingress.pyi (1)
1030-1039: Add Decimal to the columns type signature.
The Sender.row method's columns parameter is missing Decimal in its type union, while both Buffer.row (line 386) and SenderTransaction.row (line 207) include it. This inconsistency will cause type checking errors when users try to pass Decimal values to Sender.row.
Apply this diff to add Decimal to the type union:
 columns: Optional[
- Dict[str, Union[bool, int, float, str, TimestampMicros, datetime, np.ndarray]]
+ Dict[str, Union[bool, int, float, str, TimestampMicros, datetime, np.ndarray, Decimal]]
 ] = None,
🧹 Nitpick comments (6)
src/questdb/ingress.pyx (1)
2414-2429: Doc nit: mention protocol version 3
Property doc explains v1 and v2 only. Consider adding a short note for v3 to avoid confusion.
src/questdb/dataframe.md (1)
96-106: Decimal docs read well
Clear coverage across pandas/NumPy/Arrow with examples; fits new tests. Consider adding the supported scale range (0–76) note here for completeness.
Also applies to: 129-157
src/questdb/ingress.pyi (1)
709-711: Clarify null representation for Decimal columns.
The table shows Y (NaN) for nulls in the Decimal row. However, NaN is typically associated with float types. For Decimal objects, nulls are represented as None or pandas.NA, not NaN. Consider changing this to just Y or Y (None) for clarity.
Apply this diff if you agree:
 * - ``'object'`` (``Decimal`` objects)
- - Y (``NaN``)
+ - Y
 - ``DECIMAL``
src/questdb/mpdecimal_compat.h (1)
1-19: Document CPython version compatibility assumptions.
This compatibility layer relies on CPython's internal Decimal implementation details (struct layout and limb size). These internals may change between CPython versions. Consider:
- Adding a comment documenting which CPython versions are supported (e.g., 3.8+)
- Adding runtime checks in the Cython code to verify struct layout hasn't changed
- Noting in documentation that this is a best-effort compatibility layer
Example comment to add:
+/*
+ * Compatibility layer for CPython's decimal module (libmpdec).
+ * Tested with CPython 3.8 through 3.12.
+ * May break with future CPython versions if internal Decimal layout changes.
+ */
+
 /* Determine the limb type used by CPython's libmpdec build. */
 #if SIZE_MAX == UINT64_MAX
src/questdb/dataframe.pxi (2)
59-73: Add comments explaining byte-swap usage for Arrow decimals.
The bswap32 and bswap64 functions are used later for Arrow decimal types (lines 2226, 2245, etc.), but it's not immediately clear why byte-swapping is needed. Arrow stores decimal values in the schema's byte order (little-endian on typical platforms), while the ILP decimal payload carries the unscaled value as big-endian bytes.
Add a comment explaining the endianness conversion:
+# Arrow decimal types store values in the schema's byte order (little-endian on typical platforms).
+# These functions convert to the big-endian format expected by the ILP protocol.
 cdef inline uint32_t bswap32(uint32_t value):
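To make the effect of a byte swap concrete, here is a small Python stand-in for the two helpers. The names match the Cython functions mentioned above, but these bodies are an illustrative sketch and not the PR's implementation.

```python
def bswap32(value: int) -> int:
    """Reverse the byte order of a 32-bit unsigned integer."""
    # Serialize as little-endian, then reinterpret the same bytes as big-endian.
    return int.from_bytes((value & 0xFFFFFFFF).to_bytes(4, "little"), "big")


def bswap64(value: int) -> int:
    """Reverse the byte order of a 64-bit unsigned integer."""
    return int.from_bytes((value & 0xFFFFFFFFFFFFFFFF).to_bytes(8, "little"), "big")
```

A swap is its own inverse, so the same helper converts in either direction: bswap32(0x11223344) gives 0x44332211, and swapping that result returns the original value.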
2213-2295: LGTM! Arrow decimal serialization correctly handles all bit widths.
All four Arrow decimal serialization functions properly:
- Check Arrow validity bitmaps
- Send NULL for invalid values
- Perform correct byte-swapping for endianness conversion
- Use the stored scale from column metadata
The 128-bit and 256-bit handlers correctly swap both byte order within each 64-bit word and reverse the word order.
Optional: Consider a helper function to reduce duplication.
The four functions have similar structure. You could extract common logic:
cdef void_int _arrow_decimal_to_bytes(
        col_t* col,
        size_t byte_count,
        uint64_t* out_buffer,
        bint* valid_out) noexcept nogil:
    """Extract and byte-swap Arrow decimal to output buffer."""
    # Common extraction and swapping logic
This would reduce duplication and make maintenance easier, though the current approach is also acceptable.
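The remark about 128-bit and 256-bit values (swap bytes within each 64-bit word, then reverse the word order) amounts to reversing the whole byte string. The sketch below demonstrates that equivalence in Python; the function name is hypothetical and the code is not taken from the PR.

```python
import struct


def swap_words_and_reverse(raw: bytes) -> bytes:
    """Byte-swap every 64-bit word of `raw`, then reverse the order of the words."""
    if len(raw) % 8 != 0:
        raise ValueError("length must be a multiple of 8 bytes")
    count = len(raw) // 8
    # Read the words as little-endian integers...
    words = struct.unpack(f"<{count}Q", raw)
    # ...and write them big-endian in reverse order: each word's bytes are
    # swapped and the most significant word now comes first.
    return struct.pack(f">{count}Q", *reversed(words))
```

Given the 16 little-endian bytes of a 128-bit integer, the output is the same integer in big-endian form, which is also what raw[::-1] would produce. Working word by word is simply how the operation is expressed with 64-bit swap primitives.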
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
- c-questdb-client (1 hunks)
- src/questdb/dataframe.md (1 hunks)
- src/questdb/dataframe.pxi (17 hunks)
- src/questdb/ingress.pyi (7 hunks)
- src/questdb/ingress.pyx (5 hunks)
- src/questdb/line_sender.pxd (3 hunks)
- src/questdb/mpdecimal_compat.h (1 hunks)
- src/questdb/mpdecimal_compat.pxd (1 hunks)
- test/test.py (4 hunks)
- test/test_dataframe.py (5 hunks)
🧰 Additional context used
🪛 Clang (14.0.6)
src/questdb/mpdecimal_compat.h
[error] 4-4: 'Python.h' file not found
(clang-diagnostic-error)
🪛 Ruff (0.14.1)
test/test_dataframe.py
90-90: Avoid specifying long messages outside the exception class
(TRY003)
93-93: Avoid specifying long messages outside the exception class
(TRY003)
95-95: Avoid specifying long messages outside the exception class
(TRY003)
97-97: Avoid specifying long messages outside the exception class
(TRY003)
102-103: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (15)
- GitHub Check: questdb.py-questdb-client (1) (cibuildwheel linux_arm64)
- GitHub Check: questdb.py-questdb-client (1) (cibuildwheel windows_x86_64)
- GitHub Check: questdb.py-questdb-client (1) (cibuildwheel macos_x64)
- GitHub Check: questdb.py-questdb-client (1) (cibuildwheel linux_x64_pypy)
- GitHub Check: questdb.py-questdb-client (1) (cibuildwheel windows_i686)
- GitHub Check: questdb.py-questdb-client (1) (cibuildwheel linux_x64_cpython_musllinux)
- GitHub Check: questdb.py-questdb-client (1) (cibuildwheel linux_x64_cpython_manylinux_x86_64)
- GitHub Check: questdb.py-questdb-client (1) (cibuildwheel start_linux_arm64_agent_aws)
- GitHub Check: questdb.py-questdb-client (Building and testing on windows-msvc-2019)
- GitHub Check: questdb.py-questdb-client (Building and testing on mac)
- GitHub Check: questdb.py-questdb-client (Building and testing on linux-qdb-master)
- GitHub Check: questdb.py-questdb-client (Building and testing on linux-old-pandas)
- GitHub Check: questdb.py-questdb-client (Building and testing on linux)
- GitHub Check: questdb.py-questdb-client (Building and testing TestsAgainstVariousNumpyVersion2x)
- GitHub Check: questdb.py-questdb-client (Building and testing TestsAgainstVariousNumpyVersion1x)
🔇 Additional comments (29)
c-questdb-client (1)
1-1: Submodule reference update requires verification of C extension changes.
This file contains only a pointer update to the C extension submodule. The actual Decimal support implementation in the C extension (questdb/c-questdb-client) is not accessible for review from this context.
Given that the broader PR adds significant Decimal support across the Python wrapper (dataframe serialization, ILP ingestion, type signatures), ensure that:
- The C extension at commit 5b17715... includes corresponding Decimal serialization/deserialization logic.
- The binary protocol changes (if any) are compatible with the Python-side changes.
- The submodule commit has been tested with the tandem QuestDB core PR (questdb/questdb#6068).
Note: This PR is marked "DO NOT MERGE" and depends on upstream changes.
To verify C extension compatibility, you may want to:
- Inspect the C extension diff at the target commit to ensure Decimal support aligns with the Python wrapper changes.
- Confirm that protocol version 3 support (mentioned in the AI summary) is implemented in the C extension.
- Verify integration tests pass with the updated submodule.
test/test.py (3)
45-49: V3 pandas tests import looks good
Keeps suite discoverable only when pandas is present.
415-425: Protocol-version validation updates are correct
Treating 3 as valid and 4/'4' as invalid with the updated error text matches the new Sender/Buffer checks.
If CI still runs the “unsupported client for V3” test, please confirm it’s updated (or gated) to reflect that the client now supports V3.
Also applies to: 430-432
1478-1479: Public name fix for V2
Renaming to “protocol version 2” is consistent with the class.
test/test_dataframe.py (3)
84-121: Decimal payload helpers are solid
Helpers make intent clear and align with the binary format used in assertions.
570-585: Comprehensive decimal test coverage
Covers pyobj decimals (incl. special values) and Arrow decimals across widths; version-gated appropriately.
Also applies to: 586-597, 598-608, 609-646
1705-1709: Updated error-message regex
The new wording (“Unsupported arrow type …”) matches current behavior.
src/questdb/mpdecimal_compat.pxd (1)
24-71: Decimal → ILP conversion helper looks correct
- Handles NaN/Inf as nulls.
- Builds unscaled integer from mpd limbs correctly (LE limbs × MPD_RADIX).
- Enforces max scale 76 and applies sign.
One note: zero encodes to an empty mantissa (length 0), which matches the tests’ “special values” treatment; confirm your wire format also expects empty mantissa for numeric zero, or adjust to emit a single 0x00 byte.
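The "LE limbs × MPD_RADIX" step can be illustrated as follows. This is a sketch under stated assumptions: libmpdec stores the coefficient as base-10^19 limbs on 64-bit builds (base 10^9 on 32-bit builds), least significant limb first. The function and constant names are hypothetical and do not come from the PR.

```python
MPD_RADIX_64 = 10 ** 19  # assumed limb radix on 64-bit builds
MPD_RADIX_32 = 10 ** 9   # assumed limb radix on 32-bit builds


def limbs_to_int(limbs, radix=MPD_RADIX_64):
    """Combine little-endian limbs (least significant first) into one integer."""
    value = 0
    # Walk from the most significant limb down (Horner's scheme).
    for limb in reversed(limbs):
        if not 0 <= limb < radix:
            raise ValueError(f"limb {limb} is outside [0, {radix})")
        value = value * radix + limb
    return value
```

For instance, the limbs [2345678901234567890, 1] combine to 1 × 10^19 + 2345678901234567890 = 12345678901234567890. An empty limb list yields 0 in this sketch; whether the wire format wants an empty mantissa or a single 0x00 byte for zero is the open question raised in the note above.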
src/questdb/ingress.pyx (1)
1241-1260: Decimal support is limited to dataframe() path, not row(); verify PR description scope and consider scoping Decimal to dataframe only
The review comment is accurate. After examining the codebase:
- Buffer.row() columns parameter type hint excludes Decimal (only: bool, int, float, str, TimestampMicros, TimestampNanos, datetime.datetime, numpy.ndarray)
- Decimal support via decimal_pyobj_to_binary is implemented only in dataframe.pxi
- No _column_decimal method exists in the Buffer class; only _column_bool, _column_i64, _column_f64, _column_str, _column_ts_micros, _column_ts_nanos, _column_numpy
- Decimal in dataframe requires protocol v3 (tests skip for version < 3)
If the PR description shows sender.row(... Decimal(...)), the documentation/example is inconsistent with the implementation. Either add Decimal support to row() (requiring _column_decimal and protocol v3 guard) or scope examples/docs to dataframe-only.
src/questdb/line_sender.pxd (2)
43-56: LGTM! Protocol version and error code additions are well-structured.
The addition of line_sender_error_invalid_decimal and line_sender_protocol_version_3 follows existing conventions and provides the necessary foundation for Decimal support.
268-282: LGTM! Decimal buffer functions follow established patterns.
The two new functions line_sender_buffer_column_dec_str and line_sender_buffer_column_dec are well-designed:
- Consistent with existing column buffer functions
- Support both text (string) and binary formats
- Include proper error handling via err_out parameter
src/questdb/ingress.pyi (4)
43-61: LGTM! Import and error code additions are correct.
The import of Decimal and the addition of DecimalError to the IngressErrorCode enum are necessary for type checking support.
207-207: LGTM! Type signature correctly includes Decimal.
The addition of Decimal to the columns parameter type union in SenderTransaction.row enables proper type checking for decimal column values.
386-386: LGTM! Type signature correctly includes Decimal.
The addition of Decimal to the columns parameter type union in Buffer.row enables proper type checking for decimal column values.
407-456: LGTM! Documentation clearly illustrates Decimal usage.
The example usage and type mapping table additions help users understand:
- How to pass Decimal values in the columns dict
- The mapping from Python Decimal to ILP DECIMAL type
src/questdb/mpdecimal_compat.h (3)
21-35: Add runtime validation for struct layout assumptions.
The mpd_t and PyDecObject struct definitions assume a specific memory layout that matches CPython's internal implementation. If CPython changes these internals, this code will silently produce incorrect results or crash.
Consider adding runtime checks in the Cython initialization code (e.g., in mpdecimal_compat.pxd or module init) to verify:
- Size of Python Decimal objects matches expectations
- Basic sanity checks on extracted values (e.g., comparing against decimal module's official API)
Example validation approach:
# At module initialization
test_decimal = Decimal("123.45")
# Extract using compatibility layer
# Also extract using official decimal API
# Assert they match
This would catch breaking changes early rather than producing silent corruption.
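One way to flesh out that validation idea is a small self-check that compares any extractor against the public decimal API. This is a sketch only: check_extractor, reference_parts, and the sample values are hypothetical names chosen for illustration, and the compatibility layer's real extraction function would have to be adapted to return the same tuple shape.

```python
from decimal import Decimal

# Sample values chosen to cover sign, zero, and a coefficient longer than one limb.
SAMPLES = (
    Decimal("123.45"),
    Decimal("-0.001"),
    Decimal("0"),
    Decimal("98765432109876543210.5"),
)


def reference_parts(value: Decimal):
    """Return (is_negative, unscaled magnitude, exponent) via the public API."""
    sign, digits, exponent = value.as_tuple()
    return bool(sign), int("".join(map(str, digits))), exponent


def check_extractor(extract) -> None:
    """Raise RuntimeError if `extract` disagrees with the public API on any sample."""
    for sample in SAMPLES:
        got = extract(sample)
        expected = reference_parts(sample)
        if got != expected:
            raise RuntimeError(
                f"Decimal layout check failed for {sample!r}: "
                f"got {got!r}, expected {expected!r}"
            )
```

Calling check_extractor once at import time with the struct-based extractor would turn a silent layout mismatch into an immediate, descriptive failure.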
37-44: LGTM! Accessor functions correctly handle inline vs heap storage.
The decimal_digits() function properly handles both storage modes:
- Heap-allocated: uses dec->dec.data
- Inline (small decimals): uses dec->data[4]
This matches CPython's optimization for small decimal values.
46-54: LGTM! Flag definitions match mpdecimal constants.
The flag enum and MPD_RADIX constant definitions are correct and consistent with libmpdec's public interface.
src/questdb/dataframe.pxi (11)
96-110: LGTM! Enum additions for decimal target are correct.
The addition of col_target_column_decimal = 9 and updating col_target_at = 10 maintains the enum sequence. The target name "decimal" in _TARGET_NAMES is consistent with other entries.
152-179: LGTM! Decimal source types cover all supported formats.
The five decimal source types provide comprehensive coverage:
- col_source_decimal_pyobj: Python Decimal objects
- col_source_decimal32/64/128/256_arrow: Arrow decimal types of different bit widths
The inclusion in _PYOBJ_SOURCE_DESCR enables clear error messages.
249-272: LGTM! Target-to-source mappings are complete.
The _TARGET_TO_SOURCES mapping correctly includes all five decimal source types for the col_target_column_decimal target. The addition to _FIELD_TARGETS ensures decimal columns are recognized as field columns.
397-406: LGTM! Dispatch codes follow established patterns.
The five dispatch codes combining col_target_column_decimal with each decimal source type enable efficient routing in the serialization switch statement. This follows the same pattern used for other column types.
427-432: LGTM! Scale field addition is well-documented.
The scale field in col_t correctly stores the decimal scale for Arrow types. The comment clearly indicates it's only used for Arrow decimals and defaults to 0. uint8_t is sufficient for the 0-76 scale range.
956-979: LGTM! Arrow decimal type resolution is comprehensive.
The _dataframe_series_resolve_arrow function correctly:
- Handles all four Arrow decimal bit widths (32/64/128/256)
- Validates scale is within the supported 0-76 range
- Provides clear error messages with GitHub issue link
- Stores the scale for later use in serialization
1046-1047: LGTM! Decimal detection in object columns is correct.
The isinstance(<object>obj, Decimal) check properly identifies Decimal objects in pandas object-dtype columns and sets the appropriate source type. This is consistent with other type detection logic in _dataframe_series_sniff_pyobj.
1163-1164: LGTM! ArrowDtype support enables Arrow-backed decimal columns.
The handling of _PANDAS.ArrowDtype by delegating to _dataframe_series_resolve_arrow correctly enables support for Arrow-backed columns in pandas, including Arrow decimal types.
1300-1315: LGTM! Formatting improves readability of bitwise operations.
The reformatting of the bitwise validity checks with explicit parentheses and line breaks makes the bit manipulation logic clearer without changing behavior.
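For readers unfamiliar with the validity checks mentioned here, the following sketch shows the general shape of an Arrow validity-bitmap lookup in Python. The review does not quote the actual Cython code, so this is a generic illustration with a hypothetical function name; it relies only on the Arrow convention that bit i of the bitmap lives in byte i // 8 at position i % 8, least significant bit first.

```python
def is_valid(bitmap: bytes, index: int) -> bool:
    """Return True if row `index` is marked valid (non-null) in `bitmap`."""
    byte = bitmap[index >> 3]          # index // 8 selects the byte
    bit = (byte >> (index & 7)) & 1    # index % 8 selects the bit, LSB first
    return bool(bit)
```

With the bitmap bytes 0b00000101 and 0b00000010, rows 0, 2 and 9 are valid and rows 1 and 8 are null.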
2171-2210: LGTM! Python Decimal serialization handles all cases correctly.
The serialization function properly handles:
- Null values (returns early)
- Special values like NaN/Inf (sends as NULL to server)
- Mantissa size validation (127-byte limit)
- Error reporting via IngressError
The use of decimal_pyobj_to_binary encapsulates the complex decimal-to-binary conversion logic.
2456-2465: LGTM! Dispatch switch correctly routes all decimal types.
The five decimal dispatch cases properly route to their respective serialization functions. The GIL handling is correct:
- decimal_pyobj doesn't pass gs (requires GIL)
- Arrow variants pass gs (can release GIL)
This PR adds decimal support.
This is a tandem PR for:
Usage
Decimal object
Progress
Summary by CodeRabbit
New Features
Documentation
Bug Fixes