RaphDal
@RaphDal RaphDal commented Oct 13, 2025

This PR adds decimal support.

This is a tandem PR for:

Usage

Decimal object

from decimal import Decimal
from questdb.ingress import TimestampNanos

# Assumes `sender` is an open questdb.ingress.Sender.
sender.row(
    'trades',
    symbols={
        'symbol': 'ETH-USD',
        'side': 'sell'},
    columns={
        'price': 2615.54,
        'amount': Decimal('0.00044')},
    at=TimestampNanos.now())
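Prefer constructing Decimal values from strings. `Decimal(0.00044)` converts the nearest binary double exactly, so it is not equal to `Decimal('0.00044')` and carries a far larger scale than the five fractional digits written. A quick standard-library check:

```python
from decimal import Decimal

from_float = Decimal(0.00044)  # exact value of the nearest binary double
from_str = Decimal('0.00044')  # exactly 44 * 10**-5

print(from_float == from_str)           # False
print(-from_str.as_tuple().exponent)    # 5, the intended scale
print(-from_float.as_tuple().exponent)  # a much larger scale
```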

Progress

  • support binary and text formats
  • update questdb version when decimal is released

Summary by CodeRabbit

  • New Features

    • Added support for Decimal (fixed-point) data types in dataframe serialization and ingestion APIs.
    • Users can now pass Decimal values when writing data to QuestDB.
  • Documentation

    • Added comprehensive documentation on Decimal datatype mappings across Pandas, NumPy, and PyArrow with usage examples.
  • Bug Fixes

    • Added new error code handling for invalid decimal operations.

@RaphDal RaphDal changed the title "feat: decimal support" to "feat: support decimal" Oct 17, 2025
@RaphDal
Author

RaphDal commented Oct 21, 2025

@CodeRabbit review

@coderabbitai
coderabbitai bot commented Oct 21, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai
coderabbitai bot commented Oct 21, 2025

Walkthrough

Adds comprehensive decimal type support to the QuestDB Python client, including protocol version 3, binary serialization paths for Pandas/Arrow decimals, an mpdecimal compatibility layer for internal decimal representation, updated type signatures, and corresponding test coverage across multiple integration points.

Changes

Cohort | File(s) | Summary
--- | --- | ---
Submodule and documentation | c-questdb-client, src/questdb/dataframe.md | Submodule reference updated to new commit; documentation expanded with Decimals section covering Pandas, NumPy, and PyArrow datatype mappings and examples.
Core decimal serialization support | src/questdb/dataframe.pxi | Implements decimal type handling throughout dataframe serialization: adds col_target_column_decimal target, decimal source variants (pyobj, arrow32/64/128/256), byte-swap utilities, Arrow decimal parsing and scale validation, and serialization cells for each decimal type variant.
mpdecimal compatibility layer | src/questdb/mpdecimal_compat.h, src/questdb/mpdecimal_compat.pxd | New header providing CPython/libmpdec compatibility with platform-dependent typedefs, mpd_t and PyDecObject struct definitions, inline accessors, and a Cython-level function to convert Decimal objects to ILP binary components with scale encoding.
Public API type signatures and error codes | src/questdb/ingress.pyi, src/questdb/ingress.pyx, src/questdb/line_sender.pxd | Extends Decimal support in Buffer.row and SenderTransaction.row type hints; adds IngressErrorCode.DecimalError and line_sender_error_invalid_decimal; introduces line_sender_buffer_column_dec_str and line_sender_buffer_column_dec functions; updates protocol version validation to accept v3; maps C error codes to Python.
Testing infrastructure | test/test.py, test/test_dataframe.py | Adds protocol version 3 validation tests, decimal payload decoding utilities, decimal-specific dataframe tests (pyobj, Arrow variants, special values), TestPandasProtocolVersionV3 class, and corrects test naming and error message expectations.

Sequence Diagram

sequenceDiagram
    participant User as User Code
    participant API as Buffer/SenderTransaction
    participant Parse as Decimal Parser
    participant Serialize as Serializer
    participant ILP as ILP Payload

    User->>API: row(columns={...Decimal...})
    API->>Parse: Detect Decimal type
    Parse->>Parse: Extract mpd_t from PyDecObject
    alt Decimal is NaN/Inf
        Parse-->>Serialize: Special handling (empty)
    else Decimal is normal
        Parse->>Parse: Build unscaled integer from digits
        Parse->>Parse: Calculate scale (-exponent)
        activate Serialize
        Serialize->>Serialize: Encode scale as uint
        Serialize->>Serialize: Convert unscaled to big-endian bytes
        Serialize->>ILP: Append decimal column data
        deactivate Serialize
    end
    ILP-->>User: Serialized ILP message
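The normal-value branch of the diagram can be sketched in pure Python with the standard `decimal` module. This is an illustration, not the PR's Cython code (which reads libmpdec limbs directly): the function name is hypothetical, the 0-76 scale limit and 127-byte mantissa limit are taken from the review comments below, and both the two's-complement encoding of the unscaled value and folding a positive exponent into the integer (scale 0) are assumptions.

```python
from decimal import Decimal
from typing import Optional, Tuple

MAX_SCALE = 76            # supported scale range per the review: 0-76
MAX_MANTISSA_BYTES = 127  # mantissa size limit per the review


def decimal_to_scale_and_mantissa(value: Decimal) -> Optional[Tuple[int, bytes]]:
    """Hypothetical helper: split a Decimal into (scale, big-endian mantissa).

    Returns None for NaN/Infinity, which the diagram sends as NULL.
    Assumes the mantissa is a signed two's-complement integer.
    """
    if not value.is_finite():
        return None
    sign, digits, exponent = value.as_tuple()
    unscaled = int(''.join(map(str, digits)))  # "Build unscaled integer from digits"
    if exponent > 0:
        # Assumption: a positive exponent is folded into the integer (scale 0).
        unscaled *= 10 ** exponent
        scale = 0
    else:
        scale = -exponent  # "Calculate scale (-exponent)"
    if scale > MAX_SCALE:
        raise ValueError(f'scale {scale} exceeds {MAX_SCALE}')
    if sign:
        unscaled = -unscaled
    # Enough bytes to hold the value in two's complement (not always minimal).
    length = (unscaled.bit_length() + 8) // 8
    mantissa = unscaled.to_bytes(length, 'big', signed=True)
    if len(mantissa) > MAX_MANTISSA_BYTES:
        raise ValueError(f'mantissa of {len(mantissa)} bytes is too large')
    return scale, mantissa
```

For example, `Decimal('0.00044')` yields scale 5 with the one-byte mantissa 0x2c (44), and `Decimal('-1.5')` yields scale 1 with 0xf1. One deliberate difference: per the review of mpdecimal_compat.pxd, the PR encodes zero as an empty mantissa, whereas this sketch emits a single 0x00 byte.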

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Rationale: Heterogeneous changes spanning new internal APIs (mpdecimal compatibility layer), protocol version bumps, multi-variant serialization paths (5 decimal source types), updated public signatures, error-code mappings, and cross-file dependencies. Dense logic in decimal conversion (scale handling, binary representation, flag extraction) and multiple interconnected updates to enum dispatch codes and target/source mappings require careful verification of consistency across layers. Substantial test additions validate behavior but add review scope.

Poem

🐰 Decimals dance in scales and signs,
Binary bytes in perfect lines,
From Python objects to Arrow streams,
Precision flows through all our schemes!
Protocol three makes room for more,
Decimal magic at our door!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name | Status | Explanation | Resolution
--- | --- | --- | ---
Docstring Coverage | ⚠️ Warning | Docstring coverage is 22.22%, which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage.

✅ Passed checks (2 passed)

Check name | Status | Explanation
--- | --- | ---
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check | ✅ Passed | The pull request title "feat: support decimal" directly and accurately reflects the primary objective of the changeset. The changes span multiple components (documentation, DataFrame serialization, type definitions, low-level C APIs, and test coverage), all unified around adding decimal type support to the QuestDB Python client. The title is concise, uses the conventional "feat:" prefix appropriately for a new feature, and is specific enough that a teammate reviewing the history would immediately understand the purpose of these changes without ambiguity.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/questdb/ingress.pyx (1)

1043-1049: Minor: fix valid-types error message (missing comma concatenates entries)

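Without a comma after 'datetime.datetime', Python's implicit string-literal concatenation merges it with 'numpy.ndarray' into the single entry 'datetime.datetimenumpy.ndarray'.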

Apply this diff:

-                'TimestampMicros',
-                'datetime.datetime'
-                'numpy.ndarray'))
+                'TimestampMicros',
+                'datetime.datetime',
+                'numpy.ndarray'))
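The bug is easy to reproduce in isolation: Python joins adjacent string literals at compile time, so the tuple silently loses an element. A minimal standalone check whose tuple contents mirror the diff above:

```python
broken = ('TimestampMicros',
          'datetime.datetime'
          'numpy.ndarray')
fixed = ('TimestampMicros',
         'datetime.datetime',
         'numpy.ndarray')

print(len(broken), broken[1])  # 2 datetime.datetimenumpy.ndarray
print(len(fixed))              # 3
```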
src/questdb/ingress.pyi (1)

1030-1039: Add Decimal to the columns type signature.

The Sender.row method's columns parameter is missing Decimal in its type union, while both Buffer.row (line 386) and SenderTransaction.row (line 207) include it. This inconsistency will cause type checking errors when users try to pass Decimal values to Sender.row.

Apply this diff to add Decimal to the type union:

     columns: Optional[
-            Dict[str, Union[bool, int, float, str, TimestampMicros, datetime, np.ndarray]]
+            Dict[str, Union[bool, int, float, str, TimestampMicros, datetime, np.ndarray, Decimal]]
     ] = None,
🧹 Nitpick comments (6)
src/questdb/ingress.pyx (1)

2414-2429: Doc nit: mention protocol version 3

Property doc explains v1 and v2 only. Consider adding a short note for v3 to avoid confusion.

src/questdb/dataframe.md (1)

96-106: Decimal docs read well

Clear coverage across pandas/NumPy/Arrow with examples; fits new tests. Consider adding the supported scale range (0–76) note here for completeness.

Also applies to: 129-157

src/questdb/ingress.pyi (1)

709-711: Clarify null representation for Decimal columns.

The table shows Y (NaN) for nulls in the Decimal row. However, NaN is typically associated with float types. For Decimal objects, nulls are represented as None or pandas.NA, not NaN. Consider changing this to just Y or Y (None) for clarity.

Apply this diff if you agree:

             * - ``'object'`` (``Decimal`` objects)
-              - Y (``NaN``)
+              - Y
               - ``DECIMAL``
src/questdb/mpdecimal_compat.h (1)

1-19: Document CPython version compatibility assumptions.

This compatibility layer relies on CPython's internal Decimal implementation details (struct layout and limb size). These internals may change between CPython versions. Consider:

  1. Adding a comment documenting which CPython versions are supported (e.g., 3.8+)
  2. Adding runtime checks in the Cython code to verify struct layout hasn't changed
  3. Noting in documentation that this is a best-effort compatibility layer

Example comment to add:

+/* 
+ * Compatibility layer for CPython's decimal module (libmpdec).
+ * Tested with CPython 3.8 through 3.12.
+ * May break with future CPython versions if internal Decimal layout changes.
+ */
+
 /* Determine the limb type used by CPython's libmpdec build. */
 #if SIZE_MAX == UINT64_MAX
src/questdb/dataframe.pxi (2)

59-73: Add comments explaining byte-swap usage for Arrow decimals.

The bswap32 and bswap64 functions are used later for Arrow decimal types (lines 2226, 2245, etc.), but it's not immediately clear why byte-swapping is needed. Arrow stores decimal values as little-endian two's-complement integers, while the ILP binary decimal format expects big-endian bytes.

Add a comment explaining the endianness conversion:

+# Arrow decimal types store values as little-endian two's-complement integers.
+# These functions convert them to the big-endian order used by the ILP binary format.
 cdef inline uint32_t bswap32(uint32_t value):

2213-2295: LGTM! Arrow decimal serialization correctly handles all bit widths.

All four Arrow decimal serialization functions properly:

  • Check Arrow validity bitmaps
  • Send NULL for invalid values
  • Perform correct byte-swapping for endianness conversion
  • Use the stored scale from column metadata

The 128-bit and 256-bit handlers correctly swap both byte order within each 64-bit word and reverse the word order.

Optional: Consider a helper function to reduce duplication.

The four functions have similar structure. You could extract common logic:

cdef void_int _arrow_decimal_to_bytes(
        col_t* col, 
        size_t byte_count,
        uint64_t* out_buffer,
        bint* valid_out) noexcept nogil:
    """Extract and byte-swap Arrow decimal to output buffer."""
    # Common extraction and swapping logic

This would reduce duplication and make maintenance easier, though the current approach is also acceptable.
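In the same spirit, the swap the four handlers perform can be expressed once for every width. The pure-Python sketch below (the function name is hypothetical, not part of the PR) converts an Arrow decimal value, stored as a little-endian two's-complement integer of 4, 8, 16 or 32 bytes, to big-endian by swapping bytes inside each 64-bit word and reversing the word order; the checks confirm this equals a plain full byte reversal.

```python
import struct


def arrow_decimal_le_to_be(raw: bytes) -> bytes:
    """Hypothetical helper: little-endian Arrow decimal bytes -> big-endian."""
    if len(raw) == 4:
        # decimal32: a single 32-bit byte swap (cf. bswap32)
        (word,) = struct.unpack('<I', raw)
        return struct.pack('>I', word)
    if len(raw) not in (8, 16, 32):
        raise ValueError(f'unsupported decimal width: {len(raw)} bytes')
    # decimal64/128/256: read little-endian 64-bit words (least significant
    # first), then emit each word big-endian, most significant word first.
    words = struct.unpack(f'<{len(raw) // 8}Q', raw)
    return b''.join(struct.pack('>Q', w) for w in reversed(words))
```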

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 996b251 and 1d4c69f.

📒 Files selected for processing (10)
  • c-questdb-client (1 hunks)
  • src/questdb/dataframe.md (1 hunks)
  • src/questdb/dataframe.pxi (17 hunks)
  • src/questdb/ingress.pyi (7 hunks)
  • src/questdb/ingress.pyx (5 hunks)
  • src/questdb/line_sender.pxd (3 hunks)
  • src/questdb/mpdecimal_compat.h (1 hunks)
  • src/questdb/mpdecimal_compat.pxd (1 hunks)
  • test/test.py (4 hunks)
  • test/test_dataframe.py (5 hunks)
🧰 Additional context used
🪛 Clang (14.0.6)
src/questdb/mpdecimal_compat.h

[error] 4-4: 'Python.h' file not found

(clang-diagnostic-error)

🪛 Ruff (0.14.1)
test/test_dataframe.py

90-90: Avoid specifying long messages outside the exception class

(TRY003)


93-93: Avoid specifying long messages outside the exception class

(TRY003)


95-95: Avoid specifying long messages outside the exception class

(TRY003)


97-97: Avoid specifying long messages outside the exception class

(TRY003)


102-103: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (15)
  • GitHub Check: questdb.py-questdb-client (1) (cibuildwheel linux_arm64)
  • GitHub Check: questdb.py-questdb-client (1) (cibuildwheel windows_x86_64)
  • GitHub Check: questdb.py-questdb-client (1) (cibuildwheel macos_x64)
  • GitHub Check: questdb.py-questdb-client (1) (cibuildwheel linux_x64_pypy)
  • GitHub Check: questdb.py-questdb-client (1) (cibuildwheel windows_i686)
  • GitHub Check: questdb.py-questdb-client (1) (cibuildwheel linux_x64_cpython_musllinux)
  • GitHub Check: questdb.py-questdb-client (1) (cibuildwheel linux_x64_cpython_manylinux_x86_64)
  • GitHub Check: questdb.py-questdb-client (1) (cibuildwheel start_linux_arm64_agent_aws)
  • GitHub Check: questdb.py-questdb-client (Building and testing on windows-msvc-2019)
  • GitHub Check: questdb.py-questdb-client (Building and testing on mac)
  • GitHub Check: questdb.py-questdb-client (Building and testing on linux-qdb-master)
  • GitHub Check: questdb.py-questdb-client (Building and testing on linux-old-pandas)
  • GitHub Check: questdb.py-questdb-client (Building and testing on linux)
  • GitHub Check: questdb.py-questdb-client (Building and testing TestsAgainstVariousNumpyVersion2x)
  • GitHub Check: questdb.py-questdb-client (Building and testing TestsAgainstVariousNumpyVersion1x)
🔇 Additional comments (29)
c-questdb-client (1)

1-1: Submodule reference update requires verification of C extension changes.

This file contains only a pointer update to the C extension submodule. The actual Decimal support implementation in the C extension (questdb/c-questdb-client) is not accessible for review from this context.

Given that the broader PR adds significant Decimal support across the Python wrapper (dataframe serialization, ILP ingestion, type signatures), ensure that:

  1. The C extension at commit 5b17715... includes corresponding Decimal serialization/deserialization logic.
  2. The binary protocol changes (if any) are compatible with the Python-side changes.
  3. The submodule commit has been tested with the tandem QuestDB core PR (questdb/questdb#6068).

Note: This PR is marked "DO NOT MERGE" and depends on upstream changes.

To verify C extension compatibility, you may want to:

  • Inspect the C extension diff at the target commit to ensure Decimal support aligns with the Python wrapper changes.
  • Confirm that protocol version 3 support (mentioned in the AI summary) is implemented in the C extension.
  • Verify integration tests pass with the updated submodule.
test/test.py (3)

45-49: V3 pandas tests import looks good

Keeps suite discoverable only when pandas is present.


415-425: Protocol-version validation updates are correct

Treating 3 as valid and 4/'4' as invalid with the updated error text matches the new Sender/Buffer checks.

If CI still runs the “unsupported client for V3” test, please confirm it’s updated (or gated) to reflect that the client now supports V3.

Also applies to: 430-432


1478-1479: Public name fix for V2

Renaming to “protocol version 2” is consistent with the class.

test/test_dataframe.py (3)

84-121: Decimal payload helpers are solid

Helpers make intent clear and align with the binary format used in assertions.
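For checking such assertions by hand, the inverse mapping is short. A sketch of a decoder (hypothetical name, not the actual helpers in test/test_dataframe.py), assuming the payload carries a scale plus a big-endian two's-complement mantissa, and treating an empty mantissa as zero as described for the pyobj path below:

```python
from decimal import Decimal


def decode_decimal_payload(scale: int, mantissa: bytes) -> Decimal:
    """Hypothetical helper: rebuild a Decimal from (scale, big-endian mantissa)."""
    unscaled = int.from_bytes(mantissa, 'big', signed=True)  # b'' -> 0
    digits = tuple(int(c) for c in str(abs(unscaled)))
    # The tuple constructor is exact; no context rounding is applied.
    return Decimal((1 if unscaled < 0 else 0, digits, -scale))
```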


570-585: Comprehensive decimal test coverage

Covers pyobj decimals (incl. special values) and Arrow decimals across widths; version-gated appropriately.

Also applies to: 586-597, 598-608, 609-646


1705-1709: Updated error-message regex

The new wording (“Unsupported arrow type …”) matches current behavior.

src/questdb/mpdecimal_compat.pxd (1)

24-71: Decimal → ILP conversion helper looks correct

  • Handles NaN/Inf as nulls.
  • Builds unscaled integer from mpd limbs correctly (LE limbs × MPD_RADIX).
  • Enforces max scale 76 and applies sign.

One note: zero encodes to an empty mantissa (length 0), which matches the tests’ “special values” treatment; confirm your wire format also expects empty mantissa for numeric zero, or adjust to emit a single 0x00 byte.
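The limb arithmetic being checked here is positional notation in base `MPD_RADIX`, least significant limb first. A pure-Python sketch, assuming a 64-bit libmpdec build where `MPD_RADIX` is 10**19 (32-bit builds use 10**9); the function names are hypothetical:

```python
from typing import List

MPD_RADIX = 10 ** 19  # assumption: 64-bit libmpdec build


def limbs_to_int(limbs: List[int], radix: int = MPD_RADIX) -> int:
    """Combine little-endian limbs (least significant first) into one integer."""
    result = 0
    for limb in reversed(limbs):
        result = result * radix + limb
    return result


def int_to_limbs(value: int, radix: int = MPD_RADIX) -> List[int]:
    """Split a non-negative integer into little-endian limbs."""
    limbs = []
    while True:
        value, limb = divmod(value, radix)
        limbs.append(limb)
        if value == 0:
            return limbs
```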

src/questdb/ingress.pyx (1)

1241-1260: Decimal support is limited to dataframe() path, not row(); verify PR description scope and consider scoping Decimal to dataframe only

The review comment is accurate. After examining the codebase:

  • Buffer.row() columns parameter type hint excludes Decimal (only: bool, int, float, str, TimestampMicros, TimestampNanos, datetime.datetime, numpy.ndarray)
  • Decimal support via decimal_pyobj_to_binary is implemented only in dataframe.pxi
  • No _column_decimal method exists in the Buffer class; only _column_bool, _column_i64, _column_f64, _column_str, _column_ts_micros, _column_ts_nanos, _column_numpy
  • Decimal in dataframe requires protocol v3 (tests skip for version < 3)

If the PR description shows sender.row(... Decimal(...)), the documentation/example is inconsistent with the implementation. Either add Decimal support to row() (requiring _column_decimal and protocol v3 guard) or scope examples/docs to dataframe-only.

src/questdb/line_sender.pxd (2)

43-56: LGTM! Protocol version and error code additions are well-structured.

The addition of line_sender_error_invalid_decimal and line_sender_protocol_version_3 follows existing conventions and provides the necessary foundation for Decimal support.


268-282: LGTM! Decimal buffer functions follow established patterns.

The two new functions line_sender_buffer_column_dec_str and line_sender_buffer_column_dec are well-designed:

  • Consistent with existing column buffer functions
  • Support both text (string) and binary formats
  • Include proper error handling via err_out parameter
src/questdb/ingress.pyi (4)

43-61: LGTM! Import and error code additions are correct.

The import of Decimal and the addition of DecimalError to the IngressErrorCode enum are necessary for type checking support.


207-207: LGTM! Type signature correctly includes Decimal.

The addition of Decimal to the columns parameter type union in SenderTransaction.row enables proper type checking for decimal column values.


386-386: LGTM! Type signature correctly includes Decimal.

The addition of Decimal to the columns parameter type union in Buffer.row enables proper type checking for decimal column values.


407-456: LGTM! Documentation clearly illustrates Decimal usage.

The example usage and type mapping table additions help users understand:

  • How to pass Decimal values in the columns dict
  • The mapping from Python Decimal to ILP DECIMAL type
src/questdb/mpdecimal_compat.h (3)

21-35: Add runtime validation for struct layout assumptions.

The mpd_t and PyDecObject struct definitions assume a specific memory layout that matches CPython's internal implementation. If CPython changes these internals, this code will silently produce incorrect results or crash.

Consider adding runtime checks in the Cython initialization code (e.g., in mpdecimal_compat.pxd or module init) to verify:

  1. Size of Python Decimal objects matches expectations
  2. Basic sanity checks on extracted values (e.g., comparing against decimal module's official API)

Example validation approach:

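from decimal import Decimal

# At module initialization: cross-check the compatibility layer against
# the public decimal API so that a layout change fails loudly.
test_decimal = Decimal("123.45")
expected = test_decimal.as_tuple()  # sign=0, digits=(1, 2, 3, 4, 5), exponent=-2
# actual = <unscaled value and scale from the mpdecimal_compat path>
# assert actual == (12345, 2), i.e. the joined digits and -exponent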

This would catch breaking changes early rather than producing silent corruption.


37-44: LGTM! Accessor functions correctly handle inline vs heap storage.

The decimal_digits() function properly handles both storage modes:

  • Heap-allocated: uses dec->dec.data
  • Inline (small decimals): uses dec->data[4]

This matches CPython's optimization for small decimal values.


46-54: LGTM! Flag definitions match mpdecimal constants.

The flag enum and MPD_RADIX constant definitions are correct and consistent with libmpdec's public interface.

src/questdb/dataframe.pxi (11)

96-110: LGTM! Enum additions for decimal target are correct.

The addition of col_target_column_decimal = 9 and updating col_target_at = 10 maintains the enum sequence. The target name "decimal" in _TARGET_NAMES is consistent with other entries.


152-179: LGTM! Decimal source types cover all supported formats.

The five decimal source types provide comprehensive coverage:

  • col_source_decimal_pyobj: Python Decimal objects
  • col_source_decimal32/64/128/256_arrow: Arrow decimal types of different bit widths

The inclusion in _PYOBJ_SOURCE_DESCR enables clear error messages.


249-272: LGTM! Target-to-source mappings are complete.

The _TARGET_TO_SOURCES mapping correctly includes all five decimal source types for the col_target_column_decimal target. The addition to _FIELD_TARGETS ensures decimal columns are recognized as field columns.


397-406: LGTM! Dispatch codes follow established patterns.

The five dispatch codes combining col_target_column_decimal with each decimal source type enable efficient routing in the serialization switch statement. This follows the same pattern used for other column types.


427-432: LGTM! Scale field addition is well-documented.

The scale field in col_t correctly stores the decimal scale for Arrow types. The comment clearly indicates it's only used for Arrow decimals and defaults to 0. uint8_t is sufficient for the 0-76 scale range.


956-979: LGTM! Arrow decimal type resolution is comprehensive.

The _dataframe_series_resolve_arrow function correctly:

  • Handles all four Arrow decimal bit widths (32/64/128/256)
  • Validates scale is within the supported 0-76 range
  • Provides clear error messages with GitHub issue link
  • Stores the scale for later use in serialization
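The resolution step amounts to a width lookup plus a range check. A pure-Python sketch of that logic, with the function hypothetical and the source-variant names and 0-76 range taken from this review; it takes the bit width and scale as plain integers (with pyarrow these could come from a decimal type's `bit_width` and `scale` attributes):

```python
from typing import Tuple

_ARROW_DECIMAL_SOURCES = {
    32: 'col_source_decimal32_arrow',
    64: 'col_source_decimal64_arrow',
    128: 'col_source_decimal128_arrow',
    256: 'col_source_decimal256_arrow',
}


def resolve_arrow_decimal(bit_width: int, scale: int) -> Tuple[str, int]:
    """Hypothetical helper: pick the decimal source variant and validate the scale."""
    try:
        source = _ARROW_DECIMAL_SOURCES[bit_width]
    except KeyError:
        raise TypeError(f'Unsupported arrow decimal width: {bit_width}') from None
    if not 0 <= scale <= 76:
        raise ValueError(f'Decimal scale {scale} is outside the supported range 0-76')
    # The scale travels with the column (cf. the uint8_t scale field in col_t).
    return source, scale
```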

1046-1047: LGTM! Decimal detection in object columns is correct.

The isinstance(<object>obj, Decimal) check properly identifies Decimal objects in pandas object-dtype columns and sets the appropriate source type. This is consistent with other type detection logic in _dataframe_series_sniff_pyobj.


1163-1164: LGTM! ArrowDtype support enables Arrow-backed decimal columns.

The handling of _PANDAS.ArrowDtype by delegating to _dataframe_series_resolve_arrow correctly enables support for Arrow-backed columns in pandas, including Arrow decimal types.


1300-1315: LGTM! Formatting improves readability of bitwise operations.

The reformatting of the bitwise validity checks with explicit parentheses and line breaks makes the bit manipulation logic clearer without changing behavior.


2171-2210: LGTM! Python Decimal serialization handles all cases correctly.

The serialization function properly handles:

  • Null values (returns early)
  • Special values like NaN/Inf (sends as NULL to server)
  • Mantissa size validation (127-byte limit)
  • Error reporting via IngressError

The use of decimal_pyobj_to_binary encapsulates the complex decimal-to-binary conversion logic.


2456-2465: LGTM! Dispatch switch correctly routes all decimal types.

The five decimal dispatch cases properly route to their respective serialization functions. The GIL handling is correct:

  • decimal_pyobj doesn't pass gs (requires GIL)
  • Arrow variants pass gs (can release GIL)
