arseny114

Six functions have been developed to extract information from the pages of the RUM index:

  1. rum_metapage_info() -- examines the information stored on the meta page (flags: {meta}).
  2. rum_page_opaque_info() -- examines the information stored in the opaque area of an index page (any index page).
  3. rum_leaf_data_page_items() -- examines the information stored on the leaf pages of the posting tree (flags: {leaf, data}).
  4. rum_internal_data_page_items() -- examines the information stored on the internal pages of the posting tree (flags: {data}).
  5. rum_leaf_entry_page_items() -- examines the information stored on the leaf pages of the entry tree (flags: {leaf}).
  6. rum_internal_entry_page_items() -- examines the information stored on the internal pages of the entry tree (flags: {}).

Each of these functions takes the index name and the page number as arguments.
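
The flag sets listed above suggest a simple rule for choosing the function that matches a given page. The sketch below is an illustration only: the helper name rum_debug_func_for_flags() and the SK_* bit values are assumptions made for this example (they are not taken from rum.h), and it presumes that the page flags have already been obtained, for instance from the output of rum_page_opaque_info().

```c
#include <stdint.h>

/* Assumed flag bits for this sketch only; the real values live in rum.h. */
#define SK_DATA (1 << 0)
#define SK_LEAF (1 << 1)
#define SK_META (1 << 2)

/*
 * Hypothetical helper: map a page's flag set to the name of the function
 * from the list above that is meant for that kind of page.
 */
const char *
rum_debug_func_for_flags(uint16_t flags)
{
    if (flags & SK_META)
        return "rum_metapage_info";                 /* {meta} */

    switch (flags & (SK_DATA | SK_LEAF))
    {
        case SK_DATA | SK_LEAF:
            return "rum_leaf_data_page_items";      /* {leaf, data} */
        case SK_DATA:
            return "rum_internal_data_page_items";  /* {data} */
        case SK_LEAF:
            return "rum_leaf_entry_page_items";     /* {leaf} */
        default:
            return "rum_internal_entry_page_items"; /* {} */
    }
}
```

rum_page_opaque_info() is left out of the mapping because it applies to any index page regardless of its flags.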

Tags: rum

arseny114 changed the title from "Added functions for exploring the pages of the rum index." to "[PGPRO-12159] Added functions for exploring the pages of the rum index." on May 13, 2025
arseny114 force-pushed the PGPRO-12159 branch 3 times, most recently from d13ae48 to 63ecfd5, on May 15, 2025 09:01
arseny114 force-pushed the PGPRO-12159 branch 6 times, most recently from e1ddf10 to 94bf9c1, on July 3, 2025 11:45
arseny114 force-pushed the PGPRO-12159 branch 4 times, most recently from f6af174 to fd282ce, on July 23, 2025 13:53
arseny114 force-pushed the PGPRO-12159 branch 6 times, most recently from af94bb0 to ec20040, on September 23, 2025 07:32
arseny114 and others added 9 commits October 7, 2025 18:35
This commit adds three functions for low-level exploration
of RUM index pages:

1) rum_metapage_info() -- examines the information
stored on the meta page (flags: {meta}).
2) rum_page_opaque_info() -- examines the information
stored in the opaque area of an index page (any
index page).
3) rum_leaf_data_page_items() -- examines the
information stored on the leaf pages of the
posting tree (flags: {leaf, data}).

Each of these functions takes the index name and
the page number as arguments.

Tags: rum
1) rum_internal_data_page_items() -- examines the information
   stored on the internal pages of the posting tree (flags: {data}).
2) rum_leaf_entry_page_items() -- examines the information
   stored on the leaf pages of the entry tree (flags: {leaf}).
3) rum_internal_entry_page_items() -- examines the information
   stored on the internal pages of the entry tree (flags: {}).

Each of these functions takes the index name and the page number
as arguments.

Tags: rum
If an index is created with the rum_tsvector_ops operator class,
the positions of the lexemes are saved as additional information.
The positions are stored in compressed form as bytea.

There is a known problem: in the posting tree, additional information
for the high keys is stored in a different way, so it has not yet been
possible to output it. In all other cases, the additional information
is output correctly.

Tags: rum
If the index is created with the appropriate operator class,
then in addition to the positions of the lexemes, weights (A, B, C, D)
are also stored in the additional information. Their output has been added.
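
As an illustration of how a position and its weight can be printed together, here is a minimal sketch. It assumes that each decompressed entry is a regular 16-bit tsvector position word (weight in the top 2 bits, position in the low 14 bits, the same layout as WEP_GETWEIGHT and WEP_GETPOS in PostgreSQL's ts_type.h). How RUM lays out the compressed additional information is not described here, and the SK_* macros and sk_* functions are hypothetical names used only in this example.

```c
#include <stdint.h>
#include <stdio.h>

/* Same bit layout as WEP_GETWEIGHT / WEP_GETPOS in PostgreSQL's ts_type.h. */
#define SK_GETWEIGHT(x) ((x) >> 14)
#define SK_GETPOS(x)    ((x) & 0x3fff)

/* Weight codes as used by tsvector: 3 = A, 2 = B, 1 = C, 0 = D. */
char
sk_weight_letter(uint16_t wep)
{
    static const char letters[] = {'D', 'C', 'B', 'A'};

    return letters[SK_GETWEIGHT(wep)];
}

/*
 * Hypothetical formatter: writes the position followed by the weight
 * letter (for example "5A") into buf and returns snprintf's result.
 */
int
sk_format_pos(uint16_t wep, char *buf, size_t buflen)
{
    return snprintf(buf, buflen, "%u%c",
                    (unsigned) SK_GETPOS(wep), sk_weight_letter(wep));
}
```

The output format is chosen for the example only; the functions added in this commit may present weights differently.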

In addition, Asserts have been added to the find_add_info_atr_num() and
find_add_info_oid() functions to check that the index contains at most
one type of additional information.

Tags: rum
The crashes occurred because the construct_array_builtin()
function is not defined in PostgreSQL versions below 16.

Tags: rum
The crashes occurred because the errdetail_relkind_not_supported()
function is not defined in PostgreSQL versions below 15.

Tags: rum
The crashes occurred because MONEYOID
is not defined in PostgreSQL versions below 14.
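
One common way to handle symbols that exist only in newer server versions is a set of preprocessor guards. The sketch below illustrates that pattern for the three symbols mentioned in these commit messages; it is not necessarily how the commits fix the problem. The SK_* macros and sk_array_builder_name() are hypothetical names, and PG_VERSION_NUM is given a stand-in value so that the fragment compiles outside a PostgreSQL build.

```c
/*
 * Stand-in so the sketch compiles on its own; a real build gets
 * PG_VERSION_NUM from pg_config.h.
 */
#ifndef PG_VERSION_NUM
#define PG_VERSION_NUM 130000
#endif

/* construct_array_builtin() is not defined on versions below 16. */
#if PG_VERSION_NUM >= 160000
#define SK_HAVE_CONSTRUCT_ARRAY_BUILTIN 1
#else
#define SK_HAVE_CONSTRUCT_ARRAY_BUILTIN 0
#endif

/* errdetail_relkind_not_supported() is not defined on versions below 15. */
#if PG_VERSION_NUM >= 150000
#define SK_HAVE_ERRDETAIL_RELKIND 1
#else
#define SK_HAVE_ERRDETAIL_RELKIND 0
#endif

/* MONEYOID is not defined on versions below 14; 790 is the money type OID. */
#ifndef MONEYOID
#define MONEYOID 790
#endif

/* Reports which array constructor a build with these guards would call. */
const char *
sk_array_builder_name(void)
{
#if SK_HAVE_CONSTRUCT_ARRAY_BUILTIN
    return "construct_array_builtin";
#else
    return "construct_array";
#endif
}
```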

Tags: rum
Added functions for checks. Added auxiliary
functions to reduce code duplication.

Tags: rum
Added a universal auxiliary function for scanning
both leaf and internal pages of the posting tree.
Added an auxiliary function for reading the
high key from the pages of the posting tree.

Tags: rum
Arseny Kositsyn added 9 commits October 7, 2025 18:35
Tests for rum_debug_funcs are added for enterprise only.

Tags: rum
The code has been refactored: it is now divided
into more logical blocks using new functions
and macros with clear names.

Tags: rum
Previously, it was assumed that a RUM index could not contain
attributes with different types of additional information (and some
functions from PGPRO-12159 relied on this), but this is possible. If
an index is created on two columns of the tsvector type and another
column is attached to one of them, the first tsvector has the
attached attribute as its additional information, while the second
one has positions.

To fix this, the functions that find the attribute number
of the key for which the posting tree was built have been redesigned.
They now scan the index and look up the key (and its attribute
number) for the page being examined.

Tags: rum
The regression tests have been removed.

Tags: rum