@CloseChoice commented Sep 30, 2025

Change Summary

Check explicitly for OrderedDict types and preserve their order. When the value is handled as a generic mapping, the order is preserved, so we use a PyOnceLock to look up the OrderedDict type once and compare against it. I tested several solutions and went with the one that performed best. This costs about 15 extra lines of code, which can be removed; see this commit for the slower but shorter version.
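For context, here is a pure-Python illustration (standard library only) of what "preserving the order" means. After `move_to_end`, an OrderedDict's own iteration order no longer matches the order in which its keys were inserted, so a validation path that does not iterate through the OrderedDict itself risks returning the entries in a different order. Comparing two OrderedDicts is order-sensitive, while comparing plain dicts is not.

```python
from collections import OrderedDict

# Keys are inserted in the order 'a', 'b' ...
od = OrderedDict({"a": 1, "b": 2})
# ... but move_to_end changes the OrderedDict's logical order to 'b', 'a'.
od.move_to_end("a")

# Iterating through the OrderedDict itself respects its logical order.
keys = list(od)
items = list(od.items())
print(keys)   # ['b', 'a']
print(items)  # [('b', 2), ('a', 1)]

# OrderedDict equality is order-sensitive; plain dict equality is not.
print(od == OrderedDict([("a", 1), ("b", 2)]))  # False
print(dict(od) == {"a": 1, "b": 2})             # True
```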

Related issue number

fixes pydantic/pydantic#12273

Checklist

  • Unit tests for the changes exist
  • Documentation reflects the changes where applicable
  • Pydantic tests pass with this pydantic-core (except for expected changes)
  • My PR is ready to review, please add a comment including the phrase "please review" to assign reviewers

codecov bot commented Sep 30, 2025

Codecov Report

❌ Patch coverage is 0% with 4 lines in your changes missing coverage. Please review.

Files with missing lines | Patch % | Lines
src/input/input_python.rs | 0.00% | 4 Missing ⚠️

codspeed-hq bot commented Sep 30, 2025

CodSpeed Performance Report

Merging #1801 will not alter performance

Comparing CloseChoice:fix-12273-dict (939940d) with main (70bd6f9)

Summary

✅ 163 untouched

@davidhewitt (Contributor) left a comment

Thanks for fixing this!

}

fn lax_dict<'a>(&'a self) -> ValResult<GenericPyMapping<'a, 'py>> {

Contributor:

I think you will need to fix strict_dict too.

Probably rather than checking specifically for OrderedDict you should return GenericPyMapping::Mapping for all cases where it's a subclass of dict.
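The suggested dispatch can be sketched in Python as an analogy. `lax_dict_items` is a hypothetical helper, not pydantic-core code. Exact dicts take the fast path and are read directly, corresponding to GenericPyMapping::Dict. Every dict subclass, OrderedDict included, goes through its own items() and therefore keeps whatever order it defines, corresponding to GenericPyMapping::Mapping. Here `type(value) is dict` plays the role of an exact-type check.

```python
from collections import OrderedDict
from collections.abc import Mapping


def lax_dict_items(value):
    """Hypothetical Python analogue of the suggested lax_dict dispatch."""
    if type(value) is dict:
        # Exact dict: fast path that reads the dict directly
        # (analogous to GenericPyMapping::Dict).
        return list(dict.items(value))
    if isinstance(value, Mapping):
        # Dict subclasses (OrderedDict included) and other mappings go through
        # their own items(), so an overridden order is respected
        # (analogous to GenericPyMapping::Mapping).
        return list(value.items())
    raise TypeError(f"expected a mapping, got {type(value).__name__}")


od = OrderedDict({"a": 1, "b": 2})
od.move_to_end("a")
print(lax_dict_items(od))        # [('b', 2), ('a', 1)]
print(lax_dict_items({"x": 1}))  # [('x', 1)]
```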

Author:

Alright, I did that. For strict_dict this approach doesn't work, since we explicitly check that an error is raised if we pass a Mapping (see here), and in strict mode it makes sense to allow only dicts. To avoid breaking existing code (OrderedDicts were accepted before, they just didn't keep their order), I added an explicit check for OrderedDict, in the same way I previously did for lax_dict. Let me know what you think.
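A Python analogy of the strict-mode rule described above. `strict_dict_items` is a hypothetical helper, not the actual implementation. Only dicts are accepted and other Mapping types are rejected, while an OrderedDict is read through its own items() so that its order survives. Note that this sketch uses `isinstance(value, dict)`, so it keeps accepting every dict subclass rather than only exact dicts plus OrderedDict.

```python
from collections import OrderedDict
from types import MappingProxyType


def strict_dict_items(value):
    """Hypothetical analogue of strict-mode dict validation."""
    if type(value) is dict:
        # Exact dict: read it directly (fast path).
        return list(dict.items(value))
    if isinstance(value, dict):
        # Dict subclasses such as OrderedDict: use their own items()
        # so that an overridden order is respected.
        return list(value.items())
    # Strict mode: non-dict mappings are rejected, even though they are Mappings.
    raise TypeError(f"expected a dict in strict mode, got {type(value).__name__}")


od = OrderedDict({"a": 1, "b": 2})
od.move_to_end("a")
print(strict_dict_items(od))  # [('b', 2), ('a', 1)]

try:
    strict_dict_items(MappingProxyType({"a": 1}))
except TypeError as exc:
    print(exc)  # expected a dict in strict mode, got mappingproxy
```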

@CloseChoice (Author) commented Oct 1, 2025

Thanks for the review, I really appreciate it. I will look into the GraalPy runs once I have GraalPy up and running properly on my system, and I will also fix the linting errors. I would convert this PR to draft mode so as not to disturb you with pushes and comments, but it seems that option isn't available here.

EDIT: I will try to fix the performance regression as well. Since bef11c6 already included a fix, I think we can get this to work without degrading performance.

@CloseChoice (Author) commented Oct 2, 2025

Alright, this is getting more difficult than I thought. The main problem is that GraalPy has a bug: when an OrderedDict is cast to PyMapping, the order information gets lost. Here is an example:

// lib.rs
use pyo3::prelude::*;
use pyo3::types::PyMapping;

#[pyfunction]
fn iterate_as_mapping(obj: &Bound<'_, PyAny>) -> PyResult<()> {
    println!("[Rust] Received object: {:?}", obj);

    // Access the object through the mapping protocol
    let mapping = obj.downcast::<PyMapping>()?;

    let items = mapping.items()?;
    println!("[Rust] Got items from mapping.items(): {:?}", items);
    Ok(())
}

#[pymodule]
fn test_graalpy(m: &Bound<'_, PyModule>) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(iterate_as_mapping, m)?)?;
    Ok(())
}

# Python script that uses the compiled package
from collections import OrderedDict
import test_graalpy  # name of the package

od = OrderedDict({'a': 1, 'b': 2})
od.move_to_end('a')
print(f"[Python] Keys order: {list(od.keys())}")

test_graalpy.iterate_as_mapping(od)

This outputs:

[Python] Keys order: ['b', 'a']
[Rust] Received object: OrderedDict({'b': 2, 'a': 1})
[Rust] Got items from mapping.items(): [('a', 1), ('b', 2)]

What I found is that if we iterate over the OrderedDict in Rust before casting, the order is preserved. So a dirty workaround would be to check whether we have an OrderedDict and, if so, create a new dict, iterate over the OrderedDict and fill the new dict with its key-value pairs. I have a branch for this, but it gets quite messy.
There are two more options:

  • disable the affected tests for GraalPy, with a message indicating the bug
  • simply convert the OrderedDict to a dict in Python. Why don't we go that route? That would make preserving the order quite trivial, and no changes to any Rust code would be needed.
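Both the Rust-side workaround and the Python-side conversion boil down to rebuilding a plain dict by iterating the OrderedDict in its own order. A minimal Python sketch, where `to_plain_dict` is a hypothetical helper and not part of pydantic or pydantic-core:

```python
from collections import OrderedDict


def to_plain_dict(value):
    """Hypothetical pre-processing helper: rebuild an OrderedDict as a plain dict.

    Iterating the OrderedDict yields its logical order, and plain dicts keep
    insertion order (guaranteed since Python 3.7), so the order carries over.
    """
    if isinstance(value, OrderedDict):
        return {key: val for key, val in value.items()}
    return value


od = OrderedDict({"a": 1, "b": 2})
od.move_to_end("a")
plain = to_plain_dict(od)
print(type(plain).__name__, list(plain))  # dict ['b', 'a']
```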

I would be very glad to get your input on this, @davidhewitt.


fn lax_dict<'a>(&'a self) -> ValResult<GenericPyMapping<'a, 'py>> {
if let Ok(dict) = self.downcast::<PyDict>() {
if check_if_ordered_dict(self) {

Author:

@davidhewitt when I implemented plain downcasting to PyMapping, this resulted in large performance drops (see here: https://codspeed.io/pydantic/pydantic-core/branches/CloseChoice%3Afix-12273-dict, commit 0e40c5c), so I went with this optimized approach. I can simplify it if desired, though.

Contributor:

I think you should do something like

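if let Ok(dict) = self.downcast_exact::<PyDict>() {
    Ok(GenericPyMapping::Dict(dict))
} else if let Ok(mapping) = self.downcast::<PyMapping>() {
    // i.e. treat all subclasses of dict as mappings
    Ok(GenericPyMapping::Mapping(mapping))
} else {
    // neither a dict nor a mapping: keep the existing dict-type error
    Err(ValError::new(ErrorTypeDefaults::DictType, self))
}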

@Viicos (Member) commented Oct 3, 2025

@CloseChoice thanks for the contribution. I'm wondering whether a more general solution, addressing the underlying issue mentioned in pydantic/pydantic#12273 (comment), would make more sense. We are special-casing OrderedDict here, but I can see this happening in a number of other cases as well (e.g. a user-defined ordered dict).

Using OrderedDict also defines a different core schema, where Python validators are used. Here is the core schema of Model in the related issue:

{
│   'type': 'model',
│   'cls': <class '__main__.Model'>,
│   'schema': {
│   │   'type': 'model-fields',
│   │   'fields': {
│   │   │   'foo': {
│   │   │   │   'type': 'model-field',
│   │   │   │   'schema': {
│   │   │   │   │   'type': 'lax-or-strict',
│   │   │   │   │   'lax_schema': {
│   │   │   │   │   │   'type': 'function-after',
│   │   │   │   │   │   'function': {'type': 'no-info', 'function': <class 'collections.OrderedDict'>},
│   │   │   │   │   │   'schema': {'type': 'dict', 'keys_schema': {'type': 'any'}, 'values_schema': {'type': 'any'}, 'strict': False}
│   │   │   │   │   },
│   │   │   │   │   'strict_schema': {
│   │   │   │   │   │   'type': 'chain',
│   │   │   │   │   │   'steps': [
│   │   │   │   │   │   │   {
│   │   │   │   │   │   │   │   'type': 'json-or-python',
│   │   │   │   │   │   │   │   'json_schema': {'type': 'dict', 'keys_schema': {'type': 'any'}, 'values_schema': {'type': 'any'}, 'strict': False},
│   │   │   │   │   │   │   │   'python_schema': {'type': 'is-instance', 'cls': <class 'collections.OrderedDict'>}
│   │   │   │   │   │   │   },
│   │   │   │   │   │   │   {
│   │   │   │   │   │   │   │   'type': 'function-after',
│   │   │   │   │   │   │   │   'function': {'type': 'no-info', 'function': <class 'collections.OrderedDict'>},
│   │   │   │   │   │   │   │   'schema': {'type': 'dict', 'keys_schema': {'type': 'any'}, 'values_schema': {'type': 'any'}, 'strict': False}
│   │   │   │   │   │   │   }
│   │   │   │   │   │   ]
│   │   │   │   │   },
│   │   │   │   │   'serialization': {
│   │   │   │   │   │   'type': 'function-wrap',
│   │   │   │   │   │   'function': <function GenerateSchema._mapping_schema.<locals>.<lambda> at 0x7eb4b8fe1620>,
│   │   │   │   │   │   'info_arg': False,
│   │   │   │   │   │   'schema': {'type': 'dict', 'keys_schema': {'type': 'any'}, 'values_schema': {'type': 'any'}, 'strict': False}
│   │   │   │   │   }
│   │   │   │   },
│   │   │   │   'metadata': {}
│   │   │   }
│   │   },
│   │   'model_name': 'Model',
│   │   'computed_fields': []
│   },
│   'config': {'title': 'Model'},
│   'ref': '__main__.Model:784545184',
│   'metadata': {'<stripped>'}
}

Presumably we could try to fix things here if we already hit Python code? (And that would avoid the extra check in core, that may affect performance).
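One way to read "fix things here if we already hit Python code" is a wrap-style step around the dict validation that restores the input's key order afterwards. A minimal sketch under stated assumptions: `restore_input_order` is a hypothetical name, the `(value, handler)` shape imitates a no-info wrap validator, and keys are assumed to pass through validation unchanged (true for the `{'type': 'any'}` keys schema above). Because it iterates any Mapping through its own iteration, it would also cover user-defined ordered mappings.

```python
from collections import OrderedDict
from collections.abc import Mapping


def restore_input_order(value, handler):
    """Hypothetical wrap-style step: validate, then restore the input's key order.

    Assumes the handler returns a dict with the same keys as the input
    (i.e. keys are not transformed during validation).
    """
    validated = handler(value)
    if isinstance(value, Mapping):
        # Iterating the input itself yields keys in its own (logical) order.
        return OrderedDict((key, validated[key]) for key in value)
    return OrderedDict(validated)


def sorted_handler(value):
    # Stand-in for a dict validation that loses the original order
    # (here it returns keys sorted alphabetically and scales the values).
    return {key: value[key] * 10 for key in sorted(value)}


od = OrderedDict({"a": 1, "b": 2})
od.move_to_end("a")
result = restore_input_order(od, sorted_handler)
print(list(result.items()))  # [('b', 20), ('a', 10)]
```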

@Viicos changed the title from "Fix 12273 dict" to "Preserve order for collections.OrderedDict" on Oct 3, 2025
Comment on lines 427 to 428
if self.is_exact_instance_of::<PyDict>() {
Ok(GenericPyMapping::Dict(self.downcast::<PyDict>()?))

Contributor:

NB downcast != downcast_exact, so this now forbids all subclasses of dict which are not OrderedDict. See my other comment.
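In Python terms, the difference is the one between `isinstance(value, dict)`, which like downcast::<PyDict>() accepts subclasses, and `type(value) is dict`, which like downcast_exact or is_exact_instance_of accepts only dict itself. `MyDict` below is a hypothetical user-defined subclass:

```python
from collections import OrderedDict


class MyDict(dict):
    """Hypothetical user-defined dict subclass."""


results = {}
for value in ({}, OrderedDict(), MyDict()):
    name = type(value).__name__
    # isinstance: subclass-aware check, like downcast::<PyDict>()
    # type(...) is dict: exact check, like downcast_exact / is_exact_instance_of
    results[name] = (isinstance(value, dict), type(value) is dict)
    print(name, *results[name])
# dict True True
# OrderedDict True False
# MyDict True False
```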


assert exc_info.value.errors(include_url=False) == expected


@pytest.mark.skipif(

Author:

Note that we skip this test due to the GraalPy bug reported here: #1801 (comment)

Contributor:

Hmm interesting, does GraalPy show the bug in a pure Python repro? Otherwise this might imply a PyO3 bug on GraalPy (or a GraalPy C API issue).

Author:

I haven't managed to reproduce this in pure Python. True, this might be a bug in PyO3, but as I understand it, PyO3 calls the C API mapping functions and that is where it goes wrong, so I assumed the problem is in GraalPy's C API. The mapping protocol is not directly exposed in Python, so it makes sense that this cannot be reproduced there.

@CloseChoice (Author):

Thanks for the comment and the insights. With the help of @davidhewitt we managed to get rid of the special-casing of OrderedDict here. I hope this addresses your concerns.

Successfully merging this pull request may close these issues.

Key order not conserved in OrderedDicts
3 participants