-
Notifications
You must be signed in to change notification settings - Fork 306
Preserve order for collections.OrderedDict
#1801
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
CodSpeed Performance ReportMerging #1801 will not alter performanceComparing Summary
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for fixing this!
src/input/input_python.rs
Outdated
} | ||
|
||
fn lax_dict<'a>(&'a self) -> ValResult<GenericPyMapping<'a, 'py>> { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you will need to fix strict_dict
too.
Probably rather than checking specifically for OrderedDict
you should return GenericPyMapping::Mapping
for all cases where it's a subclass of dict.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
alright I did that. For the strict_dict
this approach doesn't work since we explicitly check explicitly that an error is thrown if we hand over a Mapping
(see here), and it makes sense in strict mode to only allow dicts. To not break code (OrderedDicts were previously working, just didn't keep order), I added an explict check for OrderedDict
in the fashion I previously used for lax_dict
. Let me know what you think here
Thanks for the review I really appreciate it. Will look after the graalpy runs once I got that up and running properly on my system. Will also fix the linting errors. Would convert this PR to draft mode, to not disturb you on pushes and comments, but seems like you don't have this here. EDIT: will try to fix the performance regression as well. Since |
… into fix-12273-dict
Alright, so this is actually getting more difficult than I thought. The main point, is that graalpy has a bug when it comes to //lib.rs
use pyo3::prelude::*;
use pyo3::types::{PyMapping};
#[pyfunction]
fn iterate_as_mapping(_py: Python, obj: &Bound<'_, PyAny>) {
println!("[Rust] Received object: {:?}", obj);
let mapping = obj.downcast::<PyMapping>().unwrap();
let items = mapping.items().unwrap();
println!("[Rust] Got items from mapping.items(): {:?}", items);
}
#[pymodule]
fn test_graalpy(m: &Bound<'_, PyModule>) -> PyResult<()> {
m.add_function(wrap_pyfunction!(iterate_as_mapping, m)?)?;
Ok(())
} # dummy script to use the package
from collections import OrderedDict
import test_graalpy # name of the package
od = OrderedDict({'a': 1, 'b': 2})
od.move_to_end('a')
print(f"[Python] Keys order: {list(od.keys())}")
test_graalpy.iterate_as_mapping(od) This outputs: [Python] Keys order: ['b', 'a']
[Rust] Received object: OrderedDict({'b': 2, 'a': 1})
[Rust] Got items from mapping.items(): [('a', 1), ('b', 2)] What I found is that if we iterate in rust over the ordereddict before casting, order is preserved. So a dirty workaround would be to check if we have an ordereddict, if so, then create a new dict and iterate over the ordereddict and fill the new dict with its key-value pairs. I have a branch for this but it gets quite messy.
Would be very glad to get your input on this @davidhewitt |
src/input/input_python.rs
Outdated
|
||
fn lax_dict<'a>(&'a self) -> ValResult<GenericPyMapping<'a, 'py>> { | ||
if let Ok(dict) = self.downcast::<PyDict>() { | ||
if check_if_ordered_dict(self) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@davidhewitt when I implemented just downcasting to pymapping this resulted in huge performance dips (see here: https://codspeed.io/pydantic/pydantic-core/branches/CloseChoice%3Afix-12273-dict, commit 0e40c5c), therefore I went with this optimzed approach. I can simplify if desired though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you should do something like
if let Ok(dict) = self.downcast_exact() {
Ok(GenericPyMapping::Dict(dict))
} else if let Ok(mapping) = self.downcast() {
// i.e. treat all subclasses of dict as mappings
Ok(GenericPyMapping::Mapping))
}
@CloseChoice thanks for the contribution. I'm wondering if a more general solution fixing the general issue mentioned in pydantic/pydantic#12273 (comment) would make more sense. We are special casing Using {
│ 'type': 'model',
│ 'cls': <class '__main__.Model'>,
│ 'schema': {
│ │ 'type': 'model-fields',
│ │ 'fields': {
│ │ │ 'foo': {
│ │ │ │ 'type': 'model-field',
│ │ │ │ 'schema': {
│ │ │ │ │ 'type': 'lax-or-strict',
│ │ │ │ │ 'lax_schema': {
│ │ │ │ │ │ 'type': 'function-after',
│ │ │ │ │ │ 'function': {'type': 'no-info', 'function': <class 'collections.OrderedDict'>},
│ │ │ │ │ │ 'schema': {'type': 'dict', 'keys_schema': {'type': 'any'}, 'values_schema': {'type': 'any'}, 'strict': False}
│ │ │ │ │ },
│ │ │ │ │ 'strict_schema': {
│ │ │ │ │ │ 'type': 'chain',
│ │ │ │ │ │ 'steps': [
│ │ │ │ │ │ │ {
│ │ │ │ │ │ │ │ 'type': 'json-or-python',
│ │ │ │ │ │ │ │ 'json_schema': {'type': 'dict', 'keys_schema': {'type': 'any'}, 'values_schema': {'type': 'any'}, 'strict': False},
│ │ │ │ │ │ │ │ 'python_schema': {'type': 'is-instance', 'cls': <class 'collections.OrderedDict'>}
│ │ │ │ │ │ │ },
│ │ │ │ │ │ │ {
│ │ │ │ │ │ │ │ 'type': 'function-after',
│ │ │ │ │ │ │ │ 'function': {'type': 'no-info', 'function': <class 'collections.OrderedDict'>},
│ │ │ │ │ │ │ │ 'schema': {'type': 'dict', 'keys_schema': {'type': 'any'}, 'values_schema': {'type': 'any'}, 'strict': False}
│ │ │ │ │ │ │ }
│ │ │ │ │ │ ]
│ │ │ │ │ },
│ │ │ │ │ 'serialization': {
│ │ │ │ │ │ 'type': 'function-wrap',
│ │ │ │ │ │ 'function': <function GenerateSchema._mapping_schema.<locals>.<lambda> at 0x7eb4b8fe1620>,
│ │ │ │ │ │ 'info_arg': False,
│ │ │ │ │ │ 'schema': {'type': 'dict', 'keys_schema': {'type': 'any'}, 'values_schema': {'type': 'any'}, 'strict': False}
│ │ │ │ │ }
│ │ │ │ },
│ │ │ │ 'metadata': {}
│ │ │ }
│ │ },
│ │ 'model_name': 'Model',
│ │ 'computed_fields': []
│ },
│ 'config': {'title': 'Model'},
│ 'ref': '__main__.Model:784545184',
│ 'metadata': {'<stripped>'}
} Presumably we could try to fix things here if we already hit Python code? (And that would avoid the extra check in core, that may affect performance). |
src/input/input_python.rs
Outdated
if self.is_exact_instance_of::<PyDict>() { | ||
Ok(GenericPyMapping::Dict(self.downcast::<PyDict>()?)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NB downcast
!= downcast_exact
, so this now forbids all subclasses of dict
which are not OrderedDict
. See my other comment.
src/input/input_python.rs
Outdated
|
||
fn lax_dict<'a>(&'a self) -> ValResult<GenericPyMapping<'a, 'py>> { | ||
if let Ok(dict) = self.downcast::<PyDict>() { | ||
if check_if_ordered_dict(self) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you should do something like
if let Ok(dict) = self.downcast_exact() {
Ok(GenericPyMapping::Dict(dict))
} else if let Ok(mapping) = self.downcast() {
// i.e. treat all subclasses of dict as mappings
Ok(GenericPyMapping::Mapping))
}
assert exc_info.value.errors(include_url=False) == expected | ||
|
||
|
||
@pytest.mark.skipif( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note that we skip here due to the bug reported here: #1801 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm interesting, does GraalPy show the bug in a pure Python repro? Otherwise this might imply a PyO3 bug on GraalPy (or a GraalPy C API issue).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
havent managed to recreate this in pure python. So true, this might be a bug in pyo3, but as I understand it, pyo3 is calling the c api mapping code and this is where it goes wrong, therefore I assumed this is ok graalpy's c api, the mapping protocol is not directly exposed in python, so it makes sense that this cannot be reproduced there.
thanks for the comment and the insights. With the help of @davidhewitt we managed to get rid of special casing |
Change Summary
Check explicitly for ordereddict types and keep the order. If we use a generic mapping, the order is preserved. So we use a
PyOnceLock
to extract the type and compare against it. I tested different solutions and went with the most optimized one performance wise. This comes at the cost of ~15 lines of code. This can be removed though, see this commit for the less optimized/less code version.Related issue number
fixes pydantic/pydantic#12273
Checklist
pydantic-core
(except for expected changes)