@asinghvi17 asinghvi17 commented Oct 10, 2025

Adds a Julia implementation of the benchmarks, modeled on the GeoPandas code, using the JuliaGeo/GeometryOps.jl ecosystem.

@jiayuasu
Member

great work!!!

@asinghvi17
Author

I was trying to test this against the Python output, but it seems that the Python code no longer works - it throws errors about GeoSeries not existing. Maybe I got the wrong version of Geopandas? Would someone happen to know what I could do to fix?

@asinghvi17
Author

asinghvi17 commented Oct 13, 2025

Also, I just noticed that these Parquet files are not GeoParquet. Would it make sense to make them GeoParquet files as well? Then it's a bit easier to load on the Julia side. I think it's just a bit of metadata changes for them (but @evetion is the expert here).
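
As a rough illustration of the metadata change in question, here is a minimal Python sketch of the file-level `geo` entry that GeoParquet 1.0 expects. The column name `geometry`, the `Point` geometry type, and the assumption that the column already holds WKB are illustrative only and are not taken from the SpatialBench files.

```python
import json


def build_geo_metadata(geometry_column="geometry", geometry_types=None):
    """Build the JSON string stored under the file-level "geo" key.

    GeoParquet 1.0 requires "version", "primary_column" and "columns";
    each column entry requires "encoding" and "geometry_types".
    An empty "geometry_types" list means the types are not known.
    """
    meta = {
        "version": "1.0.0",
        "primary_column": geometry_column,
        "columns": {
            geometry_column: {
                # Assumption: the column already contains WKB-encoded geometries.
                "encoding": "WKB",
                "geometry_types": geometry_types or [],
            }
        },
    }
    return json.dumps(meta)


# Hypothetical column name and geometry type, for illustration only.
geo_json = build_geo_metadata("geometry", ["Point"])
print(geo_json)
```

With pyarrow, a string like this could be merged into the existing schema metadata under the key `geo` and attached with `Table.replace_schema_metadata` before rewriting the file. That step is left out here so the sketch stays dependency-free.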

@jiayuasu
Member

@prantogg please advise. thanks!

@jiayuasu
Member

jiayuasu commented Oct 13, 2025

@asinghvi17 We intentionally avoided using GeoParquet in SpatialBench v0.1.0 for the following reasons:

  1. Parquet Geo type adoption: We decided to wait until the Parquet Geo type gains broader adoption. Our goal is to use the Parquet Geo type instead of GeoParquet 1.0 / 1.1. The Sedona team is one of the main driving forces behind GeoParquet / Parquet Geo, and we are currently implementing Parquet Geo support in multiple languages, including Rust. We prefer Parquet Geo because it supports both Geometry and Geography types, which provides better flexibility than GeoParquet.

  2. Avoiding spatial pruning effects: We intentionally skipped writing spatial statistics in v0.1.0 to avoid the impact of Parquet’s data pruning. The spatial ordering of features within files can significantly affect pruning performance, and we didn’t want this factor to influence benchmark comparability. Handling data sorting and spatial locality is planned for a future release.

Does that make sense?
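
To make the second point concrete, here is a toy Python model of bounding-box pruning. It is only a sketch: the "row groups" are plain list slices, the per-group statistics are simple min/max boxes, and a lexicographic sort on (x, y) stands in for a real spatial ordering. None of this reflects how SpatialBench or any particular Parquet reader implements pruning.

```python
import random


def bbox(points):
    """Min/max box of a group of (x, y) points, like per-row-group statistics."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def intersects(a, b):
    """True if two (xmin, ymin, xmax, ymax) boxes overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def row_groups_read(points, query, group_size):
    """Count the groups whose box overlaps the query, i.e. cannot be skipped."""
    groups = [points[i:i + group_size] for i in range(0, len(points), group_size)]
    return sum(1 for g in groups if intersects(bbox(g), query))


rng = random.Random(0)  # fixed seed so the run is repeatable
points = [(rng.random(), rng.random()) for _ in range(10_000)]
query = (0.0, 0.0, 0.1, 0.1)  # small window in one corner of the unit square

# Same data, same query, same statistics; only the row order differs.
shuffled_reads = row_groups_read(points, query, 500)
sorted_reads = row_groups_read(sorted(points), query, 500)
print("row groups read (random order):", shuffled_reads)
print("row groups read (sorted order):", sorted_reads)
```

In random order every group's box spans nearly the whole extent, so almost nothing can be skipped. After sorting, only the few groups near the query window are read. The same dataset can therefore produce very different scan costs depending on how it was written, which is the comparability concern described above.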

@prantogg
Contributor

> I was trying to test this against the Python output, but it seems that the Python code no longer works - it throws errors about GeoSeries not existing. Maybe I got the wrong version of Geopandas? Would someone happen to know what I could do to fix?

@asinghvi17 Are you referring to the `tools/generate_data.py` script for generating SpatialBench data?
If so, you may need to run `cargo install --path ./spatialbench-cli` to make sure all relevant crates get installed before running `tools/generate_data.py --scale-factor 10 --mb-per-file 256 --output-dir sf10-parquet`.

Let me know if that works!
