fix: Fix incorrect scan_parquet().with_row_index() with non-zero slice or with streaming collect #19609
@@ -149,6 +149,7 @@ def test_meta_tree_format(namespace_files_path: Path) -> None:
 def test_meta_show_graph(namespace_files_path: Path) -> None:
     e = (pl.col("foo") * pl.col("bar")).sum().over(pl.col("ham")) / 2
     dot = e.meta.show_graph(show=False, raw_output=True)
+    assert dot is not None
Drive-by fix for a mypy lint error on main.
@@ -830,7 +833,7 @@ fn rg_to_dfs_par_over_rg(
 if let Some(rc) = &row_index {
     df.with_row_index_mut(
         rc.name.clone(),
-        Some(row_count_start as IdxSize + rc.offset),
+        Some(row_count_start as IdxSize + rc.offset + slice.0 as IdxSize),
In fn rg_to_dfs*, we also add slice.0 (offset), as it may be non-zero in a negative-slice case.
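The arithmetic in this comment can be illustrated with a small pure-Python model. This is only a sketch under stated assumptions: `row_index_start` and `read_with_row_index` are hypothetical helpers that mirror the names `row_count_start`, `rc.offset`, and `slice.0` from the diff, and the slice offset is assumed to be relative to the start of the rows being read. It is not the Polars implementation.

```python
def row_index_start(row_count_start: int, rc_offset: int, slice_offset: int) -> int:
    """First row-index value for the rows that survive the slice.

    Hypothetical model: mirrors `row_count_start + rc.offset + slice.0`.
    """
    return row_count_start + rc_offset + slice_offset


def read_with_row_index(rows, row_count_start, rc_offset, slice_offset, slice_len):
    """Apply the slice, then attach a row index that accounts for skipped rows."""
    kept = rows[slice_offset : slice_offset + slice_len]
    start = row_index_start(row_count_start, rc_offset, slice_offset)
    return [(start + i, row) for i, row in enumerate(kept)]


rows = ["a", "b", "c", "d", "e"]
result = read_with_row_index(
    rows, row_count_start=0, rc_offset=0, slice_offset=2, slice_len=2
)
print(result)  # [(2, 'c'), (3, 'd')]
```

If the slice offset were left out of the sum, the same two rows would be numbered 0 and 1 instead of 2 and 3, which is the kind of mismatch the added `slice.0` term addresses.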
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##             main   #19609      +/-   ##
==========================================
- Coverage   79.84%   79.84%   -0.01%
==========================================
  Files        1536     1536
  Lines      211405   211456      +51
  Branches     2445     2445
==========================================
+ Hits       168790   168829      +39
- Misses      42060    42072      +12
  Partials      555      555
Fixes #19607
Fixes #19606
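Issues affect queries of the form scan_parquet(...).with_row_index() combined with a non-zero slice, or collected with the streaming engine.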
For the negative-slice case we need to adjust the RowIndex offset accordingly. If we see that we need to add a row index, we cannot stop early during the reversed metadata scan; instead, we need to scan through the metadata of the entire list of files to figure out the correct offset to begin from.

The other issue, with streaming parquet, was simply that the current row offset was not being added.
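The reasoning about the reversed metadata scan can be sketched in plain Python. Everything here is a hypothetical model rather than Polars code: `SlicePlan` and `plan_negative_slice` are illustrative names, each file is represented only by its row count, and the slice is assumed to start `offset_from_end` rows before the end of the concatenated files.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SlicePlan:
    start_file: int                 # index of the file where the slice begins
    skip_rows: int                  # rows to skip inside that file
    files_scanned: int              # how many files' metadata were inspected
    row_index_start: Optional[int]  # global index of the first sliced row, if needed


def plan_negative_slice(
    file_row_counts: List[int], offset_from_end: int, need_row_index: bool
) -> SlicePlan:
    """Walk file metadata from the last file backwards to locate a negative slice."""
    if offset_from_end <= 0:
        raise ValueError("offset_from_end must be positive")

    rows_seen = 0  # rows counted so far, starting from the end
    start_file = None
    skip_rows = 0
    files_scanned = 0

    for i in reversed(range(len(file_row_counts))):
        rows_seen += file_row_counts[i]
        files_scanned += 1
        if start_file is None and rows_seen >= offset_from_end:
            start_file = i
            skip_rows = rows_seen - offset_from_end
            if not need_row_index:
                break  # early exit is fine when no row index is requested
        # With a row index, keep scanning: the total row count is needed.

    if start_file is None:
        raise ValueError("offset_from_end exceeds the total number of rows")

    # After a full scan, rows_seen is the total row count across all files.
    row_index_start = rows_seen - offset_from_end if need_row_index else None
    return SlicePlan(start_file, skip_rows, files_scanned, row_index_start)


print(plan_negative_slice([3, 4, 5], offset_from_end=6, need_row_index=False))
print(plan_negative_slice([3, 4, 5], offset_from_end=6, need_row_index=True))
```

With three files of 3, 4, and 5 rows and a slice starting 6 rows from the end, both calls locate the slice in the second file after skipping 3 rows. Only the second call inspects all three files, which is what lets it determine that the first sliced row has global index 6.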