fix: Fix panic with empty delta scan, or empty parquet scan with a provided schema #19884
Conversation
```diff
-!expanded_paths.is_empty() && (paths[0].as_ref() != expanded_paths[0].as_ref())
+// For cloud paths, we determine that the input path isn't a file by checking that the
+// output path differs.
+expanded_paths.is_empty() || (paths[0].as_ref() != expanded_paths[0].as_ref())
```
Fixed this to also cover the empty directory case.
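The fixed condition can be mirrored in a small Python sketch (the function name and the list-of-strings path representation are illustrative, not the Polars API): a single cloud path is treated as a directory when expansion yields no files (the empty-directory case this PR adds) or when the first expanded path differs from the input.

```python
def expanded_from_single_directory(paths: list, expanded_paths: list) -> bool:
    # Hypothetical mirror of the Rust check: for cloud paths we decide the
    # input isn't a single file if expansion produced nothing (empty
    # directory) or produced a path different from the input path.
    return len(paths) == 1 and (
        len(expanded_paths) == 0 or paths[0] != expanded_paths[0]
    )

# Empty cloud directory: no files came back, still counts as a directory.
print(expanded_from_single_directory(["s3://bucket/data/"], []))  # True

# Direct file path: expansion returns the same path unchanged.
print(expanded_from_single_directory(
    ["s3://bucket/data/a.parquet"], ["s3://bucket/data/a.parquet"]
))  # False
```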
```rust
{
    polars_bail!(
        ComputeError:
        "a hive schema was given but hive_partitioning was disabled"
```
Error if someone accidentally does `scan_parquet(..., hive_schema={...}, hive_partitioning=False)`.
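A hedged Python sketch of the new guard (names are illustrative; the real check is the Rust `polars_bail!` above):

```python
def check_hive_options(hive_schema, hive_partitioning):
    # Illustrative version of the new validation: passing a hive schema
    # while explicitly disabling hive partitioning is rejected up front.
    if hive_schema is not None and hive_partitioning is False:
        raise ValueError(
            "a hive schema was given but hive_partitioning was disabled"
        )

check_hive_options(hive_schema=None, hive_partitioning=False)  # fine
try:
    check_hive_options(hive_schema={"p": "Int64"}, hive_partitioning=False)
except ValueError as exc:
    print(exc)  # a hive schema was given but hive_partitioning was disabled
```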
```rust
file_options.hive_options.hive_start_idx = hive_start_idx;

Ok(Self::Paths(expanded_paths))
},
v => {
    file_options.hive_options.enabled = Some(false);
```
Don't override this here
```rust
if file_options.hive_options.enabled.is_none()
    && expanded_from_single_directory(paths, expanded_paths.as_ref())
{
    file_options.hive_options.enabled = Some(true);
```
This does the same thing (only overwriting if the existing value is `None`); the old code did `hive_enabled.unwrap_or_else`, which made it a bit hard to see.
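The control flow can be sketched in Python (hypothetical names, not the Polars API): the explicit None check makes it obvious that a user-provided value is never overwritten, and auto-enabling only happens when the option was left unset and the paths expanded from a single directory.

```python
def resolve_hive_enabled(enabled, from_single_directory):
    # Sketch of the new style: the explicit "is None" guard makes it clear
    # that a user-provided value is never overwritten; auto-enabling only
    # happens when the option was unset and paths expanded from a directory.
    if enabled is None and from_single_directory:
        enabled = True
    return enabled

print(resolve_hive_enabled(None, True))    # True  (auto-enabled)
print(resolve_hive_enabled(False, True))   # False (user choice preserved)
print(resolve_hive_enabled(None, False))   # None  (left unset)
```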
```python
tmp_path.mkdir(exist_ok=True)
path = str(tmp_path)
df = pl.DataFrame({}, [("p", pl.Int64)])
df.write_delta(path)
```
@ion-elgreco, can you check this one? I get an error writing empty tables on deltalake 0.21.0.
You should create empty tables with something like this: `DeltaTable.create(path, schema=df.to_arrow().schema)`
I see - thanks for the tip!
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

```
@@            Coverage Diff            @@
##             main   #19884       +/-   ##
===========================================
+ Coverage   59.38%   79.43%   +20.05%
===========================================
  Files        1554     1554
  Lines      215612   215611        -1
  Branches     2452     2452
===========================================
+ Hits       128035   171280    +43245
+ Misses      87019    43773    -43246
  Partials      558      558
```

☔ View full report in Codecov by Sentry.
Force-pushed from 3eb93bb to a31039f (compare)
Thanks for putting in the effort @nameexhaustion! Let me know if I can test some stuff to ensure this resolves #19890, e.g. building from source and testing locally.
Fixes #19876
Fixes #19854
Fixes #19890
For the case when `scan_parquet` is done on an empty directory and `schema` was given, we should return an empty DataFrame. Currently this case isn't accounted for in some areas of the code, causing some errors / panics.

Also fixes a mypy lint issue due to a new pydantic release.
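The intended behavior described above can be sketched with a toy scan function (pure Python, hypothetical names; the real fix lives in the Rust scan machinery): when path expansion finds no files but a schema was provided, an empty frame with that schema comes back instead of a panic.

```python
def scan_empty_aware(file_paths, schema=None):
    # Sketch of the fixed behavior: if path expansion found no files but a
    # schema was provided, return an empty "frame" with that schema instead
    # of panicking; with no schema either, a missing-files error is raised.
    if not file_paths:
        if schema is None:
            raise FileNotFoundError("no files found and no schema was given")
        return {name: [] for name in schema}
    raise NotImplementedError("normal read path omitted in this sketch")

print(scan_empty_aware([], schema={"p": "Int64"}))  # {'p': []}
```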