Releases: pola-rs/polars
Python Polars 1.16.0
🚀 Performance improvements
✨ Enhancements
- Enable creation of independently reusable `Config` instances (#20053)
- Improved error message on invalid Python `Enum` init (#20060)
- Improve Polars `Enum` dtype init from standard Python enums (#19997)
- Add optimized row encoding for Decimals (#20050)
- Add `drop_nans` method to DataFrame and LazyFrame (#20029)
🐞 Bug fixes
- Improve `hist` binning around breakpoints (#20054)
- Fix invalid len due to projection pushdown selection of scalar (#20049)
- Fix empty scalar agg type (#20051)
- Improve binning in `Series.hist` with `bin_count` when all values are the same (#20034)
- Less intrusive forking warnings (#20032)
- Reading nullable sliced / masked Categoricals from Parquet (#20024)
- Regression in `hist` panicking on out of bounds index (#20016)
- Fix starts_with out of bounds (#20006)
- Fix incorrect column order for parquet scan with hive columns in file (#19996)
- Incorrectly gave `list.len()` for masked-out rows (#19999)
- Bug fix in existing fast path for sorted series (#20004)
- Incorrect `collect_schema()` for `fill_null()` after an aggregation expression in group-by context (#19993)
- Fix `row_by_key` typing (#19888)
📖 Documentation
📦 Build system
- Pin maturin (#20063)
Thank you to all our contributors for making this release possible!
@alexander-beedie, @coastalwhite, @gab23r, @lukemanley, @mcrumiller, @nameexhaustion, @ritchie46, @siddharth-vi, @stijnherfst and @stinodego
Python Polars 1.15.0
🚀 Performance improvements
- Reduce the size of row encoding UTF-8 (#19911)
- Memoize duplicates in rolling-gb-dyn (#19939)
- More efficient row encoding for `pl.List` (#19907)
- Half the size of Booleans in row encoding (#19927)
- Rolling 'iter_lookbehind' breeze through duplicates (#19922)
- Initially trim leading and trailing filtered rows (#19850)
✨ Enhancements
- Catch use of 'polars' in `to_string` for non-Duration dtypes and raise an informative error (#19977)
- Add AhoCorasick backed 'find_many' (#19952)
- Allow Python Enums as dtype inputs (#19926)
- Speed up starts_with for small prefixes (#19904)
- Auto-enable hive partitioning if hive_schema was given (#19902)
- Add `pl.concat_arr` to concatenate columns into an Array column (#19881)
- Support both "iso" and "iso:strict" format options for `dt.to_string` (#19840)
- Add rounding for Decimal type (#19760)
- Improved array arithmetic support (#19837)
🐞 Bug fixes
- Fix Decimal type fill_null (#19981)
- Fix panic on schema merge for prefiltering (#19972)
- Fix lazy frame join expression (#19974)
- Fix `gather_every` for `Scalar` (#19964)
- Toggle 'fast_unique' on new_from_index (#19956)
- Parse uppercase config keys (#19852)
- Raise proper error message when too small interval is passed to datetime_range (#19955)
- Fix scalar object (#19940)
- Raise InvalidOperationError for invalid float to decimal casts (e.g. Inf, NaN) (#19938)
- Address indexing edge-case with `numpy` arrays (#19895)
- Fix panic with combination of hive and parquet prefiltering (#19905)
- Fix panic when joining with empty frame (debug only) (#19896)
- Fix incorrect result from inequality filter after join on LazyFrame (#19898)
- Misleading `ShapeError` error message on dataframe creation (#19901)
- Fix panic with empty delta scan, or empty parquet scan with a provided schema (#19884)
- Ensure type object of inputs for cached any-value conversion functions are kept alive (#19866)
- Improve export from 2D Array dtype columns to PyTorch Tensors (`to_torch`) and Jax Arrays (`to_jax`) (#19862)
- Fix panic using `scan_parquet().with_row_index()` with hive partitioning enabled (#19865)
- Improve histogram bin logic (#18761)
- Raise informative error instead of panicking for list arithmetic on some invalid dtypes (#19841)
- Properly handle Zero-Field Structs in row encoding (#19846)
- Incorrect explode schema for `LazyFrame.explode()` (#19860)
- DataFrame `rows_by_key` returning key tuples with elements in wrong order (#19486)
- Ensure `List` element truncation ellipses respect `ASCII*` table formats (#19835)
📖 Documentation
- Remove duplicate sentence in `Series.bottom_k` docstring (#19947)
- Complete parameters description and add an example for `clip()` (#19875)
- Fix some warnings during docs build (#19848)
📦 Build system
- Use public windows runners in python release (#19982)
- Add windows-aarch64 to python binaries (#19966)
🛠️ Other improvements
- Minor non-breaking space (`&nbsp;`) tweak for HTML rendering (#19864)
- Implement nested row encoding / decoding (#19874)
- Switch back to PyO3 0.22 (#19851)
- Adjust flaky `with_columns` test (#19844)
- Add proper tests for row encoding (#19843)
Thank you to all our contributors for making this release possible!
@MarcoGorelli, @alexander-beedie, @barak1412, @coastalwhite, @etiennebacher, @ion-elgreco, @itamarst, @lukemanley, @mcrumiller, @mhogervo, @nameexhaustion, @orlp, @ritchie46, @stijnherfst and @stinodego
Python Polars 1.14.0
🚀 Performance improvements
- Increase default async thread count for low core count systems (#19829)
- Move row group decode off async thread for local streaming parquet scan (#19828)
- Support use of Duration in `to_string`, ergonomic/perf improvement, tz-aware Datetime bugfix (#19697)
✨ Enhancements
- Raise informative error on Unknown unnest (#19830)
- Support DataFrame init from raw SQLAlchemy rows (#19820)
- Support use of Duration in `to_string`, ergonomic/perf improvement, tz-aware Datetime bugfix (#19697)
- Add an `is_literal` method to expression `meta` namespace (#19773)
- A different approach to warning users of fork() issues with Polars (#19197)
🐞 Bug fixes
- Fix `read_database(…,iter_batches=True)` type annotations (#19832)
- Validate subnodes in validate IR (#19831)
- Raise if merge non-global categoricals in unpivot (#19826)
- Type hints for window_size incorrectly included timedelta in some rolling functions (#19827)
- Don't panic if column not found (#19824)
- Fix gather of Scalar null + idx w/ validity (#19823)
- Replace _kwargs in collect method (#19618)
- Fix object chunked gather (#19811)
- Fix filter scalar nulls (#19786)
- Replace spaces with `&nbsp;` to support showing multiple spaces in HTML repr (#19783)
- Altair tooltip was being incorrectly applied to plots which did not accept it (#19789)
- Respect schema_overrides in batched csv reader (#19755)
- Fix scanning google cloud with service account credentials file (#19782)
- Release the GIL in Python APIs, part 2 of 2 (#19762)
- Fix incorrect filter after right-join on LazyFrame (#19775)
- Fix incorrect lazy schema for explode on array columns (#19776)
- Fixed typo in file lazy.py (#19769)
📖 Documentation
- Update bokeh to use cdn to avoid Bokeh Error (#19788)
- Change dprint config (#19747)
- Mention `row_by_keys` in the `to_dict` documentation (#19767)
- Fix link to Graphviz download (#19791)
🛠️ Other improvements
- Add ToField context for common args (#19833)
- Use polars parquet reader for delta scan (#19103)
- Migrate polars-expr AggregationContext to use `Column` (#19736)
Thank you to all our contributors for making this release possible!
@MarcoGorelli, @TNieuwdorp, @YichiZhang0613, @alexander-beedie, @braaannigan, @coastalwhite, @engylemure, @gab23r, @iliya-malecki, @ion-elgreco, @itamarst, @jackxxu, @nameexhaustion, @orlp, @ritchie46, @rodrigogiraoserrao and @sn0rkmaiden
Python Polars 1.13.1
✨ Enhancements
- Add IPC source node for new streaming engine (#19454)
🐞 Bug fixes
- Release GIL in Python APIs, part 1 (#19705)
- Fix incorrect lazy schema for aggregations (#19753)
- Address incorrect `selector & col` expansion (#19742)
📖 Documentation
- Fix formatting of nested list (#19746)
- Add `meta.is_column` to API docs (#19744)
- Fix join API reference links (#19745)
Thank you to all our contributors for making this release possible!
@alexander-beedie, @coastalwhite, @etiennebacher, @itamarst, @nameexhaustion, @orlp, @ritchie46 and @rodrigogiraoserrao
Python Polars 1.13.0
🚀 Performance improvements
- Improve `DataFrame.sort().limit/top_k` performance (#19731)
- Improve cloud scan performance (#19728)
- Fix quadratic 'with_columns' behavior (#19701)
- Improve hive partition pruning with datetime predicates from SQL (#19680)
- Allow for arbitrary skips in Parquet Dictionary Decoding (#19649)
- Reorder conditions in is_leap_year (#19602)
- Rechunk in DataFrame.rows if needed (#19628)
- Dispatch Parquet Primitive PLAIN decoding to faster kernels when possible (#19611)
- Use faster iteration in 'starts_with'/'ends_with' (#19583)
- Branchless Parquet Prefiltering (#19190)
- Reduce size of IdxVec from 24 -> 16 bytes (#19550)
✨ Enhancements
- Try to support native SAP HANA driver via `read_database` (#19733)
- Implement max/min methods for dtypes (#19494)
- Improve `n_chunks` typing (#19727)
- Improve hive partition pruning with datetime predicates from SQL (#19680)
- Identify inefficient use of Python string `removeprefix`, `removesuffix`, and `zfill` in `map_elements` (#19672)
- Automatically use boto3 / google-auth if installed when scanning cloud (#19677)
- Identify inefficient use of Python string `replace` in `map_elements` (#19668)
- Parallel IPC sink for the new streaming engine (#19622)
- Add SQL support for `RIGHT JOIN`, fix an issue with wildcard aliasing (#19626)
- Add show_graph to display a GraphViz plot for expressions (#19365)
- Streamline use of predicates connected by `&` with IEJoin (`join_where`) (#19552)
- Support use of `is_between` range predicate with IEJoin operations (`join_where`) (#19547)
🐞 Bug fixes
- Use `cls` for `to_python` (#19726)
- Fix validation for inner and left join when join_nulls unflagged (#19698)
- SQL `ELSE` clause should be implicitly `NULL` when omitted (#19714)
- Improve `n_chunks` typing (#19727)
- Ensure `NoDataError` raised consistently between engines for Excel reads (#19712)
- In group_by_dynamic, period and every were getting applied in reverse order for the window upper boundary (#19706)
- Only allow `list.to_struct` to be elementwise when width is fixed (#19688)
- Make Array arithmetic ops fully elementwise (#19682)
- Address inconsistency with use of Python types in frame-level `cast` (#19657)
- Update line-splitting logic in batched CSV reader (#19508)
- Fix incorrect lazy schema for `explode()` in `agg()` (#19629)
- Fix fill null types (#19656)
- Fix filter incorrectly pushed past struct unnest when unnested column name matches upper column name (#19638)
- Fix typing for SchemaDefinition (#19647)
- Ensure `mean_horizontal` raises on non-numeric input (#19648)
- Reorder conditions in is_leap_year (#19602)
- Copy height in .vstack() for empty dataframes (#19641) (#19642)
- Correct wildcard and input expansion for some more functions (#19588)
- Allow `.struct.with_fields` inside `list.eval` (#19617)
- Sortedness was incorrectly being preserved in dt.offset_by when offsetting by non-constant durations in the timezone-naive case (#19616)
- Fix incorrect `scan_parquet().with_row_index()` with non-zero slice or with streaming collect (#19609)
- Fix mask and validity confusion in Parquet String decoding (#19614)
- Parquet decoding of nested dictionary values (#19605)
- Do not attempt to load default credentials when `credential_provider` is given (#19589)
- Fix gather len in group-by state (#19586)
- Added input validation for `explode` operation in the array namespace (#19163)
- Improve error message (#19546)
- Fix predicate pushdown into inequality joins (#19582)
- Correct categorical namespace error message (#19558)
- Fix performance regression for sort/gather on list/array columns (#19564)
- Ignore quoted newlines when skipping lines in CSV (#19543)
- Incorrect gather for FixedSizeList with outer validity but no inner validities (#19489)
- Make Duration parsing fallible and not panic (#19490)
📖 Documentation
- Revise and rework user-guide/expressions (#19360)
- Update Excel page of user guide to refer to fastexcel as the default engine (#19691)
- Alter examples for round_sig_figs to make behaviour clearer (#19667)
- Assorted fixes to Rust API docs (#19664)
- Improve `replace` and `replace_all` docstring explanation of the "$" character with reference to capture groups (vs use as a literal) (#19529)
- Add credential provider section and examples to user guide (#19487)
- Fix various instances of repeated words in docs and comments (#19516)
📦 Build system
- Bump Rust toolchain to `nightly-2024-10-28` (#19492)
🛠️ Other improvements
- Remove unused Excel code (#19710)
- Use `Column` for the `{try,}_apply_columns{_par,}` functions on `DataFrame` (#19683)
- Remove more `@scalar-opt` (#19666)
- Move Series bitops to `std::ops::Bit...` (#19673)
- Mark test_parquet.py test_dict_slices as slow (#19675)
- Get `Column` into `polars-expr` (#19660)
- Streamline internal SQL join condition processing (#19658)
- Factor out logic for re-use by new streaming CSV source (#19637)
- Configure grouped Dependabot updates (#19604)
- Fix PyO3 error in CI (#19545)
- Update nightly compiler version (#19590)
- Added input validation for `explode` operation in the array namespace (#19163)
- Fix lint (#19584)
- Add a `Column::Partitioned` variant (#19557)
- Move to fast-float2 (#19578)
- Only run remote bench on rust changes (#19581)
- Remove unsafe *_release functions (#19554)
- Fix `test_rolling_by_integer` not using parameterized dtype (#19555)
- Add `mindebug-dev` rust profile (#19524)
- Add CI step to process benchmark results (#19530)
- Add CI benchmark on merge (#19518)
- Skip client check with env var (#19517)
- Improve makefile build commands (#19498)
Thank you to all our contributors for making this release possible!
@3tilley, @HansBambel, @MarcoGorelli, @alexander-beedie, @barak1412, @braaannigan, @cmdlineluser, @coastalwhite, @corwinjoy, @dependabot, @dependabot[bot], @eitsupi, @janpipek, @jqnatividad, @letkemann, @max-muoto, @nameexhaustion, @orlp, @ritchie46, @rodrigogiraoserrao, @siddharth-vi, @stinodego and @wence-
Rust Polars 0.44.2
🚀 Performance improvements
- Reduce size of IdxVec from 24 -> 16 bytes (#19550)
✨ Enhancements
- Streamline use of predicates connected by `&` with IEJoin (`join_where`) (#19552)
- Support use of `is_between` range predicate with IEJoin operations (`join_where`) (#19547)
🐞 Bug fixes
- Correct categorical namespace error message (#19558)
- Fix performance regression for sort/gather on list/array columns (#19564)
- Ignore quoted newlines when skipping lines in CSV (#19543)
🛠️ Other improvements
- Remove ad-hoc buffer pool (#19553)
- Remove SyncCounter (#19556)
- Removed unnecessary flatten function (#19551)
- Remove unsafe *_release functions (#19554)
- Improve new-streaming groupby performance for high cardinality (#19537)
- Add `mindebug-dev` rust profile (#19524)
- Add CI step to process benchmark results (#19530)
Thank you to all our contributors for making this release possible!
@HansBambel, @alexander-beedie, @barak1412, @coastalwhite, @nameexhaustion, @orlp and @ritchie46
Rust Polars 0.44.1
🐞 Bug fixes
- Incorrect gather for FixedSizeList with outer validity but no inner validities (#19489)
- Make Duration parsing fallible and not panic (#19490)
📖 Documentation
- Fix various instances of repeated words in docs and comments (#19516)
📦 Build system
🛠️ Other improvements
- Add CI benchmark on merge (#19518)
- Skip client check with env var (#19517)
- Rename ComputeNode::spawn parameters (#19514)
- Enable new_streaming feature by default (#19502)
- Add groupby partitioning and parallel groupby finishing to new-streaming engine (#19451)
- Improve makefile build commands (#19498)
- Reduce Vec allocations in new-streaming parquet source (#19493)
Thank you to all our contributors for making this release possible!
@alexander-beedie, @coastalwhite, @nameexhaustion, @orlp, @ritchie46 and @stinodego
Rust Polars 0.44.0
💥 Breaking changes
- Purge arrow-rs support (#19312)
🚀 Performance improvements
- Address inadvertent quadratic behaviour in `expand_columns` (#19469)
- Move rolling_corr/cov to an actual implementation on Series (#19466)
- Don't split par if cast to categorical (#19462)
- Improve var/cov/corr performance (#19381)
- Reduce memcopy in parquet (#19350)
- Optimize array and list gather (#19327)
- Add/fix unordered row decode, change unordered format (#19284)
- Fast decision for Parquet dictionary encoding (#19256)
- Make date_range / datetime_range ~10x faster for constant durations (#19216)
- Batch utf8-validation in csv `18%`/`25%` on 1.9.0 (#19124)
- Use two-pass algorithm for csv to ensure correctness and SIMDize more `~17%` (#19088)
- Use List's TotalEqKernel (#18984)
- Improve rename performance for Lazy API (#18890)
- Collapse cross-joins to faster joins (#18633)
- Cache register plugin function (#18860)
✨ Enhancements
- Implement nested Parquet writing for High-Precision Decimals (#19476)
- Improve `read_database` typing (#19444)
- Add IPC sink in new streaming engine (#19431)
- Added `escape_regex` operation to the `str` namespace and as a global function (#19257)
- Add SQL support for `bit_count` and bitwise `&`, `|`, and `xor` operators (#19114)
- Add credential provider utility classes for AWS, GCP (#19297)
- Support decoding Float16 in Parquet (#19278)
- Experimental `credential_provider` argument for `scan_parquet` (#19271)
- Allow DeltaTable input to scan_delta and read_delta (#19229)
- Make FlightConsumer Send and support compressed data (#19262)
- New quantile interpolation method & QUANTILE_DISC function in SQL (#19139)
- Conserve Parquet `SortingColumns` for ints (#19251)
- Low level flight interface (#19239)
- Improved list arithmetic support (#19162)
- Expose LTS CPU in show_versions() (#19193)
- Check Python version when deserializing UDFs (#19175)
- Quantile function in SQL (#18047)
- Improve scalar strict message (#19117)
- Add Series::{first, last, approx_n_unique} (#19093)
- Allow for rolling_*_by to use index count as window (#19071)
- Delay deserialization of python function until physical plan (#19069)
- Add cum(_min/_max) for pl.Boolean (#19061)
- Bitwise operations / aggregations (#18994)
- Improved error message DSL -> IR resolving (#19032)
- Add `strict` param to eager/lazy frame "rename" (#19017)
- Support `schema` arg in `read/scan_parquet()` (#19013)
- Add `allow_missing_columns` option to `read/scan_parquet` (#18922)
- Use FFI to extract Series from different Polars binaries (#18964)
- Allow for zero-width fixed size lists (#18940)
- Improve scalar strict message (#18904)
- Support arithmetic between Series with dtype list (#17823)
- Relaxed schema alignment for parquet file list read (#18803)
- Always preserve sorted flag for .dt.date (#18692)
- Implement single inequality joins for join_where (#18727)
🐞 Bug fixes
- Include Array in `to_physical` (#19474)
- Don't panic in SQL temporal string check; raise suitable `ColumnNotFound` error (#19473)
- Properly raise on mean_horizontal with wrong dtypes (#19472)
- Make output dtype known for `list.to_struct` when `fields` are passed (#19439)
- Address inadvertent quadratic behaviour in `expand_columns` (#19469)
- Ensure sorted flag is unset after Int->String cast (#19470)
- Fix row_index of batched reader (#19465)
- Fix perfect groupby (#19461)
- Correct wildcard expansion for functions (#19449)
- Ensure struct `eq/ne_missing` also compares outer validity (#19443)
- Fix incorrect reverse on struct containing NULLs (#19446)
- Faulty `escape_regex` example (#19440)
- Capture groups should be ignored in replace when literal=True (#19413)
- Fix `ColumnNotFound` when using `pl.element()` inside `list.eval` (#19438)
- Updates error message in csv parser to recommend schema_overrides instead of deprecated dtypes argument (#19416)
- Incorrect `.join(..., how="left").head(N)` if `N <= left_df.height()` and there are duplicate matches (#19422)
- Support Array type in more DataType methods (#19427)
- Bug in group_tuples_perfect, tail was not processed properly (#19417)
- Ensure that `ASCII*` table formats do not use the UTF8 ellipsis char when truncating rows/cols/values (#19404)
- Allow .get(null) in groupby context (#19401)
- Fix `include_file_paths` and `with_row_index` for streaming CSV scan (#19394)
- Flaky parametric parquet test (#19393)
- Raise on data mismatch in `str.json_decode` (#19347)
- Fix unsoundness in group_tuples_perfect (#19359)
- Ensure Python version matches version used to serialize credential provider (#19375)
- Capture groups should be ignored in replace_all when literal=True (#19366)
- Ignore Parquet `is_{min,max}_value_exact` when set to `true` (#19344)
- Projection pushdown was ignored by `include_file_paths` (#19341)
- Don't produce duplicate column names in Series.to_dummies (#19326)
- Use of `HAVING` outside of `GROUP BY` should raise a suitable SQLSyntaxError (#19320)
- Fix empty array gather (#19316)
- Merge categorical rev-map in `unpivot` (#19313)
- DataFrame descending sorting by single list element (#19233)
- Fix cse union schema (#19305)
- Correctly load Parquet statistics for f16 (#19296)
- Error on invalid query (#19303)
- Fix enum scalar output (#19301)
- Fix list gather invalid fast path (#19299)
- Fix quoting style of decimal csv output (#19298)
- Don't vertically parallelize literal select (#19295)
- Fix struct reshape fast path (#19294)
- Also split on forward slashes during hive path inference on Windows (#19282)
- Don't cse `as_struct` (#19280)
- Only apply string parsing to String dtype (#19222)
- Compilation error missing use JsonLineReader (#19244)
- Don't remember Parquet statistics if filtered (#19248)
- Do not check dtypes of non-projected columns for parquet (#19254)
- Parquet predicate pushdown for `lit(_) !=` (#19246)
- Use all chunks in `Series` from arrow struct (#19218)
- Implement is_nested_null for Null Array (#19219)
- Fix struct literals (#19214)
- Plotting was not interacting well with Altair schema wrappers (#19213)
- Fixing infer_schema for DataType::Null (#19201)
- Migrate to PyO3 0.22 and released version of rust-numpy crate (#19199)
- Don't unwrap() expansion (#19196)
- Properly handle non-nullable nested Parquet (#19192)
- Fix invalid list collection in expression engine (#19191)
- Implement to_arrow functionality properly for Arrays (#19077)
- Fix incorrect `(eq|ne)_missing` on List/Array types (#19155)
- Properly broadcast Struct when then validity (#19148)
- Allow partial name overlap in join_where resolution (#19128)
- Fix floordiv / modulo with scalar 0 on LHS (#19143)
- Ensure aligned chunks in OOC sort (#19118)
- Recursively align when converting to ArrowArray (#19097)
- Raise on invalid shape of shape 1, empty combination (#19113)
- Use two-pass algorithm for csv to ensure correctness and SIMDize more `~17%` (#19088)
- Allow converting `DatetimeOwned` to `ChunkedArray` (#19094)
- Throw proper error for empty char params in scan_csv (#19100)
- Ensure parquet `schema` arg is propagated to IR (#19084)
- Only rewrite numeric ineq joins (#19083)
- Check validity of columns of keys/aggs in dsl->ir (#19082)
- Bitwise aggregations should ignore null values (#19067)
- Remove failing datetime subclass test (#19068)
- Fix ser/de PlSmallStr error (#19060)
- Remove failing temporal lit tests (#19056)
- Divide-by-zero in OOC sort (#19048)
- Ensure `must_flush` flag is not reset (#19046)
- Error node should be on top (#19045)
- Force nested struct `missing` equality (#19031)
- Fix invalid alias udf (#19021)
- Raise invalid predicate join_where (#19020)
- Fix nested flag of functions with multiple arguments (#19016)
- Fix projection pushdown bug in IEJOINS (#19015)
- Separate temporal tests (#19012)
- Return the truth values of `ne_missing` and `eq_missing` operations for struct instead of `null` (#18930)
- Fix struct broadcasting comparisons (#19003)
- Wrong result on `when().then().otherwise()` on struct when both result are broadcast (#19000)
- Improve literals for temporal subclasses (#18998)
- Ensure same fmt in Series/AnyValue to string cast (#18982)
- Return correct value for `when().then().else()` on structs when using `first()`\`last()` (#18969)
- IPC don't write variadic_buffer_counts in blocks, but only dictionaries (#18980)
- Respect allow_threading in TernaryExpr (#18977)
- Make join test order-agnostic (#18975)
- Window function had incorrect output name on ExprIR (#18970)
- Fix `lit().shrink_dtype()` broadcasting (#18958)
- Parallel evaluation of `cumulative_eval` (#18959)
- Properly implement AnyValue::Binary `into_py` (#18960)
- Fix `Expr.over` with `order_by` did not take effect if group keys were sorted (#18947)
- Properly fetch type of full None List Series (#18916)
- Incorrect mode for sorted input (#18945)
- Properly choose inner physical type for Array (#18942)
- Disable very old date in timezone test for CI (#18935)
- Infer reshape dims when determining schema (#18923)
- Incorrect broadcasting on list-of-string set ops (#18918)
- Adding `with_row_index()` to previously collected lazy scan does not take effect (#18913)
- Properly zip struct validities (#18886)
- Ensure ListPrimitiveBuilder dtype invariant is asserted (#18889)
- Out-of-bounds gather in categorical->int cast (#18897)
- AnyValue Series from Categorical/Enum (#18893)
- Properly cast AnyValue string (#18888)
- Fix SO in json inference (#18887)
- Use proper thread pool in cumulative_eval (#18885)
- Properly calculate duration units (#18869)
- Check values in strict cast Int to Time (#18854)
- Fix typo in DuplicateError error message (#18855)
- Properly merge live- and dead columns in prefiltered (#18862)
- DataFrame plot was raising when some extra keywords were pas...
Python Polars 1.12.0
⚠️ Deprecations
- Make some parameters of `dt.add_business_days` keyword-only (#19428)
🚀 Performance improvements
- Address inadvertent quadratic behaviour in `expand_columns` (#19469)
- Move rolling_corr/cov to an actual implementation on Series (#19466)
- Don't split par if cast to categorical (#19462)
✨ Enhancements
- Implement nested Parquet writing for High-Precision Decimals (#19476)
- Improve `read_database` typing (#19444)
- Respect `include_index` for pandas series (#19453)
- Add `credential_provider` argument to more read functions (#19421)
- Add IPC sink in new streaming engine (#19431)
- Support querying specific snapshot by id in `scan_iceberg` (#19388)
🐞 Bug fixes
- Include Array in `to_physical` (#19474)
- Don't panic in SQL temporal string check; raise suitable `ColumnNotFound` error (#19473)
- Properly raise on mean_horizontal with wrong dtypes (#19472)
- Make output dtype known for `list.to_struct` when `fields` are passed (#19439)
- Address inadvertent quadratic behaviour in `expand_columns` (#19469)
- Ensure sorted flag is unset after Int->String cast (#19470)
- Fix row_index of batched reader (#19465)
- Fix perfect groupby (#19461)
- Correct wildcard expansion for functions (#19449)
- Ensure struct `eq/ne_missing` also compares outer validity (#19443)
- Fix incorrect reverse on struct containing NULLs (#19446)
- Faulty `escape_regex` example (#19440)
- Capture groups should be ignored in replace when literal=True (#19413)
- Fix `ColumnNotFound` when using `pl.element()` inside `list.eval` (#19438)
- Updates error message in csv parser to recommend schema_overrides instead of deprecated dtypes argument (#19416)
- Incorrect `.join(..., how="left").head(N)` if `N <= left_df.height()` and there are duplicate matches (#19422)
- Support Array type in more DataType methods (#19427)
- Bug in group_tuples_perfect, tail was not processed properly (#19417)
- Ensure that `ASCII*` table formats do not use the UTF8 ellipsis char when truncating rows/cols/values (#19404)
📖 Documentation
- Fix docstrings for ATAN2 and ATAN2D SQL functions (#19351)
🛠️ Other improvements
- Undo conflicting fix (#19463)
- Simplify rust side of `datetime` (#19459)
- Add tests for data mismatch on `read_json` (#19425)
- Remove code in `examples` folder in favor of the user guide (#19430)
Thank you to all our contributors for making this release possible!
@alexander-beedie, @cmdlineluser, @coastalwhite, @corleyma, @corwinjoy, @dvillaveces, @eitsupi, @gab23r, @janscholten, @nameexhaustion, @orlp, @ritchie46, @siddharth-vi, @stinodego and @wakabame
Python Polars 1.11.0
🚀 Performance improvements
- Improve var/cov/corr performance (#19381)
- Reduce memcopy in parquet (#19350)
- Optimize array and list gather (#19327)
✨ Enhancements
- Various `Schema` improvements (equality/init dtype checks) (#19379)
- AssumeRole support for AWS Credential Provider (#19346)
- Added `escape_regex` operation to the `str` namespace and as a global function (#19257)
- Improve `read_database_uri` typing (#19334)
🐞 Bug fixes
- Allow .get(null) in groupby context (#19401)
- Fix `include_file_paths` and `with_row_index` for streaming CSV scan (#19394)
- Flaky parametric parquet test (#19393)
- Release GIL in `gather_with_series()` and friend (#19383)
- Raise on data mismatch in `str.json_decode` (#19347)
- Ensure Python version matches version used to serialize credential provider (#19375)
- Capture groups should be ignored in replace_all when literal=True (#19366)
- Ignore Parquet `is_{min,max}_value_exact` when set to `true` (#19344)
- Projection pushdown was ignored by `include_file_paths` (#19341)
📖 Documentation
📦 Build system
- Revert PyO3 version back to `0.21` (#19376)
🛠️ Other improvements
- Expose group_by_dynamic in pyir (#19385)
- Add `AlignedBytes` types (#19308)
- Remove unused bytes->BytesIO conversion (#19369)
- Improve error message for Zero-Field Structs with Parquet (#19370)
- Reduce memcopy in parquet (#19350)
Thank you to all our contributors for making this release possible!
@alexander-beedie, @barak1412, @benrutter, @coastalwhite, @corwinjoy, @itamarst, @max-muoto, @nameexhaustion, @orlp, @ritchie46, @stinodego, @wence- and @wolfgang-noichl