-
-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Support use of Duration in to_string
, ergonomic/perf improvement, tz-aware Datetime bugfix
#19697
Conversation
to_string
to_string
, ergonomic improvement, tz-aware Datetime bugfix
to_string
, ergonomic improvement, tz-aware Datetime bugfixto_string
, ergonomic/perf improvement, tz-aware Datetime bugfix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice one, just left some comments!
i also notice that this does about 2-3 things together - is it possible to do any of them separately (like the tz-aware datetime bugfix)?
9f6c6f3
to
646cbf6
Compare
The bug is fixed as a side-effect of explicitly/centrally declaring the right ISO strings, and everything else is connected, so there isn't really anything to separate out 😅 |
546fb33
to
9d99e1e
Compare
fef1443
to
620a23f
Compare
to_string
, ergonomic/perf improvement, tz-aware Datetime bugfixto_string
, ergonomic/perf improvement, tz-aware Datetime bugfix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for explaining!
I think the logic is right (still doing a final check), but I do want to check about what's going on in Display::fmt
. It looks to me like the current logic is very careful to just write to f
, and that fmt_duration_ns
/ fmt_duration_us
/ fmt_duration_ms
only do integer arithmetic without allocating any new Strings
With this PR, on the other hand, fmt_duration_string
always allocates a new String
A comment Ritchie had made on something similar some time ago was:
But those string allocations will make this slower down the line as well. As they fragment your heap.
So, I need to defer to @ritchie46 on the perf / memory implications here
c7d4912
to
95b578e
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #19697 +/- ##
========================================
Coverage 79.55% 79.56%
========================================
Files 1544 1544
Lines 213240 213341 +101
Branches 2441 2442 +1
========================================
+ Hits 169643 169741 +98
- Misses 43048 43051 +3
Partials 549 549 ☔ View full report in Codecov by Sentry. |
85dbbce
to
f37f800
Compare
f37f800
to
43d4cb3
Compare
Closes #7174.
(Supersedes the now-closed #19663).
New feature
dt.to_string()
to Duration cols, transforming them to an ISO8601 duration string (see: https://en.wikipedia.org/wiki/ISO_8601#Durations for details on this format).to_string
expression (for all temporal dtypes) can now take the shortcut format string"iso"
, which represents the dtype-specific ISO8601 format for the given column(s). Setting no explicit format is equivalent to setting "iso", leading to some nice ergonomic improvements (detailed below).(Note: the ISO duration string conversion was validated against an established package that does the same conversion; created an ad-hoc parametric test against many tens of thousands of randomly-generated timedeltas, and everything tied-out. Included various edge-cases in the unit tests).
Ergonomics
If you do not supply a format string (previously not possible) or set it to
"iso"
, the associated dtype-specific ISO8601 format is used. This means that you can now make a single expression that will properly format all temporal columns with their dtype-appropriate ISO format, rather than have to separately associate an explicit format string with each dtype.So, can now write...
...instead of:
Performance
itoa
conversions andpush/push_str
calls instead of relying onwrite!
to do the integer transforms into the output string (~3x faster in local testing) 🚀Bugfix
cast
) did not include the expected ISO timezone offset - spotted this with some new tests and fixed it.Examples
Cast to dtype-specific ISO8601 string (the default, if no explicit format string given):
Durations do not support strftime; instead
to_string
recognises only"iso"
or"polars"
as formats for durations - the latter results in the same string output that we display in the frame repr: