Sort by key #7963

bluthej · 2023-10-15T11:07:52Z

Summary

Refactor for isort implementation. Closes #7738.

I introduced a NatOrdString and a NatOrdStr type to have a naturally ordered String and &str, and I pretty much went back to the original implementation based on module_key, member_key and sorted_by_cached_key from itertools. I tried my best to avoid unnecessary allocations but it may have been clumsy in some places, so feedback is appreciated! I also renamed the Prefix enum to MemberType (and made some related adjustments) because I think this fits more what it is, and it's closer to the wording found in the isort documentation.

I think the result is nicer to work with, and it should make implementing #1567 and the like easier :)

Of course, I am very much open to any and all remarks on what I did!
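
To make the summary above concrete, here is a minimal sketch of the idea, not the code from this PR: a newtype whose `Ord` impl uses a natural comparison, sorted through a cached key. Assumptions: `natural_cmp` and `take_number` are hand-written stand-ins for `natord::compare`, the standard library's `sort_by_cached_key` stands in for itertools' `sorted_by_cached_key`, and `sort_naturally` is a hypothetical helper.

```rust
use std::cmp::Ordering;
use std::iter::Peekable;
use std::str::Chars;

/// Consume a run of ASCII digits and return its numeric value.
fn take_number(chars: &mut Peekable<Chars<'_>>) -> u64 {
    let mut n = 0u64;
    while let Some(d) = chars.peek().and_then(|c| c.to_digit(10)) {
        n = n.saturating_mul(10).saturating_add(u64::from(d));
        chars.next();
    }
    n
}

/// Hypothetical stand-in for `natord::compare`: digit runs are compared by
/// value, everything else character by character.
fn natural_cmp(a: &str, b: &str) -> Ordering {
    let mut a = a.chars().peekable();
    let mut b = b.chars().peekable();
    loop {
        match (a.peek().copied(), b.peek().copied()) {
            (None, None) => return Ordering::Equal,
            (None, Some(_)) => return Ordering::Less,
            (Some(_), None) => return Ordering::Greater,
            (Some(x), Some(y)) if x.is_ascii_digit() && y.is_ascii_digit() => {
                let ord = take_number(&mut a).cmp(&take_number(&mut b));
                if ord != Ordering::Equal {
                    return ord;
                }
            }
            (Some(x), Some(y)) => {
                if x != y {
                    return x.cmp(&y);
                }
                a.next();
                b.next();
            }
        }
    }
}

/// A borrowed string that orders naturally (sketch of the newtype idea).
#[derive(Debug)]
struct NatOrdStr<'a>(&'a str);

impl Ord for NatOrdStr<'_> {
    fn cmp(&self, other: &Self) -> Ordering {
        natural_cmp(self.0, other.0)
    }
}

impl PartialOrd for NatOrdStr<'_> {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

// Equality is defined through `cmp` so that `Eq` stays consistent with `Ord`.
impl PartialEq for NatOrdStr<'_> {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}

impl Eq for NatOrdStr<'_> {}

/// Hypothetical helper: sort names by their natural-order key.
fn sort_naturally(names: &mut [&str]) {
    names.sort_by_cached_key(|&name| NatOrdStr(name));
}
```

With this sketch, sorting `["module10", "module2", "module1"]` yields `module1`, `module2`, `module10`.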

Test Plan

I didn't add any tests; I am relying on the existing tests since this is just a refactor.


github-actions · 2023-10-15T11:24:42Z

PR Check Results

Ecosystem

✅ ecosystem check detected no linter changes.

charliermarsh · 2023-10-16T00:09:56Z

crates/ruff_linter/src/rules/isort/sorting.rs

+        .unwrap_or_default();
+    let force_to_top = name.map(|name| !settings.force_to_top.contains(name)); // `false` < `true` so we get forced to top first
+    let maybe_lower_case_name = name
+        .and_then(|name| (!settings.case_sensitive).then_some(NatOrdString(name.to_lowercase())));


Can we avoid allocating here (name.to_lowercase()) in the event that the string is already lowercase?

I guess to do that we would need to use a Cow. What we could do is have a single NatOrdStr struct like this:

use std::borrow::Cow;
struct NatOrdStr<'a>(Cow<'a, str>);

whose contents are either owned or borrowed.

I haven't played around with Cows much, so I don't know if the overhead is low enough to make it worthwhile to avoid that allocation in case the string is already lowercase, what do you think? If you think it is I'll gladly tweak my implementation :) but in that case I think it's better to use that Cow-based struct everywhere rather than just for maybe_lower_case_name.

I think it's worth it given that these are going to be lowercase in the majority of cases.

Just pushed the modification, I had it ready just in case :)
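
As a rough illustration of the `Cow` idea discussed in this thread (a sketch under assumptions, not the code that was pushed): a helper that only allocates when lowercasing would actually change the string. The name `maybe_lowercase` is hypothetical here, and the per-character check is just one way to detect "already lowercase".

```rust
use std::borrow::Cow;

/// Hypothetical helper: return the lowercase form of `name`, borrowing the
/// input when it is already lowercase so that no allocation happens.
fn maybe_lowercase(name: &str) -> Cow<'_, str> {
    // A character is "already lowercase" if its lowercase mapping is itself.
    let already_lowercase = name
        .chars()
        .all(|c| c.to_lowercase().eq(std::iter::once(c)));
    if already_lowercase {
        Cow::Borrowed(name)
    } else {
        Cow::Owned(name.to_lowercase())
    }
}
```

Since most module names are already lowercase, the borrowed branch would be the common one, which is the case the review comment wanted to keep allocation-free.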

charliermarsh · 2023-10-16T00:12:43Z

crates/ruff_linter/src/rules/isort/sorting.rs

-            } else {
-                natord::compare_ignore_case(alias1.name, alias2.name)
-                    .then_with(|| natord::compare(alias1.name, alias2.name))
-            }


Is this logic still being captured in the refactor?

It should be because both module_key and member_key return a tuple with first maybe_lower_case_name (which is Some(name.to_lowercase()) if settings.case_sensitive is false, otherwise None) and then the module/member name.

So if settings.case_sensitive is true, then we just compare the names using natord::compare, and if it's false we first compare the lowercase names with natord::compare, and then the actual names.

I guess the question is, is comparing two strings that have been turned into lowercase with natord::compare the same as comparing them directly with natord::compare_ignore_case? I looked at the source code for natord::compare_ignore_case and they're just calling to_lowercase on the chars individually, but apart from that it seems like what I did should be equivalent.

I think it is slightly different, because they also map integers to avoid sorting them lexicographically, right? For example, to avoid putting module10 before module2.

Oh I thought you were talking about the fact that there are two function calls (one case sensitive and one case insensitive).

The natural ordering (e.g. module2 comes before module10) is indeed captured in my refactor because I'm relying on natord::compare to implement the Ord trait for my types. Was that the question?
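
The key structure described in this thread (an optional lowercased name first, then the actual name) can be sketched as follows. This is a simplified illustration under assumptions: `simple_member_key` and `sort_members` are hypothetical names, and plain `String`/`&str` ordering stands in for the natural-order comparison, so only the case handling is shown.

```rust
/// Hypothetical, simplified key: `Some(lowercased)` when sorting is case
/// insensitive, `None` otherwise, followed by the name itself as tie-breaker.
fn simple_member_key(name: &str, case_sensitive: bool) -> (Option<String>, &str) {
    let maybe_lower_case_name = (!case_sensitive).then(|| name.to_lowercase());
    (maybe_lower_case_name, name)
}

/// Sort names by the tuple key. Tuples compare element by element, so the
/// second element only matters when the first elements are equal.
fn sort_members(names: &mut [&str], case_sensitive: bool) {
    names.sort_by_cached_key(|&name| simple_member_key(name, case_sensitive));
}
```

With `case_sensitive` set to `false`, `["b", "B", "a", "A"]` sorts to `A`, `a`, `B`, `b`: the lowercased names decide first and the original names break ties. With `case_sensitive` set to `true`, every key starts with `None`, so only the original names are compared and the result is `A`, `B`, `a`, `b`.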


charliermarsh · 2023-10-30T02:28:37Z

(Sorry for the delay. This is on me to review and merge, but I'd like to do further testing for correctness since it's a really important code path.)

charliermarsh

I did some benchmarking and additional testing -- looks good to me. Thank you, and sorry for the delay here.

bluthej · 2023-10-30T07:18:47Z

Awesome :)

Absolutely no need to apologize, I totally understand and I'm very grateful that you took the time to look deeper into my proposal 🙏
I really appreciate that and I'm super excited this got merged! 😁

Now I can go back to implementing length sort ^^

bluthej added 9 commits October 15, 2023 12:38

Re-implement module_key function + sort with it

8661004

Sort straight imports with `sorted_by_cached_key`

Re-implement member_key function + use it

4050471

Rename prefix related things to member type

e6ef94c

I think it describes what it is more accurately and it's more consistent with the isort terminology

Unify module key between straight and from imports

8b82da0

Replace last call to cmp_either_import

a902a90

Remove all the `cmp_*` functions

Make meaning of two bools clearer

452d7ef

Remove useless Display impl

5a4b9dd

Add NatOrdStr to remove some allocations

087bd57

Remove remaining unnecessary allocations

e01bead

charliermarsh reviewed Oct 16, 2023


Use Cow to avoid unnecessary allocations

56b462f

If the module name is already lowercase we don't have to call `to_lowercase`, which avoids some allocations (probably a lot since most modules are lowercase)

Merge branch 'main' into sort-by-key

97914c4

charliermarsh force-pushed the sort-by-key branch from 2f564f8 to 8163c04 Compare October 30, 2023 04:24

Add docstring

8163c04

charliermarsh approved these changes Oct 30, 2023


charliermarsh added the isort Related to import sorting label Oct 30, 2023

charliermarsh enabled auto-merge (squash) October 30, 2023 04:25

Remove i64

c2a15e3

charliermarsh merged commit 5776ec1 into astral-sh:main Oct 30, 2023
16 checks passed

charliermarsh mentioned this pull request Oct 30, 2023

Feature request: Add Length-Sort config to isort #1567

Closed

miccal mentioned this pull request Nov 3, 2023

ruff 0.1.4 Homebrew/homebrew-core#153286

Merged

cmsetzer mentioned this pull request Nov 13, 2023

Regression in Ruff's ordering of import aliases with force-sort-within-sections #8661

Closed
