-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add fast-path for comment detection #9808
Conversation
905202d
to
dd1fe54
Compare
CodSpeed Performance ReportMerging #9808 will improve performances by 4.84%Comparing Summary
Benchmarks breakdown
|
@@ -182,7 +182,7 @@ fn to_keyword_or_other(source: &str) -> SimpleTokenKind { | |||
"case" => SimpleTokenKind::Case, | |||
"with" => SimpleTokenKind::With, | |||
"yield" => SimpleTokenKind::Yield, | |||
_ => SimpleTokenKind::Other, // Potentially an identifier, but only if it isn't a string prefix. We can ignore this for now https://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals | |||
_ => SimpleTokenKind::Name, // Potentially an identifier, but only if it isn't a string prefix. We can ignore this for now https://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How can we avoid returning a Name
for a string prefix?
03a1844
to
a78aec3
Compare
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't feel comfortable having such an important constraint in an inline comment that violates the basic properties of SimpleTokenizer
. We should explore if we can support proper name lexing in SimpleTokenizer
without degrading performance.
a78aec3
to
03dcbf5
Compare
03dcbf5
to
c63bc5e
Compare
Summary
When we fall through to parsing, the comment-detection rule is a significant portion of lint time. This PR adds an additional fast heuristic whereby we abort if a comment contains two consecutive name tokens (via the zero-allocation lexer). For the
ctypeslib.py
, which has a few cases that are now caught by this, it's a 2.5x speedup for the rule (and a 20% speedup for token-based rules).