@alex-700 alex-700 commented Feb 7, 2024

Summary

Implement implicit readlines (FURB129) lint.
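
As a minimal sketch of the pattern this lint targets: the two helpers below are hypothetical names used only for illustration, and `io.StringIO` stands in for a real file. The first calls `readlines()` in the loop header, the second iterates over the file object directly, which is the rewrite the rule suggests.

```python
import io


def nonblank_with_readlines(stream):
    """Pattern flagged by FURB129: readlines() builds the whole list first."""
    result = []
    for line in stream.readlines():
        if line.strip():
            result.append(line.rstrip("\n"))
    return result


def nonblank_with_iteration(stream):
    """Suggested rewrite: iterate over the file object directly."""
    result = []
    for line in stream:
        if line.strip():
            result.append(line.rstrip("\n"))
    return result


sample = "alpha\n\nbeta\ngamma\n"
assert nonblank_with_readlines(io.StringIO(sample)) == ["alpha", "beta", "gamma"]
assert nonblank_with_iteration(io.StringIO(sample)) == ["alpha", "beta", "gamma"]
```

For standard file objects both forms visit the same lines; the second avoids materializing the intermediate list.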

Notes

I would like help with, or an opinion on, the suggested implementation.

This implementation differs from the original one in refurb in the following way. It checks, purely syntactically, for a call to a method named readlines() inside a for loop or a generator expression. The refurb implementation additionally checks that the callee is a variable of type io.TextIOWrapper or io.BufferedReader.
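
To illustrate what a purely syntactic check means, here is a rough Python sketch using the standard `ast` module. It is not the Rust code in this PR; `find_readlines_loops` is a hypothetical helper, and skipping calls that pass arguments (such as a size hint) is an assumption of the sketch.

```python
import ast


def find_readlines_loops(source: str) -> list[tuple[int, int]]:
    """Return (line, column) of `.readlines()` calls used as a loop iterable.

    The check is purely syntactic: nothing is known about the callee's type.
    """
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.For):
            iterables = [node.iter]
        elif isinstance(node, ast.GeneratorExp):
            iterables = [gen.iter for gen in node.generators]
        else:
            continue
        for expr in iterables:
            if (
                isinstance(expr, ast.Call)
                and isinstance(expr.func, ast.Attribute)
                and expr.func.attr == "readlines"
                # Assumption: only flag calls without arguments.
                and not expr.args
                and not expr.keywords
            ):
                hits.append((expr.lineno, expr.col_offset))
    return sorted(hits)


example = (
    "for line in f.readlines():\n"
    "    pass\n"
    "total = sum(len(l) for l in open('x').readlines())\n"
    "for row in table.readlines():\n"
    "    pass\n"
    "for line in f:\n"
    "    pass\n"
)
flagged_lines = [line for line, _ in find_readlines_loops(example)]
assert flagged_lines == [1, 3, 4]
```

Line 4 of the example is reported even though nothing says `table` is a file object, which is exactly the limitation of matching on the method name alone.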

  • I do not see a simple way to implement the same logic.
  • The best I can come up with is something like
checker.semantic().binding(checker.semantic().resolve_name(attr_expr.value.as_name_expr()?)?).statement(checker.semantic())

and then analyzing the possible cases. But this would not really be about types; it would be guessing the type from the assignment (or with) statement.

  • This logic also has several false negatives, when the callee is not a variable but the result of a function call (e.g. open(...)).
  • On the other hand, maybe it is good to lint other objects as well, even where the suggestion is not safe, and push developers to make their interfaces less surprising compared with the standard library.
  • In any case, since the current implementation has false positives (I mention some of them in the test), I marked the fixes as unsafe.
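
A small sketch of the kind of false positive meant above. `CharBuffer` is a hypothetical class invented for this example: it has a `readlines()` method, but iterating over it yields characters rather than lines, so applying the fix would change behavior.

```python
class CharBuffer:
    """Hypothetical non-file class whose iteration differs from readlines()."""

    def __init__(self, text):
        self._text = text

    def readlines(self):
        # Line-oriented view, keeping the line endings like file objects do.
        return self._text.splitlines(keepends=True)

    def __iter__(self):
        # Character-oriented iteration: NOT equivalent to readlines().
        return iter(self._text)


buf = CharBuffer("a\nb\n")
assert [line for line in buf.readlines()] == ["a\n", "b\n"]
assert [item for item in buf] == ["a", "\n", "b", "\n"]
```

A check that only looks at the method name would rewrite the first loop into the second, which is why the fix is marked as unsafe here.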

Test Plan

cargo test

github-actions bot commented Feb 7, 2024

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

ℹ️ ecosystem check detected linter changes. (+7 -0 violations, +0 -0 fixes in 3 projects; 40 projects unchanged)

apache/airflow (+5 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview --select ALL

+ dev/breeze/src/airflow_breeze/global_constants.py:361:21: FURB129 Instead of calling `readlines()`, iterate over file object directly
+ dev/perf/sql_queries.py:143:41: FURB129 Instead of calling `readlines()`, iterate over file object directly
+ docs/exts/redirects.py:43:21: FURB129 Instead of calling `readlines()`, iterate over file object directly
+ scripts/ci/pre_commit/pre_commit_newsfragments.py:35:43: FURB129 Instead of calling `readlines()`, iterate over file object directly
+ tests/system/providers/google/cloud/gcs/resources/transform_script.py:26:39: FURB129 Instead of calling `readlines()`, iterate over file object directly

zulip/zulip (+1 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview --select ALL

+ tools/lib/provision_inner.py:101:51: FURB129 Instead of calling `readlines()`, iterate over file object directly

indico/indico (+1 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview

+ setup.py:18:30: FURB129 Instead of calling `readlines()`, iterate over file object directly

Changes by rule (1 rules affected)

| code | total | + violation | - violation | + fix | - fix |
| ---- | ----- | ----------- | ----------- | ----- | ----- |
| FURB129 | 7 | 7 | 0 | 0 | 0 |

@charliermarsh charliermarsh self-requested a review February 13, 2024 02:15
@charliermarsh charliermarsh added the "rule" label (Implementing or modifying a lint rule) Feb 13, 2024

@charliermarsh charliermarsh left a comment

Thanks! I added some extensions to our (basic) type inference system to support this.

@charliermarsh charliermarsh merged commit dd0ba16 into astral-sh:main Feb 13, 2024
@alex-700 alex-700 deleted the latyshev/furb129 branch February 13, 2024 09:34
nkxxll pushed a commit to nkxxll/ruff that referenced this pull request Mar 10, 2024
dylwil3 added a commit that referenced this pull request Apr 28, 2025
This PR promotes the fix applicability of [readlines-in-for
(FURB129)](https://docs.astral.sh/ruff/rules/readlines-in-for/#readlines-in-for-furb129)
to always safe.

In the original PR (#9880), the
author marked the fix as unsafe because Ruff's type inference couldn't
quite guarantee that we had an `IOBase` object in hand. Some false
positives were recorded in the test fixture. However, before the PR was
merged, Charlie added the necessary type inference and the false
positives went away.

According to the [Python
documentation](https://docs.python.org/3/library/io.html#io.IOBase), I
believe this fix is safe for any proper implementation of `IOBase`:

>[IOBase](https://docs.python.org/3/library/io.html#io.IOBase) (and its
subclasses) supports the iterator protocol, meaning that an
[IOBase](https://docs.python.org/3/library/io.html#io.IOBase) object can
be iterated over yielding the lines in a stream. Lines are defined
slightly differently depending on whether the stream is a binary stream
(yielding bytes), or a text stream (yielding character strings). See
[readline()](https://docs.python.org/3/library/io.html#io.IOBase.readline)
below.

and then in the [documentation for
`readlines`](https://docs.python.org/3/library/io.html#io.IOBase.readlines):

>Read and return a list of lines from the stream. hint can be specified
to control the number of lines read: no more lines will be read if the
total size (in bytes/characters) of all lines so far exceeds hint. [...]
>Note that it’s already possible to iterate on file objects using for
line in file: ... without calling file.readlines().

I believe that a careful reading of our [versioning
policy](https://docs.astral.sh/ruff/versioning/#version-changes)
requires that this change be deferred to a minor release - but please
correct me if I'm wrong!
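
The equivalence described in the quoted documentation can be checked with a short sketch. The helper names are hypothetical, and the in-memory `io.StringIO` and `io.BytesIO` streams stand in for real text and binary files. The last lines show why a `readlines(hint)` call is a different case from a plain `readlines()`.

```python
import io


def lines_via_readlines(stream):
    """Collect lines through an explicit readlines() call."""
    return [line for line in stream.readlines()]


def lines_via_iteration(stream):
    """Collect lines through the iterator protocol of IOBase."""
    return [line for line in stream]


text = "one\ntwo\nthree"
data = text.encode("utf-8")

# Text stream: both forms yield the same character strings.
assert lines_via_readlines(io.StringIO(text)) == ["one\n", "two\n", "three"]
assert lines_via_iteration(io.StringIO(text)) == ["one\n", "two\n", "three"]

# Binary stream: both forms yield the same bytes objects.
assert lines_via_readlines(io.BytesIO(data)) == [b"one\n", b"two\n", b"three"]
assert lines_via_iteration(io.BytesIO(data)) == [b"one\n", b"two\n", b"three"]

# With a size hint, readlines() may stop early, so it is not the same
# as iterating over the whole stream.
assert io.StringIO(text).readlines(1) == ["one\n"]
```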