tokenize in 3.12 makes copies of each line, 3.11 does not #119654

Closed
nedbat opened this issue May 28, 2024 · 3 comments

Labels
stdlib (Python modules in the Lib dir), type-bug (An unexpected behavior, bug, or error)

Comments

@nedbat
Member

nedbat commented May 28, 2024

Bug report

Bug description:

The tokenize module creates TokenInfo objects with a .line attribute. In Python 3.11, each token on a line used the same string object for .line. In 3.12, each token has a new copy of the same string.

This is part of a memory issue reported against coverage.py: nedbat/coveragepy#1791

# tok.py

import io
import sys
import tokenize

print(f"{sys.version = }")

text = "lorem ipsum quia dolor sit amet consectetur adipisci velit"
readline = io.StringIO(text).readline
toks = list(tokenize.generate_tokens(readline))

print(f"{toks[0].line = }")
print(f"{(toks[0].line == toks[1].line) = }")
print(f"{(toks[0].line is toks[1].line) = }")

3.11 re-uses string objects:

% python3.11 /tmp/tok.py
sys.version = '3.11.9 (main, Apr  8 2024, 14:01:56) [Clang 15.0.0 (clang-1500.3.9.4)]'
toks[0].line = 'lorem ipsum quia dolor sit amet consectetur adipisci velit'
(toks[0].line == toks[1].line) = True
(toks[0].line is toks[1].line) = True

3.12 (and above) makes new string objects:

% python3.12 /tmp/tok.py
sys.version = '3.12.3 (main, Apr  9 2024, 15:45:14) [Clang 15.0.0 (clang-1500.3.9.4)]'
toks[0].line = 'lorem ipsum quia dolor sit amet consectetur adipisci velit'
(toks[0].line == toks[1].line) = True
(toks[0].line is toks[1].line) = False
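
To gauge the effect beyond a single pair of tokens, a small sketch like the following counts how many distinct string objects back the .line attributes of a whole token stream. The helper name line_object_stats is made up for illustration and is not part of tokenize. Going by the outputs above, the object count should stay near the number of source lines on 3.11 and move toward the number of tokens on 3.12; exact figures are deliberately not claimed here.

```python
import io
import tokenize


def line_object_stats(source):
    """Return (token count, distinct .line values, distinct .line objects)."""
    toks = list(tokenize.generate_tokens(io.StringIO(source).readline))
    distinct_values = len({tok.line for tok in toks})
    # toks keeps every line string alive here, so id() is unique per object.
    distinct_objects = len({id(tok.line) for tok in toks})
    return len(toks), distinct_values, distinct_objects


text = "lorem ipsum quia dolor sit amet consectetur adipisci velit\n"
n_tokens, n_values, n_objects = line_object_stats(text)
print(f"{n_tokens = }, {n_values = }, {n_objects = }")
```

The gap between n_objects and n_values is a rough proxy for how many redundant line copies a caller holds when it keeps the full token list in memory.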

CPython versions tested on:

3.11, 3.12, 3.13, CPython main branch

Operating systems tested on:

macOS
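
For code that has to keep many tokens alive on an affected version, one possible workaround is to swap each fresh copy for a line string already held. This is only a sketch under that assumption, not something taken from tokenize or coverage.py; tokens_sharing_lines is a hypothetical name. It relies on TokenInfo being a named tuple, so _replace is available.

```python
import io
import tokenize


def tokens_sharing_lines(readline):
    """Yield tokens, reusing one string object per run of equal .line values."""
    last_line = None
    for tok in tokenize.generate_tokens(readline):
        if tok.line == last_line:
            # Swap the fresh copy for the string object we already hold.
            tok = tok._replace(line=last_line)
        else:
            last_line = tok.line
        yield tok


text = "lorem ipsum quia dolor sit amet consectetur adipisci velit\n"
shared = list(tokens_sharing_lines(io.StringIO(text).readline))
print(f"{(shared[0].line is shared[1].line) = }")
```

The comparison costs one string equality check per token, which is cheap next to retaining a separate copy of the line for every token.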

@nedbat nedbat added the type-bug label May 28, 2024
@Eclips4 Eclips4 added the stdlib label May 28, 2024
@Eclips4
Member

Eclips4 commented May 28, 2024

cc @pablogsal @lysnikolaou

@devdanzin
Contributor

This was the finding of #119118 and will be fixed by #119615.

@pablogsal
Member

Closing in favour of #119118

@terryjreedy terryjreedy closed this as not planned May 28, 2024