Test collection is very slow for test class TestXYZ #8415

@seberg

NumPy's test collection time seems to have crept up over time, which has been bugging me for a while. Right now test collection takes about 1/3 of the whole execution time for a typical test run (it was especially annoying when running in python3.9-dbg, where the collection phase takes even longer).

Here is the relevant part of the cProfile output (sorted by cumtime):

        1    0.000    0.000   47.352   47.352 main.py:589(perform_collect)
62851/669    0.008    0.000   46.542    0.070 {method 'extend' of 'list' objects}
93099/16003    0.050    0.000   46.539    0.003 main.py:806(genitems)
     2141    0.005    0.000   46.222    0.022 runner.py:541(collect_one_node)
     2141    0.005    0.000   46.128    0.022 runner.py:370(pytest_make_collect_report)
     2141    0.006    0.000   46.118    0.022 runner.py:317(from_call)
     2141    0.002    0.000   46.108    0.022 runner.py:371(<lambda>)
     1134    0.033    0.000   42.511    0.037 python.py:409(collect)
      161    0.001    0.000   42.299    0.263 python.py:504(collect)
     8972    0.015    0.000   39.868    0.004 {method 'sort' of 'list' objects}
    16944    0.019    0.000   39.840    0.002 python.py:440(sort_key)
    16944    0.050    0.000   39.818    0.002 python.py:324(reportinfo)
    16944    0.026    0.000   39.476    0.002 code.py:1189(getfslineno)
      974    0.945    0.001   38.952    0.040 source.py:119(findsource)
      974    0.010    0.000   37.794    0.039 inspect.py:809(findsource)
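For anyone wanting to produce a comparable report, here is a minimal sketch using only the standard library. The helper name `profile_by_cumtime` and its `limit` parameter are made up for illustration and are not part of pytest or NumPy.

```python
import cProfile
import io
import pstats


def profile_by_cumtime(func, limit=15):
    """Run func() under cProfile and return a report sorted by cumtime.

    `profile_by_cumtime` is a hypothetical helper, not a pytest API.
    """
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func()
    finally:
        profiler.disable()

    # Collect the report in memory instead of printing to stdout.
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(limit)
    return stream.getvalue()
```

To profile only the collection phase, `func` could be something like `lambda: pytest.main(["--collect-only", "-q"])`, assuming pytest is importable in the environment being measured.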

Digging into it, at least for NumPy, that findsource call is only used to get the lineno of class TestFunction style test classes. If I make a silly change (this is obviously nonsense):

diff --git a/src/_pytest/_code/code.py b/src/_pytest/_code/code.py
index b85217560..45d985f8e 100644
--- a/src/_pytest/_code/code.py
+++ b/src/_pytest/_code/code.py
@@ -1210,11 +1210,6 @@ def getfslineno(obj: object) -> Tuple[Union[str, Path], int]:
 
         fspath = fn and absolutepath(fn) or ""
         lineno = -1
-        if fspath:
-            try:
-                _, lineno = findsource(obj)
-            except OSError:
-                pass
         return fspath, lineno
 
     return code.path, code.firstlineno

(or any other logic). EDIT: Forgot to say, this gives me roughly a 6x speedup of collection, I think.

The problem can be semi-mitigated in NumPy, or be blamed on the slow AST parser called by inspect. The issue is that we have large test files with quite a lot of class TestFunction classes, and that effectively scales quadratically: findsource is called once per class, and each call parses the whole file, which takes longer the more tests there are!
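The scaling can be reproduced outside of pytest with a rough sketch like the one below. It generates a throwaway module full of empty test classes and times `inspect.findsource` over all of them. The module names, class names, and helper functions are invented for this illustration, and the timings depend heavily on the Python version (how `inspect` locates a class has changed between releases), so treat any numbers as indicative only.

```python
import importlib.util
import inspect
import sys
import tempfile
import time
from pathlib import Path


def make_test_module(n_classes):
    """Write and import a module with n_classes trivial test classes.

    Each class occupies four lines, so class i starts at 0-based line 4 * i.
    """
    source = "".join(
        f"class TestFunction{i}:\n    def test_a(self):\n        pass\n\n"
        for i in range(n_classes)
    )
    name = f"generated_tests_{n_classes}"  # hypothetical module name
    path = Path(tempfile.mkdtemp()) / f"{name}.py"
    path.write_text(source)

    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    # inspect looks classes up via sys.modules, so register the module.
    sys.modules[name] = module
    spec.loader.exec_module(module)
    return module


def time_findsource(module):
    """Return the seconds spent calling inspect.findsource on every class."""
    classes = [obj for obj in vars(module).values() if inspect.isclass(obj)]
    start = time.perf_counter()
    for cls in classes:
        inspect.findsource(cls)
    return time.perf_counter() - start


if __name__ == "__main__":
    for n in (50, 200):
        elapsed = time_findsource(make_test_module(n))
        print(f"{n} classes: {elapsed:.4f} s")
```

If every lookup re-parses the file, quadrupling the number of classes should cost noticeably more than four times as long; on interpreters where the class line number is found more cheaply, the growth may look closer to linear.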

Anyway, this might just be a duplicate of gh-2206, but I am wondering if there isn't some trivial solution. Maybe the lineno can just be avoided, since module.__dict__.keys() is probably ordered, at least on newer Python versions?
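To illustrate the idea (this is only a sketch, not how pytest sorts items), the snippet below recovers test classes in definition order purely from the module namespace, without touching the source file. It relies on dicts preserving insertion order, which is guaranteed from Python 3.7 on. The function name `classes_in_definition_order` and the `Test` prefix filter are assumptions for the example.

```python
import inspect
import types


def classes_in_definition_order(module, prefix="Test"):
    """List test class names in the order they were bound in the module.

    Hypothetical helper: relies on insertion-ordered dicts (Python 3.7+)
    and skips classes imported from other modules.
    """
    return [
        name
        for name, obj in vars(module).items()
        if inspect.isclass(obj)
        and name.startswith(prefix)
        and obj.__module__ == module.__name__
    ]


# Build a small in-memory module whose classes are not in alphabetical order.
demo = types.ModuleType("demo_tests")
exec(
    "class TestZeta: pass\nclass TestAlpha: pass\nclass TestMid: pass\n",
    demo.__dict__,
)
print(classes_in_definition_order(demo))
```

One caveat with this approach: a name that is rebound later, or assigned from an object defined elsewhere in the file, appears in binding order, which may differ from source line order.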
