[MAINTENANCE] Prevent unneeded test setup/teardown #10619

Merged
merged 30 commits into from
Nov 5, 2024

Conversation

tyler-hoffman
Contributor

@tyler-hoffman tyler-hoffman commented Nov 2, 2024

See the comment toward the bottom of conftest.py for details on the approach. A couple of notes on the changes here:

  • We now have a mapping of TestConfig -> BatchTestSetup that is shared across test runs, so we can reuse a setup when appropriate and run setup/teardown only once.
  • For this to work, TestConfig needs to be hashable, so I implemented both __hash__ and __eq__.
  • Because of ^, it was easiest just to push extra_data down to the base BatchTestSetup. I'm interested in reworking this a bit in a subsequent refactor.
  • Because gx doesn't support multiple concurrent contexts, our fixture now calls set_context before it yields to the test. Without this, tests could run with a different test's context, resulting in errors about not finding the datasource referenced by batches.
  • Because of ^, BatchTestSetup._context was made public as BatchTestSetup.context.
  • Description of PR changes above includes a link to an existing GitHub issue
  • PR title is prefixed with one of: [BUGFIX], [FEATURE], [DOCS], [MAINTENANCE], [CONTRIB]
  • Code is linted - run invoke lint (uses ruff format + ruff check)
  • Appropriate tests and docs have been updated



netlify bot commented Nov 2, 2024

Deploy Preview for niobium-lead-7998 canceled.

Name Link
🔨 Latest commit 0467e98
🔍 Latest deploy log https://app.netlify.com/sites/niobium-lead-7998/deploys/672a7d166e65380008d49a7d


codecov bot commented Nov 2, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 80.31%. Comparing base (da65e16) to head (0467e98).
Report is 1 commits behind head on develop.

✅ All tests successful. No failed tests found.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop   #10619      +/-   ##
===========================================
- Coverage    80.31%   80.31%   -0.01%     
===========================================
  Files          463      463              
  Lines        40117    40117              
===========================================
- Hits         32221    32220       -1     
- Misses        7896     7897       +1     
Flag Coverage Δ
3.10 68.03% <ø> (ø)
3.10 athena or openpyxl or pyarrow or project or sqlite or aws_creds ?
3.10 aws_deps ?
3.10 big ?
3.10 clickhouse ?
3.10 filesystem ?
3.10 mssql ?
3.10 mysql ?
3.10 postgresql ?
3.10 spark_connect ?
3.10 trino ?
3.11 68.03% <ø> (ø)
3.11 athena or openpyxl or pyarrow or project or sqlite or aws_creds ?
3.11 aws_deps ?
3.11 big ?
3.11 clickhouse ?
3.11 filesystem ?
3.11 mssql ?
3.11 mysql ?
3.11 postgresql ?
3.11 spark_connect ?
3.11 trino ?
3.12 68.03% <ø> (ø)
3.12 athena or openpyxl or pyarrow or project or sqlite or aws_creds 55.41% <ø> (ø)
3.12 aws_deps 46.14% <ø> (ø)
3.12 big 54.75% <ø> (ø)
3.12 databricks 47.88% <ø> (ø)
3.12 filesystem 61.71% <ø> (ø)
3.12 mssql 50.25% <ø> (ø)
3.12 mysql 50.31% <ø> (ø)
3.12 postgresql 54.63% <ø> (ø)
3.12 snowflake 48.85% <ø> (-0.01%) ⬇️
3.12 spark 58.06% <ø> (ø)
3.12 spark_connect 46.44% <ø> (ø)
3.12 trino 52.68% <ø> (ø)
3.9 68.06% <ø> (-0.01%) ⬇️
3.9 athena or openpyxl or pyarrow or project or sqlite or aws_creds 55.41% <ø> (ø)
3.9 aws_deps 46.17% <ø> (ø)
3.9 big 54.76% <ø> (ø)
3.9 clickhouse 43.03% <ø> (ø)
3.9 databricks 47.89% <ø> (ø)
3.9 filesystem 61.72% <ø> (ø)
3.9 mssql 50.23% <ø> (ø)
3.9 mysql 50.30% <ø> (ø)
3.9 postgresql 54.61% <ø> (ø)
3.9 snowflake 48.86% <ø> (-0.01%) ⬇️
3.9 spark 58.02% <ø> (ø)
3.9 spark_connect 46.45% <ø> (ø)
3.9 trino 52.66% <ø> (ø)
cloud 0.00% <ø> (ø)
docs-basic 53.36% <ø> (ø)
docs-creds-needed 52.93% <ø> (ø)
docs-spark 52.41% <ø> (ø)


@tyler-hoffman tyler-hoffman changed the title from "M/core 567/improve test performance" to "[MAINTENANCE] Prevent unneeded test setup/teardown" Nov 2, 2024
@@ -148,29 +148,29 @@ class TestExpectTableRowCountToEqualOtherTable:
         data_source_configs=[
             PostgreSQLDatasourceTestConfig(
                 column_types={"col_a": sqltypes.INTEGER},
-                extra_assets={"test_table_two": {"col_b": sqltypes.VARCHAR}},
+                extra_assets={"test_table_a": {"col_b": sqltypes.VARCHAR}},
Contributor Author

@tyler-hoffman tyler-hoffman Nov 2, 2024

TODO in a subsequent task: figure out how to generate these names ourselves like we do for the main asset. https://greatexpectations.atlassian.net/browse/CORE-586

@tyler-hoffman tyler-hoffman marked this pull request as ready for review November 2, 2024 20:10
@tyler-hoffman tyler-hoffman requested a review from a team November 2, 2024 20:22
tests/integration/conftest.py Outdated Show resolved Hide resolved
    # We need to implement this ourselves to call `.equals` on dataframes.
    if not isinstance(value, TestConfig):
        return False
    return all(
Contributor

Can we do an equality check on the hashes? Not sure if that's better, but this seems to be very similar to that, and then we'd only have one block of code to update if this changed.

Contributor Author

I want to push back a bit here. I considered similar approaches. I went with this because, while more tedious, it saves us from the (very unlikely) case of hash collisions. But the likelihood of that is sooo small, so I'm totally happy to change to a hash comparison, but wanted to lay out my thoughts. LMK.

Contributor

You are right, this is better.
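
To make the collision concern concrete, here is a small illustration. It relies on a CPython implementation detail (`hash(-1)` and `hash(-2)` are both `-2`), and the `HashOnlyEq` and `FieldwiseEq` classes are hypothetical, not code from this PR.

```python
class HashOnlyEq:
    """Hypothetical: equality is defined by comparing hashes."""

    def __init__(self, value: int) -> None:
        self.value = value

    def __hash__(self) -> int:
        return hash(self.value)

    def __eq__(self, other: object) -> bool:
        return isinstance(other, HashOnlyEq) and hash(self) == hash(other)


class FieldwiseEq:
    """Hypothetical: equality compares the underlying field directly."""

    def __init__(self, value: int) -> None:
        self.value = value

    def __hash__(self) -> int:
        return hash(self.value)

    def __eq__(self, other: object) -> bool:
        return isinstance(other, FieldwiseEq) and self.value == other.value


# On CPython, -1 and -2 share a hash, so hash-only equality conflates them.
ints_collide = hash(-1) == hash(-2)
hash_only_result = HashOnlyEq(-1) == HashOnlyEq(-2)
fieldwise_result = FieldwiseEq(-1) == FieldwiseEq(-2)
```

With field-wise comparison, a collision only costs an extra `__eq__` call during lookup. With hash-only comparison, a collision makes two different configs look identical.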


@override
def __hash__(self) -> int:
    hashable_col_types = dict_to_tuple(self.column_types) if self.column_types else None
Contributor

@billdirks billdirks Nov 4, 2024

If 2 objects compare equal via __eq__, are they guaranteed to have the same hash? This seems to be a requirement of __hash__ (e.g. so that putting them in dictionaries works as expected): https://docs.python.org/3/reference/datamodel.html#object.__hash__

Contributor Author

@tyler-hoffman tyler-hoffman Nov 5, 2024

> If 2 objects have the same value of __eq__ are they guaranteed to have the same hash

If they have the same value of __eq__ AND implement __hash__, it must be the same, but most (probably all well-defined?) mutable objects do not implement __hash__.

>>> hash({"foo": "bar"})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'dict'

The trick I've always seen is to use tuples, but maybe there's something cleaner?
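
A minimal sketch of the tuple trick. `dict_to_sorted_tuple` is a hypothetical helper written for illustration; the PR's own `dict_to_tuple` is not shown here and may behave differently (for example, around ordering or nested values).

```python
from typing import Dict, Hashable, Tuple


def dict_to_sorted_tuple(d: Dict[str, Hashable]) -> Tuple[Tuple[str, Hashable], ...]:
    """Convert a flat dict into a hashable tuple of (key, value) pairs.

    Sorting by key makes the result independent of insertion order, so two
    dicts that compare equal also produce equal (and equally hashed) tuples.
    """
    return tuple(sorted(d.items(), key=lambda item: item[0]))


a = dict_to_sorted_tuple({"col_a": "INTEGER", "col_b": "VARCHAR"})
b = dict_to_sorted_tuple({"col_b": "VARCHAR", "col_a": "INTEGER"})
```

This assumes the values are themselves hashable; nested dicts or lists would need the same conversion applied recursively.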

Contributor

I don't know if we are talking about the same thing? Using tuples is fine. I'm asking: if a == b, are we guaranteed that a.__hash__() == b.__hash__()? Maybe it's true, but it wasn't apparent when I read this code.

My link didn't work because the last characters got removed from the url (fixed now) but I'm referring to this from docs:

> The only required property is that objects which compare equal have the same hash value; it is advised to mix together the hash values of the components of the object that also play a part in comparison of objects by packing them into a tuple and hashing the tuple.

If we use this for a key in a dict, and this isn't true, the lookup can fail since both hash and equality are used in the lookup.
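
A short demonstration of that failure mode, using hypothetical classes (`BrokenKey`, `ConsistentKey`) that are not part of this PR. `BrokenKey` deliberately violates the contract by hashing on identity while comparing on value.

```python
class BrokenKey:
    """Hypothetical: __eq__ compares values, but __hash__ uses object identity."""

    def __init__(self, value: str) -> None:
        self.value = value

    def __eq__(self, other: object) -> bool:
        return isinstance(other, BrokenKey) and self.value == other.value

    def __hash__(self) -> int:
        return id(self)


class ConsistentKey:
    """Hypothetical: __hash__ is built from the same field that __eq__ compares."""

    def __init__(self, value: str) -> None:
        self.value = value

    def __eq__(self, other: object) -> bool:
        return isinstance(other, ConsistentKey) and self.value == other.value

    def __hash__(self) -> int:
        return hash((self.value,))


# Equal keys, but different hashes: the dict never reaches the __eq__ check.
broken_lookup = {BrokenKey("a"): "setup"}.get(BrokenKey("a"))

# Equal keys with equal hashes: the lookup finds the stored value.
consistent_lookup = {ConsistentKey("a"): "setup"}.get(ConsistentKey("a"))
```

In a setup cache, the broken variant would silently miss on every lookup and run setup/teardown again for each test.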

Contributor Author

Oh yeah, I misunderstood your question. My intention was definitely to implement this so that if a == b, we are guaranteed that a.__hash__() == b.__hash__(). I'm pretty confident it does this. But LMK if you aren't convinced (or if I'm still missing something 😄 )

Member

@joshua-stauffer joshua-stauffer left a comment

@tyler-hoffman tyler-hoffman added this pull request to the merge queue Nov 5, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 5, 2024
@tyler-hoffman tyler-hoffman added this pull request to the merge queue Nov 5, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 5, 2024
@tyler-hoffman tyler-hoffman added this pull request to the merge queue Nov 5, 2024
@tyler-hoffman tyler-hoffman removed this pull request from the merge queue due to a manual request Nov 5, 2024
@tyler-hoffman tyler-hoffman added this pull request to the merge queue Nov 5, 2024
Merged via the queue into develop with commit dbd4746 Nov 5, 2024
70 checks passed
@tyler-hoffman tyler-hoffman deleted the m/CORE-567/improve-test-performance branch November 5, 2024 21:08
3 participants