[MAINTENANCE] Improve experience around expectation deletion with Cloud-backed suites #10662

Merged
merged 24 commits into develop from b/core-633/expectation_deletion
Nov 19, 2024

Conversation

cdkini
Member

@cdkini cdkini commented Nov 13, 2024

A few relevant changes here:

  • Expectation equality checks should be done with the actual object (and not .configuration)
    • This is due to the fact that __eq__ has specific logic around rendered content and other fields that could go awry
    • I've decided that both notes and meta should be excluded from __eq__; they are too volatile due to misc updates from Cloud and represent "metadata" as opposed to actual state in my opinion
  • We should ignore id when determining uniqueness with suite.add_expectation() (users can very easily add duplicates without realizing)
for _ in range(10):
    e = gxe.ExpectColumnValuesToBeBetween(...)
    suite.add_expectation(e)
print(len(suite.expectations)) # 1 with this PR's `id` change, 10 otherwise; we want the former
  • Update docstring to point to using suite indexing when deleting
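
To make the equality rule in the description concrete, here is a minimal sketch of an `__eq__` that skips volatile fields. `ToyExpectation` and `EXCLUDED_FROM_EQ` are hypothetical stand-ins invented for illustration; this is not the library's implementation, only the general shape of the idea.

```python
from dataclasses import asdict, dataclass
from typing import Optional

# Hypothetical: field names mirror the PR description, the class itself is a toy.
EXCLUDED_FROM_EQ = {"rendered_content", "notes", "meta"}


@dataclass(eq=False)  # eq=False so the hand-written __eq__ below is used
class ToyExpectation:
    column: str
    min_value: int
    max_value: int
    notes: Optional[str] = None
    meta: Optional[dict] = None
    rendered_content: Optional[list] = None

    def _comparable(self) -> dict:
        # Keep only the fields that represent actual state.
        return {k: v for k, v in asdict(self).items() if k not in EXCLUDED_FROM_EQ}

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, ToyExpectation):
            return NotImplemented
        return self._comparable() == other._comparable()


a = ToyExpectation("age", 0, 100, notes="edited in Cloud", meta={"author": "a"})
b = ToyExpectation("age", 0, 100)
c = ToyExpectation("age", 0, 99)

same_state = a == b       # differs only in notes/meta
different_state = a == c  # differs in max_value
```

Under these assumptions, `a` and `b` compare equal because they differ only in metadata, while `a` and `c` do not.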

  • Description of PR changes above includes a link to an existing GitHub issue
  • PR title is prefixed with one of: [BUGFIX], [FEATURE], [DOCS], [MAINTENANCE], [CONTRIB]
  • Code is linted - run invoke lint (uses ruff format + ruff check)
  • Appropriate tests and docs have been updated

netlify bot commented Nov 13, 2024

Deploy Preview for niobium-lead-7998 ready!

Name Link
🔨 Latest commit a941c80
🔍 Latest deploy log https://app.netlify.com/sites/niobium-lead-7998/deploys/673ca1c23b60080008cc2a6f
😎 Deploy Preview https://deploy-preview-10662.docs.greatexpectations.io

codecov bot commented Nov 13, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 80.47%. Comparing base (7bf34a7) to head (a941c80).
Report is 1 commits behind head on develop.

✅ All tests successful. No failed tests found.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop   #10662      +/-   ##
===========================================
- Coverage    80.47%   80.47%   -0.01%     
===========================================
  Files          462      462              
  Lines        40107    40106       -1     
===========================================
- Hits         32278    32277       -1     
  Misses        7829     7829              
Flag Coverage Δ
3.10 68.17% <100.00%> (-0.01%) ⬇️
3.10 athena or openpyxl or pyarrow or project or sqlite or aws_creds ?
3.10 aws_deps ?
3.10 big ?
3.10 bigquery ?
3.10 clickhouse ?
3.10 filesystem ?
3.10 mssql ?
3.10 mysql ?
3.10 postgresql ?
3.10 spark_connect ?
3.11 68.17% <100.00%> (-0.01%) ⬇️
3.11 athena or openpyxl or pyarrow or project or sqlite or aws_creds ?
3.11 aws_deps ?
3.11 big ?
3.11 bigquery ?
3.11 clickhouse ?
3.11 filesystem ?
3.11 mssql ?
3.11 mysql ?
3.11 postgresql ?
3.11 spark_connect ?
3.12 68.15% <100.00%> (-0.01%) ⬇️
3.12 athena or openpyxl or pyarrow or project or sqlite or aws_creds 55.49% <75.00%> (+<0.01%) ⬆️
3.12 aws_deps 46.14% <25.00%> (+<0.01%) ⬆️
3.12 big 54.74% <75.00%> (+<0.01%) ⬆️
3.12 bigquery 45.91% <25.00%> (+<0.01%) ⬆️
3.12 databricks 48.12% <75.00%> (+<0.01%) ⬆️
3.12 filesystem 61.70% <100.00%> (-0.02%) ⬇️
3.12 mssql 51.52% <25.00%> (+<0.01%) ⬆️
3.12 mysql 51.58% <25.00%> (+<0.01%) ⬆️
3.12 postgresql 54.64% <75.00%> (+<0.01%) ⬆️
3.12 snowflake 48.89% <75.00%> (+<0.01%) ⬆️
3.12 spark 58.08% <75.00%> (+<0.01%) ⬆️
3.12 spark_connect 46.44% <25.00%> (+<0.01%) ⬆️
3.12 trino 52.67% <75.00%> (+<0.01%) ⬆️
3.9 68.20% <100.00%> (+0.01%) ⬆️
3.9 athena or openpyxl or pyarrow or project or sqlite or aws_creds 55.50% <75.00%> (+<0.01%) ⬆️
3.9 aws_deps 46.16% <25.00%> (+<0.01%) ⬆️
3.9 big 54.75% <75.00%> (+<0.01%) ⬆️
3.9 bigquery 45.92% <25.00%> (+<0.01%) ⬆️
3.9 clickhouse 43.03% <25.00%> (+<0.01%) ⬆️
3.9 databricks 48.14% <75.00%> (+<0.01%) ⬆️
3.9 filesystem 61.72% <100.00%> (-0.02%) ⬇️
3.9 mssql 51.50% <25.00%> (+<0.01%) ⬆️
3.9 mysql 51.57% <25.00%> (+<0.01%) ⬆️
3.9 postgresql 54.63% <75.00%> (+<0.01%) ⬆️
3.9 snowflake 48.91% <75.00%> (+<0.01%) ⬆️
3.9 spark 58.04% <75.00%> (+<0.01%) ⬆️
3.9 spark_connect 46.45% <25.00%> (+<0.01%) ⬆️
3.9 trino 52.66% <75.00%> (+<0.01%) ⬆️
cloud 0.00% <0.00%> (ø)
docs-basic 53.40% <100.00%> (+<0.01%) ⬆️
docs-creds-needed 52.97% <100.00%> (+<0.01%) ⬆️
docs-spark 52.45% <100.00%> (+<0.01%) ⬆️


@@ -145,19 +145,13 @@ def add_expectation(self, expectation: _TExpectation) -> _TExpectation:
         )
         should_save_expectation = self._has_been_saved()
         expectation_is_unique = all(
-            expectation.configuration != existing_expectation.configuration
-            for existing_expectation in self.expectations
+            expectation != existing_expectation for existing_expectation in self.expectations
Member Author

Equality checks should utilize Expectation.__eq__ rather than configuration

Comment on lines -155 to -160
-            try:
-                expectation = self._store.add_expectation(suite=self, expectation=expectation)
-                self.expectations[-1].id = expectation.id
-            except Exception as exc:
-                self.expectations.pop()
-                raise exc  # noqa: TRY201
Member Author

We can just wait to append until we succeed with persistence - also, the id is already attached to the input expectation
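
A minimal sketch of the "persist first, append on success" ordering described in this comment. `FlakyStore` and `ToySuite` are hypothetical classes made up for illustration and are not part of the codebase under review.

```python
import uuid


class FlakyStore:
    """Hypothetical store: assigns an id on success, raises when told to fail."""

    def __init__(self, fail: bool = False) -> None:
        self.fail = fail

    def add(self, item: dict) -> dict:
        if self.fail:
            raise RuntimeError("persistence failed")
        item["id"] = str(uuid.uuid4())  # the id lands on the input object itself
        return item


class ToySuite:
    def __init__(self, store: FlakyStore) -> None:
        self.store = store
        self.items: list = []

    def add(self, item: dict) -> dict:
        # Persist first. If this raises, the append below never runs,
        # so there is nothing to pop or roll back.
        saved = self.store.add(item)
        self.items.append(saved)
        return saved


ok_suite = ToySuite(FlakyStore())
ok_suite.add({"column": "age"})

bad_suite = ToySuite(FlakyStore(fail=True))
try:
    bad_suite.add({"column": "age"})
except RuntimeError:
    pass

ok_count = len(ok_suite.items)
bad_count = len(bad_suite.items)
```

In this sketch a failed save leaves the in-memory list untouched without any `try`/`except` inside `add`.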

@@ -368,7 +368,7 @@ def __eq__(self, other: object) -> bool:

         # rendered_content is derived from the rest of the expectation, and can/should
         # be excluded from equality checks
-        exclude: set[str] = {"rendered_content"}
+        exclude: set[str] = {"rendered_content", "id"}
Member Author

Legacy equality checks (i.e. using configuration) exclude id

Contributor

Do we want to maintain that? Does still comparing the id break anything? Would it be a bug if the ids don't match? If we need this logic, maybe add it to the comment above about rendered_content?

Member Author

I don't love it but if we look at the script I mentioned in Slack, we get different behavior based on this:

suite = context.suites.add(gx.ExpectationSuite("chetan-test-2024-11-13-v21"))

for i in range(10):
    e = gxe.ExpectColumnValuesToBeBetween(column="age", min_value=0, max_value=100)
    suite.add_expectation(e)

print(len(suite.expectations)) # How many should we see?

We historically have gotten 1 from this but making id part of the __eq__ check gets us 10.
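
The 1-versus-10 difference can be reproduced without a Cloud backend. The sketch below uses plain dicts and a counter in place of real expectations and a real store; `add_if_unique` and `run` are hypothetical helpers, and the only assumption carried over from the thread is that each persisted item receives a fresh `id`.

```python
import itertools

_ids = itertools.count(1)  # stand-in for ids handed out by a backend


def _key(expectation: dict, ignore: frozenset) -> dict:
    return {k: v for k, v in expectation.items() if k not in ignore}


def add_if_unique(suite: list, candidate: dict, ignore: frozenset) -> bool:
    """Append candidate unless an existing item matches on all non-ignored keys."""
    if any(_key(candidate, ignore) == _key(existing, ignore) for existing in suite):
        return False
    suite.append({**candidate, "id": next(_ids)})  # "persisting" assigns an id
    return True


def run(ignore: frozenset) -> int:
    suite: list = []
    for _ in range(10):
        fresh = {"column": "age", "min_value": 0, "max_value": 100, "id": None}
        add_if_unique(suite, fresh, ignore)
    return len(suite)


count_ignoring_id = run(frozenset({"id"}))
count_comparing_id = run(frozenset())
```

When `id` is compared, every fresh candidate (with `id=None`) differs from the already-persisted copies, so all ten are added; when `id` is ignored, only the first one is.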

@cdkini cdkini changed the title from "[BUGFIX] Enable proper expectation deletion with Cloud-backed suites" to "[MAINTENANCE] Improve experience around expectation deletion with Cloud-backed suites" Nov 14, 2024
)
def test_expectation_equality_with_id(self, id_a: str | None, id_b: str | None):
Member Author

This is the new test

Contributor

@tyler-hoffman tyler-hoffman left a comment

Left a blocker on 1.5 smallish things - LMK if I'm just confused

-            except Exception as exc:
-                self.expectations.pop()
-                raise exc  # noqa: TRY201
+            expectation = self._store.add_expectation(suite=self, expectation=expectation)
Contributor

well this seems much more clear!

@@ -218,13 +212,15 @@ def _process_expectation(
@public_api
def delete_expectation(self, expectation: Expectation) -> Expectation:
"""Delete an Expectation from the collection.
The input Expectation must be in the suite and referenced by index.
Contributor

I'm going to throw a blocker on here. The second part of this seems wrong. How could it matter how you retrieve the expectation?

Member Author

Hmm this is fair - I want to encourage users to delete by using suite.expectations[idx] instead of gxe.ExpectWhateverWhatever. I think the example here is good enough.

Contributor

@tyler-hoffman tyler-hoffman Nov 14, 2024

Yeah, the example is fine with me. FWIW I'd personally probs find it by like next(e for e in suite.expectations if isinstance(e, MyExpectation))
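
A runnable sketch of the lookup-by-type approach mentioned here, using a plain list in place of a suite. `ExpectA` and `ExpectB` are hypothetical placeholder classes; note that a generator expression filters with `if`, and passing a default to `next` avoids a `StopIteration` when nothing matches.

```python
class ExpectA:
    """Hypothetical expectation type."""


class ExpectB:
    """Hypothetical expectation type."""


# Stand-in for suite.expectations.
expectations = [ExpectA(), ExpectB(), ExpectA()]

# Grab the object that already lives in the collection rather than
# constructing a new, merely similar-looking instance.
target = next((e for e in expectations if isinstance(e, ExpectB)), None)

if target is not None:
    expectations.remove(target)

remaining_types = [type(e).__name__ for e in expectations]
```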

Contributor

@tyler-hoffman tyler-hoffman left a comment

I have mixed feelings about equality being true when both ids exist and are different, but I think we can table that discussion and merge this. Thanks for fixing this!

def _determine_if_expectation_is_unique(self, expectation: Expectation) -> bool:
# Expectation is deemed unique if it is not already in the suite
# We do not consider the id of the expectation in this check
expectation_copies = copy.deepcopy(self.expectations)
Contributor

I'm guessing we also want to account for rendered_content, right?

Also does this get expensive when we have rendered content? If we want to save ourselves the deep copy (and mutation), you could do something like model1.dict(exclude={"id"}) == model2.dict(exclude={"id"}) in a loop, or that's what chatgpt tells me at least 😄
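
A sketch of the copy-free comparison suggested above. To keep it dependency-free it swaps in `dataclasses.asdict` plus a key filter as a stand-in for a model's `.dict(exclude=...)` call; `ToyModel`, `dump_without`, and `is_unique` are hypothetical names, not code from this PR.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ToyModel:
    column: str
    max_value: int
    id: Optional[str] = None


def dump_without(model: ToyModel, exclude: frozenset) -> dict:
    # Stand-in for model.dict(exclude=...): builds a dict, never mutates the model.
    return {k: v for k, v in asdict(model).items() if k not in exclude}


def is_unique(candidate: ToyModel, existing: list, exclude=frozenset({"id"})) -> bool:
    target = dump_without(candidate, exclude)  # computed once, outside the loop
    return all(dump_without(e, exclude) != target for e in existing)


stored = [ToyModel("age", 100, id="abc")]
duplicate_is_unique = is_unique(ToyModel("age", 100), stored)
new_is_unique = is_unique(ToyModel("age", 50), stored)
```

Because nothing is deep-copied or mutated, the stored items keep their ids after the check.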

pytest.param(None, None, id="both_none"),
pytest.param({}, None, id="both_falsy"),
pytest.param({"author": "Bob Dylan"}, None, id="missing_meta"),
pytest.param({"author": "Bob Dylan"}, {"author": "John Lennon"}, id="different_meta"),
Contributor

Potentially a huge blocker for me: did these guys actually write books?

@@ -368,18 +368,12 @@ def __eq__(self, other: object) -> bool:

         # rendered_content is derived from the rest of the expectation, and can/should
         # be excluded from equality checks
-        exclude: set[str] = {"rendered_content"}
+        # notes and meta are simple metadata and should be excluded from equality checks
+        exclude: set[str] = {"rendered_content", "notes", "meta"}
Contributor

Should we really exclude these for equality? Why not just handle these kinds of weird situations in the helper method above?

-        remaining_expectations = [
-            exp for exp in self.expectations if exp.configuration != expectation.configuration
-        ]
+        remaining_expectations = [exp for exp in self.expectations if exp != expectation]
Contributor

Should this be using the same kind of logic as in _determine_if_expectation_is_unique wrt ignoring ids?

Contributor

@tyler-hoffman tyler-hoffman left a comment

Just thinking aloud: I wonder if some of this gets easier if we implement something like an is_equivalent or is_equalish on expectations? Then we can let the __eq__ method have fewer exceptions, and the logic will be in a somewhat findable/reusable place?
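
One possible shape for that idea, sketched with a toy dataclass: `__eq__` stays strict and a separate method carries the looser rule. The method name `is_equivalent` comes from the comment above; the class, its fields, and the default `ignore` tuple are assumptions made for illustration.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass  # the generated __eq__ compares every field, id included
class ToyExpectation:
    column: str
    max_value: int
    id: Optional[str] = None
    notes: Optional[str] = None

    def is_equivalent(self, other: "ToyExpectation", ignore=("id", "notes")) -> bool:
        """Looser comparison that skips the fields named in `ignore`."""
        mine = {k: v for k, v in asdict(self).items() if k not in ignore}
        theirs = {k: v for k, v in asdict(other).items() if k not in ignore}
        return mine == theirs


local = ToyExpectation("age", 100)
persisted = ToyExpectation("age", 100, id="abc", notes="synced")

strictly_equal = local == persisted
equivalent = local.is_equivalent(persisted)
```

This keeps the exceptions in one named, reusable place instead of inside `__eq__`.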

expectation: The expectation to check for.

Returns:
A tuple of a boolean indicating whether the expectation is already in the suite
Contributor

Non-blocking: the method return type feels like a bit of a smell, and designed for a very specific use case. Would it be simpler if we took in 2 expectations, and just returned a bool, then let delete_expectation be responsible for looping?

Don't feel like you need to over-index on this - we've already had a ton of back and forth, and I appreciate your patience!!
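
A rough sketch of the alternative proposed here: a two-argument helper that returns only a bool, with the delete function owning the loop. Plain dicts stand in for expectations, and both function names are hypothetical.

```python
def expectations_match(a: dict, b: dict) -> bool:
    """Pairwise comparison that ignores the id key."""
    def without_id(e: dict) -> dict:
        return {k: v for k, v in e.items() if k != "id"}

    return without_id(a) == without_id(b)


def delete_expectation(expectations: list, target: dict) -> dict:
    # The caller owns the iteration; the helper only answers yes or no.
    for index, existing in enumerate(expectations):
        if expectations_match(existing, target):
            return expectations.pop(index)
    raise KeyError("No matching expectation was found.")


suite = [
    {"column": "age", "max_value": 100, "id": "abc"},
    {"column": "name", "max_value": 10, "id": "def"},
]
removed = delete_expectation(suite, {"column": "age", "max_value": 100, "id": None})
```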

@cdkini cdkini added this pull request to the merge queue Nov 19, 2024
Merged via the queue into develop with commit f72dafa Nov 19, 2024
71 checks passed
@cdkini cdkini deleted the b/core-633/expectation_deletion branch November 19, 2024 15:39
2 participants