[MAINTENANCE] Improve experience around expectation deletion with Cloud-backed suites #10662

cdkini · 2024-11-13T18:03:57Z

A few relevant changes here:

Expectation equality checks should be done with the actual object (and not .configuration)
- This is due to the fact that __eq__ has specific logic around rendered content and other fields that could go awry
- I've decided that both notes and meta should be excluded from __eq__; they are too volatile due to misc updates from Cloud and represent "metadata" as opposed to actual state in my opinion
We should ignore id when determining uniqueness with suite.add_expectation() (users can very easily add duplicates without realizing)

for _ in range(10):
    e = gxe.ExpectColumnValuesToBeBetween(...)
    suite.add_expectation(e)
print(len(suite.expectations)) # 1 with this `id` change but 10 otherwise - we want the former

Update docstring to point to using suite indexing when deleting

netlify · 2024-11-13T18:04:13Z

✅ Deploy Preview for niobium-lead-7998 ready!

Name	Link
🔨 Latest commit	`a941c80`
🔍 Latest deploy log	https://app.netlify.com/sites/niobium-lead-7998/deploys/673ca1c23b60080008cc2a6f
😎 Deploy Preview	https://deploy-preview-10662.docs.greatexpectations.io

codecov · 2024-11-13T18:06:49Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 80.47%. Comparing base (7bf34a7) to head (a941c80).
Report is 1 commit behind head on develop.

✅ All tests successful. No failed tests found.

Additional details and impacted files

@@             Coverage Diff             @@
##           develop   #10662      +/-   ##
===========================================
- Coverage    80.47%   80.47%   -0.01%     
===========================================
  Files          462      462              
  Lines        40107    40106       -1     
===========================================
- Hits         32278    32277       -1     
  Misses        7829     7829

Flag	Coverage Δ
3.10	`68.17% <100.00%> (-0.01%)`	⬇️
3.10 athena or openpyxl or pyarrow or project or sqlite or aws_creds	`?`
3.10 aws_deps	`?`
3.10 big	`?`
3.10 bigquery	`?`
3.10 clickhouse	`?`
3.10 filesystem	`?`
3.10 mssql	`?`
3.10 mysql	`?`
3.10 postgresql	`?`
3.10 spark_connect	`?`
3.11	`68.17% <100.00%> (-0.01%)`	⬇️
3.11 athena or openpyxl or pyarrow or project or sqlite or aws_creds	`?`
3.11 aws_deps	`?`
3.11 big	`?`
3.11 bigquery	`?`
3.11 clickhouse	`?`
3.11 filesystem	`?`
3.11 mssql	`?`
3.11 mysql	`?`
3.11 postgresql	`?`
3.11 spark_connect	`?`
3.12	`68.15% <100.00%> (-0.01%)`	⬇️
3.12 athena or openpyxl or pyarrow or project or sqlite or aws_creds	`55.49% <75.00%> (+<0.01%)`	⬆️
3.12 aws_deps	`46.14% <25.00%> (+<0.01%)`	⬆️
3.12 big	`54.74% <75.00%> (+<0.01%)`	⬆️
3.12 bigquery	`45.91% <25.00%> (+<0.01%)`	⬆️
3.12 databricks	`48.12% <75.00%> (+<0.01%)`	⬆️
3.12 filesystem	`61.70% <100.00%> (-0.02%)`	⬇️
3.12 mssql	`51.52% <25.00%> (+<0.01%)`	⬆️
3.12 mysql	`51.58% <25.00%> (+<0.01%)`	⬆️
3.12 postgresql	`54.64% <75.00%> (+<0.01%)`	⬆️
3.12 snowflake	`48.89% <75.00%> (+<0.01%)`	⬆️
3.12 spark	`58.08% <75.00%> (+<0.01%)`	⬆️
3.12 spark_connect	`46.44% <25.00%> (+<0.01%)`	⬆️
3.12 trino	`52.67% <75.00%> (+<0.01%)`	⬆️
3.9	`68.20% <100.00%> (+0.01%)`	⬆️
3.9 athena or openpyxl or pyarrow or project or sqlite or aws_creds	`55.50% <75.00%> (+<0.01%)`	⬆️
3.9 aws_deps	`46.16% <25.00%> (+<0.01%)`	⬆️
3.9 big	`54.75% <75.00%> (+<0.01%)`	⬆️
3.9 bigquery	`45.92% <25.00%> (+<0.01%)`	⬆️
3.9 clickhouse	`43.03% <25.00%> (+<0.01%)`	⬆️
3.9 databricks	`48.14% <75.00%> (+<0.01%)`	⬆️
3.9 filesystem	`61.72% <100.00%> (-0.02%)`	⬇️
3.9 mssql	`51.50% <25.00%> (+<0.01%)`	⬆️
3.9 mysql	`51.57% <25.00%> (+<0.01%)`	⬆️
3.9 postgresql	`54.63% <75.00%> (+<0.01%)`	⬆️
3.9 snowflake	`48.91% <75.00%> (+<0.01%)`	⬆️
3.9 spark	`58.04% <75.00%> (+<0.01%)`	⬆️
3.9 spark_connect	`46.45% <25.00%> (+<0.01%)`	⬆️
3.9 trino	`52.66% <75.00%> (+<0.01%)`	⬆️
cloud	`0.00% <0.00%> (ø)`
docs-basic	`53.40% <100.00%> (+<0.01%)`	⬆️
docs-creds-needed	`52.97% <100.00%> (+<0.01%)`	⬆️
docs-spark	`52.45% <100.00%> (+<0.01%)`	⬆️

cdkini · 2024-11-13T20:54:29Z

great_expectations/core/expectation_suite.py

@@ -145,19 +145,13 @@ def add_expectation(self, expectation: _TExpectation) -> _TExpectation:
            )
        should_save_expectation = self._has_been_saved()
        expectation_is_unique = all(
-            expectation.configuration != existing_expectation.configuration
-            for existing_expectation in self.expectations
+            expectation != existing_expectation for existing_expectation in self.expectations


Equality checks should utilize Expectation.__eq__ rather than configuration
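To illustrate why the check goes through `__eq__`, here is a minimal, self-contained sketch. `SketchExpectation` and `is_unique` are hypothetical stand-ins, not the Great Expectations classes; the only details taken from this thread are that equality excludes `rendered_content` and that uniqueness is checked with `!=` against each existing expectation.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(eq=False)
class SketchExpectation:
    """Hypothetical stand-in for an Expectation (not the real class)."""

    column: str
    min_value: int
    max_value: int
    rendered_content: Optional[list] = None  # derived from the other fields

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, SketchExpectation):
            return NotImplemented
        # Derived fields are skipped, mirroring the `exclude` set in the diff.
        exclude = {"rendered_content"}
        return all(
            getattr(self, f.name) == getattr(other, f.name)
            for f in fields(self)
            if f.name not in exclude
        )


def is_unique(candidate: SketchExpectation, existing: list) -> bool:
    """True when no existing expectation compares equal to the candidate."""
    return all(candidate != e for e in existing)
```

Because the comparison lives in `__eq__`, every caller that uses `==` or `!=` picks up the same exclusions without repeating them.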

cdkini · 2024-11-13T20:55:02Z

great_expectations/core/expectation_suite.py

-                try:
-                    expectation = self._store.add_expectation(suite=self, expectation=expectation)
-                    self.expectations[-1].id = expectation.id
-                except Exception as exc:
-                    self.expectations.pop()
-                    raise exc  # noqa: TRY201


We can just wait to append until we succeed with persistence - also, the id is already attached to the input expectation
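A rough sketch of the "persist first, append afterwards" ordering described above. `SketchStore` and `SketchSuite` are made-up stand-ins, and the store assigning the `id` in place is an assumption based on the remark that the id is already attached to the input expectation.

```python
import uuid


class SketchStore:
    """Hypothetical store; `fail=True` simulates a persistence error."""

    def __init__(self, fail: bool = False) -> None:
        self.fail = fail

    def add_expectation(self, expectation: dict) -> dict:
        if self.fail:
            raise RuntimeError("persistence failed")
        # Assumption: the store attaches the id to the object it was given.
        expectation["id"] = str(uuid.uuid4())
        return expectation


class SketchSuite:
    """Hypothetical suite that only appends once persistence has succeeded."""

    def __init__(self, store: SketchStore) -> None:
        self._store = store
        self.expectations: list = []

    def add_expectation(self, expectation: dict) -> dict:
        # If this raises, the list has not been touched, so no rollback is needed.
        expectation = self._store.add_expectation(expectation)
        self.expectations.append(expectation)
        return expectation
```

With this ordering there is nothing to `pop()` on failure, which is what makes the `try`/`except` block unnecessary.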

cdkini · 2024-11-13T20:55:29Z

great_expectations/expectations/expectation.py

@@ -368,7 +368,7 @@ def __eq__(self, other: object) -> bool:

        # rendered_content is derived from the rest of the expectation, and can/should
        # be excluded from equality checks
-        exclude: set[str] = {"rendered_content"}
+        exclude: set[str] = {"rendered_content", "id"}


Legacy equality checks (i.e. using configuration) exclude id

Do we want to maintain that? Does still comparing the id break anything? Would it be a bug if the ids don't match? If we need this logic, maybe add it to the comment above about rendered_content?

I don't love it but if we look at the script I mentioned in Slack, we get different behavior based on this:

suite = context.suites.add(gx.ExpectationSuite("chetan-test-2024-11-13-v21"))
for i in range(10):
    e = gxe.ExpectColumnValuesToBeBetween(column="age", min_value=0, max_value=100)
    suite.add_expectation(e)
print(len(suite.expectations)) # How many should we see?

We historically have gotten 1 from this but making id part of the __eq__ check gets us 10.
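The 1-versus-10 behavior can be reproduced without Great Expectations. The sketch below uses two hypothetical dataclasses that differ only in whether `id` takes part in equality (via the standard `field(compare=False)` option); the counter is a stand-in for the id assigned when an expectation is saved.

```python
import itertools
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class WithIdCompared:
    """Hypothetical expectation whose `id` is part of equality."""

    column: str
    min_value: int
    max_value: int
    id: Optional[str] = None


@dataclass
class WithIdIgnored:
    """Hypothetical expectation whose `id` is left out of equality."""

    column: str
    min_value: int
    max_value: int
    id: Optional[str] = field(default=None, compare=False)


def add_ten(cls) -> int:
    """Add ten identical expectations, skipping any that compare equal."""
    suite: list = []
    ids = itertools.count(1)
    for _ in range(10):
        e = cls(column="age", min_value=0, max_value=100)
        if all(e != existing for existing in suite):
            e.id = str(next(ids))  # stand-in for the id assigned on save
            suite.append(e)
    return len(suite)
```

When `id` is compared, each fresh object (with `id=None`) differs from every saved one, so all ten are added; when it is ignored, only the first is.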

cdkini · 2024-11-14T14:42:03Z

tests/expectations/test_expectation.py

    )
+    def test_expectation_equality_with_id(self, id_a: str | None, id_b: str | None):


This is the new test

tyler-hoffman

Left a blocker on 1.5 smallish things - LMK if I'm just confused

tyler-hoffman · 2024-11-14T14:52:38Z

great_expectations/core/expectation_suite.py

-                except Exception as exc:
-                    self.expectations.pop()
-                    raise exc  # noqa: TRY201
+                expectation = self._store.add_expectation(suite=self, expectation=expectation)


well this seems much more clear!

tyler-hoffman · 2024-11-14T14:55:00Z

great_expectations/core/expectation_suite.py

@@ -218,13 +212,15 @@ def _process_expectation(
    @public_api
    def delete_expectation(self, expectation: Expectation) -> Expectation:
        """Delete an Expectation from the collection.
+        The input Expectation must be in the suite and referenced by index.


I'm going to throw a blocker on here. The second part of this seems wrong. How could it matter how you retrieve the expectation?

Hmm this is fair - I want to encourage users to delete by using suite.expectations[idx] instead of gxe.ExpectWhateverWhatever. I think the example here is good enough.

Yeah, the example is fine with me. FWIW I'd personally probs find it by like next(e for e in suite.expectations if isinstance(e, MyExpectation))
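As a small illustration of deleting an object that was retrieved from the suite rather than freshly constructed, here is a sketch over a plain list. `find_first`, `delete_from`, and the two placeholder classes are hypothetical; the real `delete_expectation` may behave differently.

```python
from typing import Optional, Type, TypeVar

T = TypeVar("T")


class SketchExpectationA:
    """Placeholder expectation type (hypothetical)."""


class SketchExpectationB:
    """Placeholder expectation type (hypothetical)."""


def find_first(expectations: list, expectation_type: Type[T]) -> Optional[T]:
    """Return the first expectation of the given type, or None if absent."""
    return next((e for e in expectations if isinstance(e, expectation_type)), None)


def delete_from(expectations: list, target: object) -> None:
    """Remove the exact object `target` (identity match) from the list."""
    for index, existing in enumerate(expectations):
        if existing is target:
            del expectations[index]
            return
    raise ValueError("target is not in the list")
```

Usage would look like `delete_from(items, items[0])` or `delete_from(items, find_first(items, SketchExpectationB))`, both of which pass an object that already lives in the collection.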

tyler-hoffman · 2024-11-14T14:58:43Z

great_expectations/expectations/expectation.py

@@ -368,7 +368,7 @@ def __eq__(self, other: object) -> bool:

        # rendered_content is derived from the rest of the expectation, and can/should
        # be excluded from equality checks
-        exclude: set[str] = {"rendered_content"}
+        exclude: set[str] = {"rendered_content", "id"}


Do we want to maintain that? Does still comparing the id break anything? Would it be a bug if the ids don't match? If we need this logic, maybe add it to the comment above about rendered_content?

tyler-hoffman

I have mixed feelings about equality being true when both ids exist and are different, but I think we can table that discussion and merge this. Thanks for fixing this!

tyler-hoffman · 2024-11-15T16:25:41Z

great_expectations/core/expectation_suite.py

+    def _determine_if_expectation_is_unique(self, expectation: Expectation) -> bool:
+        # Expectation is deemed unique if it is not already in the suite
+        # We do not consider the id of the expectation in this check
+        expectation_copies = copy.deepcopy(self.expectations)


I'm guessing we also want to account for rendered_content, right?

Also does this get expensive when we have rendered content? If we want to save ourselves the deep copy (and mutation), you could do something like model1.dict(exclude={"id"}) == model2.dict(exclude={"id"}) in a loop, or that's what chatgpt tells me at least 😄
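The suggestion above refers to pydantic's `.dict(exclude=...)`. A standard-library analogue of the same idea, comparing field values while skipping `id` and `rendered_content` and without copying or mutating anything, might look like the sketch below. The class and helper names are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class SketchExpectation:
    """Hypothetical stand-in for an Expectation (not the real class)."""

    column: str
    min_value: int
    max_value: int
    id: Optional[str] = None
    rendered_content: Optional[list] = None


IGNORED = frozenset({"id", "rendered_content"})


def comparable_fields(expectation: SketchExpectation) -> dict:
    """Map field names to values, skipping ignored fields; values are not copied."""
    return {
        f.name: getattr(expectation, f.name)
        for f in fields(expectation)
        if f.name not in IGNORED
    }


def is_unique(candidate: SketchExpectation, existing: list) -> bool:
    """True when no existing expectation matches the candidate's compared fields."""
    wanted = comparable_fields(candidate)
    return all(comparable_fields(e) != wanted for e in existing)
```

The existing expectations keep their ids and rendered content, since the helper only reads attributes.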

tyler-hoffman · 2024-11-15T16:27:38Z

tests/expectations/test_expectation.py

+        pytest.param(None, None, id="both_none"),
+        pytest.param({}, None, id="both_falsy"),
+        pytest.param({"author": "Bob Dylan"}, None, id="missing_meta"),
+        pytest.param({"author": "Bob Dylan"}, {"author": "John Lennon"}, id="different_meta"),


Potentially a huge blocker for me: did these guys actually write books?

tyler-hoffman · 2024-11-15T16:28:48Z

great_expectations/expectations/expectation.py

@@ -368,18 +368,12 @@ def __eq__(self, other: object) -> bool:

        # rendered_content is derived from the rest of the expectation, and can/should
        # be excluded from equality checks
-        exclude: set[str] = {"rendered_content"}
+        # notes and meta are simple metadata and should be excluded from equality checks
+        exclude: set[str] = {"rendered_content", "notes", "meta"}


Should we really exclude these for equality? Why not just handle these kind of weird situations in the helper method above?

tyler-hoffman · 2024-11-15T16:29:22Z

great_expectations/core/expectation_suite.py

-        remaining_expectations = [
-            exp for exp in self.expectations if exp.configuration != expectation.configuration
-        ]
+        remaining_expectations = [exp for exp in self.expectations if exp != expectation]


Should this be using the same kind of logic as in _determine_if_expectation_is_unique wrt ignoring ids?

tyler-hoffman

Just thinking aloud: I wonder if some of this gets easier if we implement something like an is_equivalent or is_equalish on expectations? Then we can let the __eq__ method have fewer exceptions, and the logic will be in a somewhat findable/reusable place?
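One possible shape for the `is_equivalent` idea, sketched with a hypothetical dataclass: `__eq__` stays strict (every field compared), while the looser comparison that skips `id`, `meta`, and `notes` lives in a separately named method. None of this is the library's actual API.

```python
from dataclasses import dataclass, fields
from typing import Optional

# Fields treated as volatile metadata in this sketch (an assumption).
VOLATILE_FIELDS = frozenset({"id", "meta", "notes"})


@dataclass
class SketchExpectation:
    """Hypothetical expectation with strict, dataclass-generated equality."""

    column: str
    min_value: int
    max_value: int
    id: Optional[str] = None
    meta: Optional[dict] = None
    notes: Optional[str] = None

    def is_equivalent(self, other: object) -> bool:
        """Compare everything except the volatile metadata fields."""
        if type(other) is not type(self):
            return False
        return all(
            getattr(self, f.name) == getattr(other, f.name)
            for f in fields(self)
            if f.name not in VOLATILE_FIELDS
        )
```

Callers such as a uniqueness or deletion check could then opt into the looser comparison explicitly instead of relying on exceptions baked into `__eq__`.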

tyler-hoffman · 2024-11-18T14:24:55Z

great_expectations/core/expectation_suite.py

+            expectation: The expectation to check for.
+
+        Returns:
+            A tuple of a boolean indicating whether the expectation is already in the suite


Non-blocking: the method return type feels like a bit of a smell, and designed for a very specific use case. Would it be simpler if we took in 2 expectations, and just returned a bool, then let delete_expectation be responsible for looping?

Don't feel like you need to over-index on this - we've already had a ton of back and forth, and I appreciate your patience!!
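The alternative floated in this comment, a helper that takes two expectations and returns a plain `bool` while the deletion routine owns the loop, could be sketched as follows. Expectations are modeled as plain dicts here, and both function names are hypothetical.

```python
def expectations_match(a: dict, b: dict) -> bool:
    """Compare two expectation dicts while ignoring their ids."""
    ignore = {"id"}
    strip = lambda d: {k: v for k, v in d.items() if k not in ignore}
    return strip(a) == strip(b)


def delete_expectation(expectations: list, target: dict) -> list:
    """Return the expectations that do not match `target`; raise if none matched."""
    remaining = [e for e in expectations if not expectations_match(e, target)]
    if len(remaining) == len(expectations):
        raise KeyError("No matching expectation was found.")
    return remaining
```

Keeping the helper to a two-argument predicate means it can be reused by the uniqueness check as well, at the cost of each caller writing its own loop.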

simplify deletion checks

3190a7a

exclude id from check

64711bb

cdkini changed the title ~~[BUGFIX] Enable proper expectation deletion with Cloud-backed suites~~ [MAINTENANCE] Improve experience around expectation deletion with Cloud-backed suites Nov 14, 2024

add tests

6ec88cb

cdkini added 2 commits November 14, 2024 09:42

Merge branch 'develop' of https://github.com/great-expectations/great…

c33183c

…_expectations into b/core-633/expectation_deletion

add comment

84e412f

tyler-hoffman requested changes Nov 14, 2024

misc updates

35ffdd9

tyler-hoffman approved these changes Nov 14, 2024

cdkini added 7 commits November 14, 2024 12:01

move biz logic to suite

4e85910

misc cleanup

7aa007d

update tests

8caf19d

more tests

8521434

more cleanup

294055c

another one

c3b70ac

Merge branch 'develop' into b/core-633/expectation_deletion

271eb63

tyler-hoffman reviewed Nov 15, 2024

cdkini requested a review from tyler-hoffman November 15, 2024 16:54

cdkini added 5 commits November 15, 2024 12:06

update logic

c4d50f2

more cleanu

a875ca8

more cleanup

0a46d3d

more tests

3a8e858

Merge branch 'develop' into b/core-633/expectation_deletion

5b77284

cdkini added 3 commits November 17, 2024 20:20

Merge branch 'develop' into b/core-633/expectation_deletion

c0ab743

update logic

59870a5

Merge branch 'b/core-633/expectation_deletion' of https://github.com/…

2bff2c4

…great-expectations/great_expectations into b/core-633/expectation_deletion

tyler-hoffman approved these changes Nov 18, 2024

cdkini enabled auto-merge November 18, 2024 14:25

cdkini disabled auto-merge November 18, 2024 14:25

cdkini added 3 commits November 18, 2024 09:31

simplify logic

dec3b31

fix typo

7153e2c

Merge branch 'develop' into b/core-633/expectation_deletion

a941c80

cdkini enabled auto-merge November 19, 2024 14:33

cdkini added this pull request to the merge queue Nov 19, 2024

Merged via the queue into develop with commit f72dafa Nov 19, 2024
71 checks passed

cdkini deleted the b/core-633/expectation_deletion branch November 19, 2024 15:39
