[BUGFIX] Fix typing on mostly and value_set fields #10571

Merged: 18 commits, Oct 28, 2024

Conversation

tyler-hoffman
Contributor

@tyler-hoffman tyler-hoffman commented Oct 25, 2024

Context

We were previously using Annotated incorrectly. The correct form is Annotated[T, x], where T is the base type and x is the metadata. We had those type args backward. Pydantic uses the first argument for validation and schema generation, so schemas were generated as expected, but mypy was upset because, e.g., mostly=1 does not satisfy a type hint that says mostly must be an instance of _Mostly. So we had to change a bunch of stuff.
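To illustrate the argument order, here is a minimal sketch using only the standard library. `MostlyConstraint` and `validate_column` are hypothetical names made up for this example, not the types or functions used in this PR.

```python
from typing import Annotated, get_args, get_type_hints


class MostlyConstraint:
    """Hypothetical metadata marker; a stand-in for whatever metadata the real field carries."""


# Correct ordering: the base type comes first, the metadata second.
def validate_column(mostly: Annotated[float, MostlyConstraint()] = 1.0) -> float:
    return mostly


# With include_extras=True the Annotated wrapper is preserved,
# so the base type and the metadata can be inspected separately.
hints = get_type_hints(validate_column, include_extras=True)
base_type, *metadata = get_args(hints["mostly"])

# Without include_extras the metadata is stripped and only the base type remains,
# which is the type a checker such as mypy compares arguments against.
plain_hints = get_type_hints(validate_column)
```

With the base type first, a call like `validate_column(mostly=1)` is checked against `float`, not against the metadata class.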

Validation vs. schema

Our validation was fairly lax, so I kept it that way in this PR. For example, multiple_of was not enforced; it was only there for the schema. Pydantic supports most of what we were doing out of the box, including multiple_of, but since we didn't enforce this before, I didn't want to start enforcing it now.

The problem

From experimenting, it looks like constraints on nested annotations are not supported in schema generation. See #10577 for how I'd like to have written the solution, and how it fails in CI.

Implementation details

  • We're using the correct ordering in Annotated type args
  • I'm taking advantage of two things: pydantic.Field accepts arbitrary extra fields, and the base Expectation class overrides schema_extra. During schema_extra, we pop a custom field, schema_overrides, off each property and write its contents onto that same property.

For value_set, I just copy/pasted what was in the snapshots of our expectation schema JSON. For mostly, I used out-of-the-box functionality where I could, and only passed multiple_of in schema_overrides. The change in how these fields are added to the schema results in a bunch of noise in the PR due to ordering changes.
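The pop-and-merge step can be sketched on a plain dict, independent of pydantic. This is only an illustration under stated assumptions: the function name `apply_schema_overrides`, the example schema, and the `multipleOf` value of 0.01 are made up here and are not taken from the PR's code.

```python
from typing import Any, Dict


def apply_schema_overrides(schema: Dict[str, Any]) -> None:
    """Move each property's "schema_overrides" contents onto the property itself.

    Hypothetical helper: it mimics what a schema_extra hook could do with the
    generated schema dict, mutating it in place.
    """
    for prop in schema.get("properties", {}).values():
        # Remove the custom key so it never appears in the published schema.
        overrides = prop.pop("schema_overrides", None)
        if overrides:
            prop.update(overrides)


# Example schema shaped like generated output; the values are illustrative only.
schema: Dict[str, Any] = {
    "title": "ExampleExpectation",
    "properties": {
        "mostly": {
            "title": "Mostly",
            "type": "number",
            "minimum": 0.0,
            "maximum": 1.0,
            "schema_overrides": {"multipleOf": 0.01},
        },
        "column": {"title": "Column", "type": "string"},
    },
}

apply_schema_overrides(schema)
```

Because the override is written only into the schema, it documents the constraint without adding any validation, which matches the intent of keeping validation as lax as before.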

  • Description of PR changes above includes a link to an existing GitHub issue
  • PR title is prefixed with one of: [BUGFIX], [FEATURE], [DOCS], [MAINTENANCE], [CONTRIB]
  • Code is linted - run invoke lint (uses ruff format + ruff check)
  • Appropriate tests and docs have been updated

For more information about contributing, see Contribute.

After you submit your PR, keep the page open and monitor the statuses of the various checks made by our continuous integration process at the bottom of the page. Please fix any issues that come up and reach out on Slack if you need help. Thanks for contributing!

netlify bot commented Oct 25, 2024

Deploy Preview for niobium-lead-7998 canceled.

Name Link
🔨 Latest commit b531378
🔍 Latest deploy log https://app.netlify.com/sites/niobium-lead-7998/deploys/671f96d6c820f50008849fa8

codecov bot commented Oct 25, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 80.18%. Comparing base (abcf9f0) to head (b531378).
Report is 3 commits behind head on develop.

✅ All tests successful. No failed tests found.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop   #10571      +/-   ##
===========================================
- Coverage    80.21%   80.18%   -0.03%     
===========================================
  Files          461      461              
  Lines        40002    39946      -56     
===========================================
- Hits         32088    32032      -56     
  Misses        7914     7914              
Flag Coverage Δ
3.10 67.82% <100.00%> (-0.04%) ⬇️
3.10 athena or openpyxl or pyarrow or project or sqlite or aws_creds ?
3.10 aws_deps ?
3.10 big ?
3.10 clickhouse ?
3.10 filesystem ?
3.10 mssql ?
3.10 mysql ?
3.10 postgresql ?
3.10 spark_connect ?
3.10 trino ?
3.11 67.82% <100.00%> (-0.04%) ⬇️
3.11 athena or openpyxl or pyarrow or project or sqlite or aws_creds ?
3.11 aws_deps ?
3.11 big ?
3.11 clickhouse ?
3.11 filesystem ?
3.11 mssql ?
3.11 mysql ?
3.11 postgresql ?
3.11 spark_connect ?
3.12 67.82% <100.00%> (-0.02%) ⬇️
3.12 athena or openpyxl or pyarrow or project or sqlite or aws_creds 55.25% <78.94%> (-0.02%) ⬇️
3.12 aws_deps 45.98% <78.94%> (-0.04%) ⬇️
3.12 big 54.62% <78.94%> (-0.02%) ⬇️
3.12 databricks 47.73% <78.94%> (-0.03%) ⬇️
3.12 filesystem 61.56% <100.00%> (-0.03%) ⬇️
3.12 mssql 50.08% <78.94%> (-0.04%) ⬇️
3.12 mysql 50.14% <78.94%> (-0.04%) ⬇️
3.12 postgresql 54.43% <78.94%> (-0.04%) ⬇️
3.12 snowflake 48.58% <78.94%> (-0.03%) ⬇️
3.12 spark 57.92% <78.94%> (-0.03%) ⬇️
3.12 spark_connect 46.28% <78.94%> (-0.04%) ⬇️
3.12 trino 52.52% <78.94%> (-0.04%) ⬇️
3.9 67.86% <100.00%> (-0.02%) ⬇️
3.9 athena or openpyxl or pyarrow or project or sqlite or aws_creds 55.25% <78.94%> (-0.02%) ⬇️
3.9 aws_deps 46.00% <78.94%> (-0.04%) ⬇️
3.9 big 54.64% <78.94%> (-0.02%) ⬇️
3.9 clickhouse 42.85% <78.94%> (-0.04%) ⬇️
3.9 databricks 47.74% <78.94%> (-0.03%) ⬇️
3.9 filesystem 61.57% <100.00%> (-0.03%) ⬇️
3.9 mssql 50.06% <78.94%> (-0.04%) ⬇️
3.9 mysql 50.12% <78.94%> (-0.04%) ⬇️
3.9 postgresql 54.41% <78.94%> (-0.04%) ⬇️
3.9 snowflake 48.59% <78.94%> (-0.03%) ⬇️
3.9 spark 57.88% <78.94%> (-0.03%) ⬇️
3.9 spark_connect 46.29% <78.94%> (-0.04%) ⬇️
3.9 trino 52.50% <78.94%> (-0.04%) ⬇️
cloud 0.00% <0.00%> (ø)
docs-basic 52.60% <78.94%> (-0.03%) ⬇️
docs-creds-needed 52.83% <78.94%> (-0.03%) ⬇️
docs-spark 52.24% <78.94%> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@@ -81,7 +81,7 @@
},
"value_set": {
"title": "Value Set",
"description": "A list of potential values to match.",
Contributor Author

This was the only instance where we had a different description for the value set. I think it makes sense to make it the same as the others.

@@ -80,12 +80,12 @@
},
"mostly": {
"title": "Mostly",
"default": 1.0,
Contributor Author

Lotta noise on these. It's functionally the same, just a different ordering for mostly.

@tyler-hoffman tyler-hoffman changed the title from "[BUGFIX] Fix typing on mostly and value_set" to "[BUGFIX] Fix typing on mostly and value_set fields" Oct 28, 2024
@tyler-hoffman tyler-hoffman mentioned this pull request Oct 28, 2024
4 tasks
@tyler-hoffman tyler-hoffman requested review from NathanFarmer and a team October 28, 2024 13:08
Member

@joshua-stauffer joshua-stauffer left a comment

looks good - thanks for fixing this

@tyler-hoffman tyler-hoffman added this pull request to the merge queue Oct 28, 2024
Merged via the queue into develop with commit 581b7d2 Oct 28, 2024
71 checks passed
@tyler-hoffman tyler-hoffman deleted the b/core-412/value-set-and-mostly-2 branch October 28, 2024 14:53
2 participants