-
Notifications
You must be signed in to change notification settings - Fork 1.5k
chore(wren-ai-service): fix add quotes #1917
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Caution Review failedThe pull request is closed. WalkthroughAdds SQL keyword detection to avoid quoting reserved words/data types during identifier quoting and updates token-guard logic in is_ident. Extends tests with many cases covering date/time, timezone, window functions, CTEs, dotted identifiers, literals, and formatting to validate add_quotes behavior. No public API changes. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant Engine as Engine.add_quotes
participant Tokenizer
participant Ident as is_ident
participant KW as is_sql_keyword
Client->>Engine: submit SQL string
Engine->>Tokenizer: tokenize(SQL)
loop for each token (right-to-left edits applied later)
Engine->>Ident: inspect token and type
Ident->>KW: check token_text against keyword set
alt token_text is keyword
KW-->>Ident: true
Ident-->>Engine: mark as not-to-quote
else token_text not keyword
KW-->>Ident: false
Ident-->>Engine: if token_type in {VAR,SCHEMA,TABLE,COLUMN,DATABASE,INDEX,VIEW} -> eligible
Engine->>Engine: queue quote edit for token
end
end
Engine->>Engine: apply queued edits right-to-left
Engine-->>Client: return modified SQL and error (if any)
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
Poem
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 💡 Knowledge Base configuration:
You can enable these sources in your CodeRabbit configuration. 📒 Files selected for processing (1)
✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (1)
wren-ai-service/src/core/engine.py (1)
73-188: Consider maintenance impact of hardcoded keyword listWhile the keyword detection approach is sound, maintaining a hardcoded list of SQL keywords may become challenging over time. Consider these potential improvements:
- The list may need updates as SQL dialects evolve
- Some entries like "CTE" (line 111) are not actually SQL keywords
- "WITH" appears twice (lines 109 and 133)
Consider externalizing this list to a configuration file or using a SQL parsing library that already maintains keyword lists:
- def is_sql_keyword(text: str) -> bool: - """Check if the text is a SQL keyword that should not be quoted.""" - # Common SQL keywords that should never be quoted - sql_keywords = { - # ... (lines 77-187) - } - return text.upper() in sql_keywords + def is_sql_keyword(text: str) -> bool: + """Check if the text is a SQL keyword that should not be quoted.""" + # Consider loading from a config file or using sqlparse.keywords + from sqlparse import keywords + return keywords.is_keyword(text.upper())Alternatively, store the keywords in a separate module or JSON file for easier maintenance.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
wren-ai-service/src/core/engine.py(1 hunks)wren-ai-service/tests/pytest/test_engine.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
wren-ai-service/tests/pytest/test_engine.py (1)
wren-ai-service/src/core/engine.py (1)
add_quotes(67-285)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: pytest
- GitHub Check: pytest
- GitHub Check: Analyze (go)
🔇 Additional comments (6)
wren-ai-service/src/core/engine.py (2)
192-208: Guard clause pattern improves readabilityGood refactoring of the
is_identfunction using early returns. The guard clause pattern makes the logic clearer and easier to follow.
204-206: Token boundary assumption confirmedI’ve verified with SQLGlot v27.8.0 that slicing the SQL string using
sql[tok.start : tok.end + 1]correctly reproduces eachtok.textfor a variety of queries, confirming thattok.endis inclusive. No changes are needed here—this extraction is safe as written.wren-ai-service/tests/pytest/test_engine.py (4)
143-158: Comprehensive date function test coverageExcellent test coverage for date/time functions. The test properly validates that function names remain unquoted while their arguments are quoted appropriately.
159-427: Excellent comprehensive test coverage for timezone operationsThe extensive test coverage for timezone-related SQL operations is thorough and well-structured. Tests cover:
- Time literals and formatting
- Timezone conversions and offsets
- Interval arithmetic
- Window functions with temporal ordering
- CTEs with timezone operations
This provides confidence that the keyword detection won't incorrectly quote SQL temporal functions.
11-142: Well-structured test organizationThe existing tests provide good coverage of basic SQL quoting scenarios including:
- Simple identifiers
- Dotted references
- Already quoted identifiers
- Wildcard patterns
- Function calls
- Complex queries with joins
The test structure is clear and follows good naming conventions.
446-446: Ignore the CTE quoting assertion comment
Theadd_quotesfunction intentionally wraps all identifiers—including CTE names—in double quotes. As a result, asserting that'"timezone_adjusted" AS'appears in the output is correct and should remain unchanged.Likely an incorrect or invalid review comment.
Summary by CodeRabbit
Bug Fixes
Tests