chore(wren-ai-service): fix add quotes #1917

cyyeh · 2025-08-27T07:13:37Z

Summary by CodeRabbit

Bug Fixes
- Improved SQL identifier quoting so that keywords and data types are no longer quoted. Functions, literals, and format strings remain unquoted, while valid identifiers (including dotted references) are quoted correctly across date/time, timezone, window functions, casts, intervals, and CTEs.

Tests
- Added extensive tests validating quoting behavior across date/time and timestamp functions, timezones, window clauses, intervals, CTEs, aggregates, and complex comparisons.

coderabbitai · 2025-08-27T07:13:43Z

Caution

Review failed

The pull request is closed.

Walkthrough

Adds SQL keyword detection to avoid quoting reserved words/data types during identifier quoting and updates token-guard logic in is_ident. Extends tests with many cases covering date/time, timezone, window functions, CTEs, dotted identifiers, literals, and formatting to validate add_quotes behavior. No public API changes.
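
The keyword guard described in the walkthrough can be illustrated with a minimal sketch. It is not the PR's implementation: the keyword set below is a tiny hypothetical sample, `quote_identifiers` is an illustrative name, and a regular expression stands in for the real tokenizer, so string literals, function names, and already-quoted identifiers are not handled.

```python
import re

# Hypothetical, deliberately tiny keyword set; the PR's real set is much larger.
SQL_KEYWORDS = {"SELECT", "FROM", "WHERE", "AS", "AND", "OR", "DATE", "TIMESTAMP", "INTERVAL"}


def is_sql_keyword(text: str) -> bool:
    """Return True when the token text is a keyword that must stay unquoted."""
    return text.upper() in SQL_KEYWORDS


def quote_identifiers(sql: str) -> str:
    """Wrap bare identifiers in double quotes, leaving keywords untouched.

    A regex stands in for a real tokenizer here (an assumption of this sketch),
    so literals, function names, and quoted identifiers are NOT handled.
    """
    def replace(match: re.Match) -> str:
        word = match.group(0)
        return word if is_sql_keyword(word) else f'"{word}"'

    return re.sub(r"[A-Za-z_][A-Za-z_0-9]*", replace, sql)


print(quote_identifiers("select id from users where age > 18"))
# select "id" from "users" where "age" > 18
```

The uppercase comparison makes the check case-insensitive, which matches the walkthrough's description of an uppercase keyword set.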

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Core engine keyword-guarded quoting `wren-ai-service/src/core/engine.py` | Added `is_sql_keyword(text)` with a comprehensive uppercase keyword set; changed `is_ident` to guard token types and skip quoting when token text is a SQL keyword; retained existing quoting edits, error handling, and right-to-left application. |
| Expanded add_quotes test coverage `wren-ai-service/tests/pytest/test_engine.py` | Added many tests under `TestAddQuotes` exercising date/time functions, time/timestamp literals, timezone conversions, interval math, extract/format functions, window/over clauses, CTEs, dotted identifiers, and assertions that keywords/functions remain unquoted while identifiers are quoted. |

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Engine as Engine.add_quotes
    participant Tokenizer
    participant Ident as is_ident
    participant KW as is_sql_keyword

    Client->>Engine: submit SQL string
    Engine->>Tokenizer: tokenize(SQL)
    loop for each token (right-to-left edits applied later)
        Engine->>Ident: inspect token and type
        Ident->>KW: check token_text against keyword set
        alt token_text is keyword
            KW-->>Ident: true
            Ident-->>Engine: mark as not-to-quote
        else token_text not keyword
            KW-->>Ident: false
            Ident-->>Engine: if token_type in {VAR,SCHEMA,TABLE,COLUMN,DATABASE,INDEX,VIEW} -> eligible
            Engine->>Engine: queue quote edit for token
        end
    end
    Engine->>Engine: apply queued edits right-to-left
    Engine-->>Client: return modified SQL and error (if any)
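
The final step of the diagram, applying queued edits right-to-left, can be sketched as follows. The `Edit` tuple and `apply_edits` helper are hypothetical names used only for illustration. Processing the rightmost span first means the offsets of the spans still waiting to the left are not shifted by the inserted quote characters.

```python
from typing import List, Tuple

# (start, end_inclusive, replacement); the inclusive end is an assumption that
# mirrors the sql[tok.start : tok.end + 1] slicing mentioned in the review.
Edit = Tuple[int, int, str]


def apply_edits(sql: str, edits: List[Edit]) -> str:
    """Apply span replacements from the rightmost edit to the leftmost."""
    for start, end, replacement in sorted(edits, key=lambda e: e[0], reverse=True):
        sql = sql[:start] + replacement + sql[end + 1:]
    return sql


sql = "SELECT id FROM users"
edits = [(7, 8, '"id"'), (15, 19, '"users"')]
print(apply_edits(sql, edits))
# SELECT "id" FROM "users"
```

Applying the same edits left-to-right would require adjusting every later offset by the length change of each earlier replacement.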

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

chore(wren-ai-service): fix add quotes #1917 — Modifies the same engine.py logic (keyword detection and skipping) and adds tests for add_quotes; likely directly related.
chore(wren-ai-service): improve add quotes #1913 — Also updates add_quotes/identifier handling in src/core/engine.py; related refactor or complementary change.

Suggested reviewers

yichieh-lu

Poem

(\/) A hop, a sniff, I scan each word,
( ••) I keep NOW bare and give names a gird.
"user"."id" gets a cozy quote,
Timezones safe, no syntax smote.
🥕 — Rabbit done, I cheer and float.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

MCP integration is disabled by default for public repositories
Jira integration is disabled by default for public repositories
Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 184a639 and 2c51598.

📒 Files selected for processing (1)

wren-ai-service/tests/pytest/test_engine.py (1 hunks)


coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

wren-ai-service/src/core/engine.py (1)
73-188: Consider maintenance impact of hardcoded keyword list

While the keyword detection approach is sound, maintaining a hardcoded list of SQL keywords may become challenging over time. Consider these potential improvements:

- The list may need updates as SQL dialects evolve
- Some entries like "CTE" (line 111) are not actually SQL keywords
- "WITH" appears twice (lines 109 and 133)

Consider externalizing this list to a configuration file or using a SQL parsing library that already maintains keyword lists:
-        def is_sql_keyword(text: str) -> bool:
-            """Check if the text is a SQL keyword that should not be quoted."""
-            # Common SQL keywords that should never be quoted
-            sql_keywords = {
-                # ... (lines 77-187)
-            }
-            return text.upper() in sql_keywords
+        def is_sql_keyword(text: str) -> bool:
+            """Check if the text is a SQL keyword that should not be quoted."""
+            # Consider loading from a config file or reusing sqlparse's keyword tables
+            from sqlparse import keywords
+            upper = text.upper()
+            # Membership in the keyword dicts yields a real bool
+            return upper in keywords.KEYWORDS_COMMON or upper in keywords.KEYWORDS
Alternatively, store the keywords in a separate module or JSON file for easier maintenance.
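
A minimal sketch of the JSON-file alternative, assuming a file that holds a flat JSON array of keywords. The file name, the `load_keywords` helper, and the extra `path` parameter are all hypothetical and not part of the PR.

```python
import json
import tempfile
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=None)
def load_keywords(path: str) -> frozenset:
    """Read a JSON array of keywords once and cache it as an uppercase set."""
    words = json.loads(Path(path).read_text(encoding="utf-8"))
    return frozenset(word.upper() for word in words)


def is_sql_keyword(text: str, path: str) -> bool:
    """Case-insensitive membership test against the externalized list."""
    return text.upper() in load_keywords(path)


# Demo with a throwaway file; a real project would ship the JSON with the package.
with tempfile.TemporaryDirectory() as tmp:
    keyword_file = Path(tmp) / "sql_keywords.json"
    keyword_file.write_text(json.dumps(["select", "from", "with"]), encoding="utf-8")
    print(is_sql_keyword("With", str(keyword_file)))    # True
    print(is_sql_keyword("orders", str(keyword_file)))  # False
```

Caching the parsed set keeps the per-token check as cheap as the hardcoded version, and loading into a set also makes duplicate entries harmless.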

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 878e803 and 184a639.

📒 Files selected for processing (2)

wren-ai-service/src/core/engine.py (1 hunks)
wren-ai-service/tests/pytest/test_engine.py (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

wren-ai-service/tests/pytest/test_engine.py (1)

wren-ai-service/src/core/engine.py (1)

add_quotes (67-285)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)

GitHub Check: pytest
GitHub Check: pytest
GitHub Check: Analyze (go)

🔇 Additional comments (6)

wren-ai-service/src/core/engine.py (2)

192-208: Guard clause pattern improves readability

Good refactoring of the is_ident function using early returns. The guard clause pattern makes the logic clearer and easier to follow.

204-206: Token boundary assumption confirmed

I’ve verified with SQLGlot v27.8.0 that slicing the SQL string using sql[tok.start : tok.end + 1] correctly reproduces each tok.text for a variety of queries, confirming that tok.end is inclusive. No changes are needed here—this extraction is safe as written.
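
Why the `+ 1` matters can be shown without SQLGlot. The sketch below uses a hypothetical whitespace tokenizer (`Tok` and `tokenize` are illustrative names, not SQLGlot's API) that records the index of each token's last character, the same inclusive-end convention the comment describes.

```python
import re
from typing import List, NamedTuple


class Tok(NamedTuple):
    text: str
    start: int
    end: int  # index of the LAST character, i.e. inclusive


def tokenize(sql: str) -> List[Tok]:
    """Split on runs of non-space characters and record inclusive end offsets."""
    return [Tok(m.group(0), m.start(), m.end() - 1) for m in re.finditer(r"\S+", sql)]


sql = "SELECT created_at FROM orders"
for tok in tokenize(sql):
    # With an inclusive end, the slice needs end + 1 to recover the text.
    assert sql[tok.start : tok.end + 1] == tok.text
    # Omitting the + 1 drops the final character.
    assert sql[tok.start : tok.end] == tok.text[:-1]
```

The same round-trip assertion could be run against a real tokenizer's output to confirm its offset convention before relying on it for span edits.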

wren-ai-service/tests/pytest/test_engine.py (4)

143-158: Comprehensive date function test coverage

Excellent test coverage for date/time functions. The test properly validates that function names remain unquoted while their arguments are quoted appropriately.

159-427: Excellent comprehensive test coverage for timezone operations

The extensive test coverage for timezone-related SQL operations is thorough and well-structured. Tests cover:

- Time literals and formatting
- Timezone conversions and offsets
- Interval arithmetic
- Window functions with temporal ordering
- CTEs with timezone operations

This provides confidence that the keyword detection won't incorrectly quote SQL temporal functions.

11-142: Well-structured test organization

The existing tests provide good coverage of basic SQL quoting scenarios including:

- Simple identifiers
- Dotted references
- Already quoted identifiers
- Wildcard patterns
- Function calls
- Complex queries with joins

The test structure is clear and follows good naming conventions.

446-446: Ignore the CTE quoting assertion comment
The add_quotes function intentionally wraps all identifiers—including CTE names—in double quotes. As a result, asserting that '"timezone_adjusted" AS' appears in the output is correct and should remain unchanged.

Likely an incorrect or invalid review comment.

wren-ai-service/tests/pytest/test_engine.py

update

184a639

cyyeh added the module/ai-service and ci/ai-service labels Aug 27, 2025

github-actions bot added the wren-ai-service label Aug 27, 2025

coderabbitai bot reviewed Aug 27, 2025

wren-ai-service/tests/pytest/test_engine.py (outdated, resolved)

wren-ai-service/tests/pytest/test_engine.py (outdated, resolved)

yichieh-lu approved these changes Aug 27, 2025

fix tests

2c51598

cyyeh merged commit 44bbb75 into main Aug 27, 2025
9 of 10 checks passed

cyyeh deleted the chore/ai-service/fix-add-quotes branch August 27, 2025 07:39

coderabbitai bot mentioned this pull request Aug 27, 2025

chore(wren-ai-service): fix add quotes #1919

Merged

3 participants