@cyyeh
Member

@cyyeh cyyeh commented Aug 27, 2025

Summary by CodeRabbit

  • Bug Fixes

    • Improved SQL identifier quoting: keywords and data types are no longer quoted; function names, literals, and format strings stay unquoted; and valid identifiers (including dotted references) are quoted correctly in date/time, timezone, window-function, cast, interval, and CTE contexts.
  • Tests

    • Added extensive tests validating quoting behavior across date/time and timestamp functions, timezones, window clauses, intervals, CTEs, aggregates, and complex comparisons.

@coderabbitai
Contributor

coderabbitai bot commented Aug 27, 2025

Caution

Review failed

The pull request is closed.

Walkthrough

Adds SQL keyword detection to avoid quoting reserved words/data types during identifier quoting and updates token-guard logic in is_ident. Extends tests with many cases covering date/time, timezone, window functions, CTEs, dotted identifiers, literals, and formatting to validate add_quotes behavior. No public API changes.

Changes

Cohort / File(s): Summary

  • Core engine keyword-guarded quoting (wren-ai-service/src/core/engine.py): Added is_sql_keyword(text) with a comprehensive uppercase keyword set; changed is_ident to guard token types and skip quoting when the token text is a SQL keyword; retained the existing quoting edits, error handling, and right-to-left application.
  • Expanded add_quotes test coverage (wren-ai-service/tests/pytest/test_engine.py): Added many tests under TestAddQuotes exercising date/time functions, time/timestamp literals, timezone conversions, interval math, extract/format functions, window/over clauses, CTEs, and dotted identifiers, with assertions that keywords and functions remain unquoted while identifiers are quoted.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Engine as Engine.add_quotes
    participant Tokenizer
    participant Ident as is_ident
    participant KW as is_sql_keyword

    Client->>Engine: submit SQL string
    Engine->>Tokenizer: tokenize(SQL)
    loop for each token (right-to-left edits applied later)
        Engine->>Ident: inspect token and type
        Ident->>KW: check token_text against keyword set
        alt token_text is keyword
            KW-->>Ident: true
            Ident-->>Engine: mark as not-to-quote
        else token_text not keyword
            KW-->>Ident: false
            Ident-->>Engine: if token_type in {VAR,SCHEMA,TABLE,COLUMN,DATABASE,INDEX,VIEW} -> eligible
            Engine->>Engine: queue quote edit for token
        end
    end
    Engine->>Engine: apply queued edits right-to-left
    Engine-->>Client: return modified SQL and error (if any)
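
The flow in the diagram can be approximated with a small, self-contained sketch. It is not the engine.py implementation: it swaps SQLGlot's tokenizer for a regular expression, uses a tiny illustrative keyword set instead of the PR's full list, and assumes (hypothetically) that a bare word directly followed by "(" is a function name. It does show the guard-clause structure (token kind, keyword check, function call) and the right-to-left application of queued edits. The name add_quotes_sketch is hypothetical.

```python
import re

# Illustrative subset only; the keyword set added in engine.py is much larger.
SQL_KEYWORDS = frozenset({
    "SELECT", "FROM", "WHERE", "AS", "AND", "OR", "WITH", "OVER",
    "PARTITION", "ORDER", "BY", "INTERVAL", "TIMESTAMP", "DATE", "NOW", "CAST",
})

# Regex stand-in for a real SQL tokenizer: string literals, already-quoted
# identifiers, bare words, then any other single non-space character.
TOKEN_RE = re.compile(r"'(?:[^']|'')*'|\"(?:[^\"]|\"\")*\"|[A-Za-z_][A-Za-z0-9_]*|\S")


def is_sql_keyword(text: str) -> bool:
    return text.upper() in SQL_KEYWORDS


def add_quotes_sketch(sql: str) -> str:
    # m.end() is exclusive here (unlike the inclusive tok.end of SQLGlot tokens).
    tokens = [(m.group(), m.start(), m.end()) for m in TOKEN_RE.finditer(sql)]
    edits = []
    for i, (text, start, end) in enumerate(tokens):
        # Guard 1: only bare words qualify (skips literals, quoted names, symbols, digits).
        if not (text[0].isalpha() or text[0] == "_"):
            continue
        # Guard 2: keywords and data types stay unquoted.
        if is_sql_keyword(text):
            continue
        # Guard 3 (assumption): a word followed by "(" is treated as a function name.
        if i + 1 < len(tokens) and tokens[i + 1][0] == "(":
            continue
        edits.append((start, end, f'"{text}"'))
    # Apply right to left so the offsets of earlier tokens stay valid.
    for start, end, replacement in reversed(edits):
        sql = sql[:start] + replacement + sql[end:]
    return sql
```

For example, add_quotes_sketch("SELECT user.id, NOW() FROM orders WHERE created_at > TIMESTAMP '2024-01-01'") returns SELECT "user"."id", NOW() FROM "orders" WHERE "created_at" > TIMESTAMP '2024-01-01'. Queuing edits and applying them from the end avoids recomputing offsets after each insertion of quote characters.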

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • yichieh-lu

Poem

(\_/) A hop, a sniff, I scan each word,
( •_•) I keep NOW bare and give names a gird.
"user"."id" gets a cozy quote,
Timezones safe, no syntax smote.
🥕 — Rabbit done, I cheer and float.


📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 184a639 and 2c51598.

📒 Files selected for processing (1)
  • wren-ai-service/tests/pytest/test_engine.py (1 hunks)
@cyyeh cyyeh added the module/ai-service (ai-service related) and ci/ai-service (ai-service related) labels Aug 27, 2025
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
wren-ai-service/src/core/engine.py (1)

73-188: Consider maintenance impact of hardcoded keyword list

While the keyword-detection approach is sound, a hardcoded list of SQL keywords may become hard to maintain over time:

  1. The list may need updates as SQL dialects evolve
  2. Some entries like "CTE" (line 111) are not actually SQL keywords
  3. "WITH" appears twice (lines 109 and 133)
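
Item 3 is easy to miss because Python silently collapses duplicate entries in a set literal, so the repeated "WITH" never shows up at runtime. A minimal stdlib-only check using the ast module can flag such entries; the function name duplicate_set_entries is hypothetical and not part of the PR.

```python
import ast
from collections import Counter


def duplicate_set_entries(source: str) -> dict[str, int]:
    """Map each string repeated inside a set literal to its occurrence count."""
    dupes: dict[str, int] = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Set):
            # Count only plain string constants, e.g. "WITH" in {"SELECT", "WITH"}.
            counts = Counter(
                elt.value
                for elt in node.elts
                if isinstance(elt, ast.Constant) and isinstance(elt.value, str)
            )
            dupes.update({k: n for k, n in counts.items() if n > 1})
    return dupes
```

Running it over the text of engine.py (for example via Path(...).read_text()) would report the duplicated keyword without importing the module.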

Consider externalizing this list to a configuration file or using a SQL parsing library that already maintains keyword lists:

-        def is_sql_keyword(text: str) -> bool:
-            """Check if the text is a SQL keyword that should not be quoted."""
-            # Common SQL keywords that should never be quoted
-            sql_keywords = {
-                # ... (lines 77-187)
-            }
-            return text.upper() in sql_keywords
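+        def is_sql_keyword(text: str) -> bool:
+            """Check if the text is a SQL keyword that should not be quoted."""
+            # Consider loading from a config file or reusing sqlparse's keyword tables,
+            # which are dicts keyed by uppercase keyword. Where sqlparse exposes
+            # keywords.is_keyword, it returns a (ttype, value) tuple that is always truthy.
+            from sqlparse import keywords
+            upper = text.upper()
+            return upper in keywords.KEYWORDS or upper in keywords.KEYWORDS_COMMON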

Alternatively, store the keywords in a separate module or JSON file for easier maintenance.
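
A minimal sketch of the JSON option, assuming a hypothetical sql_keywords.json file that holds a flat array of keyword strings (the PR defines no such file). Building a frozenset also drops duplicates such as the repeated "WITH", and lru_cache means each file is read only once per path.

```python
import json
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=None)
def load_sql_keywords(path: str) -> frozenset[str]:
    """Read a JSON array of keywords once and normalize it to uppercase."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(data, list) or not all(isinstance(k, str) for k in data):
        raise ValueError(f"{path} must contain a JSON array of strings")
    # frozenset drops repeated entries (e.g. a duplicated "WITH").
    return frozenset(k.strip().upper() for k in data)


def is_sql_keyword(text: str, path: str = "sql_keywords.json") -> bool:
    """Return True if text (in any case) is listed in the keyword file."""
    return text.upper() in load_sql_keywords(path)
```

Keeping the list in data rather than code lets dialect-specific additions be reviewed as plain diffs to the JSON file.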

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 878e803 and 184a639.

📒 Files selected for processing (2)
  • wren-ai-service/src/core/engine.py (1 hunks)
  • wren-ai-service/tests/pytest/test_engine.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
wren-ai-service/tests/pytest/test_engine.py (1)
wren-ai-service/src/core/engine.py (1)
  • add_quotes (67-285)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: pytest
  • GitHub Check: pytest
  • GitHub Check: Analyze (go)
🔇 Additional comments (6)
wren-ai-service/src/core/engine.py (2)

192-208: Guard clause pattern improves readability

Good refactoring of the is_ident function using early returns. The guard clause pattern makes the logic clearer and easier to follow.


204-206: Token boundary assumption confirmed

I’ve verified with SQLGlot v27.8.0 that slicing the SQL string using sql[tok.start : tok.end + 1] correctly reproduces each tok.text for a variety of queries, confirming that tok.end is inclusive. No changes are needed here—this extraction is safe as written.
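
Because the end offset is inclusive, any edit built from it must slice with end + 1. The sketch below uses a hypothetical Span type standing in for a SQLGlot token (only text, start, and an inclusive end are assumed); it re-checks the slicing invariant described above before each replacement and applies edits right to left.

```python
from typing import NamedTuple


class Span(NamedTuple):
    text: str
    start: int
    end: int  # inclusive, matching the review's finding for SQLGlot tokens


def quote_spans(sql: str, spans: list[Span]) -> str:
    """Wrap each span in double quotes, applying edits right to left."""
    for span in sorted(spans, key=lambda s: s.start, reverse=True):
        # Inclusive end: the Python slice must stop at end + 1.
        if sql[span.start : span.end + 1] != span.text:
            raise ValueError(f"span {span} does not match the SQL text")
        sql = sql[: span.start] + f'"{span.text}"' + sql[span.end + 1 :]
    return sql
```

Failing loudly on a mismatch turns an off-by-one in the offset convention into an immediate error instead of silently corrupted SQL.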

wren-ai-service/tests/pytest/test_engine.py (4)

143-158: Comprehensive date function test coverage

Excellent test coverage for date/time functions. The test properly validates that function names remain unquoted while their arguments are quoted appropriately.


159-427: Excellent comprehensive test coverage for timezone operations

The extensive test coverage for timezone-related SQL operations is thorough and well-structured. Tests cover:

  • Time literals and formatting
  • Timezone conversions and offsets
  • Interval arithmetic
  • Window functions with temporal ordering
  • CTEs with timezone operations

This provides confidence that the keyword detection won't incorrectly quote SQL temporal functions.


11-142: Well-structured test organization

The existing tests provide good coverage of basic SQL quoting scenarios including:

  • Simple identifiers
  • Dotted references
  • Already quoted identifiers
  • Wildcard patterns
  • Function calls
  • Complex queries with joins

The test structure is clear and follows good naming conventions.


446-446: Ignore the CTE quoting assertion comment
The add_quotes function intentionally wraps all identifiers—including CTE names—in double quotes. As a result, asserting that '"timezone_adjusted" AS' appears in the output is correct and should remain unchanged.

Likely an incorrect or invalid review comment.

@cyyeh cyyeh merged commit 44bbb75 into main Aug 27, 2025
9 of 10 checks passed
@cyyeh cyyeh deleted the chore/ai-service/fix-add-quotes branch August 27, 2025 07:39

Labels

ci/ai-service (ai-service related), module/ai-service (ai-service related), wren-ai-service

3 participants