forked from babelfish-for-postgresql/babelfish_extensions
Fix improper query rewriting with multibyte characters
Description
===========
Babelfish does not correctly rewrite queries that contain multibyte characters.

Analysis
=========
Babelfish preprocesses the query string and removes unsupported syntax before sending the query to the PG backend. The implementation did not account for multibyte Unicode characters, so when such characters appear ahead of the unsupported syntax, Babelfish emits a broken query. Specifically, a character offset is used instead of a byte offset during the character replacement. For example:

Input T-SQL : select "你好世界" from tbl with(nolock);
Executed SQL: select "你好世界" f (nolock);

Solution
========
Consolidate all rewriting behavior into PLtsql_expr_query_mutator. This is more maintainable because there is then only one interface for query rewriting. This patch also supports Chinese Unicode characters in identifiers.
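The offset mix-up described in the analysis can be illustrated outside Babelfish. The following is a minimal Python sketch, not the Babelfish implementation (which is written in C): the helper names `char_to_byte_offset`, `blank_token_buggy`, and `blank_token_fixed` are hypothetical, and blanking the token with spaces is an assumed rewrite strategy chosen only for illustration. It does not attempt to reproduce the exact broken output quoted above.

```python
# Illustration only: hypothetical helpers, not Babelfish code.
QUERY = 'select "你好世界" from tbl with(nolock);'


def char_to_byte_offset(text: str, char_offset: int) -> int:
    """Return the UTF-8 byte offset of the character at char_offset."""
    return len(text[:char_offset].encode("utf-8"))


def blank_token_buggy(query: str, token: str) -> str:
    """Blank out token, wrongly using a character offset on a byte buffer."""
    buf = bytearray(query.encode("utf-8"))
    char_start = query.index(token)  # offset counted in characters
    # BUG: char_start is used as a byte index. Each CJK character before
    # the token occupies 3 bytes in UTF-8, so the write lands too early.
    buf[char_start:char_start + len(token)] = b" " * len(token)
    return buf.decode("utf-8", errors="replace")


def blank_token_fixed(query: str, token: str) -> str:
    """Blank out token after converting the character offset to bytes."""
    buf = bytearray(query.encode("utf-8"))
    byte_start = char_to_byte_offset(query, query.index(token))
    byte_len = len(token.encode("utf-8"))
    buf[byte_start:byte_start + byte_len] = b" " * byte_len
    return buf.decode("utf-8")


if __name__ == "__main__":
    print(blank_token_buggy(QUERY, "with"))
    print(blank_token_fixed(QUERY, "with"))
```

In this sketch the buggy variant overwrites part of `from` and leaves `with` in place, while the fixed variant blanks only the `with` token. With a query that contains no multibyte characters, both variants produce the same result, which is why this class of bug can go unnoticed with ASCII-only tests.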
Showing 4 changed files with 168 additions and 100 deletions.
@@ -0,0 +1,46 @@
drop table if exists unicode_test;
go
create table unicode_test(col nvarchar(255), 中文列名 nvarchar(255));
go
insert into unicode_test values('Hello', '你好');
go
~~ROW COUNT: 1~~

insert into unicode_test values('World', '世界');
go
~~ROW COUNT: 1~~


/* multibyte characters as identifier */
select col 别名 from unicode_test;
go
~~START~~
nvarchar
Hello
World
~~END~~

select 别名=col from unicode_test;
go
~~START~~
nvarchar
Hello
World
~~END~~


/* multibyte characters with unsupported token */
select "你好世界" from unicode_test with(nolock);
go
~~ERROR (Code: 33557097)~~

~~ERROR (Message: column "你好世界" does not exist)~~

select 中文列名 from unicode_test with(nolock);
go
~~START~~
nvarchar
你好
世界
~~END~~
@@ -0,0 +1,20 @@
drop table if exists unicode_test;
go
create table unicode_test(col nvarchar(255), 中文列名 nvarchar(255));
go
insert into unicode_test values('Hello', '你好');
go
insert into unicode_test values('World', '世界');
go

/* multibyte characters as identifier */
select col 别名 from unicode_test;
go
select 别名=col from unicode_test;
go

/* multibyte characters with unsupported token */
select "你好世界" from unicode_test with(nolock);
go
select 中文列名 from unicode_test with(nolock);
go