September 4th, 2013, 07:01 AM
UTF8 issue with embedded SQL, Firebird (GPRE, C++)
I have a strange problem with embedded SQL for Firebird (GPRE, C++).
This is the first time I have seen anything like it, and I was quite surprised to find SQL directly in a C++ file.
The program worked fine until it encountered Chinese text; then we saw two strange situations in one column that contains names:
(1) if the name is in Chinese but the last character is ASCII, everything is OK
(2) if the name is in Chinese and the last character is NOT ASCII (e.g. '中文中文'), the last character is often, but not always, damaged
The database uses UNICODE_FSS, and if I insert the data with a script in which I saved the queries, everything works fine, even in case (2). The problem only appears when the program itself is used.
I would like to ask if you know some way to debug embedded SQL; if the query were a normal string I could print it or log it to a file, but I have no idea how to do that with embedded statements.
When I have rows with damaged letters as described in (2) and I try to select them by name, even using:
SELECT name, id FROM my_tab WHERE name LIKE '%'
I get all the correct lines until the first damaged tuple is found. Then the output is interrupted and I read the following:
Statement failed, SQLCODE = -802
arithmetic exception, numeric overflow, or string truncation
-Cannot transliterate character between character sets
So if somebody could give me some advice on how to print the embedded query so I can check whether it is wrong, it would help me a lot.
September 25th, 2013, 02:37 AM
Nobody replied, so I am replying myself for others who may need this.
I didn't need to print the SQL (so if somebody knows how to do it, [s]he can still write the solution) because the problem occurred before that point.
The problem was a C/C++ function that treated the final byte as a space and removed it. The bug comes from isspace(), which behaves differently on different platforms, so I had the problem in some places but not in others. To solve it, I added a check on the byte that is about to be evaluated as a space, testing its first (most significant) bit: if it is 1, we don't care what the stupid isspace() says; it is not a space and we don't remove it (see a UTF-8 table for the reason).
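For anyone who wants to apply the same fix, here is a minimal sketch of the kind of check described above. It is not the original program's code: the names is_ascii_space and rtrim_utf8_safe are hypothetical, and it assumes the string is held in a std::string containing UTF-8 bytes. One likely reason for the platform differences is that passing a char with the high bit set straight to isspace() can hand it a negative value, which the C standard leaves undefined, and some locales also report bytes such as 0xA0 as whitespace.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Returns true only for ASCII whitespace. Any byte with the most significant
// bit set is part of a multi-byte UTF-8 sequence, so it is never a space here.
// (Hypothetical helper, not taken from the original program.)
inline bool is_ascii_space(char c) {
    const unsigned char uc = static_cast<unsigned char>(c);
    if (uc & 0x80) {
        return false;  // lead or continuation byte: keep it
    }
    // Safe call: the argument is now in the range 0..127.
    return std::isspace(uc) != 0;
}

// Removes trailing ASCII whitespace and leaves all UTF-8 bytes intact.
// (Hypothetical helper, not taken from the original program.)
inline std::string rtrim_utf8_safe(std::string s) {
    while (!s.empty() && is_ascii_space(s.back())) {
        s.pop_back();
    }
    return s;
}
```

With this guard, a name whose last byte happens to look like whitespace in some locale (for example a character ending in the byte 0xA0) keeps all of its bytes, while ordinary trailing blanks, tabs and newlines are still removed.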
In the past the first bit was the parity bit, later a 1 meant extended ASCII, and today in UTF-8 a 1 means the byte is part of a multi-byte sequence, while a 0 means a single-byte (ASCII) character.
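To make the bit layout concrete, here is a small sketch that classifies a single byte according to the UTF-8 rules. The enum Utf8ByteKind and the function classify_utf8_byte are hypothetical names used only for illustration; the sketch looks at one byte in isolation and does not validate whole sequences.

```cpp
#include <cassert>

// Role of a single byte in a UTF-8 stream (hypothetical type for illustration).
enum class Utf8ByteKind { Ascii, Continuation, Lead, Invalid };

// Classifies one byte by its leading bits.
inline Utf8ByteKind classify_utf8_byte(unsigned char b) {
    if ((b & 0x80) == 0x00) {
        return Utf8ByteKind::Ascii;         // 0xxxxxxx: single-byte character
    }
    if ((b & 0xC0) == 0x80) {
        return Utf8ByteKind::Continuation;  // 10xxxxxx: follows a lead byte
    }
    if (b >= 0xC2 && b <= 0xF4) {
        return Utf8ByteKind::Lead;          // starts a 2-, 3- or 4-byte sequence
    }
    return Utf8ByteKind::Invalid;           // 0xC0, 0xC1, 0xF5..0xFF never occur
}
```

For example, '中' is encoded as the bytes E4 B8 AD: the first is a lead byte and the other two are continuation bytes, so none of them can be an ASCII space.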