#1
    UTF8 issue with embedded SQL, Firebird (GPRE, C++)


    Hello world,
    I have a strange problem with embedded SQL for Firebird (GPRE, C++).
    It is the first time I have seen anything like this, and I was quite surprised to find SQL directly in a C++ file.

    The program works fine, and worked fine until it encountered Chinese text; then we saw two strange situations in one column that contains names:
    (1) if the name is in Chinese but the last character is ASCII, everything is OK
    Code:
    '中文中1'
    (2) if the name is in Chinese and the last character is NOT ASCII (e.g. '中文中文'), the last character is often, but not always, damaged
    Code:
    '中文中▒'
    The database uses the UNICODE_FSS character set, and if I insert the data with a script containing the saved queries, everything works fine, even in case (2). The problem appears only when the program is used.

    I would like to ask if you know a way to debug the embedded SQL. If the query were a normal string I could print it or log it to a file, but I have no idea how to do that with embedded statements.

    Additional info:
    When rows contain damaged characters as described in (2) and I try to select them by name, even using:
    Code:
    SELECT name, id FROM my_tab WHERE name LIKE '%'
    I get all the correct rows until the first damaged row is reached. Then the output stops and I get the following:

    Code:
    Statement failed, SQLCODE = -802
    arithmetic exception, numeric overflow, or string truncation
    -Cannot transliterate character between character sets
    So if somebody could give me advice on how to print the embedded query so that I can check whether it is wrong, it would help me a lot.
#2
    Nobody replied, so I am replying myself for others who may need it.

    I didn't need to print the SQL (so if somebody knows how to do it, [s]he can still post the solution) because the problem occurred before that.
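For anyone in the same spot: since the damage happened before the statement ran, one way to spot it without printing the SQL is to dump the bytes of the string just before it is handed to the embedded statement. This is only a minimal sketch; `hex_dump` is a hypothetical helper of my own naming, not something GPRE or Firebird provides.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical debugging helper: returns the bytes of s as space-separated
// uppercase hex pairs, so a missing or altered trailing byte is easy to see.
inline std::string hex_dump(const std::string& s) {
    std::string out;
    char buf[4];
    for (std::size_t i = 0; i < s.size(); ++i) {
        // Cast through unsigned char so bytes >= 0x80 are not sign-extended.
        std::snprintf(buf, sizeof buf, "%02X",
                      static_cast<unsigned int>(static_cast<unsigned char>(s[i])));
        if (i != 0) {
            out += ' ';
        }
        out += buf;
    }
    return out;
}
```

A complete three-byte Chinese character such as '文' shows up as `E6 96 87`; if the dump ends in `E6 96`, the last byte was lost before the query was ever built.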

    The problem was a C/C++ function that treated the final byte as a space and removed it. The culprit is isspace(), which behaves differently on different platforms, so I had the problem in some places but not in others. To solve it, I added a check on the byte about to be evaluated as a space that looks at its first (most significant) bit: if it is 1, we don't care what the stupid isspace() says; it's not a space and we don't remove it (see a UTF-8 table for the reason).
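The check described above can be sketched roughly as follows. This is not the original program's code; `is_ascii_space` and `rtrim_utf8_safe` are hypothetical names, and the sketch assumes the string is held in a `std::string` of UTF-8 bytes. Two facts explain the platform differences: passing a negative plain `char` to `isspace()` is undefined behavior in C and C++, and how bytes from 0x80 upward are classified depends on the locale.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: true only for ASCII whitespace bytes.
// A byte with the high bit set belongs to a multi-byte UTF-8 sequence,
// so it is never treated as whitespace, whatever isspace() would say.
inline bool is_ascii_space(char c) {
    const unsigned char uc = static_cast<unsigned char>(c);
    if (uc & 0x80) {
        return false;
    }
    // Safe: uc is in 0..127, so it is a valid argument for isspace().
    return std::isspace(uc) != 0;
}

// Hypothetical right-trim that strips trailing ASCII whitespace only
// and leaves trailing UTF-8 bytes untouched.
inline std::string rtrim_utf8_safe(std::string s) {
    while (!s.empty() && is_ascii_space(s.back())) {
        s.pop_back();
    }
    return s;
}
```

With this, trailing blanks are still removed, but the last byte of a name ending in a Chinese character is kept.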

    In the past the first bit was the parity bit; later a 1 meant extended ASCII; today, in UTF-8, a 1 means the byte is part of a multi-byte sequence and a 0 means a single-byte (ASCII) character.
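To make the bit patterns concrete, here is a small sketch that classifies a single byte by its leading bits. The enum and function names are my own, and the function only looks at the bit pattern; it does not validate whole sequences or reject overlong encodings.

```cpp
// Hypothetical classification of one byte by its leading bits in UTF-8.
enum class Utf8ByteKind { Ascii, Continuation, Lead2, Lead3, Lead4, Invalid };

inline Utf8ByteKind classify_utf8_byte(unsigned char b) {
    if ((b & 0x80) == 0x00) return Utf8ByteKind::Ascii;        // 0xxxxxxx
    if ((b & 0xC0) == 0x80) return Utf8ByteKind::Continuation; // 10xxxxxx
    if ((b & 0xE0) == 0xC0) return Utf8ByteKind::Lead2;        // 110xxxxx
    if ((b & 0xF0) == 0xE0) return Utf8ByteKind::Lead3;        // 1110xxxx
    if ((b & 0xF8) == 0xF0) return Utf8ByteKind::Lead4;        // 11110xxx
    return Utf8ByteKind::Invalid;                              // 11111xxx
}
```

For example, '文' is encoded as the bytes E6 96 87: the first is a three-byte lead and the other two are continuation bytes, so all three have the first bit set and none of them can be an ASCII space.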
