#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2013
    Posts
    6
    Rep Power
    0

    Change LATIN1 data to UTF8 data in UTF-8 datbase


    I have database with Unicode characterset (ENCODING = 'UTF8').

    1. Till today data has been entered to tables in encoding "ISO-8859-1" using Java application.
    That means that string "" was saved down as bytes/codes used in characterset "ISO-8859-1". The letter "" is saved down with code 228 in database, if i examine the value with function "ascii", and letter "" is saved down with code 154, those codes are saved down:

    Latin1 "":
    228

    Latin1 "":
    154

    2. Now i changed Java application to use UTF-8 and entered with encoding "UTF-8" new data to tables.
    That means that string "" is saved down as bytes/codes used in characterset "UTF-8". The letter "" is saved down with code 228 in database (same code as in LATIN1 case), if i examine the value with function "ascii", and letter "" is saved down with code 353 (DIFFERENT code from LATIN1 case), those codes are saved down:

    Unicode "":
    228

    Unicode "":
    353

    ---

    As you see in step 1 i saved to database LATIN1 encoded data and with step 2 i started to save down UTF-8 encoded data.
    As you see LATIN1 encoded data is mostly the same as UTF-8 encoded data, for example character "" is saved down in both encodings as code/byte "228", but only minor special characters like "" differ.
    The character "" in LATIN1 case was saved to database as code 154, but UTF-8 case saves down code 353 instead.
    I need all existing data to be converted to UTF-8. That means that data entered in step 1 with LATIN1 encoding needs to be converted to UTF-8. The LATIN1 "" (154) should be replaced/updated to UTF-8 "" (353).

    How can i make such conversion?

    Note that database encoding stays UTF-8. Only i need now to convert existing data somehow from LATIN1 to UTF-8, maybe with UPDATE clause.

    any ideas?
  2. #2
  3. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2013
    Posts
    6
    Rep Power
    0
    I think i can try to define question shorter.
    In UTF-8 database i inserted a string that has LATIN1 encoding. Now i want to change the string encoding to UTF-8. How can i do that? I don't need to change database encoding, the string that was inserted need to be changed there.
  4. #3
  5. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2013
    Posts
    6
    Rep Power
    0
    I got solution, idea is in below query. Topic can be closed.

    select id,--572, 574
    name, --"1š", "3"
    char_length(name)--3
    , convert_to( name, 'UTF-8')--"1\303\244\302\232", "3\303\244\305\241"
    , encode(convert_to( name, 'UTF-8'), 'escape') As CurrentValue--"1\303\244\302\232", "3\303\244\305\241"
    , replace( encode(convert_to( name, 'UTF-8'), 'escape'), '\302\232', '\305\241' )--"1\303\244\305\241"
    , replace ( replace( encode(convert_to( name, 'UTF-8'), 'escape'), '\302\232', '\305\241' ), '\302\212', '\305\240' )
    , replace (replace ( replace( encode(convert_to( name, 'UTF-8'), 'escape'), '\302\232', '\305\241' ), '\302\212', '\305\240' ), '\302\236', '\305\276' )
    , replace( replace (replace ( replace( encode(convert_to( name, 'UTF-8'), 'escape'), '\302\232', '\305\241' ), '\302\212', '\305\240' ), '\302\236', '\305\276' ), '\302\216', '\305\275')
    , decode( replace( replace (replace ( replace( encode(convert_to( name, 'UTF-8'), 'escape'), '\302\232', '\305\241' ), '\302\212', '\305\240' ), '\302\236', '\305\276' ), '\302\216', '\305\275'), 'escape' )--"1\303\244\305\241", "1\303\244\305\241"
    , convert_from (decode( replace( replace (replace ( replace( encode(convert_to( name, 'UTF-8'), 'escape'), '\302\232', '\305\241' ), '\302\212', '\305\240' ), '\302\236', '\305\276' ), '\302\216', '\305\275'), 'escape' ), 'UTF-8') As Latin1__to_Utf8--"1", "3"
    , strpos ( encode(convert_to( name, 'UTF-8'), 'escape'), '\302\232' ) --10, 0
    from fb_operator
    where --id in (572, 574, 575) and
    strpos ( encode(convert_to( name, 'UTF-8'), 'escape'), '\302\232' ) > 0
    or strpos ( encode(convert_to( name, 'UTF-8'), 'escape'), '\302\212' ) > 0
    or strpos ( encode(convert_to( name, 'UTF-8'), 'escape'), '\302\236' ) > 0
    or strpos ( encode(convert_to( name, 'UTF-8'), 'escape'), '\302\216' ) > 0

IMN logo majestic logo threadwatch logo seochat tools logo