The Shed is going Social! Join us on FaceBook and Twitter and chime in on the conversation.
|
 |
|
Dev Shed Forums
> Databases
> PostgreSQL Help
|
Change LATIN1 data to UTF8 data in UTF-8 datbase
Discuss Change LATIN1 data to UTF8 data in UTF-8 datbase in the PostgreSQL Help forum on Dev Shed. Change LATIN1 data to UTF8 data in UTF-8 datbase PostgreSQL Help forum discussing administration, SQL syntax, or other PostgreSQL-related topics. PostgreSQL provides enterprise level database functionality at open source prices.
|
|
 |
|
|
|
|

Dev Shed Forums Sponsor:
|
|
|

January 2nd, 2013, 04:27 AM
|
|
Registered User
|
|
Join Date: Jan 2013
Posts: 6
Time spent in forums: 28 m 23 sec
Reputation Power: 0
|
|
|
Change LATIN1 data to UTF8 data in UTF-8 datbase
I have database with Unicode characterset (ENCODING = 'UTF8').
1. Till today data has been entered to tables in encoding "ISO-8859-1" using Java application.
That means that string "äš" was saved down as bytes/codes used in characterset "ISO-8859-1". The letter "ä" is saved down with code 228 in database, if i examine the value with function "ascii", and letter "š" is saved down with code 154, those codes are saved down:
Latin1 "ä":
228
Latin1 "š":
154
2. Now i changed Java application to use UTF-8 and entered with encoding "UTF-8" new data to tables.
That means that string "äš" is saved down as bytes/codes used in characterset "UTF-8". The letter "ä" is saved down with code 228 in database (same code as in LATIN1 case), if i examine the value with function "ascii", and letter "š" is saved down with code 353 (DIFFERENT code from LATIN1 case), those codes are saved down:
Unicode "ä":
228
Unicode "š":
353
---
As you see in step 1 i saved to database LATIN1 encoded data and with step 2 i started to save down UTF-8 encoded data.
As you see LATIN1 encoded data is mostly the same as UTF-8 encoded data, for example character "ä" is saved down in both encodings as code/byte "228", but only minor special characters like "š" differ.
The character "š" in LATIN1 case was saved to database as code 154, but UTF-8 case saves down code 353 instead.
I need all existing data to be converted to UTF-8. That means that data entered in step 1 with LATIN1 encoding needs to be converted to UTF-8. The LATIN1 "š" (154) should be replaced/updated to UTF-8 "š" (353).
How can i make such conversion?
Note that database encoding stays UTF-8. Only i need now to convert existing data somehow from LATIN1 to UTF-8, maybe with UPDATE clause.
any ideas?
|

January 2nd, 2013, 05:48 AM
|
|
Registered User
|
|
Join Date: Jan 2013
Posts: 6
Time spent in forums: 28 m 23 sec
Reputation Power: 0
|
|
|
I think i can try to define question shorter.
In UTF-8 database i inserted a string that has LATIN1 encoding. Now i want to change the string encoding to UTF-8. How can i do that? I don't need to change database encoding, the string that was inserted need to be changed there.
|

January 3rd, 2013, 03:22 AM
|
|
Registered User
|
|
Join Date: Jan 2013
Posts: 6
Time spent in forums: 28 m 23 sec
Reputation Power: 0
|
|
I got solution, idea is in below query. Topic can be closed.
Quote:
select id,--572, 574
name, --"1ä", "3äš"
char_length(name)--3
, convert_to( name, 'UTF-8')--"1\303\244\302\232", "3\303\244\305\241"
, encode(convert_to( name, 'UTF-8'), 'escape') As CurrentValue--"1\303\244\302\232", "3\303\244\305\241"
, replace( encode(convert_to( name, 'UTF-8'), 'escape'), '\302\232', '\305\241' )--"1\303\244\305\241"
, replace ( replace( encode(convert_to( name, 'UTF-8'), 'escape'), '\302\232', '\305\241' ), '\302\212', '\305\240' )
, replace (replace ( replace( encode(convert_to( name, 'UTF-8'), 'escape'), '\302\232', '\305\241' ), '\302\212', '\305\240' ), '\302\236', '\305\276' )
, replace( replace (replace ( replace( encode(convert_to( name, 'UTF-8'), 'escape'), '\302\232', '\305\241' ), '\302\212', '\305\240' ), '\302\236', '\305\276' ), '\302\216', '\305\275')
, decode( replace( replace (replace ( replace( encode(convert_to( name, 'UTF-8'), 'escape'), '\302\232', '\305\241' ), '\302\212', '\305\240' ), '\302\236', '\305\276' ), '\302\216', '\305\275'), 'escape' )--"1\303\244\305\241", "1\303\244\305\241"
, convert_from (decode( replace( replace (replace ( replace( encode(convert_to( name, 'UTF-8'), 'escape'), '\302\232', '\305\241' ), '\302\212', '\305\240' ), '\302\236', '\305\276' ), '\302\216', '\305\275'), 'escape' ), 'UTF-8') As Latin1__to_Utf8--"1äš", "3äš"
, strpos ( encode(convert_to( name, 'UTF-8'), 'escape'), '\302\232' ) --10, 0
from fb_operator
where --id in (572, 574, 575) and
strpos ( encode(convert_to( name, 'UTF-8'), 'escape'), '\302\232' ) > 0
or strpos ( encode(convert_to( name, 'UTF-8'), 'escape'), '\302\212' ) > 0
or strpos ( encode(convert_to( name, 'UTF-8'), 'escape'), '\302\236' ) > 0
or strpos ( encode(convert_to( name, 'UTF-8'), 'escape'), '\302\216' ) > 0 |
|
Developer Shed Advertisers and Affiliates
| Thread Tools |
Search this Thread |
|
|
|
| Display Modes |
Rate This Thread |
Linear Mode
|
|
Posting Rules
|
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts
HTML code is Off
|
|
|
|
|