PostgreSQL Help
 
Forums: » Register « |  User CP |  Games |  Calendar |  Members |  FAQs |  Sitemap |  Support | 
User Name:
Password:
Remember me

The Shed is going Social! Join us on FaceBook and Twitter and chime in on the conversation.

Go Back   Dev Shed ForumsDatabasesPostgreSQL Help

Reply
Add This Thread To:
  Del.icio.us   Digg   Google   Spurl   Blink   Furl   Simpy   Y! MyWeb 
Thread Tools Search this Thread Rate Thread Display Modes
 
Unread Dev Shed Forums Sponsor:
  #1  
Old January 2nd, 2013, 04:27 AM
applefarm applefarm is offline
Registered User
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Jan 2013
Posts: 6 applefarm User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: 28 m 23 sec
Reputation Power: 0
Change LATIN1 data to UTF8 data in UTF-8 datbase

I have database with Unicode characterset (ENCODING = 'UTF8').

1. Till today data has been entered to tables in encoding "ISO-8859-1" using Java application.
That means that string "äš" was saved down as bytes/codes used in characterset "ISO-8859-1". The letter "ä" is saved down with code 228 in database, if i examine the value with function "ascii", and letter "š" is saved down with code 154, those codes are saved down:

Latin1 "ä":
228

Latin1 "š":
154

2. Now i changed Java application to use UTF-8 and entered with encoding "UTF-8" new data to tables.
That means that string "äš" is saved down as bytes/codes used in characterset "UTF-8". The letter "ä" is saved down with code 228 in database (same code as in LATIN1 case), if i examine the value with function "ascii", and letter "š" is saved down with code 353 (DIFFERENT code from LATIN1 case), those codes are saved down:

Unicode "ä":
228

Unicode "š":
353

---

As you see in step 1 i saved to database LATIN1 encoded data and with step 2 i started to save down UTF-8 encoded data.
As you see LATIN1 encoded data is mostly the same as UTF-8 encoded data, for example character "ä" is saved down in both encodings as code/byte "228", but only minor special characters like "š" differ.
The character "š" in LATIN1 case was saved to database as code 154, but UTF-8 case saves down code 353 instead.
I need all existing data to be converted to UTF-8. That means that data entered in step 1 with LATIN1 encoding needs to be converted to UTF-8. The LATIN1 "š" (154) should be replaced/updated to UTF-8 "š" (353).

How can i make such conversion?

Note that database encoding stays UTF-8. Only i need now to convert existing data somehow from LATIN1 to UTF-8, maybe with UPDATE clause.

any ideas?

Reply With Quote
  #2  
Old January 2nd, 2013, 05:48 AM
applefarm applefarm is offline
Registered User
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Jan 2013
Posts: 6 applefarm User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: 28 m 23 sec
Reputation Power: 0
I think i can try to define question shorter.
In UTF-8 database i inserted a string that has LATIN1 encoding. Now i want to change the string encoding to UTF-8. How can i do that? I don't need to change database encoding, the string that was inserted need to be changed there.

Reply With Quote
  #3  
Old January 3rd, 2013, 03:22 AM
applefarm applefarm is offline
Registered User
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Jan 2013
Posts: 6 applefarm User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: 28 m 23 sec
Reputation Power: 0
I got solution, idea is in below query. Topic can be closed.

Quote:
select id,--572, 574
name, --"1äš", "3äš"
char_length(name)--3
, convert_to( name, 'UTF-8')--"1\303\244\302\232", "3\303\244\305\241"
, encode(convert_to( name, 'UTF-8'), 'escape') As CurrentValue--"1\303\244\302\232", "3\303\244\305\241"
, replace( encode(convert_to( name, 'UTF-8'), 'escape'), '\302\232', '\305\241' )--"1\303\244\305\241"
, replace ( replace( encode(convert_to( name, 'UTF-8'), 'escape'), '\302\232', '\305\241' ), '\302\212', '\305\240' )
, replace (replace ( replace( encode(convert_to( name, 'UTF-8'), 'escape'), '\302\232', '\305\241' ), '\302\212', '\305\240' ), '\302\236', '\305\276' )
, replace( replace (replace ( replace( encode(convert_to( name, 'UTF-8'), 'escape'), '\302\232', '\305\241' ), '\302\212', '\305\240' ), '\302\236', '\305\276' ), '\302\216', '\305\275')
, decode( replace( replace (replace ( replace( encode(convert_to( name, 'UTF-8'), 'escape'), '\302\232', '\305\241' ), '\302\212', '\305\240' ), '\302\236', '\305\276' ), '\302\216', '\305\275'), 'escape' )--"1\303\244\305\241", "1\303\244\305\241"
, convert_from (decode( replace( replace (replace ( replace( encode(convert_to( name, 'UTF-8'), 'escape'), '\302\232', '\305\241' ), '\302\212', '\305\240' ), '\302\236', '\305\276' ), '\302\216', '\305\275'), 'escape' ), 'UTF-8') As Latin1__to_Utf8--"1äš", "3äš"
, strpos ( encode(convert_to( name, 'UTF-8'), 'escape'), '\302\232' ) --10, 0
from fb_operator
where --id in (572, 574, 575) and
strpos ( encode(convert_to( name, 'UTF-8'), 'escape'), '\302\232' ) > 0
or strpos ( encode(convert_to( name, 'UTF-8'), 'escape'), '\302\212' ) > 0
or strpos ( encode(convert_to( name, 'UTF-8'), 'escape'), '\302\236' ) > 0
or strpos ( encode(convert_to( name, 'UTF-8'), 'escape'), '\302\216' ) > 0

Reply With Quote
Reply

Viewing: Dev Shed ForumsDatabasesPostgreSQL Help > Change LATIN1 data to UTF8 data in UTF-8 datbase

Developer Shed Advertisers and Affiliates



Thread Tools  Search this Thread 
Search this Thread:

Advanced Search
Display Modes  Rate This Thread 
Rate This Thread:


Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
View Your Warnings | New Posts | Latest News | Latest Threads | Shoutbox
Forum Jump

Forums: » Register « |  User CP |  Games |  Calendar |  Members |  FAQs |  Sitemap |  Support | 
  
 


Powered by: vBulletin Version 3.0.5
Copyright ©2000 - 2013, Jelsoft Enterprises Ltd.

© 2003-2013 by Developer Shed. All rights reserved. DS Cluster - Follow our Sitemap