Python Programming
 
#1
November 16th, 2012, 07:33 AM
Thx_for_sharing
Reading Unicode data from server and writing to a file

Hello,
I am fetching some strings from a server, each of which contains one special Unicode separator character. What I need to do is split on that character and store the parts separately in 2 different places.
For example: 詳細2.3

The invisible character between 詳細 and 2.3 in the string above is the Unicode character U+F8FF (UTF-8 bytes: EF A3 BF).
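A quick way to confirm the byte sequence quoted above is to encode the character yourself. This is a small Python 3 sketch; the variable names are illustrative only.

```python
sep = "\uf8ff"  # U+F8FF, the separator character from the example

# Encoding one code point to UTF-8 gives its byte sequence
utf8_bytes = sep.encode("utf-8")

print(utf8_bytes.hex())           # efa3bf
print(len(sep), len(utf8_bytes))  # 1 3  (one character, three bytes)
```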

I need to split the string on that character.
Here is my code:
---------------------------------------
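text = open(r"C:\tempchar.txt").read()  # in Python 2, read() returns a byte string (str)

newpart = text.decode('utf-8').split(u"\uf8ff")  # decode the UTF-8 bytes to unicode, then split on U+F8FF
firstpart = newpart[::2]    # even-indexed parts; some manipulation on this later
secondpart = newpart[1::2]  # odd-indexed parts; some manipulation on this later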

--------------------------------------
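For comparison, here is a minimal Python 3 sketch of the same read, split and store steps, assuming the input is UTF-8. Passing encoding="utf-8" to open() decodes the file while reading, so there is no separate .decode() call. The sample record, the temporary directory, and the output names first.txt and second.txt are hypothetical stand-ins for the real input file and the "2 different places" the parts go to.

```python
import os
import tempfile

SEPARATOR = "\uf8ff"  # U+F8FF, the character the records are split on

# Hypothetical sample record; the real input comes from C:\tempchar.txt
sample = "詳細" + SEPARATOR + "2.3"

workdir = tempfile.mkdtemp()
in_path = os.path.join(workdir, "input.txt")

# Write raw UTF-8 bytes, standing in for the original input file
with open(in_path, "wb") as f:
    f.write(sample.encode("utf-8"))

# An explicit encoding makes read() return decoded text (str), not bytes
with open(in_path, encoding="utf-8") as f:
    text = f.read()

parts = text.split(SEPARATOR)
firstpart = parts[::2]    # even-indexed pieces
secondpart = parts[1::2]  # odd-indexed pieces

# Store each group in its own (hypothetical) file, again with an explicit encoding
for name, pieces in (("first.txt", firstpart), ("second.txt", secondpart)):
    with open(os.path.join(workdir, name), "w", encoding="utf-8") as f:
        f.write("\n".join(pieces))
```

Giving both open() calls an explicit encoding means no implicit ASCII conversion can happen on the way in or out.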
When I try this on a sample string from the text file, as in the code above, it works fine, and I can print the text at the cmd prompt.

But when I do the same thing with the input string from the server, I get the following error:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-9: ordinal not in range(128)

I have never seen this error before and have no idea what could be causing this. PLEASE HELP!
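The message means that some text was pushed through the ASCII codec, which only covers code points 0 to 127. In Python 2 that encode step often happens implicitly, for example when str() is called on a unicode object or a unicode object is written to a file opened without an encoding. A minimal Python 3 sketch that triggers the same error explicitly, using made-up sample text:

```python
# Made-up sample: already-decoded text containing non-ASCII characters
text = "詳細" + "\uf8ff" + "2.3"

try:
    # ASCII covers only code points 0-127, so the very first character fails
    text.encode("ascii")
except UnicodeEncodeError as exc:
    error = exc  # keep a reference; exc itself is cleared after the except block

print(error.encoding)  # ascii
print(error.start)     # 0
print(error.reason)    # ordinal not in range(128)
```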

#2
November 16th, 2012, 07:56 AM
b49P23TIvg
I would think that if you read the data from the server into a_variable and printed its type

print(type(a_variable))

the answer would differ from the type of what you read from your file:

print(type(open(r"C:\tempchar.txt").read()))

Once you knew this you'd have the clues you need to find the solution.
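A Python 3 sketch of that experiment, with hypothetical stand-in values. Python 3 renamed the types: its bytes corresponds to Python 2's str, and its str corresponds to Python 2's unicode.

```python
# Hypothetical stand-ins for the two data sources
file_data = ("詳細" + "\uf8ff" + "2.3").encode("utf-8")  # like reading a file in binary mode: bytes
server_data = "詳細" + "\uf8ff" + "2.3"                  # already-decoded text, like the server's value

print(type(file_data).__name__)    # bytes
print(type(server_data).__name__)  # str
```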
#3
November 16th, 2012, 08:05 AM
Thx_for_sharing
Thanks for your reply.
It shows <type 'unicode'> for all strings that I read from the server.

#4
November 16th, 2012, 08:27 AM
b49P23TIvg
Great, you performed half of the experiment. The half I couldn't.

You're using Python 2. We know this because type() reported <type 'unicode'>; Python 3 has no unicode type (its text type is str, which prints as <class 'str'>).

In your test, read() returns a str, which in Python 2 is a byte string:

>>> print(type(open('/tmp/c.c').read()))
<type 'str'>

Got it? Your server gives you unicode, while your file test starts from str and decodes it. This Python 2 statement works as I'd expect:

>>> u'詳細2.3'.split(u'')
[u'\u8a73\u7d30', u'2.3']


Note to vision-impaired yet gentle readers: the bit that appears as an empty string is not empty. It contains the U+F8FF private-use character, which has no standard glyph and usually renders as blank:

>>> ord(u'')
63743
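When a character is invisible on screen, repr() and the standard unicodedata module show it more reliably than the console does. A short Python 3 sketch:

```python
import unicodedata

sep = "\uf8ff"

print(hex(ord(sep)))              # 0xf8ff  (63743 in decimal)
print(repr(sep))                  # '\uf8ff'  (repr escapes non-printable characters)
print(unicodedata.category(sep))  # Co  (Other, private use)
```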

#5
November 16th, 2012, 08:41 AM
Thx_for_sharing
I see. However, the test with the input file works perfectly.

The error appears only when my code reads the strings from the server. The message is:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-9: ordinal not in range(128)
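One likely explanation, consistent with the type experiment in this thread: the server already returns unicode, and in Python 2, calling .decode('utf-8') on a unicode object makes Python first encode it with the default ASCII codec, which raises this kind of UnicodeEncodeError. (Python 3 has no str.decode, so the same mistake raises AttributeError there instead.) A minimal Python 3 sketch of a fix, using a hypothetical helper split_on_separator that decodes only when it is given bytes:

```python
SEPARATOR = "\uf8ff"  # U+F8FF

def split_on_separator(data, encoding="utf-8"):
    """Hypothetical helper: split on U+F8FF, decoding first only if data is raw bytes."""
    if isinstance(data, bytes):
        data = data.decode(encoding)
    # data is now text (str); decoding it again would be the original mistake
    return data.split(SEPARATOR)

# Both sources give the same parts: bytes (like the file) and text (like the server)
from_file = split_on_separator(("詳細" + SEPARATOR + "2.3").encode("utf-8"))
from_server = split_on_separator("詳細" + SEPARATOR + "2.3")
```

An alternative design is to decode exactly once, where data enters the program, and pass only text around after that; the type check above simply keeps one code path for both sources.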

© 2003-2013 by Developer Shed. All rights reserved.