#1
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2013
    Posts
    4
    Rep Power
    0

    Chinese and Japanese character support in Python


    I'm reading text from a textbox that contains the path "E:\7.4\日本国" and writing it to a text file (.txt). The problem is that the Japanese characters aren't written to the file; a question mark (?) appears in place of each one. I tried writing the file as UTF-8 from Python, but it doesn't work. I used codecs and unicode but couldn't get it to work.

    After writing the path to the text file, I read the same path back to use in my code.
    I'm using RanorexPython and Python to test code written in C# (a GUI).

    Here is the code:
    # Write both paths to pathfile.txt, one per line
    # (Path1 and Path2 hold the text read from the GUI textboxes)
    writePath = Path1 + "\n" + Path2
    tempfile = open("C:\\Temp\\pathfile.txt", "w")
    tempfile.write(writePath)
    tempfile.close()
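For comparison, a minimal sketch of the same write step with the encoding stated explicitly. It assumes Python 3, where io.open is the built-in open and strings are Unicode; the helper names write_paths and read_paths, the sample paths, and the temporary folder (standing in for C:\Temp) are hypothetical. Python 2.5 has no io module; there, codecs.open(filename, "w", "utf-8") is the call that accepts an encoding, and it expects unicode objects rather than byte strings.

```python
import io
import os
import tempfile


def write_paths(filename, paths):
    """Write one path per line, encoding the text as UTF-8."""
    with io.open(filename, "w", encoding="utf-8") as f:
        f.write("\n".join(paths))


def read_paths(filename):
    """Read the paths back, decoding the file as UTF-8."""
    with io.open(filename, "r", encoding="utf-8") as f:
        return f.read().split("\n")


# Hypothetical stand-ins for Path1 and Path2 from the textboxes
path1 = r"E:\7.4\日本国"
path2 = r"E:\7.4\は最高のプログラマ"

# A temporary directory stands in for C:\Temp
filename = os.path.join(tempfile.mkdtemp(), "pathfile.txt")
write_paths(filename, [path1, path2])

round_trip = read_paths(filename)
print(round_trip == [path1, path2])  # True
```

Reading back with the same encoding that was used for writing is what makes the round trip lossless.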
#2
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jul 2007
    Location
    Joensuu, Finland
    Posts
    431
    Rep Power
    67
    Originally Posted by dx_generation25
    I'm reading text from a textbox that contains the path "E:\7.4\日本国" and writing it to a text file (.txt). The problem is that the Japanese characters aren't written to the file; a question mark (?) appears in place of each one.
    Do you use Python 2 or Python 3? Handling character encodings can be extremely tricky in Python 2 even to the point that nothing seems to work.

    Still, I tried your code snippet in Python 2 and it worked when the path was given as a literal. I guess the problem is that what comes out of the textbox is encoded in a legacy encoding, and you should use strings’ .decode() and .encode() methods to get it to UTF-8.
    My armada: openSUSE 13.1 (home desktop, home laptop), Crunchbang Linux 11 (mini laptop, work laptop), Android 4.2.1 (tablet)
#3
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2013
    Posts
    4
    Rep Power
    0
    Originally Posted by SuperOscar
    Do you use Python 2 or Python 3? Handling character encodings can be extremely tricky in Python 2 even to the point that nothing seems to work.

    Still, I tried your code snippet in Python 2 and it worked when the path was given as a literal. I guess the problem is that what comes out of the textbox is encoded in a legacy encoding, and you should use strings’ .decode() and .encode() methods to get it to UTF-8.
    I use Python 2.5. The problem I'm facing is that if I use
    tempfile = open("C:\\Temp\\pathfile.txt", "w", "utf-8")
    it does not work.
    I checked the text file after writing the Japanese path to it, and the characters are displayed as question marks.
    I have tried encode and decode, but that doesn't work either.
    Please let me know if you have a solution.

    For example:
    >>> path = r"E:\7.4\は最高のプログラマ"
    >>> t = path.encode()
    >>> print t
    E:\7.4\?????????
    >>> t = path.decode()
    >>> print t
    E:\7.4\?????????
    >>> t = path.encode("utf-8")
    >>> print t
    E:\7.4\?????????
    >>> t = path.decode("utf-8")
    >>> print t
    E:\7.4\?????????
    >>>
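Two things in the attempts above can be checked with a small sketch, assuming Python 3 (variable names and the choice of cp1252 as the example code page are hypothetical). First, in both Python 2 and Python 3 the third positional parameter of the built-in open() is the buffer size, not an encoding, so open(name, "w", "utf-8") is rejected with a TypeError. Second, a "?" is what you get when text is encoded to a code page that cannot represent a character and unencodable characters are replaced; once that has happened, no decode()/encode() call can bring the characters back. In Python 2, str.encode() with no argument first decodes the byte string as ASCII, so the fact that path.encode() ran without a UnicodeDecodeError suggests the string most likely already held literal "?" characters, i.e. they were lost at the interactive console before Python saw them.

```python
import os
import tempfile

original = r"E:\7.4\は最高のプログラマ"

# 1) open()'s third positional parameter is buffering (an int), not an encoding.
filename = os.path.join(tempfile.mkdtemp(), "pathfile.txt")
try:
    open(filename, "w", "utf-8")
    open_error = None
except TypeError as exc:
    open_error = exc
print(type(open_error).__name__)  # TypeError

# 2) Encoding to a code page that lacks these characters, with errors="replace",
#    turns each of them into "?" (cp1252 is only an example code page).
lossy = original.encode("cp1252", errors="replace")
print(lossy)  # b'E:\\7.4\\?????????'

# The replacement is irreversible: decoding yields "?" again, not the original text.
print(lossy.decode("utf-8") == original)  # False

# Encoding to UTF-8 instead keeps every character.
print(original.encode("utf-8").decode("utf-8") == original)  # True
```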
#4
    Contributing User
    Devshed Demi-God (4500 - 4999 posts)

    Join Date
    Aug 2011
    Posts
    4,841
    Rep Power
    480
    Maybe your program is correct, but the program you use to view the file doesn't display UTF-8?
    [code]Code tags[/code] are essential for python code and Makefiles!
#5
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2013
    Posts
    4
    Rep Power
    0
    Originally Posted by b49P23TIvg
    Maybe your program is correct, but the program you use to view the file doesn't display UTF-8?
    It's not a UTF-8 display problem. If I type Japanese or Chinese characters into Notepad manually and save the file with UTF-8 encoding, no data is lost.
    But the same thing doesn't work from Python.
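One way to separate a display problem from a data problem is to look at the file's raw bytes instead of opening it in an editor. A rough sketch, assuming Python 3; inspect_file, the file names, and the use of cp1252 as the lossy code page are hypothetical choices for illustration. If the file contains literal "?" bytes where the characters should be, they were already replaced when it was written; if it is valid UTF-8 with no "?", the data is intact and the viewer is the problem. "utf-8-sig" is used for the good file only to show the byte-order-mark check; plain "utf-8" preserves the characters just as well.

```python
import os
import tempfile


def inspect_file(filename):
    """Describe a file's raw bytes, independent of how any editor displays them."""
    with open(filename, "rb") as f:
        raw = f.read()
    try:
        raw.decode("utf-8")
        valid_utf8 = True
    except UnicodeDecodeError:
        valid_utf8 = False
    return {
        "utf8_bom": raw.startswith(b"\xef\xbb\xbf"),
        "question_marks": raw.count(b"?"),
        "valid_utf8": valid_utf8,
    }


path = r"E:\7.4\日本国"
folder = tempfile.mkdtemp()

# Written with an explicit encoding; "utf-8-sig" also writes a byte-order mark.
good = os.path.join(folder, "good.txt")
with open(good, "w", encoding="utf-8-sig") as f:
    f.write(path)

# Written through a code page that cannot hold Japanese (cp1252 assumed),
# with unencodable characters replaced, which yields literal "?" bytes.
bad = os.path.join(folder, "bad.txt")
with open(bad, "w", encoding="cp1252", errors="replace") as f:
    f.write(path)

print(inspect_file(good))  # {'utf8_bom': True, 'question_marks': 0, 'valid_utf8': True}
print(inspect_file(bad))   # {'utf8_bom': False, 'question_marks': 3, 'valid_utf8': True}
```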
#6
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jul 2007
    Location
    Joensuu, Finland
    Posts
    431
    Rep Power
    67
    I wish I knew something about Chinese and Japanese encodings (other than Unicode, that is)...

    My own problem has usually been that GUIs written in Python 2 assume, say, Latin-1, and I want to write the output files in UTF-8. The solution is to:
    Code:
    s = s.decode('Latin-1').encode('UTF-8')
    (where “s” is a string obtained from a textbox in a GUI). I think that in your case you would just replace Latin-1 with the correct coding.
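A sketch of that decode-then-encode step in Python 3, where text from a GUI would normally already be a str, so the incoming byte string below is simulated. The assumption that the textbox hands over cp932 (the Windows variant of Shift_JIS used on Japanese-locale Windows) is hypothetical; the actual legacy encoding has to be found out. In Python 2 the same idea would read s.decode('cp932').encode('UTF-8').

```python
original = r"E:\7.4\日本国"

# Hypothetical input: bytes as a GUI on a Japanese-locale Windows machine
# might hand them over, assumed here to be cp932.
from_textbox = original.encode("cp932")

# Decoding with the correct legacy codec recovers the text...
text = from_textbox.decode("cp932")
# ...and encoding that text gives UTF-8 bytes suitable for the output file.
utf8_bytes = text.encode("utf-8")

# Decoding with the wrong codec (here Latin-1) never raises, but produces garbage.
wrong = from_textbox.decode("latin-1")

print(text == original)             # True
print(wrong == original)            # False
print(utf8_bytes == from_textbox)   # False
```

A wrong codec such as Latin-1 does not raise an error; it silently produces garbled text, so a decode that succeeds does not by itself prove the encoding guess is right.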
#7
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2013
    Posts
    4
    Rep Power
    0
    Originally Posted by SuperOscar
    I wish I knew something about Chinese and Japanese encodings (other than Unicode, that is)...

    My own problem has usually been that GUIs written in Python 2 assume, say, Latin-1, and I want to write the output files in UTF-8. The solution is to:
    Code:
    s = s.decode('Latin-1').encode('UTF-8')
    (where “s” is a string obtained from a textbox in a GUI). I think that in your case you would just replace Latin-1 with the correct coding.
    OK, I'll try it out.
