#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Aug 2005
    Posts
    4
    Rep Power
    0

    How to write e.g. "ä" (umlaut) into a file?


    Hi,
    I have made a small tool that checks RSS feeds. I also have a German RSS server. If it sends, e.g., a title containing "ä", I get:

    UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 47: ordinal not in range(128)
    I think I know where the problem is, but I do not know how to solve it.
    I write the title into a stream I build:

    stream='<XML....><title>'+str(title)+'grg'
    I think str(title) is the problem, but what can I do to solve it?


    If I use repr() instead of str() it works, but when I view the file in kwrite, whichever encoding scheme I select, I do not get the "ä" character. How can I solve this?

    Thanks,


    Patrick
  2. #2
  3. Contributing User
    Devshed Intermediate (1500 - 1999 posts)

    Join Date
    Dec 2004
    Location
    Meriden, Connecticut
    Posts
    1,797
    Rep Power
    154
    Hmm. Can you show us more code on what you are doing so that we can get a general idea of what is causing the problem and where it can be fixed?

    Also, make sure you show us the value of title.
    Last edited by Yegg; August 24th, 2005 at 12:02 PM.
  4. #3
  5. Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Feb 2005
    Posts
    610
    Rep Power
    65

    Try using unicode() rather than str().
    Also look up BOM (byte order mark) in your help file.
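
A minimal sketch of that suggestion, assuming the title arrives from the feed as a unicode string. The sample title, the `build_stream` helper and the `<title>` wrapper are made up for illustration; the real XML prolog is not shown in the thread. The idea is to stay in unicode while concatenating and to encode to bytes only once, when bytes are actually needed:

```python
def build_stream(title):
    """Concatenate as unicode; no str() call, so no implicit ASCII encode."""
    # Hypothetical wrapper element, not the original stream layout.
    return u'<title>' + title + u'</title>'


title = u'K\xe4se'  # sample title containing u'\xe4' ("a" with umlaut)
stream = build_stream(title)

# Encode exactly once, at the point where bytes are required
# (writing to a file, sending over a socket, ...).
data = stream.encode('utf-8')
print(repr(data))
```

In UTF-8 the umlaut becomes the two bytes `\xc3\xa4`, so the viewer has to be set to UTF-8 as well for the character to show up correctly.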
  6. #4
  7. Mini me.
    Devshed Novice (500 - 999 posts)

    Join Date
    Nov 2003
    Location
    Cambridge, UK
    Posts
    783
    Rep Power
    13
    Hi Patrick,
    I think you will find this link useful
    Python Unicode Howto
    In particular the reading/writing data section.
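
As a rough sketch of the kind of thing that section describes, assuming UTF-8 is the wanted file encoding: open the file with an explicit encoding and write unicode text to it. The sample title and the temporary file path are hypothetical. `io.open` is used here; `codecs.open` with the same `encoding` argument is the older equivalent.

```python
import io
import os
import tempfile

title = u'K\xe4se'  # hypothetical sample title containing u'\xe4'

# Hypothetical output path inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'feed.xml')

# The file object encodes unicode text to UTF-8 as it is written.
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(u'<title>' + title + u'</title>\n')

# Reading with the same encoding gives the unicode text back.
with io.open(path, 'r', encoding='utf-8') as f:
    text = f.read()

# On disk, the umlaut is stored as its two-byte UTF-8 form.
with io.open(path, 'rb') as f:
    raw = f.read()

print(repr(text))
print(repr(raw))
```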

    Have fun,
    grim
  8. #5
  9. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Aug 2005
    Posts
    4
    Rep Power
    0

    thx


    Hello,

    Thanks for the help. I think unicode() helped me.

    But I have another problem, which I cannot solve:

    I have a dict like {"1101010": u'\xfe'....}
    Do not ask why, but I need the dict :-)
    Then I create a text like:
    text = u''
    for ...:
        tmp = dict(...)
        text = text + tmp  # I should get a unicode string...
    final = text.encode('utf-8')
    print(final)

    This does not work. In the RSS case it works perfectly. If I open Python and type similar things in by hand, it works too, e.g.:
    e = u'\xfe'
    e = e + u'hello'
    ...
    text = e.encode('utf-8')
    print text

    No problem. Does anybody know what is going wrong?


    Best Regards,


    Patrick
  10. #6
  11. Mini me.
    Devshed Novice (500 - 999 posts)

    Join Date
    Nov 2003
    Location
    Cambridge, UK
    Posts
    783
    Rep Power
    13
    Your pseudocode does not make the problem clear. What actually fails? Are you saying that dictionaries don't hold unicode data? That would be a big surprise.
    Code:
    >>> a = {"fred":u'\xfe'}
    >>> a
    {'fred': u'\xfe'}
    >>> a['fred']
    u'\xfe'
    >>> b = u'abc'
    >>> c = a['fred']+b
    >>> c
    u'\xfeabc'
    >>> d = c.encode('utf-8')
    >>> d
    '\xc3\xbeabc'
    seems to work as expected.

    grim
  12. #7
  13. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Aug 2005
    Posts
    4
    Rep Power
    0
    Hi,

    If I use the script I get the problem; otherwise it works. I tried it out the same way you did. But I think it has something to do with this:

    If I put this line in the script:

    # -*- coding: UTF-8 -*-

    import Test #module includes implementation of test decoding
    UnicodeDecodeError: 'utf8' codec can't decode byte 0xa6 in position 0: unexpected code byte


    If not:
    DeprecationWarning: Non-ASCII character '\xa6' in file /home/Test.py on line 42, but no encoding declared;

    The warning makes sense: if I do not declare an encoding, the decoding might go wrong. But if I declare what I believe is the correct one, I get an error message, and I do not know why.


    Patrick
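
One possible explanation, assuming Test.py was saved in a single-byte encoding such as Latin-1 (ISO-8859-1) rather than UTF-8: the byte 0xa6 has the bit pattern 10100110, which UTF-8 only allows as a continuation byte, never as the first byte of a character. Declaring UTF-8 then makes the decoder reject the file. The sketch below reproduces that on the single byte from the error message; the choice of Latin-1 is a guess, not something confirmed in this thread.

```python
raw = b'\xa6'  # the byte reported in the error message

try:
    raw.decode('utf-8')
    utf8_ok = True
except UnicodeDecodeError:
    # 0xa6 is a UTF-8 continuation byte, so it is invalid as a first byte.
    utf8_ok = False

# Assumption: the file is really Latin-1, where every byte value is valid.
as_latin1 = raw.decode('latin-1')

print(utf8_ok)
print(repr(as_latin1))
```

If that guess holds, either re-save Test.py as UTF-8 in the editor, or change the declaration to match the encoding the file is actually stored in, e.g. `# -*- coding: iso-8859-1 -*-`.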
