#1
  1. Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2013
    Posts
    29
    Rep Power
    0

    Help with Python error


    This is my error message:

    File "news.py", line 52
    def cleanup_html(self, data):
    ^
    IndentationError: unindent does not match any outer indentation level

    I originally had just feedparser, but I added another part to the program to clean it up. I would have done away with feedparser, but I did not know how to open the file and read the post without it. I just want it to open the URL, read the feed, say the description, and ask if I want more; if I do, give me story number two. I am ignorant of about 80% of Python, and I am going forward slowly. P.S. I did not see a button to press to insert code. It stripped all of my indentation, but the levels were at 5, 10, and 15.
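    A minimal sketch of the flow described above (say one description, then ask whether to continue), with a plain list of dicts standing in for the parsed feed entries and a hypothetical `want_more` prompt injected so the loop can be driven non-interactively:

    ```python
    # Sketch of the desired flow. `posts` stands in for the list of
    # {'title', 'description', 'url'} dicts built from the feed below;
    # `want_more` is a hypothetical yes/no callback (a real program
    # would wrap an interactive prompt).
    def read_stories(posts, want_more):
        spoken = []
        for post in posts:
            spoken.append(post['description'])  # would be sent to TTS
            if not want_more():                 # stop when the user says no
                break
        return spoken
    ```

    In the real program the descriptions would go to the speech engine instead of a list, and `want_more` would read a yes/no answer from the user.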

    My code is:
    #!/usr/bin/python
    # -*- coding: utf-8 -*-
    import subprocess
    import re
    import urllib
    import urllib2
    import xml.dom.minidom
    import feedparser

    rss_url = "http://feeds.reuters.com/Reuters/domesticNews"
    feed = feedparser.parse( rss_url )
    posts = []

    for i in range(0,len(feed['entries'])):
         posts.append({
              'title': feed['entries'][i].title,
              'description': feed['entries'][i].summary,
              'url': feed['entries'][i].link,
              })

    class MyClass():
         def __init__(self):
              GeneratedClass.__init__(self)
              self.lang_ref = { 'Chinese' : 'zh',
                   'English' : 'en',
                   'French' : 'fr',
                   'German' : 'de',
                   'Italian' : 'it',
                   'Japanese' : 'ja',
                   'Korean' : 'ko',
                   'Portuguese' : 'pt',
                   'Spanish' : 'es'
                   }
              self.error_msg = { "Chinese" : "我发现了什么。",
                   "English" : "I found nothing.",
                   "French" : "Je n'ai rien trouvé.",
                   "German" : "Ich fand nichts.",
                   "Italian" : "Ho trovato nulla.",
                   "Japanese" : "何を発見した。",
                   "Korean" : "아무것도 없습니다.",
                   "Portuguese" : "Eu não achei nada.",
                   "Spanish" : "No he encontrado nada."
                   }
              try:
                   req = urllib2.Request(url)
                   response = urllib2.urlopen(req)
                   content = xml.dom.minidom.parseString(response.read())
                   data = str(content.getElementsByTagName('rev')[0].firstChild.data.encode('utf-8'))

        def cleanup_html(self, data):
             data = re.sub(r'<[^<].*?/?>', '', data)
             data = data.replace("&nbsp;", " ")
              return data

         def cleanup_wiki(self, data):
              data = re.sub(r'(?i)\{\{Date[^\[\]]*?\|([^}]*)\}\}', lambda m: m.group(1), data) # remove wiki 'Date' markers
              data = re.sub(r'(?i)\{\{IPA(\-[^\|\{\}]+)*?\|([^\|\{\}]+)(\|[^\{\}]+)*?\}\}', lambda m: m.group(2), data)
              data = re.sub(r'(?i)\{\{Lang(\-[^\|\{\}]+)*?\|([^\|\{\}]+)(\|[^\{\}]+)*?\}\}', lambda m: m.group(2), data)
              data = re.sub(r'\{\{[^\{\}]+\}\}', '', data)
              data = re.sub(r'(?m)\{\{[^\{\}]+\}\}', '', data)
              data = re.sub(r'(?m)\{\|[^\{\}]*?\|\}', '', data)
              data = re.sub(r'(?i)\[\[Category:[^\[\]]*?\]\]', '', data)
              data = re.sub(r'(?i)\[\[Image:[^\[\]]*?\]\]', '', data)
              data = re.sub(r'(?i)\[\[File:[^\[\]]*?\]\]', '', data)
              data = re.sub(r'(?i)\[\[Fichier:[^\[\]]*?\]\]', '', data)
              data = re.sub(r'(?u)\[\[[^\[\]]*?:[^\[\]]*?\]\]', '', data)
              data = re.sub(r'\[\[[^\[\]]*?\|([^\[\]]*?)\]\]', lambda m: m.group(1), data)
              data = re.sub(r'\[\[([^\[\]]+?)\]\]', lambda m: m.group(1), data)
              data = re.sub(r'\[\[([^\[\]]+?)\]\]', '', data)
              data = re.sub(r'(?i)File:[^\[\]]*?', '', data)
              data = re.sub(r'\[[^\[\]]*? ([^\[\]]*?)\]', lambda m: m.group(1), data)
              data = re.sub(r"''+", '', data)
              data = re.sub(r'(?m)^\*$', '', data)
              data = re.sub(r'(?i)\{\{InfoBox[^\[\]]*?\}\}', '', data) # remove 'infobox' text block
              data = re.sub(r'(?i)<ref[^\[\]]*?</ref>', '', data) # remove 'ref' text blocks
              return data

         def cleanup_text(self, data):
              #remove unnecessary parts
              data = re.sub(r'\([^\[\]]*?\)', '', data) # remove parentheses text blocks
              data = re.sub(r'(?u)【[^[\]]*?】', '', data) # remove parentheses text blocks (japanese)
              data = re.sub(r'(?u)（[^[\]]*?）', '', data) # remove parentheses text blocks (japanese)
              #replace chars
              data = data.replace("|", " ")
              data = data.replace(";", ".")
              data = data.replace(" - ", ", ")
              #remove chars
              data = data.replace('"', '') # remove quotes
              data = data.replace('“', '') # remove quotes
              data = data.replace('”', '') # remove quotes
              data = data.replace("#", "") # remove hashes
              data = data.replace("/", "") # remove slashes
              data = data.replace("*", "") # remove stars
              #remove extra spaces
              data = re.sub(r'\s+', ' ', data) # remove extra spaces
              #clean-up punctuation - part1
              data = data.replace(" ,", ",") # punctuation correction
              data = data.replace(" .", ".") # punctuation correction
              data = data.replace(" !", "!") # punctuation correction
              data = data.replace(" ?", "?") # punctuation correction
              data = data.replace(" :", ":") # punctuation correction
              #clean-up punctuation - part2
              data = data.replace(",.", ".") # punctuation correction
              data = data.replace("...", ".") # punctuation correction
              data = data.replace("..", ".") # punctuation correction
              data = data.replace("!.", "!") # punctuation correction
              data = data.replace("?.", "?") # punctuation correction
              data = data.replace(":.", ":") # punctuation correction
              return data

         def safe_chars(self, data, speech_lang):
              # modify or remove other TTS non-supported chars
              data = data.replace("Œ", "Oe")
              data = data.replace("œ", "oe")
              data = data.replace("’", "'") # non-readable single-quote

              # common filter for non latin languages
              if speech_lang in [ 'Chinese', 'English', 'Korean', 'Japanese' ]:
                   data = data.replace("À", "A")
                   data = data.replace("Á", "A")
                   data = data.replace("Â", "A")
                   data = data.replace("Ã", "A")
                   data = data.replace("Ä", "A")
                   data = data.replace("à", "a")
                   data = data.replace("á", "a")
                   data = data.replace("â", "a")
                   data = data.replace("ã", "a")
                   data = data.replace("ä", "a")
                   data = data.replace("È", "E")
                   data = data.replace("É", "E")
                   data = data.replace("Ê", "E")
                   data = data.replace("Ë", "E")
                   data = data.replace("è", "e")
                   data = data.replace("é", "e")
                   data = data.replace("ê", "e")
                   data = data.replace("ë", "e")
                   data = data.replace("Ì", "I")
                   data = data.replace("Í", "I")
                   data = data.replace("Î", "I")
                   data = data.replace("Ï", "I")
                   data = data.replace("ì", "i")
                   data = data.replace("í", "i")
                   data = data.replace("î", "i")
                   data = data.replace("ï", "i")
                   data = data.replace("Ò", "O")
                   data = data.replace("Ó", "O")
                   data = data.replace("Ô", "O")
                   data = data.replace("Õ", "O")
                   data = data.replace("Ö", "O")
                   data = data.replace("ò", "o")
                   data = data.replace("ó", "o")
                   data = data.replace("ô", "o")
                   data = data.replace("õ", "o")
                   data = data.replace("ö", "o")
                   data = data.replace("Ù", "U")
                   data = data.replace("Ú", "U")
                   data = data.replace("Û", "U")
                   data = data.replace("Ü", "U")
                   data = data.replace("ù", "u")
                   data = data.replace("ú", "u")
                   data = data.replace("û", "u")
                   data = data.replace("ü", "u")
                   data = data.replace("Ý", "Y")
                   data = data.replace("ý", "y")
                   data = data.replace("Æ", "Ae")
                   data = data.replace("æ", "ae")
                   data = data.replace("Ø", "Oe")
                   data = data.replace("ø", "oe")
                   data = data.replace("Š", "S")
                   data = data.replace("š", "s")
                   data = data.replace("Ñ", "N")
                   data = data.replace("ñ", "n")
                   data = data.replace("ß", "b")
                   data = data.replace("", "")

              # language specific filters
              if ( speech_lang == 'English' ):
                   # remove everything except standard ascii
                   data = unicode(data.decode('utf-8'))
                   data = re.sub(ur'[\u007b-\uffff]', '', data)
                   data = data.encode('utf-8')
              elif speech_lang in [ 'French', 'German', 'Italian', 'Portuguese', 'Spanish' ]:
                   # remove everything except ascii and latin chars
                   data = unicode(data.decode('utf-8'))
                   data = re.sub(ur'[\u007b-\u00bf]', '', data)
                   data = re.sub(ur'[\u0100-\uffff]', '', data)
                   data = data.encode('utf-8')
              elif ( speech_lang == 'Chinese' ):
                   # remove everything except ascii and chinese chars
                   data = unicode(data.decode('utf-8'))
                   data = re.sub(ur'[\u007b-\u2e7f]', '', data)
                   data = re.sub(ur'[\u3040-\u30ff]', '', data) # japanese
                   data = re.sub(ur'[\u3130-\u318f]', '', data) # korean hangul
                   data = re.sub(ur'[\ua000-\uf8ff]', '', data)
                   data = re.sub(ur'[\ufb00-\uffff]', '', data)
                   data = data.encode('utf-8')
              elif ( speech_lang == 'Japanese' ):
                   # remove everything except ascii and japanese chars
                   data = unicode(data.decode('utf-8'))
                   data = re.sub(ur'[\u007b-\u2ffff]', '', data)
                   data = re.sub(ur'[\u3100-\u33ff]', '', data) # korean hangul
                   data = re.sub(ur'[\u4dc0-\u4dff]', '', data)
                   data = re.sub(ur'[\u9fb0-\ufeff]', '', data)
                   data = re.sub(ur'[\ufff0-\uffff]', '', data)
                   data = data.encode('utf-8')
              elif ( speech_lang == 'Korean' ):
                   # remove everything except ascii and hangul chars
                   data = unicode(data.decode('utf-8'))
                   data = re.sub(ur'[\u007b-\u30ff]', '', data)
                   data = re.sub(ur'[\u3200-\uabff]', '', data)
                   data = re.sub(ur'[\ud7a4-\uffff]', '', data)
                   data = data.encode('utf-8')

              # english-specific filter
              if (speech_lang == "English"):
                   data = data.replace("&", " and ")
              # french-specific filter
              elif (speech_lang == "French"):
                   data = data.replace("&", " et ")
                   data = data.replace("ß", "b")
              # german-specific filter
              elif (speech_lang == "German"):
                   data = data.replace("&", " und ")
              # italian-specific filter
              elif (speech_lang == "Italian"):
                   data = data.replace("&", " e ")
                   data = data.replace("ß", "b")
                   # korean specific filter
                   data = data.replace("'", "")
                   data = data.replace("", " ")
              # portuguese-specific filter
              elif (speech_lang == "Portuguese"):
                   data = data.replace("&", " e ")
                   data = data.replace("ß", "b")
              # spanish-specific filter
              elif (speech_lang == "Spanish"):
                   data = data.replace("&", " y ")
                   data = data.replace("ß", "b")
              return data

    print feed.entries[1].title
    # add a file to your directory and relist its path here
    fo = open("/home/qbobot/Documents/qbo_rss.txt", "wb")
    fo.write( feed.entries[1].title);
    fo.write(" ")
    fo.write( feed.entries[1].description);
    fo.close()
    subprocess.call('echo ''|festival --tts qbo_rss.txt', shell=True)
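    As an aside, the tag-stripping step in the cleanup_html method above can be exercised on its own. A minimal standalone sketch (same two substitutions, written as a free function and runnable in any Python):

    ```python
    import re

    def cleanup_html(data):
        # Strip HTML tags, then unescape non-breaking spaces,
        # mirroring the two substitutions in the method above.
        data = re.sub(r'<[^<].*?/?>', '', data)
        data = data.replace("&nbsp;", " ")
        return data

    print(cleanup_html("<p>Hello&nbsp;world</p>"))  # Hello world
    ```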

    Thank You,



    Mel
    Last edited by yhmmc; October 4th, 2013 at 10:05 AM. Reason: forgot something
  2. #2
  3. Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2012
    Location
    39N 104.28W
    Posts
    158
    Rep Power
    3
    The error message is pretty explicit. However, if you don't use code tags that preserve indentation, we can't see what you did or didn't do in that regard.
  4. #3
  5. Contributing User
    Devshed Demi-God (4500 - 4999 posts)

    Join Date
    Aug 2011
    Posts
    4,841
    Rep Power
    480
    Given rrashkin's observation, you certainly don't deserve an answer.

    Your __init__ method is indented 5 spaces, but your cleanup_html method is indented 4 spaces; they don't agree. Use emacs, because it will indent your code correctly for you, and besides, it comes with a symbolic calculator, stupendous undo, and rectangle editing. The return statement of cleanup_html is also indented incorrectly. Your messy program won't run.
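    The mismatch can be reproduced in isolation by compiling a throwaway source string whose methods are indented 5 and 4 spaces:

    ```python
    # Reproduce the reported error in isolation: the first method is
    # indented 5 spaces, the second only 4, so the dedent matches no
    # outer indentation level known to the parser.
    source = (
        "class MyClass:\n"
        "     def first(self):\n"
        "          pass\n"
        "    def second(self):\n"
        "         pass\n"
    )
    try:
        compile(source, "news.py", "exec")
    except IndentationError as err:
        print(err.msg)  # unindent does not match any outer indentation level
    ```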
    [code]Code tags[/code] are essential for python code and Makefiles!
  6. #4
  7. Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2013
    Posts
    29
    Rep Power
    0
    Thank you for your info.
  8. #5
  9. Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2013
    Posts
    29
    Rep Power
    0
    When I posted, I sincerely did not see a place to add code tags. I looked all over; sorry. I will look into emacs. I am looking now and I still don't see any.
  10. #6
  11. Contributing User
    Devshed Demi-God (4500 - 4999 posts)

    Join Date
    Aug 2011
    Posts
    4,841
    Rep Power
    480
    Follow the link in my signature about code tags. Please.
    [code]Code tags[/code] are essential for python code and Makefiles!
  12. #7
  13. Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2013
    Posts
    29
    Rep Power
    0
    Originally Posted by b49P23TIvg
    follow the link at my signature about code tags. Please.
    I found the # button. I just wanted to say that I downloaded many versions of emacs, but I could not get them working on my Linux system or my Windows system. I am looking right now for an editor that will indent automatically, and I am downloading one that says it does syntax highlighting too. Once I get past that hurdle, I may do better.

    Thanks for all of your help. It takes me a little while to get the hang of stuff because I am extremely disabled. And, yes, I believe that I deserve to know this information. I am trying my best to get a life instead of having Alzheimer's and dying a slow, painful death like my mother did. I am not Norman Bates because I mentioned Mother. I have a wife, five children, and 17 grandbabies. I have two great-grandbabies. Anyway, back to the subject at hand. Thanks for your help.
  14. #8
  15. Contributing User
    Devshed Demi-God (4500 - 4999 posts)

    Join Date
    Aug 2011
    Posts
    4,841
    Rep Power
    480
    Best wishes.

    Depending on which Linux you've got, you'd just install emacs with whichever installer is appropriate. For example:

    sudo apt-get install emacs # ubuntu

    yum install emacs # I forget the name of this distribution but I liked it.
    [code]Code tags[/code] are essential for python code and Makefiles!
  16. #9
  17. Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2013
    Posts
    29
    Rep Power
    0
    I've got it working now,

    Thank You.
