October 4th, 2013, 10:02 AM
-
Help with Python error
This is my error message:
[code]
  File "news.py", line 52
    def cleanup_html(self, data):
    ^
IndentationError: unindent does not match any outer indentation level
[/code]
I originally had just feedparser, but I added another part to the program to clean it up. I would have done away with feedparser, but I did not know how to open the file and read the posts without it. I just want it to open the URL, read the feed, say the description, and ask if I want more; if I do, give me story number two. I am ignorant of about 80% of Python. I am going further, slowly. P.S. I did not see a button to press to insert code, and it stripped all of my indentation, but the levels were at 5, 10, and 15 spaces.
My code is:
[code]
#!/usr/bin/python
# -*- coding: utf-8 -*-
import subprocess
import re
import urllib
import urllib2
import xml.dom.minidom
import feedparser

rss_url = "http://feeds.reuters.com/Reuters/domesticNews"
feed = feedparser.parse(rss_url)

posts = []
for i in range(0, len(feed['entries'])):
    posts.append({
        'title': feed['entries'][i].title,
        'description': feed['entries'][i].summary,
        'url': feed['entries'][i].link,
    })

class MyClass():
    def __init__(self):
        # GeneratedClass.__init__(self)  # commented out: GeneratedClass is not defined in this file
        self.lang_ref = { 'Chinese'    : 'zh',
                          'English'    : 'en',
                          'French'     : 'fr',
                          'German'     : 'de',
                          'Italian'    : 'it',
                          'Japanese'   : 'ja',
                          'Korean'     : 'ko',
                          'Portuguese' : 'pt',
                          'Spanish'    : 'es'
                          }
        self.error_msg = { "Chinese"    : "我发现了什么。",
                           "English"    : "I found nothing.",
                           "French"     : "Je n'ai rien trouvé.",
                           "German"     : "Ich fand nichts.",
                           "Italian"    : "Ho trovato nulla.",
                           "Japanese"   : "何を発見した。",
                           "Korean"     : "아무것도 없습니다.",
                           "Portuguese" : "Eu não achei nada.",
                           "Spanish"    : "No he encontrado nada."
                           }

    def fetch_rev(self, url):
        # This try block was floating loose in the class body with an undefined
        # 'url'; wrapping it in a method (with 'url' as a parameter and an
        # except clause) makes it parse and run.
        try:
            req = urllib2.Request(url)
            response = urllib2.urlopen(req)
            content = xml.dom.minidom.parseString(response.read())
            return str(content.getElementsByTagName('rev')[0].firstChild.data.encode('utf-8'))
        except urllib2.URLError:
            return None

    def cleanup_html(self, data):
        data = re.sub(r'<[^<].*?/?>', '', data)
        data = data.replace("&nbsp;", " ")  # literal was lost in the forum paste; HTML '&nbsp;' assumed
        return data

    def cleanup_wiki(self, data):
        data = re.sub(r'(?i)\{\{Date[^\[\]]*?\|([^}]*)\}\}', lambda m: m.group(1), data) # remove wiki 'Date' markers
        data = re.sub(r'(?i)\{\{IPA(\-[^\|\{\}]+)*?\|([^\|\{\}]+)(\|[^\{\}]+)*?\}\}', lambda m: m.group(2), data)
        data = re.sub(r'(?i)\{\{Lang(\-[^\|\{\}]+)*?\|([^\|\{\}]+)(\|[^\{\}]+)*?\}\}', lambda m: m.group(2), data)
        data = re.sub(r'\{\{[^\{\}]+\}\}', '', data)
        data = re.sub(r'(?m)\{\{[^\{\}]+\}\}', '', data)
        data = re.sub(r'(?m)\{\|[^\{\}]*?\|\}', '', data)
        data = re.sub(r'(?i)\[\[Category:[^\[\]]*?\]\]', '', data)
        data = re.sub(r'(?i)\[\[Image:[^\[\]]*?\]\]', '', data)
        data = re.sub(r'(?i)\[\[File:[^\[\]]*?\]\]', '', data)
        data = re.sub(r'(?i)\[\[Fichier:[^\[\]]*?\]\]', '', data)
        data = re.sub(r'(?u)\[\[[^\[\]]*?:[^\[\]]*?\]\]', '', data)
        data = re.sub(r'\[\[[^\[\]]*?\|([^\[\]]*?)\]\]', lambda m: m.group(1), data)
        data = re.sub(r'\[\[([^\[\]]+?)\]\]', lambda m: m.group(1), data)
        data = re.sub(r'\[\[([^\[\]]+?)\]\]', '', data)
        data = re.sub(r'(?i)File:[^\[\]]*?', '', data)
        data = re.sub(r'\[[^\[\]]*? ([^\[\]]*?)\]', lambda m: m.group(1), data)
        data = re.sub(r"''+", '', data)
        data = re.sub(r'(?m)^\*$', '', data)
        data = re.sub(r'(?i)\{\{InfoBox[^\[\]]*?\}\}', '', data) # remove 'infobox' text block
        data = re.sub(r'(?i)<ref[^\[\]]*?</ref>', '', data) # remove 'ref' text blocks
        return data

    def cleanup_text(self, data):
        # remove unnecessary parts
        data = re.sub(r'\([^\[\]]*?\)', '', data) # remove parentheses text blocks
        data = re.sub(r'(?u)【[^[\]]*?】', '', data) # remove bracket text blocks (japanese)
        data = re.sub(r'(?u)（[^（）]*?）', '', data) # remove fullwidth parentheses blocks (japanese; literals assumed, garbled in the paste)
        # replace chars
        data = data.replace("|", " ")
        data = data.replace(";", ".")
        data = data.replace(" - ", ", ")
        # remove chars
        data = data.replace('"', '') # remove quotes
        data = data.replace('«', '') # remove quotes
        data = data.replace('»', '') # remove quotes
        data = data.replace("#", "") # remove hashes
        data = data.replace("/", "") # remove slashes
        data = data.replace("*", "") # remove stars
        # remove extra spaces
        data = re.sub(r'\s+', ' ', data)
        # clean-up punctuation - part 1
        data = data.replace(" ,", ",")
        data = data.replace(" .", ".")
        data = data.replace(" !", "!")
        data = data.replace(" ?", "?")
        data = data.replace(" :", ":")
        # clean-up punctuation - part 2
        data = data.replace(",.", ".")
        data = data.replace("...", ".")
        data = data.replace("..", ".")
        data = data.replace("!.", "!")
        data = data.replace("?.", "?")
        data = data.replace(":.", ":")
        return data

    def safe_chars(self, data, speech_lang):
        # modify or remove other TTS non-supported chars
        data = data.replace("Œ", "Oe")
        data = data.replace("œ", "oe")
        data = data.replace("’", "'") # non-readable single-quote
        # common filter for non-latin languages
        if speech_lang in [ 'Chinese', 'English', 'Korean', 'Japanese' ]:
            data = data.replace("Á", "A")
            data = data.replace("À", "A")
            data = data.replace("Â", "A")
            data = data.replace("Ä", "A")
            data = data.replace("Ã", "A")
            data = data.replace("á", "a")
            data = data.replace("à", "a")
            data = data.replace("â", "a")
            data = data.replace("ä", "a")
            data = data.replace("ã", "a")
            data = data.replace("É", "E")
            data = data.replace("È", "E")
            data = data.replace("Ê", "E")
            data = data.replace("Ë", "E")
            data = data.replace("é", "e")
            data = data.replace("è", "e")
            data = data.replace("ê", "e")
            data = data.replace("ë", "e")
            data = data.replace("Í", "I")
            data = data.replace("Ì", "I")
            data = data.replace("Î", "I")
            data = data.replace("Ï", "I")
            data = data.replace("í", "i")
            data = data.replace("ì", "i")
            data = data.replace("î", "i")
            data = data.replace("ï", "i")
            data = data.replace("Ó", "O")
            data = data.replace("Ò", "O")
            data = data.replace("Ô", "O")
            data = data.replace("Õ", "O")
            data = data.replace("Ö", "O")
            data = data.replace("ó", "o")
            data = data.replace("ò", "o")
            data = data.replace("ô", "o")
            data = data.replace("ö", "o")
            data = data.replace("õ", "o")
            data = data.replace("Ú", "U")
            data = data.replace("Ù", "U")
            data = data.replace("Û", "U")
            data = data.replace("Ü", "U")
            data = data.replace("ú", "u")
            data = data.replace("ù", "u")
            data = data.replace("û", "u")
            data = data.replace("ü", "u")
            data = data.replace("Ÿ", "Y")
            data = data.replace("ÿ", "y")
            data = data.replace("Æ", "Ae")
            data = data.replace("æ", "ae")
            data = data.replace("Œ", "Oe")
            data = data.replace("œ", "oe")
            data = data.replace("Ç", "S")
            data = data.replace("ç", "s")
            data = data.replace("Ñ", "N")
            data = data.replace("ñ", "n")
            data = data.replace("ß", "b")
            data = data.replace("¿", "")
        # language specific filters
        if ( speech_lang == 'English' ):
            # remove everything except standard ascii
            data = unicode(data.decode('utf-8'))
            data = re.sub(ur'[\u007b-\uffff]', '', data)
            data = data.encode('utf-8')
        elif speech_lang in [ 'French', 'German', 'Italian', 'Portuguese', 'Spanish' ]:
            # remove everything except ascii and latin chars
            data = unicode(data.decode('utf-8'))
            data = re.sub(ur'[\u007b-\u00bf]', '', data)
            data = re.sub(ur'[\u0100-\uffff]', '', data)
            data = data.encode('utf-8')
        elif ( speech_lang == 'Chinese' ):
            # remove everything except ascii and chinese chars
            data = unicode(data.decode('utf-8'))
            data = re.sub(ur'[\u007b-\u2e7f]', '', data)
            data = re.sub(ur'[\u3040-\u30ff]', '', data) # japanese
            data = re.sub(ur'[\u3130-\u318f]', '', data) # korean hangul
            data = re.sub(ur'[\ua000-\uf8ff]', '', data)
            data = re.sub(ur'[\ufb00-\uffff]', '', data)
            data = data.encode('utf-8')
        elif ( speech_lang == 'Japanese' ):
            # remove everything except ascii and japanese chars
            data = unicode(data.decode('utf-8'))
            data = re.sub(ur'[\u007b-\u2fff]', '', data) # was \u2ffff (five hex digits) in the paste
            data = re.sub(ur'[\u3100-\u33ff]', '', data) # korean hangul
            data = re.sub(ur'[\u4dc0-\u4dff]', '', data)
            data = re.sub(ur'[\u9fb0-\ufeff]', '', data)
            data = re.sub(ur'[\ufff0-\uffff]', '', data)
            data = data.encode('utf-8')
        elif ( speech_lang == 'Korean' ):
            # remove everything except ascii and hangul chars
            data = unicode(data.decode('utf-8'))
            data = re.sub(ur'[\u007b-\u30ff]', '', data)
            data = re.sub(ur'[\u3200-\uabff]', '', data)
            data = re.sub(ur'[\ud7a4-\uffff]', '', data)
            data = data.encode('utf-8')
        # english-specific filter
        if (speech_lang == "English"):
            data = data.replace("&", " and ")
        # french-specific filter
        elif (speech_lang == "French"):
            data = data.replace("&", " et ")
            data = data.replace("ß", "b")
        # german-specific filter
        elif (speech_lang == "German"):
            data = data.replace("&", " und ")
        # italian-specific filter
        elif (speech_lang == "Italian"):
            data = data.replace("&", " e ")
            data = data.replace("ß", "b")
        # korean-specific filter (this branch header was missing in the paste;
        # the two replacements below sat inside the Italian branch)
        elif (speech_lang == "Korean"):
            data = data.replace("'", "")
            data = data.replace("…", " ")
        # portuguese-specific filter
        elif (speech_lang == "Portuguese"):
            data = data.replace("&", " e ")
            data = data.replace("ß", "b")
        # spanish-specific filter
        elif (speech_lang == "Spanish"):
            data = data.replace("&", " y ")
            data = data.replace("ß", "b")
        return data

print feed.entries[1].title
# add a file to your directory and relist its path here
fo = open("/home/qbobot/Documents/qbo_rss.txt", "wb") # closing parenthesis was missing
fo.write(feed.entries[1].title)
fo.write(" ")
fo.write(feed.entries[1].description)
fo.close()
subprocess.call('festival --tts /home/qbobot/Documents/qbo_rss.txt', shell=True) # read aloud the file written above
[/code]
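For what it's worth, the "open the feed, read a story, ask if I want more" loop described above can be sketched without feedparser, using only xml.dom.minidom (already imported in the program). This is a minimal illustration, not the original code: the hard-coded SAMPLE_RSS string stands in for the live Reuters URL so it runs offline, and the names read_stories and run are made up for the example. The want_more callable stands in for prompting the user at the keyboard.

```python
import xml.dom.minidom

# A tiny stand-in for the live feed (illustrative data, not real stories).
SAMPLE_RSS = """<rss><channel>
<item><title>Story one</title><description>First description.</description></item>
<item><title>Story two</title><description>Second description.</description></item>
</channel></rss>"""

def read_stories(rss_text):
    # Parse the feed text and yield (title, description) pairs in order.
    dom = xml.dom.minidom.parseString(rss_text)
    for item in dom.getElementsByTagName('item'):
        title = item.getElementsByTagName('title')[0].firstChild.data
        desc = item.getElementsByTagName('description')[0].firstChild.data
        yield title, desc

def run(rss_text, want_more):
    # Tell one story at a time; want_more() plays the role of asking
    # the user "do you want another?" and returning True or False.
    told = []
    for title, desc in read_stories(rss_text):
        told.append("%s: %s" % (title, desc))
        if not want_more():
            break
    return told
```

In the real program, want_more would wrap a raw_input prompt (Python 2) and the story text would be handed to festival instead of collected in a list.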
Thank You,
Mel
Last edited by yhmmc; October 4th, 2013 at 10:05 AM. Reason: forgot something
October 4th, 2013, 12:21 PM
-
The error message is pretty explicit. However, if you don't use code tags, which preserve indentation, we can't see what you do or don't do in that regard.
October 4th, 2013, 01:30 PM
-
Given rrashkin's observation, you certainly don't deserve an answer.
Your __init__ method is indented 5 spaces; your cleanup_html method, 4 spaces. They don't agree. Use Emacs, because it will indent your code correctly for you; besides, it comes with a symbolic calculator, stupendous undo, and rectangle editing. The return statement of cleanup_html is indented incorrectly as well. Your messy program won't run.
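To illustrate what "agree" means here (a hedged sketch; the class and method names are just examples, not the poster's program): every def in a class body must start at the same indentation depth, and each method body one level deeper. Mixing 5 spaces for one method and 4 for the next is exactly what raises "unindent does not match any outer indentation level".

```python
import re

class Example:
    def __init__(self):                # indented 4 spaces
        self.data = "<p>hello</p>"     # body one level deeper, at 8

    def cleanup_html(self, data):      # also 4 spaces -- matches __init__
        data = re.sub(r'<[^<].*?/?>', '', data)  # strip HTML tags
        return data                    # return aligned with the method body
```

Had cleanup_html started at 5 spaces instead, Python would reject the file before running a single line.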
[code]Code tags[/code] are essential for python code and Makefiles!
October 4th, 2013, 01:58 PM
-
Thank you for your info.
October 4th, 2013, 04:11 PM
-
When I posted, I sincerely did not see a place to add code tags; I looked all over. Sorry. I will look into Emacs. I am looking right now and I still don't see one.
October 4th, 2013, 04:58 PM
-
Follow the link in my signature about code tags, please.
October 5th, 2013, 10:21 AM
-
Originally Posted by b49P23TIvg
follow the link at my signature about code tags. Please.
I found the # button. I just wanted to say that I downloaded many versions of Emacs, but I could not get them working on my Linux system or my Windows system. I am looking right now for an editor that will indent automatically, and I am downloading one that says it does syntax highlighting too. Once I get past that hurdle, I may do better.
Thanks for all of your help. It takes me a little while to get the hang of stuff because I am extremely disabled. And yes, I believe that I deserve to know this information. I am trying my best to get a life instead of having Alzheimer's and dying a slow, painful death like my mother did. I am not Norman Bates because I mentioned Mother. I have a wife, five children, and 17 grandbabies. I have two great-grandbabies. Anyway, back to the subject at hand. Thanks for your help.
October 5th, 2013, 11:39 AM
-
best wishes.
Depending on which Linux you've got, you'd just install Emacs with whichever package installer is proper. For example:
[code]
sudo apt-get install emacs   # Ubuntu / Debian
sudo yum install emacs       # Fedora / Red Hat -- I forget which of these I used, but I liked it.
[/code]
October 5th, 2013, 02:14 PM
-
I've got it working now.
Thank you.