March 31st, 2005, 02:31 PM
How can I count the number of words in a text file?
Also, how can I find the length of each sentence in the same text file? Assume that each sentence ends with a character such as ".", "?", or "!", and that the length of a sentence is the number of words in it. I also need to print each sentence separately.
Can you help me, guys?
March 31st, 2005, 03:42 PM
Welcome to the forum. Please read the sticky posts and familiarize yourself with the forum rules. One of these rules states that we don't normally help with homework unless you show some effort in solving the problem yourself.
March 31st, 2005, 06:07 PM
Hehe, good job catching that, Scorpions4Ever. dreamer300, I hope you at least know some Python. Read the file through Python, record each line, use len() to find out the length, and use find() to find specific characters.
March 31st, 2005, 06:42 PM
Tools -> Word Count
user@host$ wc -w document.txt
- Open a text file
- Read the contents
- Break the contents into words
- there's room for more discussion here; words aren't always split by spaces, you see.
- count the words.
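A minimal sketch of the steps above, written for a current Python 3. The function names and the sample sentence are made up for illustration, and the regular expression is only one possible definition of a "word" (here: a run of letters, digits or apostrophes).

```python
import re

def count_words_simple(text):
    # str.split() with no argument splits on any run of whitespace
    return len(text.split())

def count_words_regex(text):
    # Assumption: a word is a run of letters, digits or apostrophes,
    # so punctuation and hyphens act as separators
    return len(re.findall(r"[A-Za-z0-9']+", text))

def count_words_in_file(path):
    # Open the file, read the contents, then count the words
    with open(path, "r") as source:
        return count_words_simple(source.read())

sample = "Well-known words aren't always split by spaces, you see."
print(count_words_simple(sample))  # "Well-known" counts as one word
print(count_words_regex(sample))   # "Well" and "known" count separately
```

The two counts differ on the sample because of the hyphen, which is the kind of decision you have to make before you can call any number "the" word count.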
Python is rather good at splitting and searching text.
I suggest you start with an empty text file, and put comments in for each step, then start finding Python code to do each step a bit at a time, such as:
# open a text file
source = open("filename.txt", 'r')
# read the contents and store them somewhere
# data = something-or-other
I can assume, but Python can't. How do you treat brackets and quotes?
I prefer the form:
(This is a sentence).
"This is a quote".
Where the brackets/speechmarks are part of the sentence, and the dot denotes the end of the sentence.
However, it seems that other people prefer:
(This is a sentence.)
"This is a sentence."
Where the sentence being quoted is finished by the dot, and the quotes mean the whole sentence is being quoted, rather than that the quoted phrase is part of the sentence in the text.
Of course, rarely you might find
"This is a sentence.".
Where someone pedantic wishes to indicate that the sentence being quoted is ending, and that the sentence consisting of the quote is ending. But if you just looked for . to mean the end of a sentence, then you would get ". as a sentence on its own...
... an ellipsis would also mess around with that idea.
As would people who ask questions with exclamations:
"She did WHAT?!"
or "But HOW?????"
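To see the problem concretely, here is a small sketch of the naive rule "cut after every single end-of-sentence character". The helper name naive_split and the sample strings are only for illustration; this shows the pitfall, not a recommended approach.

```python
def naive_split(text, terminators=".?!"):
    # Cut the text after every single terminator character
    pieces = []
    current = ""
    for ch in text:
        current += ch
        if ch in terminators:
            pieces.append(current.strip())
            current = ""
    # Keep any trailing text that had no terminator
    if current.strip():
        pieces.append(current.strip())
    return pieces

for sample in ['"This is a sentence.".', "She did WHAT?!", "Wait... what?"]:
    print(naive_split(sample))
```

Each sample comes back with stray fragments: the closing quote and dot on their own, a lone "!", and the extra dots of the ellipsis.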
- Define what your "end of sentence" characters are.
- Group all of the text in the file into one lump
- Split it up wherever there is an end-of-sentence character.
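One way those three steps could look in Python 3, as a rough sketch. END_OF_SENTENCE and split_sentences are names invented here, and treating a run of terminators such as "?!" or "?????" as a single break is an assumption, not something the steps require.

```python
import re

# Step 1: define the end-of-sentence characters (an assumption; adjust to taste)
END_OF_SENTENCE = ".?!"

def split_sentences(lines):
    # Step 2: group all of the text into one lump
    text = " ".join(line.strip() for line in lines)
    # Step 3: split wherever one or more end-of-sentence characters occur
    pattern = "[" + re.escape(END_OF_SENTENCE) + "]+"
    parts = re.split(pattern, text)
    # Drop empty pieces and surrounding whitespace
    return [part.strip() for part in parts if part.strip()]

print(split_sentences(["She did WHAT?! But", "HOW????? Nobody knows."]))
```

Note that re.split() throws the terminators away, so the sentences come back without their final punctuation. Passing an open file object as lines works too, since iterating over a file yields its lines.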
Yes, I'm beating around the bush...
>>> for item in ["list", "of", "items"]:
...     print(item)
March 31st, 2005, 06:50 PM
Very detailed tutorial sfb. I congratulate you.
April 1st, 2005, 02:10 PM
Thank you †Yegg†, sfb and Scorpions4ever.
OK guys, I know some aspects of Python. For example, I wrote this:
f = open ('me.txt' , 'r')
a = f.readline()
b = " "
counter = 0
while (a != " ") :
    for b in (a):
        counter = counter + 1
I want to tell you that I tried.
I know about for and while loops, the if statement, and how to open a file. However, the problem was how to combine this information to get the right answer.
April 1st, 2005, 02:54 PM
I used this:
a = f.read()
print len( a.split() )
It printed the number of words in my text file.
But how can I split the text file into sentences as well?
Could you please help me?
April 1st, 2005, 04:02 PM
Try something like this:
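article = open ( 'article.txt' ).read()
# treat '!' and '?' as sentence ends too
article = article.replace ( '!', '.' ).replace ( '?', '.' )
# the piece after the final '.' is not a sentence, hence the - 1
sentences = len ( article.split ( '.' ) ) - 1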
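Building on the replace() trick above, a sketch (Python 3) of how the same idea might give the length of each sentence in words and print each sentence separately. The name sentence_lengths and the sample text are made up for illustration, and the sentences lose their final punctuation along the way.

```python
def sentence_lengths(article):
    # Turn every terminator into a period, then split on periods
    article = article.replace("!", ".").replace("?", ".")
    results = []
    for sentence in article.split("."):
        words = sentence.split()
        if words:  # skip the empty pieces left between or after terminators
            results.append((" ".join(words), len(words)))
    return results

sample = "How are you? I am fine. Great!"
for sentence, length in sentence_lengths(sample):
    print(length, "words:", sentence)
```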
April 2nd, 2005, 05:16 PM
Thank you Peyton
April 2nd, 2005, 07:13 PM
You could probably put some more effort into finding sentences if you were so inclined. The simple rules you have at the moment will find most sentences, but you don't cater for a few things.
1) Abbreviations, such as etc., use a period and don't necessarily end a sentence (and when they do, still only one period is used). Also, it is not uncommon for people to write things like 'etc...', which could produce a few more sentences than there really are.
2) A lot of the time, the words on either side of characters such as ':', ';', and '-' should be considered sentences as well.
3) Since you're not actually processing the content of the sentences, it doesn't matter that much, but you're also missing sentences that end with quote marks, like 'and he said "hello."', which most commonly end with '."'. You would be cutting off the quote, and this could cause you some problems if you were ever to go on and process the content.
Just a few things I thought I should point out, in case you wanted to go above and beyond the call of duty. Finding the sentences in some text is by no means a clear-cut and simple problem, as you'll always find people who don't like to obey the rules, or just plain don't know them.
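For anyone who wants to try points 1) and 3), here is a rough Python 3 sketch. The abbreviation list, the BOUNDARY pattern and the function name are all assumptions made for illustration. It also has a known blind spot: an abbreviation that really does end a sentence in the middle of the text gets glued to the next sentence.

```python
import re

# Assumption: a tiny hand-picked list; a real one would be much longer
ABBREVIATIONS = {"etc.", "e.g.", "i.e.", "mr.", "mrs.", "dr."}

# A run of terminators, then any closing quotes or brackets,
# followed by whitespace or the end of the text
BOUNDARY = re.compile(r"[.?!]+[\"')\]]*(?=\s|$)")

def split_sentences(text):
    sentences = []
    start = 0
    for match in BOUNDARY.finditer(text):
        chunk = text[start:match.end()].strip()
        last_word = chunk.split()[-1].lower()
        # An abbreviation in the middle of the text does not end the sentence
        if last_word in ABBREVIATIONS and match.end() < len(text):
            continue
        sentences.append(chunk)
        start = match.end()
    # Keep any trailing text that had no terminator
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences

text = 'We bought apples, pears, etc. and went home. And he said "hello." Really?!'
for sentence in split_sentences(text):
    print(sentence)
```

Because a run such as '...' or '?!' is matched as a single boundary, 'etc...' no longer produces extra empty sentences, and the closing quote stays attached to the sentence it ends.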