#1
Sentence count
Hi,
The following code counts the number of times a specific word appears in a file by separating each word onto a new line and then using grep to count how many lines contain that word. Is there a way to change this so that it counts the number of sentences?
Code:
tr ' \t' '\n\n' < text | grep -c "$word"
Thanks.
#2
First off, define what ends a sentence. That could be, but is not limited to, any of . ! ? " and then you just look for them (and count!).
#3
Thanks for the reply. I want it to count the characters '.', '!' and '?'. I tried this, but I think my syntax is wrong. Sorry, I'm a newbie.
Code:
tr -dc '.' < text | wc -c
Thanks again.
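For what it's worth, the tr attempt above does run; with only '.' in the set it counts periods and nothing else. A rough sketch that puts all three characters in the set, wrapped in a function whose name (count_terminators) is made up for illustration. It assumes every . ! or ? ends a sentence, which is not always true:

```shell
# count_terminators FILE
# Hypothetical helper: delete every character that is NOT one of . ! ?
# (-d delete, -c complement the set), then count the characters left over.
count_terminators() {
    tr -dc '.!?' < "$1" | wc -c
}
```

Called as count_terminators text, it prints how many . ! and ? characters the file contains.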
#4
You could try using the fold command to separate each character onto its own line. Then you could count the characters that end a sentence (.!?) using grep and wc together.
Code:
fold -w 1 < text | grep '[.!?]' | wc -l
This won't, however, take into account any periods, exclamation marks or question marks placed mid-sentence, such as in an abbreviation (Mr., Ph.D., etc.).
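One way to soften the abbreviation problem is to strip a few known abbreviations before counting. This is only a sketch: the function name and the abbreviation list are made up for illustration, and the list would need extending for real text.

```shell
# count_sentences_skip_abbrev FILE
# Hypothetical helper: remove a short, illustrative list of abbreviations,
# then count the . ! ? characters that remain.
count_sentences_skip_abbrev() {
    sed -e 's/Mr\.//g' \
        -e 's/Mrs\.//g' \
        -e 's/Dr\.//g' \
        -e 's/Ph\.D\.//g' "$1" |
        tr -dc '.!?' | wc -c
}
```

An abbreviation that also ends a sentence ("He has a Ph.D.") gets removed along with the rest, so this can undercount; it trades one kind of error for another.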
#5
Not at all simple, is it? A lot would depend on just how critical the accuracy is. If it is important that the value returned/printed is as accurate as possible, you are going to need to parse the input carefully, keeping track of context: are we in a quote and thus may not need to count a '.', or was the previous character a '.' too, maybe implying we are in mid-ellipsis (the ... 'character')?
Knowing *nix systems, there is probably an existing base utility that will, with the flick of a switch, do exactly what you want!
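As a small step toward the context tracking described above, consecutive terminators can be counted as one, so that an ellipsis or a "?!" is a single sentence end. A hedged sketch using awk's gsub, which returns the number of replacements it made; the function name is hypothetical and quotes are not handled at all:

```shell
# count_terminator_runs FILE
# Hypothetical helper: count runs of . ! ? rather than single characters,
# so "..." and "?!" each count once.
count_terminator_runs() {
    awk '{ n += gsub(/[.!?]+/, "") } END { print n + 0 }' "$1"
}
```

This still miscounts abbreviations and an ellipsis in the middle of a sentence; getting those right needs the more careful parsing mentioned in the post.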
Viewing: Dev Shed Forums > Operating Systems > UNIX Help > Sentence count