#1
AWK script problem
Guys, I'm using a pretty common script to count the frequency of the words in a text file. I want to do something similar, but with the results of an online tagger: the tagger tags the nouns by placing an NN next to each one. So basically I want to count all occurrences of words that have NN following them.
My code at present is:
{ gsub(/[.,:;!?(){}]/, ""); for (i = 1; i <= NF; i++) freq[$i]++ }
END { for (word in freq) printf "%s\t%d\n", word, freq[word] | "sort" }
Can someone give me some advice as to what I need to amend?
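One way the word-frequency script might be adapted is to keep only the tokens that end in a noun tag and strip the tag before counting. The following is a minimal sketch, not a tested solution for the real tagger output: the function name `noun_freq` is hypothetical, and it assumes tags are attached to words as `word_TAG` and that tokens are whitespace-separated.

```shell
# noun_freq: hypothetical helper; reads tagged text on stdin and prints
# each noun with its frequency, sorted by word.
noun_freq() {
  awk '
    {
      for (i = 1; i <= NF; i++) {
        # Keep only tokens that end in exactly _NN or _NNP
        # (at least one character must precede the tag).
        if ($i ~ /._(NN|NNP)$/) {
          word = $i
          sub(/_[A-Z]+$/, "", word)   # strip the tag, keep the word
          freq[word]++
        }
      }
    }
    END {
      for (word in freq)
        printf "%s\t%d\n", word, freq[word] | "sort"
      close("sort")
    }
  '
}

# Example call on a fragment of tagged text.
printf '%s\n' '([ Mr._NNP Gray_NNP ]) <: said_VBD :> ([ it_PRP ])' | noun_freq
```

Because the match is anchored with `$`, other tags that merely begin with `_NN` are not counted here; whether those should count as nouns is a decision for the poster.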
#2
What's your sample input and desired output?
#3
An example of the text after it has been tagged is:
([ Mr._NNP Gray_NNP ]) <: said_VBD :> ([ it_PRP ]) <: would_MD begin_VB promptly_RB :> at_IN three_CD ._. "_``
The word "Mr." has a _NNP tag after it, so that is a noun.
I need to count all instances of words that have the _NNP or _NN tag after them. My output would just be the total of _NNP and _NN instances. Thanks
#4
I assume for the above sample the result should be 2. Run it with:
nawk -f sam.awk myFile
Here's sam.awk:
Code:
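BEGIN {
  # One non-space character followed by a noun tag.
  # Note: _NN also matches the start of longer tags such as _NNS.
  RE="[^ ](_NNP|_NN)"
}
{
  # gsub returns the number of substitutions made on this line
  tot+=gsub(RE, "")
}
END {
  printf("Total [_NNP|_NN]: %d\n", tot)
}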
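A possible variant of the script above checks each whitespace-separated field instead of running gsub over the whole line. This is only a sketch under the same assumptions as the thread (tags attached as `word_TAG`); the function name `count_nouns` is hypothetical. It leaves the input line unmodified and, because the pattern is anchored, counts only tokens that end in exactly `_NN` or `_NNP`.

```shell
# count_nouns: hypothetical helper; reads the named files, or stdin if
# none are given, and prints the total number of _NN / _NNP tokens.
count_nouns() {
  awk '
    {
      for (i = 1; i <= NF; i++)
        if ($i ~ /._(NN|NNP)$/)   # tag must end the token
          tot++
    }
    END { printf("Total _NNP/_NN: %d\n", tot) }
  ' "$@"
}

# The sample line from the thread, without the explanatory note:
printf '%s\n' '([ Mr._NNP Gray_NNP ]) <: said_VBD :> ([ it_PRP ]) <: would_MD begin_VB promptly_RB :> at_IN three_CD ._.' | count_nouns
# prints: Total _NNP/_NN: 2
```

If plural tags should also count as nouns, the pattern would need to be widened accordingly.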
| Viewing: Dev Shed Forums > Operating Systems > UNIX Help > AWK script problem |