1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2013

    Need help with logic and hints for this problem

    I've got a text file containing the top 1000 words used in the English language. Each line has a list of up to 50 words, like this:


    I need to write code that takes that file as input and produces an output file listing the pairs of words which appear together in at least fifty different lists. For example, in the above example, THE & IS appear together twice, but every other pair appears only once.

    I can't store all possible pairs of words, so no brute force.

    I'm trying to learn the language and I'm stuck on this exercise from the book. Please help. Any logic, guidance or code for this would help me.

    This is what I have so far. It doesn't do what's intended, but I'm stuck:

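    //open the file
    $handle = fopen("list.txt", 'r');
    $count = 0; // number of lines read so far
    $is = 0;    // number of times "is" has been seen
    // fgets() returns false at end of file, which avoids a bogus last iteration
    while (($line = fgets($handle)) !== false) {
    	$count++;
    	$words = explode(',', $line);
    	echo $count . "<br /><br />";
    	foreach ($words as $word) {
    		// trim() drops spaces and the trailing newline; compare ignoring case
    		if (strcasecmp(trim($word), "is") === 0) {
    			$is++;
    		}
    	}
    	echo "<br /><br />";
    }
    echo "Is count: $is";
    //close the file
    fclose($handle);
    //write the result
    $fp = fopen('output.txt', 'w');
    fwrite($fp, "is count: " . $is);
    fclose($fp);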

    This is what I came up with but I think it's too bloated:

    check the first value of the $words array
    store the value into $cur_word
    store $cur_word as a key in an array ($compare) and
    store the counter (line number) as the value
    of that key
    it'll be 1 at this point
    see if $cur_word is on each line and if it is then
    put the value into $compare with the key as
    if array has at least 50 values then continue
    else go to the next value of the $words array
    if it has 50 values then
    go to the next value and do the same thing
    compare both lists to see how many values
    if it's at least 50 then append
    the words to the output file

    repeat this process with every word
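
    The steps above can be sketched roughly as follows. This is only a minimal sketch under some assumptions: words on a line are comma-separated (as in the explode() call earlier), case is ignored, and the function names buildIndex and frequentPairs are made up for illustration. It records the line numbers for each word, drops every word that appears on fewer than the required number of lines, and then compares the remaining line-number lists pairwise.

```php
<?php
// Hypothetical helper: maps each word to the set of line numbers it appears on.
function buildIndex(array $lines): array {
    $index = [];
    foreach ($lines as $lineNo => $line) {
        foreach (explode(',', $line) as $word) {
            $word = strtoupper(trim($word)); // assumption: case does not matter
            if ($word !== '') {
                $index[$word][$lineNo] = true; // keyed by line, so no duplicates
            }
        }
    }
    return $index;
}

// Hypothetical helper: returns "A,B" => shared line count for qualifying pairs.
function frequentPairs(array $lines, int $min): array {
    // A word on fewer than $min lines can never be part of a qualifying pair.
    $index = array_filter(buildIndex($lines), function ($set) use ($min) {
        return count($set) >= $min;
    });
    $words = array_keys($index);
    sort($words, SORT_STRING);
    $pairs = [];
    for ($i = 0; $i < count($words); $i++) {
        for ($j = $i + 1; $j < count($words); $j++) {
            // Lines that contain both words.
            $shared = count(array_intersect_key($index[$words[$i]], $index[$words[$j]]));
            if ($shared >= $min) {
                $pairs[$words[$i] . ',' . $words[$j]] = $shared;
            }
        }
    }
    return $pairs;
}

// Made-up input with a threshold of 2 instead of 50.
$demo = ["THE,IS,A", "THE,IS,OF", "A,OF"];
foreach (frequentPairs($demo, 2) as $pair => $n) {
    echo $pair . " " . $n . "\n"; // only IS,THE reaches the threshold here
}
```

    For the real file, the lines could be read with file("list.txt", FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) and the threshold set to 50. Only one list of line numbers per word is kept, never a table of all possible pairs.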
  2. #2
  3. Impoverished Moderator
    Devshed Supreme Being (6500+ posts)

    Join Date
    Mar 2007
    Washington, USA
    My solution brushes up against the "can't store all possible pairs of words" restriction but doesn't actually hit it, so I think it's okay. It's a lot like yours too. The only problem is that it's fairly complicated (four nested loops), more so than I would expect for something like this.
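
    The code for that solution is not included in this post. As one possible illustration of an approach that stores some pair counts without storing every possible pair, here is a hedged sketch. It is not the code referred to above, and the function name countPairsTwoPass is invented. It assumes comma-separated words and ignores case. The first pass counts how many lines each word appears on, and the second pass counts pairs only among words that already meet the threshold by themselves.

```php
<?php
// Hypothetical function for illustration; not the solution mentioned above.
function countPairsTwoPass(array $lines, int $min): array {
    // Split each line once into a deduplicated, sorted list of words.
    $rows = [];
    foreach ($lines as $line) {
        $words = array_filter(array_map('trim', explode(',', strtoupper($line))), 'strlen');
        $words = array_values(array_unique($words));
        sort($words, SORT_STRING); // fixed order gives one key per pair
        $rows[] = $words;
    }
    // Pass 1: number of lines each word appears on.
    $lineCount = [];
    foreach ($rows as $words) {
        foreach ($words as $w) {
            $lineCount[$w] = ($lineCount[$w] ?? 0) + 1;
        }
    }
    // Pass 2: count pairs, skipping words that cannot reach the threshold.
    $pairCount = [];
    foreach ($rows as $words) {
        $keep = array_values(array_filter($words, function ($w) use ($lineCount, $min) {
            return $lineCount[$w] >= $min;
        }));
        for ($i = 0; $i < count($keep); $i++) {
            for ($j = $i + 1; $j < count($keep); $j++) {
                $key = $keep[$i] . ',' . $keep[$j];
                $pairCount[$key] = ($pairCount[$key] ?? 0) + 1;
            }
        }
    }
    // Keep only the pairs seen together on at least $min lines.
    return array_filter($pairCount, function ($n) use ($min) {
        return $n >= $min;
    });
}
```

    The pair table only ever holds pairs that actually occur together among the frequent words, which is why it stays below the "all possible pairs" limit in typical input, although in the worst case it can still grow large.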
