#1

    Extracting data from a txt file in php


    I have a log file (a plain .txt file) in which each line records:
    IP ADDRESS, TIME STAMP, FILENAME, HTTP STATUS CODE, BANDWIDTH, USER AGENT

    1.)103.239.234.105 -- [2007-04-01 00:42:21] "GET articles/learn_PHP_basics HTTP/1.0" 200 12729 "Mozilla/4.0"
    2.)207.3.35.52 -- [2007-04-01 01:24:42] "GET index.php HTTP/1.0" 200 11411 "Mozilla/4.0"

    I need to find out:
    1. The total number of file requests in the month.
    2. The number of file requests from the articles directory.
    3. The TOTAL bandwidth consumed by the file requests over the month.
    4. The number of requests that resulted in 404 status errors. Display a list of the filenames that produced these 404 errors (try not to repeat filenames if the same wrong filename was requested more than once).

    I've managed to get the total number of requests:

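    <?php
    $file = "april.txt";
    $linecount = 0;
    $handle = fopen($file, "r");
    if ($handle === false) {
        die("Could not open $file");
    }
    // Loop on fgets() itself: testing feof() first counts one extra "line" at the end of the file.
    while (($line = fgets($handle)) !== false) {
        if (trim($line) !== "") { // ignore blank lines
            $linecount++;
        }
    }

    fclose($handle);

    echo $linecount;

    ?>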
    To get the rest of the data, I know I need to explode each line into an array and loop through it to count the information needed, but I'm struggling to explode it so that the fields are split up properly.
    Any ideas?
#2
    rikkidas, your question is really a RegEx question. I'm not an expert in RegEx but I can tell you how to word your question better so that it's easier to understand and so that RegEx experts can help.

    First, indicate "RegEx question" somewhere in the title. I believe there might be a RegEx-specific forum here, so you can try that too.

    Second, rewrite the question so that you exclude what you've already been able to do. What you're really doing is running through each line of a file (you know how to do that), parsing out the information you need using RegEx (this is the part you need help with), and then performing calculations on that information, like adding the results together (I bet you already know how to do this). So just ask about the parsing. You're more likely to get a response to a question we can understand and answer quickly than to one that needs a long-winded reply covering many different parts.

    So just ask about the one part that you need: the regex.

    Something like:

    "I have the following line:
    103.239.234.105 -- [2007-04-01 00:42:21] "GET articles/learn_PHP_basics HTTP/1.0" 200 12729 "Mozilla/4.0"

    and I need to parse out the root directory (after the GET) and the two numbers after the request: the HTTP status code (which would be 404 if an error happened) and the bandwidth."


    Once you get this information, you should be able to easily make your calculations. And if not, you can ask in a separate question.
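
For illustration, here is a minimal sketch of the kind of regex that question is asking for. It assumes every line looks exactly like the two samples in the original post (and that the "1.)" / "2.)" prefixes are not part of the file); `parseLogLine()` and the array keys are hypothetical names, not anything from the thread.

```php
<?php
// Hypothetical helper: split one log line into named fields.
// The pattern is an assumption based on the two sample lines in the question.
function parseLogLine(string $line): ?array
{
    $pattern = '/^(\S+) -- \[([^\]]+)\] "(\S+) (\S+) ([^"]*)" (\d{3}) (\d+) "([^"]*)"\s*$/';
    if (!preg_match($pattern, trim($line), $m)) {
        return null; // the line does not look like the samples
    }
    return [
        'ip'       => $m[1],
        'time'     => $m[2],
        'method'   => $m[3],
        'file'     => $m[4],
        'protocol' => $m[5],
        'status'   => (int) $m[6],
        'bytes'    => (int) $m[7],
        'agent'    => $m[8],
    ];
}

$sample = '103.239.234.105 -- [2007-04-01 00:42:21] "GET articles/learn_PHP_basics HTTP/1.0" 200 12729 "Mozilla/4.0"';
$fields = parseLogLine($sample);

// Root directory: everything before the first slash, or '' for top-level files.
$rootDir = strpos($fields['file'], '/') !== false
    ? strstr($fields['file'], '/', true)
    : '';
```

A plain `explode(' ', $line)` splits the timestamp and the quoted request in the middle, which is probably why the array did not break up as expected; a single pattern with capture groups avoids that.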
#3
    1. Total number of requests: you have already done this yourself.
    2. Use preg_match_all http://php.net/manual/en/function.preg-match-all.php (although there might be a quicker function than this).
    3. Same as 2 above, but this time you might want to convert the answer to a suitable format, i.e. instead of displaying 2400000 bytes you could convert it to MB/GB etc.
    4. Go through the file line by line, check whether the status is 404 and, if so, get the filename.
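
The four steps above could be sketched roughly as follows. This is only one possible approach: it uses `preg_match` per line rather than `preg_match_all`, the names `summarizeLog()` and `formatBytes()` are hypothetical, the regex assumes every line looks like the two samples in the question, and 1 KB is taken as 1024 bytes.

```php
<?php
// Sketch: tally the four statistics from a list of log lines.
function summarizeLog(iterable $lines): array
{
    // Captures: requested file, status code, bytes sent.
    $pattern = '/"\S+ (\S+) [^"]*" (\d{3}) (\d+) /';
    $stats = ['total' => 0, 'articles' => 0, 'bytes' => 0, 'not_found' => 0, 'missing' => []];

    foreach ($lines as $line) {
        if (!preg_match($pattern, $line, $m)) {
            continue; // skip blank or malformed lines
        }
        [, $file, $status, $bytes] = $m;
        $stats['total']++;                       // 1. every parsed request
        if (strpos($file, 'articles/') === 0) {  // 2. requests under articles/
            $stats['articles']++;
        }
        $stats['bytes'] += (int) $bytes;         // 3. running bandwidth total
        if ($status === '404') {                 // 4. count the 404s
            $stats['not_found']++;
            $stats['missing'][] = $file;
        }
    }
    // List each missing filename only once.
    $stats['missing'] = array_values(array_unique($stats['missing']));
    return $stats;
}

// Convert a byte count into a readable unit.
function formatBytes(int $bytes): string
{
    $units = ['B', 'KB', 'MB', 'GB'];
    $i = 0;
    $value = (float) $bytes;
    while ($value >= 1024 && $i < count($units) - 1) {
        $value /= 1024;
        $i++;
    }
    return round($value, 2) . ' ' . $units[$i];
}
```

With the real file, something like `summarizeLog(file('april.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES))` should work, and `formatBytes($stats['bytes'])` gives the bandwidth in a friendlier unit.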

    Now, your regexes are going to be a pain in the butt. I am at work right now, but if I find time later on I will try to help you with them.

    Having said all this, why go through all this trouble? Most web hosts give you statistics.

