  #1  
May 10th, 2004, 12:15 PM
dz1317
Registered User
Join Date: May 2004
Posts: 19
Grab text from HTML, parse and format it into CSV text?

I have a daily report that is formatted for me into an HTML page. It has some extraneous headers and footers with my data in the middle. Each day I manually copy the data into an Excel "import" sheet, parse it into columns, delete the unneeded columns, and finally paste the remaining data into the actual Excel doc where I analyze the data.

I am looking for a script that can grab this page, pull out the needed data, parse it down, and output it as comma- or tab-delimited text that I can just copy and paste into my Excel doc for further work.

There are no images on the original HTML page and I can easily tell the script where the data starts and stops, but I don't know how to do much of the rest, as I'm new to PHP and know very little about ASP...

If someone could help me out, that would be huge!

-Danny

  #2  
May 10th, 2004, 02:28 PM
dz1317
Example

Here is an example of the text that I need parsed and formatted:
Code:
RTRV-EQPT (RMACT)
Dallas
04-05-10 05:43:16 cdt
RTRV-EQPT:TID:RMACT:CTAG;

TID         aid:data,,,
----------- -----------------------------------------------------------------------------
SITEID01 RPM3-1297::PRTN_TYPE=AUTO,PRTN_PORT=PM3-1299:RM_PST=IS-NR-ACT,PORT_PST=IS-ANR-EQ-STBY
SITEID11 RPM1-0174::PRTN_TYPE=MAN,PRTN_PORT=PM1-0174-21:RM_PST=IS-NR-ACT,PORT_PST=IS-NR-STBY
SITEID07 RPME-0121::PRTN_TYPE=MAN,PRTN_PORT=PME-0121:RM_PST=IS-NR-ACT,PORT_PST=IS-NR-STBY
SITEID09 RPM1-0011::PRTN_TYPE=AUTO,PRTN_PORT=PM1-0011-25:RM_PST=IS-NR-ACT,PORT_PST=IS-ANR-EQ-STBY
SITEID14 RPME-0233::PRTN_TYPE=AUTO,PRTN_PORT=PME-0235:RM_PST=IS-NR-ACT,PORT_PST=IS-ANR-EQ-STBY
SITEID12 RPME-0145::PRTN_TYPE=AUTO,PRTN_PORT=PME-0150:RM_PST=IS-NR-ACT,PORT_PST=IS-ANR-EQ-STBY

Communication Errors:
TID                  Channel    Count Error
-------------------- ---------- ----- -------------------------
siteid17          ldcsa      2     Error: cmd_ack_RL
siteid04          ldcsa      9     Error: DENY received
siteid05          ldcsa      70    16654 Error: wait-for-response
siteid06          ldcsa      71    16654 Error: wait-for-response


The only portion of this that I need (this is a condensed version) is the middle part:

Code:
SITEID01 RPM3-1297::PRTN_TYPE=AUTO,PRTN_PORT=PM3-1299:RM_PST=IS-NR-ACT,PORT_PST=IS-ANR-EQ-STBY
SITEID11 RPM1-0174::PRTN_TYPE=MAN,PRTN_PORT=PM1-0174-21:RM_PST=IS-NR-ACT,PORT_PST=IS-NR-STBY
SITEID07 RPME-0121::PRTN_TYPE=MAN,PRTN_PORT=PME-0121:RM_PST=IS-NR-ACT,PORT_PST=IS-NR-STBY
SITEID09 RPM1-0011::PRTN_TYPE=AUTO,PRTN_PORT=PM1-0011-25:RM_PST=IS-NR-ACT,PORT_PST=IS-ANR-EQ-STBY
SITEID14 RPME-0233::PRTN_TYPE=AUTO,PRTN_PORT=PME-0235:RM_PST=IS-NR-ACT,PORT_PST=IS-ANR-EQ-STBY
SITEID12 RPME-0145::PRTN_TYPE=AUTO,PRTN_PORT=PME-0150:RM_PST=IS-NR-ACT,PORT_PST=IS-ANR-EQ-STBY


And I need it compressed down to this (some fields removed, the rest separated by commas or tabs):
Code:
SITEID01,AUTO,PM3-1299,IS-ANR-EQ-STBY
SITEID11,MAN,PM1-0174-21,IS-NR-STBY
SITEID07,MAN,PME-0121,IS-NR-STBY
SITEID09,AUTO,PM1-0011-25,IS-ANR-EQ-STBY
SITEID14,AUTO,PME-0235,IS-ANR-EQ-STBY
SITEID12,AUTO,PME-0150,IS-ANR-EQ-STBY


Anyone got any tips/pointers or feel like writing a nice script for me?
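
One way to picture the field mapping asked for above is a small PHP sketch that reads the KEY=VALUE pairs out of a single line and keeps only the four wanted values. The function name `reportLineToCsv` is hypothetical, and the sketch assumes every data line follows the layout shown in the sample (site ID first, then colon- and comma-separated pairs).

```php
<?php
// Hypothetical helper: turn one report line into the four wanted fields.
function reportLineToCsv(string $line): ?string
{
    // The site ID is the first whitespace-delimited token; the rest is the aid:data part.
    $parts = preg_split('/\s+/', trim($line), 2);
    if (count($parts) < 2) {
        return null; // not a data line
    }
    [$siteId, $data] = $parts;

    // Collect every KEY=VALUE pair, wherever it appears in the line.
    $fields = [];
    foreach (preg_split('/[:,]/', $data, -1, PREG_SPLIT_NO_EMPTY) as $chunk) {
        if (strpos($chunk, '=') !== false) {
            [$key, $value] = explode('=', $chunk, 2);
            $fields[$key] = $value;
        }
    }

    // Assumption: these three keys are the ones used in the example output.
    foreach (['PRTN_TYPE', 'PRTN_PORT', 'PORT_PST'] as $key) {
        if (!isset($fields[$key])) {
            return null;
        }
    }
    return implode(',', [$siteId, $fields['PRTN_TYPE'], $fields['PRTN_PORT'], $fields['PORT_PST']]);
}

echo reportLineToCsv('SITEID01 RPM3-1297::PRTN_TYPE=AUTO,PRTN_PORT=PM3-1299:RM_PST=IS-NR-ACT,PORT_PST=IS-ANR-EQ-STBY'), "\n";
```

Looking fields up by key rather than by position means the result should not change if the report ever reorders the pairs, at the cost of a little more code.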

  #3  
May 25th, 2004, 05:08 PM
zedmelon
Contributing User
Join Date: Aug 2003
Location: under a rock
Posts: 54
1. Are you working on Windows or UNIX/Linux/Solaris/etc?

2. (even better) Is there any way you can have access to the script that builds the web page in the first place? I'd recommend starting there since it shouldn't be difficult to modify that script to email selected portions of the data directly to your inbox, or at least dump it to a file you can reach on a daily basis.
__________________
Mother says my .sig can beat up your .sig.

  #4  
July 13th, 2004, 05:52 AM
lighthaus
Registered User
Join Date: Jul 2004
Posts: 3
I would use PHP, just for familiarity's sake. Here you'll see all the functions you'll need. The main ones will be explode, strstr, and substr. You'll separate out the section you need, explode it into an array, loop through the array, pick out the interesting parts, and append them to a new string. I don't know if it helps, but if it does, you can use it again and again.
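
A minimal sketch of the "separate out the section you need" step, using the three functions named above. It assumes the data begins right after the first dashed rule line and ends at the "Communication Errors:" heading, as in the posted sample. The function name `extractDataSection` and the shortened `$report` text are made up for illustration.

```php
<?php
// Hypothetical sample: a shortened copy of the report text, HTML already removed.
$report = <<<TXT
RTRV-EQPT:TID:RMACT:CTAG;

TID         aid:data,,,
----------- ---------------------
SITEID01 RPM3-1297::PRTN_TYPE=AUTO,PRTN_PORT=PM3-1299:RM_PST=IS-NR-ACT,PORT_PST=IS-ANR-EQ-STBY
SITEID11 RPM1-0174::PRTN_TYPE=MAN,PRTN_PORT=PM1-0174-21:RM_PST=IS-NR-ACT,PORT_PST=IS-NR-STBY

Communication Errors:
TID                  Channel    Count Error
TXT;

// Assumption: the data starts after the first dashed rule and stops at $endMarker.
function extractDataSection(string $text, string $endMarker = 'Communication Errors:'): array
{
    // strstr() returns the text from the first "-----" onward, or false if it is absent.
    $fromRule = strstr($text, '-----');
    if ($fromRule === false) {
        return [];
    }

    // Skip the remainder of the rule line itself.
    $newline = strpos($fromRule, "\n");
    if ($newline === false) {
        return [];
    }
    $body = substr($fromRule, $newline + 1);

    // With true as the third argument, strstr() returns the part before the marker.
    $before = strstr($body, $endMarker, true);
    if ($before !== false) {
        $body = $before;
    }

    // explode() into lines and drop the blank ones.
    $lines = array_map('trim', explode("\n", $body));
    return array_values(array_filter($lines, 'strlen'));
}

print_r(extractDataSection($report));
```

The returned array holds one data line per element, ready to be looped over and reduced to the wanted fields.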

  #5  
July 16th, 2004, 10:34 AM
dz1317
I'm a bit unsure how to implement the functions you mentioned. I've been reading as much as I can find though. Can you help me with the method to do this?

So far I have...

Code:
// $name holds the path to the saved report page
$file = fopen($name, "r");
while (!feof($file)) {
    $line = fgets($file, 1024);
    echo $line;
}
fclose($file);


That basically opens the file and prints it out as a check, and it works. But how do I select which part of the file to output? Right now it outputs the whole file (even the HTML tags are kept, since it is reading the HTML source and not the rendered text).

I need an output file that has only the middle portion of the original file. Then I need to parse it (explode), but I have 4 separators, so I assume I'll have to do an explode 4 times in order to get everything parsed down?
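
On the two questions above (HTML tags and four separators), here is a rough sketch rather than a definitive answer. `strip_tags()` removes the markup, and replacing every separator with a tab first means one `explode()` is enough. The `<b>` and `<br>` tags in `$htmlLine` are invented for the example; the real page source may look different.

```php
<?php
// Hypothetical input: one data line as it might appear inside the HTML source.
$htmlLine = '<b>SITEID07</b> RPME-0121::PRTN_TYPE=MAN,PRTN_PORT=PME-0121:RM_PST=IS-NR-ACT,PORT_PST=IS-NR-STBY<br>';

// strip_tags() removes the markup and leaves only the text.
$line = trim(strip_tags($htmlLine));

// Replace all four separators with a tab, then a single explode() is enough.
$normalized = str_replace([':', ',', '=', ' '], "\t", $line);
$pieces = explode("\t", $normalized);

// "::" yields one empty piece, so the indexes below assume this exact line layout.
$csvLine = implode(',', [$pieces[0], $pieces[4], $pieces[6], $pieces[10]]);
echo $csvLine, "\n";
```

The positional indexes are brittle: if a field is added or removed upstream, they shift, so it may be worth checking `count($pieces)` before trusting them.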

Ideas? Help?

  #6  
July 16th, 2004, 11:15 AM
dz1317
OK, I got it to parse using the list and split functions.

Now I have exactly the output that I need for a single line, but I need it to loop over each line and I'm running into problems. Here is what I have.

Code:
<?php
$name = "parse.txt";
$file = fopen($name, "r");
$line = fgets($file, 1024);
// split() was removed in PHP 7; preg_split() takes the same character class as a regex
list($DCS, $NULL, $NULL1, $DESC1, $PRTN_TYPE, $DESC2, $PRTN_PORT, $DESC3, $RM_PST, $DESC4, $PORT_PST) = preg_split('/[:,= ]/', $line);
echo "$DCS $PRTN_TYPE $PRTN_PORT $PORT_PST";
fclose($file);
?> 


And I get exactly what I need (the first line).. how do I loop it until end of file? A while loop didn't work .. but I probably did it wrong...

  #7  
July 16th, 2004, 11:19 AM
dz1317
Code:
<?php
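$name = "parse.txt";
$file = fopen($name, "r");
// fgets() returns false at end of file, which avoids an extra pass with an empty line
while (($line = fgets($file, 1024)) !== false) {
    if (trim($line) === '') {
        continue; // skip blank lines
    }
    // split() was removed in PHP 7; preg_split() takes the same character class as a regex
    list($DCS, $NULL, $NULL1, $DESC1, $PRTN_TYPE, $DESC2, $PRTN_PORT, $DESC3, $RM_PST, $DESC4, $PORT_PST) = preg_split('/[:,= ]/', trim($line));
    echo "$DCS $PRTN_TYPE $PRTN_PORT $PORT_PST";
}
fclose($file);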
?> 


But it's all output on one line... I tried putting a \n at the end of the echo statement, but that didn't work... help?

  #8  
July 16th, 2004, 11:30 AM
dz1317
Okay.. finally figured out that since I am outputting to a web page, \n was simply putting a new line in the page source, but the rendered output was still on one line, so I have to use a <br>...

Code:
   echo "$DCS $PRTN_TYPE $PRTN_PORT $PORT_PST <br>";
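
As a possible alternative to hand-writing `<br>`, the rows can be built once with real newlines (which a text or CSV file needs anyway) and adapted only for browser display. This is just a sketch, and the `$rows` values are copied from the desired output earlier in the thread.

```php
<?php
// Hypothetical rows, already reduced to the wanted fields.
$rows = [
    'SITEID01,AUTO,PM3-1299,IS-ANR-EQ-STBY',
    'SITEID11,MAN,PM1-0174-21,IS-NR-STBY',
];

// Build the output once with real newlines, as a text file would need.
$csv = implode("\n", $rows) . "\n";

// Option 1: let nl2br() insert a <br /> before each newline for browser display.
$asHtml = nl2br($csv);

// Option 2: wrap in <pre> so the browser keeps the newlines as they are.
$asPre = '<pre>' . htmlspecialchars($csv) . '</pre>';

echo $asPre;
```

A third option is to call `header('Content-Type: text/plain')` before any output, so the browser shows the raw text and the newlines survive a copy and paste into Excel.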


cool.. so now I have my formatted output..... but I had to cheat and strip off the top and bottom of the original page manually and create a new text file I called 'parse.txt'... Can someone help me figure out how to strip off the top and bottom of the original webpage to get down to 'parse.txt' automatically?

thx
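
One possible way to skip the manual 'parse.txt' step is to filter by pattern instead of cutting at fixed positions: strip the tags, then keep only the lines that look like data lines. The sketch below assumes that only data lines contain "::PRTN_TYPE=" and that each report line sits on its own line in the HTML source (for example inside a `<pre>` block). The function name `htmlReportToCsv` and the file name `report.html` are placeholders.

```php
<?php
// Hypothetical helper: reduce the raw HTML of the report page to CSV text.
function htmlReportToCsv(string $html): string
{
    // Drop the tags and decode entities such as &amp; so only plain text remains.
    $text = html_entity_decode(strip_tags($html));

    // Assumption: only data lines match this layout, so headers and footers are skipped.
    $pattern = '/^(\S+)\s+\S+::PRTN_TYPE=([^,]+),PRTN_PORT=([^:]+):RM_PST=[^,]+,PORT_PST=(\S+)/';

    $out = [];
    foreach (preg_split('/\R/', $text) as $line) {
        if (preg_match($pattern, trim($line), $m)) {
            $out[] = implode(',', [$m[1], $m[2], $m[3], $m[4]]);
        }
    }
    return implode("\n", $out);
}

// Hypothetical usage; the file name is a placeholder for the saved report page.
// echo htmlReportToCsv(file_get_contents('report.html'));

$sample = "<html><body><pre>\nTID         aid:data,,,\n----------- ------\n"
    . "SITEID09 RPM1-0011::PRTN_TYPE=AUTO,PRTN_PORT=PM1-0011-25:RM_PST=IS-NR-ACT,PORT_PST=IS-ANR-EQ-STBY\n"
    . "\nCommunication Errors:\n</pre></body></html>";
echo htmlReportToCsv($sample), "\n";
```

Because the filter keys on the line layout rather than on where the header ends, it should keep working if the number of header or footer lines changes from day to day.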

© 2003-2008 by Developer Shed. All rights reserved. DS Cluster 5 hosted by Hostway