#1
I have a daily report that is formatted for me into an HTML page. It has some extraneous headers and footers, with my data in the middle. Each day I manually copy the data into an Excel "import" sheet, parse it into columns, delete the unneeded columns, and finally paste the remaining data into the actual Excel document where I analyze it.
I am looking for a script that can grab this page, extract the needed data, parse it down, and output it as comma- or tab-delimited text that I can simply copy and paste into my Excel document for further work. There are no images on the original HTML page, and I can easily tell the script where the data starts and stops, but I don't know how to do much of the rest, since I'm new to PHP and know very little about ASP. If someone could help me out, that would be huge! -Danny
#2
|
|||
|
|||
|
Example
Here is an example of the text that I need to parse and format:
Code:
RTRV-EQPT (RMACT) Dallas 04-05-10 05:43:16 cdt
RTRV-EQPT:TID:RMACT:CTAG;
TID aid:data,,,
----------- -----------------------------------------------------------------------------
SITEID01 RPM3-1297::PRTN_TYPE=AUTO,PRTN_PORT=PM3-1299:RM_PST=IS-NR-ACT,PORT_PST=IS-ANR-EQ-STBY
SITEID11 RPM1-0174::PRTN_TYPE=MAN,PRTN_PORT=PM1-0174-21:RM_PST=IS-NR-ACT,PORT_PST=IS-NR-STBY
SITEID07 RPME-0121::PRTN_TYPE=MAN,PRTN_PORT=PME-0121:RM_PST=IS-NR-ACT,PORT_PST=IS-NR-STBY
SITEID09 RPM1-0011::PRTN_TYPE=AUTO,PRTN_PORT=PM1-0011-25:RM_PST=IS-NR-ACT,PORT_PST=IS-ANR-EQ-STBY
SITEID14 RPME-0233::PRTN_TYPE=AUTO,PRTN_PORT=PME-0235:RM_PST=IS-NR-ACT,PORT_PST=IS-ANR-EQ-STBY
SITEID12 RPME-0145::PRTN_TYPE=AUTO,PRTN_PORT=PME-0150:RM_PST=IS-NR-ACT,PORT_PST=IS-ANR-EQ-STBY
Communication Errors:
TID Channel Count Error
-------------------- ---------- ----- -------------------------
siteid17 ldcsa 2 Error: cmd_ack_RL
siteid04 ldcsa 9 Error: DENY received
siteid05 ldcsa 70 16654 Error: wait-for-response
siteid06 ldcsa 71 16654 Error: wait-for-response
The only portion of this that I need (this is a condensed version) is the middle part:
Code:
SITEID01 RPM3-1297::PRTN_TYPE=AUTO,PRTN_PORT=PM3-1299:RM_PST=IS-NR-ACT,PORT_PST=IS-ANR-EQ-STBY
SITEID11 RPM1-0174::PRTN_TYPE=MAN,PRTN_PORT=PM1-0174-21:RM_PST=IS-NR-ACT,PORT_PST=IS-NR-STBY
SITEID07 RPME-0121::PRTN_TYPE=MAN,PRTN_PORT=PME-0121:RM_PST=IS-NR-ACT,PORT_PST=IS-NR-STBY
SITEID09 RPM1-0011::PRTN_TYPE=AUTO,PRTN_PORT=PM1-0011-25:RM_PST=IS-NR-ACT,PORT_PST=IS-ANR-EQ-STBY
SITEID14 RPME-0233::PRTN_TYPE=AUTO,PRTN_PORT=PME-0235:RM_PST=IS-NR-ACT,PORT_PST=IS-ANR-EQ-STBY
SITEID12 RPME-0145::PRTN_TYPE=AUTO,PRTN_PORT=PME-0150:RM_PST=IS-NR-ACT,PORT_PST=IS-ANR-EQ-STBY
And I need it compressed down to this (some fields removed, and the rest separated by commas or tabs):
Code:
SITEID01,AUTO,PM3-1299,IS-ANR-EQ-STBY
SITEID11,MAN,PM1-0174-21,IS-NR-STBY
SITEID07,MAN,PME-0121,IS-NR-STBY
SITEID09,AUTO,PM1-0011-25,IS-ANR-EQ-STBY
SITEID14,AUTO,PME-0235,IS-ANR-EQ-STBY
SITEID12,AUTO,PME-0150,IS-ANR-EQ-STBY
Anyone got any tips or pointers, or feel like writing a nice script for me?
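The row-to-CSV transformation described above can be expressed as one regular expression per row. This is a minimal sketch, assuming every data row follows exactly the `SITEID RPM...::PRTN_TYPE=...,PRTN_PORT=...:RM_PST=...,PORT_PST=...` layout shown in the example; `rowToCsv` is a hypothetical helper name, not anything from the report system.

```php
<?php
// Hypothetical helper: reduce one report row to "SITEID,PRTN_TYPE,PRTN_PORT,PORT_PST".
// Assumes the row layout shown in the example; returns null for anything else.
function rowToCsv(string $row): ?string
{
    $pattern = '/^(\S+)\s+[^:\s]+::PRTN_TYPE=([^,]+),PRTN_PORT=([^:]+):RM_PST=[^,]+,PORT_PST=(\S+)$/';
    if (!preg_match($pattern, trim($row), $m)) {
        return null; // header, dashed line, error table, ...
    }
    // $m[1] = site ID, $m[2] = PRTN_TYPE, $m[3] = PRTN_PORT, $m[4] = PORT_PST
    return implode(',', array_slice($m, 1, 4));
}

echo rowToCsv('SITEID01 RPM3-1297::PRTN_TYPE=AUTO,PRTN_PORT=PM3-1299:RM_PST=IS-NR-ACT,PORT_PST=IS-ANR-EQ-STBY'), "\n";
// prints SITEID01,AUTO,PM3-1299,IS-ANR-EQ-STBY
```

Because non-matching lines return null, the same function can also act as the filter that throws away the header and the "Communication Errors" table.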
#3
|
||||
|
||||
|
1. Are you working on Windows or UNIX/Linux/Solaris/etc.?
2. (Even better) Do you have access to the script that builds the web page in the first place? I'd recommend starting there, since it shouldn't be difficult to modify that script to email selected portions of the data directly to your inbox, or at least dump them to a file you can reach every day.
#4
|
|||
|
|||
|
I would use PHP, just for familiarity's sake. Here you'll see all the functions you'll need. The main ones will be explode, strstr, and substr: you'll separate out the section you need, explode it into an array, loop through the array picking out just the interesting parts, and append them to a new string. I don't know if that helps, but once it works you can use it again and again.
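A minimal sketch of the strstr/substr/explode approach just described. It assumes, based only on the example report, that the data rows begin on the line after the first run of dashes and end where "Communication Errors:" begins; `reportToCsv` is a hypothetical name, and `preg_split` is used once to cope with padded columns.

```php
<?php
// Sketch of the strstr/substr/explode approach. Assumptions based only on the example report:
//   - data rows begin on the line after the first run of dashes ("-----"),
//   - they end where "Communication Errors:" begins,
//   - each row is "SITEID <whitespace> RPM...::KEY=VALUE,KEY=VALUE:KEY=VALUE,KEY=VALUE".
// reportToCsv() is a hypothetical helper name.
function reportToCsv(string $report): string
{
    $block = strstr($report, '-----');                        // text from the first dashed line on
    if ($block === false) {
        return '';
    }
    $block = substr($block, strpos($block . "\n", "\n") + 1); // drop the dashed line itself
    $end = strpos($block, 'Communication Errors:');
    if ($end !== false) {
        $block = substr($block, 0, $end);                     // drop the footer
    }

    $out = '';
    foreach (explode("\n", $block) as $line) {
        $parts = preg_split('/\s+/', trim($line), 2);         // [site ID, rest of the row]
        if (count($parts) < 2 || strpos($parts[1], '::') === false) {
            continue;                                         // not a data row
        }
        $attrs = explode('::', $parts[1], 2)[1];              // "PRTN_TYPE=AUTO,...,PORT_PST=..."
        $fields = [];
        foreach (explode(',', str_replace(':', ',', $attrs)) as $pair) {
            if (strpos($pair, '=') !== false) {
                list($key, $value) = explode('=', $pair, 2);
                $fields[$key] = $value;
            }
        }
        $out .= implode(',', [$parts[0], $fields['PRTN_TYPE'] ?? '', $fields['PRTN_PORT'] ?? '', $fields['PORT_PST'] ?? '']) . "\n";
    }
    return $out;
}
```

Feeding it the text of the report page (for example from `file_get_contents()`, with the file name being whatever the report is actually saved as) should give one comma-separated line per site.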
|
|
#5
|
|||
|
|||
|
I'm a bit unsure how to implement the functions you mentioned. I've been reading as much as I can find, though. Can you help me with the method for doing this?
So far I have... Code:
// $name holds the path (or URL) of the report page
$file = fopen($name, "r");
while (($line = fgets($file)) !== false) { // stop when fgets() reports end of file
    echo $line;
}
fclose($file);
That basically opens the file and prints it out as a check, and it works, but how do I select which part of the file to output? Right now it outputs the whole file (even the HTML tags are kept, since it is reading the HTML source rather than the displayed text). I need output that contains only the middle portion of the original file. Then I need to parse it (explode), but I have 4 separators, so I assume I'll have to call explode 4 times to get everything parsed down? Ideas? Help?
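On the "explode 4 times?" question: `preg_split()` accepts a regular expression, so a single character class can cover all four separators at once. A minimal sketch using one sample row from the example; the field positions noted in the comments apply only to rows shaped like that sample.

```php
<?php
// Sample row copied from the example report.
$line = 'SITEID01 RPM3-1297::PRTN_TYPE=AUTO,PRTN_PORT=PM3-1299:RM_PST=IS-NR-ACT,PORT_PST=IS-ANR-EQ-STBY';

// The character class lists all four separators: ':', ',', '=' and whitespace.
// The trailing '+' treats a run of them (such as "::" or padded spaces) as one split point,
// and PREG_SPLIT_NO_EMPTY discards any empty piece left at the start or end.
$parts = preg_split('/[:,=\s]+/', $line, -1, PREG_SPLIT_NO_EMPTY);
// $parts: SITEID01, RPM3-1297, PRTN_TYPE, AUTO, PRTN_PORT, PM3-1299, RM_PST, IS-NR-ACT, PORT_PST, IS-ANR-EQ-STBY

$csv = implode(',', [$parts[0], $parts[3], $parts[5], $parts[9]]);
echo $csv, "\n"; // SITEID01,AUTO,PM3-1299,IS-ANR-EQ-STBY
```

Because "::" collapses into a single split point here, a row yields ten pieces rather than eleven, so the wanted fields sit at indexes 0, 3, 5 and 9.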
#6
|
|||
|
|||
|
OK, I got it to parse using the list and split functions...
Now I have exactly the output I need for a single line, but I need it to loop over every line, and I'm running into problems. Here is what I have: Code:
<?php
$name = "parse.txt";
$file = fopen($name, "r");
$line = fgets($file, 1024);
// split() was removed in PHP 7.0; preg_split() takes the same character class as a PCRE pattern
list($DCS, $NULL, $NULL1, $DESC1, $PRTN_TYPE, $DESC2, $PRTN_PORT, $DESC3, $RM_PST, $DESC4, $PORT_PST) = preg_split('/[:,= ]/', $line);
echo "$DCS $PRTN_TYPE $PRTN_PORT $PORT_PST";
fclose($file);
?>
And I get exactly what I need (the first line). How do I loop until the end of the file? A while loop didn't work, but I probably did it wrong...
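A common way to read a file line by line is to keep calling `fgets()` until it returns false. A minimal sketch, assuming (as with parse.txt here) that the file holds one data row per line; `parseFile` is a hypothetical name, and blank lines are skipped so they don't produce empty output.

```php
<?php
// Loop until fgets() returns false, which happens at end of file.
// (A while (!feof(...)) loop can run one extra time after the last line.)
// Assumes one data row per line, as in parse.txt; parseFile() is a hypothetical name.
function parseFile($handle): array
{
    $rows = [];
    while (($line = fgets($handle)) !== false) {
        $line = trim($line);                 // drop the trailing newline
        if ($line === '') {
            continue;                        // skip blank lines
        }
        // Same separators as in the post above: ':', ',', '=' and a single space.
        list($DCS, , , , $PRTN_TYPE, , $PRTN_PORT, , , , $PORT_PST) = preg_split('/[:,= ]/', $line);
        $rows[] = "$DCS $PRTN_TYPE $PRTN_PORT $PORT_PST";
    }
    return $rows;
}

if (is_readable('parse.txt')) {
    $file = fopen('parse.txt', 'r');
    foreach (parseFile($file) as $row) {
        echo $row, "\n";
    }
    fclose($file);
}
```

The empty slots in `list()` simply skip the pieces that aren't needed, so no placeholder variables are required.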
#7
|
|||
|
|||
|
Code:
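<?php
$name = "parse.txt";
$file = fopen($name, "r");
// Loop until fgets() returns false at end of file
while (($line = fgets($file, 1024)) !== false) {
    // split() was removed in PHP 7.0; preg_split() takes the same character class as a PCRE pattern
    list($DCS, $NULL, $NULL1, $DESC1, $PRTN_TYPE, $DESC2, $PRTN_PORT, $DESC3, $RM_PST, $DESC4, $PORT_PST) = preg_split('/[:,= ]/', $line);
    echo "$DCS $PRTN_TYPE $PRTN_PORT $PORT_PST";
}
fclose($file);
?>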
But it's all outputting on one line... I tried putting a \n at the end of the echo statement, but that didn't work. Help?
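Browsers render a "\n" inside HTML as ordinary whitespace, which is why the rows run together. One alternative to adding HTML line breaks is to send the result as plain text and let `fputcsv()` handle the delimiters. A minimal sketch; `rowsToCsv` is a hypothetical helper, and the two rows are hard-coded stand-ins for the parsed data.

```php
<?php
// Browsers treat "\n" inside HTML as ordinary whitespace, which is why the rows run together.
// Alternative: send the output as plain text so "\n" is shown as a line break.
// rowsToCsv() is a hypothetical helper; the two rows are hard-coded stand-ins for parsed data.
function rowsToCsv(array $rows, string $delimiter = ','): string
{
    $buf = fopen('php://temp', 'r+');
    foreach ($rows as $row) {
        fputcsv($buf, $row, $delimiter, '"', '\\'); // quotes a field only if it needs it
    }
    rewind($buf);
    $csv = stream_get_contents($buf);
    fclose($buf);
    return $csv;
}

$rows = [
    ['SITEID01', 'AUTO', 'PM3-1299', 'IS-ANR-EQ-STBY'],
    ['SITEID11', 'MAN', 'PM1-0174-21', 'IS-NR-STBY'],
];

if (PHP_SAPI !== 'cli') {
    header('Content-Type: text/plain; charset=utf-8'); // must be sent before any other output
}
echo rowsToCsv($rows);
```

Passing "\t" as the delimiter gives tab-separated text instead, which Excel typically splits into columns when pasted.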
#8
|
|||
|
|||
|
Okay, I finally figured out that since I am outputting to a web page, \n was simply putting a new line in the HTML source, but the browser still displayed everything on the same line, so I have to use a <br>...
Code:
echo "$DCS $PRTN_TYPE $PRTN_PORT $PORT_PST <br>";
Cool, so now I have my formatted output... but I had to cheat: I stripped off the top and bottom of the original page manually and created a new text file I called 'parse.txt'. Can someone help me figure out how to strip off the top and bottom of the original web page to get down to 'parse.txt' automatically? Thanks!
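One way to skip the hand-made parse.txt is to strip the HTML and keep only the lines that look like data rows, so the exact header and footer no longer matter. A minimal sketch under loudly stated assumptions: each row sits on its own line in the page source (or is separated by `<br>`), every wanted row contains "::PRTN_TYPE=", and the rows contain no HTML entities. `extractRows` and 'report.html' are hypothetical names.

```php
<?php
// Sketch: go from the raw report page to CSV rows without a hand-made parse.txt.
// Assumptions: each data row is on its own line in the page source (or separated by <br>),
// contains "::PRTN_TYPE=", and has no HTML entities. extractRows() is a hypothetical name.
function extractRows(string $html): array
{
    $html = preg_replace('/<br\s*\/?>/i', "\n", $html); // treat <br> as a line break
    $text = strip_tags($html);                           // drop the remaining tags
    $rows = [];
    foreach (preg_split('/\R/', $text) as $line) {
        $line = trim($line);
        if (strpos($line, '::PRTN_TYPE=') === false) {
            continue;                                    // header, footer or error table
        }
        // Split on runs of whitespace or on ':', ',', '='; "::" still leaves one empty piece,
        // so the indexes match the eleven-field list() used earlier in the thread.
        list($site, , , , $type, , $port, , , , $portPst) = preg_split('/\s+|[:,=]/', $line);
        $rows[] = "$site,$type,$port,$portPst";
    }
    return $rows;
}

// Hypothetical usage: file_get_contents() accepts a local path, or an http:// URL
// when allow_url_fopen is enabled, e.g. extractRows(file_get_contents('report.html')).
$sample = "<pre>header line\n"
        . "SITEID07 RPME-0121::PRTN_TYPE=MAN,PRTN_PORT=PME-0121:RM_PST=IS-NR-ACT,PORT_PST=IS-NR-STBY\n"
        . "Communication Errors:</pre>";
echo implode("\n", extractRows($sample)), "\n"; // prints SITEID07,MAN,PME-0121,IS-NR-STBY
```

Filtering on the content of each line, rather than cutting at fixed start and end markers, keeps working if the header or footer text changes slightly from day to day.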
Dev Shed Forums > Web Site Management > Scripts > Grab text from html, parse and format it into csv text?