
April 25th, 2003, 10:30 PM
Parsing HTML Docs
Hi all,
I am confused about how to parse HTML documents such as:
<HTML>
<HEAD>
something
</HEAD>
<BODY>
something of body
</BODY>
</HTML>
I want to strip out all tags, i.e. everything between "<" and ">", and keep only the rest of the file.
This is what I have so far:
#include "***1.h"

int main(int argc, char *argv[])
{
    char line[MAX_LINE+1];
    FILE *fp;

    if (argc < 2)
    {
        printf("\n Usage - ./executable <file1> \n");
        printf("Insufficient number of arguments \n");
        return EXIT_FAILURE;
    }
    if ((fp = fopen(argv[1], "r")) == NULL)
    {
        printf("Could not open file - %s\n", argv[1]);
        return EXIT_FAILURE;
    }
    /* Read the file one line at a time */
    while (fgets(line, sizeof line, fp) != NULL)
    {
        printf("%s", line); /*----*/
    }
    fclose(fp);
    return EXIT_SUCCESS;
}
I can read each line at the point marked /*----*/, but I don't know how to parse it so that I skip whatever is between < and >.
Can somebody tell me which standard library function I should use?
Thanks
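For what it's worth, the C standard library has no single function that strips markup, so one common approach is a small character-by-character scan with an "inside a tag" flag. The following is only a sketch under some assumptions: the name strip_tags and its parameters are made up for illustration, the output buffer is assumed to be at least as large as the input line, and HTML comments, scripts, entities such as &amp;, and a ">" inside a quoted attribute value are not handled.

```c
#include <stdio.h>

/* Hypothetical helper (not from the original post).
 * Copies src to dst, dropping everything from '<' through the next '>'.
 * dst must have room for at least strlen(src) + 1 bytes.
 * *in_tag keeps the state between calls, so a tag may span several lines;
 * the caller sets it to 0 before the first call. */
void strip_tags(const char *src, char *dst, int *in_tag)
{
    for (; *src != '\0'; ++src) {
        if (*src == '<')
            *in_tag = 1;            /* a tag starts: stop copying */
        else if (*src == '>' && *in_tag)
            *in_tag = 0;            /* the tag ends: resume copying after it */
        else if (!*in_tag)
            *dst++ = *src;          /* ordinary text outside any tag */
    }
    *dst = '\0';
}
```

Inside the fgets loop this could be used by declaring a second buffer of the same size as line (say, clean) and an int in_tag = 0 before the loop, then calling strip_tags(line, clean, &in_tag) and printing clean instead of line. Keeping the flag outside the function is a deliberate choice here, since fgets may split a long tag across two reads.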