#1
    Junior Member
    Devshed Newbie (0 - 499 posts)

    Join Date
    Feb 2003
    Location
    Australia
    Posts
    19

    Parsing HTML Docs


    Hi all,

    I am confused about parsing HTML documents like this one:

    <HTML>
    <HEAD>
    something
    </HEAD>
    <BODY>
    something of body
    </BODY>
    </HTML>

    I want to strip out all tags (everything between "<" and ">") and keep only the rest of the file.

    This is what I have so far:

    #include <stdio.h>
    #include <stdlib.h>
    #include "***1.h"   /* project header; assumed to define MAX_LINE */

    int main(int argc, char *argv[])
    {
        char line[MAX_LINE + 1];
        FILE *fp;

        if (argc < 2)
        {
            fprintf(stderr, "\nUsage: ./executable <file1>\n");
            fprintf(stderr, "Insufficient number of arguments\n");
            return EXIT_FAILURE;
        }

        if ((fp = fopen(argv[1], "r")) == NULL)
        {
            fprintf(stderr, "Could not open file - %s\n", argv[1]);
            return EXIT_FAILURE;
        }

        /* read the file one line at a time */
        while (fgets(line, sizeof line, fp) != NULL)
        {
            printf("%s", line); /*----*/
        }
        fclose(fp);

        return EXIT_SUCCESS;
    }


    I can get each line at /*----*/, but I can't work out how to parse it so that I skip whatever is between < and >.
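
There is no single standard C function that removes tags, but a short loop with an "inside a tag" flag does it. Below is a minimal sketch of a hypothetical helper, `strip_tags_line`, that could be called at the /*----*/ marker. It edits the line in place and keeps the flag in a caller-owned variable, so a tag that starts on one line and ends on a later one (or one split across two `fgets` calls because the line was longer than MAX_LINE) is still skipped. It assumes a simplified view of HTML: every '>' closes a tag, so '>' inside quoted attribute values or comments is not handled.

```c
/* Hypothetical helper: remove every '<' ... '>' span from line, in place.
 * *in_tag carries state between calls so tags spanning several lines are
 * skipped too; the caller should set it to 0 before the first call.
 * Simplification: any '>' ends a tag (quoted '>' is not special-cased). */
void strip_tags_line(char *line, int *in_tag)
{
    char *src = line;   /* next character to examine */
    char *dst = line;   /* next position to write a kept character */

    for (; *src != '\0'; src++) {
        if (*in_tag) {
            if (*src == '>')
                *in_tag = 0;    /* tag closed; resume keeping text */
        } else if (*src == '<') {
            *in_tag = 1;        /* tag opened; stop keeping text */
        } else {
            *dst++ = *src;      /* ordinary text: keep it */
        }
    }
    *dst = '\0';                /* terminate the shortened string */
}
```

To use it in the loop above, declare `int in_tag = 0;` before the `while`, then call `strip_tags_line(line, &in_tag);` just before the `printf`.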

    Can somebody tell me which standard library function I should use?
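
One possible answer, sketched here under the same simplifying assumption (every '>' ends a tag): skip line buffering altogether and read the file one character at a time with the standard `fgetc`, writing kept characters with `fputc`. Because the state machine never sees line boundaries, multi-line tags need no extra handling. The function name `strip_tags_stream` is hypothetical. Note that the character variable must be an `int`, not a `char`, so that `EOF` can be told apart from a real character.

```c
#include <stdio.h>

/* Hypothetical helper: copy in to out, dropping every '<' ... '>' span.
 * Returns 0 on success, -1 if a read or write error occurred. */
int strip_tags_stream(FILE *in, FILE *out)
{
    int c;          /* int, not char, so EOF is distinguishable */
    int in_tag = 0; /* 1 while between '<' and the next '>' */

    while ((c = fgetc(in)) != EOF) {
        if (in_tag) {
            if (c == '>')
                in_tag = 0;             /* tag closed */
        } else if (c == '<') {
            in_tag = 1;                 /* tag opened */
        } else if (fputc(c, out) == EOF) {
            return -1;                  /* write failed */
        }
    }
    return ferror(in) ? -1 : 0;         /* EOF vs. read error */
}
```

With this, the whole `fgets` loop in `main` could be replaced by `strip_tags_stream(fp, stdout);`. Real-world HTML (comments, `<script>` blocks, quoted '>' in attributes, entities such as `&lt;`) needs a proper HTML parser; this sketch only covers the simple case in the question.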

    Thanks
#2
    I found it... please disregard the above.

    Thanks
