Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date: May 2003

    Web fetching and regexps

    Hi all,
    I am trying to create a class that fetches an HTML file from the web and extracts selected content from it using regexps (Jakarta Regexp). I have succeeded so far, but in a very long-winded way, and I need help making my code more efficient.

    So far, I read the HTML file one line at a time with the readLine() method of the BufferedReader class, and I concatenate those lines into one huge string.
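    A minimal sketch of that read-and-concatenate step, for reference. It assumes the page is UTF-8 and swaps String concatenation for a StringBuilder, since joining lines with + copies the growing string on every line. The class name PageReader and the fetch helper are hypothetical, not from the post.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class PageReader {
    // Reads every line from the source and joins them into one String.
    // A StringBuilder avoids copying the whole string each time a line is added.
    static String readAll(Reader source) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                sb.append(line).append('\n'); // readLine() drops the line terminator, so add one back
            }
        }
        return sb.toString();
    }

    // Hypothetical helper: fetches a page over the network, assuming UTF-8 content.
    static String fetch(String address) throws IOException {
        return readAll(new InputStreamReader(
                URI.create(address).toURL().openStream(), StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Offline demo: a StringReader stands in for the network stream.
        System.out.print(readAll(new StringReader("<p>one</p>\n<p>two</p>")));
    }
}
```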

    I then use the following regexp code, which scans that string and puts the matching content into an array (a Vector):

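    Code:
    import java.util.Vector;
    import org.apache.regexp.CharacterIterator;
    import org.apache.regexp.RE;
    import org.apache.regexp.StringCharacterIterator;

    // The pattern as posted had an unbalanced "("; here the first group is closed after the digits.
    String patt = "plate=([A-Z][:digit:]{1,3}) ([A-Z]{3})&price=250";
    CharacterIterator ci = new StringCharacterIterator(data2); // data2 is the huge string
    int end = 0;
    String mod;
    Vector rs = new Vector(1, 1);
    RE and = new RE(" ");
    RE r = new RE(patt);
    while (r.match(ci, end)) {
        int start = r.getParenStart(0); // paren 0 is the whole match
        end = r.getParenEnd(0);         // the next search resumes after this match
        mod = and.subst(ci.substring(start, end), " "); // replace each " " in the match
        rs.addElement(mod);             // collect the result
    }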
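    For comparison, a sketch of the same extraction loop using the standard java.util.regex package instead of Jakarta Regexp (a swapped-in library, not what the post uses). It assumes the intended pattern is one capital letter, 1 to 3 digits, a space, then three capital letters; the class name PlateExtractor is hypothetical. Matcher.find() resumes where the previous match ended, so no manual end index is needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlateExtractor {
    // Assumed reading of the posted pattern; \d plays the role of [:digit:].
    static final Pattern PLATE =
            Pattern.compile("plate=([A-Z]\\d{1,3}) ([A-Z]{3})&price=250");

    // Returns the full text of every match, in order
    // (the equivalent of getParenStart(0)/getParenEnd(0)).
    static List<String> extractAll(CharSequence text) {
        List<String> results = new ArrayList<>();
        Matcher m = PLATE.matcher(text);
        while (m.find()) {          // each call continues after the previous match
            results.add(m.group()); // group() is group(0), the whole match
        }
        return results;
    }

    public static void main(String[] args) {
        String html = "<a href=\"x?plate=A12 BCD&price=250\">"
                    + "<a href=\"x?plate=Z9 XYZ&price=250\">";
        System.out.println(extractAll(html));
    }
}
```

    If only the plate itself is wanted, m.group(1) + " " + m.group(2) returns the two captured parts without the surrounding URL text.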

    However, isn't there a way to read the file and check each line for matches as it is read, instead of wasting computer time concatenating every line into one huge string? So far I haven't been able to do this!

    Clueless llama
    Devshed Regular (2000 - 2499 posts)

    Join Date: Feb 2001
    Location: Lincoln, NE, USA

    Well, I am not sure what you mean. If you want to check the strings as you read them in, you can simply create your regexes up front and process each line as you read it, instead of appending it to a StringBuffer or whatever.
    //create regex's
    //get a stream
    //loop through stream reading line by line
      //check line for pattern match
      //save match in some collection
    //close stream
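    The outline above, filled in as a runnable sketch using java.util.regex (not Jakarta Regexp). The class and method names are hypothetical. One caveat of this approach: a match that spans a line break will not be found, because each line is searched on its own.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LineByLineMatcher {
    // Collects every match of the pattern, scanning the input one line at a time.
    static List<String> matchLines(Reader source, Pattern pattern) throws IOException {
        List<String> matches = new ArrayList<>();              // collection for the results
        try (BufferedReader in = new BufferedReader(source)) { // stream is closed automatically
            Matcher m = pattern.matcher("");                   // created once, reset for each line
            String line;
            while ((line = in.readLine()) != null) {
                m.reset(line);
                while (m.find()) {
                    matches.add(m.group());
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical pattern and page text, for illustration only.
        Pattern plate = Pattern.compile("plate=[A-Z]\\d{1,3} [A-Z]{3}&price=250");
        String page = "<a href=\"?plate=A12 BCD&price=250\">\n<p>no match</p>\n"
                    + "<a href=\"?plate=Z9 XYZ&price=250\">";
        System.out.println(matchLines(new StringReader(page), plate));
    }
}
```

    Reusing one Matcher via reset() avoids allocating a new Matcher per line; for a real page, pass an InputStreamReader around the URL's stream instead of the StringReader.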
    I honestly don't know which way would be more efficient: this one, or doing the pattern match on one big string. Regexes can really bog things down. Try both ways and see which works better.
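    A rough sketch for trying both ways, as suggested. It times each approach on a synthetic in-memory page (hypothetical test data, not a real site) using System.nanoTime(), repeating a few rounds so later rounds run after JIT warm-up. Treat the numbers as indicative only; when the page comes over the network, download time may well dominate either approach. Note that the two approaches only agree when no match spans a line break.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CompareApproaches {
    static final Pattern PLATE = Pattern.compile("plate=[A-Z]\\d{1,3} [A-Z]{3}&price=250");

    // Approach 1: join all lines into one big string, then scan it once.
    static List<String> wholeString(String page) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = new BufferedReader(new StringReader(page))) {
            String line;
            while ((line = in.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        List<String> out = new ArrayList<>();
        Matcher m = PLATE.matcher(sb);
        while (m.find()) out.add(m.group());
        return out;
    }

    // Approach 2: scan each line as soon as it is read.
    static List<String> lineByLine(String page) throws IOException {
        List<String> out = new ArrayList<>();
        Matcher m = PLATE.matcher("");
        try (BufferedReader in = new BufferedReader(new StringReader(page))) {
            String line;
            while ((line = in.readLine()) != null) {
                m.reset(line);
                while (m.find()) out.add(m.group());
            }
        }
        return out;
    }

    // Builds a synthetic page where every tenth line contains a match.
    static String syntheticPage(int lines) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < lines; i++) {
            sb.append(i % 10 == 0 ? "<a href=\"?plate=A12 BCD&price=250\">" : "<p>filler text</p>")
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String page = syntheticPage(200_000);
        for (int round = 0; round < 3; round++) {
            long t0 = System.nanoTime();
            int a = wholeString(page).size();
            long t1 = System.nanoTime();
            int b = lineByLine(page).size();
            long t2 = System.nanoTime();
            System.out.printf("round %d: whole=%d ms (%d hits), lines=%d ms (%d hits)%n",
                    round, (t1 - t0) / 1_000_000, a, (t2 - t1) / 1_000_000, b);
        }
    }
}
```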
