#1
  1. Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2013
    Posts
    3
    Rep Power
    0

    Downloading HTML src of webpage only returning a few characters


    I'm building a basic web crawler for school and ran into a weird problem right at the start. The intention is for 1 KB of plain HTML to be written to "file.txt". Instead, only a few characters are read and the rest of the file appears to be full of whitespace. How can this be? Code below:
    Code:
    byte[] bytes = new byte[1024];
    FileWriter fstream = new FileWriter("file.txt");
    BufferedWriter out = new BufferedWriter(fstream);
    URL url = new URL("http://www.google.com/");
    InputStream stream = url.openConnection().getInputStream();
    stream.read(bytes);
    out.write(new String(bytes));
    out.close();
  2. #2
  3. Contributing User
    Devshed Expert (3500 - 3999 posts)

    Join Date
    Aug 2010
    Location
    Eastern Florida
    Posts
    3,713
    Rep Power
    348
    How many bytes were read by the read() method? Save and print the value it returns.

    What was read from the site?
  4. #3
  5. Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2013
    Posts
    3
    Rep Power
    0
    Originally Posted by NormR
    How many bytes were read by the read() method? Save and print the value it returns.

    What was read from the site?
    The value returned was 26, indicating that only 26 bytes were read. Why is this happening? The contents of the file are as follows:
    Code:
    <!doctype html><html items
    There were roughly 1000 trailing spaces.
  6. #4
  7. Contributing User
    Devshed Expert (3500 - 3999 posts)

    Join Date
    Aug 2010
    Location
    Eastern Florida
    Posts
    3,713
    Rep Power
    348
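    Put the read() in a loop and keep reading until it returns -1, which indicates the end of the stream. A single read() call is only guaranteed to return the bytes that are available at that moment, so it can return fewer bytes than the buffer holds.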

    There weren't 1000 trailing spaces. What you see are the unused parts of the input buffer. 26 bytes were read into the beginning of the buffer, and the rest of the buffer was left untouched, holding binary 0s, not spaces. Because new String(bytes) converts the whole 1024-byte array, those 0s end up in the file. See the String class's constructor String(byte[] bytes, int offset, int length) for how to create a String from part of an array.
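The advice above might be sketched as follows. This is only an illustration, not the poster's final code: `PartialReadDemo`, `readFully` and `trickle` are hypothetical names, `trickle` stands in for the network connection so the example runs offline, and UTF-8 is an assumed encoding (the thread does not say which charset the page uses).

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class PartialReadDemo {

    /**
     * Hypothetical helper: keeps calling read() until the buffer is full
     * or read() returns -1 (end of stream). Returns the number of bytes stored.
     */
    static int readFully(InputStream in, byte[] buffer) throws IOException {
        int total = 0;
        while (total < buffer.length) {
            // Read into the unused tail of the buffer.
            int n = in.read(buffer, total, buffer.length - total);
            if (n == -1) {
                break; // end of stream
            }
            total += n;
        }
        return total;
    }

    /**
     * Hypothetical stand-in for a network stream: hands out at most
     * `chunk` bytes per read() call, the way a socket may deliver data in pieces.
     */
    static InputStream trickle(byte[] data, final int chunk) {
        return new FilterInputStream(new ByteArrayInputStream(data)) {
            @Override
            public int read(byte[] b, int off, int len) throws IOException {
                return super.read(b, off, Math.min(len, chunk));
            }
        };
    }

    public static void main(String[] args) throws IOException {
        byte[] page = "<!doctype html><html><body>hello</body></html>"
                .getBytes(StandardCharsets.UTF_8);
        byte[] buffer = new byte[1024];

        int count = readFully(trickle(page, 26), buffer);

        // Convert only the bytes actually read, not the zero-filled remainder.
        String text = new String(buffer, 0, count, StandardCharsets.UTF_8);
        System.out.println(count + " bytes read: " + text);
    }
}
```

In the original program, the same loop would be applied to the stream returned by url.openConnection().getInputStream(), and the String built from the first `count` bytes would be passed to out.write().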
