April 22nd, 2013, 01:08 AM
Downloading HTML src of webpage only returning a few characters
I'm constructing a basic web crawler for a school project and ran into a weird problem right at the start. The intention is for 1 KB of plain HTML to be written to "file.txt". Instead, only a few characters are read and the rest of the file is full of whitespace. How can this be? Code below:
byte[] bytes = new byte[1024];
FileWriter fstream = new FileWriter("file.txt");
BufferedWriter out = new BufferedWriter(fstream);
URL url = new URL("http://www.google.com/");
InputStream stream = url.openConnection().getInputStream();
stream.read(bytes);
out.write(new String(bytes));
out.close();
April 22nd, 2013, 07:31 AM
How many bytes were read by the read() method? Save and print the value it returns.
What was read from the site?
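In other words, capture the value read() returns instead of ignoring it. A quick sketch of what I mean (using an in-memory stream in place of the URL stream, and ReadCheck as a throwaway class name, just for illustration):

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;

public class ReadCheck {
    public static void main(String[] args) throws Exception {
        // An in-memory stream stands in for the URL stream here
        InputStream stream = new ByteArrayInputStream("<!doctype html>".getBytes());
        byte[] bytes = new byte[1024];
        int count = stream.read(bytes);  // save the return value...
        System.out.println("read() returned: " + count);  // ...and print it
    }
}
```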
April 22nd, 2013, 11:32 AM
The value returned was 26, indicating only 26 bytes were read. Why is this happening? The contents of the file are as follows:

<!doctype html><html items

followed by roughly 1000 trailing spaces.
April 22nd, 2013, 12:02 PM
Put the read() in a loop and continue reading until it returns -1, indicating the end of the stream.

There weren't 1000 trailing spaces. What you see is the unused part of the input buffer: 26 characters were read into the beginning of the buffer, and the rest of the buffer was left filled with binary 0s, not spaces. See the String class's constructors for how to create a String from part of an array.