#1 Sarcky
    Devshed Supreme Being (6500+ posts)
    Join Date: Oct 2006 | Location: Pennsylvania, USA | Posts: 10,908 | Rep Power: 6352

    Recovering jpegs with corrupt headers


    Well here's a tricky one for you guys.

    My ex-wife "backed up" her laptop to my desktop computer via FTP before formatting it. However, she somehow managed to transfer the files in a format that destroyed the first X bytes of each file. The headers are gone from every file she sent to my computer, so I have about 3 gigs of corrupted files, most notably nearly a gig of photos.

    Since the headers are bad but the DATA is still good, I know I can recover them, even if it means losing the EXIF metadata. However, I haven't been able to figure out exactly how.

    The "bad" images generally start about 50-100 bytes into their header; they're missing the beginning bit, which is why they don't register as valid JPEGs.
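    A quick way to sort intact files from damaged ones is to look at the first bytes: a normal JPEG starts with the Start of Image (SOI) marker 0xFF 0xD8, immediately followed by the 0xFF of the next marker. The sketch below assumes that rule; the function names and the directory argument are hypothetical, and it only inspects the first three bytes, so a file that starts correctly but is damaged later will still pass.

```php
<?php
// Hypothetical helper: true if the buffer starts with the JPEG SOI marker
// (0xFF 0xD8) followed by the 0xFF that opens the next marker.
function looksLikeIntactJpeg(string $bytes): bool
{
    return strncmp($bytes, "\xFF\xD8\xFF", 3) === 0;
}

// Hypothetical helper: list the *.jpg files in $dir whose first three bytes
// are not a JPEG start. Only the first bytes are read from each file.
function findDamagedJpegs(string $dir): array
{
    $damaged = [];
    foreach (glob($dir . '/*.jpg') ?: [] as $path) {
        $head = (string) file_get_contents($path, false, null, 0, 3);
        if (!looksLikeIntactJpeg($head)) {
            $damaged[] = $path;
        }
    }
    return $damaged;
}
```

    Running findDamagedJpegs() over the recovered folder gives the list of files worth trying the repairs below on.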

    The following PHP script will, in many cases, take a "good" header off a known-good JPEG file, slap it on top of a "bad" file, and produce either a thumbnail or the top "stripe" of the "good" image followed by junk data. It's better than nothing, but there are HUGE high-res pictures here, and I want to see them all.


    PHP Code:
    <?php
    // Read the known-good JPEG in one go.
    $data = file_get_contents('good.jpg');
    if ($data === false) {
        die("Could not read good.jpg\n");
    }

    //The end of the jpeg header is 0x1fe0
    //CLEARLY THIS ISN'T RIGHT
    $marker = "\x1f\xe0";

    // Everything before the first 0x1F 0xE0 byte pair is treated as the "header".
    // If the pair never appears, the whole file is used.
    $end = strpos($data, $marker);
    $headers = ($end === false) ? $data : substr($data, 0, $end);

    echo "\n\nHeaders captured (" . strlen($headers) . " bytes)...\n\n";

    //headers from the good file captured, get everything AFTER the headers from the bad file:
    $badFilename = 'badImage.jpg';
    $fixedFilename = 'badImage_FIXED.jpg';
    $data = file_get_contents($badFilename);
    if ($data === false) {
        die("Could not read {$badFilename}\n");
    }

    // Keep everything from the first 0x1F 0xE0 pair onward (nothing if it is absent).
    $start = strpos($data, $marker);
    $fileData = ($start === false) ? '' : substr($data, $start);

    echo "\n\nImage data captured (" . strlen($fileData) . " bytes)...\n\n";

    echo "Attempting to make a new file at {$fixedFilename}...\n";
    //make the new file with the known good headers:
    file_put_contents($fixedFilename, $headers . $fileData);
    echo "Fixed file saved into {$fixedFilename}\n\n";
    Any help is appreciated. I think I just need the correct value for "end of header".

    -Dan
    HEY! YOU! Read the New User Guide and Forum Rules

    "They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." -Benjamin Franklin

    "The greatest tragedy of this changing society is that people who never knew what it was like before will simply assume that this is the way things are supposed to be." -2600 Magazine, Fall 2002

    Think we're being rude? Maybe you asked a bad question or you're a Help Vampire. Trying to argue intelligently? Please read this.

#2 Jealous Moderator
    Devshed Supreme Being (6500+ posts)
    Join Date: Mar 2007 | Location: Washington, USA | Posts: 14,303 | Rep Power: 9400

    Dan sent me three example images: one I could recover and two I could not...

    The first one had a couple of problems: a bunch of leading 0x00 bytes, which are trivial to strip out, and an embedded thumbnail. Removing the null bytes makes only the thumbnail visible.
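    Stripping the leading null bytes really is a one-liner with ltrim(). A minimal sketch, with hypothetical function names and paths, that removes only the leading run of 0x00 bytes (zeros inside the image data are left alone) and writes the result to a new file:

```php
<?php
// Hypothetical helper: drop the run of 0x00 bytes at the very start of a buffer.
// NUL bytes later in the data are left untouched.
function stripLeadingNulls(string $bytes): string
{
    return ltrim($bytes, "\0");
}

// Hypothetical file-level wrapper: writes the stripped copy to $outPath and
// returns how many leading NUL bytes were removed, or -1 on a read/write error.
function stripLeadingNullsFromFile(string $inPath, string $outPath): int
{
    $data = file_get_contents($inPath);
    if ($data === false) {
        return -1;
    }
    $clean = stripLeadingNulls($data);
    if (file_put_contents($outPath, $clean) === false) {
        return -1;
    }
    return strlen($data) - strlen($clean);
}
```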

    Removing the thumbnail is possible with a bit of knowledge of JPEG syntax. A JPEG file can contain multiple images, as well as some junk data that decoders ignore. Each image begins with a 0xFF 0xD8 marker and ends with a 0xFF 0xD9 marker. Since the thumbnail was stored first*, you can strip out any leading junk data, the first image, and any remaining junk data up to the next 0xFF 0xD8.
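    The thumbnail-removal steps above can be sketched as a plain byte search, under the same assumption that the thumbnail is the first complete 0xFF 0xD8 ... 0xFF 0xD9 image in the file. The function name is hypothetical, and because it does not parse segments, a stray FF D8 or FF D9 pair inside metadata could fool it, so treat its output as a candidate to inspect rather than a guaranteed fix.

```php
<?php
// Hypothetical helper implementing the steps described above:
// skip junk up to the first SOI (FF D8), skip that first image through its
// EOI (FF D9), skip any junk up to the next SOI, and keep everything from there.
// Returns null if any of the expected markers is missing.
function dropLeadingThumbnail(string $bytes): ?string
{
    $soi = "\xFF\xD8";
    $eoi = "\xFF\xD9";

    $firstStart = strpos($bytes, $soi);
    if ($firstStart === false) {
        return null;
    }
    $firstEnd = strpos($bytes, $eoi, $firstStart + 2);
    if ($firstEnd === false) {
        return null;
    }
    $nextStart = strpos($bytes, $soi, $firstEnd + 2);
    if ($nextStart === false) {
        return null;
    }
    return substr($bytes, $nextStart);
}
```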

    The other two images looked to be cut off in the middle of the image data itself. While you can probably add back any missing header information**, you can't add back the missing image data, and because that data is compressed you can't even reverse engineer it for some good guesses. At least not by hand in a hex editor, which is what I had to work with.


    * Which makes sense if you think about it. The thumbnail comes first, so any simple image decoder will see it first and display it, while more detailed ones will realize there are two images with different sizes.

    ** Like by using the headers from another image from the same camera. If that's not possible, you could guess at the image size and densities; while there are a couple of data tables needed for decoding (quantization and Huffman tables), the JPEG standard defines "typical" values that are worth a shot.

#3 CSS & JS/DOM Adept
    Devshed Supreme Being (6500+ posts)
    Join Date: Jul 2004 | Location: USA (verifiably) | Posts: 20,131 | Rep Power: 4304

    Hi Dan. I stumbled across this yesterday (while looking for something else) and wondered if you had considered trying anything like it: http://www.wondershare.com/data-reco...-recovery.html
    Spreading knowledge, one newbie at a time.

    Remember people spend most of their time on other people's sites (so don't violate web design conventions).

#4 Sarcky
    Devshed Supreme Being (6500+ posts)
    Join Date: Oct 2006 | Location: Pennsylvania, USA | Posts: 10,908 | Rep Power: 6352

    Tried it. It gets some and not others. Basically it gets the same ones requinix was able to get.

    It seems the problem is that the files either:
    1) Have random junk crap at the beginning, usually zeroes, and also corrupt headers/thumbnails.
    2) Have the first X kb cut off.

    In the case of (1) I can just go through the binary of the file and chop off the built-in thumbnail, revealing the real picture.

    In the case of (2) it seems I'm screwed. The absolute best case I can hope for is to manually guess the dimensions of every one of those images, make up a fake header with those sizes, then pad the file with junk data until it is exactly the right size. Even then, the result will be a file with static across the top.
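    If you do go the fake-header route, the dimensions live in the Start of Frame segment (SOF0/SOF1/SOF2, markers 0xFF 0xC0 to 0xFF 0xC2), laid out as marker, 2-byte length, 1-byte precision, 2-byte height, 2-byte width. A sketch, assuming a donor header from the same camera that begins with SOI and uses ordinary length-prefixed segments; the function name is hypothetical, and patching the size does nothing to bring back data that was cut off.

```php
<?php
// Hypothetical helper: return a copy of a donor JPEG header with the image height
// and width in its Start of Frame segment (markers FF C0..FF C2) overwritten.
// Assumes the header begins with SOI (FF D8) and uses length-prefixed segments.
function patchFrameSize(string $header, int $width, int $height): ?string
{
    $pos = 2; // skip the SOI marker
    $len = strlen($header);
    while ($pos + 4 <= $len && $header[$pos] === "\xFF") {
        $marker = ord($header[$pos + 1]);
        // Segment length is big-endian and counts its own two bytes.
        $segLen = unpack('n', substr($header, $pos + 2, 2))[1];
        if ($marker >= 0xC0 && $marker <= 0xC2) {
            // Layout: FF Cx, length(2), precision(1), height(2), width(2), ...
            if ($pos + 9 > $len) {
                return null; // SOF segment truncated
            }
            return substr_replace($header, pack('nn', $height, $width), $pos + 5, 4);
        }
        $pos += 2 + $segLen;
    }
    return null; // no SOF segment found
}
```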

#5 Registered User
    Devshed Newbie (0 - 499 posts)
    Join Date: Oct 2013 | Posts: 2 | Rep Power: 0

    JPEG, unlike some other file formats, doesn't really have a fixed file header, just a "start of image" (SOI) marker followed by a series of marker-delimited sections, with some rules about their order. One of these sections is the EXIF data, one is the compressed image data, and there may be others. What we want to do is remove everything from the start of each file up to the start of the image data, and then save the file.
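    In JPEG terms, the "start of the image data" is just past the first Start of Scan (SOS, 0xFF 0xDA) segment header; everything before it is markers and tables, everything after it is compressed data. A sketch that finds that offset by walking the length-prefixed segments, assuming the buffer begins with SOI (so it suits a good donor file, not one whose first bytes are gone). Both function names are hypothetical.

```php
<?php
// Hypothetical helper: return the byte offset where compressed scan data begins,
// i.e. just past the first Start of Scan (SOS, FF DA) segment header.
// Walks length-prefixed segments from the SOI marker; returns null if the
// buffer does not start with SOI or the segment structure looks broken.
function scanDataOffset(string $jpeg): ?int
{
    if (strncmp($jpeg, "\xFF\xD8", 2) !== 0) {
        return null;
    }
    $pos = 2;
    $len = strlen($jpeg);
    while ($pos + 4 <= $len) {
        if ($jpeg[$pos] !== "\xFF") {
            return null; // expected a marker here
        }
        $marker = ord($jpeg[$pos + 1]);
        if ($marker === 0xFF) { // fill byte before a marker
            $pos++;
            continue;
        }
        $segLen = unpack('n', substr($jpeg, $pos + 2, 2))[1];
        if ($marker === 0xDA) {
            $end = $pos + 2 + $segLen;
            return $end <= $len ? $end : null;
        }
        $pos += 2 + $segLen;
    }
    return null;
}

// Hypothetical helper: split a JPEG into [header, compressed data].
function splitAtScanData(string $jpeg): ?array
{
    $offset = scanDataOffset($jpeg);
    return $offset === null ? null : [substr($jpeg, 0, $offset), substr($jpeg, $offset)];
}
```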
    Once all of the files are stripped down to image data, open a file containing only a good header and append one image-data file to the end of it. Save that file, then take a fresh copy of the header and append the next image's data, and so on.
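    The header-plus-data loop described above could look like the sketch below. It assumes the header file ends exactly where the recovered data should begin and that the donor image shares the same dimensions, tables, and sampling as the damaged ones; if not, the output will decode as garbage. The function names, directory layout, and _FIXED suffix are hypothetical, and appending an End of Image marker (0xFF 0xD9) when missing is an extra step not mentioned in the thread.

```php
<?php
// Hypothetical helper: glue a known-good header onto recovered image data,
// adding an End of Image marker (FF D9) if the data does not already end with one.
function spliceHeader(string $header, string $imageData): string
{
    $out = $header . $imageData;
    if (substr($out, -2) !== "\xFF\xD9") {
        $out .= "\xFF\xD9";
    }
    return $out;
}

// Hypothetical batch loop: for every file in $dataDir, write
// header + data to $outDir/<name>_FIXED.jpg. Returns the number of files written.
function spliceAll(string $headerFile, string $dataDir, string $outDir): int
{
    $header = file_get_contents($headerFile);
    if ($header === false) {
        return 0;
    }
    $count = 0;
    foreach (glob($dataDir . '/*') ?: [] as $path) {
        if (!is_file($path)) {
            continue; // skip subdirectories
        }
        $data = file_get_contents($path);
        if ($data === false) {
            continue;
        }
        $target = $outDir . '/' . pathinfo($path, PATHINFO_FILENAME) . '_FIXED.jpg';
        if (file_put_contents($target, spliceHeader($header, $data)) !== false) {
            $count++;
        }
    }
    return $count;
}
```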
