[Image: Asobi Seksu album cover]

Been having a sort of chronic problem with my phone over the past couple weeks where songs would skip. Didn't think it was too big of a deal, but while I was transferring some more music over to it the other day, my pictures somehow all disappeared. I take fairly frequent backups, but this caught a few that weren't on the memory card. I checked dmesg, saw end_request: I/O error, dev sdg, sector 48728, and knew that I was probably in for some problems.

Hoping to salvage something, I made a dump of the volume via dd if=/dev/sdg of=phonestick, and set to work trying to figure out how I could read through the dump. I came upon a helpful page with jpeg header information and grepped for the ascii JFIF marker to make sure there were some recognizable jpeg files in there.
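For anyone who would rather do that sanity check from python than with grep, here is a minimal sketch that counts JFIF markers in a dump. The function name `count_marker` and its parameters are made up for illustration; it assumes the dump is an ordinary file and reads it in chunks so it never has to hold the whole thing in memory.

```python
def count_marker(path, marker=b'JFIF\x00', chunk_size=4 * 1024 * 1024):
    """Count occurrences of `marker` in the file at `path`, reading in chunks."""
    count = 0
    tail = b''
    keep = len(marker) - 1
    with open(path, 'rb') as f:
        while True:
            data = f.read(chunk_size)
            if not data:
                break
            buf = tail + data
            count += buf.count(marker)
            # carry over the last len(marker)-1 bytes so a marker split across
            # two chunks is still seen; that tail is too short to hold a whole
            # marker, so nothing gets counted twice
            tail = buf[-keep:] if keep else b''
    return count

if __name__ == '__main__':
    import sys
    if len(sys.argv) > 1:
        print(count_marker(sys.argv[1]))
```

A non-zero count only says there is something JFIF-shaped in the dump, not that whole images survived.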

(Un)fortunately for me, I have some experience with disgusting byte-level hackery in python, and decided to take a crack at extracting all of the jpeg images I could find. In the end, a very simple and straightforward algorithm ended up working surprisingly well:

#!/usr/bin/env python

chunk = 1048576 * 4

# http://www.obrador.com/essentialjpeg/headerinfo.htm
# byte literals, so the comparisons work against data read in binary mode
start_of_image = soi = b'\xff\xd8\xff\xe0'
jfif_id = b'JFIF\x00'
define_quant_marker = b'\xff\xdb'
define_huffman_marker = b'\xff\xc4'
frame_marker = b'\xff\xc0'
scan_marker = b'\xff\xda'
comment_marker = b'\xff\xfe'
end_of_image = eoi = b'\xff\xd9'

def extra_check(string):
    """An extra check to make sure we're looking at a jpeg file..."""
    return jfif_id in string[:11]

def slice_image(img):
    """Find the EOI marker assuming we are at the beginning of a jpeg file."""
    loc = 0
    for marker in (define_quant_marker, define_huffman_marker, frame_marker,
                   scan_marker, comment_marker):
        found = img.find(marker, loc)
        if found != -1:
            # only advance past markers that are actually there; find()
            # returns -1 on a miss, which would restart the search at the end
            loc = found
    eoi_loc = img.find(end_of_image, loc)
    if eoi_loc == -1:
        # no EOI in this buffer; hand back everything we have
        return img
    return img[:eoi_loc+2]

def generate_jpeg_files(f):
    """A generator that spits out byte strings that match jpeg files.  `f` is
    a file opened in binary mode."""
    s = b''
    while True:
        while soi not in s:
            data = f.read(chunk)
            if not data:
                # end of file, nothing left to scan
                return
            # keep the last 3 bytes in case an SOI straddles two chunks
            s = s[-3:] + data
        img_loc = s.find(soi)
        img = s[img_loc:]
        if len(img) < 11:
            extra = f.read(chunk)
            img += extra
            s += extra
        if not extra_check(img):
            # hmm.. it wasn't a jpeg after all, continue
            s = s[img_loc+1:]
            continue
        image = slice_image(img)
        s = s[img_loc + len(image):]
        yield image


def find_all_images(filename, threshold=None):
    with open(filename, 'rb') as f:
        for num, img in enumerate(generate_jpeg_files(f)):
            # stop once `threshold` images have been written
            if threshold is not None and num >= threshold:
                break
            with open('potential_image_%04d.jpg' % num, 'wb') as ifile:
                ifile.write(img)

if __name__ == '__main__':
    import optparse
    parser = optparse.OptionParser(usage='%prog [opts] filename', version='1.0')
    parser.add_option('-t', '--threshold', help='maximum number of image files to extract')
    opts, args = parser.parse_args()
    threshold = int(opts.threshold) if opts.threshold else None
    find_all_images(args[0], threshold)

You can also download the script above; I see no reason why it shouldn't run on any platform, although I can't attest to its endian-safeness. The algorithm is pretty basic: it searches for the SOI marker, then iterates through the rest of the markers until it finds what looks like the EOI marker, then spits that out as a file. The chunk size can be adjusted to your needs; I wasn't going to have anything much bigger than 1 meg on there, so I wasn't too worried about having to read lots of chunks for one image.
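If your images might be bigger than one chunk, one way around it is to keep reading until an EOI shows up. What follows is only a sketch of that idea, not part of the script: `read_until_eoi`, `max_size` and the tuple it returns are all hypothetical names, and it takes the shortcut of stopping at the first EOI it sees rather than walking the other markers first, so an embedded thumbnail could end an image early.

```python
EOI = b'\xff\xd9'  # JPEG end-of-image marker

def read_until_eoi(f, buf, chunk_size=4 * 1024 * 1024,
                   max_size=32 * 1024 * 1024):
    """Grow `buf` (bytes starting at an SOI marker) by reading from the binary
    file `f` until it contains an EOI marker, the file ends, or `buf` reaches
    `max_size`.  Returns (image, leftover); image is None if no EOI turned up."""
    search_from = 0
    while True:
        eoi_loc = buf.find(EOI, search_from)
        if eoi_loc != -1:
            end = eoi_loc + len(EOI)
            return buf[:end], buf[end:]
        if len(buf) >= max_size:
            # give up rather than swallow the rest of the dump
            return None, buf
        # resume one byte before the end so an EOI split across reads is found
        search_from = max(len(buf) - 1, 0)
        data = f.read(chunk_size)
        if not data:
            return None, buf
        buf += data
```

The `max_size` cap is there so that a false-positive SOI in the middle of an mp3 doesn't drag the whole rest of the card into one "image".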

In the end, it worked really well and found 63 images on my phone card. Unfortunately, almost all of them were from mp3 ID3 tags, and the rest were from the phone's built-in themes. It's entirely possible that I deleted that directory accidentally and then nuked the storage space with my music transfer.

Jun 16 2009