I've been working on an application that pulls together communications from various sources, including WhatsApp (others include messages from assorted websites, SMS, email, Skype and MSN), into a single database so that they can be displayed as unified threads. This script has been very useful in understanding the WhatsApp database format. Thanks!
I have noticed that the thumb_image field isn't decoded by the Python script, which is a shame because it contains (at least on Android, which is what I'm looking at) the path to the media file. Using it would eliminate the need for all of the code that tries to match up media files based on the date plus or minus a couple of days!
The magic number at the start of the field, 0xACED, indicates that thumb_image is a Java object serialization stream. The contents can be decoded fairly easily using the format described in the Java docs (just Google for "Java Object Serialization Stream Protocol"; sadly, I can't post a link as a newbie!). You'll see that one of the data items towards the end is the path of the media file.
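For anyone who wants to try this from Python, here is a rough sketch of the idea. It is not a port of my PHP decoder and not a full parser for the protocol: it only checks the stream header and then scans heuristically for short string records (type code 0x74 followed by a two-byte big-endian length), so it may occasionally pick up false positives. The function names and the sample path are made up for illustration.

```python
import struct

# Constants from the Java Object Serialization Stream Protocol.
STREAM_MAGIC = 0xACED
STREAM_VERSION = 5
TC_STRING = 0x74  # type code that introduces a short string


def is_java_serialized(blob: bytes) -> bool:
    """Return True if blob starts with the serialization stream header."""
    if len(blob) < 4:
        return False
    magic, version = struct.unpack(">HH", blob[:4])
    return magic == STREAM_MAGIC and version == STREAM_VERSION


def candidate_strings(blob: bytes) -> list:
    """Heuristically collect the short strings embedded in the stream.

    This is a scan, not a real parse: every 0x74 byte is treated as a
    possible string marker and kept only if the bytes that follow form
    printable text of the stated length. Java writes "modified UTF-8",
    which is identical to UTF-8 for plain ASCII paths (an assumption
    made here).
    """
    found = []
    i = 4  # skip the 4-byte stream header
    while i + 3 <= len(blob):
        if blob[i] == TC_STRING:
            (length,) = struct.unpack(">H", blob[i + 1:i + 3])
            raw = blob[i + 3:i + 3 + length]
            if length > 0 and len(raw) == length:
                try:
                    text = raw.decode("utf-8")
                except UnicodeDecodeError:
                    text = None
                if text is not None and text.isprintable():
                    found.append(text)
                    i += 3 + length  # jump past the string just read
                    continue
        i += 1
    return found


# Synthetic example blob (hypothetical path, not taken from a real database).
sample = b"\xac\xed\x00\x05" + b"t" + struct.pack(">H", 13) + b"Media/IMG.jpg"
if is_java_serialized(sample):
    print(candidate_strings(sample))
```

On a real thumb_image value the media path should be among the strings returned, probably near the end of the list, but I'd check that against your own data before relying on it.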
Hope that helps / is of interest!
(My application exports from SQLite, imports into MySQL and then uses PHP to decode into HTML, hence my thumb_image decoder is written in PHP rather than Python; otherwise I'd have given you some source to do it!)