Hangouts Takeout - need example datasets (script for pseudonymization inside)

phry

Member
Nov 23, 2007
Hey there,
I'm currently working on an app to export data from Google Hangouts using the Takeout JSON file.

Unfortunately I haven't used Hangouts very thoroughly myself, so I need some more example datasets from other users.

If you want to help me, head over to https://www.google.com/settings/takeout#custom:chat and get a copy of your chat history.

Then you can use the following script to pseudonymize your data. Since I can only pseudonymize fields I know of, PLEASE go through the file afterwards and check whether any information leaked before sending it to me.

If you find data that is not pseudonymized by the script, please let me know!
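
For the manual check, something like the following might help. It is only a rough sketch and certainly not exhaustive: it assumes the pseudonymized copy is named Hangouts.json.pseudo, and the helper names find_emails and list_keys are made up for this example. It lists e-mail-like strings that are still in the file and counts how often each JSON key occurs, so fields the script doesn't know about stand out.

```shell
#!/bin/bash
# Rough leak check for the pseudonymized file (a sketch, not exhaustive).
# Assumption: the pseudonymized copy is named Hangouts.json.pseudo.

# Print e-mail-like strings that are still present in the file.
find_emails() {
    grep -Eo '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}' "$1" | sort -u
}

# Print every JSON key together with the number of times it occurs,
# most frequent first, so unfamiliar fields are easy to spot.
list_keys() {
    grep -Eo '"[A-Za-z0-9_]+"[[:space:]]*:' "$1" | tr -d '": \t' | sort | uniq -c | sort -rn
}

FILE="${1:-Hangouts.json.pseudo}"
if [ -f "$FILE" ]; then
    echo "== e-mail-like strings =="
    find_emails "$FILE"
    echo "== keys by frequency =="
    list_keys "$FILE"
fi
```

Anything that shows up in the first list, or any key in the second list whose values look personal, is worth reporting.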

Code:
#!/bin/bash
FILE="Hangouts.json"
#work on a copy so the original export stays untouched
cp "$FILE" "$FILE.pseudo"
FILE="$FILE.pseudo"

#print a random 8-digit hex string
function randstr(){
        perl -e 'printf "%08X\n", rand(0xffffffff);'
}

#split lists on newlines only, so names containing spaces stay in one piece
O=$IFS
IFS=$'\n'

#pseudonymize all gaia_ids (and chat_ids of the same value)
GAIA_IDS=$(perl -ne '/"gaia_id"\s*:\s*"(?!pseudo:)([^"]+)"/ and print "$1\n"' "$FILE" | sort -u)
for ID in $GAIA_IDS; do
        PSEUDO="pseudo:$(randstr)"
        #pass values via the environment and match them literally (\Q..\E)
        ID="$ID" PSEUDO="$PSEUDO" perl -pi -e 's/"\Q$ENV{ID}\E"/"$ENV{PSEUDO}"/g' "$FILE"
done

#as far as I've seen gaia_id equals chat_id, but if that's not always the case, let's pseudonymize those, too
CHAT_IDS=$(perl -ne '/"chat_id"\s*:\s*"(?!pseudo:)([^"]+)"/ and print "$1\n"' "$FILE" | sort -u)
for ID in $CHAT_IDS; do
        PSEUDO="pseudo:$(randstr)"
        #pass values via the environment and match them literally (\Q..\E)
        ID="$ID" PSEUDO="$PSEUDO" perl -pi -e 's/"\Q$ENV{ID}\E"/"$ENV{PSEUDO}"/g' "$FILE"
done


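#pseudonymize all fallback_names (the pattern also accepts escaped quotes inside a name)
FALLBACK_NAMES=$(perl -ne '/"fallback_name"\s*:\s*"(?!pseudoname:)((?:[^"\\]|\\.)+)"/ and print "$1\n"' "$FILE" | sort -u)
for ID in $FALLBACK_NAMES; do
        PSEUDO="pseudoname:$(randstr)"
        #names may contain @, / or regex metacharacters, so match them literally (\Q..\E)
        ID="$ID" PSEUDO="$PSEUDO" perl -pi -e 's/"\Q$ENV{ID}\E"/"$ENV{PSEUDO}"/g' "$FILE"
done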

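#blank out message text and URLs; the value pattern stops at the closing quote instead of running to the last quote on the line
perl -pi -e 's/"(text|display_url|link_target|url|image_url)"\s*:\s*"(?:[^"\\]|\\.)*"/"$1" : "ANONYMIZED_DATA"/g' "$FILE"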
IFS=$O

Thanks everyone!