Find mass deletions to free up Gmail space

Skip to the final script for a way to find the largest groups of emails in your inbox. I freed up about 10% of my space just by deleting old notification and newsletter emails.

Cleaning up my Inbox

My Gmail inbox is getting full

/images/gmail_cleanup/gmail_space_indicator.png

and Google often lets me know:

/images/gmail_cleanup/gmail_space_warning.png

Google offers some help cleaning up the cruft, but it's not really in their interest to make the tools too useful.

/images/gmail_cleanup/gmail_cleanup_tool.png

Their list lets you delete emails one by one and is similar to searching larger:10M in Gmail. You'll eventually free up space that way, but can we do this faster? What about all those newsletters, notifications from LinkedIn, messages from Facebook, and so on, which are small individually but together take up megabytes of space?

I have tons of emails like these in my inbox:

LinkedIn     <Person> shared their thoughts
Amazon.com   Your Amazon.com order ...
Credit Card  You are eligible ...

There is no way to aggregate this sort of data in the Gmail UI. Here is how we can accomplish this:

  1. Export email data to an .mbox file using Google Takeout

  2. Analyze emails and categorize by sender or subject using Python

  3. Delete the largest useless groupings

I've written the code for step 2 already; let's see how it works:

Opening up the mbox file

Luckily, Python has the built-in mailbox library for reading .mbox files.

import mailbox

mbox_path = "path/to/takeout.mbox"  # placeholder: point this at your Takeout export
mbox = mailbox.mbox(mbox_path)
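One thing to watch for when opening the file: by default, mailbox.mbox creates an empty mailbox if the path does not exist, so a typo in the path can look like an empty inbox. A minimal sketch of a stricter opener follows; the helper name open_mbox is my own and not part of the original script.

```python
import mailbox
import os


def open_mbox(path):
    """Open an existing .mbox file, failing loudly if it is missing.

    open_mbox is a hypothetical helper name used for illustration.
    """
    if not os.path.isfile(path):
        raise FileNotFoundError(f"No mbox file at {path!r}")
    # create=False stops the mailbox module from creating an empty file.
    return mailbox.mbox(path, create=False)
```

With this in place, mbox = open_mbox(mbox_path) behaves like the snippet above but raises immediately on a wrong path.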

Iterating over emails

Iterating over emails is similarly easy. There are tons of ways to query each message. We will just consider From and Subject, as those are enough to identify most newsletters and other automated emails.

for message in mbox:
    print(message["From"])
    print(message["Subject"])

It can take quite some time to open up the mbox file, but eventually we get output like:

AT&T Account Management <update@emaildl.att-mail.com>
Your AT&T online bill is ready to be viewed
Venmo <venmo@venmo.com>
X Y paid your $10.00 request
...

Gather Statistics

Another Python built-in, collections.Counter, makes this easy as well:

from collections import Counter
sender_counts = Counter()
sender_sizes = Counter()
for message in mbox:
    # TODO: normalize sender, this field looks like "Name <email@domain.com>"
    sender = message["From"]
    sender_counts[sender] += 1
    sender_sizes[sender] += len(message.as_bytes())
    # ... repeat for other metrics
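The TODO about normalizing the sender could be handled with the standard library's email.utils.parseaddr, which splits "Name <email@domain.com>" into a name and an address. This is a sketch of one way to do it, not necessarily what the final script does; the function names normalize_sender and sender_domain are my own.

```python
from email.utils import parseaddr


def normalize_sender(from_header):
    """Return the lowercased address part of a From header, or "" if absent."""
    if from_header is None:
        return ""
    # str() covers the case where the header arrives as a Header object.
    _name, address = parseaddr(str(from_header))
    return address.lower()


def sender_domain(from_header):
    """Return the part after the last "@" of the normalized address."""
    address = normalize_sender(from_header)
    return address.rpartition("@")[2] if "@" in address else ""
```

In the loop above, sender = normalize_sender(message["From"]) would then group "Venmo <venmo@venmo.com>" under venmo@venmo.com even if the display name changes, and sender_domain gives a key for per-domain counts.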

Final results

The final result is saved in this gist. I added some niceties after running it on a larger amount of data:

  • Nicer print formatting

  • Progress indication

  • Remove sender name from the 'From' field because some senders change their name
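As an illustration of the print formatting, here is a rough sketch of how a Counter could be turned into aligned lines. The helper name format_top, the 35-character column width, and the choice of 1 MB = 1,000,000 bytes are all assumptions on my part; the gist may do this differently.

```python
from collections import Counter


def format_top(counter, n=10, width=35, as_megabytes=False):
    """Return one aligned line per entry for the n largest values.

    width and the bytes-to-MB divisor (1e6) are assumed values.
    """
    lines = []
    for key, value in counter.most_common(n):
        # Truncate long keys and pad short ones so the colons line up.
        label = str(key)[:width].ljust(width)
        shown = f"{value / 1e6:.6f} MB" if as_megabytes else str(value)
        lines.append(f"{label} : {shown}")
    return lines
```

Calling print("\n".join(format_top(sender_counts))) prints the most common senders, and passing as_megabytes=True does the same for a Counter of byte totals such as sender_sizes.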

The final output from one of my mailboxes is:

Parsing mbox... this could take some time
1000 / 94992
...

Most common senders:
<me>@gmail.com                      : 5946
store-news@amazon.com               : 2099
notification+hk7u3_-m@facebookmail. : 1684
no-reply@woot.com                   : 1598
venmo@venmo.com                     : 1311
<person>@<school>.edu               : 1279
...


Most common sender domains:
gmail.com             : 16903
<school>.edu          : 14562
<school>.edu          : 10875
amazon.com            : 5067
facebookmail.com      : 2336
linkedin.com          : 1887
woot.com              : 1784
venmo.com             : 1312
...

Most common subjects:
                                    : 1097
=?utf-8?Q?Woot=20Daily=20Digest?=   : 993
Re:                                 : 803
=?utf-8?Q?Tools.Woot=20Daily=20Dige : 506
Alistair, please add me to your Lin : 322
...

Top summed-size subjects:
                                    : 997.018524 MB
Re:                                 : 667.143983 MB
Fwd:                                : 138.824105 MB
no subject                          : 128.076811 MB
You have a postcard from <person>   : 118.236279 MB
Re:                                 : 111.424229 MB
...

Top summed-size senders:
<me>@gmail.com              : 2941.191349 MB
<person>@<school>.edu       : 475.729489 MB
...
<person>@gmail.com          : 173.501381 MB
store-news@amazon.com       : 157.440193 MB
<various>@facebookmail.com  : 55.942341 MB
...

Summed size: 10526.607168 MB

These results are a bit skewed because I had already tried to clean out my inbox manually (e.g. by deleting all emails from Facebook), but some trends still show up:

  • I am the biggest offender! It turns out I had been emailing tons of photos to people over the years. Almost all of these sent emails could be deleted because I have the photos uploaded elsewhere.

  • I can delete from:store-news@amazon.com older_than:1y

  • no-reply@woot.com doesn't take up much space, but I can still delete those just to reduce the number of emails to wade through. Same with some of the others.

    • I'll be using this list to unsubscribe from email lists!

  • <person>@<school>.edu has sent me lots of PDFs for school assignments over time
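To go from the statistics to step 3, the top senders can be turned into Gmail search strings like the from:... older_than:1y query above, ready to paste into the search box. This is only a sketch; the helper name gmail_delete_queries is hypothetical, and each query should be reviewed before deleting anything.

```python
from collections import Counter


def gmail_delete_queries(sender_sizes, n=5, older_than="1y"):
    """Build one Gmail search string for each of the n largest senders.

    sender_sizes maps a sender address to its summed size in bytes.
    """
    return [
        f"from:{sender} older_than:{older_than}"
        for sender, _size in sender_sizes.most_common(n)
    ]
```

The function only builds strings; nothing is deleted until you run the search in Gmail and choose to delete the results yourself.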

After deleting all the newsletter and now-useless notification emails, I regained about 1 GB. Not as much as I would have hoped, but still useful considering I didn't have to review any of those emails for precious memories.
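One rough edge in the output above is that some subjects appear in their raw encoded form, such as =?utf-8?Q?Woot=20Daily=20Digest?=. The standard library's email.header module can decode these. A minimal sketch follows, with decode_subject as a hypothetical helper name; it falls back to the raw text when a header cannot be decoded.

```python
from email.errors import HeaderParseError
from email.header import decode_header, make_header


def decode_subject(raw_subject):
    """Decode an RFC 2047 encoded Subject header into readable text."""
    if raw_subject is None:
        return ""
    raw = str(raw_subject)
    try:
        return str(make_header(decode_header(raw)))
    except (HeaderParseError, LookupError, UnicodeError):
        # Malformed encoding or unknown charset: keep the raw text.
        return raw
```

Using decode_subject(message["Subject"]) as the Counter key would group encoded and plain versions of the same subject together.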

Final script
