Find mass deletions to free up Gmail space

Skip to the final script for a way to find the largest groups of emails in your inbox. I freed up about 10% of my space just by deleting old notification and newsletter emails.

Cleaning up my Inbox

My Gmail inbox is getting full, and Google often lets me know.


Although Google offers some help cleaning up the cruft, it's not really in their interest to make the tools too useful.


Their list allows you to delete emails one by one and is similar to searching larger:10M in Gmail. You'll eventually clean up space that way, but can we do this faster? What about all those newsletters, notifications from LinkedIn, messages from Facebook, etc. that are small individually but together take up megabytes of space?

I have tons of email like these in my inbox:

LinkedIn     <Person> shared their thoughts   Your order ...
Credit Card  You are eligible ...

There is no way to aggregate this sort of data in the Gmail UI. Here is how we can accomplish this:

  1. Export email data to an .mbox file using Google Takeout

  2. Analyze emails and categorize by sender or subject using Python

  3. Delete the largest useless groupings

I've written the code for step 2 already, so let's see how it works:

Opening up the mbox file

Luckily, Python has the built-in mailbox library for reading .mbox files.

import mailbox
mbox = mailbox.mbox(mbox_path)

Iterating over emails

Iterating over emails is similarly easy. There are tons of ways to query each message. We will just consider From and Subject, as those are enough to identify most newsletters and other automated emails.

for message in mbox:
    print(message["From"], message["Subject"], sep="\n")

It can take quite some time to open up the mbox file, but eventually we get output like:

AT&T Account Management <>
Your AT&T online bill is ready to be viewed
Venmo <>
X Y paid your $10.00 request
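
If you want to try the loop before your Takeout export is ready, a tiny throwaway mailbox is enough. The following is only a sketch: the two messages, their example.com addresses, and the helper name build_sample_mbox are made up for illustration and are not part of the original script.

```python
import mailbox
import os
import tempfile

# Hypothetical sample messages, written as raw RFC 822 text.
SAMPLE_MESSAGES = [
    "From: Venmo <venmo@example.com>\nSubject: X Y paid your $10.00 request\n\nbody one\n",
    "From: Example News <news@example.com>\nSubject: Daily Digest\n\nbody two\n",
]

def build_sample_mbox(path):
    """Write the sample messages to a new mbox file at `path`."""
    box = mailbox.mbox(path)  # creates the file if it does not exist
    for raw in SAMPLE_MESSAGES:
        box.add(raw)
    box.flush()
    box.close()

with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.mbox")
    build_sample_mbox(sample_path)

    # Same loop as above, collecting the two headers instead of printing them.
    box = mailbox.mbox(sample_path)
    headers = [(message["From"], message["Subject"]) for message in box]
    box.close()

for sender, subject in headers:
    print(sender, subject, sep="\n")
```

Working against a small file like this makes it quick to iterate on the analysis before pointing it at the multi-gigabyte export.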

Gather Statistics

We can use another Python built-in, Counter, to make this easy as well:

from collections import Counter
sender_counts = Counter()
sender_sizes = Counter()
for message in mbox:
    # TODO: normalize sender, this field looks like "Name <>"
    sender = message["From"]
    sender_counts[sender] += 1
    sender_sizes[sender] += len(message.as_bytes())
    # ... repeat for other metrics
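
To turn counters like these into a ranking, Counter.most_common does most of the work. This is a rough sketch rather than the code from the gist: format_top, its parameters, and the toy numbers are all hypothetical.

```python
from collections import Counter

def format_top(counter, n=5, width=35, spec="", unit=""):
    """Return the n largest entries of a Counter as aligned text lines."""
    lines = []
    for key, value in counter.most_common(n):
        # Truncate long keys and pad short ones so the colons line up.
        label = str(key)[:width].ljust(width)
        lines.append(f"{label} : {value:{spec}}{unit}")
    return lines

# Toy data standing in for the real counters (values are made up).
sender_counts = Counter({"a@example.com": 3, "b@example.com": 10, "c@example.com": 1})
sender_sizes = Counter({"a@example.com": 5_000_000, "b@example.com": 20_000})

print("Most common senders:")
for line in format_top(sender_counts, n=2):
    print(line)

# Sizes are summed in bytes, so convert to megabytes before ranking.
sizes_mb = Counter({sender: size / 1e6 for sender, size in sender_sizes.items()})
print("Top summed-size senders:")
for line in format_top(sizes_mb, n=2, spec=".6f", unit=" MB"):
    print(line)
```

Ranking by count and by summed size separately matters because the two lists can differ a lot: a sender with few messages but large attachments only shows up in the second one.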

Final results

The final result is saved in this gist. I added some niceties based on running this on a larger amount of data:

  • Nicer print formatting

  • Progress indication

  • Remove sender name from the 'From' field because some senders change their name
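
One way to drop the sender name is email.utils.parseaddr from the standard library. This is a sketch of the idea, not necessarily what the gist does; normalize_sender is a hypothetical helper and the addresses are placeholders.

```python
from email.utils import parseaddr

def normalize_sender(from_header):
    """Return (address, domain) from a raw From header, ignoring the display name."""
    raw = "" if from_header is None else str(from_header)
    # parseaddr splits 'Name <user@host>' into ('Name', 'user@host').
    _name, address = parseaddr(raw)
    address = address.lower()
    # Everything after the last '@' is treated as the domain.
    domain = address.rpartition("@")[2] if "@" in address else ""
    return address, domain

# Two display names, one underlying address: both map to the same key.
print(normalize_sender("Venmo <venmo@example.com>"))
print(normalize_sender("Venmo Payments <VENMO@example.com>"))
```

Using the address as the Counter key groups a sender's messages together even when the display name changes, and the domain gives a second, coarser grouping.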

The final output from one of my mailboxes is:

Parsing mbox... this could take some time
1000 / 94992

Most common senders:
<me>                                : 5946
                                    : 2099
notification+hk7u3_-m@facebookmail. : 1684
                                    : 1598
                                    : 1311
<person>@<school>.edu               : 1279

Most common sender domains:
                                    : 16903
<school>.edu                        : 14562
<school>.edu                        : 10875
                                    : 5067
                                    : 2336
                                    : 1887
                                    : 1784
                                    : 1312

Most common subjects:
                                    : 1097
=?utf-8?Q?Woot=20Daily=20Digest?=   : 993
Re:                                 : 803
=?utf-8?Q?Tools.Woot=20Daily=20Dige : 506
Alistair, please add me to your Lin : 322

Top summed-size subjects:
                                    : 997.018524 MB
Re:                                 : 667.143983 MB
Fwd:                                : 138.824105 MB
no subject                          : 128.076811 MB
You have a postcard from <person>   : 118.236279 MB
Re:                                 : 111.424229 MB

Top summed-size senders:
<me>              : 2941.191349 MB
<person>@<school>.edu       : 475.729489 MB
<person>          : 173.501381 MB
                  : 157.440193 MB
<various>  : 55.942341 MB

Summed size: 10526.607168 MB
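
Subjects such as =?utf-8?Q?Woot=20Daily=20Digest?= appear in the output above in their raw RFC 2047 encoded form. If you would rather group and print readable text, the standard library can decode them. A minimal sketch, assuming you apply it to message["Subject"] before counting; decode_subject is a hypothetical helper that is not in the original script.

```python
from email.errors import HeaderParseError
from email.header import decode_header, make_header

def decode_subject(raw_subject):
    """Decode an RFC 2047 encoded Subject header into plain text."""
    if raw_subject is None:
        return ""
    text = str(raw_subject)
    try:
        return str(make_header(decode_header(text)))
    except (HeaderParseError, LookupError, UnicodeError):
        # Malformed or unknown encodings: fall back to the raw header.
        return text

print(decode_subject("=?utf-8?Q?Woot=20Daily=20Digest?="))  # Woot Daily Digest
print(decode_subject("Re:"))  # Re:
```

Decoding before truncating the label also avoids cutting an encoded word in half, as happened with the Tools.Woot subject.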

These results are a bit skewed because I had already tried to clean out my inbox manually (e.g. delete all emails from facebook) but some trends still show up:

  • I am the biggest offender! It turns out I had been emailing tons of photos to people over the years. Almost all of these sent emails could be deleted because I have the photos uploaded elsewhere.

  • I can delete older_than:1y

  • doesn't take up much space, but I can still delete those just to reduce the number of emails to wade through. Same with some of the others.

    • I'll be using this list to unsubscribe from email lists!

  • <person>@<school>.edu has sent me lots of PDFs for school assignments over time

After deleting all the newsletter and now-useless notification emails, I regained about 1 GB. That's not as much as I had hoped, but still useful considering I didn't have to review any of those emails for precious memories.

Final script