Find mass deletions to free up Gmail space
Skip to the final script for a way to find the largest groups of emails in your inbox. I freed up about 10% of my space just by deleting old notification and newsletter emails.
Cleaning up my Inbox
My Gmail inbox is getting full, and Google often lets me know.
Google offers some help cleaning up the cruft, but it's not really in their interest to make the tools too useful.
Their list lets you delete emails one at a time and is similar to searching larger:10M in Gmail. You'll eventually free up space that way, but only one email at a time. Can we do this faster?
What about all those newsletters, notifications from LinkedIn, messages from Facebook, etc. that are small individually but together take up megabytes of space?
I have tons of email like these in my inbox:
LinkedIn: <Person> shared their thoughts
Amazon.com: Your Amazon.com order ...
Credit Card: You are eligible ...
There is no way to aggregate this sort of data in the Gmail UI. Here is how we can accomplish it instead:
1. Export email data to a .mbox file using Google Takeout
2. Analyze emails and categorize them by sender or subject using Python
3. Delete the largest useless groupings
I've written the code for step 2 already, so let's see how it works:
Opening up the mbox file
Luckily, Python has the built-in mailbox module for reading .mbox files.
import mailbox

mbox_path = "path/to/export.mbox"  # placeholder: the .mbox file from Google Takeout
mbox = mailbox.mbox(mbox_path)
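One small safeguard, offered as a sketch rather than the author's code: by default mailbox.mbox creates an empty file when the path does not exist, so a typo in the path quietly gives you zero messages. Passing create=False makes it raise mailbox.NoSuchMailboxError instead. The open_mbox helper name is hypothetical.

```python
import mailbox

def open_mbox(mbox_path):
    """Open an existing .mbox file without ever creating a new one.

    Hypothetical helper: create=False makes the mailbox module raise
    NoSuchMailboxError for a missing path instead of creating an empty file.
    """
    return mailbox.mbox(mbox_path, create=False)
```

Usage is a drop-in replacement: mbox = open_mbox(mbox_path), optionally wrapped in a try/except mailbox.NoSuchMailboxError to print a friendlier message.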
Iterating over emails
Iterating over emails is similarly easy, and there are tons of ways to query each message. We will just consider the From and Subject headers, as those are enough to identify most newsletters and other automated emails.
for message in mbox:
    print(message["From"])
    print(message["Subject"])
It can take quite some time to open up the mbox file, but eventually we get output like:
AT&T Account Management <update@emaildl.att-mail.com>
Your AT&T online bill is ready to be viewed
Venmo <venmo@venmo.com>
X Y paid your $10.00 request
...
Gather Statistics
We can use another Python built-in, collections.Counter, to make this easy as well:
from collections import Counter

sender_counts = Counter()  # number of emails per sender
sender_sizes = Counter()   # total bytes per sender
for message in mbox:
    # TODO: normalize sender, this field looks like "Name <email@domain.com>"
    sender = message["From"]
    sender_counts[sender] += 1
    sender_sizes[sender] += len(message.as_bytes())
    # ... repeat for other metrics
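The TODO in the loop above can be handled with the standard library's email.utils.parseaddr, which splits a header like "Name <email@domain.com>" into its display name and address. This is a minimal sketch, assuming you want to group by the lowercased address and by the part after the @; normalize_sender and tally are hypothetical helper names, not the code from the gist.

```python
from collections import Counter
from email.utils import parseaddr

def normalize_sender(from_header):
    """Reduce a From header to (address, domain), dropping the display name."""
    if from_header is None:
        return ("", "")  # message without a From header
    # str() also handles header objects that aren't plain strings.
    _name, address = parseaddr(str(from_header))
    address = address.lower()
    domain = address.rpartition("@")[2] if "@" in address else ""
    return (address, domain)

def tally(messages):
    """Count emails per address and per domain, and sum sizes per address."""
    sender_counts, domain_counts, sender_sizes = Counter(), Counter(), Counter()
    for message in messages:
        address, domain = normalize_sender(message["From"])
        sender_counts[address] += 1
        domain_counts[domain] += 1
        sender_sizes[address] += len(message.as_bytes())
    return sender_counts, domain_counts, sender_sizes
```

Lowercasing is a judgment call: domains are case-insensitive, and while the local part technically may be case-sensitive, merging case variants is usually what you want when hunting for bulk senders. Calling tally(mbox) returns the three Counters in one pass.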
Print Results
The Counter class has a most_common method that pulls out the largest counts:
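counter = sender_counts  # or any of the other Counters
top_list = counter.most_common(15)
max_length = max(len(str(key)) for key, value in top_list)
for key, stat in top_list:
    # str() guards against None keys (messages missing the header)
    print(f"{str(key):{max_length}.{max_length}} : {stat}")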
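The final output prints sizes in MB and cuts long keys off (for example notification+hk7u3_-m@facebookmail.), which suggests a fixed cap of about 35 characters. This is a sketch of a formatter along those lines; MAX_WIDTH = 35, the 1 MB = 1,000,000 bytes conversion, and the name format_top are all assumptions, not taken from the gist.

```python
from collections import Counter

MAX_WIDTH = 35  # assumed cap on key width; longer senders/subjects get cut off

def format_top(counter, n=15, unit=None):
    """Return aligned 'key : value' lines for the n largest entries.

    unit="MB" renders byte totals as megabytes (assumed 1 MB = 1,000,000 bytes).
    """
    top_list = counter.most_common(n)
    if not top_list:
        return []
    # Width of the longest key, clamped to the range [1, MAX_WIDTH].
    width = max(1, min(MAX_WIDTH, max(len(str(key)) for key, _ in top_list)))
    lines = []
    for key, stat in top_list:
        value = f"{stat / 1_000_000:f} MB" if unit == "MB" else str(stat)
        # Truncate, then pad, so the " : " separators line up.
        lines.append(f"{str(key)[:width].ljust(width)} : {value}")
    return lines

for line in format_top(Counter({"me@gmail.com": 520, "venmo@venmo.com": 104})):
    print(line)
```

Returning lines instead of printing directly keeps the helper easy to test; slicing plus ljust also sidesteps format-spec edge cases such as a zero width when every key is empty.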
Running this on a truncated .mbox file shows we are starting to get interesting results:
me@gmail.com : 520
venmo@venmo.com : 104
...

me@gmail.com : 1242321
venmo.com : 919305
...
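The sizes above come from len(message.as_bytes()), which re-serializes each parsed message. A possible alternative, sketched here as a swapped-in technique rather than what the author used: read each message's stored bytes with mailbox.mbox.get_bytes(key) and parse only its headers with email.parser.BytesHeaderParser. Totals can differ slightly from the as_bytes() figures, because re-serialization doesn't necessarily reproduce the original bytes. sizes_by_sender is a hypothetical name.

```python
import mailbox
from collections import Counter
from email.parser import BytesHeaderParser
from email.utils import parseaddr

def sizes_by_sender(mbox):
    """Sum the raw stored size of each message, grouped by sender address."""
    header_parser = BytesHeaderParser()  # parses headers only, skips bodies
    sender_sizes = Counter()
    for key in mbox.iterkeys():
        raw = mbox.get_bytes(key)  # message bytes as stored in the file
        headers = header_parser.parsebytes(raw)
        _name, address = parseaddr(str(headers["From"] or ""))
        sender_sizes[address.lower()] += len(raw)
    return sender_sizes
```

This measures what the messages occupy in the export, which is probably a closer proxy for what they cost in the mailbox than a re-encoded copy, though Gmail's own accounting may differ.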
Final results
The final result is saved in this gist. I added some niceties based on running this on a larger amount of data:
Nicer print formatting
Progress indication
Remove sender name from the 'From' field because some senders change their name
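Progress indication like the 1000 / 94992 line in the final output can be a thin wrapper around the loop. This is a minimal sketch with a hypothetical iter_with_progress generator. Calling len(mbox) makes the mailbox module index the whole file up front (it needs that index to iterate anyway), so the first progress line may take a while to appear.

```python
import sys

def iter_with_progress(mbox, every=1000, out=sys.stderr):
    """Yield messages from mbox, printing 'done / total' every `every` messages."""
    total = len(mbox)  # for a mailbox.mbox this indexes the whole file first
    for done, message in enumerate(mbox, start=1):
        if done % every == 0:
            print(f"{done} / {total}", file=out)
        yield message
```

Use it as for message in iter_with_progress(mbox): in place of for message in mbox:. Writing progress to stderr keeps it separate from the statistics if you redirect stdout to a file.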
The final output from one of my mailboxes is:
Parsing mbox... this could take some time
1000 / 94992
...
Most common senders:
<me>@gmail.com : 5946
store-news@amazon.com : 2099
notification+hk7u3_-m@facebookmail. : 1684
no-reply@woot.com : 1598
venmo@venmo.com : 1311
<person>@<school>.edu : 1279
...
Most common sender domains:
gmail.com : 16903
<school>.edu : 14562
<school>.edu : 10875
amazon.com : 5067
facebookmail.com : 2336
linkedin.com : 1887
woot.com : 1784
venmo.com : 1312
...
Most common subjects:
 : 1097
=?utf-8?Q?Woot=20Daily=20Digest?= : 993
Re: : 803
=?utf-8?Q?Tools.Woot=20Daily=20Dige : 506
Alistair, please add me to your Lin : 322
...
Top summed-size subjects:
 : 997.018524 MB
Re: : 667.143983 MB
Fwd: : 138.824105 MB
no subject : 128.076811 MB
You have a postcard from <person> : 118.236279 MB
Re: : 111.424229 MB
...
Top summed-size senders:
<me>@gmail.com : 2941.191349 MB
<person>@<school>.edu : 475.729489 MB
...
<person>@gmail.com : 173.501381 MB
store-news@amazon.com : 157.440193 MB
<various>@facebookmail.com : 55.942341 MB
...
Summed size: 10526.607168 MB
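Some subjects above are still RFC 2047 encoded words, such as =?utf-8?Q?Woot=20Daily=20Digest?=, so the same newsletter could end up split between encoded and plain spellings. This is a sketch, assuming you want to group by the decoded text: the standard library's email.header.decode_header and make_header turn the encoded form back into Woot Daily Digest. decode_subject is a hypothetical helper, not part of the gist.

```python
from email.errors import HeaderParseError
from email.header import decode_header, make_header

def decode_subject(raw_subject):
    """Decode RFC 2047 encoded words; fall back to the raw text on failure."""
    if raw_subject is None:
        return ""  # missing Subject header groups with empty subjects
    try:
        return str(make_header(decode_header(str(raw_subject))))
    except (HeaderParseError, LookupError, UnicodeError):
        # Malformed encodings or unknown charsets: keep the original string.
        return str(raw_subject)
```

In the counting loop this would replace message["Subject"] with decode_subject(message["Subject"]).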
These results are a bit skewed because I had already tried to clean out my inbox manually (e.g. deleting all emails from Facebook), but some trends still show up:
I am the biggest offender! It turns out I had been emailing tons of photos to people over the years. Almost all of these sent emails could be deleted, because I have the photos uploaded elsewhere.
I can delete everything matching from:store-news@amazon.com older_than:1y
no-reply@woot.com doesn't take up much space, but I can still delete those emails just to reduce the number I have to wade through. The same goes for some of the others. I'll also be using this list to unsubscribe from mailing lists!
<person>@<school>.edu has sent me lots of PDFs for school assignments over time.
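Step 3 happens in the Gmail UI, but the candidate searches can be generated from the size Counter. This is a sketch, assuming sender_sizes is keyed by normalized addresses; deletion_queries and its keep parameter (addresses you never want to bulk-delete, such as your own) are hypothetical. It only builds from: and older_than: search strings like the one above; paste each one into Gmail and check the results before deleting anything.

```python
from collections import Counter

def deletion_queries(sender_sizes, n=10, keep=(), older_than="1y"):
    """Build Gmail search strings for the n senders using the most space."""
    queries = []
    for sender, _size in sender_sizes.most_common(n):
        # Skip missing senders and anything explicitly protected.
        if not sender or sender in keep:
            continue
        query = f"from:{sender}"
        if older_than:
            query += f" older_than:{older_than}"
        queries.append(query)
    return queries

# Illustrative byte totals, not real data.
sizes = Counter({"me@gmail.com": 3000, "store-news@amazon.com": 150})
for q in deletion_queries(sizes, keep={"me@gmail.com"}):
    print(q)
```

Passing older_than=None drops the age filter, which fits senders like no-reply@woot.com whose emails are never worth keeping.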
After deleting all the newsletters and now-useless notification emails, I regained about 1 GB. That's not as much as I had hoped, but still useful, considering I didn't have to review any of those emails for precious memories.