Find mass deletions to free up Gmail space

Skip to the final script for a way to find the largest groups of emails in your inbox. I freed up about 10% of my space just by deleting old notification and newsletter emails.

Cleaning up my Inbox

My Gmail inbox is getting full

/images/gmail_cleanup/gmail_space_indicator.png

and Google often lets me know:

/images/gmail_cleanup/gmail_space_warning.png

Google offers some help cleaning up the cruft, but it's not really in their interest to make the tools too useful.

/images/gmail_cleanup/gmail_cleanup_tool.png

Their list allows you to delete emails one by one and is similar to searching larger:10M in Gmail. You'll eventually free up space this way... but can we do it faster? What about all those newsletters, notifications from LinkedIn, messages from Facebook, and so on that are small individually but together take up megabytes of space?
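One way to find those groups is to query the Gmail API directly and sum message sizes per sender. Below is a minimal sketch, not the final script: it assumes you've already built an authenticated service object with google-api-python-client, and the category:promotions query and page limit are just illustrative defaults.

from collections import defaultdict

# Sketch only: `service` is assumed to be an authenticated Gmail API client,
# e.g. googleapiclient.discovery.build('gmail', 'v1', credentials=creds).
def size_by_sender(service, query='category:promotions', max_pages=5):
    sizes = defaultdict(int)
    page_token = None
    for _ in range(max_pages):
        resp = service.users().messages().list(
            userId='me', q=query, pageToken=page_token).execute()
        for ref in resp.get('messages', []):
            # format='metadata' fetches headers and sizeEstimate, not the body.
            msg = service.users().messages().get(
                userId='me', id=ref['id'], format='metadata',
                metadataHeaders=['From']).execute()
            sender = next((h['value'] for h in msg['payload']['headers']
                           if h['name'] == 'From'), 'unknown')
            sizes[sender] += msg.get('sizeEstimate', 0)
        page_token = resp.get('nextPageToken')
        if not page_token:
            break
    # Largest senders first.
    return sorted(sizes.items(), key=lambda kv: -kv[1])

Once you know the biggest senders, deleting a whole group is a single from: search away in the Gmail UI.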

Read more…

Performance tuning with Linux scheduling policies

Scheduling is NP-hard. Even if we could predict the future and see how long every process is going to run, it would still be a challenge to achieve optimal scheduling. In a real-world system, we have nanoseconds to make scheduling decisions that affect thousands of threads handling wildly different tasks, from a button press on a menu to copying files to a flash drive.

What we really want is for our systems to have low latency for real-time interactions while maintaining maximum throughput for CPU-heavy tasks. Let's see how well a Linux system handles a specific scheduling scenario and how the behavior can be tuned.

First, construct a sample set of three programs to schedule on a two-core system:

  • Workers A and B: two tasks with real-time latency requirements, each pinned to a specific core. This could be a task running a UI or a control loop.

  • Worker C: one CPU-heavy "background" process.

Ideally, C runs whenever A or B leaves a core idle. The timeline of system execution would look something like this:

/images/cpu_balancing_goal.svg
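This setup can be approximated from userspace with Linux scheduling policies via Python's os module. The following is a rough sketch, not a benchmark harness: the PIDs are placeholders for the three workers, and SCHED_FIFO requires root (or CAP_SYS_NICE).

import os

# Placeholder PIDs standing in for the three workers described above.
pid_a, pid_b, pid_c = 1111, 2222, 3333

def pin_and_prioritize(pid, cores, policy, priority):
    # Restrict the process to the given cores, then set its policy.
    os.sched_setaffinity(pid, cores)
    os.sched_setscheduler(pid, policy, os.sched_param(priority))

# Workers A and B: latency-sensitive, one per core, real-time FIFO policy
# (needs root / CAP_SYS_NICE).
pin_and_prioritize(pid_a, {0}, os.SCHED_FIFO, 10)
pin_and_prioritize(pid_b, {1}, os.SCHED_FIFO, 10)

# Worker C: allowed on either core, but only scheduled when nothing
# else wants the CPU (SCHED_IDLE requires priority 0).
pin_and_prioritize(pid_c, {0, 1}, os.SCHED_IDLE, 0)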

Read more…

Speeding up boto3 list objects

Boto3 is Amazon's official AWS SDK for Python. Unfortunately, a fairly common operation, listing objects, is slow! See this final gist for a way to get roughly a 2x speedup.

Let's profile the way of listing objects that's given as an example in the documentation.

import boto3

# List every object in the bucket via the resource layer.
s3 = boto3.resource('s3')
bucket = s3.Bucket('my-bucket')
entries = []
for obj in bucket.objects.all():
    entries.append(obj)

On my machine this operation takes about 2.5 seconds for a fairly small bucket (7500 objects). The story is more interesting if we use FunctionTrace and the Firefox Profiler to see the timeline of execution (cropped and zoomed to an interesting region):

/images/list_objects/listing_timeline_cropped.png
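One common fix, sketched below (the final gist may differ in its details), is to skip the resource layer entirely: the low-level client's paginator returns plain dictionaries instead of constructing an ObjectSummary resource per key.

import boto3

# Sketch: list with the low-level client instead of the resource layer.
client = boto3.client('s3')
paginator = client.get_paginator('list_objects_v2')

entries = []
for page in paginator.paginate(Bucket='my-bucket'):
    # Each page holds up to 1000 plain-dict entries under 'Contents'.
    entries.extend(page.get('Contents', []))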

Read more…