I really dislike ugly URLs. I know most people don’t particularly care about how a URL looks, but I do. Unfortunately, a lot of the URLs I end up wanting to save or send to someone else are way, way uglier than they need to be. They have a bunch of utm_ crap tacked onto them.

For instance, https://500px.com/photo/152315787/mankins-by-forrest-mankins?utm_campaign=mankins-by-forrest-mankins&utm_medium=social&utm_source=500px could just be https://500px.com/photo/152315787/mankins-by-forrest-mankins, which works just as well and looks way, way nicer.

Amazon URLs can be really nice and clean, but most of them are ugly monstrosities. This mess: http://www.amazon.com/Maltese-Falcon-Vintage-Crime-Lizard-ebook/dp/B004G5ZU32/ref=sr_1_1?ie=UTF8&qid=1433505888&sr=8-1&keywords=maltese+falcon+kindle could be as simple as http://www.amazon.com/gp/product/B004G5ZU32.

When I’m trying to save or share one of these URLs on my Mac, it’s a minor annoyance to have to go clean them up before I do something with them. On iOS, though, it’s a nightmare. Text selection is super hard to do, and trying to get rid of this crap from the Safari address bar is even harder than hitting the tiny little x to dismiss notifications in Notification Center.

I went looking for a way to automate this on iOS. Turns out there is an app for that, but it doesn’t have a share sheet extension 😏. I then turned to Workflow, but didn’t find a workflow that could do this either. I pondered making my own workflow, but good lord, it’s way too complex for me.

I then pulled up the documentation on App Extensions hoping to be able to whip together a quick project in Xcode to do this. Before I went too far down that road though, I remembered that there’s actually a way to run Python on iOS, and trigger it via a share sheet extension: Pythonista! I’d bought the app on a lark years ago because I thought being able to run Python on iOS would be useful someday. I opened it up and hacked together a quick script to do exactly that:

#!/usr/bin/python
# coding: utf-8

import appex
import clipboard

from urlparse import urlparse, urlunparse, parse_qs
from urllib import urlencode


def get_asin(url):
    '''
    Amazon links can be simplified to have a structure of http://www.amazon.com/gp/product/<ASIN>
    See: http://leancrew.com/all-this/2015/06/clean-amazon-links-with-textexpander/
    
    This function attempts to parse an ASIN from a url, and returns one if found. None otherwise.
    '''
    split = url.split('/')
    for i, part in enumerate(split):
        part = part.strip()
        if 'dp' == part:
            try:
                return split[i + 1]
            except IndexError as e:
                print 'Unable to find ASIN'
                return None
        if 'gp' == part:
            try:
                if split[i + 1].strip() == 'product':
                    return split[i + 2]
            except IndexError as e:
                print 'Unable to find ASIN'
                return None


def should_strip_param(param):
    if param.startswith('utm_'):
        # Google Analytics crap
        return True
    if param == 'ncid':
        # The TechCrunch RSS feed includes this thing
        return True
    return False


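def clean_url(url):
    parsed = urlparse(url)
    
    # Only the host check is case-insensitive. The URL itself is left alone,
    # since paths (and ASINs) can be case-sensitive.
    if 'amazon' in parsed.netloc.lower():
        # Pass just the path so the ASIN never picks up a trailing query string.
        asin = get_asin(parsed.path)
        if asin:
            return urlunparse([
                parsed.scheme,
                parsed.netloc,
                '/gp/product/' + asin,
                parsed.params,
                '',
                parsed.fragment
            ])
        # No ASIN found: fall through to the generic cleanup below.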
    
    qd = parse_qs(parsed.query, keep_blank_values=True)
    filtered = dict( (k, v) for k, v in qd.iteritems() if not should_strip_param(k))
    return urlunparse([
        parsed.scheme,
        parsed.netloc,
        parsed.path,
        parsed.params,
        urlencode(filtered, doseq=True),
        parsed.fragment
    ])


def main():
    if not appex.is_running_extension():
        print 'Running in Pythonista app, using test data...\n'
        url = 'https://500px.com/photo/152315787/mankins-by-forrest-mankins?utm_campaign=mankins-by-forrest-mankins&utm_medium=social&utm_source=500px'
    else:
        url = appex.get_url()
    if url:
        cleaned_url = clean_url(url)
        clipboard.set(cleaned_url)
        print 'Original url:', url
        print
        print 'Cleaned url (copied to clipboard):', cleaned_url
    else:
        print 'No input URL found.'

if __name__ == '__main__':
    main()

Now I have a way to clean up cruddy URLs 😀.

Unfortunately, extension scripts in Pythonista can’t open URLs, so for now I’m resorting to sticking the cleaned up URL on the clipboard 😏.
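Since the query-string filtering doesn’t depend on anything iOS-specific, it can be tried out away from Pythonista too. Here’s a minimal sketch of just that step, not the script itself: it leaves out `appex` and `clipboard`, imports from whichever of Python 2 or 3 is available, and uses `parse_qsl` instead of `parse_qs` so the surviving parameters keep their original order. The names `strip_tracking_params`, `TRACKING_PREFIXES` and `TRACKING_NAMES` are made up for this example, and the `example.com` URL is just test data.

```python
try:
    # Python 3
    from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode
except ImportError:
    # Python 2, which is what the script above targets
    from urlparse import urlsplit, urlunsplit, parse_qsl
    from urllib import urlencode

# Hypothetical names; the values mirror should_strip_param() above.
TRACKING_PREFIXES = ('utm_',)
TRACKING_NAMES = ('ncid',)


def strip_tracking_params(url):
    """Return url without tracking query parameters, keeping the rest in order."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.startswith(TRACKING_PREFIXES) and key not in TRACKING_NAMES
    ]
    # An empty query string means urlunsplit() emits no trailing '?'.
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(kept), parts.fragment))


if __name__ == '__main__':
    print(strip_tracking_params(
        'https://example.com/Photo/1?b=2&utm_source=feed&a=1&ncid=rss#top'))
```

Note that `urlencode` re-encodes whatever parameters are kept, so a value may come back escaped slightly differently than it went in (a space always comes out as `+`, for example).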

A few thoughts on using Pythonista:

  • It’s really quite cool that I can whip up Python scripts and have them run on iOS.
  • Pythonista also has some pretty handy modules to make iOS-specific stuff super easy.
  • Pythonista’s custom keyboard is more useful than I thought it would be for writing code on an iPhone.
  • Holy hell, typing code on an iPhone is a nightmare!
  • I wish there was a way for me to type code on my Mac, and be able to run it on Pythonista without having to copy/paste a bunch of times (iCloud/Dropbox sync). I believe the reason this isn’t supported has to do with sandboxing, but I’m not totally sure.