Exercise in Async Web Image Scraping

Motivation

For the past five months my wife and I have been building our new house here in Chattanooga. A new house is basically a blank canvas. Each room and area of the house has its own requirements and design. This is very overwhelming, and I am NOT an interior designer. Most people would take to Pinterest for their design inspiration, but for me inspiration came from Freshome. Every day I would comb through the latest articles and download, one by one, the images that inspired me the most. Needless to say, this was time consuming. I wanted to automate the process. It was time to crack open Sublime Text 2.

Requirements

The requirements were quite simple:

  • Grab latest 10 articles
  • Download each ‘inspiration image’ for a given article asynchronously
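
A minimal sketch of the second requirement, assuming the image URLs for an article have already been collected. The function names, the `fetch` parameter, and the pool size are illustrative assumptions rather than the actual FH_ImageDownloader code, and the sketch targets Python 3.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen


def fetch_bytes(url):
    """Download one URL and return its raw bytes."""
    with urlopen(url, timeout=30) as response:
        return response.read()


def download_images(image_urls, dest_dir, fetch=fetch_bytes, max_workers=8):
    """Download every URL in image_urls into dest_dir using a thread pool.

    Returns the saved file paths in the same order as image_urls.
    `fetch` is an assumption of this sketch: any callable mapping a URL to bytes.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    def save(url):
        # Name the file after the last path segment of the URL.
        name = Path(urlparse(url).path).name or "image"
        target = dest / name
        target.write_bytes(fetch(url))
        return target

    # Downloads are I/O bound, so threads overlap the network waits.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(save, image_urls))
```

Passing `fetch` in as a parameter keeps the threading logic separate from the network call, which makes it easy to try the function without hitting a real site.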

Dependencies

Making it Better?

I’m always looking for advice and tips to improve my Python scripts. The ease and strength of the language have made developing these types of utilities extremely fun. I would really like to hear people’s thoughts on the multithreading or on making the parse selector better.

FH_ImageDownloader

Export SimpleGeo Layers with Python

Urban Airship has decided to shut down, by March 31st, 2012, the SimpleGeo services that it acquired. I personally see this as a step backwards for the services that they offer and for the SimpleGeo customers who rely on the data.

* stiQRd (I am the lead developer) uses it for all of our location information.

What was most shocking to me after their announcement wasn’t the cut-off date or the fact that Urban Airship had decided to sunset the service, but the fact that they “punted” on solutions for helping existing customers migrate their data. Granted, they did offer semi-replacements for Storage, but it basically amounts to rolling your own solution. That is why we were using SimpleGeo in the first place...we didn’t want to roll our own because of limited time and resources.

SimpleGeo was so easy to use and the company had great support. It was a no-brainer.

In order to prepare and update the stiQRd app before the sunset date, I need to get an export of my data. Unfortunately, there isn’t an easy way to export layer data programmatically or through the console. I posted a question to the Google Group and got a half-assed response that didn’t help whatsoever. Not to be derailed, I hacked together a few solutions, none of which I really liked. I then came across this gem (I know...it is bad) of a Ruby script that will iterate through all of your layers and export the data to CSV. Score!

The trick to retrieving all of your data from a particular layer is twofold. SimpleGeo’s API doesn’t have a “get_all_records(layer_name)” type of method, but it does have “get_nearby”, which at face value doesn’t look like it will give you the necessary results. It does, however, when you use (0,0) as the lat/lon and add the “bbox” (bounding box) parameter with a value of ‘-90,-180,90,180’.
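
The idea can be sketched roughly as follows. This is not the migration script itself and it does not call the SimpleGeo client: `fetch_page` is a hypothetical callable you would supply to wrap the nearby query, and the cursor-style paging it returns is an assumption of this sketch. Only the (0,0) origin and the bbox value come from the description above.

```python
import csv

# Bounding box from the text: it covers the whole globe.
WORLD_BBOX = "-90,-180,90,180"


def export_layer(fetch_page, layer, out_file):
    """Write every record of `layer` to out_file as CSV; return the row count.

    fetch_page is a HYPOTHETICAL caller-supplied callable, not a SimpleGeo API:
        fetch_page(layer, lat, lon, bbox, cursor) -> (records, next_cursor)
    where records is a list of flat dicts and next_cursor is None on the
    last page.
    """
    writer = None
    count = 0
    cursor = None
    while True:
        # Query "near" (0, 0) and let the world-sized bbox select everything.
        records, cursor = fetch_page(layer, 0, 0, WORLD_BBOX, cursor)
        for record in records:
            if writer is None:
                # Use the first record's keys as the CSV header.
                writer = csv.DictWriter(
                    out_file, fieldnames=sorted(record), extrasaction="ignore"
                )
                writer.writeheader()
            writer.writerow(record)
            count += 1
        if cursor is None:
            return count
```

Keeping the API call behind `fetch_page` means the export loop stays the same no matter which client library, or which replacement service, ends up answering the query.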

Unfortunately, I don’t know Ruby very well and I am spending a lot of my free time sharpening my Python skills. Since there wasn’t an implementation for export in Python I decided to port the Ruby script.  With the help of Bob “I am a PEP freak” Waycott, we have a pretty elegant solution.

I am also happy to report that Parse has taken it a few steps further and offered up the simplest, most elegant solution for migrating this data over to their backend service. If you are looking for a turnkey solution, then head over there for their migration tool.

If ANYONE has suggestions or updates please let me know.

Python Migration Script: (written by Cory D. Wiles and Bob Waycott)

Fast String Concatenation

Besides collection/list iteration, string concatenation is probably the most common developer task, no matter the language. Unfortunately, it is often done inefficiently, especially with large strings. I myself am guilty of taking for granted what goes on under the hood when performing this operation. After reading this article I found myself wondering about the mutability of strings in Ruby. Ruby strings are natively mutable, so you don’t take as much of a performance hit when doing basic concatenation. Python, by contrast, treats strings as immutable, so “+=” generally builds a new string and copies the old contents into it. Not efficient at all. For small strings, the standard “+=” or “+” is still the shortest distance between two points. For a string of any significant size, however, what seems to be most efficient across both Python and Ruby is to add each separate string to an array and then join.

Examples:
Ruby: ** Thanks to Travis Dunn for the code

Python
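
A minimal sketch of the two approaches described above. The function names are illustrative assumptions, not code from the original example.

```python
def build_with_plus(parts):
    """Concatenate with +=; each step may copy the string built so far."""
    result = ""
    for part in parts:
        result += part
    return result


def build_with_join(parts):
    """Collect the pieces in a list, then join them once at the end."""
    pieces = []
    for part in parts:
        pieces.append(part)
    return "".join(pieces)
```

Both functions return the same string. To compare them on your own data, the standard `timeit` module works well. Results vary by interpreter and version, since CPython can optimize some “+=” cases in place.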

Because I am doing more and more large-scale projects, performance is more important than ever, and paying close attention to these little details allows for performance boosts very quickly.

Dynamic Model Update and Using Google App Engine

A few weeks ago I presented at the Mobile Technology for Teaching & Learning. My talk was about how to leverage service-oriented architecture and Objective-C runtime functionality in your enterprise applications. In my demo app I used two JSON files as my data source so that I wouldn't have to rely on an internet connection.

*I was glad that I did because I had lots of problems with the connection.

After the talk was over I pushed all the code to GitHub with the goal of modifying the project to use an actual web service, to show a real-world example of how it works. In order to do that I first needed to create the web service. Up to this point in my development career I have been using PHP for the service layer with CouchDB as the datastore. With this particular project I wanted to take things in a different direction. A few years ago I started tinkering with Python and have really grown to love the language. Unfortunately, I haven't been in a situation where I could use it in any project beyond a few helper scripts (mainly because I am a novice and have tight time constraints). I wasn't bound by these restrictions for this project. What I wasn't looking forward to was doing the server-side "infrastructure" setup to get things going. I wanted to focus on the development.

Enter Google App Engine to the rescue!  Setting up an account and application with GAE is out of the scope of this post, but Google makes it very easy to do…especially if you are focusing on Python.  I knew that scaling wouldn't be an issue nor would the data integrity. Google is somewhat good at both of those.

I was able to hack out a basic web service within a day.

My biggest roadblock came with serializing class instances into JSON. The awesome community at stackoverflow.com had it covered.
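
One common way to handle this in Python is a custom `json.JSONEncoder`. The sketch below is not the Stack Overflow answer I used. The `Article` class and `to_json` helper are hypothetical, and it assumes a plain Python object; App Engine datastore models would likely need a different conversion.

```python
import json


class Article:
    """Hypothetical model class, used only for illustration."""

    def __init__(self, title, tags):
        self.title = title
        self.tags = tags


class InstanceEncoder(json.JSONEncoder):
    """Encode arbitrary objects by falling back to their attribute dict."""

    def default(self, obj):
        # default() is only called for objects json cannot serialize itself.
        if hasattr(obj, "__dict__"):
            return vars(obj)
        return super().default(obj)


def to_json(obj):
    """Serialize obj, including nested class instances, to a JSON string."""
    return json.dumps(obj, cls=InstanceEncoder, sort_keys=True)
```

Calling `to_json(Article("Hello", ["design"]))` yields a JSON object with `tags` and `title` keys, and the same encoder handles instances nested inside lists or dicts.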

The demo app is modified to use these endpoints for data via GCD.  I also added a class extension for NSDictionary that simplifies the JSON parsing logic, thus removing a lot of the boilerplate code that I had in there originally.

I will be adding some more enhancements and abstracting the code over the next few weeks. As always, if you have any comments or suggestions please don't hesitate to contact me.