This week, I was able to spend some serious time writing and optimizing my first server. I’ve worked with plenty of back-end stuff before, but usually just some PHP helping a website run. This time, I built a complete API service using Dart, my programming language of choice these days.
The Project – Build an API
The server I needed to write was to help Clickspace TV handle reams of data, with the immediate need being Instagram data. Currently, all of our clients run a webapp that calls Instagram’s API directly. That’s not sustainable, so I needed an intermediary to cache responses on a server we controlled. The bonus feature planned was to extend the Instagram API for our needs. First, I needed a way to flag Instagram media as inappropriate so none of the Clickspace TV clients would ever see it. Second, I wanted a way to get a list of Instagram media sorted by ‘most popular’ for a particular hashtag. So I wound up sketching out a server that would accept requests from our clients, go out to Instagram’s API, stash the results in a database and cross-reference with some other data, and return custom-parsed Instagram objects, sorted in a dozen different ways.
Prior to Day 1
Before I got to writing my own custom code, I went through the excellent Hello World tutorials provided by Google. In the first part, I set up my development environment, which honestly I drag my feet on every time. In the second part, I got Hello World working both locally, and running in the Google App Engine online. I don’t have a computer science degree – nothing even close – so I typically don’t get this far, this fast. I burn out on terminology I don’t understand, and how-to guides telling me something is ‘simple’ when it’s a foreign language to me. Kudos to the Dart team for writing so well.
Monday – Research on Limitations
Looking around the Internet, I found that the rpc package was the best way to build a RESTful API with Dart. I also needed a database, and the Hello World examples had already heavily influenced my decision to use the Google App Engine, through the gcloud packages.
I spent most of the day working with the rpc examples, and re-purposing their demo code to gain an understanding of how it worked. The first brick wall I hit was return types. The API can return pretty much any object, as long as it’s not too simple, and not too complex. It can return a list of strings, but not a single string. A bunch of booleans, ints, doubles, Strings and more, but not any one individually. And objects that stored less-than-basic variables, like Type, failed as well.
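One way to live with the ‘not too simple’ rule is to wrap a lone value in a small class and return that instead. This is a minimal sketch in plain Dart with no rpc annotations, so it says nothing about the package's exact requirements; the names StringResponse and hashtagStatus are hypothetical.

```dart
// Hypothetical wrapper: a single String travels inside a small class
// instead of being returned on its own.
class StringResponse {
  String value;

  StringResponse(this.value);
}

// Hypothetical API method body: returns the wrapper, not a bare String.
StringResponse hashtagStatus(String hashtag) {
  return StringResponse('ok: $hashtag');
}

void main() {
  final response = hashtagStatus('dartlang');
  print(response.value);
}
```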
The gcloud database had its own restrictions – anything stored in the database tied to your object had to be a simple variable. It could store a ‘Key’ to a separate database object, but the Key had a variable that rpc didn’t support on it (note: rpc has since been updated with the ability to ignore variables, after I brought this to the project owner’s attention). So between rpc and gcloud, I couldn’t store or return an Instagram object in the way that Instagram returned them.
Tuesday – Working Within the Confines
After exploring the limits on Monday, I came to work Tuesday ready to build my very flat data structures. Just a class wrapper, and then simple variables, or lists of simple variables. The work was mostly slogging through the Instagram data structures and mapping them to variables in my custom class. All tests went well, and I was able to bang out the API calls and database code to boot.
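To give a feel for what ‘very flat’ means, here is a rough sketch of pulling nested values up into simple fields. The class name FlatMedia and the JSON key names are assumptions made for illustration, not a verified copy of Instagram's schema or of my actual class.

```dart
import 'dart:convert';

// Hypothetical flat record: only simple values and lists of simple values.
class FlatMedia {
  final String id;
  final String username;
  final int likeCount;
  final String imageUrl;
  final List<String> tags;

  FlatMedia(this.id, this.username, this.likeCount, this.imageUrl, this.tags);

  // Lifts nested values into top-level fields. Key names are assumed.
  factory FlatMedia.fromJson(Map<String, dynamic> json) {
    final user = json['user'] as Map<String, dynamic>;
    final likes = json['likes'] as Map<String, dynamic>;
    final images = json['images'] as Map<String, dynamic>;
    final standard = images['standard_resolution'] as Map<String, dynamic>;
    return FlatMedia(
      json['id'] as String,
      user['username'] as String,
      likes['count'] as int,
      standard['url'] as String,
      (json['tags'] as List).cast<String>(),
    );
  }
}

// Made-up sample response used only for this example.
const sampleJson = '{"id": "123", "user": {"username": "someone"}, '
    '"likes": {"count": 42}, '
    '"images": {"standard_resolution": {"url": "http://example.com/a.jpg"}}, '
    '"tags": ["dartlang", "code"]}';

void main() {
  final media =
      FlatMedia.fromJson(jsonDecode(sampleJson) as Map<String, dynamic>);
  print('${media.username} ${media.likeCount} ${media.tags.length}');
}
```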
Each time someone needed data, the server would go through five steps:
- Verify the request (and impose some practical limits)
- Check the database to see if the request had been run recently
- Run the request to the Instagram API (if allowed)
- Parse the results and store them in the database, while deleting old records
- Finally, after the database had become the official point of record, query the database and return the results.
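The five steps above can be sketched roughly as follows. This is a simplified illustration, not the real server: an in-memory map stands in for the database, the upstream call is passed in as a function, and MediaService, CacheEntry, recentMedia and the five-minute freshness window are all hypothetical.

```dart
// Stand-in for a database row: cached results plus when they were fetched.
class CacheEntry {
  final List<String> mediaIds;
  final DateTime fetchedAt;

  CacheEntry(this.mediaIds, this.fetchedAt);
}

typedef Fetcher = Future<List<String>> Function(String hashtag);

class MediaService {
  final Fetcher fetchFromInstagram; // hypothetical upstream call
  final Duration maxAge; // assumed freshness window
  final Map<String, CacheEntry> _store = {}; // stand-in for the database

  MediaService(this.fetchFromInstagram,
      {this.maxAge = const Duration(minutes: 5)});

  Future<List<String>> recentMedia(String hashtag) async {
    final now = DateTime.now();

    // 1. Verify the request and impose a practical limit.
    if (hashtag.isEmpty || hashtag.length > 100) {
      throw ArgumentError('invalid hashtag');
    }

    // 2. Check whether the same request ran recently.
    final cached = _store[hashtag];
    final isFresh =
        cached != null && now.difference(cached.fetchedAt) < maxAge;

    if (!isFresh) {
      // 3. Run the request against the upstream API.
      final results = await fetchFromInstagram(hashtag);
      // 4. Store the parsed results, replacing the old record.
      _store[hashtag] = CacheEntry(results, now);
    }

    // 5. The store is now the point of record, so answer from it.
    return _store[hashtag]!.mediaIds;
  }
}

Future<void> main() async {
  var upstreamCalls = 0;
  final service = MediaService((hashtag) async {
    upstreamCalls++;
    return ['$hashtag-1', '$hashtag-2'];
  });
  await service.recentMedia('dartlang');
  await service.recentMedia('dartlang'); // answered from the store
  print(upstreamCalls); // 1
}
```

Answering from the store in step 5, even right after a fresh fetch, keeps a single code path for every response.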
Everything was working as expected – requests took a few seconds each, but I was very happy with the results.
Wednesday – The $6,000 Test
I spent the morning working on the client-side version of this API – how would my clients actually use this thing? The code was very simple and straightforward, since I use Dart client-side as well. I copied and pasted a lot, and ran my tests. Success.
I built out a slide for Clickspace TV in HTML. It currently exists as a Flash slide, so the design and logic were already mostly decided. I knocked out a slide that would look for brand new Instagram posts and display them as they came in. Simple.
Local testing went great, but I was only looking at one hashtag. With all of our clients, there are over 100 hashtags, and another 100 location ids to query. So I put it to the test, letting my local server run all 200 queries at once.
Within the hour, it had maxed out my budget. The queries I was sending took upwards of six minutes to complete, and hit the database so often and so hard that Google had turned me off until I committed to more money. I looked at the logs, and calculated what this server would cost, and realized that I had built something that would cost $6,000 to run every month.
I was devastated. This was a huge setback. Everything had gone so well, so what had I done wrong?
Thursday – Complete Rewrite
Poring over the logs, the data, and the documentation, I found my many, many errors. I started over.
The biggest problem was that my Instagram object was taking up too much space. I had created a database column for every single variable, and after re-reading the docs, I discovered that getting the contents of each column took a separate ‘read’ operation – and ‘write’ was even more expensive. I went from maybe 40 or 50 variables to five – three indexed variables to assist with queries, one bool, and a JSON string for the remainder. Whatever my server didn’t need for calculations or sorting was serialized. I did that for all of the objects – one class got reduced to two variables. I opted to deserialize them into working classes before sending objects back to the client, but I could have gone either way. It took all day, but I just barely squeaked in a working second version to test.
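Here is a rough sketch of that five-field shape: three queryable fields, one bool, and a JSON string for everything else. The field names (hashtag, createdTime, likeCount) and the JSON keys are assumptions picked for illustration, and the gcloud annotations are left out so the example runs with dart:convert alone.

```dart
import 'dart:convert';

class StoredMedia {
  // Three fields kept as real columns for queries and sorting (assumed names).
  final String hashtag;
  final int createdTime;
  final int likeCount;
  // One flag.
  final bool flagged;
  // Everything else, serialized into a single string.
  final String payload;

  StoredMedia(this.hashtag, this.createdTime, this.likeCount, this.flagged,
      this.payload);

  // Splits a parsed media map into queryable fields plus a JSON remainder.
  factory StoredMedia.fromMedia(String hashtag, Map<String, dynamic> media,
      {bool flagged = false}) {
    final rest = Map<String, dynamic>.of(media)
      ..remove('created_time')
      ..remove('like_count');
    return StoredMedia(hashtag, media['created_time'] as int,
        media['like_count'] as int, flagged, jsonEncode(rest));
  }

  // Rebuilds the full map before it goes back to a client.
  Map<String, dynamic> toMedia() {
    final media = jsonDecode(payload) as Map<String, dynamic>;
    media['created_time'] = createdTime;
    media['like_count'] = likeCount;
    return media;
  }
}

void main() {
  final media = <String, dynamic>{
    'created_time': 1420070400,
    'like_count': 12,
    'caption': 'hello',
    'username': 'someone',
  };
  final stored = StoredMedia.fromMedia('dartlang', media);
  print(stored.payload); // {"caption":"hello","username":"someone"}
  print(stored.toMedia()['caption']); // hello
}
```

The trade-off is that anything inside the payload can no longer be queried or sorted on, so whatever the server filters by has to stay outside the string.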
Friday – Optimizing
Based on the live tests overnight, I calculated that I had the server cost down to about $120 a month or so, down from $6,000. I made more changes, rewriting code to access the database only when absolutely necessary and using some local variables to track changes.
Then I found the dashboard page that showed how much I was actually spending – not just an estimate I was calculating based on the numbers I was seeing. As of this writing, I am going to be billed $0.16 for today’s queries. Firing on all cylinders across all clients, it’ll be maybe 4 to 6 times that – still under a buck a day.
But I can’t seem to find what I’m actually being charged for ‘instance hours’ – what it costs to keep this server on. Google documents a laundry list of prices for instances, but I believe I’ll be running at 2 cents an hour, per instance. It seems to want me to run two instances at all times, so that’s another $0.96 a day, if I’ve got it all right. So this server is going to run me about $60 a month. (I will add a paragraph below at a later date once I get better pricing data)
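For anyone checking my math, the estimate works out like this. It is a back-of-the-envelope sketch using only the figures above, with a 30-day month as my own assumption, and amounts kept in whole cents so the arithmetic stays exact.

```dart
// All amounts are in cents.
const queryCentsToday = 16; // $0.16 billed for today's queries
const instanceCentsPerHour = 2; // my guess at the per-instance rate
const instances = 2; // it seems to want two running at all times
const daysPerMonth = 30; // assumption

// Monthly cost if full load is `queryMultiplier` times today's query bill.
int monthlyCents(int queryMultiplier) {
  final queriesPerDay = queryCentsToday * queryMultiplier;
  final instancesPerDay = instanceCentsPerHour * instances * 24; // 96 cents
  return (queriesPerDay + instancesPerDay) * daysPerMonth;
}

void main() {
  print(monthlyCents(4)); // 4800
  print(monthlyCents(6)); // 5760
}
```

That is $48.00 to $57.60 a month, which is where the rounded figure of about $60 comes from.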
I feel like Google App Engine is very fairly priced, but the ‘instance hours’ price per hour may put it out of reach for super-small deployments. But I appreciate learning first hand the importance of serializing your database objects; I’ll be doing that by default from now on.
I’ll have more to say about this project at a later date, after we’ve put it through its paces.