Open Source

Simple Lego Blocks for Big Data

Monday, November 30th, 2015

Data engineers should abstract their code in the most lightweight way possible to facilitate downstream integration in a large-scale data system.

You want lego blocks, not puzzle pieces.


The creators of the C programming language once famously said, “first make it work, then make it right, and, finally, make it fast.” This adage still applies today.

The difference is that we now have tools that take working code and validate that it is right against reams of data. Many of these tools can also make the working, right code run really fast across a cluster of machines, possibly even in real time as the data comes in.

But making code work, then right, then fast requires some discipline.
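As a rough illustration of that progression, here is a minimal sketch. It assumes a hypothetical `normalize_url` function as the "lego block", and uses a local thread pool only as a stand-in for a real cluster; none of these names come from the original post.

```python
from concurrent.futures import ThreadPoolExecutor


def normalize_url(url):
    """Work: a plain function with no framework dependencies (hypothetical example)."""
    return url.strip().lower().rstrip("/")


def validate(func, samples):
    """Right: return (input, actual, expected) for every sample the function gets wrong."""
    failures = []
    for raw, expected in samples:
        actual = func(raw)
        if actual != expected:
            failures.append((raw, actual, expected))
    return failures


def run_parallel(func, items, workers=4):
    """Fast: map the same, unchanged function over a pool of workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(func, items))
```

Because `normalize_url` knows nothing about the validation harness or the worker pool, the same function can be dropped into either one, which is the property the lego-block analogy is after.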

Idiomatic Python Resources

Sunday, November 29th, 2015

Let’s say you’ve just joined my team and want to become an idiomatic Python programmer. Where do you begin?

Well, you can move up the learning curve quickly using resources from this blog:

I also have some good resources on web development with Python:

And on more advanced Python concepts, like dunders and functional programming:

Programming: it’s weird

Sunday, June 14th, 2015

I read the Bloomberg piece, What Is Code?, an explanation of code artistry and programmer/hacker culture in 2015. I love this paragraph about “languages as liquid infrastructure”:

The point is that things are fluid in the world of programming, fluid in a way that other industries don’t seem to be. Languages are liquid infrastructure. You download a few programs and, whoa, suddenly you have a working Clojure environment. Which is actually the Java Runtime Environment. You grab an old PC that’s outlived its usefulness, put Linux on it, and suddenly you have a powerful Web server. Now you can participate in whole new cultures. There are meetups, gatherings, conferences, blogs, and people chatting on Twitter. And you are welcomed. They are glad for the new blood.

Java was supposed to supplant C and run on smart jewelry. Now it runs application servers, hosts Lisplike languages, and is the core language of the Android operating system. It runs on billions of things. It won. C and C++, which it was designed to supplant, also won. A lot of things keep winning because computers keep getting more plentiful. It’s weird.

Worse is better, is worse, is better, is worse, is better…

The 3 Best Python Books for Your Team

Saturday, June 6th, 2015

Python is the core programming language used at my company. It also happens to be a quickly growing language with wide adoption among open source projects. It’s no wonder it’s becoming the leading language for software teams.

I’ve written a couple of blog posts with original material for learning Python, including “import this: learning the Zen of Python with code and slides” and “Build a web app fast”.

Newcomers to Python are often overwhelmed by the wealth of information about the language, both online and in print. I am often asked, “What are the best books for my Python team?” This post answers that question by highlighting what I consider to be the three best Python books on the market today.

Picking tech stacks

Sunday, May 24th, 2015

I realize now that one of the hardest parts of running a successful startup is “betting” on tech stacks that, 3 years out, will have a groundswell of community support around them.

It still shocks me that when I chose each of the following technologies as a central part of our stack, they were so new and immature that they did not even show up in a Google search trends box, yet they are now very popular.

Web interest in Apache Storm, Kafka, Spark in the Python community

Thursday, November 27th, 2014

Apache Storm, Kafka, and Spark are gaining a lot of momentum in the data analysis and processing communities. I was curious whether the interest in using these technologies with Python, in particular, is growing. Based on these Google Trends reports, it seems like it is.

Clojonic: Pythonic Clojure

Sunday, November 2nd, 2014

In June 2012, I promised myself that I’d learn Clojure “as a mind expander”. As a long-time Python programmer who uses Python full-time in my work, I wanted to explore. I wrote then:

I don’t know whether Clojure programs will be better or worse than equivalent Python programs. But I know they will be different.

It took me a while, but in January of this year, I started teaching myself the language.

Rich Hickey, and the “Cult of Personality”

My approach was to first learn the underpinnings of the language from books and online videos. If you embark on this for Clojure, you will inevitably run into the copious publicly available material from the language’s creator, Rich Hickey.

In stark contrast to Guido van Rossum in the Python community, Rich Hickey is undeniably not just the Clojure language’s creator, but also a kind of spokesperson for a functional programming renaissance. Guido van Rossum generally lies low, lets the Python language and community speak for themselves, and tries to avoid controversy. To him, Python is just a popular tool he happened to create, and it doesn’t represent any major paradigm shift in programming. It’s a positive evolutionary improvement supported by a great open source ecosystem and community. To Hickey, however, “traditional” programming languages (especially popular ones with an object-oriented focus, such as Java and C++) are just plain wrong. He proposes Clojure as an antidote of sorts.

You can get the gist of this from his motivating videos, such as Hammock-Driven Development, Are We There Yet?, and Simple Made Easy. For a thorough overview of Clojure as a language, you can also get a walkthrough by Hickey, given to a room full of Java developers, in Clojure for Java Programmers Part I and Part II.

Here is a summary of the viewpoint. Most languages are missing some important attributes that can help us tackle the most complex issues in programming projects:

Python annotations and type-checking

Saturday, August 16th, 2014

In 2006, PEP 3107 introduced function annotations for Python 3.x.

Nearly 4 years ago, I wrote this response to the PEP, but I published it to a discussion site that ended up becoming defunct (Clusterify). Recently, GvR revived interest in function annotations for type-checking, so I thought I would resurrect this discussion.


IMO, there is a huge flaw in the design of Python annotations: a lack of composability.

The problem only arises when you consider that at some point in the future, there may be more than one use case for function annotations (as the PEP suggests). For example, let’s say that in my code, I use function annotations both for documentation and for optional run-time type checking. If I have a framework that expects all the annotations on my function definition to be docstrings, and another framework that expects all the annotations to be classes, how do I annotate my function with both documentation and type checks?
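To make the conflict concrete, here is a minimal sketch of two frameworks that each assume they own every annotation. Both `typechecked` and `describe` are hypothetical stand-ins written for this illustration, not real libraries.

```python
import functools
import inspect


def typechecked(func):
    """Hypothetical framework A: treats every annotation as a class to check against."""
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = func.__annotations__.get(name)
            if expected is not None and not isinstance(value, expected):
                raise TypeError(f"{name} must be {expected.__name__}")
        return func(*args, **kwargs)

    return wrapper


def describe(func):
    """Hypothetical framework B: treats every annotation as a documentation string."""
    return {name: note.upper() for name, note in func.__annotations__.items()}


@typechecked
def scale(value: int, factor: int):
    return value * factor


def resize(width: "new width in pixels", height: "new height in pixels"):
    return width, height
```

Each function works with the framework it was written for, but `describe(scale)` fails because a class has no `upper` method, and wrapping `resize` with `typechecked` fails at call time because `isinstance` cannot check a value against a string. Neither framework has any way to skip annotations meant for the other.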

This amounts to a lack of a standard for layering function annotations. Is this really a problem?

It’s true that some standard for this could organically form in the community. For example, one could imagine tuples being used for this. If an annotation expression is a tuple, then every framework should iterate through the items of the tuple until it finds an item of the matching type. However, this won’t always work: what if two frameworks are both expecting strings, or two frameworks are both expecting classes, with different semantics?
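The tuple convention described above could look something like the following sketch. The helper `find_annotation` and the example functions are assumptions made up for illustration; no such standard exists.

```python
def find_annotation(func, name, wanted_type):
    """Return the first item of the wanted type in a parameter's annotation, or None."""
    annotation = func.__annotations__.get(name)
    # Treat a bare annotation as a one-item tuple so both forms are handled alike.
    items = annotation if isinstance(annotation, tuple) else (annotation,)
    for item in items:
        if isinstance(item, wanted_type):
            return item
    return None


def resize(width: (int, "new width in pixels"), height: (int, "new height in pixels")):
    return width, height


# A type-checking framework asks for a class; a documentation framework asks for a string.
print(find_annotation(resize, "width", type))
print(find_annotation(resize, "width", str))


# The ambiguity: two frameworks that both expect strings cannot tell their items apart.
def fetch(url: ("address to download", "GET")):
    return url


# Every string-based framework gets the first string, whichever one it belongs to.
print(find_annotation(fetch, "url", str))
```

Matching on type alone works only while each framework claims a distinct type. As soon as two of them want strings, as in `fetch`, the lookup silently hands both the same item.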

5 years ago, I was bored

Friday, July 4th, 2014

I wrote this to a friend five years ago, a few weeks after I had quit my job to embark on the crazy ride that has been my startup’s founding story.

You said to me, “I am glad that you left because you sounded unhappy there.”

But you know, I wasn’t exactly unhappy.

I was just bored.

I’m eager to work on my own stuff. I had a good work environment and I learned a lot. I was making money, had flexibility about hours and work from home, and was respected on my team.

But I had a couple of realizations. First, I didn’t see a future for myself in financial firms. I just don’t like their core business enough; in fact, I think their core business is somewhat superfluous and that financial firms should be way, way smaller than they are. They should make less money, have less power, etc.

Second, my specific project had this split personality. On the one hand, it wanted to be this cutting edge framework to really empower application developers throughout the company. On the other, it was a lost project — lots of code, lots of ideas, but no solid product and no real customer.

streamparse: Python + Apache Storm for real-time stream processing

Sunday, May 4th, 2014

We released streamparse today, which lets you run Python code against real-time streams of data by integrating with Apache Storm.

We released it for our talk, “Real-time streams & logs with Apache Kafka and Storm” at PyData Silicon Valley 2014.

This initial release (0.0.5) includes a command-line tool, sparse, that can set up and run local Storm-friendly Python projects.
