Programming

Fully Distributed Teams: are they viable?

Monday, May 14th, 2012

It has become increasingly common for technology companies to run as Fully Distributed teams. That is, teams that collaborate primarily over the web rather than using informal, face-to-face communication as the main means of collaborating.

This has only become viable recently due to a mix of factors, including:

  • the rise of “cloud” collaboration services (aka “web 2.0″ software) as exemplified by Google Apps, Dropbox, and SalesForce
  • the wide availability of high-speed broadband in homes that rivals office Internet connections (e.g. home cable and fiber)
  • real-time text, audio and video communication platforms such as IRC, Google Talk, and Skype

Thanks to these factors, we can now run Fully Distributed teams without a loss in general productivity for many (though not all) roles.

In my mind, there are three models for scaling number of employees in a growing company in the wild today. These are:

Read the rest of this entry »

Speed and lightness

Monday, May 7th, 2012

Last week, I decided to give myself the present of a 512GB SSD drive, which was available at a nearly 25% discount on NewEgg for a limited time.

The price-per-gigabyte for SSDs has finally fallen to nearly $1/GB, and the rewrite cycle problems that used to afflict these drives is now becoming a non-issue with the Linux kernel’s TRIM implementation and the updated firmware on these drives.

So, I took the plunge. My main development workstation was a Thinkpad T400, maxed out to 8GB of RAM, and with dual 500GB platter drives (via Thinkpad’s excellent Ultrabay extension). I was running Ubuntu 10.10 for a long time. I timed the SSD purchase with the release of Ubuntu 12.04 LTS — 10.10 no longer being supported, I figured I’d do a clean install on the new SSD and clean up my development workstation for the first time in a couple of years.

A couple of things occurred to me in this process. First of all, since 2009, I have moved more and more of my data into cloud services. I have moved the lion’s share of my “business and personal documents”, including photos, into Dropbox with my 50GB account. I have moved my Music into Ubuntu One Cloud, due to its excellent streaming service for iPad, iPhone, and Android (which I also use from my Macbook Air via a Fluid App). And I have moved my truly old files and digital keepsakes into NAS drives that I host in my little server room at home.

Read the rest of this entry »

On multi-form data

Thursday, April 19th, 2012

I read an excellent debrief on a startup’s experience with MongoDB, called “A Year with MongoDB”.

It was excellent due to its level of detail. Some of its points are important — particularly global write lock and uncompressed field names, both issues that needlessly afflict large MongoDB clusters and will likely be fixed eventually.

However, it’s also pretty clear from this post that they were not using MongoDB in the best way. For example, in a small part of their criticism of “safe off by default”, they write:

We lost a sizable amount of data at Kiip for some time before realizing what was happening and using safe saves where they made sense (user accounts, billing, etc.).

You shouldn’t be storing user accounts and billing information in MongoDB. Perhaps MongoDB’s marketing made you believe you should store everything in MongoDB, but you should know better.

In addition to that data being highly relational, it also requires the transactional semantics present in mature relational databases. When I read “user accounts, billing” here, I cringed.

Read the rest of this entry »

XDDs: stay healthily skeptical and don’t drink the kool-aid

Sunday, February 12th, 2012

On my LinkedIn profile, I list one of my skills as “thought-driven development”. This is a little tongue-in-cheek; software engineering over the last few years has developed a lot of “XDDs,” such as test-driven development, behavior-driven development, model-driven development, etc. etc.

“Thought-driven development” doesn’t actually exist, but by it, I simply mean: perhaps we should think about what we’re doing, rather than reaching for a nearby methodology du jour.

In my last job, a colleague of mine used to also joke about “design-driven design” — perhaps the ultimate play on the XDDs since it is also a strange loop.

All this is not to say the XDDs aren’t useful — they definitely are. A lot of them have spawned entire groups of cross-platform open source projects. I am all for anything that makes the adoption of XYZ best practice easier for my team. But these techniques often require some lateral thinking to get to any real benefit.

When evaluating technologies like this, you have to take each little community with a grain of salt. Almost every programming framework / methodology / etc. that exists portends to offer some order of magnitude increase in software reliability / developer productivity / whatever else. And almost all, if not all, fail to do so, in practice.

Here is one anecdote to illustrate the point. From 2006-2008 at Morgan Stanley, the entire corporation was obsessed with Java’s Spring framework and its core “architectural pattern”, Inversion of Control. I can’t even begin to explain to you the number of man-hours that were wasted re-architecting existing, working software to meet this chimerical conception of component decoupling. I even contributed to this, urged by the zealots and their blind faithful. All of the reasons seemed great: decoupling code, using more interfaces, allowing for easier unit testing, being able to “rewire dependencies” and use fancy technologies like “aspect-oriented programming”.

Even Google got swept up in the madness and developed their own, competing framework called Guice. And in the end — after all that work — my diagnosis is that IoC is basically a non-starter, a complete waste of time.

A complicated framework that morphed into a programming methodology, developed exclusively to work around some annoying limitations of the Java language. Since it was applied without thinking, now everyone’s Java code has to suffer, and you can hardly pick up a Java web application today without being crushed by the weight of its IoC container’s XML configuration files. (Nevermind that most other communities, such as Python’s and Ruby’s, have hardly a clue what IoC is all about — a good enough indication that it is a waste of time.)

Every framework and approach should be judged on its true merits, that is the true cost/benefit analysis of applying that particular technology. Will it save us time? Will it simplify — not complicate — our code? Will it make our code more flexible / adaptable? Will it let us serve our users and customers better?

I regularly go back to old classics like the Mythical Man-Month to remember that nothing we do in software is truly new. I highly recommend you read it, and also its most famous essay, “No Silver Bullet”.

tl;dr stay healthily skeptical, don’t drink the kool-aid.

8 years ago today, I wrote this in a bug report

Sunday, February 5th, 2012

Me: Thanks so much for your fix to my issue. My friend, who majors in business, once told me that I should no longer major in Computer Science because “programming is like banging your head against the wall repeatedly, but with less reward.” I find that to be a rather rash dramatization, but I know in dealing with bugs as subtle as these it may feel that way. I hope at least the end-result is rewarding for you.

Programmer: Are you a Computer Science major? If so, don’t let your friend discourage you. Just ask him about “head banging” when those business majors find that their product development and marketing efforts fail to work after spending millions of dollars.

Me: Yes, I’m a CS major. And you’re right — the reward is great in software, and the cost of building an useful product is relatively minimal. That is one of the reasons I chose this path. It’s why I love helping out honest, intelligent developers such as yourself in any way I can. I have found that hardworking CS majors who are not only better programmers, but more often than not better thinkers and better managers — if you’d give them the chance. I am happy in my decision, and still have that naivete that perhaps I can change the industry a bit, shake things up, come up with an idea that changes everything, innovate in whatever way I can. Big aspirations; we’ll see what happens. For now, I’ll just keep respecting the good software I find in the world, such as yours.

The C++ trap

Thursday, November 3rd, 2011

I came across this wonderful piece of historical retelling by David Beazley, one of my favorite Pythonistas and the author of Python Essential Reference. Here is a man who conquered C++ in just about every way, but ultimately found himself trapped in its byzantine complexity, only to escape by way of Python.

Swig grew a fully compatible C++ preprocessor that fully supported macros. A complete C++ type system was implemented including support for namespaces, templates, and even such things as template partial specialization. Swig evolved into a multi-pass compiler that was doing all sorts of global analysis of the interface. Just to give you an idea, Swig would do things such as automatically detect/wrap C++ smart pointers.It could wrap overloaded C++ methods/function. Also, if you had a C++ class with virtual methods, it would only make one Python wrapper function and then reuse across all wrapped subclasses.

Under the covers of all of this, the implementation basically evolved into a sophisticated macro preprocessor coupled with a pattern matching engine built on top of the C++ type system [...] This whole pattern matching approach had a huge power if you knew what you were doing [...]

In hindsight however, I think the complexity of Swig has exceeded anyone’s ability to fully understand it (including my own). For example, to even make sense of what’s happening, you have to have a pretty solid grasp of the C/C++ type system (easier said than done). Couple that with all sorts of crazy pattern matching, low-level code fragments, and a ton of macro definitions, your head will literally explode if you try to figure out what’s happening. So far as I know, recent versions of Swig have even combined all of this type-pattern matching with regular expressions. I can’t even fathom it.

Sadly, my involvement was Swig was an unfortunate casualty of my academic career biting the dust. By 2005, I was so burned out of working on it and so sick of what I was doing, I quite literally put all of my computer stuff aside to go play in a band for a few years. After a few years, I came back to programming (obviously), but not to keep working on the same stuff. In particularly, I will die quite happy if I never have to look at another line of C++ code ever again. No, I would much rather fling my toddlers, ride my bike, play piano, or do just about anything than ever do that again.

From this Swig mailing list entry.

import this: learning the Zen of Python with code and slides

Saturday, October 29th, 2011

It’s hard to find me gushing more unapologetically than when I talk about the virtues of my favorite programming language, Python.

Indeed, my life for the last 3 years has been dominated by the language. In many ways, pursuing a startup and enduring the associated financial hardship was partially because I had become frustrated with using Java in my full-time work and wanted to convert hobby projects I was building outside of work hours into full-fledged projects.

Read the rest of this entry »

Upcoming: standing desk setup, Python training, Groovy/JavaScript articles

Sunday, August 28th, 2011

I’ve been quite busy with work lately, so haven’t had time to send a few posts toward my blog. However, I have been working on some spare time and work-related projects that I’d love to share with everyone here.

Among them:

  • Lifehacking through standing desks. I have created a standing desk setup for my home office, and my investors in Parse.ly have created a healthy startup office in NYC. I have some thoughts about this as it applies engineering and hacking to the thing many of us do most: sit in a desk chair all day.
  • 2-day Python Training Course. I have created a 2-day Python training course that helps existing programmers learn Python by comparing it very directly to languages they may already know, like C and Java. I gave this course to a group of government employees a few months ago. It has some interesting characteristics: I designed the whole course using ReStructured Text (ReST) and compiled it into a web-based presentation. This means that the entire course has the potential to be “open source” — exercises, slides, and all. I plan to release this to the public. For now, I am just clearing a few of the images I used in the course to make sure I don’t inadvertently infringe copyright. After that, I will open to the public.
  • Groovy/JavaScript articles go public domain. Some articles I wrote for GroovyMag and JSMag last year are now able to be published in the public domain. These include one about RESTful services with Groovy, one that talks about functional programming with JavaScript, and finally one that discusses a design for “metagrids” in ExtJS 3.x. I will put all three articles up on this blog once they are reformatted.

Stay tuned!

Groovy, the Python of Java

Saturday, April 9th, 2011

I was a bona fide Java programmer for 5 years before I started working on Aleph Point and Parse.ly. I truly believe that Python and JavaScript are fundamentally better languages than Java for a variety of reasons born out of experience with each of them. (Note: Before this gets marked as flamebait, please notice that not only was I Java programmer for more than 5 years, but I was also a Java open source contributor!) I have enormous respect for the Java open source community, which has produced some of the highest quality modules available anywhere.

Now, don’t get me wrong — Python also has batteries included, and usually, when I think that I’m missing a great module I used to use in Java, it already exists in a much more powerful form in Python’s Standard Library or the wealth of modules on PyPI, GitHub, and Bitbucket. However, I believe in not reinventing the wheel, and so if a great open source tool exists in Java, I will want to interact with it.

One of these modules which we use extensively at Parse.ly is Apache Solr, and its surrounding Lucene project modules. Lucene is an extremely mature framework for document indexing, and Solr is a powerful server-ization of that technology that fits well into complex, mixed language distributed systems. I know there are efforts — like Whoosh — to build fast search engines atop the Python language. And I applaud these efforts — more projects means more competition, and more competition means better products. However, I still believe that you go with the best of breed tools available for production software, and you try not to let religious arguments about programming language get in the way.

Lately, I have come across more and more Java open source projects that have no equivalent in Python, and which I would like to access. Knowing that I wanted to feel comfortable incorporating Java open source projects — beyond Solr, which was already nicely wrapped as a web service — I, at first, thought that I’d be forced to still live among the weeds of complex class and interface definitions, cumbersome Java IDEs, XML configuration files, and (IMO) time-wasting rabbit holes like dependency injection, configuration management, and classpath hell. And then I found Groovy.

Read the rest of this entry »

Pythonic means idiomatic and tasteful

Wednesday, November 3rd, 2010

Pythonic isn’t just idiomatic Python — it’s tasteful Python. It’s less an objective property of code, more a compliment bestowed onto especially nice Python code. The reason Pythonistas have their own word for this is because Python is a language that encourages good taste; Python programmers with poor taste tend to write un-Pythonic code.

This is highly subjective, but can be easily understood by Pythonistas who have been with the language for awhile.

Here’s some un-Pythonic code:

def xform(item):
    data = {}
    data["title"] = item["title"].encode("utf-8", "ignore")
    data["summary"] = item["summary"].encode("utf-8", "ignore")
    return data

This code is both un-Pythonic and unidiomatic. There’s some code duplication which can very easily be factored out. The programmer hasn’t used concise, readability-enhancing facilities that are available to him by the language. Even lazy programmers will recognize this code’s clear downsides.

Read the rest of this entry »