Startups

The Log: a building block for large-scale data systems

Monday, December 16th, 2013

A software engineer at LinkedIn has written a monster of a blog post about “The Log”, a building block for large-scale data systems. The concepts in this post are near and dear to my heart due to my work on precisely these kinds of problems at Parse.ly.

What is “a log”?

The log is similar to the list of all credits and debits a bank processes; a table is all the current account balances. If you have a log of changes, you can apply these changes in order to create the table capturing the current state. This table will record the latest state for each key (as of a particular log time). There is a sense in which the log is the more fundamental data structure: in addition to creating the original table you can also transform it to create all kinds of derived tables.
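The bank analogy can be sketched in a few lines of Python. This is only an illustration, not code from the LinkedIn post: the `LOG` data, the account names, and the helpers `build_table` and `count_transactions` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical log: an ordered list of (account, amount) entries.
# Positive amounts are credits, negative amounts are debits.
LOG = [
    ("alice", 100),
    ("bob", 50),
    ("alice", -30),
    ("bob", 20),
    ("alice", 5),
]


def build_table(log, up_to=None):
    """Replay the first `up_to` log entries (all of them if None) into a table of balances."""
    balances = defaultdict(int)
    for account, amount in log[:up_to]:
        balances[account] += amount
    return dict(balances)


def count_transactions(log):
    """Build a different derived table from the same log: entries per account."""
    counts = defaultdict(int)
    for account, _amount in log:
        counts[account] += 1
    return dict(counts)


print(build_table(LOG))         # current state, as of the end of the log
print(build_table(LOG, 3))      # state as of an earlier log time
print(count_transactions(LOG))  # another table derived from the same log
```

The table can be thrown away and rebuilt at any time by replaying the log, while the reverse is not possible: the balances alone do not tell you which credits and debits produced them.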

At Parse.ly, we recently adopted Kafka widely in our backend to address exactly these use cases: data integration and real-time/historical analysis for large-scale web analytics. Previously, we were using ZeroMQ, which is good, but Kafka is a better fit for this use case.

We have always had a log-centric infrastructure, born not out of any understanding of theory but simply out of requirements. We knew that as a data analysis company, we needed to keep data as raw as possible in order to do derived analysis, and we knew that we needed to harden our data collection services and make it easy to prototype data aggregates atop them.

I also recently read the book by Nathan Marz (creator of Apache Storm), which proposes a similar “log-centric” architecture, though Marz calls the log a “master dataset” and gives the overall design the fanciful name “Lambda Architecture”. He describes how, atop a “timestamped set of facts” (essentially, a log), you can build any historical / real-time aggregates of your data via dedicated “batch” and “speed” layers. There is a lot of overlap in thinking between that book and this article.
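To make the “batch” and “speed” split concrete, here is a toy Python sketch. It is my own simplification, not code from Marz’s book: the `FACTS` data, the `CUTOFF` value, and the functions `batch_view`, `speed_view`, and `query` are hypothetical names, and a real speed layer would update incrementally as events arrive rather than filtering a list.

```python
from collections import Counter

# Hypothetical "timestamped set of facts": (timestamp, url) pageview events.
FACTS = [
    (1, "/a"), (2, "/b"), (3, "/a"), (4, "/a"), (5, "/b"), (6, "/c"),
]


def batch_view(facts, cutoff):
    """Batch layer: recompute counts from scratch over all facts up to the cutoff."""
    return Counter(url for ts, url in facts if ts <= cutoff)


def speed_view(facts, cutoff):
    """Speed layer: cover only the recent facts the last batch run has not seen."""
    return Counter(url for ts, url in facts if ts > cutoff)


def query(url, batch, speed):
    """Answer a query by merging the batch and speed views."""
    return batch[url] + speed[url]


CUTOFF = 4  # timestamp of the last completed batch run (assumed)
batch = batch_view(FACTS, CUTOFF)
speed = speed_view(FACTS, CUTOFF)

for url in ("/a", "/b", "/c"):
    print(url, query(url, batch, speed))
```

Because both views are derived from the same facts, the merged answer matches what a full recount of the log would give, and the speed view can be discarded once the next batch run catches up.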

LinkedIn’s log-centric stack, visualized.

How investors play the option

Tuesday, September 17th, 2013

Paul Graham put up a new essay, one of his longest, called “How to raise money”. It gives a good glimpse into the mind game that is startup financing.

I thought it was particularly interesting how he documented three different ways in which investors say “no”, without really saying no.

Parse.ly Press Coverage – August 2013

Tuesday, September 3rd, 2013

Parse.ly made its funding announcement — a $5M series A, led by Grotech Ventures and with participation from FundersClub, Blumberg Capital, and ff Venture Capital. Read on for the full list of links to our coverage.

Parse.ly: brand hacking

Saturday, July 20th, 2013

There’s some hoopla lately about “weird” startup names in the Wall Street Journal, with specific coverage of “.ly” domains in The Atlantic Wire:

The latest start-up boom has led to the creation of at least 161 companies that end in “ly,” “lee,” and “li,” which is, naming consultants tell us, 160 too many. There’s feedly, bitly, contactually, cloudly, along with a bunch of other company-LYS [...] and all but the first ever “ly” name are “just lazy,” Nancy Friedman, a naming consultant, told The Atlantic Wire.

Building ships

Wednesday, July 17th, 2013

“If you want to build a ship, don’t drum up the people to gather wood, divide the work, and give orders. Instead, teach them to yearn for the vast and endless sea.”
- Antoine de Saint-Exupéry

“We’re killing it”

Friday, July 5th, 2013

Good post today, A “Third Way” in Entrepreneurship, that discusses the “always be winning”, annoyingly positive veneer of most startup entrepreneurs. This is a community where many founders you meet always share their latest victory and pretend that failures rarely happen.

… entrepreneurs are pressured to maintain a totally positive face to the outside world about the state of their company. In San Francisco, “we’re killing it” is almost now an inside joke because of the ubiquity of that response when someone asks an entrepreneur how their company is faring. Most of these companies are not “killing it”, and the entrepreneurs probably know that.

There is also a nice comment thread discussing the “we’re killing it” phrase, a discussion to which I contributed an anecdote and interpretation.

The comment I added to the discussion:

A friend once relayed a story to me of a dinner meeting of ~20 early-stage high-tech executives he attended that was sponsored by a startup organization. The moderator asked one question as an ice breaker to kick off the night: “What is the greatest challenge that your startup faces today?”

My friend was the first one picked to share. Being a very level-headed guy (who personally hates the term, “killing it”), he suggested that one of his biggest challenges was maintaining work/life balance & personal relationships, for himself & also for his employees, so that they don’t burn out on the job.

The baton then got passed to the next entrepreneur, and, as my friend tells it, entrepreneur after entrepreneur shared their “greatest challenge”, though they were only “challenges” in the weakest sense of the word. For example: “handling all the new customers we have”, “scaling our servers for our massive user-base”, “hiring enough software engineers to keep up with the business growth”.

He realized then that every entrepreneur was “positioning” the answer to make it appear that the greatest challenge faced was dealing with the company’s illusory massive success.

I think this anecdote describes the “killing it” mentality quite well — even among peers and in a setting where people should be comfortable sharing their fears, this community prefers reality distortion.

Uninterruptability

Thursday, May 23rd, 2013

Paul Graham, in a footnote from his essay on “How to Make Wealth”:

One valuable thing you tend to get only in startups is uninterruptability. Different kinds of work have different time quanta. Someone proofreading a manuscript could probably be interrupted every fifteen minutes with little loss of productivity. But the time quantum for hacking is very long: it might take an hour just to load a problem into your head. So the cost of having someone from personnel call you about a form you forgot to fill out can be huge.

This is why hackers give you such a baleful stare as they turn from their screen to answer your question. Inside their heads a giant house of cards is tottering.

The mere possibility of being interrupted deters hackers from starting hard projects. This is why they tend to work late at night, and why it’s next to impossible to write great software in a cubicle (except late at night).

One great advantage of startups is that they don’t yet have any of the people who interrupt you. There is no personnel department, and thus no form nor anyone to call you about it.

PyCon 2013: The Debrief

Sunday, March 17th, 2013

PyCon US 2013 is over! It was a lot of fun — and super informative.

The People

For me, it was great to finally meet in person such friends and collaborators as
@__get__, @nvie, @jessejiryudavis, and @japerk.

It was of course a pleasure to see again such Python super-stars as
@adrianholovaty, @wesmckinn, @dabeaz, @raymondh, @brandon_rhodes, @alex_gaynor, and @fperez_org.

(Want to follow them all? I made a Twitter list.)

I also met a whole lot of other Python developers from across the US and even the world, and the entire conference had a great energy. The discussions over beers ranged from how to use Tornado effectively to how to hack a Python shell into your vim editor to how to scale a Python-based software team to how to grow the community around an open source project.

In stark contrast to the events I’ve been typically going to in the last year (namely: ‘trade conferences’ and ‘startup events’), PyCon is unbelievably pure in its purpose and feel. This is where a community of bright, talented developers who share a common framework and language can push their collective skills to new heights.

And push them, we did.

Solidify your Python web skills in two days at PyCon US 2013

Friday, February 8th, 2013

PyCon US 2013 is coming up in March. It is in beautiful Santa Clara, right outside of Palo Alto / San Francisco.

The main conference is sold out, but there are still a few spots open for the tutorial sessions.

(Here’s a secret: the tutorials are where I’ve always learned the most at PyCon.)

Most of PyCon’s attendees are Python experts and practitioners. However, Python is one of the world’s greatest programming languages because it is one of its most teachable and learnable. Attending PyCon is a great way to rapidly move yourself from the “novice” to “expert” column in Python programming skills.

This year, there is an excellent slate of tutorial sessions available before the conference starts. These cost $150 each, which is a tremendous value for a 3-hour, in-depth session on a Python topic. I know a lot of people who are getting into Python as a way to build web applications. There is actually a great “novice web developer” track in this year’s tutorials, which I’ll outline in this post.

Smaller buckets and bigger thimbles

Saturday, December 8th, 2012

Just came across this essay I wrote on my morning commute from Long Island to NYC in 2007, while I was a software engineer for Morgan Stanley.

I was joking with some friends the other day that my “to read” list keeps growing every day, and it seems like things are only added but never removed. I made the following analogy: it grows by the bucketful and shrinks by the thimbleful, to which my coworkers replied, “you need bigger thimbles and smaller buckets.” If only it were that easy.

Unfortunately, I’m not getting used to this 9-to-5 stuff even if it is only 9-to-5. The other day I watched a video of Andy Hertzfeld (one of the original software developers on the Mac team at Apple) and he was talking about how when he was my age he would work 80-hour weeks and just pour his heart and soul into work. And I thought: I can’t do that on my current project. Why should I?
