“So, you work in IT?”

August 18th, 2014

For many years, IT as a field was dominated by people who could not write code.

This is because computer technology was so mystifying and befuddling to most people that anyone who merely knew how to use computers with any level of comfort could demand a tax from those who didn’t.

During that same period (late 90s and early 2000s), programming itself was being commoditized by offshore outsourcing, so the same IT people were positioning themselves for management positions. This is how MIS (Management Information Systems) became a popular career path among the IT elite, and why, when I was in college from 2002 to 2006, Comp Sci enrollment was at a major low.

Read the rest of this entry »

Python annotations and type-checking

August 16th, 2014

In 2006, the Python core team wrote PEP 3107, which introduced function annotations for Python 3.x.

Nearly 4 years ago, I wrote this response to the PEP, but I published it to a discussion site that ended up becoming defunct (Clusterify). I saw that recently, interest in function annotations for type-checking was revived by GvR, and thought I might resurrect this discussion.

Background

There is a huge flaw in the design of Python annotations, IMO: lack of composability.

The problem only arises when you consider that at some point in the future, there may be more than one use case for function annotations (as the PEP suggests). For example, let’s say that in my code, I use function annotations both for documentation and for optional run-time type checking. If I have a framework that expects all the annotations on my function definition to be docstrings, and another framework that expects all the annotations to be classes, how do I annotate my function with both documentation and type checks?
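To make the conflict concrete, here is a minimal sketch. The two "frameworks" (`render_docs` and `check_types`) and the example functions are hypothetical stand-ins invented for illustration, not real libraries; each one simply reads `__annotations__` and assumes every value is the kind of object it expects.

```python
def render_docs(func):
    """Hypothetical doc framework: expects every annotation to be a string."""
    lines = []
    for name, note in func.__annotations__.items():
        if not isinstance(note, str):
            raise TypeError("doc framework expected a string for %r" % name)
        lines.append("%s: %s" % (name, note))
    return lines


def check_types(func, **kwargs):
    """Hypothetical type checker: expects every annotation to be a class."""
    for name, value in kwargs.items():
        expected = func.__annotations__[name]
        if not isinstance(expected, type):
            raise TypeError("type checker expected a class for %r" % name)
        if not isinstance(value, expected):
            raise TypeError("%r should be %s" % (name, expected.__name__))
    return True


# Annotated for the doc framework only.
def scale(factor: "multiplier applied to the input"):
    return factor * 2


# Annotated for the type checker only.
def halve(amount: int):
    return amount // 2
```

Here `render_docs(scale)` and `check_types(halve, amount=4)` both work, but swapping them raises `TypeError` in each case: a function annotated for one framework is unusable by the other, and there is no single annotation that satisfies both.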

This amounts to lack of a standard for layering function annotations. Is this really a problem?

It’s true that some standard for this could organically form in the community. For example, one could imagine tuples being used for this. If an annotation expression is a tuple, then every framework should iterate through the items of the tuple until they find an item of the matching type. However, this won’t always work: what if two frameworks are both expecting strings, or two frameworks are both expecting classes, with different semantics?
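The tuple idea might look something like the sketch below. The helper `find_annotation` and the example functions are names I made up for illustration; no such convention or helper exists in the standard library.

```python
def find_annotation(func, param, wanted_type):
    """Return the first item of the wanted type in a parameter's annotation."""
    annotation = func.__annotations__.get(param)
    # Treat a bare annotation as a one-item tuple.
    items = annotation if isinstance(annotation, tuple) else (annotation,)
    for item in items:
        if isinstance(item, wanted_type):
            return item
    return None


# One docstring and one class: each framework finds what it wants.
def resize(width: ("width in pixels", int)):
    return width


doc = find_annotation(resize, "width", str)    # the docstring
cls = find_annotation(resize, "width", type)   # the class


# Two strings meant for two different (hypothetical) frameworks:
# a doc framework and a translation framework.
def label(text: ("shown in the docs", "i18n-key:label.text")):
    return text


# Both frameworks ask for a string, and both get the first one.
ambiguous = find_annotation(label, "text", str)
```

For `resize`, matching by type is enough to keep the two frameworks apart. For `label`, it is not: the lookup cannot tell which string belongs to which framework, which is exactly the failure case described above.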

Read the rest of this entry »

Improving a surface interpretation of “big data”

July 27th, 2014

A silly little piece appeared in The New York Times discussing a hypothesis of a Harvard economics professor that Apple might slow down its operating system ahead of major product releases in an attempt to encourage consumers to upgrade.

One of his students used Google Trends data to investigate this hypothesis. In the article, two graphs are compared: one that shows Google Trends search volume for “iPhone slow” and the other for “Samsung Galaxy slow”.

[Figure: Google Trends search volume for “iPhone slow”]

It is shown that the spikes in searches for slow operation of Apple’s products seem to correlate with new iPhone release dates, whereas there are no search spikes in the data for the Samsung Galaxy.

[Figure: Google Trends search volume for “Samsung Galaxy slow”]

These graphs are horribly misleading on their own. Both products have grown in popularity over the years, so the increase in search volume over time reflects nothing more than their widespread mainstream popularity. This could have easily been removed from the graphs by adjusting these trendlines relative to the “base” searches, e.g. “iPhone” and “Samsung Galaxy”. In the graphs as shown, it’s hard to tell whether little spikes are actually hidden within the compressed and precise trendline for the Samsung Galaxy.
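As a rough sketch of the adjustment I have in mind, you could divide each period's volume for the “slow” query by the volume for the base query. The function name and the numbers below are made up purely for illustration; they are not real Google Trends data.

```python
def relative_interest(query_volume, base_volume):
    """Divide each period's query volume by the base term's volume."""
    return [q / b if b else 0.0 for q, b in zip(query_volume, base_volume)]


# Made-up numbers for illustration only; NOT real Google Trends data.
slow_searches = [10, 20, 80, 40]      # e.g. volume for "<product> slow"
base_searches = [100, 200, 400, 400]  # e.g. volume for "<product>"

ratios = relative_interest(slow_searches, base_searches)
```

In this toy series, the doubling from the first period to the second disappears after normalization because the base term doubled too, while the third period still stands out as a spike relative to the product's overall popularity.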

Read the rest of this entry »

Joel Spolsky’s business operating system

July 24th, 2014

Joel Spolsky wrote his first blog post in a year today, announcing Trello, Inc., a spin-off company for Trello, the successful project management product developed by Fog Creek Software.

Trello has announced a $10M+ venture financing round and they are going to expand the product and team. This was a bit of a surprise to me, because Spolsky had always been critical of the VC-funded tech startup industry on his blog over the years.

But in the blog post, he explains that he has really been critical of the kind of company that this industry typically breeds. So he has made it his life’s work to create companies with a different “operating system” altogether.

Read the rest of this entry »

Delta customer service: exclusions may apply

July 13th, 2014

Some friends of ours invited us to spend 4 days with them in Paris in late August. They had some rooms free in an apartment they had rented and so all we needed to do was figure out how to fly there.

I don’t normally travel to Europe in the summer and know it can be a busy time of year, so I figured I’d have to do a bunch of research before booking these tickets. What I never expected was how this would send me down a rabbit hole where a major airline, Delta, would prove to me how poorly it can treat customers and prospective customers who are about to spend thousands of dollars with it.

Doing research

My research started, as many a traveler’s does, on the web. I used Hipmunk, Expedia, SeatGuru, and all the usual tools to shop around. Because this was a short trip (only 4 days total), we were hoping to use some of the thousands of American Express Membership Rewards points we had built up over the years to at least get a Business Class upgrade for the overnight (red-eye) flight from Washington, DC to Paris.

We figured this was an appropriate splurge. It’s one of the rare times my girlfriend gets vacation time away from grueling medical school hours, and for me, this trip is days before my 30th birthday. Given that we’d need some comfy sleep after an 8-hour flight and 2-hour drive to the airport before that, we were hoping we’d be able to recline a little on our flight over to Europe.

Digging around, I find rates for tickets are all over the map. But eventually, I get a couple of promising leads. First, Delta has a SkyMiles frequent flyer program, and they have a relationship with American Express, especially if you have an American Express + SkyMiles credit card. I don’t have one of those, but I realize that I’ve had the same American Express “Blue for Students” credit card for like 12 years, and that this card is now discontinued, so I should probably upgrade to something that actually earns me some travel points.

I sign up for SkyMiles and get instantly approved for this credit card. So far so good.

But little did I know I was about to enter the tangled web of fine print that dominates so much of corporate America and consumer interaction today.

Read the rest of this entry »

5 years ago, I was bored

July 4th, 2014

I wrote this to a friend five years ago, a few weeks after I had quit my job to embark on the crazy ride that has been Parse.ly’s founding story.

You said to me, “I am glad that you left because you sounded unhappy there.”

But you know, I wasn’t exactly unhappy.

I was just bored.

I’m eager to work on my own stuff. I had a good work environment and I learned a lot. I was making money, had flexibility about hours and work from home, and was respected on my team.

But I had a couple of realizations. First, I didn’t see a future for myself in financial firms. I just don’t like their core business enough; in fact, I think their core business is somewhat superfluous and that financial firms should be way, way smaller than they are. They should make less money, have less power, etc.

Second, my specific project had this split personality. On the one hand, it wanted to be this cutting edge framework to really empower application developers throughout the company. On the other, it was a lost project — lots of code, lots of ideas, but no solid product and no real customer.

Read the rest of this entry »

Disable Google Hangout’s auto-mute on typing

July 2nd, 2014

Damnit, Google. Sometimes, you make product improvements that are awesome. Other times, you make “improvements” that are downright depressing regressions.

In an effort to stop the annoyance of being on a Google Hangout video conference and hearing nothing but your colleagues’ “tap-tap-tap” on their loud programmer keyboards, Google added a feature to the software that automatically detects when someone is typing and auto-mutes them.

This is a nice idea, but what about when talking while typing is what you actually want to do? In this case, Google provides no recourse. Indeed, I recently gave my team a walkthrough of a new code project, but I kept cutting out: as I was showcasing ideas in code (and even simply navigating code with my keyboard using vim), Google would constantly mute me. Damnit, Google! You suck!

Well, Internet users unite! We have a working fix for this “feature”.

Read the rest of this entry »

Truth on tap

June 21st, 2014

Some people have put together an alternative to Wikipedia called Conservapedia. But, I won’t grace it with a link. I’d rather not let the Internet become more dangerous as a form of mind control.

The site is meant to provide explanations of world-wide phenomena in conservative terms. This brings full circle the blurring notion of truth in the Internet Era, as was described quite well by Clay Shirky in his essay, “Truth without scarcity, ethics without force.”

For example, the many-thousand-word article on “Public Schools” includes a section entitled “Gender Disparity”. It explains that “Public schools as of late have seen girls’ scores soar above boys’ because schools have been geared toward the needs of girls”. It goes on:

Schools seek to emasculate boys by preventing healthy roughhousing and having psychologists put boys on drugs such as Ritalin. Then boys often come to hate school because radical feminists seek to prevent men from being men and forcing males to go through counseling to “discuss their feelings” and other liberal hogwash treating all students as if they were female. Colleges, because of this trend, see a trend of 60/40 female to male ratio because of feminist drivel such as romance novels in literature and ineffective therapy and attempts to push feminine traits on boys and young men making them frustrated and fed up with the system unless they agree to the school’s desire to become effeminate.

Now, certainly, there are valid conservative arguments against public schools. You don’t have to look far to find them. You might feel that a public school is a poor use of taxpayer dollars, is a violation of parental child-rearing rights, or is a form of mass indoctrination.

But, a feminist conspiracy?

Read the rest of this entry »

streamparse: Python + Apache Storm for real-time stream processing

May 4th, 2014

Parse.ly released streamparse today, which lets you run Python code against real-time streams of data by integrating with Apache Storm.

We released it for our talk, “Real-time streams & logs with Apache Kafka and Storm” at PyData Silicon Valley 2014.

An initial release (0.0.5) was made. It includes a command-line tool, sparse, with the ability to set up and run local Storm-friendly Python projects.

Read the rest of this entry »

The Log: a building block for large-scale data systems

December 16th, 2013

A software engineer at LinkedIn has written a monster of a blog post about “The Log”, a building block for large-scale data systems. The concepts in this post are near and dear to my heart due to my work on precisely these kinds of problems at Parse.ly.

What is “a log”?

The log is similar to the list of all credits and debits and bank processes; a table is all the current account balances. If you have a log of changes, you can apply these changes in order to create the table capturing the current state. This table will record the latest state for each key (as of a particular log time). There is a sense in which the log is the more fundamental data structure: in addition to creating the original table you can also transform it to create all kinds of derived tables.
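The bank analogy in that quote can be sketched in a few lines of Python. The function `replay`, the account names, and the amounts are all hypothetical, chosen only to illustrate the idea; this is not code from the post or from Kafka.

```python
def replay(log, upto=None):
    """Apply log entries in order and return the resulting table of balances."""
    table = {}
    for offset, (account, delta) in enumerate(log):
        if upto is not None and offset >= upto:
            break
        table[account] = table.get(account, 0) + delta
    return table


# Hypothetical log of (account, credit-or-debit) entries, oldest first.
log = [("alice", 100), ("bob", 50), ("alice", -30), ("bob", 20)]

balances = replay(log)         # the table: latest state for each key
earlier = replay(log, upto=2)  # the state as of an earlier log position

# A derived table built from the same log: number of entries per account.
activity = {}
for account, _ in log:
    activity[account] = activity.get(account, 0) + 1
```

The table (`balances`) can always be rebuilt from the log, at any point in its history, but the log cannot be rebuilt from the table. That is the sense in which the log is the more fundamental structure.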

At Parse.ly, we recently adopted Kafka widely in our backend to address exactly these use cases: data integration and real-time/historical analysis for large-scale web analytics. Previously, we were using ZeroMQ, which is good, but Kafka is better for this use case.

We have always had a log-centric infrastructure, not born out of any understanding of theory, but simply out of requirements. We knew that as a data analysis company, we needed to keep data as raw as possible in order to do derived analysis, and we knew that we needed to harden our data collection services and make it easy to prototype data aggregates atop them.

I also recently read a book by Nathan Marz (creator of Apache Storm), which proposes a similar “log-centric” architecture, though Marz calls the log a “master dataset” and uses the fanciful term “Lambda Architecture”. He describes how, atop a “timestamped set of facts” (essentially, a log), you can build any historical / real-time aggregates of your data via dedicated “batch” and “speed” layers. There is a lot of overlap in thinking between that book and this article.
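Here is a toy sketch of how the “batch” and “speed” layers might combine over a set of timestamped facts. It is a heavy simplification under my own assumptions: the function names, the `cutoff` parameter, and the page-view facts are invented for illustration, and a real speed layer would update incrementally rather than rescan the facts.

```python
from collections import Counter


def batch_view(facts, cutoff):
    """Batch layer: recompute page-view counts from all facts before the cutoff."""
    return Counter(url for ts, url in facts if ts < cutoff)


def speed_view(facts, cutoff):
    """Speed layer: count only the facts the last batch run has not seen."""
    return Counter(url for ts, url in facts if ts >= cutoff)


def query(facts, cutoff, url):
    """Serve a query by merging the batch and speed views."""
    return batch_view(facts, cutoff)[url] + speed_view(facts, cutoff)[url]


# Hypothetical (timestamp, url) facts; cutoff marks the last batch run.
facts = [(1, "/a"), (2, "/b"), (3, "/a"), (4, "/a")]
views_of_a = query(facts, cutoff=3, url="/a")
```

Both views are derived from the same underlying facts, so either one can be thrown away and rebuilt, which is the same property the log gives you.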

LinkedIn’s log-centric stack, visualized.

Read the rest of this entry »