What is “a log”?
The log is similar to the list of all credits and debits and bank processes; a table is all the current account balances. If you have a log of changes, you can apply these changes in order to create the table capturing the current state. This table will record the latest state for each key (as of a particular log time). There is a sense in which the log is the more fundamental data structure: in addition to creating the original table you can also transform it to create all kinds of derived tables.
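To make this concrete, here is a minimal sketch (the account names and event format are my own invention): replaying a log of credits and debits, in order, yields the table of current balances.

```python
# The log: an ordered sequence of (key, change) events.
log = [
    ("alice", +100),
    ("bob", +50),
    ("alice", -30),
]

# The table: derived by replaying the log in order; it holds only
# the latest state for each key.
balances = {}
for account, change in log:
    balances[account] = balances.get(account, 0) + change

print(balances)  # {'alice': 70, 'bob': 50}
```

Because the log is the more fundamental structure, you could replay the same events into any number of other derived tables (say, a count of transactions per account) without ever touching the source data.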
At Parse.ly, we recently adopted Kafka widely in our backend to address exactly these needs: data integration and real-time/historical analysis for large-scale web analytics. Previously, we were using ZeroMQ, which is good, but Kafka is better suited to this use case.
We have always had a log-centric infrastructure, not born out of any understanding of theory, but simply of requirements. We knew that as a data analysis company, we needed to keep data as raw as possible in order to do derived analysis, and we knew that we needed to harden our data collection services and make it easy to prototype data aggregates atop them.
I also recently read the book by Nathan Marz (creator of Apache Storm), which proposes a similar “log-centric” architecture, though Marz calls it a “master dataset” and uses the fanciful term “Lambda Architecture”. In his case, he describes how, atop a “timestamped set of facts” (essentially, a log), you can build any historical / real-time aggregates of your data via dedicated “batch” and “speed” layers. There is a lot of overlap of thinking in that book and in this article.
It’s great to see all the various threads of large-scale data analytics and integration coming together into a unified whole of similar theory and practice.
LinkedIn, for example, has almost no batch data collection at all. The majority of our data is either activity data or database changes, both of which occur continuously. In fact, when you think about any business, the underlying mechanics are almost always a continuous process—events happen in real-time, as Jack Bauer would tell us. When data is collected in batches, it is almost always due to some manual step or lack of digitization or is a historical relic left over from the automation of some non-digital process. Transmitting and reacting to data used to be very slow when the mechanics were mail and humans did the processing. A first pass at automation always retains the form of the original process, so this often lingers for a long time.
Production “batch” processing jobs that run daily are often effectively mimicking a kind of continuous computation with a window size of one day. The underlying data is, of course, always changing. [...]
Interestingly, I also recently discovered that Kafka + Storm are widely deployed at Outbrain and Loggly. LinkedIn has its own stream processor, Samza, which relies directly upon Kafka. Meanwhile, AWS deployed a developer preview of Kinesis, based on the design of Kafka.
This all suggests to me that real-time stream processing atop log architectures has gone mainstream.
So, are stream processors a niche thing, only meant for analytics companies? LinkedIn’s engineers would argue that as the world is increasingly moving into having data feeds available in real-time, this will become more generalizable than the large-scale batch-oriented data flows (e.g. Hadoop, Map/Reduce) that came before. For context:
Seen in this light, it is easy to have a different view of stream processing: it is just processing which includes a notion of time in the underlying data being processed and does not require a static snapshot of the data so it can produce output at a user-controlled frequency instead of waiting for the “end” of the data set to be reached. In this sense, stream processing is a generalization of batch processing, and, given the prevalence of real-time data, a very important generalization.
So why has the traditional view of stream processing been as a niche application? I think the biggest reason is that a lack of real-time data collection made continuous processing something of an academic concern.
I once referred to my work on Parse.ly as building a “content trading desk”. It was a weak connection, only realized in retrospect, that the kind of data I used to see when I worked on Wall Street and the kind I see in the media industry now have some overlap. Namely: “constantly updating time series”. LinkedIn also recognized that Wall Street was one of the only places where large-scale stream processing was happening, due to the availability of real-time market data:
I think the lack of real-time data collection is likely what doomed the commercial stream-processing systems. Their customers were still doing file-oriented, daily batch processing for ETL and data integration. Companies building stream processing systems focused on providing processing engines to attach to real-time data streams, but it turned out that at the time very few people actually had real-time data streams. Actually, very early at my career at LinkedIn, a company tried to sell us a very cool stream processing system, but since all our data was collected in hourly files at that time, the best application we could come up with was to pipe the hourly files into the stream system at the end of the hour! They noted that this was a fairly common problem. The exception actually proves the rule here: finance, the one domain where stream processing has met with some success, was exactly the area where real-time data streams were already the norm and processing had become the bottleneck.
Even in the presence of a healthy batch processing ecosystem, I think the actual applicability of stream processing as an infrastructure style is quite broad. I think it covers the gap in infrastructure between real-time request/response services and offline batch processing. For modern internet companies, I think around 25% of their code falls into this category.
It turns out that the log solves some of the most critical technical problems in stream processing, which I’ll describe, but the biggest problem that it solves is just making data available in real-time multi-subscriber data feeds.
The entire article is worth a read.
I didn’t know what “Single Dispatch Functions” were all about. Sounded very abstract. But it’s actually pretty cool, and covered in PEP 443.
What’s going on here is that Python has added support for another kind of polymorphism known as “single dispatch”. This allows you to write a function with several implementations, each associated with one or more types of input arguments. The “dispatcher” (called singledispatch and implemented as a Python function decorator) figures out which implementation to choose based on the type of the argument. It also maintains a registry of types to function implementations.
This is not technically “multimethods” — which can also be implemented as a decorator, as GvR did in 2005 — but it’s related. See the Wikipedia article on Dynamic Dispatch for more information.
Also, the other interesting thing about this change is that the library is already on Bitbucket and PyPI and has been tested to work as a backport with Python 2.6+. So you can start using this today, even if you’re not on 3.x!
Someone on Hacker News asked,
Huh? But that’s not single dispatch? Single dispatch is deciding what function to call based on the type of your object, not on the type of arguments. That’s called double dispatch.
Single dispatch is pretty standard polymorphism, C++ can do that.
That’s a bit of a semantic argument. Python already has “object-oriented single dispatch” — aka traditional object-oriented polymorphism.
What this module adds is “functional single dispatch”.
So, whereas before you’d always be forced to implement some type-varying function using two classes, HandleA and HandleB, each with its own handle implementation:

```python
class HandleA:
    def handle(self):
        pass

class HandleB:
    def handle(self):
        pass

def main(obj):
    # obj could be an instance of HandleA or HandleB
    obj.handle()
```
In this case, “dynamic dispatch” is done by obj.handle(), which will pick a different implementation depending on the type of obj.
With this PEP/stdlib addition, you can now write two functions, handle_A and handle_B, which take an argument, obj, and are dynamically dispatched using the generic function handle:

```python
from functools import singledispatch

@singledispatch
def handle(obj):
    pass

@handle.register(A)
def handle_A(obj):
    pass

@handle.register(B)
def handle_B(obj):
    pass

def main(obj):
    # obj could be an instance of A or B
    handle(obj)
```
And in this case, “dynamic dispatch” is done by handle(obj), or really, by the dispatcher decorator. It chooses handle_A or handle_B based on the type of the obj argument.
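To see the dispatcher in action, here is a small, runnable sketch; the describe function and the registered types are my own invention, not from PEP 443:

```python
from functools import singledispatch

@singledispatch
def describe(obj):
    # fallback implementation for unregistered types
    return "something"

@describe.register(int)
def _(obj):
    return "an integer"

@describe.register(list)
def _(obj):
    return "a list"

print(describe(42))    # an integer
print(describe([1]))   # a list
print(describe(3.14))  # something (float is unregistered, so the default runs)
```

This runs on Python 3.4+ out of the box, and on earlier versions via the backport package mentioned above.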
The reason this is a nice addition is because it makes Python eminently “multi-paradigm” — you can choose object-oriented or functional styles depending on your taste and the applicability to the task at hand, instead of being forced into one programming style or the other.
I thought it was particularly interesting how he documented three different ways in which investors say “no”, without really saying no.
Say nothing: “I mentioned earlier that investors prefer to wait if they can. What’s particularly dangerous for founders is the way they wait. Essentially, they lead you on. They seem like they’re about to invest right up till the moment they say no. If they even say no. Some of the worse ones never actually do say no; they just stop replying to your emails. They hope that way to get a free option on investing.”
Say yes, conditionally: “When an investor tells you ‘I want to invest in you, but I don’t lead,’ translate that in your mind to ‘No, except yes if you turn out to be a hot deal.’ And since that’s the default opinion of any investor about any startup, they’ve essentially just told you nothing.”
Say yes, then no: “Remember the twin fears that torment investors? The fear of missing out that makes them jump early, and the fear of jumping onto a turd that results? This is a market where people are exceptionally prone to buyer’s remorse. And it’s also one that furnishes them plenty of excuses to gratify it.”
This is why in 2010, I wrote the essay “It’s easier to play the option than the bet.” I noticed that when analyzed rationally from an investor’s vantage point, it’s always easier to take an “option” on investing in a company, rather than “pulling the trigger” and actually making the bet.
Coming to this realization was important for Parse.ly’s founding history. It made me realize the wisest thing we could do as a company is to “ride like hell”, rather than jockeying to position ourselves as someone else’s winning bet. In other words, actual traction beats perceived hotness. Or, as Gabe Weinberg (founder of DuckDuckGo) put it, “Traction trumps everything.” Actual traction gives you more options, as a founder.
Fundraising is extremely confusing for first-time founders, but I think pg’s essay clears things up. So: read it, along with his earlier two in his fundraising series:
Here is some coverage from around the web:
More coming soon!
The latest start-up boom has led to the creation of at least 161 companies that end in “ly,” “lee,” and “li,” which is, naming consultants tell us, 160 too many. There’s feedly, bitly, contactually, cloudly, along with a bunch of other company-LYS [...] and all but the first ever “ly” name are “just lazy,” Nancy Friedman, a naming consultant, told The Atlantic Wire.
According to a quick Python script I just wrote, ~2,438 valid English words end with the suffix “ly”. I don’t see how using valid English words for domains is lazy.
It turns out, my company’s name — Parse.ly — isn’t a valid “-ly” English word. But we had a good reason for picking it.
Joe Kukura at AllVoices reacted:
Okay, I will admit that Parse.ly is a pretty cute play on words. But, with respect, there was already a Parse.
Actually, Parse.ly was founded in 2009, two years before Parse. We registered the “.ly” domain name in July 2009, and registered the Parsely.com domain in 2010. I even reacted on Hacker News to Parse’s launch this way:
as the founder of Parse.ly and owner of parse.ly and parsely.com domains, I just have to say — that’s friggen cold, dudes
At the time, we didn’t pick the name to follow a trend. There was no trend to follow. The only other “.ly” domain we knew of was bit.ly. In fact, we labored quite a bit over whether to actually go with the name, since “.ly” domains weren’t very common and we were worried people wouldn’t know they could even type a “.ly” TLD into the browser.
We picked it because the name resonated with people we tested it on, the domain was unregistered, and we were cheap and scrappy. After all, this was during the era of the startup diet.
Really good, short, brandable “.com” domain names are expensive, and in summer of 2009, we had just started up and had no money to invest in a fancy brand. We had to cobble one together, like everything else in those early days. We made the decision quickly, but with full knowledge that this would be an important part of our brand moving forward.
My co-founder actually explained the reasoning behind our company name selection on Braintree’s customer blog:
It’s a play on the spelling of the herb parsley, which allows us to use the verb parse and the herb parsley together to form a brand that’s recognizable and still speaks to the core of our business: parsing and analysis.
Here are some other fun facts about our original branding.
Our first Parse.ly logo was designed as a trade for another domain I happened to own. It was the dormant domain for a film project for one of my friends, Josh Bernhard. I had registered it for him while we were both in college.
It so happened that my friend had picked the name “Max Spector” for his film, and thus registered maxspector.com. The film never came to fruition, so the domain just gathered dust for a while. But Max Spector also happened to be the name of a prominent San Francisco designer. And Max got in touch with me about buying the domain for his personal website. Acting opportunistically, I offered it in trade for a logo for Parse.ly. To my surprise, he agreed.
Thus, the total cost for our original brand (domain & logo) was $75 (parse.ly domain registration) + $25 (cost of one-year renewal of maxspector.com as part of the trade with the designer) = $100. Pretty good deal for a 2-month-old company with no revenue, minimal capital, and a stealth product.
When we started to get some traction (big customers, seed stage investors), we acquired the parsely.com domain (2010), which actually cost ~$2,000. We also tweaked the logo further with another designer, which, together with some other branding/design work, probably cost another ~$1,500.
Over time, we came to really enjoy the name “Parse.ly”. So did our customers. Our company went through a few iterations, but the original vision behind the name continued to stick.
Despite owning the “.com”, we decided to keep the “.” in our branding because we think it emphasizes the play-on-words (“parse”) and reminds people about the mis-spelling. The mis-spelling is important: this way people don’t accidentally type “parsley.com”, which we don’t own.
Our “.com brand tweak” project cost 35X the original “.ly brand registration” project. Which is perfect. That’s the way startups work — we stay ridiculously lean for as long as possible, and scale our processes in response to demand. We do just-in-time scaling. And this applies to non-software systems, like marketing and branding. It’s actually the opposite of lazy.
What’s lazy is hiring a naming consultant to pick your brand when you don’t even have money to pay employees yet. It’s also just as lazy to dump $350,000 of your startup capital on a good-looking domain name when you don’t even know if you’ve managed to build a company that can last.
Registering domains on uncommon TLDs to get a memorable, but cheap, web brand during your company’s startup period? Not lazy. That’s just hacking a system to your advantage.
Want to work somewhere where it’s a virtue to hack things to your advantage, rather than waste money on fancy consultants? Parse.ly is hiring!
 My Python script for calculating the number of “-ly” English words:
```python
word = lambda line: line.strip().lower()
ly_names = [word(line) for line in open("/usr/share/dict/words")
            if word(line).endswith("ly")]
print len(ly_names)
```
 The inclusion of a “.” in our logo may bug some old-school branding consultants, but I think it creates a nice distinctive look.
Indeed, Color ended in a ball of flames. So much for the $350,000 brand investment.
“If you want to build a ship, don’t drum up the people to gather wood, divide the work, and give orders. Instead, teach them to yearn for the vast and endless sea.”
- Antoine de Saint-Exupéry
… entrepreneurs are pressured to maintain a totally positive face to the outside world about the state of their company. In San Francisco, “we’re killing it” is almost now an inside joke because of the ubiquity of that response when someone asks an entrepreneur how their company is faring. Most of these companies are not “killing it”, and the entrepreneurs probably know that.
There is also a nice comment thread discussing the “we’re killing it” phrase, a discussion to which I contributed an anecdote and interpretation.
The comment I added to the discussion:
A friend once relayed a story to me of a dinner meeting of ~20 early-stage high-tech executives he attended that was sponsored by a startup organization. The moderator asked one question as an ice breaker to kick off the night: “What is the greatest challenge that your startup faces today?”
My friend was the first one picked to share. Being a very level-headed guy (who personally hates the term, “killing it”), he suggested that one of his biggest challenges was maintaining work/life balance & personal relationships, for himself & also for his employees, so that they don’t burn out on the job.
The baton then got passed to the next entrepreneur, and, as my friend tells it, entrepreneur after entrepreneur shared their “greatest challenge”, though they were only “challenges” in the weakest sense of the word. For example: “handling all the new customers we have”, “scaling our servers for our massive user-base”, “hiring enough software engineers to keep up with the business growth”.
He realized then that every entrepreneur was “positioning” the answer to make it appear that the greatest challenge faced was dealing with the company’s illusory massive success.
I think this anecdote describes the “killing it” mentality quite well — even among peers and in a setting where people should be comfortable sharing their fears, this community prefers reality distortion.
One valuable thing you tend to get only in startups is uninterruptability. Different kinds of work have different time quanta. Someone proofreading a manuscript could probably be interrupted every fifteen minutes with little loss of productivity. But the time quantum for hacking is very long: it might take an hour just to load a problem into your head. So the cost of having someone from personnel call you about a form you forgot to fill out can be huge.
This is why hackers give you such a baleful stare as they turn from their screen to answer your question. Inside their heads a giant house of cards is tottering.
The mere possibility of being interrupted deters hackers from starting hard projects. This is why they tend to work late at night, and why it’s next to impossible to write great software in a cubicle (except late at night).
One great advantage of startups is that they don’t yet have any of the people who interrupt you. There is no personnel department, and thus no form nor anyone to call you about it.
In other words, Celebrity Cruises is presenting Conroy’s review of his 7NC Cruise as an essay and not a commercial. This is extremely bad. Here is the argument for why it is bad. Whether it honors them well or not, an essay’s fundamental obligations are supposed to be to the reader. The reader, on however unconscious a level, understands this, and thus tends to approach an essay with a relatively high level of openness and credulity. But a commercial is a very different animal. Advertisements have certain formal, legal obligations to truthfulness, but these are broad enough to allow for a great deal of rhetorical maneuvering in the fulfillment of an advertisement’s primary obligation, which is to serve the financial interests of its sponsor. Whatever attempts an advertisement makes to interest and appeal to its readers are not, finally, for the reader’s benefit. And the reader of an ad knows all this, too – that an ad’s appeal is by its very nature calculated – and this is part of why our state of receptivity is different, more guarded, when we get ready to read an ad.
An ad that pretends to be art is – at absolute best – like somebody who smiles at you warmly only because he wants something from you. This is dishonest, but what’s sinister is the cumulative effect that such dishonesty has on us: since it offers a perfect facsimile or simulacrum of goodwill without goodwill’s real spirit, it messes with our heads and eventually starts upping our defenses even in cases of genuine smiles and real art and true goodwill. It makes us feel confused and lonely and impotent and angry and scared. It causes despair.
- David Foster Wallace
(full essay on Harpers.org)
You can also always count on The Onion to provide some perspective.
Sponsored Content: Pretty F'ing Awesome http://t.co/XzP6nuvEsS "… articles & videos endorsed by faceless corporate conglomerates[!]"
— Andrew Montalenti (@amontalenti) May 18, 2013
__call__ (make an object behave like a function) or __iter__ (make an object iterable).
The choice of wrapping these functions with double-underscores on either side was really just a way of keeping the language simple. The Python creators didn’t want to steal perfectly good method names from you (such as “call” or “iter”), but they also did not want to introduce some new syntax just to declare certain methods “special”. The dunders achieve the dual goal of calling attention to these methods while also making them just the same as other plain methods in every aspect except naming convention.
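For example, here is a toy class (my own invention) that participates in two of these protocols; note that __call__ and __iter__ are written exactly like any other method:

```python
class Countdown:
    def __init__(self, n):
        self.n = n

    def __call__(self):
        # makes instances behave like functions: c() works
        return "liftoff!"

    def __iter__(self):
        # makes instances iterable: list(c), for-loops, etc. work
        return iter(range(self.n, 0, -1))

c = Countdown(3)
print(list(c))  # [3, 2, 1]
print(c())      # liftoff!
```

Nothing magical is happening here; the interpreter simply knows to look for these reserved names when you use the corresponding syntax.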
Some people call these “magic methods”. Indeed one of the best guides online, A Guide to Python’s Magic Methods, uses this term. The reason I don’t like this term is that it makes it seem like dunders are only reserved for “real experts”, when quite the opposite is true. Indeed, nearly any new Python programmer uses
__init__ to implement object initializers (aka constructors). The double-underscores don’t mean “reserved for wizards”; they simply mean, “reserved by the core Python team”.
One somewhat perplexing trend is that a few library authors have chosen to use the dunder convention for their own code. For example, in SQLAlchemy, you use __tablename__ to map a SQLAlchemy ORM class to a SQL table. It then exposes new properties, such as __mapper__. Ugh. For an otherwise beautifully designed library, this is so very wrong and completely misses the point. The dunder convention is a namespace reserved for the core Python team to implement their own protocols. Never use the namespace for your own, library-specific things! This defeats the whole purpose. If you need to hide a property, use a single leading underscore instead.
A recent video called “__dunder__ functions” walks through and demystifies many of the dunder protocols. It’s worth a watch, especially for Python beginners.
Moving forward, here are the rules you should follow for dunders:
__call__ method. Because, you’re not! It’s a standard language feature just like
For me, it was great to finally meet in person such friends and collaborators as
@__get__, @nvie, @jessejiryudavis, and @japerk.
It was of course a pleasure to see again such Python super-stars as
@adrianholovaty, @wesmckinn, @dabeaz, @raymondh, @brandon_rhodes, @alex_gaynor, and @fperez_org.
(Want to follow them all? I made a Twitter list.)
I also met a whole lot of other Python developers from across the US and even the world, and the entire conference had a great energy. The discussions over beers ranged from how to use Tornado effectively to how to hack a Python shell into your vim editor to how to scale a Python-based software team to how to grow the community around an open source project.
In stark contrast to the events I’ve been typically going to in the last year (namely: ‘trade conferences’ and ‘startup events’), PyCon is unbelievably pure in its purpose and feel. This is where a community of bright, talented developers who share a common framework and language can push their collective skills to new heights.
And push them, we did.
IPython in Depth
My first tutorial was IPython In-Depth. From this, I was sent down the path of really digging into three areas of IPython I had not paid much attention to before: its 0MQ-based architecture, its display framework, and its new support for “cell magics”.
The 0MQ architecture means that IPython has essentially created a “Python server” which can efficiently receive commands for evaluation from other IPython “clients”. What does this mean? You can run IPython in “kernel” mode (simply run: “ipython kernel”) and a headless service will run in the background. You can then connect to that service from any number of clients — for example, IPython’s web-based HTML notebook, an IPython command-line client, or an IPython client embedded inside a vim (via vim-ipython) or emacs editor.
From there, code can be sent to the kernel for evaluation, and results returned. This kernel can even live on a remote server (for example, to pull data from production) and be secured with an SSH tunnel. Really cool stuff — and I think people have only scratched the surface for how this opens up tool development options for Python. For example, imagine two engineers collaboratively working on Python code and testing their interaction in a shared IPython session running on a server.
The IPython display framework is what makes it possible for the IPython Notebook to display rich content as the result of evaluating expressions: for example, images, plots, graphics, tables, Markdown-formatted text, and more. The point of understanding this framework is that it enables a much more prototyping-friendly coding style: when you imbue your Python objects with the display protocol, they can “repr” themselves in rich and visual ways.
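For instance, IPython’s display machinery looks for specially named methods like _repr_html_ on your objects. This toy class (my own invention) would render itself as colored HTML in the notebook, while remaining a plain Python object everywhere else:

```python
class Temperature:
    def __init__(self, celsius):
        self.celsius = celsius

    def _repr_html_(self):
        # IPython's notebook checks for this method and, if present,
        # renders the returned HTML instead of the plain repr.
        color = "red" if self.celsius > 30 else "blue"
        return '<b style="color: %s">%.1f &deg;C</b>' % (color, self.celsius)

t = Temperature(35.0)
print(t._repr_html_())
```

In a notebook cell, simply evaluating `t` on the last line would show the bold, colored reading; outside IPython, the class behaves like any other.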
Finally, cell magics. These make it possible to apply some special IPython logic to an entire cell of code. Some of these are already quite powerful, e.g. %px (for parallel execution on an IPython cluster) and %%timeit (for performance measurements). But what’s even cooler is that by using a small Python library and a decorator syntax, you can easily write your own cell magics, offering another higher level abstraction for interactive prototyping.
Rapid Web Prototyping (my tutorial)
I taught a tutorial this year, “Rapid Web Prototyping with Lightweight Tools”. (More information in my prior post.) The full video is recorded and available online here.
Overall, the tutorial went really well. I managed to cover a lot of material in a compressed time period, gave people a chance to play with a lot of small tools from their machines, and even had some great design and architecture discussions during the Q&A session.
One of the tutorial attendees described it as a course that “unified” her web knowledge. And another called it a “smörgåsbord of powerful, but comprehensible, technologies that I’ll be exploring for weeks.”
For me personally, it was a personal goal of mine to teach some material at PyCon that I felt would especially benefit those who were novices in either web development or rapid prototyping. For many attendees, this was the first time they were experimenting with static HTML clickable prototypes and alternative lightweight frameworks / DBs like Flask / MongoDB.
In short: a great experience, lots of fun, and an exhausting first day!
My next day of tutorials was a bit more academic. I spent the first half of the day in Brandon Rhodes’ tutorial on Sphinx and code documentation. Most of this was material I already knew; I attended this tutorial only because my Data Analysis tutorial was over-subscribed and I couldn’t find a seat.
But, I was actually glad about this: Brandon has a great tutorial presentation style (very different from mine) and I treated it more as a way to learn about different teaching approaches. Also, though I’ve been a heavy user of Sphinx for a while, I learned some new things about using the autodoc module for code documentation and how to customize the look-and-feel more directly.
Advanced Machine Learning
In the afternoon, I shifted over to an advanced machine learning tutorial with scikit-learn. This tutorial was loaded with interesting code examples and concepts, many of which were beyond my current level of academic background. But I thoroughly enjoyed getting exposed to a lot of these and learning a bit more about the scikit-learn API. It also meant I had a lot of interesting ideas for how to capitalize on the knowledge for Parse.ly.
Done with Tutorials
The tutorials were over, and I felt like I had truly expanded some of my skills, as well as exercised some of my own teaching / public speaking muscles.
(By the way, you should exercise those, too, periodically — they do atrophy.)
Highlights of the Conference
The rest of the “official conference” was itself truly a smörgåsbord of powerful Python technologies explained by top-notch practitioners. Among these, I’ll highlight a few of the most interesting talks I attended, grouped by theme:
Raspberry Pi
On the Parse.ly team, we already had a few watercooler discussions around the Raspberry Pi. My colleague Didier has actually “de-tethered” from cable and now uses a Raspberry Pi as a $35 media center computer.
But it wasn’t until I attended PyCon that I realized just how important — and transformative — this technology really is.
Eben’s keynote address does a great job of getting you excited about the Pi, and is also just a great story of startup scaling. As a present, every PyCon attendee received a free Raspberry Pi Model B, along with a 4GB SD card pre-loaded with Debian. A very wise marketing move: I’m certainly playing with it when I get home!
The Raspberry Pi was not only featured in several lightning talks, but also in many barside conversations. People were eager to get home and figure out how to use Python to truly embrace the hacker spirit: building useful tools that you can actually deploy in the real world. I heard about ideas for music players, alternative gaming consoles, security systems, art installations, and educational tools. One project even discussed its goal to bring Raspberry Pi devices pre-loaded with thousands of Khan Academy videos to remote regions beyond the Internet’s digital divide.
Parsing and Compilers
As someone who took David Beazley’s amazing compilers course in Chicago last year, I was pleased to see the amount of activity at PyCon around language parsing.
Alex Gaynor, one of PyPy’s core contributors, gave an excellent talk that implemented a basic language interpreter in less than an hour, using a library he wrote called “rply” — a PLY port to RPython with a better API. A related talk was an introduction to Allen Short’s Parser Expression Grammar-based library, parsley.
(Funny side note: Allen mentioned that the reason he spelled ‘parsley’ correctly was to disambiguate it from Parse.ly!)
I also acted as a fly on the wall at the PyPy open space, where some of Python’s most important compiler and language hackers discussed some of the issues they are running into for getting PyPy fully working with Python 3.x and making headway on PyPy’s stackless / GIL-less operation.
These talks are definitely worth a watch:
In one of the keynotes, pandas was declared a “new killer app” of the Python language. These two talks showed some of the power of Python as a data analysis and algorithm prototyping environment.
Pythonic Approaches to Problems
These two talks focused on some corners of the language that tend not to be discussed in much detail, but open up many interesting Pythonic approaches to problems like encapsulation and datetime handling.
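Encapsulation is a good example of such a corner. As a hedged sketch (my own example, not taken from the talks), the `property` protocol lets a class hide validation behind plain attribute access — the class and attribute names here are hypothetical:

```python
class Talk:
    def __init__(self, minutes):
        self.minutes = minutes  # routed through the property setter below

    @property
    def minutes(self):
        return self._minutes

    @minutes.setter
    def minutes(self, value):
        # Validation lives in one place; callers still write t.minutes = 45.
        if value <= 0:
            raise ValueError("talk length must be positive")
        self._minutes = value

t = Talk(30)
t.minutes = 45
print(t.minutes)  # 45
```

The nice part of this idiom is that a class can start with a plain attribute and grow validation later without breaking any caller.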
Raymond Hettinger
Yes, he deserves his own category. Raymond Hettinger gave three talks that I attended, and each was superb. His keynote discussed the important “winning features” of the Python language that have made it not only a vibrant community in its own right, but a trend-setting language across the entire software landscape. His two other talks focused on Python style, specifically around things like iteration and class design. For those in the startup community, his “Python Class Toolkit” talk is particularly awesome — and hilarious.
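As a taste of the iteration style those talks advocate (my own recollection of the idioms, not Raymond’s slides): loop over collections directly rather than over indices.

```python
names = ["alice", "bob", "carol"]
scores = [90, 85, 88]

# Instead of: for i in range(len(names)): print(names[i], scores[i])
for name, score in zip(names, scores):
    print(name, score)

# Instead of manually tracking a counter variable:
for position, name in enumerate(names, start=1):
    print(position, name)
```

The point is that the loop body reads like the problem statement — names and scores, not index arithmetic.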
Overcoming the Gender Gap
I tweeted that one of the biggest differences between my last PyCon (2009, Chicago) and this one was the much larger startup presence, especially in the exhibit hall. I met the folks from top-notch startups like Optimizely, Bitdeli, Datadog, Disqus, Quantcast, Continuum, and many more.
But the bigger surprise was the difference in terms of gender distribution. Apparently, 20% of PyCon attendees were women. The excellent PyLadies organization helped spread awareness of PyCon, and the Python Software Foundation made a concerted effort to diversify what had typically been a male-dominated conference. These efforts paid off in a big way. I was really glad to see this improvement, and it got me thinking about how the same techniques spearheaded in the Python community could be used to improve the gender diversity of other areas of entrepreneurship, technology, and education.
In short, I couldn’t be prouder to be a Pythonista, and there’s no better time than now to be a Pythonista. It was a great experience — a warm congratulations and thank you to the organizers for putting together an amazing event.
See you next year in Montreal!
The tutorial is hands-on and code-oriented, building upon the viewpoint I laid out in my “Build a web app fast” post from last year.
It selects some lightweight tools for building a web app, and combines them with a lightweight database (MongoDB) and a small but production-ready UNIX-based deployment stack (supervisor, uWSGI, nginx). Code is finally pushed to live servers using Fabric.
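To give a flavor of that deployment stack, here is roughly what the supervisor piece might look like. This is a sketch, not material from the tutorial; the app name, socket path, and process count are hypothetical:

```ini
; /etc/supervisor/conf.d/myapp.conf  (hypothetical app name and paths)
[program:myapp]
command=uwsgi --socket /tmp/myapp.sock --module myapp:app --processes 2
directory=/srv/myapp
autostart=true
autorestart=true
stopsignal=QUIT
```

nginx would then be pointed at the same socket (e.g. with `uwsgi_pass unix:/tmp/myapp.sock;`), and a Fabric task would push new code and restart the program.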
Throughout the tutorial, live rapid prototyping tools like IPython Notebook, CodeMirror, and Emmet are used, as well as good old pen-and-paper.
I gave this talk as a fundraiser for a startup hub in Charlottesville, VA, and it raised over $1,000. At PyCon, the tutorial is raising money for the Python Software Foundation, IMO one of the most important open source foundations out there.
But more importantly, this tutorial teaches programmers how to rapidly go from idea to prototype to deployed web application with minimal magic and minimal effort. Most fledgling startup hackers who reach out to me are looking for this kind of material, so I hope to turn it into a multi-part blog series (perhaps an ebook) soon. More to come!
His net worth on taking office was $1,800 — the value of his 1987 Volkswagen Beetle. He donates 90% of his presidential salary to the poor. He lives in a small house on $800/month with his wife, even as president. He has sold off presidential vacation homes and believes public officials should be “taken down a notch”. He believes serving consecutive terms is “monarchic.” He hopes to return to farming after serving his presidential term.
Read more at NYTimes.com.
The main conference is sold out, but there are still a few spots open for the tutorial sessions.
(Here’s a secret: the tutorials are where I’ve always learned the most at PyCon.)
Most of PyCon’s attendees are Python experts and practitioners. However, Python is one of the world’s greatest programming languages because it is among the most teachable and learnable. Attending PyCon is a great way to move yourself rapidly from the “novice” column to the “expert” column in Python programming skills.
This year, there is an excellent slate of tutorial sessions available before the conference starts. These cost $150 each, which is a tremendous value for a 3-hour, in-depth session on a Python topic. I know a lot of people who are getting into Python as a way to build web applications. There is actually a great “novice web developer” track in this year’s tutorials, which I’ll outline on this page.
Here’s a suggested set of tutorials that will get a novice Python programmer to a strong web development skill level in just two days:
Day 1, Morning
Day 1, Afternoon
Day 2, Morning
Day 2, Afternoon
I’ve bolded my suggested web-related session in each time slot, and also provided one alternate in case the suggestion doesn’t suit your fancy. (Full disclosure: I’m the instructor for the “Rapid Web Prototyping” one.)
Also, I have a blog post, “import this: The Zen of Python”, with some of my publicly available materials for learning Python. I suggest you look these over before the conference. Having some basic Python skills going into the tutorial will ensure you get the most value out of it.
Did I convince you that this is a good way to spend a couple of days in March? I hope so. If so, register here!
This device was comically under-powered in retrospect. It had 2 megabytes of RAM, which served not only as the working memory of the device, but also as its storage. It had a 16 MHz processor, a 4-greyscale screen, and a stylus-driven interface.
The Palm V was an amazing device. In place of the plastic of the Palm III, it had an anodized aluminum finish, very similar to the kinds of sleek devices we have only begun to see regularly in the last couple of years. It was nearly half the weight of its predecessor, and as thin as the stylus you used to control it. It had a surprisingly well-designed docking station (imagine this: since USB hadn’t yet been widely adopted, it had to sync over the low-bandwidth serial port available on PCs at the time).
I came across a time capsule of my Palm V usage. I was a web designer at this time, but since it was a “pre-web” era, I often put together little static HTML websites to demonstrate features and ideas to my friends. I did one of these for my Palm V.
A Mobile Device of Pure Utility
In a world without cell data plans, wifi networks, and laptop computers, my Palm V provided much of the same utility that is now spread across these various pieces of everyday technology infrastructure, but in a single device. I kept a unified to-do list of my life, and a specialized task manager specifically for coursework and classes. In it, I tracked not just homework assignments and due dates grouped by class, but also my grades — so that I could prioritize work to optimize my grades.
I think I only got my first cell phone a couple of years later, and cell phones were nowhere near ubiquitous yet, so my contact manager was actually used to store landline phone numbers and addresses of my friends — and likely, of local take-out / delivery restaurants.
What was AvantGo?
I sometimes tell my friends that despite the lack of infrastructure in school for it, I was indeed a “child of the Internet”. This was mostly thanks to AvantGo.
When I was at home on my personal computer, I was already an avid web content reader from early pioneers like Salon.com, CNET, Wired, and BBC. But I couldn’t easily read their content at school. AvantGo was an early, popular service for Palm devices that let you download and sync “content channels”. Of course, this was way before RSS/Atom. Channels were really mobile-optimized static HTML websites that each provider would put out daily with a snapshot of their latest content. In the screenshot above, you can see me reading an article from Salon.com dated March 30, 2002.
In addition to officially supported channels that were put out by major publishers, you could also download and sync arbitrary web content. This was essentially an early version of Readability / Instapaper — in some ways, much clunkier, in other ways, more complete. The software would go to the website and try to mobile-optimize a site (stripping it down to basic HTML), while also crawling the relative links. For example, at the time I was very interested in participating in the Debian open source project, so I had AvantGo sync the Debian Policy Manual, which contained hundreds of HTML pages full of reference information.
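The crawl-and-strip behavior described above is easy to sketch in modern Python. This is a hedged illustration, not AvantGo’s actual algorithm; the `simplify` function and the tag whitelist are my own invention:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Tags worth keeping in a stripped-down, Palm-sized page.
KEEP = {"p", "a", "b", "i", "h1", "h2", "h3", "ul", "li", "br"}

class Simplifier(HTMLParser):
    """Reduce a page to basic HTML and absolutize relative links,
    collecting those links so they could be crawled and synced too."""
    def __init__(self, base_url):
        super().__init__()
        self.base = base_url
        self.out = []
        self.links = []
        self.skipping = False  # inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skipping = True
        elif tag == "a":
            href = urljoin(self.base, dict(attrs).get("href", ""))
            self.links.append(href)  # queue for further crawling
            self.out.append('<a href="%s">' % href)
        elif tag in KEEP:
            self.out.append("<%s>" % tag)

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skipping = False
        elif tag in KEEP:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if not self.skipping:
            self.out.append(data)

def simplify(html, base_url):
    parser = Simplifier(base_url)
    parser.feed(html)
    return "".join(parser.out), parser.links

page = ('<div><p>See <a href="ch1.html">chapter 1</a>.</p>'
        '<script>track()</script></div>')
text, links = simplify(page, "https://www.debian.org/doc/policy/")
print(text)
print(links)
```

An AvantGo-style sync would repeat this for each discovered link up to some depth, writing the simplified pages to the device’s sync queue.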
So, yes — when I was bored in Spanish or Math class, I’d pull up a political article from Salon.com, a tech piece about the 2000 tech bubble forming in Silicon Valley from Wired, or read over open source software development guidelines. I was a child of the Internet, and the Palm V kept me tethered in a pre-mobile world.
What makes AvantGo so interesting, upon reflection, is that despite everything working against it (under-powered mobile devices, no ubiquitous wireless Internet access, and a less standardized world wide web), it still satisfied a use case that has yet to be satisfied today. Namely, it provided a way for me to curate the best sources of content I found on the web, and to sync and download all of their content into a personalized, offline library. Instead, we have nothing but fragmentation today: individual native apps for each news publisher, RSS readers like NewsBlur providing a way to get the latest updates from a group of them, and mobile web browsers for everything else.
Many in the industry see AvantGo as the first to attempt [a shift to mobile]. AvantGo’s software pulls content from a user-defined list of sites and loads the content into the PalmPilot’s synchronization queue on a desktop computer. When the PalmPilot is synchronized with the desktop machine using 3Com’s own synchronization software, the desired content is also uploaded to the Pilot — giving users a fresh stash of data to dig through offline.
- from “The Next Browser War?”, Wired, 1999
Obviously, devices running iOS and Android are finally starting to make ubiquitous computing a reality, and are providing less clunky ways to access web content.
But I think it’s worth reflecting on what my Palm V had going for it nearly a decade ago that has yet to arrive in this new era.
A great example of online/offline mastery that I witnessed in a mobile application is Spotify’s support for bringing an entire music playlist offline. It offers the best of both worlds.
We’re certainly living in an age of ubiquitous, wireless Internet access, whether via cell towers or wifi. But that doesn’t mean that the smartest applications are those that exclusively rely on data access being present. In fact, making your application rely on data access will almost certainly make it slower than it could be.
Due to our obsession with connectivity, I think many developers are forgetting about use cases where the mobile device is meant to be a smart accessory to our lives, rather than a portal to the web. These are no longer “thin clients” — instead, we should think of them as powerful handheld computers. Recent iPhone and Android devices feature dual-core processors at speeds above 1 GHz, along with 1GB of RAM. Around 1999, typical desktop computers were slower than this (e.g. a single-core Pentium III at 600 MHz with 128MB of RAM and 10GB of disk storage).
Yes, we’ve put the power of a desktop computer into the palm of a user’s hand. But are we really tapping this opportunity?
But how misled I actually was—at least, in Walker Percy’s eyes. In his essay, “The Loss of the Creature,” Percy recalls a scene from The Heart Is a Lonely Hunter:
…the girl hides in the bushes to hear the Capehart in the big house play Beethoven. Perhaps she was the lucky one after all. Think of the unhappy souls inside, who see the record, worry about the scratches, and most of all worry about whether they are getting it, whether they are bona fide music lovers. What is the best way to hear Beethoven: sitting in a proper silence around the Capehart or eavesdropping from an azalea bush?
Percy here contrasts two different approaches to viewing art—the girl who informally and spontaneously encounters the work of art, out of context, as opposed to the “unhappy souls inside” who formally prepare themselves for a kind of pre-packaged listening experience. Percy wonders which is better—a question meant for the reader’s pondering. But his essay offers his answer: we can only truly see or hear a piece of art by “the decay of those facilities which were designed to help the sightseer”. Perhaps Percy is right—it might have been better if my experience with Hamlet had been an accidental discovery rather than a guided tour, an “eavesdropping from an azalea bush” rather than “proper silence around the Capehart.” Perhaps I should have encountered the text unaware of its origin but intrigued by its mystery. After sitting by a tree and reading the text front-to-back, perhaps then I would be able to “see” Shakespeare in Percy’s sense of the word.
Percy’s noble task is to open our minds to the possibility that we are not the masters of what we know—that, in part, what we know and what we see, when approached passively, have a lot more to do with a “preformed symbolic complex” than with ourselves. Percy’s exploration achieves one of the main goals of all philosophy—to change the way we think about things. He changes the meaning of many concepts human beings tend to take for granted. Sight is no longer the mere act of seeing, but “a struggle,” an act of understanding and appreciation. “Sovereignty,” in relation to things, is no longer some abstract concept of “power,” but an ability to interpret for oneself. Education—or perhaps more specifically, its dynamic—is reshaped, for it is no longer a passive act (i.e. “being taught to”) but an action that relies much more upon the student, who “may have the greatest difficulty in salvaging the creature itself from the educational package in which it is presented”.

These concept-alterations are thus meant to alter our reality; they aim to help us rediscover in art what he calls in his opening paragraph an island, “Formosa.” This previously untouched island is beautiful to its discoverer “because, being first, he has access to it and can see it for what it is”. The metaphor of seeing an object as the discovery of untouched territory suggests that every thing in this world has a certain rawness only present upon initial discovery or with the conscious effort of recovery. Once we rediscover art, Percy thinks we will “catch fire at the beauty of it”.

I, however, am no longer as hopeful as Percy when it comes to literature, especially those works of the Western Canon with which Percy was surely acquainted, and of which he will likely become a part. It may sound plausible, but I think it impossible to recover a work by the “breakdown of the symbolic machinery by which the experts present the experience to the consumer”.
It is one thing to encounter a piece of art out of context and thus have its rawness left unaltered by the “symbolic machinery”. It is quite another to temporarily unlearn pieces of information that may influence our interpretation.
Whether I like it or not, “great authors” are almost always introduced to me—even hyped—before I approach their work. This is to be expected: how else would I know to read them? Furthermore, for those works I read whose authors have not yet achieved greatness, the standards by which I judge their work are undeniably Shakespearean, Borgesean, Joycean.
Even if I could have avoided all the English classes that taught me artists without first showing me their art, I would have still run across that author’s name somewhere, for those of the Canon are relatively ubiquitous in everyday life and conversation. In my search for knowledge, I would defer to the “expert” on general knowledge: the Encyclopedia. There I would find some “unadulterated truths,” such as these, direct from the Columbia Encyclopedia, Sixth Edition: Shakespeare is considered “the greatest playwright who ever lived;” Borges is “widely-hailed” as “the foremost Spanish-American writer” of the 20th century; Joyce was perhaps “the most influential and significant novelist of the 20th century,” and, in fact, “he was a master of the English language, exploiting all of its resources.” There it was, then. Before I even had Ulysses in my young, anxious hands, I knew the one who wrote it was not just a writer—no, no, he was a master. And we always bow to masters.
My Formosa is an island, but I do not discover it—I am, instead, born into it, and led around it by hand. Its tumultuous shore, I am told, is Shakespearean, its luscious sand Joycean, and its maze-like forest Borgesean. I am bound by it. “When you visit other islands,” I am told by its keeper, “see how it compares to this one—your first island, your one and only Formosa.”
Surely, Percy did not see Formosa in my sense. Mine is an island-as-boundary; his is an island-as-essence. I know that my discourse, the Western Canon, limits me and frames how I see things. I also doubt whether I will ever be able to forget those limits—to see an artwork for its idealistic essence, in Percy’s sense. And so, I ask: How do we extract the essence of an artwork from the cloud of united critical praise and reverence for those works deemed canonical?
This becomes a complicated endeavor: in the world of professional interpretation, context, it seems, is everything. One can see this clearly in art classrooms, where the image sometimes has much less to do with art and much more to do with a story—a story which must be learned, one not easily decipherable from the image itself. How could one learn of Matisse without learning of Picasso, and vice versa? How could one disregard their temperamental relationship when even Picasso once said, “You have got to be able to picture side by side everything Matisse and I were doing at that time”? In this sense, artists sometimes go so far as to produce a kind of meta-art—that is, art about art itself.
This, however, is merely an exercise. Most art does not beckon the gazer to contextualize and draw conclusions based on the story that went into its creation. Most artists expect their art to stand alone, to fight a battle in the perceiver’s mind without being told where that battle takes place or who the opposing forces are. We may not see exactly what the artist intended, but that is what makes art so very vital—none of us sees it the same way, though all of us have the right to see it, one way or another. One could even go so far as to say knowing anything about how one is supposed to see a piece of art removes much from the experience of viewing art. This is similar to Percy’s argument that he relates to the girl in the azalea bush. Jeanette Winterson also writes of an analogous idea: “When I read Adrienne Rich or Oscar Wilde, rebels of very different types, the fact of their homosexuality should not be uppermost. I am not reading their work to get at their private lives. I am reading their work because I need the depth-charge it carries”. This “depth-charge” remains undefined by Winterson, but we get a sense of it as the complexity of emotion wrapped up in the writing—the connection the author shares with us when we have read the written work. Is not the ultimate goal of writing—or any art, for that matter—to attain a sort of timelessness, a feeling or idea not limited to the banality of facts or details? Winterson describes this connection as “hands full of difficult beauty”. She carefully and purposefully chooses this very complex juxtaposition of words. Beauty is given a quantity here, and we can almost imagine it as an abundance of glowing sand, piled high upon open palms, freely flowing through the fingers. The generous artist gives this to us without cost or effort. It is seen plainly in his or her art, not in hype or critical analysis or biography or scholarly research. Yet it is “difficult” in the sense that we will not take it, for we fear we cannot, for we fear that we will let it slip through our fingers and fall to the ground; then we will never be able to claim that we conquered that art, that we cut that beauty down to a manageable size and stuck it in our pocket for future use as bragging material.
Winterson and Percy would probably agree that background information—an artless “introduction to an author”—is damaging to the reader who wishes to discover an artist for the first time. “Art must resist autobiography,” Winterson writes, “if it wishes to cross boundaries of class, culture … and … sexuality. Literature is not a lecture delivered to a special interest group, it is a force that unites its audience”. Why, then, are we so insistent on connecting the biography of the artist to the art he or she produces? Mr. Broza spoke of the theory that Shakespeare was gay before we read Twelfth Night. Why did this matter, if Twelfth Night was truly any good and worthy of the title “canonical”? Before I read “El Sur,” I was told by my teacher that Borges had an experience in which he hurt his head and had to be rushed to a hospital, much like the main character in that story. I was also given the standard introduction to Borges’ common themes—“the Labyrinth” of life, the bifurcations of time, and the thin line between reality and the world of dreams. Was all this necessary for me to draw out of the story its “depth-charge”? Or did this force me to read the work and ask myself, “Am I getting it? Am I a bona fide scholar of Borges?” Either my teacher believed I was too unskillful a reader to ever draw that out for myself, or she believed Borges was too unskillful a writer for anyone to see these things without a proper Borgesean introduction; either assumption is foolish, arrogant.
I fear students today are led to believe that reading literature is a three-step, mechanized process: (1) read an introduction to an author and his or her themes; (2) read the work in question; and (3) connect parts or passages of the work to the themes previously delineated. Is this the way it should be? Winterson does not think so:
Learning to read is a skill that marshals the entire resources of body and mind. I do not mean endless dross-skimming that passes for literacy. I mean the ability to engage with a text as you would another human being. To recognize it in its own right, separate, particular, to let it speak in its own voice, not in a ventriloquism of yours. To find its relationship to you that is not its relationship to anyone else.
Winterson thinks we can form a unique relationship with a work, but only if we read it properly. If the themes are spelled out beforehand, if what you are supposed to extract from the work is explained to you before you read it, then how can this relationship be unique? The mark of bad—and by that I mean “average”—English teachers is to teach their relationship with the work to their students; indeed, they cannot help it (whether they are conscious of it or not is another question entirely). And what use is biography for our relationship with the work? Biographical connections are merely ones between the artist and the work he or she produced; they are thus irrelevant to us, except as trivia or an afterthought. A memorable sentence of Nietzsche’s comes to mind: “There has never been a time when art was chattered about so much and valued so little” (The Birth of Tragedy, 107). This has much more relevance when we consider that the only way to “value” a piece of literature is to read it, and not to “chatter” about its significance beforehand. I offer Nietzsche’s entire 22nd section of The Birth of Tragedy (pp. 104-108), which sheds further light on the idea of the homogenization of the interpretation of art. My analysis of Nietzsche will never do justice to his thoughtful analysis of the problem itself, so I leave it untouched and invite you to experience it for yourself.
Literary interpretation is inductive: expert opinion should not be our guide. One is meant to look at a piece of literature—if it is a novel, look at its descriptions, its characters, the dialogue, the symbolism—and from it draw out general conclusions, which we sometimes label “themes.” When scientists discover things, they too use inductive reasoning. For them, the text is not a written work; it is, instead, the phenomena of the world surrounding them. Stephen Jay Gould evaluates the scientific experiments of Paul Broca in his essay “Women’s Brains.” Broca discovered that women, on average, had smaller brains, and thus he concluded that they were naturally less intelligent than men. Gould makes it a point to say, more than once, that Broca’s data were indeed sound. “But science,” Gould retorts, “is an inferential exercise, not a catalog of facts. Numbers, by themselves, specify nothing. All depends upon what you do with them”. Gould here says something powerful—that all science is interpretive. We have a tendency to defer to the expert authority, and we often see scientists as simply the reporters of facts. But Gould asserts that scientists are human too, and no scientist ever reports the facts alone. All facts must lead to some sort of conclusion, and that conclusion is an interpretation of recorded data. Broca misinterpreted his data because he worked deductively, if only for a moment: he used the perceived fact that “women are, on the average, a little less intelligent than men” to explain his data, rather than drawing out from his data an explanation grounded in something more than a social prejudice (Gould suggests height, age, cause of death, and body size as possibilities Broca could have explored). Jumping back to Percy’s example—is not the contrast between the girl and the people sitting around the Capehart a contrast between induction and deduction? The girl observes the music itself, and then perhaps concludes that it is beautiful or that it is not. Those inside, on the other hand, already have an idea of what great music is, and they are trying to see if that idea explains that which they are hearing (a process Percy calls “getting it”).
When I am told that Shakespeare is great, I read him only to “get it;” I read him only to deduce that his writing is great because I am told beforehand, and thus I know beforehand, that he is a great writer. This is what makes me question the Canon in the first place. I am skeptical of its legitimacy because I can never judge it fairly. In saying this, I wonder (perhaps fear) if I am one of many “idealistic resenters who denounce competition in literature as in life”, as Harold Bloom once described those students who hope for the expansion of the Canon. I do not think so, for I have come to realize that it is not the Western Canon I question—it is not that I do not believe its member-works and member-authors are great. I do not even necessarily have strong feelings one way or another as to whether the Canon should be “opened up.” What I do believe, however, is that one is making a grave mistake when one teaches the works of the Canon beginning with the premise that the works within it are already great. Doing this is not logically much different from Broca looking at his data beginning with the premise that women are not as intelligent as men. I thus emphasize what should already be evident: the Canon does not make the artwork within it great; it is the artwork that makes the Canon great. By remembering this, our interpretation of these works can be richer and much more complicated than a mere deductive confirmation of expert opinion.
I have come to realize there are two sides to this coin—true, the Western Canon restricts me in some ways, but in many other ways it can also catapult me to new heights. Although I see the need for a change in the way the Canon (and any literature for that matter) is taught, I, like Bloom, think studying those who were widely hailed in the past is essential for exploring deeper and richer thought in the future. “There can be no strong, canonical writing without the process of literary influence”, writes Bloom, who describes literary influence as “a conflict between past genius and present aspiration, in which the prize is literary survival or canonical inclusion”. In this sense, the Canon serves a useful and vital function: to provide for the thinkers of today the fruits of centuries of intellectual labor, to allow us to begin our explorations where they only left off. The Western Canon does not only frame me, but, paradoxically, it also urges me onward, upward. Knowing this, it is difficult for me to feel entirely contained by my Formosa, for there are such great heights to which I may leap, so many undiscovered territories awaiting my arrival.
Bloom, Harold. The Western Canon. Harcourt, 1994.
“Borges, Jorge Luis; Joyce, James; Shakespeare, William.” Columbia Encyclopedia. 6th ed. 2000.
Gould, Stephen Jay. “Women’s Brains.” Encounters: Essays for Exploration and Inquiry. 2nd ed. Ed. Pat C. Hoy II and Robert DiYanni. New York: McGraw-Hill, 2000. 305-10.
Nietzsche, Friedrich. The Birth of Tragedy and Other Writings. Ed. Raymond Geuss and Ronald Speirs. Trans. Ronald Speirs. New York: Cambridge UP, 1999.
Percy, Walker. “The Loss of the Creature.” Ways of Reading. Ed. David Bartholomae and Anthony Petrosky. Boston: Bedford, 1996.
Winterson, Jeanette. “The Semiotics of Sex.” Encounters: Essays for Exploration and Inquiry. 2nd ed. Ed. Pat C. Hoy II and Robert DiYanni. New York: McGraw-Hill, 2000. 642-51.
This essay was originally published in 2003 for NYU’s expository writing journal, Mercer Street. I recently re-read it and thought it might be wise to save it for posterity on my personal site. I sometimes think back to some of the ideas I wrestled with in this essay when reading modern works.
I was joking with some friends the other day that my “to read” list keeps growing every day, and it only seems like things are added but never removed. I made the following analogy: it grows by the bucketful and shrinks by the thimbleful, to which my coworkers replied, “you need bigger thimbles and smaller buckets.” If only it were that easy.
Unfortunately, I’m not getting used to this 9-to-5 stuff, even if it is only 9-to-5. The other day I watched a video of Andy Hertzfeld (one of the original software developers on the Mac team at Apple), in which he talked about how, when he was my age, he would work 80-hour weeks and just pour his heart and soul into his work. And I thought: I can’t do that on my current project. Why should I?
Even though I make good use of my commute time by automatically synchronizing my favorite reads, e-books, PDFs, etc., along with videos and anything else I can find that’s worthy of consumption, even at a modest 40 hours a week I still feel like my life is dominated by doing someone else’s work. It’s a much different feeling than college, that’s for sure.
I’m a good programmer. No doubt that’s what makes me valuable at Morgan Stanley and even other organizations I could probably work at. I get this feeling inside me, though. It’s a feeling like, “Andrew, what the heck are you doing?”
But then I get a second internal dialogue. This one sells me on the rationality of not quitting my job and making a steady income so I can save money and set up a comfortable life for myself. Buy the things I need. Please the people around me. This little devil over my shoulder tells me that if I save money now, I can always change the world later. This little devil warns me if I try to change the world now, I may end up simply not changing it, and end up penniless.
I love software development: the process, the problems, and even the people. My company may need good software developers, but the outside world needs them more. The world needs people to start fighting for change. But how can I ask that of the world, if I’m not even in the fight myself?
I guess for now I can just keep reading and watching and preparing for the day when I finally have the courage. It’s a cruel thing: being “content” at your job, but unhappy.
It is dawning on me: however “irrational” it may be, leaving is my only option.
See Wall Street Technologists Flee for Startup Life, What One Does, and Parse.ly.
Leaving was, indeed, the only option. How glad I am, in retrospect, that I had the courage!
Poynter smartly rang the bell, feeling it necessary to tell publishers that Buzzfeed is a “real news site.” Yes, it is. It’s a news website and it’s the future. Buzzfeed is data driven, and it knows in a real and provable way what its readers want, and it’s growing like gangbusters. As HuffPo proved, and as the history of digital media keeps proving again and again, data rules. Data is how you find audience. Data is how you retain it. Sure, old guard websites deploy analytics to track usage patterns on the sites themselves, but they are missing the boat on analyzing the important stuff — share, search and social — to inform their edit and product decisions.
from Print is Dead, Long Live Print?
Normal is getting dressed in clothes that you buy for work and driving through traffic in a car that you are still paying for – in order to get to the job you need to pay for the clothes and the car, and the house you leave vacant all day so you can afford to live in it.
The election yesterday was important because it was a rejection of the repugnant brand of conservatism that argues government should have no ambition beyond self-immolation and individuals should have no ambition beyond themselves.
Obama is a flawed leader, but he is a deft politician. He has managed to win a national debate about a moral truth in society. One newspaper declared, “A Liberal America”, but it’s more like “A Mixed Economy America”.
In many urban states, such as New York, California, Illinois, and Massachusetts, the countervailing forces between government and private enterprise have always been understood. One does not need to make a strong case for government there; people see its benefits all around them. But Obama has shown the debate can be won even in states like Virginia, Wisconsin, and Ohio, however narrowly.
The Republicans now find themselves going through some soul-searching, much as the Democrats did in 2000. For a view of how that side views the world, today’s op-ed in the WSJ is a must-read. Notice the alarmism about healthcare — a “liberal entitlement dream” — and the belief that Obama did not earn his victory, so much as stumble into it. For a few years, expect the right wing to repeat, “it is better to be lucky than to be right”.
In a way, Obama’s second term is not so much about the man, but about the idea that government has a role to play in society. Under Obama’s vision, government’s role is not to wither away and die. It is to seize an opportunity to create public domestic good. It is to protect the individual’s liberty — as constitutionally mandated — wherever it may be threatened: economics, health, education, civil rights, the environment.
Making that idea the conventional wisdom would certainly help us conceive a more moral society, even within our lifetimes. Perhaps there is a reason for a little hope.
Michael Scherer, TIME Nov. 1, 2012, 7 a.m.
Justin Elliott, ProPublica, Nov. 1
Kim Barker, ProPublica, Nov. 1
This post was co-published with TIME. It originally appeared on ProPublica.
About a week before election day, a young girl, maybe 10 years old, confronted Colorado House candidate Sal Pace in a pew at his Pueblo church. “She said, ‘Is it true that you want to cut my grandmother’s Medicare?’” Pace remembers.
Like many other Democrats around the country, Pace has spent months trying to rebut the charge that President Obama’s health care reforms hurt Grandma by cutting Medicare by $716 billion. In fact, the same cuts in payments to medical providers found in Obamacare can also be found in the House Republican budget, and they do not directly limit patient care. “I told the little girl that the ads are full of lies and that it’s not right for people to lie,” he said.
What Pace couldn’t tell the girl was who exactly is to blame. That’s because the moneymen behind the outfit spending the most on the Medicare attack ads in Pace’s district will not show their faces. The money is being spent through a Washington-based group, Americans for Tax Reform (ATR), that calls itself a “social welfare” nonprofit, so it does not need to reveal its donors to the public. In mid-October, the group popped up in Pace’s district, which is about the size of New York state, and promised to spend $1.3 million there in the campaign’s final three weeks. In one day, Pace spokesman James Dakin Owens said, “They basically matched us dollar for dollar for everything we had raised in the campaign. It was an 800-pound gorilla that just jumped in.”
This sort of thing has been happening a lot this year in House and Senate races around the country. Candidates have found their modest war chests, filled with checks for $2,500 or less, swamped by outside groups, which have no limits on the donations they can collect. In all, more than $800 million was spent through mid-October on election ads by outside groups, according to the Center for Responsive Politics. Of that total, nearly 1 in 4 dollars is so-called dark money, meaning the identities of the donors remain a secret. Voters watching TV, listening to the radio or receiving direct-mail appeals know only the names of the front organizations that bought the ads, names that range from the well known (U.S. Chamber of Commerce) to the anodyne (Government Integrity Fund) to the borderline absurd (America Is Not Stupid).
Spending by outside groups is nothing new in American politics. The Willie Horton ad attacking Michael Dukakis in the 1988 presidential campaign was paid for by an outside group, as were the Swift Boat Veterans for Truth spots that skewered John Kerry in 2004. But in the past two years, American politics has been transformed by a surge in spending. One fact tells the story: Explicit political-ad spending by outside groups in 2012 is on track to double the combined total spent by outside groups in each of the four elections since 2002.
Ads purchased with untraceable money tend to be among the most vicious. Nearly 9 in 10 dark-money spots are negative, and an analysis by the Annenberg Public Policy Center found that 26 percent of the ads are deceptive, a slightly higher rate than that for ads by groups that disclose their donors’ identities. In a year that has been marked by enormous enthusiasm among wealthy conservatives, there is another trend in anonymous spending: Almost all of it (83 percent, according to one review) has been directed against Democrats. This has some in Obama’s party fretting about the outsize ability of wealthy individuals and institutions to shape the electoral landscape while hiding their identities behind front groups. “If we don’t find some way to respond to this, it’s going to turn us into a plutocracy, where a very few powerful people control the public agenda,” said former Ohio Gov. Ted Strickland.
Most of the secretive spending this year has been coordinated through a close-knit network of veteran Republican strategists in Washington who meet regularly to share polling data and decide which group should focus on what races around the country. “There’s no duplication. There’s no wasteful survey research done,” says Scott Reed, a Republican consultant working on the U.S. Chamber of Commerce advertising effort. “They have totally changed the way you run a campaign.”
The man behind the Colorado ads, Grover Norquist, is not shy about discussing the mechanics behind mounting multimillion-dollar dark-money campaigns. His organization works closely with such other dark-money giants as the U.S. Chamber; Crossroads GPS, co-founded by Karl Rove, a former adviser to George W. Bush; and Americans for Prosperity, founded by Charles and David Koch. While the groups can’t talk to campaigns, they can talk to one another. “For years, coordination was the thing you couldn’t do,” Norquist says about the shift in power from campaigns to outside groups. “Now it’s the thing you are most allowed to do.”
Not only do the groups share strategy, but they can share money as well. Under current rules, many campaign-finance lawyers say, nondisclosing groups must spend less than half their budget on political communications to keep their social-welfare status. In practice, this means there is a cost to anonymity. For every dollar spent on a dark-money political ad, another dollar must be spent on some nonpolitical effort. But by sharing, these groups have found ways to make the money go further.
In 2010, Crossroads GPS gave a $4 million grant to Norquist’s group, ATR, money that was earmarked for nonpolitical activities. Norquist used the money to finance his regular operations, freeing up about $4 million from other sources to spend on political communication. In effect, the nonpolitical Crossroads GPS money was transformed into political money by passing it through ATR. “That is part of the sales pitch you make to donors,” Norquist explained. “If you contribute a dollar to ATR, you are freeing up another dollar that you have already raised.”
Norquist would not say if he had received another large grant from Crossroads GPS this cycle, but he did say he expects the group’s political spending to have nearly tripled in 2012 to about $12 million. As for Pace, Norquist said the ads running in Colorado are meant as punishment for the candidate’s voting record in the state legislature. “What does he think we are going to do?” Norquist asked. “The tax-raising twit.”
Federal oversight of these groups is close to nonexistent. Of the roughly 104,000 people who work for the Internal Revenue Service, about 900 work in the tax-exempt division that monitors this spending. There is little hope of forcing groups like Norquist’s to disclose the identities of their donors. Republicans oppose such steps, and the courts have made it easier for the groups to operate secretly. A 2010 decision by the Supreme Court overturned a law banning unions and corporations from giving directly to efforts intended to influence elections. A subsequent court ruling created super PACs, independent groups that buy campaign ads with unlimited checks from disclosed donors. They now work in tandem with dark-money groups in races around the country.
That means there are likely to be far more candidates facing Pace’s predicament. In California’s Central Valley, José Hernández, a former farmworker and NASA astronaut, has been withstanding blistering attacks on television from outside groups as he challenges Representative Jeff Denham, a freshman Republican. Over the course of the year, Hernández has faced $3.1 million in outside spending against him, by the U.S. Chamber and a group called the American Action Network, neither of which discloses its donors. That is more than twice as much as Hernández has been able to raise for his campaign. “These folks have been throwing everything on the wall to see what will stick,” says Hernández. “It makes our job harder, but a lot of people see through all of this.” The election on Nov. 6 will tell how many.
With reporting by Kim Barker and Justin Elliott/ProPublica.
This flow visualization explores the connection between various PACs and Super PACs, their levels of support, and the campaign organizations they are actually benefiting.
This visualization explores where the money actually comes from: the biggest individual donors to the Super PACs who are sustaining their activities.
Bob Kaiser, an editor of the Washington Post, wrote the following memo to his colleagues in 1992, forecasting (mostly correctly) the next 20 years in computing, the changing content ecosystem, and the remaining role for editors:
I was taken aback by predictions at the conference about the next stage of the computer revolution. It was offered as an indisputable fact that the rate of technological advancement is actually increasing.
[...] packages of text, photos and film [...] could be used to create customized news products at many different levels of sophistication. At the top end, such a product might contain the text (or spoken text) of a Post story on the big news of the day, accompanied by CNN’s live footage and/or Post photographers’ pictures, plus instantly available background on the story, its principal actors, earlier stories on the same subject, etc. All of this could be read on segments of a large, bright and easy-to-read screen (screens are also being improved at a great rate).
[...] A lot of them seemed to regard the content of new media as a given, or something that could be pulled off a shelf and dealt with like a commodity. [...however, ] they are the products of talented reporters and, above all, editors who make informed choices for their readers and viewers.
[...] After this conference I am more convinced than ever that this is a key to our success. Our devoted customers like lots of things about The Post [...m]ost important, they like the package much more than any of its elements. The same is true of Vanity Fair readers or 60 Minutes watchers. Successful media provide an experience, not just bits of information.
[...] Confronted by the information glut of the modern world, I suspect even the computer-comfortable citizens of the 21st Century will still be eager to take advantage of reporters and editors who offer to sort through the glut intelligently and seek to make sense of it for them. Interestingly, when I asked a number of people at the conference what they’d like to be able to do in the electronic future, many spoke of finding all the extant journalism on subjects of interest to them.
No one volunteered that he/she was eager to have access to the full transcript of Congressional hearings and debates, or the full screenplays of new movies, or the list of every transaction on yesterday’s NASDAQ. They all expressed a preference for processed information; in other words, what we can provide.
Take a look at this proposed Washington Post homepage from 1992. The primary source document is available as a PDF memo. It’s worth reading in its entirety.
In 2007, Paul Graham gave a variety of causes for startup death in How Not To Die. He wrote:
When startups die, the official cause of death is always either running out of money or a critical founder bailing. Often the two occur simultaneously. But I think the underlying cause is usually that they’ve become demoralized. You rarely hear of a startup that’s working around the clock doing deals and pumping out new features, and dies because they can’t pay their bills and their ISP unplugs their server.
The other major thing Graham advises startups not to do: “other things”. Namely:
[D]on’t go to graduate school, and don’t start other projects. Distraction is fatal to startups. Going to (or back to) school is a huge predictor of death because in addition to the distraction it gives you something to say you’re doing. If you’re only doing a startup, then if the startup fails, you fail.
In early 2011, I wrote a post, Startups: Not for the faint of heart, that discussed Parse.ly’s survival through a one-year bootstrapping period after Dreamit Ventures Philly ’09. Since then, I’ve witnessed yet more startup deaths, and especially extended “troughs of sorrow”.
As a result, I’ve had a kind of mild survivor guilt, and have started to look for patterns in the causes of the deaths I have witnessed.
How to Survive
I’ve explored various causes of startup death. This is by no means an exhaustive list, but it illustrates some patterns I have seen over the years. You may wonder if I have any positive advice to offer about survival, rather than just cataloging the diseases I see in autopsies.
Startups are unknown battlefields full of landmines. Studying failures is, in many ways, a positive instruction. It’s a map of the landmines. As for concrete advice, I can offer this one suggestion: Be persistent.
In other words, to survive, you must continue moving forward. I don’t think startups win because they have smarter staff, better ideas, or a clearer understanding of market trends. Surely, those things help, but they aren’t the main thing.
The way to win is to keep playing.
Acknowledgments: Thanks to Chris Clarke, Ben Taitelbaum (blog), Jack Groetzinger (blog), and San Kim (blog) for reviewing a draft of this post. The reflections were also informed by a panel at beCamp 2012.
Image Credits: The Startup Curve | Fake Grimlock’s You Must Burn
Translations: Chinese | Have you translated this post? Contact me.
In it, the authors write:
Reporting is sufficient for showing whether or not we had a good month, but not insightful enough to tell us what we were doing right, where we went wrong, and what we might replicate and discard to perform better in the next reporting period.
Relying exclusively on the pageview — an important and dominant metric in online media — leads to some startling conclusions.
The problem with pageviews is that they are a lagging indicator and the lowest common denominator metric. Our experience suggests that the only reliable method for increasing pageviews is reporting on murder, mayhem, and scandal. It’s simple. Pageviews go up when something bad happens. Unless your reporters want to make a career out of late-night arson, you’ll want a way to even out the demand for your product.
This is an important conflict. There’s a difference between content that draws in the most users, and content that makes waves in the media narrative, reinforces a positive association with your site, and puts a spotlight on your brand. Originality may not draw the most pageviews, but it may draw the most important pageviews.
The article lays out a theory for how to segment users based on the “top”, “middle”, and “bottom” of a funnel. This funnel runs from one-time visitors through to paid subscribers and brand champions.
The authors write:
The goal is to move people from the top to the middle of the funnel — to increase the size of the loyal, engaged audience over time. This is not about getting a one-time boost in visits or pageviews. Achieving this takes a combination of on-site marketing along with some additional filters.
Read the full article for more good nuggets.
The information is there, but it’s there to a fanatic, you know, somebody wants to spend a substantial part of their time and energy exploring it and comparing today’s lies with yesterday’s leaks and so on. That’s a research job and it just simply doesn’t make sense to ask the general population to dedicate themselves to this task on every issue.
Very few people are going to have the time or the energy or the commitment to carry out the constant battle that’s required to get outside of MacNeil/Lehrer or Dan Rather or somebody like that. The easy thing to do, you know — you come home from work, you’re tired, you’ve had a busy day, you’re not going to spend the evening carrying on a research project, so you turn on the tube and say, “it’s probably right”, or you look at the headlines in the paper, and then you watch the sports or something.
That’s basically the way the system of indoctrination works. Sure, the other stuff is there, but you’re going to have to work to find it.
The above quote was from an interview captured in the film, Manufacturing Consent, with MIT professor and political writer Noam Chomsky.
It’s amazing how technology can make dramatic changes over the course of just ~20 years. WordPress, Google News, Wikipedia, RSS/Atom, Twitter, Wikileaks. Millions of websites with terabytes of information. It’s still a “research job” to sort through it all, but no longer is accessibility a core problem.
Chomsky was recently interviewed about the purpose of education (you can watch it on Blip.tv here). In it, he calls widespread Internet technology a “hammer” — a tool you can use to build great things or cause great injuries (possibly, to yourself). He believes education is key to using the Internet for its positive qualities. He states:
The Internet is extremely valuable if you know what you’re looking for. [...] If you know what you’re looking for, you have a framework of understanding, which directs you to particular things (and sidelines lots of others). [...] Of course, you always have to be willing to ask: “is my framework the right one?” [...] But you can’t pursue any kind of inquiry without a relatively clear framework that is directing your search, helping you choose what’s significant, what isn’t, what ought to be put aside, what ought to be pursued, and so on.
You can’t expect someone to become a biologist, say, by giving them access to Harvard University’s Biology library, and saying, “Have at it!” The Internet is the same, except magnified enormously.
If you don’t understand or know what you’re looking for — if you don’t have some conception of what matters, always, of course, with the proviso that you’re willing to question it if it seems to be going in the wrong direction — then exploring the Internet is just picking out random factoids that don’t mean anything.
Behind any significant use of contemporary technology — the Internet, communication systems, graphics, whatever it might be — there must be some well-constructed, directive, conceptual apparatus, otherwise it is very unlikely to be helpful; in fact, it may turn out to be harmful.
Random exploration through the Internet turns out to be a cult-generator. Pick a factoid here, a factoid there, and somebody else reinforces it, and all of a sudden you have some crazed picture with some factual basis, but nothing to do with the world. You have to know how to evaluate, interpret, and understand.
Cultivating that capacity — to seek what’s significant, always willing to question whether you’re on the right track — is what education is always going to be about, whether using computers & the Internet, or pencil & paper & books.
Last year, I attended one of Tufte’s one-day courses in NYC. I even showed him an early, prototype version of Parse.ly Dash. His feedback — even if it came quickly in 5 minutes — was helpful in understanding how to move the product forward.
I thought, when attending his presentation, that my main takeaways would be in the field I associated with him, namely, information visualization. But actually, my main takeaways were about communication, teaching, and journalism.
Tufte is an eloquent speaker who chooses his words carefully. He realizes that each medium has a capacity to communicate an idea. His spoken words are about intimacy and impact. His written words are about elaboration and illumination. His accompanying graphics are moments for reflection and self-motivated information discovery.
In short, watching him teach a course is to realize how anemic many of us are with regard to communication. Tufte views the tools of communication as having different strengths and weaknesses. Too often, modern professionals have problems combining these tools effectively. For example, limiting ourselves to spoken word (phone calls), written word without graphics (e-mail), and summary lists (project management tools).
In our distributed team and engineering-oriented culture at Parse.ly, we often talk about code-as-communication. We try to communicate with actual code — whether fully functional, prototyped, or stubbed — since this often best communicates software ideas with more density and clarity than an auxiliary artifact. But one can’t use code to communicate design considerations, process/workflow, or strategy; for these, we must resort to more universal forms of communication.
Take note: no matter how anemic and limited our communication approaches end up being, outside of a corporate setting, no one ever thinks to communicate via PowerPoint. I don’t give my family an update in a slide deck. I don’t tell my girlfriend how my day went in bullet points. This should already tell us that there is something wrong with that particular medium.
Tufte views teaching as effective communication with an aim to inform and cause reflection on the part of the student.
PowerPoint is, in Tufte’s words, “pushy”. It “sets up a dominance relationship between speaker and audience, as the speaker makes power points with hierarchical bullets to passive followers. Such aggressive, stereotyped, over-managed presentations[...] are characteristic of hegemonic systems…”
By contrast, teachers “seek to explain something with credibility”. He goes on, “the core ideas of teaching — explanation, reasoning, finding things out, questioning, content, evidence, credible authority not patronizing authoritarianism — are contrary to the cognitive style of PowerPoint, and the ethical values of teachers differ from those engaged in marketing.” (Read more in his Metaphors for Presentations.)
Tufte draws many of his goals from the ideals of journalism, especially print journalism. A well-written news article has a certain structure, style, and presentation priority (see e.g. the inverted pyramid and AP Stylebook). The frontpages of newspapers are made for scanning and are densely packed with headlines, text, and images. The printed pages of magazines combine text, images, charts, tables, and graphs as necessary, and use color and size to draw attention to one area or another.
He is especially fond of scientific journals and magazines, such as Nature. I believe his respect for these media organizations above others comes from the unique challenge they face: to take something inherently complex — such as a scientific discovery or result of research — and present it to an audience who might have limited understanding. These organizations also operate under the constraint of limited space — often devoting only a single page to a complex topic. Tufte’s point, here, is that if the journalists at Nature can condense an important scientific discovery into a page of information and have it still be readable and comprehensible, then you have no excuse for giving one-hour-long PowerPoint presentations that convey nothing of substance.
PowerPoint at NASA
The height of Tufte’s contempt for PowerPoint probably came when he was asked by NASA to review the cause of the Columbia Shuttle Disaster from the point-of-view of a corporate communication failure.
In his scathing report, “PowerPoint does Rocket Science”, Tufte takes apart the PowerPoint presentations that were made by Boeing engineers to NASA management when assessing the damage caused by the foam debris detected at launch. In it, he writes:
In the reports, every single text-slide uses bullet-outlines with 4 to 6 levels of hierarchy. Then another multi-level list, another bureaucracy of bullets, starts afresh from a new slide. How is it that each elaborate architecture of thought always fits exactly on one slide? The rigid slide-by-slide hierarchies, indifferent to content, slice and dice the evidence into arbitrary compartments, producing an anti-narrative with choppy continuity. Medieval in its preoccupation with hierarchical distinctions, the PowerPoint format signals every bullet’s status in 4 or 5 different simultaneous ways: by the order in sequence, extent of indent, size of bullet, style of bullet, and size of type associated with the various bullets. This is a lot of insecure format for a simple engineering problem.
It is worth reading every word of Tufte’s analysis.
In erecting an edifice of bureaucratic corporate communication around the simple technical content of the problem at hand, NASA managers and engineers managed to bury the lede: that their safety models were insufficient to predict with confidence the integrity of the shuttle and that there remained serious doubts among engineers of the shuttle’s overall safety.
So, the next time you are thinking about giving a PowerPoint presentation, you may want to ask yourself: Is this the best way I can communicate my ideas? Is this how I would want this material taught to me? How would a journalist elaborate my ideas when presenting them to the public?
Things fully distributed teams are happy to live without:
Things fully distributed teams do miss out on:
… far more money can be made out of people who want to write novels than out of people who want to read them. And an astonishing number of individuals who want to do the former will confess to never doing the latter. “People would come up to me at parties,” author Ann Bauer recently told me, “and say, ‘I’ve been thinking of writing a book. Tell me what you think of this …’ And I’d (eventually) divert the conversation by asking what they read … Now, the ‘What do you read?’ question is inevitably answered, ‘Oh, I don’t have time to read. I’m just concentrating on my writing.’”
When I was younger, I thought there was no greater ambition than becoming the writer of the next great novel. However, this article made me reflect on my own media consumption habits, and what a small audience I would affect even if I did write such a work.
I think similarly about painting and sculpture and classical music. These expressive forms are certainly demanding of skill, but who is the audience?
It would be unfair to consider television programming or film the new novel. Certainly, these media have the capacity to change people’s ideas and have a wide impact. But, even with the technology and cost barriers breaking down on film production, it lacks the visceral nature of writing. Anyone with an idea and a pen (or laptop) can pursue writing, but you have to be a technician of sorts to make a film.
By this disqualification, software — though increasingly recognized as an art form — is definitely not it, either. So, what is?
Three months ago, I put together a summary of the best links discussing the ouster as it happened. Today, combined with the NYTimes piece, it’s easy to get a full context and perspective on this story.