We released streamparse for our talk, “Real-time streams & logs with Apache Kafka and Storm” at PyData Silicon Valley 2014.
An initial release (0.0.5) has been made. It includes a command-line tool, sparse, for setting up and running local Storm-friendly Python projects. Running sparse quickstart scaffolds a local Storm + Python project from a project template using the streamparse framework; the basic example implements a simple word count against a stream of words. From inside that directory, sparse run will actually spin up a local Apache Storm cluster and execute your topology of Python code against it.
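For illustration, here is a minimal, framework-free sketch of the word-count logic such a bolt implements. The class and method names below are hypothetical: they only mirror the general shape of a Storm bolt's per-tuple callback, not streamparse's actual API.

```python
from collections import Counter

class WordCountBolt:
    """Toy stand-in for a Storm bolt: counts words as tuples arrive."""

    def __init__(self):
        self.counts = Counter()

    def process(self, tup):
        # In Storm, `tup` would carry the values emitted by an upstream
        # spout; here we treat it as a single word string.
        word = tup
        self.counts[word] += 1
        # A real bolt would emit this downstream rather than return it.
        return (word, self.counts[word])

bolt = WordCountBolt()
for w in ["dog", "cat", "dog"]:
    result = bolt.process(w)
# result is now ("dog", 2)
```

In a real topology, a spout would stream the words in and the bolt's emitted counts would flow to whatever component comes next.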
In short: it’s never been easier to develop with Storm and Python, thanks to streamparse. In the coming weeks and months, we plan to bundle a lot more functionality which will make it easier and easier to use Python’s excellent data analysis stack atop real-time streams of data using Storm.
How it works
Under the hood, streamparse is a new implementation of Storm’s multi-lang protocol for Python. To run Storm topologies locally, it leverages the lein build tool, which is the library’s only local requirement and is used to resolve dependencies on Storm itself.
A small command-line tool (which happens to be written in Clojure) is bundled with streamparse. It handles 100% of the Java interop for you, as well as compiling and validating your topology definitions, by leveraging Storm’s extremely handy Clojure DSL. This smooths over the “rough edges” of Storm being a JVM-based technology, while still allowing you to mix Java, Clojure, Python, Ruby, or any other language that supports the multi-lang protocol in a single Storm topology.
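To give a flavor of what the multi-lang protocol looks like on the wire: components exchange JSON messages over stdin/stdout, with each message followed by a sentinel line reading end. The helper names below are my own, and this is only a sketch of that framing, not streamparse’s implementation.

```python
import io
import json

def write_message(stream, msg):
    """Serialize one multi-lang message: a JSON payload, then an 'end' line."""
    stream.write(json.dumps(msg))
    stream.write("\nend\n")
    stream.flush()

def read_message(stream):
    """Read lines until the 'end' sentinel, then decode the JSON payload."""
    lines = []
    for line in stream:
        if line.strip() == "end":
            break
        lines.append(line)
    return json.loads("".join(lines))

# Round-trip a sample "emit" message over an in-memory stream
# (in a real topology, this would be the subprocess's stdin/stdout):
buf = io.StringIO()
write_message(buf, {"command": "emit", "tuple": ["word", 1]})
buf.seek(0)
msg = read_message(buf)
```

Because the framing is just line-delimited JSON, any language with a JSON library can participate in a topology, which is what makes the mixed-language clusters described above possible.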
The sparse tool will not only download the full Storm framework locally, but will also let you spin up clusters with 2, 10, or 100 parallel processes in complex data flow topologies within seconds. It will let you debug these topologies locally using slices of your real-time data. And it will package your topologies as an uberjar for submission to a production Storm cluster, which could be running across tens or hundreds of machines, without you lifting a finger or learning anything about Java or Clojure.
All of streamparse’s extensions will leverage fabric, invoke, and virtualenv to manage remote Storm worker machines and synchronize Python dependencies. Configuration of local, beta, and production environments is handled with a simple configuration file.
streamparse is currently being developed by Mike Sukmanowsky (@msukmanowsky), Keith Bourgoin (@kbourgoin), and me (@amontalenti), though we are looking for other contributors. Also, if you’re interested in more information on Parse.ly’s contributions to open source, our presentations at conferences, and what it’s like to work on our team, check out the Parse.ly Code & Tech page.