I’ve been sharing these documents with friends who ask me, “I want to start programming and build a web app, where do I start?” These resources have also been useful to existing programmers who know C, C++ or Java, but who want to embrace dynamic and web-based programming.
Python is the core programming language used at Parse.ly. It also happens to be a quickly-growing language with wide adoption in the open source community, and it is a very popular choice for web startups.
I’ve written a blog post with some original materials for learning Python, import this — learning the Zen of Python with code and slides.
This is a good starting point, but you may also find these resources very helpful:
- For absolute beginners, “Learn Python the Hard Way”. This teaches Python using a series of programming examples, but it really assumes you have no programming background whatsoever. After going through the examples in LPTHW, it may be a good idea to supplement your understanding with Think Python.
- For existing programmers, “Dive into Python 3”. This teaches Python from the starting point that you have already programmed in a mainstream language like C or Java, and want to know what makes Python really cool/good. Similar audience to my “Zen of Python” slides. Note that this tutorial teaches Python 3, but most people still use Python 2.7. See Python2orPython3 on the Python wiki to see the differences.
- For advanced programmers, “Python Essential Reference, 4th Edition”. Unfortunately, this book costs money, but it’s basically the best book on Python on the market, and it’s very up-to-date. It’s very dense and weighs in at 717 pages, so this is only for those who want to go deep on Python.
- For cheap advanced programmers, “Official Python Tutorial”. Though the Python tutorial doesn’t have the best narrative style nor the best real-world examples, for advanced programmers, it will teach the reality of the language in a comprehensible way. And, it’s free.
Since HTML is basically useless without CSS, you can get by with a short tutorial on HTML and then more advanced tutorials on CSS styling. Here’s what I recommend.
Learn the basics of HTML from MDC’s Introduction to HTML and Wikipedia’s page on HTML. This is a rare case where using Wikipedia is actually a perfect way to get the right background because half the battle with understanding HTML is understanding its history.
Shay Howe published an excellent guide to HTML & CSS together in 2013. It looks like a great first stop.
You can also use these dedicated resources for CSS specifically:
- For absolute beginners: Use W3C’s official tutorial on Starting with HTML + CSS. This was written all the way back in 2004, but provides the basics with screenshots and real code examples, so is a great way to get started.
- For existing programmers: Mozilla has done a great job putting together a quick and readable tutorial that gives you the basics at a glance.
- For advanced programmers: You’ll want to buy the best book on the subject, CSS Mastery. It has the best explanation of the box model and browser rendering engines that I’ve seen, and covers all the edge cases nicely.
- For cheap advanced programmers: You’ll need to look over the MDC (Mozilla) CSS Reference. Pay particularly close attention to articles on the Box Model and the Visual Formatting Model.
OK: take a deep breath. You’re learning the building blocks of a modern web application: backend / frontend programming languages and their associated code libraries. Let’s aim to solidify this knowledge using modern web frameworks.
Putting it all together: Python web frameworks
Put simply, a view function lets a web developer respond to user queries and interactions with dynamically rendered response pages. Typically, a view function will query a database, which is where the persistent data that the user is aiming to retrieve may live. There is a slew of database technologies, and depending on its requirements, a web application may combine several of them to respond to requests. The view function then takes the retrieved information and renders it into a page that the user can view. This rendering process is the job of the template engine, which is able to plug dynamic values into page templates. Google likely has a single result page template, but depending on your query (and potentially, user profile data), the template will be populated with different results and advertisements. Finally, the web server is a piece of software that receives the requests (e.g. responds to google.com and to the URL for searching), executes the view functions, and returns the responses to the browser. The web server is like the glue that binds everything together.
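To make these three pieces concrete, here is a minimal sketch that uses only Python’s standard library. It assumes a Python 3 interpreter, and the table name, the sample rows, the q query parameter, and the function names are all made up for illustration. An in-memory SQLite table plays the database, string.Template plays the template engine, and a WSGI callable is the hook a web server would call.

```python
import sqlite3
from html import escape
from string import Template
from urllib.parse import parse_qs

# Database: an in-memory SQLite table standing in for persistent storage.
# (Table name and rows are hypothetical sample data.)
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE articles (title TEXT)")
db.executemany("INSERT INTO articles VALUES (?)",
               [("Learning Python",), ("Learning CSS",)])

# Template engine: string.Template plugs dynamic values into a page template.
PAGE = Template("<h1>Results for $query</h1><ul>$items</ul>")


def search_view(query):
    """View function: query the database, then render the results."""
    rows = db.execute(
        "SELECT title FROM articles WHERE title LIKE ? ORDER BY title",
        ("%" + query + "%",),
    ).fetchall()
    items = "".join("<li>%s</li>" % escape(title) for (title,) in rows)
    return PAGE.substitute(query=escape(query), items=items)


def application(environ, start_response):
    """WSGI entry point: the web server calls this once per request."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    query = params.get("q", [""])[0]
    body = search_view(query).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [body]
```

To try it in a browser, you could hand application to the standard library’s development server, for example wsgiref.simple_server.make_server("", 8000, application).serve_forever(), and then visit /?q=Python on port 8000.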
Now that you understand these basics, you have to face an unfortunate truth: lots of different web frameworks exist that provide this functionality.
Since you only want to develop web apps fast, I’m only going to briefly cover three of these frameworks, and their relative trade-offs. These are: Django, Tornado, and Flask.
Django is, by far, the most popular web framework for Python. It has excellent documentation and is very opinionated in how you should structure your web application. There are also a number of books written about it and a slew of open source modules and extensions.
Django has been used for a number of use cases: enterprise software-as-a-service web applications; consumer-facing, page-oriented software; rapid web application prototypes; content management systems; the list goes on and on.
Let’s evaluate it on the important functionality areas above:
- View Functions: They are defined either as plain Python functions or classes defined inside modules, typically a module called “views.py” living within a Django “application”, which is nothing more than a Python package that contains that file. They are mounted to certain URLs using a special URL dispatcher using regular expression patterns.
- Template Engine: Django has its own template engine designed to be user-friendly even to non-programmers. In this respect, the language is somewhat limited and quirky, and does not really re-use your knowledge of Python for templating. Many advanced programmers end up using an alternative template engine with Django, such as Jinja2.
- Database: Django is very opinionated about your database engine. It was written with the idea that everyone would use a SQL database system of some sort, such as MySQL, Postgres, or SQLite. It provides an object-relational mapper system, or ORM, which makes it easy to define new data storage objects through what are called Models. It also provides an excellent and customizable automatic admin interface that allows instance data to be created and managed using a web-based interface, complete with support for search, filtering, bulk operations, and the like. Despite these advantages, the Django ORM is derided as being a poorer codebase with a worse architecture than the more widely respected SQLAlchemy project.
- Web Server: There is no web server bundled in Django, save a development server not meant to be used in production. This leaves it up to you to integrate Django with a number of WSGI-compliant web servers that are out there, including Apache, nginx, gunicorn, and others.
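The idea of mounting view functions to URLs through regular expression patterns can be sketched in plain Python. This is not Django’s implementation or API, only an illustration of the mechanism; the URLPATTERNS table, the dispatch helper, and both views are hypothetical names.

```python
import re


def homepage(request):
    """A plain-function view that takes only the request."""
    return "Welcome"


def article_detail(request, year, slug):
    """Captured URL groups arrive as keyword arguments."""
    return "Article %s from %s" % (slug, year)


# Hypothetical URL table: (regular expression, view function) pairs.
URLPATTERNS = [
    (re.compile(r"^$"), homepage),
    (re.compile(r"^articles/(?P<year>[0-9]{4})/(?P<slug>[\w-]+)/$"),
     article_detail),
]


def dispatch(path, request=None):
    """Call the first view whose pattern matches the requested path."""
    for pattern, view in URLPATTERNS:
        match = pattern.match(path)
        if match:
            return view(request, **match.groupdict())
    return "404 Not Found"


print(dispatch("articles/2013/hello-world/"))  # Article hello-world from 2013
```

The named groups in each pattern become the arguments of the view, which is why a regex-based URL table and its view functions have to be kept in sync.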
At Parse.ly, we use Django for our main web application, but swap the default template engine for Jinja2. Though we have a Postgres database that benefits a bit from Django’s ORM and admin interface, the bulk of our data is stored in MongoDB, Redis, and Solr, and thus does not leverage the ORM at all. (See my related article, “On multi-form data”, for an explanation of why we combine databases.) Further, for other parts of our system that require access to our Postgres DB, we use SQLAlchemy. We run Django under nginx and uwsgi.
Tornado is a web framework that was released by Facebook after its acquisition of FriendFeed. It has a significant architectural difference from Django in that it is built to solve the C10k problem: the challenge of building web servers that handle thousands of simultaneous web connections. As a result, it bundles its own web server and expects you to use it.
Traditional web frameworks like Django expect that every web request will be handled by a separate web server thread. With thousands of simultaneous connections, this can overwhelm your web server with excessive memory usage, causing the server to slow down or even crash. Tornado is written the same way as other asynchronous web servers like nginx and NodeJS. As a result, it has the same scaling benefits: it can handle thousands of concurrent requests while keeping your server’s memory usage stable.
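The single-threaded, asynchronous model can be illustrated without Tornado itself. The sketch below uses asyncio from the Python 3 standard library as a stand-in event loop, and handle_request and serve are hypothetical names. Each simulated request waits on "I/O" with a short sleep, and all of them are served from one thread.

```python
import asyncio
import threading


async def handle_request(request_id, seen_threads):
    """Simulate a request that spends its time waiting on network I/O."""
    seen_threads.add(threading.get_ident())
    await asyncio.sleep(0.01)  # yields control to other requests while waiting
    return request_id


async def serve(n_requests):
    """Run n_requests simulated requests concurrently on one event loop."""
    seen_threads = set()
    results = await asyncio.gather(
        *(handle_request(i, seen_threads) for i in range(n_requests))
    )
    return results, seen_threads


results, seen_threads = asyncio.run(serve(1000))
print(len(results), "requests handled by", len(seen_threads), "thread(s)")
# prints: 1000 requests handled by 1 thread(s)
```

No thread is created per request, so memory grows with the small per-request state rather than with a full thread stack for every connection.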
This architectural difference has ramifications throughout your codebase, however. Tornado view functions tend to look different, and the usage of databases tends to be entirely different, too. So this isn’t the best choice for beginners, unless you know for a fact that your application is going to involve lots of concurrent connections from the get-go. Examples of this include: web chat systems, telephony applications, API servers, mobile backends, or some classes of “real-time” web applications.
Tornado has a nice overview document intended for beginners. The recent O’Reilly book, Introduction to Tornado, is also an excellent (and quick) read that goes through most of the facilities available in the framework.
- View Functions: Tornado view functions are implemented via classes known as Handlers, which are subclasses of tornado.web.RequestHandler. Similarly to Django, there is a URL dispatcher called the Application that maps URL regex patterns to Handlers. Unlike Django view functions, Tornado view functions are not meant to do much work. The reason for this is that all view functions run in a single thread, and thus any long-running code will slow down your entire web server. Instead, the responsibility of the function is to delegate work to other asynchronous services handled by Tornado’s server. The primary candidate here is to have Tornado make an asynchronous HTTP request to some other service. There are also some databases and database drivers that are written in an “asynchronous” style which you can use reliably with Tornado, but in general, the idea is to avoid database queries in your view functions.
- Database: As mentioned earlier, Tornado doesn’t expect your view functions to hit a database often since this could slow your entire web server down. As a framework, it expects data querying to be “your problem”. There is a small wrapper for MySQL included, but this almost seems like an afterthought. Instead, I have seen most people put Tornado in front of other HTTP services that might be written using blocking frameworks like Django or Flask. I have also seen usage of async-friendly data stores such as MongoDB, CouchDB, and Solr. CouchDB and Solr both use HTTP as the client interface, so it is easy to hit these directly using Tornado’s built-in HTTP client. MongoDB has perhaps the best support: they shipped an official asynchronous driver called Motor, meant for use specifically with Tornado. Async drivers will likely become more common in the Python 3.x era, as Guido van Rossum (Python’s creator) is working on PEP 3156 to unify all of the async/event-driven Python frameworks.
- Web Server: Tornado bundles its own web server, which is perhaps the most powerful and convenient aspect of the framework. The beautiful thing about this is that the exact same web server you run locally for development is the one you can run in production.
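Why long-running code in a handler stalls everything else can be seen in a small experiment. This sketch again uses the standard library’s asyncio rather than Tornado, and all function names are hypothetical. A slow handler and a fast handler start together: when the slow one blocks the thread, the fast one has to wait, and when the slow one awaits instead, the fast one finishes first.

```python
import asyncio
import time


async def slow_blocking(log):
    time.sleep(0.05)           # blocks the only thread; nothing else can run
    log.append("slow")


async def slow_async(log):
    await asyncio.sleep(0.05)  # hands control back to the event loop
    log.append("slow")


async def fast(log):
    log.append("fast")


async def run_pair(slow_handler):
    """Start the slow handler first, then the fast one, and record finish order."""
    log = []
    await asyncio.gather(slow_handler(log), fast(log))
    return log


blocking_order = asyncio.run(run_pair(slow_blocking))
async_order = asyncio.run(run_pair(slow_async))
print(blocking_order)  # ['slow', 'fast']
print(async_order)     # ['fast', 'slow']
```

A blocking database query inside a handler behaves like the time.sleep call here, which is why the advice above is to delegate that work to asynchronous services or drivers.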
Flask is the newest web framework of these. The author of the framework has a PyCon presentation explaining its motivation. Funny enough, it was built out of an April Fool’s joke where the author “zipped up” two of his existing projects — Jinja2 (template engine) and Werkzeug (HTTP library) — and glued them together with a small Python file, thus declaring it a new web “microframework”.
The joke became a real open source project which is notable for its simplicity, respect of Python’s facilities, strong documentation, and ease of use. Due to its reliance on existing, high-quality Python modules, the actual web framework is only approximately 1,000 lines of code. The quickstart application requires only a single Python file which, when run, gives you a working development web server that renders a dynamic response. For all these factors and more, it is my preferred web framework for new web applications, especially those that wouldn’t benefit from Django’s admin interface or Tornado’s concurrent request scaling.
- View Functions: View Functions are as simple as it gets in Flask. They are simply plain Python functions. They are mounted to URL patterns using a Python Decorator called route. This includes support for Variable Rules which tend to be much more comprehensible compared to regex-based routes as in Django and Tornado. Similarly to Django, view functions in Flask are where the bulk of your application’s logic will go, including things like database queries.
- Template Engine: Flask is meant to be used with Jinja2, an excellent and well-documented template engine that is also widely used by Django developers as a drop-in replacement for that framework’s template engine. It strikes a balance: like Django’s template language, it is meant to be understood by non-programmers (see Template Designer Documentation), while also having good interoperability with Python code and support for a wide range of control structures.
- Database: This is the least opinionated part of the Flask framework; it makes no recommendation as to what database to use, considering this to be beyond the scope of a core web framework. That said, the Flask Extension Registry contains some modules that help integrate Flask with this or that database technology, such as Flask-SQLAlchemy (provides support for all SQL data stores) and Flask-PyMongo (provides support for MongoDB connections). However, you can just as easily query databases by simply importing appropriate Python client libraries which often exist for that particular DB — and that is, indeed, “The Flask Way” of doing this.
- Web Server: Though no web server is bundled in the framework, it can be deployed even more easily than Django to any number of WSGI-compliant web servers, as described in the Deployment Options section of their documentation. The built-in debug-mode development server is extremely handy for local development, supporting full stack traces and even an embedded Python interpreter for inspecting the state of variables at the time of the web server crash.
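In Flask itself, mounting a view looks like decorating a plain function with @app.route("/user/<username>"). To show what a decorator with variable rules does behind the scenes, here is a toy router in plain Python. It is a hypothetical sketch, not Flask’s implementation: the MiniApp class, its handle method, and the way rules are translated are all assumptions made for illustration.

```python
import re


class MiniApp:
    """Hypothetical toy router; not how Flask is actually implemented."""

    def __init__(self):
        self.routes = []

    def route(self, rule):
        # Turn a rule like "/user/<name>" into "^/user/(?P<name>[^/]+)$".
        pattern = re.sub(r"<(\w+)>", r"(?P<\1>[^/]+)", rule)
        regex = re.compile("^" + pattern + "$")

        def decorator(view):
            self.routes.append((regex, view))
            return view  # the view stays an ordinary Python function

        return decorator

    def handle(self, path):
        """Call the first registered view whose rule matches the path."""
        for regex, view in self.routes:
            match = regex.match(path)
            if match:
                return view(**match.groupdict())
        return "404 Not Found"


app = MiniApp()


@app.route("/")
def index():
    return "Hello, World!"


@app.route("/user/<name>")
def profile(name):
    return "Profile page for %s" % name


print(app.handle("/user/alice"))  # Profile page for alice
```

The rule "/user/<name>" says the same thing as the regex it compiles to, but it is much easier to read, which is the appeal of variable rules over hand-written patterns.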
Conclusion: pick a stack
This article has lots of resources that can help you pick a stack, but I have some opinions about how you can get started easily.
- Use Python 2.7. Python 3 isn’t fully ready yet, but will be soon.
- Target HTML4 and CSS2. Though you need to be aware of HTML5 and CSS3, the lion’s share of web development today targets the earlier versions of these standards, and the changes to them are mostly incremental.
- Start with Flask, switch as necessary. If you are just getting started with web development, you’ll be able to assemble an application with the above components easily in Flask. You may not know yet whether your application requires thousands of concurrent requests (as provided by Tornado) or whether you would benefit from extensive open source plugins / a full-featured data model framework (as in Django). So defer those decisions to when you are better able to make them in an informed and careful way.
- Pick the simplest database possible, upgrade later. Since Flask doesn’t impose any database on you, you can choose to pick the simplest database that could possibly work. Some good candidates for your early days are SQLite, MongoDB, or Redis. As you start to understand your requirements more, you may need to upgrade to a full-fledged SQL RDBMS such as Postgres or a full-text index such as Solr. But you won’t know for sure until you fully understand the form of your data, so why lock into heavyweight solutions up-front? (see my related article, “On multi-form data”)
- Pick a host that matches your system administration skill level. You will need a Linux server available to you to deploy your app, and for that I suggest simple VPS providers such as Rackspace Cloud or Linode. You will hear a lot of people mention Amazon EC2 but I recommend you only switch to EC2 later once you understand its quirks and tradeoffs vs traditional providers (as well as its benefits). Setting your server up may require some knowledge of UNIX and system administration which is beyond you. If that is the case, you can consider a shared hosting provider such as Webfaction for your early days, which has good Python support and prefab deployment setups for Django, Flask, and nginx available. If you don’t mind paying a premium to have someone else host all your infrastructure for you, you may also want to consider a PaaS provider such as Heroku or dotCloud. Personally I don’t recommend these services for economic reasons, but they have many proponents.
- Develop locally, deploy simply. Each of these web frameworks has workflows for developing locally. You should use those until you get comfortable with the frameworks. That said, the day will come where you want to see your web application live on a real web server. For that moment, if you are developing with Django or Flask, I recommend you deploy to uWSGI behind nginx. Flask has docs for this, so does Django. To actually push your code to your servers, I recommend you start with extremely simple UNIX tools: rsync and ssh. With rsync, you can copy your Python project quickly to your server, and thanks to rsync’s incremental copy, it will only copy changed files. For example, you might run rsync -Pav myproject/ remoteserver:myproject/ to push your project. With ssh, you can execute remote commands on your server, such as ssh remoteserver sudo restart nginx to restart your nginx web server. Once you start to need fancier deployment options, you can upgrade to Fabric, a Python-based deployment tool, replacing your rsync command with project tools and replacing your ssh commands with calls to run().
- Don’t develop on Windows. This is unfortunate but true. The lack of support for UNIX tools on Windows puts it at a significant disadvantage for building modern web applications that deploy to Linux servers running Python. If you are running Windows on your development workstation and don’t feel you have a choice about the matter, you will want to investigate virtualization options to do your development under a Linux guest virtual machine. These include VirtualBox, Vagrant, and VMware.
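Before reaching for Fabric, the rsync and ssh steps from the deployment advice above could be wrapped in a tiny Python script. This is only a sketch using the standard library’s subprocess module. The function names are hypothetical, the host and project names are the placeholders from the example commands, and it assumes rsync and ssh are installed and that the server restarts nginx with the command shown earlier.

```python
import shlex
import subprocess


def build_deploy_commands(host, project):
    """Return the rsync and ssh command lines as argument lists."""
    sync = ["rsync", "-Pav", project + "/", "%s:%s/" % (host, project)]
    restart = ["ssh", host, "sudo restart nginx"]
    return [sync, restart]


def deploy(host, project, dry_run=True):
    """Print each command and, unless dry_run is set, execute it."""
    for command in build_deploy_commands(host, project):
        print(" ".join(shlex.quote(part) for part in command))
        if not dry_run:
            subprocess.check_call(command)  # raises if the command fails


# Dry run: only prints the two commands, nothing is copied or restarted.
deploy("remoteserver", "myproject")
```

Calling deploy("remoteserver", "myproject", dry_run=False) would actually run both commands, stopping at the first one that fails.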
I look forward to seeing the web apps you build and deploy.
UPDATE: I’ve converted this blog post into a full-blown tutorial. You can find the code, slides, and video in this post. Covers all my “suggested” technologies, like Twitter Bootstrap, jQuery, Flask, Jinja2, MongoDB, Fabric, nginx, supervisor, uWSGI, etc.