logpad

TLS and Python: Stripe Open Source Retreat

Programmed to Receive

Earlier this year, Stripe announced an Open Source Retreat, offering to fund development on a couple of open source projects for three months. I sent in a proposal to work on a pure-Python implementation of the TLS protocol and was selected. As a result, I have been working on it full time since July 28, and will be for three months. I just completed my first week of the Open Source Retreat, and this post is about that week.

Heavy Heads and Dim Sights

As Ivan Ristić says in his latest book, although history is full of major cryptographic protocols with critical design flaws, there are even more examples of implementation problems in well-known projects. Certificate validation flaws, connection authentication failures, insufficient error checking, basic constraints check failures, random number generation failures, protocol downgrade attacks, truncation attacks, and the more recent Heartbleed vulnerability are only a few examples.

The major cryptographic primitives are well understood and, given a choice, no one attacks them first. But primitives are seldom useful by themselves; they need to be combined into schemes and protocols and then implemented in code. These additional steps then become the main points of failure, and avoiding those failures is what this project aims to do, by means of a carefully designed Transport Layer Security (TLS) v1.2 implementation in PyCA's Cryptography library for Python. (Note that at present, this project is being developed in a separate repository under the PyCA group.)

I spent the first couple of days of the retreat setting up the test infrastructure. Glyph, cyli, radix, Alex Gaynor, dreid and I spent a good part of the week discussing and hacking on the project.

One particularly useful conversation was about declarative parsers. It was a fun discussion, so I will summarize it for you.

For a well-designed network protocol, you should be able to ask two questions: "Are these bytes a valid message?" and "Is this message valid for my current state?"

So, when we talk about parsing a protocol, we're mostly talking about answering the first question, and when we talk about processing, we're talking about answering the second. An explicit state machine makes processing much simpler by specifying all the valid states, the transitions between them, and the inputs that cause those transitions. A declarative parser makes parsing much simpler by specifying what a valid message looks like (rather than the steps you need to take to parse it). By saying what the protocol looks like, instead of how to parse it, you can more easily recognize and discard invalid inputs. And if you do all the message parsing before you try processing any messages, it becomes easier to avoid strange state transitions in your processor, transitions that could lead to bugs.
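To make the processing half concrete, here is a minimal sketch of an explicit state machine, assuming a heavily simplified client-side view of a full TLS 1.2 handshake (no client certificates, no resumption, and only the messages the server sends). The names `State`, `TRANSITIONS`, `process` and `run_handshake` are hypothetical and not taken from the project's code. It also assumes every message has already been parsed, so the only question left is whether it is allowed in the current state.

```python
from enum import Enum, auto


class State(Enum):
    # Hypothetical, simplified client-side states for a full TLS 1.2
    # handshake (no client certificates, no session resumption).
    WAIT_SERVER_HELLO = auto()
    WAIT_CERTIFICATE = auto()
    WAIT_KEY_EXCHANGE_OR_DONE = auto()
    WAIT_SERVER_HELLO_DONE = auto()
    WAIT_CHANGE_CIPHER_SPEC = auto()
    WAIT_FINISHED = auto()
    CONNECTED = auto()


# Every valid (state, incoming message type) pair and the state it leads to.
# Anything not listed here is, by construction, an invalid transition.
TRANSITIONS = {
    (State.WAIT_SERVER_HELLO, "ServerHello"): State.WAIT_CERTIFICATE,
    (State.WAIT_CERTIFICATE, "Certificate"): State.WAIT_KEY_EXCHANGE_OR_DONE,
    # ServerKeyExchange is only sent for some cipher suites, so it is optional.
    (State.WAIT_KEY_EXCHANGE_OR_DONE, "ServerKeyExchange"): State.WAIT_SERVER_HELLO_DONE,
    (State.WAIT_KEY_EXCHANGE_OR_DONE, "ServerHelloDone"): State.WAIT_CHANGE_CIPHER_SPEC,
    (State.WAIT_SERVER_HELLO_DONE, "ServerHelloDone"): State.WAIT_CHANGE_CIPHER_SPEC,
    # The client's own flight (ClientKeyExchange, ChangeCipherSpec, Finished)
    # would be sent at this point; this sketch only tracks what arrives.
    (State.WAIT_CHANGE_CIPHER_SPEC, "ChangeCipherSpec"): State.WAIT_FINISHED,
    (State.WAIT_FINISHED, "Finished"): State.CONNECTED,
}


class UnexpectedMessage(Exception):
    """Raised when a well-formed message arrives in a state that forbids it."""


def process(state, message_type):
    """Answer "is this message valid for my current state?" by table lookup."""
    try:
        return TRANSITIONS[(state, message_type)]
    except KeyError:
        raise UnexpectedMessage(
            f"{message_type} is not valid in state {state.name}"
        ) from None


def run_handshake(message_types):
    # Assumes all messages were parsed and validated beforehand, so this
    # loop only decides whether each one is allowed right now.
    state = State.WAIT_SERVER_HELLO
    for message_type in message_types:
        state = process(state, message_type)
    return state
```

For example, `run_handshake(["ServerHello", "Certificate", "ServerHelloDone", "ChangeCipherSpec", "Finished"])` ends in `State.CONNECTED`, while a `Finished` arriving right after `ServerHello` raises `UnexpectedMessage`. Because the valid transitions live in one table, any state/input pair not listed there is rejected rather than handled by accident.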

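The gist is that you don't want to breed weird machines (unintended behavior that emerges when malformed input drives code into states its authors never planned for). How you describe your protocol depends a lot on the tools you're using; we decided to use Construct to parse TLS record structures.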
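The project uses Construct for this. To keep the sketch below dependency-free, it swaps Construct for a small hand-rolled field table and the standard library `struct` module, but the idea is the same: the TLS record header (the `TLSPlaintext` structure in RFC 5246, section 6.2.1) is described as data, and a generic interpreter checks bytes against that description. `RECORD_HEADER`, `parse_fields` and `parse_record` are hypothetical names, and accepting only protocol version {3, 3} (TLS 1.2) is a simplifying assumption.

```python
import struct

# Valid ContentType values: change_cipher_spec, alert, handshake, application_data.
CONTENT_TYPES = {20, 21, 22, 23}
# RFC 5246: a plaintext record's length MUST NOT exceed 2^14.
MAX_FRAGMENT_LENGTH = 2 ** 14

# Declarative description of the record header: field name, wire format,
# and which values are valid. Hypothetical structure, not Construct's API.
RECORD_HEADER = (
    ("content_type", "B", lambda v: v in CONTENT_TYPES),
    ("major", "B", lambda v: v == 3),  # simplification: only accept
    ("minor", "B", lambda v: v == 3),  # TLS 1.2, i.e. version {3, 3}
    ("length", "H", lambda v: v <= MAX_FRAGMENT_LENGTH),
)


class ParseError(ValueError):
    """Raised when the bytes do not form a valid message."""


def parse_fields(spec, data, offset=0):
    """Generic interpreter for a field table; it knows nothing about TLS."""
    fields = {}
    for name, fmt, is_valid in spec:
        fmt = ">" + fmt  # TLS uses network (big-endian) byte order
        size = struct.calcsize(fmt)
        if offset + size > len(data):
            raise ParseError(f"truncated before field {name!r}")
        (value,) = struct.unpack_from(fmt, data, offset)
        if not is_valid(value):
            raise ParseError(f"invalid {name}: {value}")
        fields[name] = value
        offset += size
    return fields, offset


def parse_record(data):
    """Answer "are these bytes a valid record?" for exactly one record."""
    record, offset = parse_fields(RECORD_HEADER, data)
    end = offset + record["length"]
    if len(data) != end:
        raise ParseError("fragment length does not match the header")
    record["fragment"] = bytes(data[offset:end])
    return record
```

For instance, `parse_record(bytes([22, 3, 3, 0, 2, 0xAB, 0xCD]))` yields a handshake record whose fragment is `b"\xab\xcd"`, while a header claiming content type 99, or one cut off after two bytes, raises `ParseError` before anything reaches the state machine. All the TLS-specific knowledge lives in `RECORD_HEADER`; Construct offers the same separation with a much richer vocabulary.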

We also talked about the history of software licenses, software laws, and guerrilla warfare. That was interesting too, but I won't summarize it for you, because some things are only fun when discussed in person. (Sorry! Be here next time.)

Such a Lovely Place!

Stripe has an awesome office, and I've got the best desk. Really. And a nice big monitor. Oh, and lots of coffee. People have been nice to me so far, and no one has given me the stink-eye for daring to speak of TLS.

My week was all the more fun thanks to all my friends in San Francisco who worked hard to make me feel at home and made sure I fed myself well. I realized how much I tire everyone out when, this Sunday, Glyph exclaimed, "Uh, we'll be less energetic tod... OH THANK GOD SHE DOESN'T HAVE HER COMPUTER WITH HER!" (Apologies!)

Another huge chunk of thanks goes to all the folks at PyCA who regularly reviewed my code, making writing software more rewarding.

Up ahead in the distance...

Next up, I'll be focusing on the state machines in TLS. Along the way, I'll also be writing more TLS record parsers. You can follow the development on GitHub, and feel free to participate and contribute!
