Mitmproxy is an enormously flexible tool. Knowing exactly how the proxying
process works will help you deploy it more creatively, and let you understand
its fundamental assumptions and how to work around them. This document explains
mitmproxy's proxy mechanism by example, starting with the simplest explicit
proxy configuration, and working up to the most complicated interaction -
transparent proxying of SSL-protected traffic in the presence of SNI.
Configuring the client to use mitmproxy as an explicit proxy is the simplest
and most reliable way to intercept traffic. The proxy protocol is codified in
the [HTTP RFC](http://www.ietf.org/rfc/rfc2068.txt), so the behaviour of both
the client and the server is well defined, and usually reliable. In the
simplest possible interaction with mitmproxy, a client connects directly to the
proxy, and makes a request that looks like this:
GET http://example.com/index.html HTTP/1.1
This is a proxy GET request - an extended form of the vanilla HTTP GET request
that includes a scheme and host specification, which together give mitmproxy
all the information it needs to proceed.
1. The client connects to the proxy and makes a request.
2. Mitmproxy connects to the upstream server and simply forwards the request
   on.
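To make the point concrete, here is a minimal sketch of how the upstream target can be read straight off a proxy GET request line. The function name `upstream_from_request_line` is hypothetical and used for illustration only; it is not part of mitmproxy's API, and the default ports are the usual HTTP conventions rather than anything this document specifies.

```python
from urllib.parse import urlsplit


def upstream_from_request_line(line: str):
    """Return (scheme, host, port, path) for a proxy-style request line.

    Hypothetical helper for illustration, not mitmproxy's own parser.
    """
    method, target, version = line.split(" ", 2)
    url = urlsplit(target)
    if not url.scheme or not url.hostname:
        # A vanilla (non-proxy) request only carries a path.
        raise ValueError("request target has no scheme and host")
    # Fall back to the conventional default port for the scheme.
    port = url.port or (443 if url.scheme == "https" else 80)
    path = url.path or "/"
    if url.query:
        path += "?" + url.query
    return url.scheme, url.hostname, port, path


print(upstream_from_request_line("GET http://example.com/index.html HTTP/1.1"))
```

A vanilla request such as `GET /index.html HTTP/1.1` makes this helper raise `ValueError`, because the request line alone does not say which host to contact.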
The process for an explicitly proxied HTTPS connection is quite different. The
client connects to the proxy and makes a request that looks like this:
CONNECT example.com:443 HTTP/1.1
A conventional proxy can neither view nor manipulate an SSL-encrypted data
stream, so a CONNECT request simply asks the proxy to open a pipe between the
client and server. The proxy here is just a facilitator - it blindly forwards
data in both directions without knowing anything about the contents. The
negotiation of the SSL connection happens over this pipe, and the subsequent
flow of requests and responses is completely opaque to the proxy.
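As a small illustration, the only thing a proxy learns from a CONNECT request is a host and a port. The sketch below extracts them; `parse_connect` is a hypothetical name chosen for this example, and the bracket handling for IPv6 literals is an assumption about inputs this document does not discuss.

```python
def parse_connect(line: str):
    """Return (host, port) from a CONNECT request line.

    Hypothetical helper for illustration only.
    """
    method, target, version = line.split(" ", 2)
    if method != "CONNECT":
        raise ValueError("not a CONNECT request")
    # Split on the last colon so that IPv6 literals like [::1]:443 work.
    host, sep, port = target.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError("CONNECT target must be host:port")
    return host.strip("[]"), int(port)


print(parse_connect("CONNECT example.com:443 HTTP/1.1"))
```

Everything sent after the proxy's reply is just bytes to be relayed, which is why a conventional proxy has nothing further to parse.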
## The MITM in mitmproxy
This is where mitmproxy's fundamental trick comes into play. The MITM in its
name stands for Man-In-The-Middle - a reference to the process we use to
intercept and interfere with these theoretically opaque data streams. The basic
idea is to pretend to be the server to the client, and pretend to be the client
to the server. The tricky part is that the Certificate Authority system is
designed to prevent exactly this attack, by allowing a trusted third-party to
cryptographically sign a server's SSL certificates to verify that the certs are
legit. If this signature is from a non-trusted party, a secure client will
simply drop the connection and refuse to proceed. Despite the many shortcomings
of the CA system as it exists today, this is usually fatal to attempts to MITM
an SSL connection for analysis.
Our answer to this conundrum is to become a trusted Certificate Authority
ourselves. Mitmproxy includes a full CA implementation that generates
interception certificates on the fly. To get the client to trust these
certificates, we register mitmproxy as a CA with the device manually.
## Complication 1: What's the remote hostname?
To proceed with this plan, we need to know the domain name to use in the
interception certificate - the client will verify that the certificate is for
the domain it's connecting to, and abort if this is not the case. At first
blush, it seems that the CONNECT request above gives us all we need - in this
example, both of these values are "example.com". But what if the client had
initiated the connection as follows:
CONNECT 10.1.1.1:443 HTTP/1.1
Using the IP address is perfectly legitimate because it gives us enough
information to initiate the pipe, even though it doesn't reveal the remote
hostname.
Mitmproxy has a cunning mechanism that smooths this over - upstream certificate
sniffing. As soon as we see the CONNECT request, we pause the client part of
the conversation, and initiate a simultaneous connection to the server. We
complete the SSL handshake with the server, and inspect the certificates it
used. Now, we use the Common Name in the upstream SSL certificates to generate
the dummy certificate for the client. Voila, we have the correct hostname to
present to the client, even if it was never specified.
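The last step, reading the Common Name out of the upstream certificate, can be sketched as follows. This assumes the certificate is available as a dictionary in the shape that Python's `ssl.SSLSocket.getpeercert()` returns for a validated certificate; the helper name `common_name` and the sample data are illustrative, and this is not how mitmproxy itself represents certificates.

```python
def common_name(cert: dict):
    """Return the subject Common Name from a getpeercert()-style dict."""
    # "subject" is a tuple of RDNs, each a tuple of (key, value) pairs.
    for rdn in cert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                return value
    return None


# Illustrative data, shaped like the result of getpeercert().
upstream_cert = {
    "subject": (
        (("countryName", "US"),),
        (("organizationName", "Example Org"),),
        (("commonName", "example.com"),),
    ),
}

print(common_name(upstream_cert))
```

With this value in hand, the hostname for the dummy certificate is known even when the client only supplied `10.1.1.1:443`.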
## Complication 2: Subject Alternative Name
Enter the next complication. Sometimes, the certificate Common Name is not, in
fact, the hostname that the client is connecting to. This is because of the
optional Subject Alternative Name field in the SSL certificate that allows an
arbitrary number of alternative domains to be specified. If the expected domain
matches any of these, the client will proceed, even though the domain doesn't
match the certificate Common Name. The answer here is simple: when we extract
the CN from the upstream cert, we also extract the SANs, and add them to the
generated dummy certificate.
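A rough sketch of why the SANs have to be carried over: the client accepts the certificate if the hostname matches any listed name, not just the CN. The code below again assumes a `getpeercert()`-style dictionary; `cert_names` and `name_matches` are hypothetical helpers, and the wildcard rule is a simplified version of what real clients implement.

```python
def cert_names(cert: dict):
    """Collect the CN and all DNS SANs from a getpeercert()-style dict."""
    names = []
    for rdn in cert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                names.append(value)
    for kind, value in cert.get("subjectAltName", ()):
        if kind == "DNS" and value not in names:
            names.append(value)
    return names


def name_matches(hostname: str, pattern: str) -> bool:
    """Simplified match: a leading "*." covers exactly one label."""
    hostname, pattern = hostname.lower(), pattern.lower()
    if pattern.startswith("*."):
        head, _, rest = hostname.partition(".")
        return bool(head) and rest == pattern[2:]
    return hostname == pattern


# Illustrative data only.
upstream_cert = {
    "subject": ((("commonName", "example.com"),),),
    "subjectAltName": (
        ("DNS", "example.com"),
        ("DNS", "www.example.org"),
        ("DNS", "*.example.net"),
    ),
}

names = cert_names(upstream_cert)
print(names)
print(any(name_matches("www.example.org", n) for n in names))
```

A dummy certificate built from the CN alone would fail for a client connecting to `www.example.org`, even though the upstream certificate covers that name.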
## Complication 3: Server Name Indication
One of the big limitations of conventional SSL is that each certificate
requires its own IP address. This means that you can't do virtual hosting
where multiple domains with independent certificates share the same IP address.
In a world with a rapidly shrinking IPv4 address pool this is a problem, and we
have a solution in the form of the Server Name Indication extension to the SSL
and TLS protocols. This lets the client specify the remote server name at the
start of the SSL handshake, which then lets the server select the right
certificate to complete the process.
SNI breaks our upstream certificate sniffing process, because when we connect
without using SNI, we get served a default certificate that may have nothing to
do with the certificate expected by the client. The solution is another tricky
complication to the client connection process. After the client connects, we
allow the SSL handshake to continue until just _after_ the SNI value has been
passed to us. Now we can pause the conversation, and initiate an upstream
connection using the correct SNI value, which then serves us the correct
upstream certificate, from which we can extract the expected CN and SANs.
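To show what "the SNI value has been passed to us" means on the wire, here is a minimal sketch that pulls the server name out of the first TLS record a client sends. It assumes the whole ClientHello arrives in a single record and reads only the first name entry; `extract_sni` is a hypothetical helper written for this example and is not mitmproxy's implementation.

```python
import struct


def extract_sni(record: bytes):
    """Return the SNI host name from a ClientHello record, or None."""
    try:
        # Record type 0x16 is handshake; handshake type 0x01 is ClientHello.
        if record[0] != 0x16 or record[5] != 0x01:
            return None
        pos = 5 + 4                  # record header + handshake header
        pos += 2 + 32                # client version + random
        pos += 1 + record[pos]       # session id
        pos += 2 + struct.unpack_from("!H", record, pos)[0]  # cipher suites
        pos += 1 + record[pos]       # compression methods
        (ext_total,) = struct.unpack_from("!H", record, pos)
        pos += 2
        end = min(pos + ext_total, len(record))
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack_from("!HH", record, pos)
            pos += 4
            if ext_type == 0:        # server_name extension
                # Layout: list length (2), name type (1), name length (2), name.
                name_type = record[pos + 2]
                (name_len,) = struct.unpack_from("!H", record, pos + 3)
                name = record[pos + 5 : pos + 5 + name_len]
                if name_type == 0 and len(name) == name_len:
                    return name.decode("ascii")
                return None
            pos += ext_len
    except (IndexError, struct.error, UnicodeDecodeError):
        return None
    return None
```

Once the name is known, the client side can be paused and the same value used as the server name for the upstream handshake.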
## Putting it all together
Let's put all of this together into the complete explicitly proxied HTTPS flow.
1. The client makes a connection to mitmproxy, and issues an HTTP CONNECT
   request.
2. Mitmproxy responds with a 200 Connection Established, as if it has set up
   the CONNECT pipe.
3. The client believes it's talking to the remote server, and initiates the SSL
   connection. It uses SNI to indicate the hostname it is connecting to.
4. Mitmproxy connects to the server, and establishes an SSL connection using
   the SNI hostname indicated by the client.
5. The server responds with the matching SSL certificate, which contains the CN
   and SAN values needed to generate the interception certificate.
6. Mitmproxy generates the interception cert, and continues the client SSL
   handshake paused in step 3.
7. The client sends the request over the established SSL connection.
8. Mitmproxy passes the request on to the server over the SSL connection
   initiated in step 4.
When a transparent proxy is used, the HTTP/S connection is redirected into a
proxy at the network layer, without any client configuration being required.
This makes transparent proxying ideal for those situations where you can't
change client behaviour - proxy-oblivious Android applications being a common
example.
To achieve this, we need to introduce two extra components. The first new
component is a router that transparently redirects the TCP connection to the
proxy. Once the client has initiated the connection, it makes a vanilla HTTP
request, which might look something like this:
GET /index.html HTTP/1.1
Note that this request differs from the explicit proxy variation, in that it
omits the scheme and hostname. How, then, do we know which upstream host to
forward the request to? The routing mechanism that has performed the
redirection keeps track of the original destination. Each different routing
mechanism has its own idiosyncratic way of exposing this data, so this
introduces the second component required for working transparent proxying: a
host module that knows how to retrieve the original destination address from
the router. Once we have this information, the process is fairly
straightforward.
1. The client makes a connection to the server.
2. The router redirects the connection to mitmproxy, which is typically
   listening on a local port of the same host. Mitmproxy then consults the
   routing mechanism to establish what the original destination was.
3. Now, we simply read the client's request...
4. ... and forward it upstream.
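As one example of such a host module, the sketch below shows how the original destination can be recovered on Linux when netfilter has redirected an IPv4 connection. It assumes the `SO_ORIGINAL_DST` socket option (value 80 in `linux/netfilter_ipv4.h`); other routing mechanisms expose this information differently, and the function names are illustrative rather than mitmproxy's own.

```python
import socket
import struct

SO_ORIGINAL_DST = 80  # from <linux/netfilter_ipv4.h>; Linux-specific


def parse_sockaddr_in(data: bytes):
    """Decode a struct sockaddr_in into (host, port)."""
    # Layout: family (2 bytes, skipped), port (2 bytes, network order),
    # IPv4 address (4 bytes), then 8 bytes of padding.
    port, raw_addr = struct.unpack_from("!2xH4s", data)
    return socket.inet_ntoa(raw_addr), port


def original_destination(client_sock: socket.socket):
    """Ask the kernel where a redirected IPv4 connection was headed.

    Only meaningful on Linux for sockets accepted after a netfilter redirect.
    """
    data = client_sock.getsockopt(socket.SOL_IP, SO_ORIGINAL_DST, 16)
    return parse_sockaddr_in(data)


# Decode a hand-built sockaddr_in for 10.1.1.1:80.
sample = struct.pack("!HH4s8x", 2, 80, socket.inet_aton("10.1.1.1"))
print(parse_sockaddr_in(sample))
```

Keeping the decoding separate from the `getsockopt` call makes the platform-specific part a single line, which is the kind of seam a per-platform host module needs.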
The process for transparently proxying an HTTPS request is a merger of the
methods we've outlined for transparently proxying HTTP, and explicitly proxying
HTTPS. We use the routing mechanism to establish the upstream server address,
and then proceed as for explicit HTTPS connections to establish the CN and
SANs, and cope with SNI.
1. The client makes a connection to the server.
2. The router redirects the connection to mitmproxy, which is typically
   listening on a local port of the same host. Mitmproxy then consults the
   routing mechanism to establish what the original destination was.
3. The client believes it's talking to the remote server, and initiates the SSL
   connection. It uses SNI to indicate the hostname it is connecting to.
4. Mitmproxy connects to the server, and establishes an SSL connection using
   the SNI hostname indicated by the client.
5. The server responds with the matching SSL certificate, which contains the CN
   and SAN values needed to generate the interception certificate.
6. Mitmproxy generates the interception cert, and continues the client SSL
   handshake paused in step 3.
7. The client sends the request over the established SSL connection.
8. Mitmproxy passes the request on to the server over the SSL connection
   initiated in step 4.