TODO:

- Clarify terminology: SSL vs TLS

Mitmproxy is an enormously flexible tool. Knowing exactly how the proxying
process works will help you deploy it more creatively, and let you understand
its fundamental assumptions and how to work around them. This document explains
mitmproxy's proxy mechanism by example, starting with the simplest explicit
proxy configuration, and working up to the most complicated interaction -
transparent proxying of SSL-protected traffic in the presence of SNI.

<div class="page-header">
    <h1>Explicit HTTP</h1>
</div>

Configuring the client to use mitmproxy as an explicit proxy is the simplest
and most reliable way to intercept traffic. The proxy protocol is codified in
the [HTTP RFC](http://www.ietf.org/rfc/rfc2068.txt), so the behaviour of both
the client and the server is well defined, and usually reliable. In the
simplest possible interaction with mitmproxy, a client connects directly to the
proxy, and makes a request that looks like this:

<pre>GET http://example.com/index.html HTTP/1.1</pre>

This is a proxy GET request - an extended form of the vanilla HTTP GET request
that includes a scheme and host specification, and it includes all the
information mitmproxy needs to proceed.

<img src="explicit.png"/>

<table class="table">
    <tbody>
        <tr>
            <td><b>1</b></td>
            <td>The client connects to the proxy and makes a request.</td>
        </tr>
        <tr>
            <td><b>2</b></td>
            <td>Mitmproxy connects to the upstream server and simply forwards
            the request on.</td>
        </tr>
    </tbody>
</table>

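To make the explicit case concrete, here is a minimal Python sketch of how a proxy could pull the upstream host, port and path out of a proxy GET request line. The function name `parse_proxy_request_line` is an assumption made for illustration; it is not mitmproxy's own code, and a real proxy would also have to read and forward the headers and body.

```python
from urllib.parse import urlsplit


def parse_proxy_request_line(line: str):
    """Split an absolute-form (proxy) request line into the parts needed to forward it."""
    method, target, version = line.split(" ", 2)
    url = urlsplit(target)
    if not url.scheme or not url.hostname:
        # A vanilla request such as "GET /index.html HTTP/1.1" carries no host.
        raise ValueError("not an absolute-form (proxy) request target")
    # Fall back to the scheme's default port when none is given.
    port = url.port or (443 if url.scheme == "https" else 80)
    # The path (plus query) is what gets sent on to the upstream server.
    path = url.path or "/"
    if url.query:
        path += "?" + url.query
    return method, url.hostname, port, path, version


print(parse_proxy_request_line("GET http://example.com/index.html HTTP/1.1"))
# ('GET', 'example.com', 80, '/index.html', 'HTTP/1.1')
```

Everything needed for step 2 in the table above is present in that one line, which is why the explicit HTTP case is the simplest one.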
<div class="page-header">
    <h1>Explicit HTTPS</h1>
</div>

The process for an explicitly proxied HTTPS connection is quite different. The
client connects to the proxy and makes a request that looks like this:

<pre>CONNECT example.com:443 HTTP/1.1</pre>

A conventional proxy can neither view nor manipulate an SSL-encrypted data
stream, so a CONNECT request simply asks the proxy to open a pipe between the
client and server. The proxy here is just a facilitator - it blindly forwards
data in both directions without knowing anything about the contents. The
negotiation of the SSL connection happens over this pipe, and the subsequent
flow of requests and responses is completely opaque to the proxy.

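As a rough illustration of what a conventional proxy does with CONNECT, the Python sketch below parses the request target and copies bytes without looking at them. The names `parse_connect_target` and `pump` are hypothetical, and the sketch leaves out error handling and the threading needed to relay both directions at once.

```python
import socket


def parse_connect_target(line: str):
    """Return (host, port) from a request line like 'CONNECT example.com:443 HTTP/1.1'."""
    method, authority, _version = line.split(" ", 2)
    if method != "CONNECT":
        raise ValueError("not a CONNECT request")
    host, sep, port = authority.rpartition(":")
    if not sep or not port.isdigit():
        raise ValueError("CONNECT target must be host:port")
    # Strip the brackets used around IPv6 literals, e.g. [::1]:443.
    return host.strip("[]"), int(port)


def pump(src: socket.socket, dst: socket.socket, bufsize: int = 65536) -> None:
    """Copy bytes from src to dst until src reaches EOF, never inspecting them."""
    while True:
        chunk = src.recv(bufsize)
        if not chunk:
            break
        dst.sendall(chunk)


print(parse_connect_target("CONNECT example.com:443 HTTP/1.1"))
# ('example.com', 443)
```

A blind tunnel would run one `pump` per direction after answering the client with a 200 response. Nothing in this code ever sees plaintext, which is exactly the limitation that interception has to work around.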
## The MITM in mitmproxy

This is where mitmproxy's fundamental trick comes into play. The MITM in its
name stands for Man-In-The-Middle - a reference to the process we use to
intercept and interfere with these theoretically opaque data streams. The basic
idea is to pretend to be the server to the client, and pretend to be the client
to the server. The tricky part is that the Certificate Authority system is
designed to prevent exactly this attack, by allowing a trusted third party to
cryptographically sign a server's SSL certificates to verify that the certs are
legit. If this signature is from a non-trusted party, a secure client will
simply drop the connection and refuse to proceed. Despite the many shortcomings
of the CA system as it exists today, this is usually fatal to attempts to MITM
an SSL connection for analysis.

Our answer to this conundrum is to become a trusted Certificate Authority
ourselves. Mitmproxy includes a full CA implementation that generates
interception certificates on the fly. To get the client to trust these
certificates, we register mitmproxy as a CA with the device manually.

## Complication 1: What's the remote hostname?

To proceed with this plan, we need to know the domain name to use in the
interception certificate - the client will verify that the certificate is for
the domain it's connecting to, and abort if this is not the case. At first
blush, it seems that the CONNECT request above gives us all we need - in this
example, the CONNECT target and the name the client expects in the certificate
are both "example.com". But what if the client had initiated the connection as
follows:

<pre>CONNECT 10.1.1.1:443 HTTP/1.1</pre>

Using the IP address is perfectly legitimate because it gives us enough
information to initiate the pipe, even though it doesn't reveal the remote
hostname.

Mitmproxy has a cunning mechanism that smooths this over - upstream certificate
sniffing. As soon as we see the CONNECT request, we pause the client part of
the conversation, and initiate a simultaneous connection to the server. We
complete the SSL handshake with the server, and inspect the certificates it
used. Now, we use the Common Name in the upstream SSL certificates to generate
the dummy certificate for the client. Voila, we have the correct hostname to
present to the client, even if it was never specified.

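The idea of upstream certificate sniffing can be sketched with Python's standard `ssl` module. This is an illustration under stated assumptions, not mitmproxy's implementation. The helper names `common_name` and `fetch_upstream_cert` are hypothetical. The sketch also leaves certificate verification switched on, because `getpeercert()` only returns a populated dictionary for a validated certificate, so it will fail for an address the upstream certificate does not cover.

```python
import socket
import ssl


def common_name(cert: dict):
    """Return the first commonName found in a getpeercert()-style dict, or None."""
    # 'subject' is a tuple of RDNs, each itself a tuple of (key, value) pairs.
    for rdn in cert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                return value
    return None


def fetch_upstream_cert(host: str, port: int = 443, timeout: float = 5.0) -> dict:
    """Complete a TLS handshake with the upstream server and return its certificate."""
    ctx = ssl.create_default_context()  # verification stays on in this sketch
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            return tls.getpeercert()


# Example with a hand-built dictionary in the format getpeercert() returns.
sample = {"subject": ((("countryName", "US"),), (("commonName", "example.com"),))}
print(common_name(sample))
# example.com
```

In a proxy, the name returned by `common_name` would then be placed in the dummy certificate presented to the client.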
## Complication 2: Subject Alternative Name

Enter the next complication. Sometimes, the certificate Common Name is not, in
fact, the hostname that the client is connecting to. This is because of the
optional Subject Alternative Name field in the SSL certificate that allows an
arbitrary number of alternate domains to be specified. If the expected domain
matches any of these, the client will proceed, even though the domain doesn't
match the certificate Common Name. The answer here is simple: when we extract
the CN from the upstream cert, we also extract the SANs, and add them to the
generated dummy certificate.

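A small Python sketch of that extraction step follows, again using the dictionary format returned by the standard library's `ssl.SSLSocket.getpeercert()`. The function name `interception_names` is an assumption for illustration, and restricting the SANs to DNS and IP address entries is a simplification.

```python
def interception_names(cert: dict):
    """Collect the CN plus DNS and IP SANs from a getpeercert()-style dict."""
    names = []
    for rdn in cert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                names.append(value)
    # 'subjectAltName' is a tuple of (type, value) pairs.
    for kind, value in cert.get("subjectAltName", ()):
        if kind in ("DNS", "IP Address"):
            names.append(value)
    # Drop duplicates while keeping the original order.
    return list(dict.fromkeys(names))


sample = {
    "subject": ((("commonName", "example.com"),),),
    "subjectAltName": (("DNS", "example.com"), ("DNS", "www.example.com")),
}
print(interception_names(sample))
# ['example.com', 'www.example.com']
```

Every name in the resulting list would be copied into the dummy certificate, so the client accepts it whichever of the names it expected.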
## Complication 3: Server Name Indication

One of the big limitations of conventional SSL is that each certificate
requires its own IP address. This means that you couldn't do virtual hosting
where multiple domains with independent certificates share the same IP address.
In a world with a rapidly shrinking IPv4 address pool this is a problem, and we
have a solution in the form of the Server Name Indication extension to the SSL
and TLS protocols. This lets the client specify the remote server name at the
start of the SSL handshake, which then lets the server select the right
certificate to complete the process.

SNI breaks our upstream certificate sniffing process, because when we connect
without using SNI, we get served a default certificate that may have nothing to
do with the certificate expected by the client. The solution is another tricky
complication to the client connection process. After the client connects, we
allow the SSL handshake to continue until just _after_ the SNI value has been
passed to us. Now we can pause the conversation, and initiate an upstream
connection using the correct SNI value, which then serves us the correct
upstream certificate, from which we can extract the expected CN and SANs.

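The SNI value travels unencrypted in the first handshake message the client sends, the ClientHello, so one way to learn it is to peek at those bytes. The Python sketch below does this by hand for a ClientHello that fits in a single TLS record. It is an assumption-laden illustration rather than mitmproxy's approach, and both `extract_sni` and `sample_client_hello` are hypothetical helper names.

```python
import ssl
import struct


def extract_sni(record: bytes):
    """Return the SNI hostname from a raw ClientHello record, or None if absent."""
    try:
        # Record header: type 0x16 (handshake), version (2), length (2).
        # Handshake header: type 0x01 (ClientHello), length (3).
        if record[0] != 0x16 or record[5] != 0x01:
            return None
        pos = 9 + 2 + 32                      # skip client version and random
        pos += 1 + record[pos]                # session id
        (suites_len,) = struct.unpack_from("!H", record, pos)
        pos += 2 + suites_len                 # cipher suites
        pos += 1 + record[pos]                # compression methods
        (ext_total,) = struct.unpack_from("!H", record, pos)
        pos += 2
        end = min(pos + ext_total, len(record))
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack_from("!HH", record, pos)
            pos += 4
            if ext_type == 0:                 # server_name extension
                # list length (2), name type (1), name length (2), name
                (name_len,) = struct.unpack_from("!H", record, pos + 3)
                if record[pos + 2] == 0:      # name type 0 = host_name
                    return record[pos + 5:pos + 5 + name_len].decode("ascii")
            pos += ext_len
    except (IndexError, struct.error, UnicodeDecodeError):
        pass
    return None


def sample_client_hello(hostname: str) -> bytes:
    """Produce a real ClientHello in memory, without touching the network."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    incoming, outgoing = ssl.MemoryBIO(), ssl.MemoryBIO()
    tls = ctx.wrap_bio(incoming, outgoing, server_hostname=hostname)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        pass  # expected: the client is now waiting for the server's reply
    return outgoing.read()
```

Calling `extract_sni(sample_client_hello("example.com"))` should give back the hostname, which is the value a proxy would then use for its own upstream handshake.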
## Putting it all together

Let's put all of this together into the complete explicitly proxied HTTPS flow.

<img src="explicit_https.png"/>

<table class="table">
    <tbody>
        <tr>
            <td><b>1</b></td>
            <td>The client makes a connection to mitmproxy, and issues an HTTP
            CONNECT request.</td>
        </tr>
        <tr>
            <td><b>2</b></td>
            <td>Mitmproxy responds with a 200 Connection Established, as if it
            has set up the CONNECT pipe.</td>
        </tr>
        <tr>
            <td><b>3</b></td>
            <td>The client believes it's talking to the remote server, and
            initiates the SSL connection. It uses SNI to indicate the hostname
            it is connecting to.</td>
        </tr>
        <tr>
            <td><b>4</b></td>
            <td>Mitmproxy connects to the server, and establishes an SSL
            connection using the SNI hostname indicated by the client.</td>
        </tr>
        <tr>
            <td><b>5</b></td>
            <td>The server responds with the matching SSL certificate, which
            contains the CN and SAN values needed to generate the interception
            certificate.</td>
        </tr>
        <tr>
            <td><b>6</b></td>
            <td>Mitmproxy generates the interception cert, and continues the
            client SSL handshake paused in step 3.</td>
        </tr>
        <tr>
            <td><b>7</b></td>
            <td>The client sends the request over the established SSL
            connection.</td>
        </tr>
        <tr>
            <td><b>8</b></td>
            <td>Mitmproxy passes the request on to the server over the SSL
            connection initiated in step 4.</td>
        </tr>
    </tbody>
</table>

<div class="page-header">
    <h1>Transparent HTTP</h1>
</div>

When a transparent proxy is used, the HTTP/S connection is redirected into a
proxy at the network layer, without any client configuration being required.
This makes transparent proxying ideal for those situations where you can't
change client behaviour - proxy-oblivious Android applications being a common
example.

To achieve this, we need to introduce two extra components. The first new
component is a router that transparently redirects the TCP connection to the
proxy. Once the client has initiated the connection, it makes a vanilla HTTP
request, which might look something like this:

<pre>GET /index.html HTTP/1.1</pre>

Note that this request differs from the explicit proxy variation, in that it
omits the scheme and hostname. How, then, do we know which upstream host to
forward the request to? The routing mechanism that has performed the
redirection keeps track of the original destination. Each different routing
mechanism has its own idiosyncratic way of exposing this data, so this
introduces the second component required for working transparent proxying: a
host module that knows how to retrieve the original destination address from
the router. Once we have this information, the process is fairly
straightforward.

<img src="transparent.png"/>

<table class="table">
    <tbody>
        <tr>
            <td><b>1</b></td>
            <td>The client makes a connection to the server.</td>
        </tr>
        <tr>
            <td><b>2</b></td>
            <td>The router redirects the connection to mitmproxy, which is
            typically listening on a local port of the same host. Mitmproxy
            then consults the routing mechanism to establish what the original
            destination was.</td>
        </tr>
        <tr>
            <td><b>3</b></td>
            <td>Now, we simply read the client's request...</td>
        </tr>
        <tr>
            <td><b>4</b></td>
            <td>... and forward it upstream.</td>
        </tr>
    </tbody>
</table>

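As one concrete example of such a host module, the Python sketch below shows how the original destination can be recovered on Linux when the redirection is done by netfilter (for instance an iptables `REDIRECT` rule). That choice of routing mechanism is an assumption, since the text above does not name one. The function names are hypothetical, and `SO_ORIGINAL_DST` is defined by hand because Python's `socket` module does not export it.

```python
import socket
import struct

# Value from <linux/netfilter_ipv4.h>; an assumption of this Linux-only sketch.
SO_ORIGINAL_DST = 80


def parse_sockaddr_in(raw: bytes):
    """Decode a C 'struct sockaddr_in' into an (ip, port) tuple."""
    # Layout: family (2 bytes, skipped), port (2, network order), IPv4 address (4).
    port, packed_ip = struct.unpack_from("!2xH4s", raw)
    return socket.inet_ntoa(packed_ip), port


def original_destination(client_sock: socket.socket):
    """Ask netfilter where a redirected IPv4 connection was originally headed."""
    raw = client_sock.getsockopt(socket.SOL_IP, SO_ORIGINAL_DST, 16)
    return parse_sockaddr_in(raw)
```

A transparent proxy would call `original_destination` on each accepted client socket (step 2 in the table above) and then connect to the returned address to forward the request. Other platforms expose the same information through entirely different interfaces.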
<div class="page-header">
    <h1>Transparent HTTPS</h1>
</div>

The process for transparently proxying an HTTPS request is a merger of the
methods we've outlined for transparently proxying HTTP, and explicitly proxying
HTTPS. We use the routing mechanism to establish the upstream server address,
and then proceed as for explicit HTTPS connections to establish the CN and SANs,
and cope with SNI.

<img src="transparent_https.png"/>

<table class="table">
    <tbody>
        <tr>
            <td><b>1</b></td>
            <td>The client makes a connection to the server.</td>
        </tr>
        <tr>
            <td><b>2</b></td>
            <td>The router redirects the connection to mitmproxy, which is
            typically listening on a local port of the same host. Mitmproxy
            then consults the routing mechanism to establish what the original
            destination was.</td>
        </tr>
        <tr>
            <td><b>3</b></td>
            <td>The client believes it's talking to the remote server, and
            initiates the SSL connection. It uses SNI to indicate the hostname
            it is connecting to.</td>
        </tr>
        <tr>
            <td><b>4</b></td>
            <td>Mitmproxy connects to the server, and establishes an SSL
            connection using the SNI hostname indicated by the client.</td>
        </tr>
        <tr>
            <td><b>5</b></td>
            <td>The server responds with the matching SSL certificate, which
            contains the CN and SAN values needed to generate the interception
            certificate.</td>
        </tr>
        <tr>
            <td><b>6</b></td>
            <td>Mitmproxy generates the interception cert, and continues the
            client SSL handshake paused in step 3.</td>
        </tr>
        <tr>
            <td><b>7</b></td>
            <td>The client sends the request over the established SSL
            connection.</td>
        </tr>
        <tr>
            <td><b>8</b></td>
            <td>Mitmproxy passes the request on to the server over the SSL
            connection initiated in step 4.</td>
        </tr>
    </tbody>
</table>