HTTP persistent connection

From Wikipedia

HTTP persistent connection, also called HTTP keep-alive or HTTP connection reuse, is the idea of using the same TCP connection to send and receive multiple HTTP requests/responses, as opposed to opening a new connection for every single request/response pair.

Schema of multiple vs. persistent connection.

Operation

Under HTTP 1.0, there is no official specification for how keep-alive operates. It was, in essence, tacked on to an existing protocol. If the browser supports keep-alive, it adds a header to the request:

Connection: Keep-Alive

Then, when the server receives this request and generates a response, it also adds a header to the response:

Connection: Keep-Alive

Following this, the connection is NOT dropped, but is instead kept open. When the client sends another request, it uses the same connection. This will continue until either the client or the server decides that the conversation is over, and one of them drops the connection.
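
As a rough illustration of the negotiation just described, the Python sketch below checks whether both sides of an HTTP 1.0 exchange opted in. The function names and the use of plain dictionaries for headers are assumptions made for this example; they do not come from any specification or library.

```python
def has_connection_token(headers, token):
    """Return True if a Connection header lists `token` (case-insensitive)."""
    for name, value in headers.items():
        if name.lower() == "connection":
            tokens = [t.strip().lower() for t in value.split(",")]
            if token.lower() in tokens:
                return True
    return False


def http10_connection_stays_open(request_headers, response_headers):
    """Under HTTP 1.0 the connection is kept open only when both the
    request and the response carry `Connection: Keep-Alive`."""
    return (has_connection_token(request_headers, "keep-alive")
            and has_connection_token(response_headers, "keep-alive"))
```

If either side omits the header, the function returns False, which corresponds to the connection being dropped after the response.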

In HTTP 1.1 all connections are considered persistent unless declared otherwise.[1] HTTP persistent connections do not use separate keepalive messages; they simply allow multiple requests to use a single connection. However, the default connection timeout of Apache 2.0 httpd is as little as 15 seconds, and for Apache 2.2 it is only 5 seconds. The advantage of a short timeout is the ability to deliver multiple components of a web page quickly while not tying up multiple server processes or threads for too long.
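
To see connection reuse in practice, the sketch below starts a throwaway local server with Python's standard http.server module and sends two requests through a single http.client.HTTPConnection. The handler class, the request paths, and the trick of using the client's source port to identify a TCP connection are all choices made for this illustration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets http.server keep connections open
    seen_ports = []                # client source port seen for each request

    def do_GET(self):
        Handler.seen_ports.append(self.client_address[1])
        body = b"ok"
        self.send_response(200)
        # A Content-Length is needed so the client knows where the body ends
        # without the server closing the connection.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


def fetch_twice_on_one_connection():
    Handler.seen_ports = []
    server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
    statuses = []
    for path in ("/a", "/b"):
        conn.request("GET", path)
        response = conn.getresponse()
        response.read()  # drain the body before sending the next request
        statuses.append(response.status)
    conn.close()
    server.shutdown()
    server.server_close()
    return statuses, list(Handler.seen_ports)
```

Both requests should arrive from the same client port, which indicates that the second request travelled over the TCP connection opened for the first.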

Advantages

According to RFC 2616 (page 46), a single-user client should not maintain more than 2 connections with any server or proxy. A proxy should use up to 2×N connections to another server or proxy, where N is the number of simultaneously active users. These guidelines are intended to improve HTTP response times and avoid congestion. If HTTP pipelining is correctly implemented, there is no performance benefit to be gained from additional connections (while additional connections may cause issues with congestion).[6]
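
The guideline above amounts to a small piece of arithmetic, sketched here in Python. The function name and parameters are hypothetical and chosen only for illustration.

```python
def recommended_max_connections(is_proxy, active_users=1):
    """Upper bound on connections to one server or proxy, following the
    RFC 2616 guideline quoted above: 2 for a single-user client,
    2 * N for a proxy with N simultaneously active users."""
    if active_users < 1:
        raise ValueError("active_users must be at least 1")
    return 2 * active_users if is_proxy else 2
```

For example, a proxy serving 10 simultaneously active users would be limited to 20 connections to any one upstream server under this guideline.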

Disadvantages

It has been suggested that, with modern widespread high-bandwidth connections, Keep-Alive might not be as useful as it once was. The web server keeps a connection open for a certain number of seconds (15 by default in Apache 2.0), and holding those connections open may cost more in performance than connection reuse saves.

For services where single documents are regularly requested (for example, image hosting websites), Keep-Alive can be massively detrimental to performance, because unnecessary connections are kept open for many seconds after the document has been retrieved.

Keep-Alive can cause unexpected behavior if a browser is configured to use a proxy. If a browser establishes a persistent connection to a proxy, it can then send HTTP requests for different hosts over the same connection. If a rudimentary proxy then establishes a persistent connection to a remote server, it may accidentally send it HTTP requests intended for another server.[8]

Use in web browsers

Netscape Navigator (since at least 4.05) and Internet Explorer (since at least 4.01) support persistent connections to Web servers and proxies.

Netscape does not close persistent connections using a timeout. All idle persistent connections are queued. When new persistent connections must be opened to connect to a different server, the browser kills idle connections using some form of LRU algorithm.[9]
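
The following Python sketch shows one way such a queue of idle connections with least-recently-used eviction could work. It is not Netscape's actual implementation, which the text does not describe in detail; the class name, the max_idle limit, and the one-connection-per-server simplification are assumptions made for this example.

```python
from collections import OrderedDict


class IdleConnectionQueue:
    """Holds idle connections with no timeout. When a connection to a new
    server is needed and the queue is full, the least recently used idle
    connection is closed to make room."""

    def __init__(self, max_idle):
        self.max_idle = max_idle
        self._idle = OrderedDict()  # server -> connection, oldest first
        self.closed = []            # connections evicted so far

    def park(self, server, connection):
        """Queue a connection that has gone idle (most recently used last)."""
        self._idle.pop(server, None)
        self._idle[server] = connection

    def get(self, server, open_connection):
        """Reuse an idle connection to `server`, or open a new one,
        evicting least recently used idle connections if needed."""
        if server in self._idle:
            return self._idle.pop(server)
        while self._idle and len(self._idle) >= self.max_idle:
            _, victim = self._idle.popitem(last=False)  # oldest entry
            close = getattr(victim, "close", None)
            if callable(close):
                close()
            self.closed.append(victim)
        return open_connection(server)
```

Here open_connection is any callable that takes a server name and returns a new connection object; it is supplied by the caller so the sketch stays independent of a particular networking library.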

Internet Explorer supports persistent connections. By default, versions 6 and 7 use 2 persistent connections while version 8 uses 6.[10] Persistent connections time out after 60 seconds of inactivity, a value that can be changed via the Windows Registry.[11]

Mozilla Firefox supports persistent connections. The number of simultaneous connections can be customized (per-server, per-proxy, total). Persistent connections time out after 115 seconds (1.92 minutes) of inactivity, a value that can be changed via the configuration.[12]

Opera has supported persistent connections since version 4.0.[13] The number of simultaneous connections can be customized (per-server, total).

 

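Analysis of KeepAlive
Many servers today let you configure whether KeepAlive is enabled, the KeepAlive timeout, and the number of KeepAlive connections the server supports (this number is usually not very large, since a large value would put heavy pressure on the server).
Consider three scenarios:
1. A user views a web page that, in addition to the page itself, references several JavaScript files, several CSS files, and several image files, all hosted on the same HTTP server.
2. A user views a web page that, in addition to the page itself, references one JavaScript file and one image file.
3. A user views a dynamic web page whose content is generated on the fly by a program and which references no other content.
Of these three scenarios, 1 is best served by enabling KeepAlive, 2 can go either way, and 3 is best served by disabling KeepAlive.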
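With KeepAlive enabled, the TCP connection is held open for a period after each user has finished all of their requests. Until the connection is closed, one server process remains tied to that user and cannot serve anyone else. Suppose the KeepAlive timeout is 10 seconds and the server receives 50 independent user visits per second: the total number of Apache processes is then 10 * 50 = 500. If each process uses 4 MB of memory, that comes to about 2 GB in total. This configuration is therefore quite memory-hungry, but the benefit is that the system performs only 50 TCP handshake and close operations per second.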
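With KeepAlive disabled, again at 50 user visits per second and with each user making 3 consecutive requests, the total number of Apache processes is 50 * 3 = 150. At 4 MB per process, total memory consumption is 600 MB. This configuration saves a great deal of memory, but the system now handles 150 TCP handshake and close operations per second, which costs some additional CPU.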
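
The two estimates can be reproduced with a few lines of Python. This is only a sketch of the simplified model used in this analysis, not a capacity-planning tool, and the function and key names are made up for the example.

```python
def estimate_with_keepalive(users_per_second, keepalive_timeout_s, mb_per_process):
    """Each user holds one process for the whole KeepAlive timeout."""
    processes = users_per_second * keepalive_timeout_s
    return {
        "processes": processes,
        "memory_mb": processes * mb_per_process,
        "tcp_setups_per_second": users_per_second,
    }


def estimate_without_keepalive(users_per_second, requests_per_user, mb_per_process):
    """Each request opens its own connection and occupies one process."""
    processes = users_per_second * requests_per_user
    return {
        "processes": processes,
        "memory_mb": processes * mb_per_process,
        "tcp_setups_per_second": processes,
    }
```

Called with the figures from the text (50 users per second, a 10-second timeout, 3 requests per user, 4 MB per process), the first function gives 500 processes and 2000 MB, and the second gives 150 processes and 600 MB with three times as many TCP setups.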