HTTP

Updated: 2018-12-15

HTTP/3 vs HTTP/2

TL;DR: from TCP to UDP.

  • HTTP/3 is on top of QUIC / UDP

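    • no multi-round handshake needed: QUIC combines the transport and TLS handshakes
    • UDP itself does not include error correction; QUIC handles loss detection and retransmission on top of it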
    • does not exist in an insecure or unencrypted version
    • based on Google's work on QUIC
  • HTTP/2 is on top of TCP

    • multiple handshakes: TCP first, then TLS
    • TCP is like a "data pipe", or stream: it does not understand the data it transmits, so security is added on top by TLS/SSL
    • based on Google's SPDY
  • backwards compatible with HTTP/1.1

HTTP/2 vs WebSocket

https://blog.sessionstack.com/how-javascript-works-deep-dive-into-websockets-and-http-2-with-sse-how-to-pick-the-right-path-584e6b8e3bf7

  • WebSocket: Say you want to build a Massively Multiplayer Online Game that exchanges a huge number of messages from both ends of the connection. In such a case, WebSockets will perform much, much better. In general, use WebSockets whenever you need a truly low-latency, near-real-time connection between the client and the server.
  • HTTP/2: If your use case requires displaying real-time market news, market data, chat applications, etc., relying on HTTP/2 + SSE will provide you with an efficient bidirectional communication channel while reaping the benefits from staying in the HTTP world.
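For the SSE half of that comparison, it helps to see how simple the wire format is. A minimal sketch of serializing one event for a text/event-stream response; the helper name format_sse and the "quote" event are made up for illustration:

```python
def format_sse(data, event=None, event_id=None):
    """Serialize one Server-Sent Event in the text/event-stream format."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # Every line of the payload gets its own "data:" field.
    for chunk in data.split("\n"):
        lines.append(f"data: {chunk}")
    # A blank line terminates the event.
    return "\n".join(lines) + "\n\n"


# Hypothetical market-data event pushed from server to client.
print(format_sse("AAPL 172.50", event="quote", event_id=1), end="")
```

The stream only flows from server to client; anything the client wants to send still goes out as an ordinary HTTP request.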

HTTP/2 vs HTTP 1.1

http://http2.github.io/

HTTP/1.1 in practice allows only one outstanding request per TCP connection. In the past, browsers have used multiple TCP connections to issue parallel requests.

"One major goal is to allow the use of a single connection from browsers to a Web site."

  • Header Compression: HTTP Header size will be greatly reduced
  • Single Connection. Only one connection to the server is used to load a website, and that connection remains open as long as the website is open. This reduces the number of round trips needed to set up multiple TCP connections.
  • Multiplexing. Multiple requests are allowed at the same time, on the same connection. Previously, with HTTP/1.1, each transfer would have to wait for other transfers to complete. HTTP/2 is especially useful when dealing with TLS connections. The TLS handshake can be quite long but thanks to reduced latency and multiplexing, other requests can do their work without being blocked.
  • HTTP/2 Server Push: Resources can be pushed to the client before they are requested
  • Prioritization: Resources can have dependency levels allowing the server to prioritize which requests to fulfill first
  • Binary: HTTP/2 is a binary protocol making it a lot more efficient when transferring data
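To make "binary" and "multiplexing" a bit more concrete: every HTTP/2 frame starts with a fixed 9-byte header (24-bit payload length, 8-bit type, 8-bit flags, one reserved bit plus a 31-bit stream id), and the stream id is what lets frames from different requests share one connection. A rough sketch of packing and parsing that header; the function names are my own, not from any library:

```python
import struct

FRAME_HEADER_LEN = 9  # fixed-size header in front of every HTTP/2 frame


def pack_frame_header(length, frame_type, flags, stream_id):
    """Build the header: 24-bit length, 8-bit type, 8-bit flags, 31-bit stream id."""
    if not 0 <= length < 2**24:
        raise ValueError("length must fit in 24 bits")
    if not 0 <= stream_id < 2**31:
        raise ValueError("stream_id must fit in 31 bits")
    # ">I" packs 4 bytes; dropping the first byte leaves the 24-bit length.
    return struct.pack(">I", length)[1:] + struct.pack(">BBI", frame_type, flags, stream_id)


def parse_frame_header(header):
    """Split a 9-byte header back into (length, type, flags, stream_id)."""
    length = int.from_bytes(header[0:3], "big")
    frame_type, flags = header[3], header[4]
    stream_id = int.from_bytes(header[5:9], "big") & 0x7FFFFFFF  # mask the reserved bit
    return length, frame_type, flags, stream_id


# Two frames for two different requests (streams 1 and 3) on the same connection.
headers_frame = pack_frame_header(length=20, frame_type=0x1, flags=0x4, stream_id=1)
data_frame = pack_frame_header(length=5, frame_type=0x0, flags=0x1, stream_id=3)
print(parse_frame_header(headers_frame))
print(parse_frame_header(data_frame))
```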

Deprecated Techniques

With HTTP/1.1, many techniques were used to speed up websites that are no longer necessary with HTTP/2.

  • Domain Sharding. Loading files from multiple subdomains so that more connections may be established. The increase in parallel file transfers adds to server connection overhead.
  • Image Sprites. Combining image files to reduce requests. The file must be loaded before any image from the file can be shown, and the large image file ties up RAM.
  • Combining Files. CSS and JavaScript files are often combined to reduce the number of requests. This makes the user wait for files before any of it can run and consumes additional RAM.
  • Inlining. CSS and JavaScript code, or even images, are placed directly into the HTML. This reduces connections but uses additional RAM and delays page rendering until the HTML has finished downloading.
  • Cookieless Domains. Static resources like images, CSS and JavaScript files don’t require cookies, so many developers started sending these from a cookieless domain to save bandwidth and time. With HTTP/2, the headers (including cookies) are compressed, so the sizes of the requests are very small in comparison with HTTP/1.1.
  • Batching. When dealing with REST APIs, you will no longer have to batch requests.
  • Many of the techniques above placed additional strain on servers because of the extra connections opened by browsers. These connection-related techniques are no longer necessary with HTTP/2. The result is lower bandwidth requirements, less network overhead and lower server memory usage.
  • On mobile phones, multiple TCP connections could cause problems on the mobile network, leading to dropped packets and resubmitted requests. The additional requests just added to the server load.
  • HTTP/2 itself brings benefits for a server, as well. Fewer TCP connections are necessary, as stated above. HTTP/2 is easier to parse, more compact and less error-prone.

Data Push Illustration

(Image from this blogpost)

POST vs. PUT vs. PATCH

  • POST: not idempotent; sending the same POST several times adds several records
  • PUT: idempotent; sending the same PUT several times leaves the same result. PUT replaces the full representation
  • PATCH: modifies part of the data; PATCH is neither safe nor guaranteed to be idempotent
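A toy in-memory store makes the difference easy to see. This is only a sketch: the Store class and its method names are invented for illustration and do not model a real HTTP server.

```python
import itertools


class Store:
    def __init__(self):
        self.records = {}
        self._ids = itertools.count(1)

    def post(self, data):
        """Create a new record on every call, so repeating it adds more records."""
        new_id = next(self._ids)
        self.records[new_id] = dict(data)
        return new_id

    def put(self, record_id, data):
        """Replace the full representation, so repeating it changes nothing."""
        self.records[record_id] = dict(data)

    def patch(self, record_id, changes):
        """Modify only the given fields of an existing record."""
        self.records[record_id].update(changes)


store = Store()
store.post({"name": "a"})
store.post({"name": "a"})     # same request twice: two records
store.put(1, {"name": "b"})
store.put(1, {"name": "b"})   # same request twice: same final state
store.patch(2, {"tag": "x"})  # only the "tag" field is touched
print(store.records)
```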

HTTP Request

A GET Example

GET /foo/bar HTTP/1.1
Host: example.com
Accept: application/xml

A POST Example:

POST /path/script.cgi HTTP/1.0
From: [email protected]
User-Agent: HTTPTool/1.0
Content-Type: application/x-www-form-urlencoded
Content-Length: 19

key1=val1&key2=val2
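Content-Length has to equal the byte length of the body, so it is safer to compute it than to type it by hand. A small sketch that builds a form POST like the one above; build_form_post is a made-up helper and it leaves out the From and User-Agent headers:

```python
from urllib.parse import urlencode


def build_form_post(path, fields):
    """Return the raw bytes of an HTTP/1.0 form POST with a computed Content-Length."""
    body = urlencode(fields).encode("ascii")
    head = (
        f"POST {path} HTTP/1.0\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode("ascii")
    return head + body


request = build_form_post("/path/script.cgi", {"key1": "val1", "key2": "val2"})
print(request.decode("ascii"))
```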

Accept vs Content-Type

  • Accept: specifies the media types the client is willing to receive in the response
  • Content-Type: specifies the media type of the entity body being sent (in a request, from the client to the server)

e.g.

"please give me a json, not xml, if possible"

Accept: application/json

"the entity attached is a json, please use the correct parser when you(server) receive it"

Content-Type: application/json
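On the server side the two headers drive two different decisions: Content-Type picks the parser for the incoming body, Accept picks the format of the reply. A simplified sketch with made-up function names; real content negotiation also handles q-values and wildcards, which this ignores:

```python
import json
from urllib.parse import parse_qs


def parse_body(content_type, body):
    """Pick a parser based on the request's Content-Type."""
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/json":
        return json.loads(body)
    if media_type == "application/x-www-form-urlencoded":
        return {key: values[0] for key, values in parse_qs(body).items()}
    raise ValueError(f"unsupported media type: {media_type}")


def render(accept, data):
    """Pick a response format based on the request's Accept header."""
    if "application/xml" in accept and "application/json" not in accept:
        items = "".join(f"<{key}>{value}</{key}>" for key, value in data.items())
        return "application/xml", f"<result>{items}</result>"
    # Default to JSON when it is accepted or nothing more specific was asked for.
    return "application/json", json.dumps(data)


data = parse_body("application/json", '{"id": 7}')
print(render("application/xml", data))
print(render("application/json", data))
```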

Return Code

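Use 400 if the request parameters are wrong. Use 412 if a precondition given in one of the If-* request headers, such as If-Match or If-Unmodified-Since, evaluates to false. (A failed If-Modified-Since, or a matching If-None-Match on a GET, yields 304 Not Modified instead.)

Why? That's just what the RFC says. See for example this extract from the If-Match specification: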

If none of the entity tags match, or if "*" is given and no current entity exists, the server MUST NOT perform the requested method, and MUST return a 412 (Precondition Failed) response. This behavior is most useful when the client wants to prevent an updating method, such as PUT, from modifying a resource that has changed since the client last retrieved it.
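The If-Match rule quoted above can be sketched as a small check that runs before an update. handle_put is a hypothetical helper; it compares entity tags as plain strings and splits the header on commas, which is a simplification of the real header grammar.

```python
def handle_put(if_match, current_etag):
    """Return 412 if the If-Match precondition fails, else 204 for a successful update.

    if_match is the raw header value (or None if the header is absent);
    current_etag is the resource's entity tag, or None if no resource exists.
    """
    if if_match is None:
        return 204  # no precondition given
    if if_match.strip() == "*":
        # "*" only requires that some current entity exists.
        return 204 if current_etag is not None else 412
    candidates = [tag.strip() for tag in if_match.split(",")]
    return 204 if current_etag in candidates else 412


print(handle_put('"v1"', '"v1"'))  # the client saw the current version
print(handle_put('"v1"', '"v2"'))  # the resource changed since the client fetched it
```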

"basic authentication" vs "form-based authentication"

Basic authentication, or “basic auth”, is formally defined in the HTTP standards: it first appeared in RFC 1945 and is currently specified in RFC 7617. When a client (your browser) requests a protected resource, the server answers with a “WWW-Authenticate: Basic” header. The client then sends your login credentials to the server in an Authorization header, using a mild obfuscation technique called base64 encoding. When HTTPS is used, these credentials are protected in transit, so it’s not considered insecure, which is why basic auth gained widespread use over the years. The biggest problem with basic auth has to do with logging off: most browsers cache the credentials and have dealt inconsistently with the need to clear them, so another (different) user may be able to get back in simply by refreshing the browser.

Form-based authentication is not formalized by any RFC. In essence, it is a programmatic method of authentication that developers create to mitigate the downside of basic auth. Most implementations of form-based authentication share the following characteristics:

  1. They don’t use the formal HTTP authentication techniques (basic or digest).

  2. They use the standard HTML form fields to pass the username and password values to the server.

  3. The server validates the credentials and then creates a “session” that is tied to a unique key that is passed between the client and server on each HTTP request (GET, POST, and so on).

  4. When the user clicks “log off” or the server logs the user off (for example after certain idle time), the server will invalidate the session key, which makes any subsequent communication between the client and server require re-validation (resubmission of login credentials via the form) in order to establish a new session key.

As with basic auth, form-based auth does not protect login credentials when connected over HTTP, therefore it is not more “secure” than basic auth in how it handles user credentials. It is however more secure when it comes to properly logging the user off after a certain period of inactivity or if the user no longer requires use of the system and decides to log out.
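The session handling in steps 3 and 4 can be sketched roughly like this. SessionStore, its method names and the 900-second idle timeout are all assumptions for illustration; a real application would also deliver the key in a cookie over HTTPS.

```python
import secrets
import time


class SessionStore:
    def __init__(self, idle_timeout=900, clock=time.monotonic):
        self.idle_timeout = idle_timeout  # seconds of inactivity before logoff
        self._clock = clock
        self._sessions = {}  # session key -> (username, last_seen)

    def login(self, username, password, check_credentials):
        """Validate the form fields and hand out a new random session key."""
        if not check_credentials(username, password):
            return None
        key = secrets.token_urlsafe(32)
        self._sessions[key] = (username, self._clock())
        return key

    def authenticate(self, key):
        """Called on every request: return the username, or None if re-login is needed."""
        entry = self._sessions.get(key)
        if entry is None:
            return None
        username, last_seen = entry
        now = self._clock()
        if now - last_seen > self.idle_timeout:
            del self._sessions[key]  # the server logs the user off after idle time
            return None
        self._sessions[key] = (username, now)
        return username

    def logout(self, key):
        """Invalidate the session key when the user clicks "log off"."""
        self._sessions.pop(key, None)
```

Because the server owns the session key, it can invalidate it at any time, which is exactly what basic auth cannot do.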

Steps in an HTTP Request

  • DNS Lookup: hostname -> IP
  • TCP Connection (socket)
  • Send HTTP Request
  • Server Response
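The four steps map almost one-to-one onto plain socket calls. A rough sketch for unencrypted HTTP only, with no redirects, TLS or chunked decoding; the function names are made up, and fetch needs network access so it is only defined here, not called:

```python
import socket


def build_request(host, path="/"):
    """Step 3: the bytes of a minimal GET request."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")


def parse_status_line(response):
    """Step 4: pull version, status code and reason out of the first response line."""
    status_line = response.split(b"\r\n", 1)[0].decode("iso-8859-1")
    version, code, *reason = status_line.split(" ", 2)
    return version, int(code), reason[0] if reason else ""


def fetch(host, port=80, path="/"):
    # 1. DNS lookup: hostname -> IP address
    family, socktype, proto, _, address = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    )[0]
    # 2. TCP connection
    with socket.socket(family, socktype, proto) as sock:
        sock.settimeout(10)
        sock.connect(address)
        # 3. Send the HTTP request
        sock.sendall(build_request(host, path))
        # 4. Read the server response until the server closes the connection
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks)
```

Usage would look like parse_status_line(fetch("example.com")).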

Basic authentication

HTTP Basic authentication (BA) implementation is the simplest technique for enforcing access controls to web resources because it doesn't require cookies, session identifiers, or login pages; rather, HTTP Basic authentication uses standard fields in the HTTP header, obviating the need for handshakes.

The credentials are merely encoded with Base64 in transit, but not encrypted or hashed in any way.

Because the BA field has to be sent in the header of each HTTP request, the web browser needs to cache credentials for a reasonable period of time to avoid constantly prompting the user for their username and password.

HTTP does not provide a method for a web server to instruct the client to "log out" the user.

HTTP supports the use of several authentication mechanisms to control access to pages and other resources. These mechanisms are all based around the use of the 401 status code and the WWW-Authenticate response header.

The most widely used HTTP authentication mechanisms are:

Basic

The client sends the user name and password as unencrypted base64 encoded text. It should only be used with HTTPS, as the password can be easily captured and reused over HTTP.

Digest

The client sends a hashed form of the password to the server. Although the password itself cannot be captured over HTTP, it may be possible to replay requests using the hashed password.

NTLM

This uses a secure challenge/response mechanism that prevents password capture or replay attacks over HTTP. However, the authentication is per connection and will only work with HTTP/1.1 persistent connections. For this reason, it may not work through all HTTP proxies and can introduce large numbers of network roundtrips if connections are regularly closed by the web server.
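For Digest, the "hashed form" is not a hash of the password alone: it also mixes in the server's nonce, the request method and the URI. A sketch of the MD5 variant with qop=auth as described in RFC 2617; the notes above do not say which variant is meant, so treat that choice as an assumption, and the function names are my own:

```python
import hashlib


def md5_hex(text):
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri, nonce, nc, cnonce, qop="auth"):
    """Compute the "response" field of a Digest Authorization header."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")  # who the user is
    ha2 = md5_hex(f"{method}:{uri}")                 # what is being requested
    # The server nonce and client counter tie the hash to one challenge.
    return md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")
```

The password never crosses the wire, but anyone who captures a full request can still resend it for as long as the server accepts that nonce and counter.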

Basic Authentication

If an HTTP server receives an anonymous request for a protected resource it can force the use of Basic authentication by rejecting the request with a 401 (Access Denied) status code and setting the WWW-Authenticate response header as shown below:

HTTP/1.1 401 Access Denied
WWW-Authenticate: Basic realm="My Server"
Content-Length: 0

The word Basic in the WWW-Authenticate header selects the authentication mechanism that the HTTP client must use to access the resource. The realm string can be set to any value to identify the secure area and may be used by HTTP clients to manage passwords.

Most web browsers will display a login dialog when this response is received, allowing the user to enter a username and password. This information is then used to retry the request with an Authorization request header:

GET /securefiles/ HTTP/1.1
Host: www.httpwatch.com
Authorization: Basic aHR0cHdhdGNoOmY=

The Authorization header specifies the authentication mechanism (in this case Basic) followed by the username and password. Although the string aHR0cHdhdGNoOmY= may look encrypted, it is simply a base64 encoded version of <username>:<password>. In this example it decodes to "httpwatch:f", which would be readily available to anyone who could intercept the HTTP request.
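Both sides of this exchange fit in a few lines. A minimal sketch, with made-up helper names, of how a client might build the header and how a server might check it or fall back to the 401 challenge:

```python
import base64


def encode_basic(username, password):
    """Client side: build the value of the Authorization header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def decode_basic(header_value):
    """Server side: recover (username, password) from the header value."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic Authorization header")
    username, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return username, password


def authorize(headers, realm, is_valid):
    """Return (status, response headers): 200 if accepted, else the 401 challenge."""
    value = headers.get("Authorization")
    if value is not None:
        try:
            username, password = decode_basic(value)
        except ValueError:  # wrong scheme, bad base64 or bad UTF-8
            username = None
        if username is not None and is_valid(username, password):
            return 200, {}
    return 401, {"WWW-Authenticate": f'Basic realm="{realm}"'}


# Anyone who sees the header can reverse it; no key is involved.
print(decode_basic("Basic aHR0cHdhdGNoOmY="))
```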