Understanding HTTP Will Help You:
- Manually query web servers
- Understand client browser and server interactions
- Make better use of the web protocol
In a typical process, the browser interprets the URL
http://www.host.com:80
as an indication to use the Hypertext Transfer Protocol and to contact the computer at www.host.com over the internet on port 80. The browser connects and then sends a request header to the server. A client request header breaks down as follows:
| Section | Example | Meaning |
|---|---|---|
| Client Method & HTTP Version | GET / HTTP/1.1 | Requests the document |
| Request Header | Accept: image/gif, image/x-xbitmap, image/jpeg | Specifies the types of document the client accepts |
| | Accept-language: en-us | Specifies the client's preferred language |
| | Accept-encoding: gzip, compress | Specifies that the client can interpret server responses encoded in the given formats |
| | User-Agent: Mozilla/4.0 | Client identifies itself |
| | Host: www.host.com | Client tells the server what it thinks the server's address is |
| Entity Header | Content-type: application/x-www-form-urlencoded | Specifies the media type of the entity-body the client is sending |
| | Content-length: 23 | Specifies the length of the entity-body in bytes |
| General Header | Connection: keep-alive | Tells the server to keep the connection open unless specifically told to close it |
| Entity-Body | The data being sent (23 bytes of form data in this example) | |
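As a rough illustration, a request like the one dissected above could be assembled from the URL as follows. This is a minimal Python sketch, not a description of how any particular browser works; the function name build_request and the reduced set of headers are assumptions made for the example.

```python
from urllib.parse import urlsplit


def build_request(url, headers=None):
    """Return (host, port, request text) for a simple GET of the given URL."""
    parts = urlsplit(url)
    host = parts.hostname
    port = parts.port or 80          # HTTP defaults to port 80
    path = parts.path or "/"         # an empty path means the root document
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # A blank line marks the end of the request header.
    return host, port, "\r\n".join(lines) + "\r\n\r\n"


host, port, request = build_request(
    "http://www.host.com:80",
    {"Accept-language": "en-us", "Connection": "keep-alive"},
)
print(host, port)
print(request)
```

The returned text is what would be written to a socket connected to the host and port; nothing is sent over the network here.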
By the end of this reference tutorial, you'll understand every part of that request header in detail. The server then answers with a response header, such as:
| Section | Example | Meaning |
|---|---|---|
| HTTP Version | HTTP/1.1 200 OK | Tells the client which HTTP version the server is using, followed by the status code of the response |
| General Header | Date: Mon, 06 Dec 1999 20:54:26 GMT | Indicates the current date on the server |
| Response Header | Server: Apache/1.3.6 (Unix) | Indicates to the client what software version the server is using |
| Entity Header | Last-modified: Fri, 16 Oct 1998 13:13:04 GMT | Specifies the most recent modification of the document |
| | ETag: "2fd4fsg431" | Provides the client with a unique identifier for each document on the server |
| | Accept-Ranges: bytes | Tells the client that the server can return subsections of a document and in which unit to request them |
| | Content-length: 327 | Tells how large the entity-body is in bytes (327 bytes, in this example) |
| | Connection: close | Indicates that the server will close the connection after its response (i.e. the connection won't stay open) |
| | Content-type: text/html | Tells the browser what kind of document is included in the entity-body |
| Entity-Body | The actual HTML web page | |
Every line except the last is part of the response header. The response header precedes the entity-body, which is what all these headers have been preparing for. In this example the entity-body is an HTML page, but it can be any kind of document, such as an image.
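The split between status line, headers, and entity-body can be sketched in a few lines of Python. This is an illustrative parser for well-formed responses only, and the function name parse_response and the shortened sample response are assumptions made for the example.

```python
def parse_response(raw):
    """Split a raw HTTP response into status line parts, headers, and body."""
    head, _, body = raw.partition(b"\r\n\r\n")   # a blank line ends the header
    lines = head.decode("iso-8859-1").split("\r\n")
    version, code, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers, body


raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Server: Apache/1.3.6 (Unix)\r\n"
    b"Content-length: 13\r\n"
    b"Content-type: text/html\r\n"
    b"\r\n"
    b"<html></html>"
)
version, code, reason, headers, body = parse_response(raw)
print(version, code, reason)
print(headers["content-type"], len(body))
```

Header names are lowercased on the way in because HTTP header names are case-insensitive.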
Client Methods
Client methods are commands that a client issues to a server. Each one tells the server what the client wants done with the resource named in the request. The eight most common client request methods are:
- GET -- retrieve a resource (document most likely) on the server
- HEAD -- retrieve information about the document, not the document itself
- POST -- client provides information, typically for database records
- PUT -- client provides a new document to the server
- DELETE -- removes a document from the server
- TRACE -- learn the path of a document through proxy servers
- OPTIONS -- server shows available methods for a document
- CONNECT -- client talks to an HTTPS server through a proxy server
GET Method: Retrieve a Document
The GET method retrieves a resource, which can be any of the following:
- File accessible by the web server
- Output of a CGI script or server extension (Apache modules, JSP, etc.)
- Result of a server-side computation
- Information obtained from hardware, like a webcam
After the client sends a GET request, the server responds with the status line, headers, and the data requested by the client. In other words, GET produces the client request header followed by the server response header detailed above, and this pairing is what you'll typically see with a GET method.
HEAD Method: Retrieve header information (not the document)
The HEAD method starts the same sequence as the GET method, except that the server's response contains no entity-body. The headers alone are enough to retrieve information such as:
- Modification time of the document. Used purely for document caches.
- Size of the document, to estimate the download time of the entity-body (the same figure lies behind a "Seconds Remaining" estimate in a download status bar).
- Document type to ensure the client gets a document type it can read
- Server type for customized queries
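The difference between GET and HEAD can be seen with a small self-contained experiment. The sketch below starts a throwaway server on the local machine and queries it with both methods; the page content, the Handler class, and the fetch helper are all assumptions made for the example.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b"<html><body>Hello</body></html>"


class Handler(BaseHTTPRequestHandler):
    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-type", "text/html")
        self.send_header("Content-length", str(len(PAGE)))
        self.end_headers()

    def do_GET(self):
        self._send_headers()
        self.wfile.write(PAGE)      # GET returns headers plus entity-body

    def do_HEAD(self):
        self._send_headers()        # HEAD returns the same headers only

    def log_message(self, *args):   # keep the example quiet
        pass


def fetch(method, port):
    """Send one request and return (status, Content-length header, body)."""
    conn = http.client.HTTPConnection("127.0.0.1", port)
    conn.request(method, "/")
    response = conn.getresponse()
    body = response.read()
    length = response.getheader("Content-length")
    conn.close()
    return response.status, length, body


server = HTTPServer(("127.0.0.1", 0), Handler)   # port 0: pick any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

get_result = fetch("GET", port)
head_result = fetch("HEAD", port)
server.shutdown()
server.server_close()

print(get_result)
print(head_result)
```

Both responses carry the same Content-length header, but only the GET response carries the body.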
POST Method: Send data to server
The POST method allows the client to specify data to be sent, usually from form fields, to a data-handling program, typically on the server side, such as:
- CGI
- Netware server gateway
- command-line interface
- Server-side document annotation
- Database operations and tasks
Usually the POST method is used only when a network service needs to interact with a client and/or a command-line interface.
Passing on the entity-body: The server receives an entity-body only when the client sends one with its POST request, and it frequently hands that body on to another program or server for more processing.
Content-Type: A POST request must include a Content-type header describing the format of the client's entity-body, so that the data can be turned into variables and values to be processed. The variables come from form fields; for example, <INPUT NAME="user" SIZE=25 MAXLENGTH=20> defines a variable named user.
Steps of processing the POST method:
- Server recognizes POST method
- Processes URL
- Executes program tied to URL
- Pipes the client's entity-body to that program
- The CGI program then interprets, decodes, and processes the data and returns the result
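The last two steps can be sketched from the program's point of view. Under the CGI convention, the server passes request details in environment variables (REQUEST_METHOD, CONTENT_TYPE, CONTENT_LENGTH) and pipes the entity-body to the program's standard input. The sketch below is a simplified illustration with hypothetical names (handle_post) and an in-memory stream standing in for standard input.

```python
import io
from urllib.parse import parse_qs


def handle_post(environ, stdin):
    """Read and decode a URL-encoded POST body the way a CGI program might."""
    if environ.get("REQUEST_METHOD") != "POST":
        raise ValueError("expected a POST request")
    length = int(environ.get("CONTENT_LENGTH", 0))
    body = stdin.read(length).decode("ascii")    # read exactly Content-length bytes
    fields = parse_qs(body)                      # decode the name=value pairs
    # parse_qs returns a list per name; keep the first value for simplicity.
    return {name: values[0] for name, values in fields.items()}


body = b"user=util-tester&pass1=1234"
environ = {
    "REQUEST_METHOD": "POST",
    "CONTENT_TYPE": "application/x-www-form-urlencoded",
    "CONTENT_LENGTH": str(len(body)),
}
fields = handle_post(environ, io.BytesIO(body))
print(fields)
```

Reading exactly Content-length bytes matters because the server does not necessarily close the input stream when the body ends.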
URL-encoded GET format: When a <FORM> tag uses the GET method, the request is:
GET /cgi-bin/create.pl?user=util-tester&pass1=1234&pass2=1234 HTTP/1.1
Host: examples.host.com
The variable/value pairs are included in the URL, are separated by an ampersand (&), and are defined with "=". Special characters are encoded, too: a space, for example, is written as "+" or "%20". The CGI program then decodes the user's values directly from the URL.
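The decoding step can be tried out with Python's standard urllib.parse module. This is only a sketch of what a CGI program does with the query string; the second half uses made-up values ("util tester", "a&b") to show how a space and an ampersand are encoded.

```python
from urllib.parse import parse_qs, urlencode

# The query string from the request above (everything after the "?").
query = "user=util-tester&pass1=1234&pass2=1234"
pairs = parse_qs(query)
print(pairs["user"][0], pairs["pass1"][0], pairs["pass2"][0])

# Encoding goes the other way: a space becomes "+" and "&" becomes "%26".
encoded = urlencode({"user": "util tester", "note": "a&b"})
print(encoded)

# Decoding the encoded string recovers the original values.
decoded = parse_qs(encoded)
print(decoded)
```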
Files uploaded with POST
<form method="post" action="post.pl" enctype="multipart/form-data">
<input name="the_file" type="file"> </form>
The above code uploads a file to the server as a MIME multipart message when the user presses the "submit" button in the client browser.
PUT: Store a file on a server
PUT /example.html HTTP/1.1
Connection: close
Content-type: text/html
The request above stores the file permanently on the host through port 80, typically after an authorization request.
DELETE Method
This simply deletes the document at a URL (the opposite of PUT), using the following format:
DELETE /images/logo2.gif HTTP/1.1
Host: hypothetical.ora.com
The server then responds with:
HTTP/1.0 200 OK
Date: Fri, 04 Oct 1996 14:31:02 GMT
Content-length: 21
Almost always, as with the PUT method, an authorization request is required with the DELETE method. For security reasons, you can't have just any old Joe Shmoe uploading and deleting files from your web-server and site.
TRACE Method
The TRACE method shows the programmer how a client's message is modified as it passes through a chain of proxy servers. The Max-Forwards header limits the number of proxy servers the request may pass through, which is typically a few. Each proxy server decrements Max-Forwards before passing the request on, so the value is greater than 0 at every in-between proxy, and the request always terminates where the value reaches 0. The response returns the client's HTTP request as the entity-body, with a content-type of message/http.
OPTIONS Method
The OPTIONS method is simply a client request for the list of methods the server allows. In the following request, "*" refers to the server as a whole rather than to a single URL:
OPTIONS * HTTP/1.1
Server Response:
HTTP/1.1 200 OK
Public: GET, HEAD, POST, TRACE
CONNECT Method
This simply establishes a connection through a proxy server to another server.
Server Response Codes
These are three-digit status codes sent from the server to the client to report the outcome of a request: success, redirection, or an error.
100-199: Informational. Tells the client to continue or to switch protocols.
- 100 -- Continue; the client can continue
- 101 -- Switching Protocols
200-299: Successful client requests.
- 200 -- OK; successful request
- 201 -- Created; a new URL was created, and the "Location" header shows where the data was placed.
- 202 -- Accepted; the request was received but not immediately acted upon.
- 203 -- Non-Authoritative Information; the header information comes from a local or third-party copy, not from the original server.
- 204 -- No Content; a status line and headers are in the response, but no entity-body is included. Great for forms.
- 205 -- Reset Content; the browser should clear the form. Used specifically for forms.
- 206 -- Partial Content; the server returned only the part of the document specified by the request's Range header.
300-399: Redirected requests. Deals with moved documents.
- 300 -- Multiple Choices; the requested URL refers to more than one resource.
- 301 -- Moved Permanently; the document was permanently moved to a new "Location" address.
- 302 -- Found; the document was moved temporarily (similar to 307 -- Temporary Redirect).
- 303 -- See Other; the requested document should be retrieved with a GET request at a different URL.
- 305 -- Use Proxy; the requested URL must be retrieved through the proxy server specified in the "Location" header.
400-499: Incomplete requests from the client.
- 400 -- Bad Request; the server couldn't respond because of a syntax error in the client's request.
- 401 -- Unauthorized; the server responds with this code and a "WWW-Authenticate" header, to which the client responds with its credentials.
- 402 -- Payment Required; reserved for future use.
- 403 -- Forbidden; the server refuses the request and doesn't have to specify why (no authorization, among a variety of reasons).
- 404 -- Not Found; the server responds with this when the requested document at the specified URL does not exist.
- 405 -- Method Not Allowed; the server responds with the "Allow" header, specifying which methods are allowed.
- 406 -- Not Acceptable; the server cannot return the document in any media type that the client's Accept headers allow.
- 409 -- Conflict; the current request conflicts with another request or with the server's configuration.
- 413 -- Request Entity Too Large; the server will not process the request because it is too large.
500-599: Server errors (CGI errors).
- 500 -- Internal Server Error; for example, a CGI program crashed.
- 501 -- Not Implemented; the server does not support the action the client requested. An unhelpful response, similar to 403 -- Forbidden. Ideally the server would respond with the specific reason, like 405 -- Method Not Allowed, or 406 -- Not Acceptable.
- 502 -- Bad Gateway; the server indicates that a helper, or proxy, server returned an invalid response.
- 503 -- Service Unavailable; the requested service is unavailable, and the response typically includes a Retry-After header.
- 505 -- HTTP Version Not Supported.
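Because the first digit determines the range, a client can classify any status code with integer division. The sketch below is a minimal illustration; the function name status_class and the category labels are assumptions made for the example, while the reason phrases come from Python's standard http.HTTPStatus enumeration.

```python
from http import HTTPStatus


def status_class(code):
    """Map a three-digit status code to the range it belongs to."""
    classes = {
        1: "Informational",
        2: "Successful",
        3: "Redirection",
        4: "Client error",
        5: "Server error",
    }
    return classes.get(code // 100, "Unknown")


for code in (200, 304, 404, 503):
    # HTTPStatus supplies the standard reason phrase for known codes.
    print(code, HTTPStatus(code).phrase, "->", status_class(code))
```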
Headers
- General Headers -- Indicate general information such as the date or whether the connection should be maintained.
- Request Headers -- Used only for client requests; they convey the client's configuration and the desired document.
- Server Response Headers -- Convey information about the server's configuration.
- Entity Headers -- Describe the entity-body; used with the POST and PUT methods and in server responses.
General Headers
- Cache-Control: Specifies the behavior of caching systems, with request directives such as no-cache and max-age=seconds, and response directives such as public, private, no-cache, and no-transform.
- Connection: Specifies what to do with the connection to the server: close or keep-alive.
- Date: Specifies the date and time in one of three formats. The preferred RFC 1123 format is: Fri, 06 Oct 2006 05:07:11 GMT. All dates are given in GMT.
- Pragma: Specifies proxy and gateway directives, specifically no-cache.
- Trailer: Lists the headers that appear in the trailer of a chunked message.
- Transfer-encoding: Specifies the encoding applied to the message for transfer, specifically chunked.
- Upgrade: With the Upgrade header, the client can specify additional protocols it understands, to which the server can respond with "HTTP/1.1 101 Switching Protocols" and a header such as "Upgrade: HTTP/1.2".
- Via: This header is updated by proxy servers as messages pass between client and server. It is especially useful for debugging and is often used together with the "Max-Forwards" header in the TRACE method.
- Warning: Carries additional information about the response, such as a notice that the response is stale.
Client Request Headers
All of these headers are sent from the client to the server. Most specify a preference, such as accepted image types or character sets; some carry authorization credentials; and some simply name the host with which the client believes it's communicating.
- Accept: Specifies the media types the client prefers. Examples are: text/*, image/gif.
- Accept-charset: Specifies the character set preferred by the client.
- Accept-encoding: Specifies the encoding mechanism preferred by the client. Examples are gzip, compress.
- Authorization: Provides the client's authorization to access the data at a specific URL. Under the Basic authorization scheme, the username:password string is encoded in base64.
- Cookie: Contains the name=value pair that has been stored for that URL.
- From: Specifies the e-mail address of the client's user.
- If-modified-since: Specifies a date; the document is returned only if it has been modified since that date.
- Host: Gives the hostname and port contacted by the client, showing which server the client thinks it is (and hopefully is) talking to. An example is www.hostname.com:80.
Server Response Headers
Server response headers communicate what the server is doing, details about the server, and information about its response.
- Accept-ranges: Indicates the type of range requests the server accepts. For example: Accept-ranges: bytes.
- Age: Specifies the age of a document in seconds. For example: Age: 3421.
- ETag: Gives the entity tag of a given document. This can be used with the If-match and the If-none-match request headers.
- Location: Specifies the new URL location of a document. Typically sent with the response code of 201 -- Created, or 301 -- Moved Permanently.
- Set-cookie: Specifies name=value pairs and uses options such as expires=date, path=pathname, and domain=domain-name.
- Vary: Specifies that the entity has multiple sources and may therefore vary according to the listed request headers. Examples are: Accept-language, Accept-encoding.
Entity Headers
- Allow: Shows which methods the server allows.
- Expires: Specifies the date and time after which a document may have changed and should be considered stale.
URL Encodings
URL encoding eliminates ambiguity for CGI programs when data contains special characters such as spaces, "!", and "&". Every character can be written as "%" followed by its ASCII value in hexadecimal; here's a short table.
| Character | ASCII (decimal) | URL encoding |
|---|---|---|
| Space | 32 | + or %20 |
| ! | 33 | %21 |
| @ | 64 | %40 |
| & | 38 | %26 |
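The table can be reproduced with Python's standard urllib.parse functions, which is a convenient way to look up the encoding of any other character. This is a small sketch; passing safe="" is a choice made here so that no character is exempt from encoding.

```python
from urllib.parse import quote, quote_plus, unquote_plus

for char in (" ", "!", "@", "&"):
    # ord() gives the ASCII value; quote() gives the %XX form (XX is hexadecimal).
    print(repr(char), ord(char), quote(char, safe=""), quote_plus(char, safe=""))

# Decoding accepts both forms of an encoded space.
print(unquote_plus("a+b%20c%21"))
```

quote() always writes a space as "%20", while quote_plus() uses the "+" form that appears in form data.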
Client and Server Identification
Clients send user-agent headers (optional), while servers send server headers. Some benefits of these are:
- Servers can respond with customized content
- Surveys and statistics of browsers can be assessed
- Software that violates HTTP specifications can be tracked.
When a server identifies itself, there's a small risk: a user who knows the type of server may be able to exploit a weakness in a certain version. So some servers simply don't send some of the server headers, but that may be excessive security.
Referring Documents
A referring document is the page the client was on when it clicked on a link. This is great for debugging sites, for measuring where users entered the site, or for seeing how a specific page on the site was reached. For example:
Server says:
HTTP/1.1 200 OK
Date: Sun, 04 Nov 2007 05:10:55 GMT
ETag: "a34f2020"
Content-length: 3400
And when the client clicks on the link to the sales.html page, the client sends this header:
GET /sales.html HTTP/1.1
Referer: http://www.hostname.com/contact.html
This shows that the user reached the sales.html page from the contact.html page. Great for learning how users navigate your site and which pages typically link to other pages.
Retrieving Content
Some responses may omit "Content-length".
The four ways for a client to receive data from a server are:
- Read the size from the "Content-length" header and read in that many bytes.
- If the size of the document is too dynamic, or simply not given, the client receives data until the server closes the connection or the connection times out.
- Another header, like "Transfer-encoding: chunked", shows the client where the document ends.
- Byte ranges: the server advertises "Accept-ranges: bytes", the client asks for "Range: bytes=0-65535" (the first 65,536 bytes, 16 to the 4th), and the server labels the part it returns with "Content-range: bytes 0-65535/83000".
The Content-range value shows which portion is being sent (bytes 0-65535) and how large the total file is (83,000 bytes). So the next part would be Range: bytes=65536-82999.
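The arithmetic behind byte ranges is easy to get wrong by one, because HTTP byte positions start at 0 and both ends of a range are inclusive. The following sketch computes the ranges a client would request to fetch a file piece by piece; the function name byte_ranges is an assumption made for the example, and the sizes match the 83,000-byte file and 65,536-byte part discussed above.

```python
def byte_ranges(total_size, chunk_size):
    """Return inclusive (first, last) byte positions covering the whole file."""
    ranges = []
    first = 0
    while first < total_size:
        last = min(first + chunk_size, total_size) - 1   # positions are inclusive
        ranges.append((first, last))
        first = last + 1
    return ranges


for first, last in byte_ranges(83000, 65536):
    print(f"Range: bytes={first}-{last}")
```

The last byte of an 83,000-byte file is at position 82999, so the final range ends there.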
Media Types
Knowing the media type beforehand lets the client load the right handler, such as an audio player or an image viewer.
HTTP borrows Internet media types, which are modeled on MIME types. "Accept" tells the server what the client can accept (default is "all types"). Examples are:
- Accept: */* (all media types accepted)
- Accept: image/* (all image types accepted)
- Accept: image/gif (only GIF images accepted by the client)
The server then responds with content-type headers corresponding to that specific media type requested by the client.
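The wildcard matching in the Accept examples can be sketched with shell-style patterns. This is a simplified illustration: the function name acceptable is hypothetical, and the sketch ignores parameters such as the quality values (q=) that a full implementation would use to rank preferences.

```python
from fnmatch import fnmatchcase


def acceptable(accept_header, content_type):
    """Return True if content_type matches any pattern in the Accept header."""
    patterns = [item.split(";")[0].strip().lower()
                for item in accept_header.split(",")]
    # Media types are case-insensitive, so compare in lowercase.
    return any(fnmatchcase(content_type.lower(), p) for p in patterns)


print(acceptable("*/*", "text/html"))
print(acceptable("image/*", "image/gif"))
print(acceptable("image/gif", "image/jpeg"))
```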
Cookies
Cookies are not part of the core HTTP specification, but they are a closely related means of storing data on the client and transferring it back to the server.
The server sets a cookie with the "Set-cookie" header:
Set-cookie: acct=0234; domain=host.com; expires=Tue, 11 Feb 2003
The client stores the cookie. When the client visits again, it returns the name=value pair in a "Cookie" header (Cookie: acct=0234), and the server's CGI program can then send the client a tailored document (usually with the username).
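The round trip of a cookie can be walked through with Python's standard http.cookies module. This is a sketch of the exchange, not of any particular server; the cookie name and value are taken from the example above, and the expires option is left out to keep the sketch short.

```python
from http.cookies import SimpleCookie

# Server side: build the Set-Cookie header for the first response.
outgoing = SimpleCookie()
outgoing["acct"] = "0234"
outgoing["acct"]["domain"] = "host.com"
set_cookie_line = outgoing.output()
print(set_cookie_line)

# Client side: on the next visit, send back just the name=value pair.
cookie_line = "Cookie: acct=" + outgoing["acct"].value
print(cookie_line)

# Server side again: parse the value of the returned Cookie header.
incoming = SimpleCookie()
incoming.load(cookie_line.split(":", 1)[1].strip())
print(incoming["acct"].value)
```

Options such as domain travel only from server to client; the client returns the bare name=value pair.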
Authorization
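The Authorization header is used to request protected documents. Under the Basic scheme, the username:password string is encoded in base64 (sometimes the Digest scheme is used instead).
If the credentials are missing or wrong, the server responds with a 401, and the browser typically shows a username and password dialog box.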
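The base64 step of the Basic scheme can be sketched as follows. The function names and the credentials ("user", "password") are hypothetical. Note that base64 is an encoding, not encryption: anyone who sees the header can decode it, as the second function shows.

```python
import base64


def basic_auth_header(username, password):
    """Build an Authorization header value for the Basic scheme."""
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


def decode_basic_auth(value):
    """Recover (username, password) from a Basic Authorization value."""
    scheme, _, encoded = value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic authorization value")
    decoded = base64.b64decode(encoded).decode("utf-8")
    username, _, password = decoded.partition(":")
    return username, password


value = basic_auth_header("user", "password")   # hypothetical credentials
print("Authorization:", value)
print(decode_basic_auth(value))
```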
Persistent Connections
The server responds with the header
Connection: keep-alive or
Connection: close
HTTP 1.1 keeps connections alive by default, so sending keep-alive isn't necessary. Sending "close" matters, though: unless the client sends it to the server, the connection with the server will remain open.
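The decision can be written down as a small function. This is a simplified sketch with a hypothetical name (keeps_connection_open); it assumes header names have already been lowercased and ignores Connection headers that list several tokens.

```python
def keeps_connection_open(http_version, headers):
    """Decide whether the connection stays open after this message."""
    connection = headers.get("connection", "").lower()
    if connection == "close":
        return False
    if connection == "keep-alive":
        return True
    # No Connection header: HTTP/1.1 defaults to persistent, HTTP/1.0 does not.
    return http_version == "HTTP/1.1"


print(keeps_connection_open("HTTP/1.1", {}))
print(keeps_connection_open("HTTP/1.1", {"connection": "close"}))
print(keeps_connection_open("HTTP/1.0", {"connection": "Keep-Alive"}))
```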
Client Caching
Clients cache documents in local storage, and proxy servers can cache them as well.
A request with the If-modified-since header elicits a server response of:
200 -- the document was modified
304 -- not modified
Servers can send Last-modified headers with the document to let the client know when the last change was made to the document.
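The server's side of this exchange amounts to a date comparison. The sketch below uses Python's standard email.utils.parsedate_to_datetime to parse the HTTP dates; the function name conditional_status is an assumption made for the example, and the Last-modified value is the one from the response header shown earlier.

```python
from email.utils import parsedate_to_datetime


def conditional_status(last_modified, if_modified_since=None):
    """Return 304 if the document is unchanged since the client's copy, else 200."""
    if if_modified_since is None:
        return 200                   # unconditional request: send the document
    modified = parsedate_to_datetime(last_modified)
    cached = parsedate_to_datetime(if_modified_since)
    return 200 if modified > cached else 304


last_modified = "Fri, 16 Oct 1998 13:13:04 GMT"
print(conditional_status(last_modified, "Fri, 16 Oct 1998 13:13:04 GMT"))
print(conditional_status(last_modified, "Thu, 01 Oct 1998 00:00:00 GMT"))
```

With a 304 response the server sends no entity-body, and the client reuses its cached copy.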
Entity Tags
Entity tags are unique to each document -- even to duplicates of the same document. Comparing entity tags is therefore a more exact way than comparing Last-modified dates to tell whether the client's copy of a document is still current.