Spyderbyte

SpyderByte Design: Tutorials and Writings

Understanding HTTP Will Help You:

  • Manually query web servers
  • Understand client browser and server interactions
  • Make better use of the web protocol
In a typical process, the browser interprets the URL

http://www.host.com:80

as an instruction to use the Hypertext Transfer Protocol (HTTP) to contact the computer at www.host.com through port 80. The browser connects and then sends a request header to the server. A Client Request Header breaks down as follows:

Client Method & HTTP Version
  GET / HTTP/1.1 -- Requests the document
Request Headers
  Accept: image/gif, image/x-xbitmap, image/jpeg -- Specifies the document types the client accepts
  Accept-language: en-us -- Specifies the client's preferred language
  Accept-encoding: gzip, compress -- Specifies that the client can interpret server responses encoded in the given formats
  User-Agent: Mozilla/4.0 -- The client identifies itself
  Host: www.host.com -- The client tells the server what it thinks the server's address is
Entity Headers
  Content-type: application/x-www-form-urlencoded -- Specifies the media type of the entity-body the client is sending
  Content-length: 23 -- Specifies the length, in bytes, of the entity-body the client is sending
General Header
  Connection: keep-alive -- Tells the server to keep the connection open unless specifically told to close it
Entity-Body
  The actual data sent by the client (form data, for example)

By the end of this reference tutorial, you'll be able to understand all the nuances of that request header in detail. The server then responds with a Response Header, such as:

HTTP Version & Status
  HTTP/1.1 200 OK -- Tells the client which HTTP version the server is using and whether the request succeeded
General Header
  Date: Mon, 06 Dec 1999 20:54:26 GMT -- Indicates the current date on the server
Response Header
  Server: Apache/1.3.6 (Unix) -- Indicates to the client what software version the server is using
Entity Headers
  Last-modified: Fri, 16 Oct 1998 13:13:04 GMT -- Specifies the most recent modification of the document
  ETag: "2fd4fsg431" -- Provides the client with a unique identity for each document on the server
  Accept-Ranges: bytes -- Tells the client that the server can return subsections of a document, and in which unit (bytes)
  Content-length: 327 -- Tells how large the entity-body is in bytes (327 bytes, in this example)
  Connection: close -- Indicates that the server will close the connection after its response (i.e. the connection won't stay open)
  Content-type: text/html -- Tells the browser what kind of document is included in the entity-body
Entity-Body
  The actual HTML web page

    Everything except the final entry is the actual response header. The response header leads the entity-body, which is what all these headers have been preparing for and which, in this example, is the final entry. The entity-body is usually an HTML page, but it can be any type of document, such as an image.
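
The split between response header and entity-body can be sketched in Python. This is a minimal illustration, not a full HTTP parser: the function name parse_response and the shortened sample response are assumptions made for the example, and the sketch assumes one header per line.

```python
def parse_response(raw):
    """Split a raw HTTP response into status line parts, headers, and body."""
    # A blank line (CRLF CRLF) separates the headers from the entity-body.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    # Status line: HTTP version, three-digit status code, reason phrase.
    version, code, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them lowercased.
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers, body


# Shortened sample response (hypothetical entity-body of 13 bytes).
raw_response = (
    "HTTP/1.1 200 OK\r\n"
    "Date: Mon, 06 Dec 1999 20:54:26 GMT\r\n"
    "Server: Apache/1.3.6 (Unix)\r\n"
    "Content-length: 13\r\n"
    "Content-type: text/html\r\n"
    "\r\n"
    "<p>Hello.</p>"
)

version, code, reason, headers, body = parse_response(raw_response)
print(code, reason, headers["content-type"], len(body))
```

Only the first colon on each line separates name from value, which is why the Date header (which contains colons in its time) still parses correctly.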

    Client Methods

    Client methods are commands or requests issued by a client to a server. They tell the server what the client wants done with the resource named in the request. The eight most common client request methods are:
    1. GET -- retrieve a resource (document most likely) on the server
    2. HEAD -- retrieve information about the document, not the document itself
    3. POST -- client provides information, typically for database records
    4. PUT -- client provides a new document to the server
    5. DELETE -- removes a document from the server
    6. TRACE -- learn the path of a document through proxy servers
    7. OPTIONS -- server shows available methods for a document
    8. CONNECT -- client talks to an HTTPS server through a proxy server

    GET Method: Retrieve a Document

    The GET method retrieves a resource from the server. That resource may be any of the following:

    • File accessible by the web server
    • Output of a CGI script or server extension (Apache modules, JSP, etc.)
    • Result of a server-side computation
    • Information obtained from hardware, like a webcam

    After the client issues a GET request, the server responds with the status line, headers, and the data requested by the client. In other words, GET sends the Client Request Header, which is followed by the Server Response Header detailed above. You'll typically see this pairing of Client Request Headers and Server Response Headers after a GET.

    HEAD Method: Retrieve header information (not the document)

    The HEAD method initiates a sequence similar to the GET method, except that no entity-body is included in the response. The server returns just the headers, from which the client can retrieve information such as:
    • Modification time of the document. Useful for document caches.
    • Size of the document, to estimate download time (this is what is used whenever you see "Seconds Remaining" in a download status bar).
    • Document type to ensure the client gets a document type it can read
    • Server type for customized queries

    POST Method: Send data to server

    The POST method allows the client to specify data to be sent, usually from form fields, to a data-handling program on the server. Typical uses are:

    • CGI
    • Netware server gateway
    • command-line interface
    • Server-side document annotation
    • Database operations and tasks

    In practice, POST is mostly used with CGI programs that interface with other resources, such as network services and command-line programs.

    Passing along the entity-body: In a POST request, the client's data travels in the entity-body. After the server has processed the request line and headers, it frequently hands the entity-body to another program for further processing.

    Content-Type: A POST request must include a Content-type header describing the format of the client's entity-body, so that the receiving program can decode the data into variables and values. The variables come from the form fields, whose HTML attributes look like:
    MAXLENGTH=20, SIZE=25, and <INPUT NAME = "user">

    Steps of processing the POST method:

    1. Server recognizes POST method
    2. Processes URL
    3. Executes program tied to URL
    4. Pipes the client's entity-body to that program
    5. The CGI program then interprets, decodes, and processes the data, and returns its output
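
From the client's side, the request that sets those steps in motion can be sketched in Python. The helper name build_post, the path /cgi-bin/create.pl, the host, and the field values are all illustrative assumptions; the point is only that the form fields are encoded into the entity-body and that Content-length must match its size in bytes.

```python
from urllib.parse import urlencode


def build_post(path, host, fields):
    """Build the raw bytes of a form-style POST request (illustrative sketch)."""
    # The form fields become the entity-body, encoded as name=value pairs.
    body = urlencode(fields).encode("ascii")
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-type: application/x-www-form-urlencoded\r\n"
        f"Content-length: {len(body)}\r\n"
        "\r\n"  # blank line ends the headers
    )
    return head.encode("ascii") + body


# Hypothetical values, chosen only for illustration.
request = build_post(
    "/cgi-bin/create.pl",
    "examples.host.com",
    {"user": "util-tester", "pass1": "1234", "pass2": "1234"},
)
print(request.decode("ascii"))
```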

    URL-encoded GET format: With the GET method under the <FORM> tag, the request is:

    • GET /cgi-bin/create.pl?user=util-tester&pass1=1234&pass2=1234 HTTP/1.1
    • Host: examples.host.com

    The variable/value pairs are included in the URL; pairs are separated by an ampersand (&) and each variable is joined to its value with "=". Special characters are replaced by codes based on their ASCII values: a space, for example, is written as "+" or "%20". The CGI program then decodes the user values directly from the URL. With POST, the same encoded pairs travel in the entity-body instead.
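
As a rough sketch of that decoding step, Python's standard urllib.parse module can split a request target into its variable/value pairs. The extra "note" field is an assumption added here to show both space encodings; it is not part of the request above.

```python
from urllib.parse import urlsplit, parse_qs

# Request target with a hypothetical extra field, "note", that contains spaces.
request_target = (
    "/cgi-bin/create.pl"
    "?user=util-tester&pass1=1234&pass2=1234&note=hello+big%20world"
)

# Everything after "?" is the query string.
query = urlsplit(request_target).query

# parse_qs splits on "&" and "=", and decodes "+" and "%20" back to spaces.
# Each value is a list, because a variable may appear more than once.
fields = parse_qs(query)

print(fields["user"])
print(fields["note"])
```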

    Files uploaded with POST

    <form method="post" action="post.pl" enctype="multipart/form-data"> <input name="the_file" type="file"> <input type="submit"> </form>

    The above code uploads a file to the server as a MIME multipart message when the user presses the "submit" button in the client browser.

    PUT: Store a file on a server

    PUT /example.html HTTP/1.1
    Connection: close
    Content-type: text/html

    The above request asks the server to store the enclosed document at /example.html on the host (through port 80 by default), typically after an authorization request.

    DELETE Method

    This simply deletes the document at a URL (the opposite of PUT), using the following format:

    DELETE /images/logo2.gif HTTP/1.1
    Host: hypothetical.ora.com

    The server then responds with:

    HTTP/1.0 200 OK
    Date: Fri, 04 Oct 1996 14:31:02 GMT
    Content-length: 21

    Almost always, as with the PUT method, an authorization request is required with the DELETE method. For security reasons, you can't have just any old Joe Shmoe uploading and deleting files from your web-server and site.

    TRACE Method

    This shows the programmer how a client's message is modified as it passes through a chain of proxy servers. The final recipient returns the client's HTTP headers as the entity-body, with a content type of message/http. The MAX-FORWARDS header limits how many proxy servers the request may pass through, typically a few. Each proxy decrements MAX-FORWARDS before forwarding the request, and a proxy that receives a value of 0 must answer the request itself instead of forwarding it. MAX-FORWARDS is therefore greater than 0 at every in-between proxy, and the trace always terminates.
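
The countdown can be pictured with a small simulation in Python. This is only a sketch of the rule, not real proxy code: the function name trace_hops and the proxy names are made up for the example.

```python
def trace_hops(max_forwards, proxies):
    """Return the name of the machine that ends up answering a TRACE request."""
    for proxy in proxies:
        if max_forwards == 0:
            # A proxy that receives a value of 0 must answer the request itself.
            return proxy
        # Otherwise the proxy decrements the value and forwards the request.
        max_forwards -= 1
    # The request passed every proxy and reached the final server.
    return "origin server"


chain = ["proxy-a", "proxy-b", "proxy-c"]  # hypothetical proxy chain
print(trace_hops(0, chain))
print(trace_hops(2, chain))
print(trace_hops(5, chain))
```

Sending the same TRACE with values 0, 1, 2, and so on is a handy way to get a reply from each machine along the path in turn.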

    OPTIONS Method

    The OPTIONS method is simply a client request asking which methods the server allows. In the following code, "*" refers to the entire server rather than a single URL:

    OPTIONS * HTTP/1.1
    Server Response:
    HTTP/1.1 200 OK
    Public: GET, HEAD, POST, TRACE

    CONNECT Method

    This simply establishes a connection through a proxy server to another server, typically so the client can talk to an HTTPS server.


    Server Response Codes

    These are three-digit status codes sent from the server to the client, indicating whether the request succeeded, was redirected, or failed.

    100-199 Informational; tells the client to continue or to switch protocols
    100 -- Client can continue
    101 -- Switching protocol
    200-299 Successful Client Requests
    200 -- OK; Successful request
    201 -- Created; A new URL was created, and the "Location" header shows where the data was placed.
    202 -- Accepted; Request was received but not immediately acted upon.
    203 -- Non-Authoritative Information; The header information comes from a local or third-party copy, not from the original server.
    204 -- No Content; A header and status code are in the response, but no entity-body is included. Great for forms.
    205 -- Reset Content; Browser should clear the form, used specifically for forms.
    206 -- Partial Content; Server returned partial content, as specified by the Range header in the request.
    300-399 Redirected Requests -- Dealing with moved documents
    300 -- Multiple Choices; The requested URL refers to more than one resource.
    301 -- Moved Permanently; Document was permanently moved to the new address given in the "Location" header.
    302 -- Found; Document was temporarily moved (similar to 307 -- Temporary Redirect).
    303 -- See Other; Requested document should be acquired with the GET method at a different URL.
    305 -- Use Proxy; The requested URL must be acquired through a proxy server specified with the "Location" header.
    400-499 Incomplete Requests From Client
    400 -- Bad Request; The server couldn't respond because of a syntax error from the client request.
    401 -- Unauthorized; The server responds with this code and the "WWW-Authenticate" header, and the client is expected to respond with an "Authorization" header.
    402 -- Payment Required; Reserved for future use.
    403 -- Forbidden; The server refuses the request and doesn't have to specify why (could be no authorization, or a variety of other reasons).
    404 -- Not Found; The server responds with this when the requested document at the specified URL does not exist.
    405 -- Method Not Allowed; The server responds with the "Allow" header, specifying which methods are allowed.
    406 -- Not Acceptable; The document is not available in a format the client's Accept headers allow. The server can respond with Content-type headers specifying which media types are available.
    409 -- Conflict; Current request conflicts with another request or with the server's configuration.
    413 -- Request Entity Too Large; The server will not process the request because it is too large.
    500-599 Server Errors
    500 -- Internal Server Error; For example, a CGI program crashed.
    501 -- Not Implemented; The server does not support the action requested by the client. An unhelpful response, similar to 403 -- Forbidden. Ideally the server would respond with a more specific reason, like 405 -- Method not allowed, or 406 -- Not acceptable.
    502 -- Bad Gateway; The server, acting as a gateway or proxy, received an invalid response from another server.
    503 -- Service Unavailable; The requested service is unavailable, but the response typically includes a Retry-after header.
    505 -- HTTP Version not supported.
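
Because the first digit decides the family, a client can sort any status code into one of the five groups above even when it doesn't recognize the exact code. Here is a small Python sketch of that idea. The describe function and the family labels are assumptions for illustration; the reason phrases come from Python's standard http.HTTPStatus table.

```python
from http import HTTPStatus

# Family labels keyed by the first digit of the status code.
FAMILIES = {
    1: "Informational",
    2: "Successful",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}


def describe(code):
    """Return the code, its standard reason phrase, and its family."""
    family = FAMILIES.get(code // 100, "Unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # Not in the standard table; the family still tells us how to react.
        phrase = "unrecognized code"
    return f"{code} {phrase} ({family})"


for code in (200, 301, 404, 299):
    print(describe(code))
```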


    Headers

    • General Headers -- Indicate the date, how the connection is maintained, and similar general information.
    • Request Headers -- Used only in client requests; these convey the client's configuration and the desired document.
    • Server Response Headers -- Used only in server responses; these describe the server's configuration and the response.
    • Entity Headers -- Describe the entity-body; used with POST/PUT requests and with server responses.

    General Headers

    • Cache-Control: Specifies the behavior of a caching system, with request directives such as no-cache and max-age=seconds, and response directives such as public, private, no-cache, and no-transform.
    • Connection: Specifies what to do with the connection to the server: close or keep-alive.
    • Date: Specifies the date and time in one of three formats. The preferred RFC 1123 format is: Fri, 06 Oct 2006 05:07:11 GMT. All dates are specified in GMT.
    • Pragma: Specifies proxy and gateway directives, specifically no-cache.
    • Trailer: Lists the headers that appear in the trailer of a chunked message.
    • Transfer-encoding: Specifies the encoding applied to the message for transfer, specifically chunked.
    • Upgrade: With the Upgrade header, the client can specify additional protocols it understands, to which the server can respond with "HTTP/1.1 101 Switching Protocols" and its own header, for example Upgrade: HTTP/1.2.
    • Via: This header is updated by proxy servers as messages pass between client and server. This is especially useful for debugging purposes. The Via header is almost always used with the "MAX-FORWARDS" header in the TRACE method.
    • Warning: Carries additional information about the response, such as a notice that the response is stale.
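
The RFC 1123 date format used by the Date header can be produced and read back with Python's standard library. This is a minimal sketch; the timestamp is just an example value.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

# An example moment in time, expressed in UTC (HTTP calls it GMT).
stamp = datetime(2006, 10, 6, 5, 7, 11, tzinfo=timezone.utc)

# usegmt=True writes the zone as "GMT", which is what HTTP expects.
header_value = format_datetime(stamp, usegmt=True)
print("Date:", header_value)

# Reading the header back gives a timezone-aware datetime again.
parsed = parsedate_to_datetime(header_value)
print(parsed == stamp)
```

Letting the library pick the day name avoids a common slip in hand-written headers, where the weekday doesn't match the date.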

    Client Request Headers

    All of these headers are sent from the client to the server. Most specify a preference, such as accepted image types or character sets; others carry authorization credentials; and some simply name the host with which the client believes it is communicating.
    • Accept: specifies the media types the client prefers. Examples are: text/*, image/gif.
    • Accept-charset: specifies the character set preferred by the client.
    • Accept-encoding: specifies the encoding mechanism preferred by the client. Examples are gzip, compress.
    • Authorization: Provides the client's authorization to access the data at a URL. Under the Basic authorization scheme, the string username:password is encoded in base 64.
    • Cookie: Contains the name=value pair that has been stored for that URL.
    • From: Specifies the e-mail address of the client's user.
    • If-modified-since: Specifies a date; the document is returned only if it has been modified since that date.
    • Host: Shows the hostname and port contacted by the client, that is, the server the client thinks it is (and hopefully is) talking to. An example is www.hostname.com:80.

    Server Response Headers

    Server response headers communicate what the server is doing, details about the server, and details about the response.
    • Accept-ranges: Indicates which range units the server accepts for partial requests. For example: Accept-ranges: bytes.
    • Age: Specifies the age of a document in seconds. For example: Age: 3421.
    • ETag: This displays the entity_tag of a given document. This can be used with the If-match and the If-none-match request headers.
    • Location: Specifies the new URL location of a document. Typically with the response code of 201 -- Created, or 301 -- Moved Permanently.
    • Set-cookie: specifies name=value pairs and uses options such as expires=date, path=pathname, and domain=domain-name.
    • Vary: Specifies that the entity has multiple sources and may therefore vary according to specified list of headers. Examples are: Accept-language, Accept-encoding.

    Entity Headers

    • Allow: Shows which methods the server allows.
    • Expires: Specifies the date and time after which the document should be considered out of date.

    URL Encodings

    URL encoding eliminates ambiguity for special characters passed to CGI programs, such as spaces, "!", and "&". Each such character is replaced by "%" followed by its ASCII code in hexadecimal. Here's a short table:

    Character   ASCII (decimal)   URL encoding
    Space       32                + or %20
    !           33                %21
    @           64                %40
    &           38                %26
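
The same table can be reproduced with Python's urllib.parse, which is one way to check an encoding you're unsure about. This is a small sketch; the sample string at the end is made up for the example.

```python
from urllib.parse import quote, quote_plus, unquote_plus

# Print each character, its ASCII code, and both encodings.
# quote() uses %20 for a space; quote_plus() uses "+" as HTML forms do.
for ch in [" ", "!", "@", "&"]:
    print(repr(ch), ord(ch), quote(ch, safe=""), quote_plus(ch))

# Decoding reverses the process (hypothetical form value).
decoded = unquote_plus("Tom+%26+Jerry%21")
print(decoded)
```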

    Client and Server Identification

    Clients send User-Agent headers (optional), while servers send Server headers. Some benefits of these are:
    • Servers can respond with customized content
    • Surveys and statistics of browsers can be assessed
    • Software that violates HTTP specifications can be tracked

    When a server identifies itself there is a small risk: a user who knows the type and version of the server may be able to exploit a weakness in that version. Some servers therefore omit some of the Server header details, though that may be excessive caution.

    Referring Documents

    The Referer header (spelled this way in the HTTP specification) shows what page the client was on when it clicked on a link, so it is great for debugging sites, measuring where users entered the site, or seeing how a specific page on the site was reached. For example:
    Server says:
    HTTP/1.1 200 OK
    Date: Sun, 04 Nov 2007 05:10:55 GMT
    ETag: "a34f2020"
    Content-length: 3400
    And when the user clicks on the link to the sales.html page, the client sends this header:
    GET /sales.html HTTP/1.1
    Referer: http://www.hostname.com/contact.html

    So it shows that the user reached the sales.html page by following a link on the contact.html page. Great for learning how users navigate your site and which pages typically link to other pages.

    Retrieving Content

    Some responses may omit "Content-length".
    The four ways for a client to receive data from a server are:

    1. Read the size from the "Content-length" header and read in that number of bytes.
    2. If the size of the document is dynamic, or simply not shown, the client receives data until the server closes the connection (or the connection times out).
    3. Another header, like "Transfer-encoding: chunked", shows the client when the document ends.
    4. Byte ranges: the server advertises "Accept-ranges: bytes" and labels a partial response with "Content-range: bytes 0-65535/83000".
      The above header shows which portion is being sent (bytes 0-65535, the first 65536 bytes, or 16 to the 4th) and how large the total file is (83000 bytes). Byte positions start at 0, so the next part would be requested with "Range: bytes=65536-82999".
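
Byte positions in a range are counted from 0 and both ends are inclusive, which makes off-by-one slips easy. The arithmetic can be sketched in Python as follows. The function name byte_ranges is an assumption for the example, and the sizes are the ones used in the list above.

```python
def byte_ranges(total_size, chunk_size):
    """Split a document of total_size bytes into inclusive (first, last) ranges."""
    ranges = []
    first = 0
    while first < total_size:
        # The last byte of this chunk, capped at the final byte of the document.
        last = min(first + chunk_size, total_size) - 1
        ranges.append((first, last))
        first = last + 1
    return ranges


# An 83000-byte document fetched 65536 bytes (16 to the 4th) at a time.
for first, last in byte_ranges(83000, 65536):
    print(f"Range: bytes={first}-{last}")
```

The final byte of an 83000-byte document is number 82999, so the second request stops there.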

    Media Types

    Knowing the media type beforehand lets the client prepare the right handler, for audio or images for example.
    HTTP adopted Internet Media Types, also known as MIME types. The "Accept" header tells the server what the client can accept (the default is all types). Examples are:
    • Accept: */* (all media types accepted)
    • Accept: image/* (all image types accepted)
    • Accept: image/gif (only GIF accepted by the client)

    The server then responds with content-type headers corresponding to that specific media type requested by the client.
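
How a server might check a document's type against the client's Accept header can be sketched in Python. This is a simplified illustration: the function name accepts is an assumption, and the sketch ignores the preference weights (such as ";q=0.8") that a real server would also consider.

```python
def accepts(accept_header, content_type):
    """Return True if content_type matches any media range in accept_header."""
    wanted_type, wanted_subtype = content_type.lower().split("/")
    for item in accept_header.split(","):
        # Drop parameters such as ";q=0.8"; this sketch ignores them.
        media_range = item.split(";")[0].strip().lower()
        if not media_range:
            continue
        range_type, _, range_subtype = media_range.partition("/")
        # "*" is a wildcard that matches any type or subtype.
        if range_type in ("*", wanted_type) and range_subtype in ("*", wanted_subtype):
            return True
    return False


print(accepts("*/*", "text/html"))
print(accepts("image/*", "image/gif"))
print(accepts("image/gif", "image/jpeg"))
```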

    Cookies

    Cookies are not part of the core HTTP specification, but they are a closely connected means of storing data on the client and transferring it back to the server.
    The server sends a "Set-cookie:" header, and the client stores the name=value pair it contains. For example:
    Set-cookie: acct=0234; domain=host.com; expires=Tue, 11 Feb 2003
    When the client visits again, it sees that a cookie is stored for that host and sends it back with "Cookie: acct=0234". The server's CGI program can then send the client a tailored document (usually based on the username).
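
The round trip can be sketched with Python's standard http.cookies module. This is an illustration only: the cookie name and value are taken from the example above, the expiry date is left out to keep the sketch short, and a real browser also checks the domain and path before sending a cookie back.

```python
from http.cookies import SimpleCookie

# Server side: build the Set-Cookie header for the first response.
outgoing = SimpleCookie()
outgoing["acct"] = "0234"
outgoing["acct"]["domain"] = "host.com"
set_cookie_line = outgoing.output()
print(set_cookie_line)

# Client side, on a later visit: send the stored name=value pair back.
cookie_line = "Cookie: " + "; ".join(
    f"{name}={morsel.value}" for name, morsel in outgoing.items()
)
print(cookie_line)

# Server side again: parse the value of the Cookie header.
incoming = SimpleCookie()
incoming.load(cookie_line.split(":", 1)[1].strip())
print(incoming["acct"].value)
```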

    Authorization

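    Used to request protected documents. Under the Basic scheme, the string username:password is base-64 encoded and sent in the "Authorization" header (sometimes the "Digest" scheme is used instead).

    If the credentials are missing or wrong, the server responds with a 401, and the browser typically shows a username/password dialog box.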
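
Base-64 is an encoding, not encryption, which a few lines of Python make plain. The function names here are assumptions for the example, and the username and password are the well-known sample values from the HTTP authentication specification.

```python
import base64


def basic_auth_header(username, password):
    """Build an Authorization header line for the Basic scheme."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return f"Authorization: Basic {token}"


def decode_basic(header_line):
    """Recover the username and password from a Basic Authorization header."""
    # "Authorization: Basic <token>": the token is the last field.
    token = header_line.rsplit(" ", 1)[1]
    username, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return username, password


line = basic_auth_header("Aladdin", "open sesame")
print(line)
print(decode_basic(line))
```

Anyone who can see the header can reverse it just as easily, so Basic authorization is only as private as the connection it travels over.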

    Persistent Connections

    The Connection header carries either keep-alive or close.
    HTTP 1.1 uses keep-alive by default, so that value is not necessary, but "Connection: close" is key: unless the client (or server) sends it, the connection with the server will remain open.

    Client Caching

    Clients cache documents in local storage, and proxy servers can cache documents as well.

    If-modified-since

    This header elicits a server response of:
    200 -- the document was modified (and is returned)
    304 -- not modified
    Servers can send Last-modified headers with the document to let the client know when the last change was made to the document.
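
The server's side of that decision can be sketched in Python. This is a simplified illustration, assuming the server already knows the document's modification time; the function name conditional_status and the sample dates are made up for the example.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def conditional_status(if_modified_since, last_modified):
    """Return 200 if the document changed after the given date, else 304."""
    since = parsedate_to_datetime(if_modified_since)
    # HTTP dates have one-second resolution, so drop microseconds first.
    changed = last_modified.replace(microsecond=0) > since
    return 200 if changed else 304


# Hypothetical modification time of the document on the server.
last_modified = datetime(1998, 10, 16, 13, 13, 4, tzinfo=timezone.utc)

# The client's cached copy is as new as the server's: nothing to resend.
print(conditional_status("Fri, 16 Oct 1998 13:13:04 GMT", last_modified))

# The client's cached copy is older: the server sends the document again.
print(conditional_status("Thu, 15 Oct 1998 00:00:00 GMT", last_modified))
```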

    Entity Tags

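    Entity tags uniquely identify each version of a document, even when two versions share the same URL and modification time. Comparing entity tags (with the If-match and If-none-match headers) is therefore a more exact way than comparing modification dates to tell whether a document has changed.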



    Copyright ©2005 Spyderbyte Web Design • WebmasterContact Spyderbyte Web Design

    130 Palm Tree Lane, Montecito, California 93108 • 773-991-6391
