These notes are for learning/educational purposes only. Use them at your own risk.
THE HTTP PROTOCOL
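HTTP is a stateless, message-based protocol: a client sends an HTTP request to a web server and receives an HTTP response. Although HTTP uses TCP as its transport, each request/response exchange is an independent transaction. The protocol itself forms no session and remembers nothing between requests.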
The HTTP Request
The first line of every HTTP request consists of three items, separated by spaces.
- Verb indicating the HTTP method, e.g. GET, POST, PUT, DELETE, OPTIONS, HEAD, TRACE, CONNECT
- Requested URL for a particular resource
- HTTP version being used, e.g. HTTP/1.0 or HTTP/1.1
The remaining lines in the head of an HTTP request contain HTTP headers such as Host, Referer, User-Agent, Cookie, Connection, Accept and Accept-Encoding.
- The ‘Host’ header – mandatory in HTTP/1.1. It specifies hostname. Essential when multiple hosts run on same IP like in case of virtual hosting.
- The ‘User-Agent’ header – provides info about browser or client software which generates request.
- The ‘Referer’ header – contains URL from which request has been originated.
- The ‘Cookie’ header – contains parameters (name value pairs) that server has issued to client.
- The ‘Accept’ header – tells the server what kinds of content the client can accept, e.g. image types, document formats etc.
- The ‘Accept-Encoding’ header – tells server what content encoding client can accept.
- The ‘Authorization’ header – submits credentials to server for HTTP authentication types.
- The ‘If-Modified-Since’ header – specifies when the client last received the requested resource. If the resource has not been modified since then, the server can reply with status code 304, telling the client to use its cached copy.
- The ‘If-None-Match’ header – specifies an entity tag, an identifier denoting the content of the message body. The client submits the entity tag that the server issued with the resource, and the server uses it to determine whether the client can use its cached copy.
- The ‘Origin’ header – used in cross domain AJAX requests to indicate the domain from which request has originated.
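The structure described above (a three-part first line followed by header lines) can be sketched in a few lines of Python. This is a minimal illustration, not a complete HTTP parser; the function name `parse_request_head` and the sample request are assumptions made up for this example.

```python
def parse_request_head(raw: bytes):
    """Split a raw HTTP request head into (method, url, version, headers)."""
    # The head ends at the first blank line; anything after it is the body.
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    lines = head.split("\r\n")
    # First line: verb, requested URL, HTTP version, separated by spaces.
    method, url, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalise them to lower case.
        headers[name.strip().lower()] = value.strip()
    return method, url, version, headers


# Hypothetical request, for illustration only.
SAMPLE_REQUEST = (
    b"GET /books/search?q=http HTTP/1.1\r\n"
    b"Host: example.com\r\n"
    b"User-Agent: demo-client/1.0\r\n"
    b"Accept: text/html\r\n"
    b"Cookie: session=abc123\r\n"
    b"\r\n"
)

method, url, version, headers = parse_request_head(SAMPLE_REQUEST)
print(method, url, version)
print(headers["host"], headers["cookie"])
```

The sketch keeps only the last value when a header name repeats, which is enough for the request headers listed above.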
The HTTP Response
The first line contains the HTTP version, a numeric response status code, and a textual ‘phrase’ describing the response. Common response headers include:
- The ‘Server’ header – contains a banner indicating the web server software in use. The information may not be accurate; it is often obfuscated for security reasons to hinder information gathering.
- The ‘Set-Cookie’ header – issues the browser a further cookie, which is submitted back in the Cookie header of subsequent requests to this server.
- The ‘Pragma’ header – with the value no-cache, tells the browser not to store the response in its cache.
- The ‘Content-Length’ header – gives the length of the message body in bytes.
- The ‘Expires’ header – specifies the date and time until which the body content is valid. The browser may use its cached copy until then; afterwards the resource is re-requested.
- The ‘Access-Control-Allow-Origin’ header – indicates whether the resource can be retrieved via cross-domain AJAX requests.
- The ‘Cache-Control’ header – passes caching directives to the browser (e.g. no-cache).
- The ‘ETag’ header – specifies an entity tag. The client can submit it in the ‘If-None-Match’ header of later requests for the same resource.
- The ‘Location’ header – used in redirection responses (status codes starting with 3) to specify the target of the redirect.
- The ‘WWW-Authenticate’ header – used in responses with a 401 status code to give details of the authentication types the server supports.
- The ‘X-Frame-Options’ header – indicates whether and how the current response may be loaded within a browser frame.
Note: See Chapter 13 for an explanation of a few of these headers in relation to cross-domain AJAX requests.
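As a rough sketch of the response layout described in this section, the snippet below splits a response head into its status line and headers. The helper name `parse_response_head` and the sample response are hypothetical. Header values are collected in lists because a header such as Set-Cookie may legitimately appear more than once.

```python
def parse_response_head(raw: bytes):
    """Split a raw HTTP response head into (version, code, phrase, headers)."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    lines = head.split("\r\n")
    # First line: HTTP version, numeric status code, textual phrase.
    version, code, phrase = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Keep every occurrence of a repeated header.
        headers.setdefault(name.strip().lower(), []).append(value.strip())
    return version, int(code), phrase, headers


# Hypothetical response, for illustration only.
BODY = b"<html></html>"
SAMPLE_RESPONSE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Server: demo-server\r\n"
    b"Set-Cookie: session=abc123\r\n"
    b"Set-Cookie: theme=dark\r\n"
    b"Cache-Control: no-cache\r\n"
    b"Content-Length: " + str(len(BODY)).encode("ascii") + b"\r\n"
    b"\r\n" + BODY
)

version, code, phrase, headers = parse_response_head(SAMPLE_RESPONSE)
print(version, code, phrase)
print(headers["set-cookie"])
```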
HTTP General headers
- The ‘Connection’ header – tells the other end of the communication whether to close the TCP connection after the HTTP transmission or keep it open for further messages.
- The ‘Content-Encoding’ header – specifies the encoding used for the content in the message body, e.g. gzip, which some applications use to compress responses for faster transmission.
- The ‘Content-Type’ header – specifies the type of content in the message body, e.g. text/html, text/xml, application/json.
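To make the role of Content-Encoding concrete, here is a small sketch that gzip-compresses a body and then undoes the compression based on the header value. The header dictionary and the `decode_body` helper are assumptions for illustration; a real client library normally does this transparently.

```python
import gzip

# Hypothetical, highly repetitive body so the compression gain is obvious.
body = b"<html>" + b"hello " * 200 + b"</html>"
compressed = gzip.compress(body)

response_headers = {
    "Content-Type": "text/html",
    "Content-Encoding": "gzip",
    # Content-Length describes the bytes actually sent, i.e. the compressed body.
    "Content-Length": str(len(compressed)),
    "Connection": "keep-alive",
}


def decode_body(headers: dict, payload: bytes) -> bytes:
    """Return the original body, undoing gzip if the header announces it."""
    if headers.get("Content-Encoding") == "gzip":
        return gzip.decompress(payload)
    return payload


print(len(body), "bytes before,", len(compressed), "bytes on the wire")
print(decode_body(response_headers, compressed) == body)
```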
The HTTP Methods
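- GET method is designed for retrieving resources. Parameters can be sent in the URL query string. URLs may be logged on the server, stored in browser history and bookmarks, and passed on in the Referer header, so never put sensitive information in the query string.
- POST method is designed for performing actions. Request parameters can be sent in the URL query string and in the message body. Parameters in the body are not saved in bookmarks and are usually not written to server logs.
- HEAD method works like GET, but the server returns only the headers, not the body. Used to check whether a resource is available.
- OPTIONS asks the server which HTTP methods are available for a resource, typically reported in the ‘Allow’ response header.
- PUT uploads the request body to the server as the specified resource. Usually disabled, but may be used by APIs.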
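The difference between where GET and POST carry their parameters can be shown with the standard library's `urllib.request.Request` objects. Nothing is sent over the network here; the objects are only constructed and inspected. The URL and parameter names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint and parameters, for illustration only.
BASE = "http://example.com/cart"
params = {"user": "alice", "item": "42"}

# GET: parameters travel in the URL query string (and may end up in logs).
get_req = Request(BASE + "?" + urlencode(params), method="GET")

# POST: parameters travel in the message body instead.
post_req = Request(BASE, data=urlencode(params).encode("ascii"), method="POST")

print(get_req.get_method(), get_req.full_url, get_req.data)
print(post_req.get_method(), post_req.full_url, post_req.data)
```

The GET request exposes the parameters in its URL, while the POST request keeps the URL clean and carries the same data as bytes in the body.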
Cookies
Cookies are issued by the server in the ‘Set-Cookie’ response header and resubmitted by the client in each subsequent request to the same domain.
Optional attributes in the ‘Set-Cookie’ header are:
- expires – date until which the cookie is valid. If not specified, the cookie is valid only until the browser is closed.
- domain – specifies the domain for which the cookie is valid. This must be the same as, or a parent of, the domain from which the cookie is received (a scoping rule related to, but distinct from, the same-origin policy).
- path – specifies the URL path for which the cookie is valid.
- secure – if set, the cookie is transmitted only in HTTPS requests.
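The attributes above can be read out of a Set-Cookie value with Python's `http.cookies.SimpleCookie`. This is a sketch only: the cookie string is made up, and `parse_set_cookie` is a hypothetical helper that looks at the first cookie in the header.

```python
from http.cookies import SimpleCookie


def parse_set_cookie(header_value: str) -> dict:
    """Return the name, value and selected attributes of one Set-Cookie value."""
    jar = SimpleCookie()
    jar.load(header_value)
    name, morsel = next(iter(jar.items()))
    return {
        "name": name,
        "value": morsel.value,
        "domain": morsel["domain"],
        "path": morsel["path"],
        # Flag attributes carry no value; their presence is what matters.
        "secure": bool(morsel["secure"]),
        "expires": morsel["expires"],
    }


# Hypothetical cookie, for illustration only.
SAMPLE = (
    "session=abc123; Domain=example.com; Path=/app; Secure; "
    "Expires=Wed, 21 Oct 2026 07:28:00 GMT"
)

info = parse_set_cookie(SAMPLE)
print(info)
```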
Status Code Groups
- 1xx – Informational
- 2xx – Request was successful.
- 3xx – Client is redirected to different resource
- 4xx – Request contains an error of some kind
- 5xx – Server encountered an error fulfilling the request
Important Status Codes
- “200 OK” – request succeeded, response body contains result
- “301 Moved Permanently” – redirects browser to another location, client should use new URL in future
- “302 Found” – temporary redirect
- “304 Not Modified” – browser should use cached copy
- “400 Bad Request” – invalid HTTP request
- “401 Unauthorized” – server requires HTTP authentication; the ‘WWW-Authenticate’ header tells which types of authentication are supported
- “403 Forbidden” – no one is allowed to access the requested resource, regardless of authentication
- “404 Not Found” – requested resource does not exist
- “500 Internal Server Error” – unhandled exception, server side error
- “503 Service Unavailable” – the web server itself is responding, but the application accessed through it is not
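The grouping by first digit and the standard phrases can be looked up programmatically. The sketch below uses `http.HTTPStatus` from the standard library; the `GROUPS` table and the `describe` helper are names invented for this example.

```python
from http import HTTPStatus

# Group names keyed by the first digit of the status code.
GROUPS = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}


def describe(code: int) -> str:
    """Return 'code phrase (group)' for an HTTP status code."""
    group = GROUPS.get(code // 100, "Unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # Codes not defined in the standard library's table.
        phrase = "Unknown"
    return f"{code} {phrase} ({group})"


for code in (200, 301, 302, 304, 400, 401, 403, 404, 500, 503):
    print(describe(code))
```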
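Relying on a redirect from HTTP to HTTPS is not safe. The initial request travels unencrypted, so an attacker positioned on the network can intercept it, hijack the session, and keep the victim on an insecure connection.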
Uniform Resource Locator (URL)
- If the protocol is absent, it defaults to HTTP.
- If the port is absent, the default port for the protocol is used, e.g. 80 for HTTP and 443 for HTTPS.
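The port rule can be checked with `urllib.parse.urlsplit`, which reports `None` for the port when the URL does not state one. The `DEFAULT_PORTS` table and `effective_port` helper are assumptions for this sketch, and the URLs are hypothetical.

```python
from urllib.parse import urlsplit

# Default ports for the two protocols mentioned above.
DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(url: str):
    """Return the explicit port of a URL, or the protocol default if absent."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS.get(parts.scheme)


parts = urlsplit("https://example.com:8443/shop/item?id=7")
print(parts.scheme, parts.hostname, parts.port, parts.path, parts.query)
print(effective_port("http://example.com/"))
print(effective_port("https://example.com/"))
```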
Representational State Transfer (REST)
- Architectural style for distributed systems.
- In the REST style, a URL contains parameters in the URL file path rather than in the query string, e.g.
- Corresponds to REST style as:
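As a hypothetical illustration of the two styles, the sketch below rewrites a query-string URL so that the parameter values become path segments. The host example.com, the parameter names, and the `to_rest_style` helper are all assumptions and are not taken from these notes; real applications define their own path layout.

```python
from urllib.parse import parse_qsl, quote, urlsplit


def to_rest_style(url: str) -> str:
    """Move query-string parameter values into the URL path, in order."""
    parts = urlsplit(url)
    # Each value becomes one path segment; encode anything unsafe in a path.
    segments = [quote(value, safe="") for _, value in parse_qsl(parts.query)]
    path = parts.path.rstrip("/")
    if segments:
        path += "/" + "/".join(segments)
    return f"{parts.scheme}://{parts.netloc}{path}"


query_style = "http://example.com/search?make=acme&model=rocket"
print(query_style)
print(to_rest_style(query_style))
```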
HTTP has its own authentication mechanisms, using various schemes:
- Basic : the client sends credentials as a Base64-encoded string in the ‘Authorization’ request header.
- NTLM : a challenge-response mechanism that uses a version of the Windows NTLM protocol.
- Digest : a challenge-response mechanism that uses MD5 checksums of a nonce combined with the user's credentials.
On their own these schemes offer weak protection (Basic credentials are only encoded, not encrypted), so they should be used over HTTPS.
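The weakness of the Basic scheme is easy to demonstrate: the credential is only Base64-encoded, so anyone who sees the header can reverse it. The helper names and the user/pass credentials below are made up for this sketch.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an Authorization header for the Basic scheme."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def decode_basic_auth(header_value: str):
    """Recover (username, password) from a Basic Authorization value."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic credential")
    username, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return username, password


header = basic_auth_header("user", "pass")
print("Authorization:", header)
# No key or secret is needed to reverse the encoding.
print(decode_basic_auth(header))
```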
HTTPS
- HTTP over SSL (Secure Sockets Layer).
- SSL has been superseded by TLS (Transport Layer Security). The original SSL versions are deprecated, although the name SSL is still widely used.
- SSL/TLS encrypts data while in transit between client and server.
HTTP Proxies
- The browser sends its requests to the proxy server. The proxy fetches the resources and sends them back to the client.
- Proxies can provide caching, authentication, and access control.
- When HTTPS is used, the browser cannot perform the SSL handshake with the proxy, because that would break the secure tunnel to the destination server. Instead the browser uses the proxy as a pure TCP-level relay that passes data in both directions between browser and server. To establish the relay, the browser sends the proxy an HTTP request using the CONNECT method, specifying the destination hostname and port as the URL. If the proxy allows the request, it returns a 200 status and keeps the TCP connection open.
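The CONNECT exchange can be sketched as plain bytes. The snippet below only builds the request a client would send to a proxy and checks a status line for the 200 code; it does not open any sockets. Both helper names and the sample status lines are assumptions for illustration.

```python
def build_connect_request(host: str, port: int = 443) -> bytes:
    """Build the CONNECT request asking a proxy to relay to host:port."""
    target = f"{host}:{port}"
    lines = [
        f"CONNECT {target} HTTP/1.1",  # hostname and port act as the URL
        f"Host: {target}",
        "",  # blank line ends the request head
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def tunnel_established(status_line: str) -> bool:
    """Return True if the proxy's status line carries a 200 status code."""
    parts = status_line.split(" ", 2)
    return len(parts) >= 2 and parts[1] == "200"


print(build_connect_request("example.com"))
print(tunnel_established("HTTP/1.1 200 Connection established"))
print(tunnel_established("HTTP/1.1 403 Forbidden"))
```

After a successful reply the client would start the SSL handshake with the destination server through the same TCP connection, with the proxy relaying bytes it cannot read.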
HTTPS and MITM Attacks
- HTTPS connections use public-key cryptography to set up encryption, so only the two endpoints can decrypt the traffic.
- Companies wanting to restrict HTTPS traffic have two choices:
- Perform a complete MITM (man-in-the-middle) interception, using fake certificates or certificates issued under a root certificate that clients trust.
- Allow encrypted traffic to trusted domains only, without the possibility of inspecting it.
Same Origin Policy
- A page residing on one domain can cause an arbitrary request to be made to another domain, but it cannot itself process the data returned from that request.
- A page residing on one domain can load a script from another domain and execute this within its own context. This is because scripts are assumed to contain code rather than data. So cross domain access should not lead to disclosure of any sensitive information.
- A page residing on one domain cannot read or modify cookies or other DOM data belonging to another domain.
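The notes above speak of "domains". In practice browsers compare the whole origin, meaning protocol, hostname and port together. A minimal sketch of that comparison follows; the `origin` and `same_origin` helpers and the URLs are hypothetical, and real browsers apply additional rules not modelled here.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str):
    """Return the (scheme, host, port) triple that identifies a URL's origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    # Fill in the default port so that ':80' and no port compare as equal.
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    return scheme, (parts.hostname or "").lower(), port


def same_origin(url_a: str, url_b: str) -> bool:
    return origin(url_a) == origin(url_b)


print(same_origin("http://example.com/a", "http://example.com:80/b"))
print(same_origin("http://example.com/", "https://example.com/"))
print(same_origin("http://example.com/", "http://api.example.com/"))
```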
Web 2.0
- Heavy use of AJAX for asynchronous (behind-the-scenes) requests.
- Increased cross-domain integration using various techniques.
- Use of new technologies on the client side, e.g. XML, JSON and Flex.
- Support for user-generated content, information sharing and interaction.
State and Sessions
Stateful data is required to supplement the stateless HTTP protocol. On the server this data is held in a structure called a session. Some state data is stored on the client, often in cookies or hidden form fields.
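A server-side session can be pictured as a dictionary keyed by an unpredictable token that the client sends back with each request, typically in a cookie. The `SessionStore` class below is a bare-bones sketch under that assumption. It ignores expiry, persistence and concurrency, which real frameworks handle.

```python
import secrets


class SessionStore:
    """In-memory session store: token -> per-user state."""

    def __init__(self):
        self._sessions = {}

    def create(self) -> str:
        # 16 random bytes rendered as 32 hex characters; hard to guess.
        token = secrets.token_hex(16)
        self._sessions[token] = {}
        return token

    def get(self, token: str):
        """Return the session data for a token, or None if it is unknown."""
        return self._sessions.get(token)


store = SessionStore()
token = store.create()            # would be issued via Set-Cookie
store.get(token)["basket"] = ["book"]

print(store.get(token))           # state survives between requests
print(store.get("forged-token"))  # unknown tokens have no session
```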
URL Encoding
URLs may contain only printable ASCII characters (0x20 to 0x7e, inclusive). To transfer other characters, or characters with a special meaning in a URL, they must be URL-encoded.
The URL-encoded form is the % prefix followed by the character’s two-digit ASCII code expressed in hex, e.g.
- = (%3d)
- % (%25)
- Space (%20)
- New Line (%0a)
- Null byte (%00)
The + character also represents a URL-encoded space (in addition to %20).
Always URL-encode the following characters when inserting them as data during web app security testing:
space % ? & = ; + #
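The encodings listed above can be reproduced with `urllib.parse`. Note that `quote` emits upper-case hex digits (%3D rather than %3d); both forms decode to the same character. The sample strings are made up for this sketch.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

# The characters that should always be encoded when sent as data.
SPECIALS = " %?&=;+#"

# safe="" forces every reserved character to be encoded.
encoded = quote(SPECIALS, safe="")
print(encoded)

# Decoding accepts lower-case hex as used in the list above.
print(repr(unquote("%3d%25%20")))
print(repr(unquote("%0a")), repr(unquote("%00")))

# '+' means space only in the form-style variant.
print(quote_plus("a b"), repr(unquote_plus("a+b")), repr(unquote("a+b")))
```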
Unicode Encoding
- Unicode is a character encoding standard designed to support all of the world’s writing systems.
- For transmission over HTTP, the 16-bit Unicode-encoded form of a character is the %u prefix followed by the character’s Unicode code point expressed in hexadecimal. This form is non-standard and is accepted only by some platforms.
- UTF-8 is a variable-length encoding standard that uses one or more bytes to express each character. For transmission over HTTP, each byte of a multibyte character is expressed in hexadecimal and preceded by the % prefix, e.g.
Copyright symbol is %c2%a9
- Unicode encoding can sometimes be used to defeat input validation mechanisms.
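Both forms can be produced in a few lines. `urllib.parse.quote` encodes non-ASCII text as UTF-8 by default. The %u form has no standard-library support, so the `percent_u` helper below is a hand-written assumption that handles only characters up to U+FFFF.

```python
from urllib.parse import quote, unquote

COPYRIGHT = "\u00a9"  # the copyright symbol


def percent_u(text: str) -> str:
    """Encode each character as %u plus its four-digit hex code point.

    Sketch only: assumes every character is at or below U+FFFF.
    """
    return "".join(f"%u{ord(ch):04x}" for ch in text)


# UTF-8 form: each byte of the multibyte sequence gets its own % prefix.
utf8_form = quote(COPYRIGHT).lower()
print(utf8_form)

# 16-bit Unicode form.
print(percent_u(COPYRIGHT))

# Decoding the UTF-8 form gives the original character back.
print(unquote("%c2%a9") == COPYRIGHT)
```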
HTML Encoding
HTML entities are used to represent specific literal characters, e.g.
&quot; = "
&apos; = '
&amp; = &
&lt; = <
&gt; = >
Also, any character can be HTML-encoded using its ASCII code in decimal form, e.g.
&#34; = "
&#39; = '
OR, by using its ASCII code in hex, e.g.
&#x22; = "
&#x27; = '
HTML-encoding user-supplied data before it is displayed to another user helps prevent XSS.
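Python's `html` module shows both directions: `html.escape` encodes the characters that matter for markup, and `html.unescape` resolves named, decimal and hex references alike. The payload string is a made-up example.

```python
import html

# Hypothetical user-supplied input containing markup.
payload = '<script>alert("xss")</script>'

# Encoded output is displayed as text instead of being run as script.
safe = html.escape(payload)
print(safe)

# By default single quotes are escaped too, using the hex form.
print(html.escape("'"))

# Named, decimal and hex forms all decode to the same characters.
print(html.unescape("&quot;&apos;&amp;&lt;&gt;"))
print(html.unescape("&#34;&#39;"), html.unescape("&#x22;&#x27;"))
```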
Base64 Encoding
- Represents binary data using only printable ASCII characters. The alphabet has 64 characters (A-Z, a-z, 0-9, + and /). Input is processed in blocks of 3 bytes, each split into four 6-bit chunks, and each chunk maps to one character. If the final input block has fewer than 3 bytes, the output is padded with one or two = characters.
- Commonly used to encode email attachments for transmission over SMTP.
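The padding rule is easiest to see by encoding inputs of one, two and three bytes. The `encode` wrapper is just a convenience name for this sketch around the standard `base64` module.

```python
import base64


def encode(data: bytes) -> str:
    """Base64-encode bytes and return the result as text."""
    return base64.b64encode(data).decode("ascii")


# 1 byte -> two '=' pads, 2 bytes -> one '=' pad, 3 bytes -> no padding.
for sample in (b"a", b"ab", b"abc"):
    print(sample, "->", encode(sample))

# Decoding restores the original bytes exactly.
print(base64.b64decode("YWJj"))
```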
Hex Encoding
- Represents each byte of the data as two hexadecimal digits, so ASCII text becomes the hex code of each of its characters.
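A short sketch of hex encoding using the built-in `bytes.hex` and `bytes.fromhex` methods. The helper names and the sample string are assumptions for illustration.

```python
def hex_encode(text: str) -> str:
    """Return the hex code of each ASCII character, concatenated."""
    return text.encode("ascii").hex()


def hex_decode(hex_string: str) -> str:
    """Reverse hex_encode."""
    return bytes.fromhex(hex_string).decode("ascii")


encoded = hex_encode("daf")
print(encoded)              # d = 0x64, a = 0x61, f = 0x66
print(hex_decode(encoded))
```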