In this section, we'll learn about the HTTP protocol: how it works, its security aspects, and which methods are supported when performing a request.
This will give you the basic knowledge of HTTP that you need to build tools and to test web applications for security issues.
HTTP was designed to enable communication between clients and servers.
HTTP is a TCP/IP-based communication protocol operating in the application layer. Normally, we use a web browser to interact with web applications, but in this training, we will leave the browser behind and use Python to talk with web applications.
HTTP is media independent. This means that any type of data can be sent via HTTP as long as the client and server know how to handle the data content. It is also stateless, which means that the HTTP server and the client are aware of each other only during the current request/response transaction. Because of this characteristic, neither the client nor the server retains information between requests, which will later be helpful when you perform some attacks.
In this training, we work with the two HTTP/1.x versions of the protocol:
- HTTP/1.0: This uses a new connection for each request/response transaction
- HTTP/1.1: This allows the same connection to be reused for one or more request/response transactions
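To see the difference between the two versions, here is a minimal sketch built on a test setup of our own (not any real application): it starts a throwaway local server with Python's http.server module, sends two GET requests through a single http.client connection object, and counts how many TCP connections the server accepted. The function name count_connections is hypothetical.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def count_connections(protocol_version, num_requests=2):
    """Send num_requests GETs through one HTTPConnection object and return
    how many TCP connections the local test server accepted."""
    connections = []  # one entry per TCP connection the server accepts

    class Handler(BaseHTTPRequestHandler):
        def handle(self):
            connections.append(self.client_address)  # a new TCP connection
            super().handle()  # serves one or more requests on it

        def do_GET(self):
            body = b"hello"
            self.send_response(200)
            # Content-Length tells the client where the body ends, which the
            # client needs before it can reuse the connection.
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):  # silence the default request log
            pass

    Handler.protocol_version = protocol_version  # "HTTP/1.0" or "HTTP/1.1"
    server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
        for _ in range(num_requests):
            conn.request("GET", "/")
            conn.getresponse().read()  # drain the body so the socket is free again
        conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return len(connections)


print(count_connections("HTTP/1.1"))  # 1: one connection carried both requests
print(count_connections("HTTP/1.0"))  # 2: a new connection for each request
```

With the HTTP/1.0 server, the server closes the connection after every response, so http.client silently reconnects for the second request; with HTTP/1.1, the same socket is reused.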
HTTP is not a secure protocol: all communication is sent in clear text, which makes it susceptible to interception and tampering.
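To make the clear-text point concrete, here is a small sketch that builds the raw bytes of a hypothetical login request (the host, the /login path and the username/password field names are made up for illustration) and then reads the credentials back, exactly as anyone observing the traffic could:

```python
from urllib.parse import parse_qs, urlencode


def build_login_request(host, username, password):
    """Build the raw bytes of a hypothetical form-based login request."""
    body = urlencode({"username": username, "password": password})
    head = (
        "POST /login HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"  # body is ASCII, so chars == bytes
        "\r\n"  # blank line ends the headers
    )
    return (head + body).encode("ascii")


def read_like_an_eavesdropper(raw):
    """Recover the form fields from captured bytes: no key is needed."""
    _, _, body = raw.partition(b"\r\n\r\n")
    return {key: values[0] for key, values in parse_qs(body.decode("ascii")).items()}


raw = build_login_request("www.example.com", "alice", "s3cret")
print(raw.decode("ascii"))
print(read_like_an_eavesdropper(raw))  # {'username': 'alice', 'password': 's3cret'}
```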
By default, HTTP is served on port 80. The following is an example of what a simple transaction looks like:
On the left, we have the client, which sends an HTTP GET request to the server asking for the resource test.html. The server returns an HTTP response with a 200 OK status code, some headers, and the content of test.html if it exists on the server.
If it does not exist, it will return a 404 Not Found response code. This represents the most basic GET request in the web application world.
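We can reproduce this transaction from Python without touching a real web application. The sketch below is only an illustration: it serves a temporary directory containing a test.html file with the standard library's http.server module on a random local port, then uses http.client to request test.html and a file that does not exist (missing.html and the page content are hypothetical).

```python
import functools
import http.client
import pathlib
import tempfile
import threading
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


class QuietHandler(SimpleHTTPRequestHandler):
    def log_message(self, *args):  # silence the default request log
        pass


def demo_get(paths):
    """Serve a temporary directory containing test.html and send one GET
    request per path, returning {path: (status code, body)}."""
    results = {}
    with tempfile.TemporaryDirectory() as root:
        pathlib.Path(root, "test.html").write_text("<h1>Hello</h1>")
        handler = functools.partial(QuietHandler, directory=root)
        server = ThreadingHTTPServer(("127.0.0.1", 0), handler)
        threading.Thread(target=server.serve_forever, daemon=True).start()
        try:
            for path in paths:
                conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
                conn.request("GET", path)  # sends "GET <path> HTTP/1.1" plus headers
                response = conn.getresponse()
                results[path] = (response.status, response.read())
                conn.close()
        finally:
            server.shutdown()
            server.server_close()
    return results


results = demo_get(["/test.html", "/missing.html"])
print(results["/test.html"])        # (200, b'<h1>Hello</h1>')
print(results["/missing.html"][0])  # 404
```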
In 1994, HTTPS was introduced to add security on top of HTTP. HTTPS is not a separate protocol in itself, but the result of layering HTTP on top of Secure Sockets Layer (SSL) or Transport Layer Security (TLS).
HTTPS creates a secure channel over an insecure network. This ensures reasonable protection from eavesdroppers and man-in-the-middle attacks, provided that adequate cipher suites are used and that the server certificate is verified and trusted. So, whenever an application handles sensitive information, such as on banking payment, shopping, login, and profile pages, it should use HTTPS. In short, if an application processes or stores customer data, it should use HTTPS.
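In Python, the certificate checks described above are what ssl.create_default_context() turns on. The sketch below builds such a context and hands it to http.client.HTTPSConnection; the TLS 1.2 floor is our own assumption about what counts as adequate, and www.example.com is only a placeholder host.

```python
import http.client
import ssl


def make_tls_context():
    # create_default_context() loads the system's trusted CA certificates,
    # requires a valid certificate and checks that it matches the host name.
    ctx = ssl.create_default_context()
    # Assumption for illustration: refuse anything older than TLS 1.2.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


ctx = make_tls_context()
# Creating the object sends nothing yet; the TLS handshake (and therefore the
# certificate verification) happens when the first request is made.
conn = http.client.HTTPSConnection("www.example.com", 443, context=ctx)
```

If the certificate is invalid or issued for a different host, the first conn.request(...) call raises ssl.SSLCertVerificationError instead of sending any data.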
HTTP methods, also known as HTTP verbs, indicate the desired action to be performed on the chosen resource. HTTP/1.0 defines three methods:
- HEAD: This returns only the status code and the headers, without the content
- GET: This is the standard method used to retrieve resource content given a URI
- POST: This is a method used to submit content to the server, such as form data, files, and so on
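The difference between HEAD and GET is easy to observe. In this minimal sketch, a hypothetical local handler (PageHandler, with made-up page content) answers both methods for the same page: HEAD returns the same status code and headers, including Content-Length, but an empty body.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class PageHandler(BaseHTTPRequestHandler):
    body = b"<h1>test</h1>"  # hypothetical page content

    def send_page_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(self.body)))
        self.end_headers()

    def do_HEAD(self):
        self.send_page_headers()  # status line and headers only

    def do_GET(self):
        self.send_page_headers()
        self.wfile.write(self.body)  # the same headers, followed by the content

    def log_message(self, *args):  # silence the default request log
        pass


def head_vs_get():
    """Request the same page with HEAD and with GET and return
    {method: (status, Content-Length header, body)}."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), PageHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    results = {}
    try:
        for method in ("HEAD", "GET"):
            conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
            conn.request(method, "/test.html")
            response = conn.getresponse()
            results[method] = (response.status,
                               response.getheader("Content-Length"),
                               response.read())
            conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return results


print(head_vs_get())
# {'HEAD': (200, '13', b''), 'GET': (200, '13', b'<h1>test</h1>')}
```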
Then, HTTP/1.1 introduced the following methods:
- OPTIONS: This provides the communication options for the target resource
- PUT: This requests to store a resource identified by the given URI
- DELETE: This removes all representations of the target resource identified by the given URI
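- TRACE: This method echoes the received request so that the client can see what changes or additions have been made by intermediate servers
- CONNECT: This establishes a tunnel to the server identified by the given target; it is typically used to pass HTTPS traffic through a proxy
- PATCH: This method applies partial modifications to a resource (strictly speaking, PATCH was added later, by RFC 5789, rather than by the original HTTP/1.1 specification)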
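When testing an application, a common first step is to ask the server which methods it supports. The following sketch assumes a hypothetical local server (LimitedHandler) that implements only GET, HEAD, and OPTIONS; it sends an OPTIONS request to read the Allow header and then tries methods the server does not implement. Real servers may omit the Allow header or answer OPTIONS differently, so treat the response as a hint rather than a guarantee.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class LimitedHandler(BaseHTTPRequestHandler):
    """Hypothetical server that implements only GET, HEAD and OPTIONS."""

    def do_OPTIONS(self):
        self.send_response(204)  # 204 No Content: the answer is in the headers
        self.send_header("Allow", "GET, HEAD, OPTIONS")
        self.end_headers()

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    do_HEAD = do_GET

    def log_message(self, *args):  # silence the default request log
        pass


def probe(methods, path="/"):
    """Send each method to the local server and return
    {method: (status, Allow header or None)}."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), LimitedHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    results = {}
    try:
        for method in methods:
            conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
            conn.request(method, path)
            response = conn.getresponse()
            response.read()
            results[method] = (response.status, response.getheader("Allow"))
            conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return results


for method, (status, allow) in probe(["OPTIONS", "GET", "PUT", "DELETE"]).items():
    print(method, status, allow)
```

Here, OPTIONS reports the Allow list, while PUT and DELETE come back as 501 Not Implemented, which is how http.server's BaseHTTPRequestHandler answers any method it has no do_<METHOD> handler for.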
HEAD, GET, OPTIONS, and TRACE are by convention defined as safe, which means they are intended only for information retrieval and should not change the state of the server.
On the other hand, methods such as POST, PUT, DELETE, and PATCH are intended for actions that may cause side effects, either on the server itself or elsewhere. There are more methods than these, and I encourage you to explore them.
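This classification is useful when reviewing what a server advertises. As a small sketch (the function name and the comma-separated input format are our own choices), the helper below takes the value of an Allow header and returns the methods that are not in the safe set listed above:

```python
SAFE_METHODS = frozenset({"GET", "HEAD", "OPTIONS", "TRACE"})


def unsafe_methods(allow_header):
    """Return the methods in an Allow header value (e.g. "GET, PUT") that are
    not considered safe, preserving the server's order."""
    methods = [m.strip().upper() for m in allow_header.split(",") if m.strip()]
    return [m for m in methods if m not in SAFE_METHODS]


print(unsafe_methods("GET, HEAD, OPTIONS, PUT, DELETE"))  # ['PUT', 'DELETE']
```

A non-empty result is not a vulnerability by itself; it simply points at the methods worth testing next.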
We have seen that HTTP is a stateless client-server protocol.
It doesn't provide any security, which is why HTTPS was created to add a secure layer on top of HTTP. We also learned that different methods instruct the server to perform different actions on the chosen resource.
In this section, we'll take a look at the structure of a URL, the request and response headers, and an example of a GET request using Telnet to understand how it works at a low level.
I bet you have seen thousands of URLs by now. It's now time to stop and think about the URL structure. Let's see what each part means:
The first part is the protocol. In web applications, the two protocols used are HTTP and HTTPS. By default, HTTP uses port 80 and HTTPS uses port 443.
The next part is the host we want to contact. Next comes the location of the resource or file on that server; in this example, the directory is content and the resource is section. Then, the question mark symbol indicates that what follows is the query string. These are the parameters that will be passed to the section resource for processing.
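Python's standard library can split a URL into these parts for us. The example URL below is hypothetical (the id and lang parameters are made up), and the default-port table simply encodes the 80/443 convention described above:

```python
from urllib.parse import parse_qs, urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def describe_url(url):
    """Split a URL into the parts discussed above."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,  # protocol: http or https
        "host": parts.hostname,  # the host we want to contact
        # fall back to the scheme's default port when none is given
        "port": parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme),
        "path": parts.path,  # directory and resource on the server
        "query": parse_qs(parts.query),  # parameters after the "?"
    }


print(describe_url("https://www.example.com/content/section?id=1&lang=en"))
# {'scheme': 'https', 'host': 'www.example.com', 'port': 443,
#  'path': '/content/section', 'query': {'id': ['1'], 'lang': ['en']}}
```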
There are some alternatives, such as adding a username and password for authentication before the host, or explicitly defining the port when the web server is not listening on the standard 80 or 443 ports.
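The same urlsplit function also exposes these optional parts. In this hypothetical URL, alice and s3cret are placeholder credentials and 8080 is an arbitrary non-standard port:

```python
from urllib.parse import urlsplit

# Hypothetical URL: placeholder credentials and a non-standard port.
parts = urlsplit("http://alice:s3cret@www.example.com:8080/content/section?id=1")

print(parts.username)  # alice
print(parts.password)  # s3cret
print(parts.hostname)  # www.example.com
print(parts.port)      # 8080
```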
Now, let's talk about headers. Headers are a core part of HTTP requests a...