When I started programming and web development as a teenager, I didn’t quite understand how this whole thing with browsers, servers, TCP, HTTP, headers, cookies and so on worked. So I’ll introduce those things slowly and then move more and more into the security aspects. I will try to keep this series fairly high-level, but I will certainly dig into the lower-level networking parts at some point.
The local machine or computer
For now, we will stay on our local machine and won’t make any connections to a server. We start with displaying, or rendering, websites in the browser. You already know that browsers can render HTML files.
So let’s start by constructing a simple one. The first line is the doctype. In earlier HTML versions it was more important, because browsers used it to choose between different rendering modes.
But nowadays it’s mostly a formality: the short HTML5 doctype, <!DOCTYPE html>, mainly keeps the browser in standards mode instead of quirks mode.
The HTML skeleton
Then comes the standard HTML skeleton.
The html tag contains a head and a body. Oh, and in case it wasn’t clear: HTML is hierarchical. A tag opens and later closes again, and in between you can have more tags. (A few so-called void elements, like img or input, have no closing tag.)
So you should always read HTML in pairs of opening and closing tags and think of it as a hierarchy. Indentation helps with that. Let’s add a heading with h1 and some text in paragraphs with p. Let’s also add an image with the img tag and, while we are at it, a link to another site with the a tag. There are a lot of different HTML tags that provide different user interface features. Some are for content like headings, text or images. Others provide input elements for forms, such as a simple text box and a submit button. Generally, the syntax for a tag is like this: a less-than sign (<), immediately followed by the tag name without a space.
After that, you can have multiple key-value pairs, written as key="value" and separated by spaces. These are called the attributes of an HTML tag. The tag then ends with a greater-than sign (>). The values should be quoted, but they don’t have to be. That’s another small detail that can be helpful for certain kinds of attacks. Now we can open this file in the browser.
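To make the tag and attribute syntax concrete, here is a minimal sketch that feeds a page like the one above into Python’s standard-library html.parser and lists every start tag with its attributes. The title, texts, image name, link target and form field names are placeholders I made up, and the href value is deliberately left unquoted to show that it is still read as one value. This is only a parser for illustration, not what the browser runs internally.

```python
from html.parser import HTMLParser

# A minimal page with the elements described above. The title, texts,
# image file, link target and form field names are made-up placeholders.
PAGE = """<!DOCTYPE html>
<html>
  <head>
    <title>My first page</title>
  </head>
  <body>
    <h1>Hello!</h1>
    <p>Some text in a paragraph.</p>
    <img src="cat.png" alt="A cat">
    <a href=https://example.com>An unquoted link</a>
    <form>
      <input type="text" name="q">
      <input type="submit" value="Send">
    </form>
  </body>
</html>
"""


class TagLister(HTMLParser):
    """Records every start tag together with its attributes."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as (name, value) pairs with the quotes already removed
        self.tags.append((tag, dict(attrs)))


lister = TagLister()
lister.feed(PAGE)
lister.close()

for tag, attrs in lister.tags:
    print(tag, attrs)
```

Saving PAGE to a file such as index.html and opening it in the browser gives you the page from this section.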
The browser display
The browser parses the HTML code and starts drawing the elements to the screen. When you right-click somewhere, you can select “inspect element”, which opens the Chrome developer tools. Other modern browsers offer similar tools. In the first tab, called “Elements”, you can see all the elements of the HTML document.
It has a nice hierarchical display where you can expand and collapse tags. You can also go in there and change the HTML. Of course, this only changes what the browser currently displays, and when you refresh you see the original file again. You are not modifying the actual file on your hard drive. Depending on your sense of humor, that might already be enough to have some fun: fake the content of a website, take a screenshot and post it to Facebook, claiming that it was real.
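The Elements tab shows the document as a tree. As a rough offline analogue (not how Chrome builds its view), this sketch walks some HTML with Python’s html.parser and prints an indented outline of tags and text, one line per node. It assumes the non-void tags are closed properly; unlike a browser, it does not repair broken markup. The sample page is made up.

```python
from html.parser import HTMLParser

# The HTML void elements: they never get a closing tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class TreePrinter(HTMLParser):
    """Builds an indented outline of tags and text, one line per node."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.lines = []

    def handle_starttag(self, tag, attrs):
        self.lines.append("  " * self.depth + tag)
        if tag not in VOID:
            self.depth += 1  # children of this tag are indented one level deeper

    def handle_endtag(self, tag):
        if tag not in VOID:
            self.depth = max(self.depth - 1, 0)

    def handle_data(self, data):
        text = data.strip()
        if text:  # skip whitespace-only text between tags
            self.lines.append("  " * self.depth + f'"{text}"')


page = ("<html><head><title>Hi</title></head><body>"
        "<h1>Hello</h1><p>Text <a href='/x'>link</a></p>"
        "<img src='cat.png'></body></html>")

printer = TreePrinter()
printer.feed(page)
printer.close()
print("\n".join(printer.lines))
```

Reading the printed outline top to bottom is the same exercise as reading HTML in pairs of tags: each deeper indentation level is one tag nested inside another.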
Browsers are also very good at something else. HTML looks a bit like a programming language (strictly speaking, it is a markup language). But when you write some weird HTML code, the page doesn’t break the way a Python or C program would.
Browsers are very good at “fixing” the crappy HTML code we write. Fixing is maybe the wrong word.
Let’s say they are very liberal in what they accept. Here you see some examples. We put the form inside of a paragraph (p) tag, which is a violation of the standard. But in the inspect view you can see how the browser fixed that, by making a p tag before and after the form.
They also don’t care if you forget to close your tags. They don’t complain if you use weird characters in weird places, and they don’t complain about IDs, which are supposed to be unique, being used multiple times. You can imagine that this might aid in exploitation if you are able to modify the HTML of a site, or more generally inject new HTML into it. You can use this to get around certain restrictions. For example, if you cannot use slashes for some reason, then simply don’t close the tag.
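A browser renders a snippet like the one in this sketch without a single warning. To see what it silently tolerates, here is a small checker of my own (not part of any browser) that uses Python’s html.parser to list repeated id values and tags that were never explicitly closed. It only reports; it does not reproduce the browser’s repair rules. The sample markup and the id "box" are made up.

```python
from collections import Counter
from html.parser import HTMLParser

# The HTML void elements: they never get a closing tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class LaxnessReport(HTMLParser):
    """Collects things a browser silently tolerates: repeated ids and unclosed tags."""

    def __init__(self):
        super().__init__()
        self.ids = Counter()
        self.open_tags = []  # stack of currently open elements
        self.unclosed = []   # elements that were never explicitly closed

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value is not None:
                self.ids[value] += 1
        if tag not in VOID:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray end tag with nothing to close; ignore it
        # Pop until the matching start tag; everything popped on the way was left open
        while True:
            top = self.open_tags.pop()
            if top == tag:
                break
            self.unclosed.append(top)

    def report(self):
        return {
            "duplicate_ids": sorted(i for i, n in self.ids.items() if n > 1),
            # Note: some of these are legal, e.g. the end tag of p is optional in HTML
            "unclosed": self.unclosed + self.open_tags,
        }


sloppy = """<div id="box"><p>First paragraph
<p>Second paragraph
<span id="box">same id again</span>
</div>"""

checker = LaxnessReport()
checker.feed(sloppy)
checker.close()
print(checker.report())  # {'duplicate_ids': ['box'], 'unclosed': ['p', 'p']}
```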
Anyhow, besides HTML there are a few other very common technologies.
Second is CSS
CSS is another language, which allows you to style the HTML page, for example by changing colors. You have multiple options to add it.
You can give each HTML element a style attribute, you can use a style tag, or you can reference a separate CSS file. CSS syntax is also super easy. You start with an identifier called the selector, so first you have to decide which HTML tags you want to style. You can define styles for all tags of a certain kind, like p, or for all tags with a certain class by putting a dot before the class name, like .classname.
Or only for the tag with a certain ID, using a hash symbol, like #someid. You can also combine these: input.classname references all input elements that have this class name.
Or you can comma-separate selectors if you want to give them the same style. Or you can select hierarchically: div p only styles a paragraph if it is inside of a div. Inside of curly braces you then define key-value pairs again, this time written as key: value and separated by semicolons. There are many different styling features, such as text colors, fonts, margins and paddings.
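To make the selector rules concrete, here is a deliberately tiny matcher in Python. It is my own simplified model, not how a browser engine implements CSS: it only understands tag names, .class, #id, compounds like input.classname, comma-separated groups and the descendant combinator (a space), raises ValueError for anything else, and ignores specificity and the cascade entirely. The Element class and the sample names ("search", "big") are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Element:
    """A tiny, hypothetical stand-in for a DOM element."""
    tag: str
    id: Optional[str] = None
    classes: frozenset = frozenset()
    parent: Optional["Element"] = None


# One compound selector: optional tag name followed by any number of .class / #id parts
COMPOUND = re.compile(r"^([a-zA-Z][\w-]*)?((?:[.#][\w-]+)*)$")
PART = re.compile(r"([.#])([\w-]+)")


def matches_compound(el: Element, compound: str) -> bool:
    """Check a compound selector such as p, .classname, #someid or input.classname."""
    m = COMPOUND.match(compound)
    if m is None:
        raise ValueError(f"unsupported selector: {compound!r}")
    tag, rest = m.groups()
    if tag and tag.lower() != el.tag:
        return False
    for kind, name in PART.findall(rest):
        if kind == "." and name not in el.classes:
            return False
        if kind == "#" and name != el.id:
            return False
    return True


def matches(el: Element, selector: str) -> bool:
    """Handle comma-separated groups and the descendant combinator (a space)."""
    for group in selector.split(","):
        parts = group.split()
        if not parts or not matches_compound(el, parts[-1]):
            continue
        # The rightmost part matched el; match the rest right to left against ancestors
        remaining = parts[:-1]
        node = el.parent
        while remaining and node is not None:
            if matches_compound(node, remaining[-1]):
                remaining.pop()
            node = node.parent
        if not remaining:
            return True
    return False


def parse_declarations(block: str) -> dict:
    """Turn 'color: red; margin: 0' into {'color': 'red', 'margin': '0'}."""
    result = {}
    for decl in block.split(";"):
        if ":" in decl:
            key, value = decl.split(":", 1)
            result[key.strip()] = value.strip()
    return result


div = Element("div")
p_in_div = Element("p", parent=div)
p_alone = Element("p")
search_box = Element("input", id="search", classes=frozenset({"big"}))

print(matches(p_in_div, "div p"))          # True
print(matches(p_alone, "div p"))           # False
print(matches(search_box, "input.big"))    # True
print(matches(search_box, "h1, #search"))  # True
print(parse_declarations("color: red; background-color: hotpink;"))
```

Matching the rightmost part first and then walking up the ancestors mirrors how the descendant rule reads: div p means “a p that has some div above it”.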
And again, you can play with these things in the browser developer tools. Just click on an element and on the right you can make changes.
It’s awesome. You can immediately see the results. Really cool for learning… Now let’s give this an awful background color.
The third common technology is JavaScript, which you can run in the console tab of the developer tools. For example, you can take the document, then its HTML body and its children, and then access and change some text. Again, everything you do here happens only in the browser. Opening a website in the browser is like opening a picture in Photoshop. Photoshop renders the image, but also offers you tools to play with that image.
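In the browser console you would do this in JavaScript, for example with document.body.children[0].textContent = "Hacked!". As a rough Python analogue, here is a sketch using the standard-library xml.dom.minidom, which offers a similar DOM-style API. It is only an analogy: minidom is an XML parser, so the sample page (made up here) has to be well-formed, and unlike a browser it rejects sloppy HTML instead of fixing it.

```python
from xml.dom import minidom

# A small, well-formed page; the texts are made-up placeholders.
SOURCE = "<html><head><title>Demo</title></head><body><h1>Hello</h1><p>Some text</p></body></html>"

document = minidom.parseString(SOURCE)
body = document.getElementsByTagName("body")[0]

# Keep only element children, similar to body.children in the browser
children = [n for n in body.childNodes if n.nodeType == n.ELEMENT_NODE]
heading = children[0]
heading.firstChild.data = "Hacked!"  # change the text inside the h1

print(document.documentElement.toxml())  # the in-memory tree has changed
print(SOURCE)                            # the original string is untouched
```

Just like in the browser, only the parsed, in-memory document changes; the source it was loaded from stays the same.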
And the browser also opens and renders a file, and provides tools to play with the displayed page. Right now this resource is loaded from the local filesystem: you can see that its URL starts with the scheme file, followed by :// and then its path. Usually, websites are loaded from a remote server via the HTTP protocol.
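The difference between a local file and a page from a server shows up in the parts of the URL. Here is a small sketch with Python’s urllib.parse; the local path and the domain are placeholders.

```python
from urllib.parse import urlparse

# Hypothetical URLs: a page on the local disk and the same page on a web server
local = urlparse("file:///home/user/website/index.html")
remote = urlparse("https://example.com/index.html")

print(local.scheme)       # file
print(repr(local.netloc)) # '' (no host, which is why file URLs have a third slash)
print(local.path)         # /home/user/website/index.html

print(remote.scheme, remote.netloc, remote.path)  # https example.com /index.html
```

The scheme tells the browser how to fetch the resource: file means reading from the local filesystem, while http or https means asking the server named in the host part.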