Web applications

INFORMATION GATHERING

There's a great deal of information you can obtain from even a tightly secured website. In many cases, shoddy website construction or holes in security can even leave access credentials and sensitive information in the hands of those who know how to use tools and methods to dissect web applications.

Whether your goal is gathering OSINT (open-source intelligence) or looking for a vulnerability, knowing how to properly analyze websites is a crucial part of any penetration tester's arsenal.

0. EXPLORING THE WEBSITE

Before diving into any source code or developer tools, the first step of analyzing any hosted web element is to map and understand its structure on a macro level.

Navigate the website as any user would, but pay close attention to interactive elements where JavaScript could be in play. Make note of the website's map, or structure, and of the paths available. Are there any portions that seem unfinished or inaccessible? Any broken links? Any forms or areas where user input is allowed?

All of these could point to areas of interest later on.

Finally, it will be helpful to create a table of all the website's pages/features, their URLs, and a summary of each. Here is a sample website review:
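One lightweight way to keep such a review is as structured notes rendered into a plain-text table. The sketch below is only an illustration: the PageNote class, the format_review helper, and every page listed are hypothetical, and example.com stands in for whatever site you are actually reviewing.

```python
from dataclasses import dataclass


@dataclass
class PageNote:
    name: str      # page or feature name
    url: str       # where it lives
    summary: str   # what you observed there


def format_review(notes):
    """Render the notes as an aligned plain-text table."""
    headers = ("Page/Feature", "URL", "Summary")
    rows = [headers] + [(n.name, n.url, n.summary) for n in notes]
    widths = [max(len(row[i]) for row in rows) for i in range(3)]
    lines = []
    for index, row in enumerate(rows):
        cells = (cell.ljust(width) for cell, width in zip(row, widths))
        lines.append("  ".join(cells).rstrip())
        if index == 0:
            # Underline the header row.
            lines.append("  ".join("-" * width for width in widths))
    return "\n".join(lines)


# Hypothetical entries for an imaginary site (example.com is a reserved domain).
SAMPLE_NOTES = [
    PageNote("Home", "https://example.com/", "Landing page, links to all sections"),
    PageNote("Login", "https://example.com/login", "Username/password form"),
    PageNote("Contact", "https://example.com/contact", "Free-text form, accepts user input"),
]

if __name__ == "__main__":
    print(format_review(SAMPLE_NOTES))
```

Adding a row per page as you browse keeps the forms and input areas you spotted easy to find again later.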

1. VIEWING THE PAGE SOURCE

The page source is all of the HTML, CSS, and JavaScript code that the browser processes behind the scenes to display the website as intended.

Most browsers allow you to view the source code by simply putting view-source: in front of the URL. Try going to view-source:https://www.google.com/

You can also get to the source code by using your browser's own options. Here's an article that shows you how to view the source code on any browser.


COMMENTS

Anything beginning with <!-- and ending with --> is a comment in the source code. Developers often leave comments to explain part of the code to other programmers or as notes to themselves. Anything typed between the comment delimiters is visible when viewing the source code but is not rendered on the page.

Careless programmers may leave important information as comments in the source code, either thinking that nobody will look that hard or forgetting to scrub it before publishing. There have been web developers who have left admin credentials, secret directory locations, or links to private paths in the source code comments.

Due to the nature of comments, these elements would all be hidden if you were just browsing the website as intended, but anyone looking at the source code, which is very easy to do, would see these comments in plain text.

The point is, you never know what you could find in the source code comments. Some information, even though not damning by itself, could provide the right context to bring an attacker closer to finding a vulnerability, so it's always worthwhile, and very easy, to scan for any comments that could reveal important information.
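Scanning for comments is easy to automate. The following is a minimal sketch using Python's standard html.parser module; the extract_comments helper and the SAMPLE page are made up for illustration, and in practice you would feed in the source you saved from view-source.

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """Collect the text of every <!-- ... --> comment in an HTML document."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        # Called by HTMLParser once for each comment it encounters.
        text = data.strip()
        if text:
            self.comments.append(text)


def extract_comments(html):
    collector = CommentCollector()
    collector.feed(html)
    collector.close()
    return collector.comments


# Hypothetical page source, for illustration only.
SAMPLE = """
<html>
  <!-- TODO: remove the /staging-admin link before launch -->
  <body>
    <p>Welcome!</p>
    <!-- contact: dev team -->
  </body>
</html>
"""

if __name__ == "__main__":
    for comment in extract_comments(SAMPLE):
        print(comment)
```

Note that this only picks up HTML comments; comments inside script or style blocks use different syntax and would need separate handling.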


LINKS

According to GeeksforGeeks: "The <a> tag (anchor tag) in HTML is used to create a hyperlink on the webpage. This hyperlink is used to link the webpage to other webpages. It’s either used to provide an absolute reference or a relative reference as its “href” value."

Here is an example of the syntax: <a href = "link"> Link Name </a>

And here's an example of what this looks like in the source code:

<body>

<h2>Welcome to GeeksforGeeks HTML Tutorial</h2>

<a href="https://www.geeksforgeeks.org/html-tutorials/">

GeeksforGeeks HTML Tutorial

</a>

</body>

It's worth taking a look at the links in the source code, especially in case there are any hidden links. The path in a link's href can also reveal a lot about the website's own directory structure or about the external sources that the website links to. In some cases, knowing these locations could point to a good target or point of entry. That takes us to the site's directory.
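Listing every link in a page's source can be sketched with the standard library as well. The example below collects the href of each <a> tag and resolves relative references against the page's URL; extract_links, the sample markup, and the example.com base URL are all assumptions made for the illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved to an absolute URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # urljoin leaves absolute URLs alone and resolves relative ones.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links


# Hypothetical page source; only the GeeksforGeeks URL comes from the example above.
SAMPLE = """
<body>
  <a href="https://www.geeksforgeeks.org/html-tutorials/">GeeksforGeeks HTML Tutorial</a>
  <a href="/about">About</a>
  <a href="drafts/post.html">Draft</a>
  <a name="top">No href here</a>
</body>
"""

if __name__ == "__main__":
    for link in extract_links(SAMPLE, "https://example.com/blog/index.html"):
        print(link)
```

Resolving relative links to absolute URLs makes it easier to see which paths belong to the target site and which point elsewhere.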


DIRECTORY STRUCTURE

Websites are simply a collection of linked pages and functions. The connection from one page to another that we are so accustomed to when we click on a link or site element is the backbone of web browsing, and this requires every website to have a structure, or directory, of its pages, folders, and files. Most sites use a folder structure to organize their pages and assets and are hosted on some kind of web server. When you click on a link, its href in the HTML points to a location in that directory; your browser requests that location, and the server sends the page back.

Here is a programming-centric breakdown of these directory structures.

This structure is invisible to the front-end user, but everything in the back end of a website relies on it to work.

You may not be able to view the entire directory structure just by viewing the source code (especially since there may be private folders that are not referenced in the code itself) but if you look at the links in the source code, you can start to see pieces of the directory structure, which will help you understand the structure of the website and figure out more attack surfaces.
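Piecing the directory structure together from links can be sketched as follows. This assumes you already have a list of absolute URLs gathered from the source code; the directories_from_links helper and the sample URLs are hypothetical. It also relies on a heuristic: a path that does not end in "/" is treated as a file, which is not always true.

```python
from urllib.parse import urlparse


def directories_from_links(links, host):
    """Return the sorted directory paths implied by links on the given host."""
    dirs = {"/"}  # the site root always exists
    for link in links:
        parts = urlparse(link)
        if parts.netloc != host:
            continue  # skip links to other sites
        segments = [segment for segment in parts.path.split("/") if segment]
        # Heuristic: if the path does not end in "/", the last segment is a file.
        if segments and not parts.path.endswith("/"):
            segments = segments[:-1]
        # Record the directory and each of its parents.
        for depth in range(1, len(segments) + 1):
            dirs.add("/" + "/".join(segments[:depth]) + "/")
    return sorted(dirs)


# Hypothetical links, as they might be collected from a page's source.
SAMPLE_LINKS = [
    "https://example.com/assets/css/site.css",
    "https://example.com/blog/2021/post.html",
    "https://example.com/about",
    "https://cdn.example.net/lib/app.js",
    "https://example.com/uploads/",
]

if __name__ == "__main__":
    for directory in directories_from_links(SAMPLE_LINKS, "example.com"):
        print(directory)
```

The result is only a partial picture, since folders that are never referenced in the code will not appear.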

Most websites will not allow the public to browse the website's directory, even if you know all the directory addresses and paths, but sometimes a directory, or a part of a directory, will be made accessible through "directory listing." The ability to navigate a website's directory unimpeded would be the pentesting jackpot if you wanted to gain ultimate access to a web application, so it's important to keep the website's structure in mind.

There are tools and brute-force techniques that search a website for common directory names and paths, but the source code alone can provide some helpful information.
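The idea behind searching for common directory names can be sketched in a few lines, for use against a site you are authorized to test. This is not how any particular tool works; the wordlist, the function names, and the choice to treat anything other than a 404 as "worth a look" are all assumptions made for the illustration.

```python
import urllib.error
import urllib.request
from urllib.parse import urljoin

# A tiny, hypothetical wordlist; real wordlists contain thousands of entries.
COMMON_NAMES = ["admin", "backup", "images", "uploads"]


def candidate_urls(base_url, names):
    """Build one candidate directory URL per name."""
    base = base_url if base_url.endswith("/") else base_url + "/"
    return [urljoin(base, name + "/") for name in names]


def http_status(url, timeout=5):
    """Return the HTTP status code of a HEAD request, or None if unreachable."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # 4xx/5xx responses still carry a status code
    except (urllib.error.URLError, TimeoutError):
        return None


def find_existing(base_url, names, probe=http_status):
    """Return (url, status) pairs for candidates that did not answer 404."""
    found = []
    for url in candidate_urls(base_url, names):
        status = probe(url)
        if status is not None and status != 404:
            found.append((url, status))
    return found
```

A 403 response is often as interesting as a 200 here, since it suggests the directory exists even though browsing it is forbidden. The probe parameter lets you swap in a different check, or a stub when trying the code offline.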


FRAMEWORKS

Most websites are not built from scratch, and many will use what's known as a framework. This is a collection of premade code that allows a developer to use and modify templates to suit their website's needs. At the bottom of the source code, you'll often find a comment with details about the framework in use as well as the version, and sometimes even a link to the framework's website.

This can be a good point of entry: a framework that is out of date or vulnerable to known exploits can allow someone to gain admin access to the site itself.

Knowing which frameworks, if any, a website uses, along with details such as the version, is just another important base to cover when gathering basic information about a website.
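Looking for framework details can be partly automated. The sketch below checks two places: comments that contain something shaped like "Name v1.2.3", and the meta "generator" tag that some site builders emit. The regular expression is a rough heuristic, and the framework names in the sample (ExampleCMS, SampleFramework) are invented.

```python
import re
from html.parser import HTMLParser

# Rough heuristic: a word followed by a dotted version number, e.g. "Foo v1.2.3".
VERSION_PATTERN = re.compile(r"([A-Za-z][\w.-]*)\s+v?(\d+(?:\.\d+)+)")


class FrameworkHints(HTMLParser):
    """Collect (source, text) hints about the framework behind a page."""

    def __init__(self):
        super().__init__()
        self.hints = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "generator":
            content = attributes.get("content")
            if content:
                self.hints.append(("meta generator", content))

    def handle_comment(self, data):
        for name, version in VERSION_PATTERN.findall(data):
            self.hints.append(("comment", f"{name} {version}"))


def find_framework_hints(html):
    parser = FrameworkHints()
    parser.feed(html)
    parser.close()
    return parser.hints


# Hypothetical page source with invented framework names.
SAMPLE = """
<html><head><meta name="generator" content="ExampleCMS 4.2"></head>
<body><p>Hello</p>
<!-- Built with SampleFramework v1.0.3 - https://sampleframework.example -->
</body></html>
"""

if __name__ == "__main__":
    for source, text in find_framework_hints(SAMPLE):
        print(f"{source}: {text}")
```

Once you have a name and version, you can check them against the framework's published release notes and known vulnerabilities.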

2. DEV TOOLS - INSPECTOR

Developer tools are a comprehensive suite of tools that are included in most modern browsers. This toolkit was created to help web developers test and debug their own web applications, but since it can be used on any website, it's a great way to look at the inner workings of a website and gather more important information about your target.

Check out this guide for how to open up the dev tools on any browser.

This guide will focus on the dev toolkit for Chrome, but each browser will have a comparable experience.

INSPECTING WEB ELEMENTS

Let's start with the inspector tool.

With the dev tools open, right-clicking on an element of the website will display a list of options; the last of these should be "Inspect."

When you select Inspect, the part of the code that "defines" this part of the visible page will be highlighted in the source code shown in the dev tools panel. If you don't see it, make sure the Elements tab is selected.

CSS STYLES

The inspector will give you a live look into what's being displayed on the screen alongside the code that allows it to function and display as intended. You can click on specific lines of code, and the Styles tab will display all of the CSS styles that apply to that element. CSS governs the look of the site, so things like size, color, position, and display properties will be visible and editable.

Hopefully, you're already starting to see some of the possibilities.

Now the displayed code is editable and your browser window will display these changes, but none of the changes you make will change the website's source code itself. Think of it as editing a local copy of a file that only you can see on your browser. This is what makes this a sandbox environment.

MODIFYING STYLES

There are many helpful things you can do with this. One of them is editing the CSS display property. Many websites have paywalls or overlay elements that sit on top of the information on the page. Without the inspector, you might have to sign up or subscribe to make the overlay disappear, but with the inspector you can target the elements you want to modify, and in many cases simply changing the display property to none will stop the element from being displayed in the browser.

Here is a gif that shows this little hack in action.

You can play around with changing or disabling certain elements of a website to get a clearer picture of the information, or simply to understand how the page displays what it does. Inspecting specific elements can save you a lot of time that you might otherwise have spent combing through lines of code that may look alien to you if you don't already have a good working knowledge of HTML, CSS, and JavaScript.

The inspector is a great way to zero in and play around with the sections of code particular to a specific element on a website.


3. DEV TOOLS - DEBUGGER

The debugger section of the dev tools allows developers to debug their JavaScript by running or blocking certain parts of a script to troubleshoot the functionality of their website's features.

JavaScript is a programming language that allows websites to have more interactivity than just displaying text, images, and links. Most websites with dynamic or interactive features use JS to achieve this.

In Chrome, the debugger is found under the Sources tab of the dev tools. In Firefox it's called the Debugger; older versions of Safari used that name too, while recent ones also call it Sources.

Since this guide focuses on Chrome: if you go to the Sources tab, you will see a list of all of the resources a website is using, organized into folders and files. Think of the directory we discussed earlier.

Clicking on a folder allows you to expand its contents and view all the files inside, and clicking on a file will allow you to view the contents of the file in the display window.

PRETTY PRINT

Most of the time, JS files will be compacted (minified) so that all of the code is on one line. This is done to save space, but it also makes the code nearly impossible to read. Because of this, most debuggers include a "pretty print" option. Selecting it reformats the code to be more readable in the display.

You will be able to see specific lines of JS code that control what happens on the page.

BREAKPOINTS

If you click the line number to the left of the code, you'll notice that it turns blue. This is called adding a breakpoint. A breakpoint tells the browser to pause JavaScript execution when it reaches that line, before the line runs.

Below, you can see an example of a line in the debugger that's causing an element to disappear on the website. If you click the number 48, you add a breakpoint there, so the browser pauses before running this code. Refreshing with the breakpoint in place would reveal whatever element would have been removed when viewing the unedited website in a browser.

There's a lot more that you can do with the debugger, but this was just an overview of how you can use it to see and modify some of the resources used and the JS code in a website.

Here's a video that goes more in-depth into using JavaScript and the debugger to pentest web applications.

4. DEV TOOLS - NETWORK

Another helpful dev tool is the network tab/feature.

This feature allows you to keep track of the requests being made by a webpage. By going to the Network tab and refreshing, you will be able to see all of the resources requested when that part of the site loads.

VIEW NETWORK REQUESTS

According to Chrome's own dev-tool resources:

"Each row of the Network Log represents a resource. By default the resources are listed chronologically. The top resource is usually the main HTML document. The bottom resource is whatever was requested last.

Each column represents information about a resource.

  • Status. The HTTP response code.

  • Type. The resource type.

  • Initiator. What caused a resource to be requested. Clicking a link in the Initiator column takes you to the source code that caused the request.

  • Time. How long the request took.

  • Waterfall. A graphical representation of the different stages of the request. Hover over a Waterfall to see a breakdown."

This can be a helpful way to see the communication that happens behind the scenes, and information such as the response headers, addresses, paths, methods, and cookies can reveal a lot of information in this gathering phase.
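Chrome's Network tab can export the recorded requests as a HAR file, which is a JSON document, so the same information can be reviewed outside the browser. Below is a minimal sketch for summarizing one; it assumes the standard HAR layout (log, entries, request, response), and the helper names, the sample data, and the session.har file name are hypothetical.

```python
import json


def summarize_har(har):
    """Return (status, method, url) for each entry of a parsed HAR document."""
    rows = []
    for entry in har["log"]["entries"]:
        request = entry["request"]
        response = entry["response"]
        rows.append((response["status"], request["method"], request["url"]))
    return rows


def response_header(entry, name):
    """Look up a response header by name, ignoring case; None if absent."""
    for header in entry["response"]["headers"]:
        if header["name"].lower() == name.lower():
            return header["value"]
    return None


# Hypothetical, heavily trimmed HAR data for illustration.
SAMPLE_HAR = {
    "log": {
        "entries": [
            {
                "request": {"method": "GET", "url": "https://example.com/"},
                "response": {
                    "status": 200,
                    "headers": [{"name": "Server", "value": "ExampleServer/1.0"}],
                },
            },
            {
                "request": {"method": "POST", "url": "https://example.com/api/login"},
                "response": {"status": 401, "headers": []},
            },
        ]
    }
}

if __name__ == "__main__":
    # With a real export you would load the file instead, for example:
    #   with open("session.har", encoding="utf-8") as handle:
    #       har = json.load(handle)
    for status, method, url in summarize_har(SAMPLE_HAR):
        print(status, method, url)
```

Response headers such as Server can hint at the software behind the site, and the list of request URLs often exposes API paths that never appear as links on the page.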

Here are some extra resources that you can use to dive deeper into this topic:


Web app security testing with browsers: https://getmantra.com/web-app-security-testing-with-browsers/

Improve Your Hacking Skills Using Devtools | Bug Bounty Tips: https://www.youtube.com/watch?v=Y1S5s3FmFsI