How to Check Broken Links with Selenium WebDriver?

Before jumping to the WebDriver script, let’s discuss some concepts such as broken links and HTTP error codes.

A link is an HTML object that allows users to navigate to another page when they click on it. It is a way to go from one web page to another.

A broken link, also known as a dead link, is one that does not work, i.e., it does not lead to the web page it is meant to. This usually happens because the website or a particular web page is unavailable or no longer exists. When a user attempts to follow a broken link, an error message is displayed.

Links may also be broken due to an error on the server, which can prevent the affected page from displaying correctly. A working link typically returns a 2xx status code, whereas broken links return 4xx (client error) or 5xx (server error) status codes.
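The rule above can be expressed as a small helper. This is only a minimal sketch: the class name `LinkStatus` and the method `isBroken` are hypothetical names chosen for illustration, and treating 3xx redirects as working is an assumption that matches the `respCode >= 400` check used in the script further down.

```java
// Hypothetical helper: class and method names are illustrative only.
class LinkStatus {

    // Treats any status code of 400 or above (4xx client errors, 5xx server errors) as broken.
    // Assumption: 1xx, 2xx and 3xx responses count as working links.
    static boolean isBroken(int statusCode) {
        return statusCode >= 400;
    }

    public static void main(String[] args) {
        int[] samples = {200, 301, 404, 503};
        for (int code : samples) {
            System.out.println(code + " -> " + (isBroken(code) ? "broken" : "working"));
        }
    }
}
```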

HTTP Status Code: Definition
400 (Bad Request): The server cannot process the request.
400 (Bad Request – Bad Host): The server cannot process the request because the hostname is invalid.
400 (Bad Request – Bad URL): The URL in the request is malformed, and the server was unable to understand it.
400 (Bad Request – Empty): The server returned an empty response with no content or response code.
400 (Bad Request – Timeout): The requested page took too long to load.
400 (Bad Request – Reset): The server is busy processing other requests or has been misconfigured, so it is unable to process the request.
403 (Forbidden): The server understood the request but refuses to authorize it.
404 (Page Not Found): The requested page is not available on the server.
408 (Request Time Out): The server timed out while waiting for the request.
410 (Gone): The requested page has been permanently removed.
503 (Service Unavailable): The server is temporarily unable to handle the request, for example because it is overloaded or down for maintenance.

Common reasons for broken links include:
  • 404 Page Not Found – The owner removed the destination page.
  • 400 Bad Request – The server can’t process the request because the URL address is incorrect.
  • The browser can’t access the destination web page because of the user’s settings.
  • The link itself isn’t correct, for example because of a typo in the URL.
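
For reference, `java.net.HttpURLConnection` defines named constants for several of the status codes listed above, which can make a checking script easier to read. The sketch below is illustrative only: the class `StatusDescriber` and its `describe` method are hypothetical, and the returned labels are simply the short names used in this article.

```java
import java.net.HttpURLConnection;

// Hypothetical helper: maps a few common status codes to short labels.
class StatusDescriber {

    static String describe(int statusCode) {
        switch (statusCode) {
            case HttpURLConnection.HTTP_BAD_REQUEST:    // 400
                return "Bad Request";
            case HttpURLConnection.HTTP_FORBIDDEN:      // 403
                return "Forbidden";
            case HttpURLConnection.HTTP_NOT_FOUND:      // 404
                return "Page Not Found";
            case HttpURLConnection.HTTP_CLIENT_TIMEOUT: // 408
                return "Request Time Out";
            case HttpURLConnection.HTTP_GONE:           // 410
                return "Gone";
            case HttpURLConnection.HTTP_UNAVAILABLE:    // 503
                return "Service Unavailable";
            default:
                return "Other (" + statusCode + ")";
        }
    }

    public static void main(String[] args) {
        System.out.println(404 + ": " + describe(404));
        System.out.println(503 + ": " + describe(503));
    }
}
```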

The process of checking for broken links in Selenium is simple. On a website, hyperlinks are made with the HTML Anchor (<a>) tag. The script needs to get the URLs for every anchor tag on a web page and check if any of them are broken.
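
Not every `href` value is worth checking over HTTP: an anchor may have no `href` at all, or may use a scheme such as `mailto:` or `javascript:`. As a rough sketch, the values read from the anchors could be filtered and de-duplicated before any request is sent. The class `HrefFilter` and the method `httpLinks` are hypothetical names, and the input here is a plain list of strings standing in for the values read from the page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: keeps only http/https URLs, without duplicates.
class HrefFilter {

    static List<String> httpLinks(List<String> hrefs) {
        // LinkedHashSet removes duplicates while preserving the order on the page
        Set<String> unique = new LinkedHashSet<>();
        for (String href : hrefs) {
            if (href == null) {
                continue; // anchor without an href attribute
            }
            String trimmed = href.trim();
            String lower = trimmed.toLowerCase(Locale.ROOT);
            if (lower.startsWith("http://") || lower.startsWith("https://")) {
                unique.add(trimmed);
            }
        }
        return new ArrayList<>(unique);
    }

    public static void main(String[] args) {
        List<String> hrefs = Arrays.asList(
                "http://demo.itlearn360.com",
                null,
                "mailto:someone@example.com",
                "javascript:void(0)",
                "http://demo.itlearn360.com");
        System.out.println(httpLinks(hrefs));
    }
}
```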

To identify broken links in Selenium, please follow the steps below.

  1. Get all the links on a web page using the <a> tag.
  2. Store all the links in a List.
  3. Iterate over every link using a for loop.
  4. Send an HTTP request with the HEAD method for each link.
  5. Read the status code of the HTTP response.
  6. Use the status code to decide whether the link is working or broken, and print the result to the console.
  7. Repeat the process for all the links captured.
  8. At the end of the script, close the browser.
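
Steps 4 to 6 can be sketched on their own, without a browser. The example below is a minimal sketch built on `HttpURLConnection`; the class `HeadCheck`, the method `statusOf`, and the 5-second timeouts are assumptions made for illustration, and `-1` is used here as a hypothetical marker for "no HTTP response could be obtained".

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper: sends a HEAD request and returns the status code.
class HeadCheck {

    // Returns the HTTP status code, or -1 if no HTTP response could be obtained.
    static int statusOf(String link) {
        HttpURLConnection connection = null;
        try {
            connection = (HttpURLConnection) new URL(link).openConnection();
            connection.setRequestMethod("HEAD");   // ask for headers only, no body
            connection.setConnectTimeout(5000);    // assumed timeout, in milliseconds
            connection.setReadTimeout(5000);       // assumed timeout, in milliseconds
            return connection.getResponseCode();
        } catch (IOException | ClassCastException e) {
            // Malformed URL, unreachable host, timeout, or a non-HTTP scheme
            return -1;
        } finally {
            if (connection != null) {
                connection.disconnect();
            }
        }
    }

    public static void main(String[] args) {
        String link = args.length > 0 ? args[0] : "http://demo.itlearn360.com";
        int code = statusOf(link);
        if (code == -1) {
            System.out.println(link + " could not be reached");
        } else if (code >= 400) {
            System.out.println(link + " is a broken link (" + code + ")");
        } else {
            System.out.println(link + " is a valid link (" + code + ")");
        }
    }
}
```

Setting explicit timeouts is a design choice in this sketch: without them, a single unresponsive host could stall the whole run.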

Selenium WebDriver Script

package testing.org;

import org.openqa.selenium.WebDriver;
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.List;
import java.util.concurrent.TimeUnit;

import org.openqa.selenium.By;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;

public class BrokenLinks {
    public static boolean isValid(String url) {
        /* Try creating a valid URL */
        try {
            new URL(url).toURI();
            return true;
        }

        // If there was an Exception
        // while creating URL object
        catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        WebDriver driver;
        // Tell Selenium where the ChromeDriver executable is; adjust this path for your machine
        System.setProperty("webdriver.chrome.driver",
                "C:\\Users\\Jayant\\Pictures\\qatraininimages\\chromedriver_win32\\chromedriver.exe");
        driver = new ChromeDriver();
        driver.manage().window().maximize();
        driver.manage().timeouts().implicitlyWait(20, TimeUnit.SECONDS);
        String homePage = "http://demo.itlearn360.com";
        driver.get(homePage);
        List<WebElement> allLinks = driver.findElements(By.tagName("a"));

        for (WebElement link : allLinks) {
            String url = link.getAttribute("href");
            // Skip anchors whose href is missing or is not a well-formed URL
            boolean isvalid = isValid(url);
            if (isvalid) {
                try {
                    HttpURLConnection huc = (HttpURLConnection) (new URL(url).openConnection());
                    // HEAD fetches only the response headers, not the page body
                    huc.setRequestMethod("HEAD");
                    huc.connect();
                    int respCode = huc.getResponseCode();
                    // 4xx and 5xx status codes indicate a broken link
                    if (respCode >= 400) {
                        System.out.println(url + " is a broken link");
                    } else {
                        System.out.println(url + " is a valid link");
                    }
                    huc.disconnect();

                } catch (MalformedURLException e) {
                    System.out.println(url + " is malformed");
                }
                catch (ClassCastException e) {
                    // Non-HTTP schemes such as mailto: do not return an HttpURLConnection
                    System.out.println(url + " is an invalid link");
                }
                catch (IOException e) {
                    // The host could not be reached or the connection failed
                    System.out.println(url + " is an invalid link");
                }
            }
        }

        // quit() closes every browser window and ends the WebDriver session
        driver.quit();
    }
}

This is how you can find broken links for any site.