How to Check Broken Links with Selenium WebDriver

Broken links are the silent killers of website performance, affecting both user experience and SEO rankings. It is critical to identify and fix them, especially for quality assurance professionals and developers.

In this guide, we will examine how to use Selenium WebDriver, a powerful tool for web automation, to efficiently check for broken links on a website. This step-by-step process will help you improve your web testing skills, whether you are a beginner or an experienced tester.

Finding and fixing broken links pays off in several ways:

  1. Enhances User Experience: Users get frustrated when they land on broken pages, which costs trust and potential conversions.
  2. Boosts SEO Rankings: Broken links can hurt how search engines crawl and rank a site, reducing organic traffic.
  3. Maintains Professionalism: Broken links reflect poorly on a website’s credibility and quality.

By using Selenium WebDriver to automate the process, you can save time and make sure everything is tested thoroughly.

What is Selenium WebDriver?

Selenium WebDriver is an open-source framework for automating browser interactions. It works with multiple browsers and programming languages, making it a versatile tool for web testing. With Selenium WebDriver, you can automate tasks such as broken link detection across multiple pages.

1. Set Up Selenium WebDriver
  • Download and install the Selenium WebDriver library for your preferred language (e.g., Java, Python).
  • Install the appropriate browser driver (e.g., ChromeDriver, GeckoDriver).
2. Identify Links on the Page
  • Use Selenium’s findElements method to extract all links (<a> tags) on the webpage.
3. Verify Link Status
  • Send HTTP requests to each link using libraries like HttpURLConnection in Java.
  • Analyze the HTTP response codes:
    • 200 (or another 2xx code): Link is valid.
    • 404 (or any other 4xx or 5xx code): Link is broken.
4. Handle Exceptions
  • Implement error handling to manage invalid URLs or network errors.
5. Output Results
  • Print the status of each link (valid or broken) in the console.

| HTTP Status Code | Definition |
| --- | --- |
| 400 (Bad Request) | The server could not process the request. |
| 400 (Bad Request – Bad Host) | The server cannot process the request because the hostname is invalid. |
| 400 (Bad Request – Bad URL) | The URL in the request is malformed, and the server was unable to understand it. Check the URL and correct any errors before resending the request. |
| 400 (Bad Request – Empty) | The server returned an empty response with no content or response code. |
| 400 (Bad Request – Timeout) | The requested page took too long to load. |
| 400 (Bad Request – Reset) | The server is busy processing other requests or has been misconfigured, so it is unable to process the request. |
| 404 (Page Not Found) | The requested page is not available on the server. |
| 403 (Forbidden) | The server understood the request but refuses to fulfill it because access is not permitted. |
| 410 (Gone) | The requested page has been permanently removed. |
| 408 (Request Timeout) | The server timed out while waiting for the request. |
| 503 (Service Unavailable) | The server is temporarily unable to process the request, for example because it is overloaded or under maintenance. |
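
To make the status codes above actionable in a test, it can help to reduce each code to a coarse state. The following is a minimal sketch under that assumption; the class name `LinkStatusClassifier` and the `LinkState` enum are illustrative names introduced here, not part of Selenium or the JDK.

```java
// Illustrative helper: the names below are hypothetical, not a Selenium API.
final class LinkStatusClassifier {

    enum LinkState { VALID, REDIRECT, BROKEN, UNKNOWN }

    /** Maps an HTTP status code to a coarse link state. */
    static LinkState classify(int statusCode) {
        if (statusCode >= 200 && statusCode < 300) {
            return LinkState.VALID;      // 2xx: the request succeeded
        }
        if (statusCode >= 300 && statusCode < 400) {
            return LinkState.REDIRECT;   // 3xx: the link points somewhere else
        }
        if (statusCode >= 400 && statusCode < 600) {
            return LinkState.BROKEN;     // 4xx and 5xx: client or server error
        }
        // Anything else, such as -1 when no valid HTTP response was received
        return LinkState.UNKNOWN;
    }

    public static void main(String[] args) {
        int[] samples = {200, 301, 403, 404, 410, 503};
        for (int code : samples) {
            System.out.println(code + " -> " + classify(code));
        }
    }
}
```

Treating every code of 400 or above as broken mirrors the simple threshold commonly used in link checkers; a stricter suite might report 403 separately, since the page may exist but require a login.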

The process of checking for broken links in Selenium is simple. On a website, hyperlinks are made with the HTML Anchor (<a>) tag. The script needs to get the URLs for every anchor tag on a web page and check if any of them are broken.
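
Not every href collected from an anchor tag is worth an HTTP request: some are empty, and others use schemes such as mailto: or javascript:. A possible pre-filter is sketched below using only the JDK. The class name `HrefFilter` and the rule of accepting only http and https are assumptions made for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Illustrative helper: decides whether an href should be sent an HTTP request.
final class HrefFilter {

    /** Returns true only for syntactically valid http or https URLs with a host. */
    static boolean isCheckable(String href) {
        if (href == null || href.trim().isEmpty()) {
            return false; // anchors without an href, or with a blank one
        }
        try {
            URI uri = new URI(href.trim());
            String scheme = uri.getScheme();
            boolean isHttp = "http".equalsIgnoreCase(scheme)
                    || "https".equalsIgnoreCase(scheme);
            return isHttp && uri.getHost() != null;
        } catch (URISyntaxException e) {
            return false; // malformed URL, e.g. one containing raw spaces
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "https://example.com/about",
            "mailto:team@example.com",
            "javascript:void(0)",
            ""
        };
        for (String href : samples) {
            System.out.println("\"" + href + "\" checkable: " + isCheckable(href));
        }
    }
}
```

In a Selenium test, this check would be applied to the value returned by `link.getAttribute("href")` before any connection is opened.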

To identify broken links in Selenium, please follow the steps below.

  1. Get all the links on a web page using the <a> tag.
  2. Store all the links in a List.
  3. Iterate over every link using a for loop.
  4. Send an HTTP request with the HEAD method for each link.
  5. Check the response code of the HTTP response.
  6. Use the HTTP response code to decide whether the link is working, and print the result in the console.
  7. Repeat the process for all links captured.
  8. At the end of the script, close the browser.
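
Steps 4 to 6 above can be isolated into a small helper that does not depend on the browser at all. This is a sketch rather than a definitive implementation: the class name `HeadRequestChecker`, the convention of returning -1 on failure, and the caller-supplied timeout are all assumptions introduced for illustration.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Illustrative helper: sends a HEAD request and reports the status code.
final class HeadRequestChecker {

    /** Returns the HTTP status code, or -1 if no HTTP response could be obtained. */
    static int headStatus(String url, int timeoutMillis) {
        HttpURLConnection connection = null;
        try {
            connection = (HttpURLConnection) URI.create(url).toURL().openConnection();
            connection.setRequestMethod("HEAD");        // headers only, no body
            connection.setConnectTimeout(timeoutMillis);
            connection.setReadTimeout(timeoutMillis);
            return connection.getResponseCode();
        } catch (IOException | IllegalArgumentException | ClassCastException e) {
            // IOException: network failure, timeout, or unknown protocol
            // IllegalArgumentException: the string is not a valid absolute URL
            // ClassCastException: a non-HTTP URL such as mailto:
            return -1;
        } finally {
            if (connection != null) {
                connection.disconnect();
            }
        }
    }

    /** A link counts as broken if the request failed or returned 400 or above. */
    static boolean isBroken(String url, int timeoutMillis) {
        int status = headStatus(url, timeoutMillis);
        return status < 0 || status >= 400;
    }

    public static void main(String[] args) {
        // A non-HTTP URL never reaches the network and is reported as broken
        System.out.println(isBroken("mailto:team@example.com", 2000));
    }
}
```

Some servers reject HEAD requests even for pages that load fine in a browser, so a fallback to GET for such responses may be worth considering.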

Selenium WebDriver Script

package testing.org;

import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.List;
import java.util.concurrent.TimeUnit;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;

public class BrokenLinks {

    /** Returns true if the string can be parsed as a well-formed URL. */
    public static boolean isValid(String url) {
        try {
            new URL(url).toURI();
            return true;
        } catch (Exception e) {
            // A null href, an unknown protocol, or bad syntax all end up here
            return false;
        }
    }

    public static void main(String[] args) {
        // Point Selenium at the ChromeDriver binary on your machine
        System.setProperty("webdriver.chrome.driver",
                "your chrome driver executable file path");
        WebDriver driver = new ChromeDriver();
        try {
            driver.manage().window().maximize();
            // On Selenium 4, use implicitlyWait(Duration.ofSeconds(20)) instead;
            // the TimeUnit overload is deprecated there
            driver.manage().timeouts().implicitlyWait(20, TimeUnit.SECONDS);
            String homePage = "http://demo.itlearn360.com";
            driver.get(homePage);

            // Collect every anchor element on the page
            List<WebElement> allLinks = driver.findElements(By.tagName("a"));

            for (WebElement link : allLinks) {
                String url = link.getAttribute("href");
                if (!isValid(url)) {
                    continue; // skip anchors without a usable href
                }
                try {
                    HttpURLConnection huc = (HttpURLConnection) new URL(url).openConnection();
                    huc.setRequestMethod("HEAD"); // request headers only, not the body
                    huc.connect();
                    int respCode = huc.getResponseCode();
                    if (respCode >= 400) {
                        System.out.println(url + " is a broken link");
                    } else {
                        System.out.println(url + " is a valid link");
                    }
                } catch (MalformedURLException e) {
                    System.out.println(url + " is malformed");
                } catch (ClassCastException e) {
                    // Non-HTTP URLs (for example mailto:) cannot be cast to HttpURLConnection
                    System.out.println(url + " is an invalid link");
                } catch (IOException e) {
                    // Network failure, timeout, or unreachable host
                    System.out.println(url + " is an invalid link");
                }
            }
        } finally {
            // quit() ends the whole browser session, even if a check throws
            driver.quit();
        }
    }
}

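Output

The script prints one line per checked link to the console, reporting each URL as either a valid link or a broken link.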

  1. Use Batches: Test links in small groups to avoid overloading the server.
  2. Integrate into CI/CD Pipelines: Automate broken link detection as part of your development process.
  3. Handle Redirects Properly: Check for 3xx status codes to ensure redirect links are functional.
  4. Document Results: Generate a detailed report of broken links for better tracking.
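
The batching tip above can be sketched as a small partitioning helper. The class name `LinkBatcher`, the batch size of 2, and the 100 ms pause are arbitrary choices for illustration; the right values depend on the site under test.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative helper: splits a list of URLs into fixed-size batches.
final class LinkBatcher {

    /** Splits items into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) throws InterruptedException {
        // Hypothetical URLs standing in for the hrefs collected by Selenium
        List<String> urls = Arrays.asList(
                "https://example.com/a", "https://example.com/b",
                "https://example.com/c", "https://example.com/d",
                "https://example.com/e");
        for (List<String> batch : partition(urls, 2)) {
            System.out.println("Checking batch: " + batch);
            // The HTTP check for each URL in the batch would run here
            Thread.sleep(100); // short pause so the server is not flooded
        }
    }
}
```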

Common Challenges and Solutions

  • Dynamic URLs: Use dynamic locators to handle links generated at runtime.
  • Session-Based Links: Test links requiring authentication using session cookies.
  • Timeout Errors: Set appropriate timeouts for slow-loading pages.
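
For session-based links, one possible approach is to copy the browser's cookies into the `Cookie` header of each HTTP request. The sketch below only builds the header string. It assumes the name and value pairs were already read from the browser (with Selenium, `driver.manage().getCookies()` returns them), and the class name `SessionCookieHeader` and the cookie names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Illustrative helper: formats cookie pairs as an HTTP Cookie header value.
final class SessionCookieHeader {

    /** Joins name=value pairs with "; " as expected in a Cookie request header. */
    static String build(Map<String, String> cookies) {
        StringJoiner joiner = new StringJoiner("; ");
        for (Map.Entry<String, String> cookie : cookies.entrySet()) {
            joiner.add(cookie.getKey() + "=" + cookie.getValue());
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // Hypothetical cookies standing in for those of a logged-in session
        Map<String, String> cookies = new LinkedHashMap<>();
        cookies.put("JSESSIONID", "abc123");
        cookies.put("theme", "dark");
        System.out.println(build(cookies));
    }
}
```

The resulting string can be attached to an `HttpURLConnection` with `setRequestProperty("Cookie", header)` before the request is sent.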
