Before jumping into the WebDriver script, let's cover a few concepts, such as broken links and HTTP error codes.
What are Broken Links?
A link is an HTML object that allows users to navigate to another page when they click on it. It is a way to go from one web page to another.
A broken link, also known as a dead link, is one that does not work, i.e., it does not lead to the web page it is meant to. This usually happens because the website or the target page is unavailable or no longer exists. When a user attempts to follow a broken link, an error message is displayed.
Links may also break because of an error on the server, which prevents the target page from displaying correctly. A working URL typically returns a 2xx status code, whereas broken links return 4xx (client error) or 5xx (server error) status codes.
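The status-code rule above can be sketched as a small helper. The names `LinkStatus` and `isBroken` are hypothetical, and counting 3xx redirects as working is an assumption that the text does not spell out.

```java
// Hypothetical helper: the names are illustrative, not part of Selenium or the JDK.
final class LinkStatus {

    private LinkStatus() {
    }

    // Treats 4xx (client error) and 5xx (server error) responses as broken.
    // Assumption: 1xx, 2xx and 3xx responses count as working links.
    static boolean isBroken(int statusCode) {
        return statusCode >= 400 && statusCode <= 599;
    }

    public static void main(String[] args) {
        int[] samples = {200, 301, 404, 503};
        for (int code : samples) {
            System.out.println(code + " -> " + (isBroken(code) ? "broken" : "working"));
        }
    }
}
```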
HTTP Status Codes for Broken Links
HTTP Status Code | Definition |
---|---|
400 (Bad Request) | The server cannot process the request, typically because the request itself is invalid. |
400 (Bad Request – Bad Host) | The server cannot process the request because the hostname is invalid. |
400 (Bad Request – Bad URL) | The URL in the request is malformed, so the server could not understand it. Check the URL and correct any errors before resending the request. |
400 (Bad Request – Empty) | The server returned an empty response with no content or response code. |
400 (Bad Request – Timeout) | The requested page took too long to load. |
400 (Bad Request – Reset) | The server is busy processing other requests or has been misconfigured, so it is unable to process the request. |
404 (Page Not Found) | The page you requested is not available on the server. |
403 (Forbidden) | The server understood the request but refuses to fulfill it; the client lacks permission to access the resource. |
410 (Gone) | The page you are looking for has been removed. |
408 (Request Time Out) | The server timed out while waiting for the request. |
503 (Service Unavailable) | The server is currently experiencing technical difficulties and cannot process the request. |
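
When printing results to the console, it can help to translate a numeric code into a readable label. The following is a minimal sketch, assuming a hypothetical `StatusLabels.describe` helper. The `HTTP_*` constants come from `java.net.HttpURLConnection`, and the labels are taken from the table above.

```java
import java.net.HttpURLConnection;

// Hypothetical helper that maps the status codes from the table to short labels.
final class StatusLabels {

    private StatusLabels() {
    }

    static String describe(int statusCode) {
        switch (statusCode) {
            case HttpURLConnection.HTTP_BAD_REQUEST:    // 400
                return "Bad Request";
            case HttpURLConnection.HTTP_FORBIDDEN:      // 403
                return "Forbidden";
            case HttpURLConnection.HTTP_NOT_FOUND:      // 404
                return "Page Not Found";
            case HttpURLConnection.HTTP_CLIENT_TIMEOUT: // 408
                return "Request Time Out";
            case HttpURLConnection.HTTP_GONE:           // 410
                return "Gone";
            case HttpURLConnection.HTTP_UNAVAILABLE:    // 503
                return "Service Unavailable";
            default:
                return "Unlisted status " + statusCode;
        }
    }

    public static void main(String[] args) {
        System.out.println(404 + ": " + describe(404));
        System.out.println(503 + ": " + describe(503));
    }
}
```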
Common Reasons for Broken Links
- 404 Page Not Found – The owner removed the destination page.
- 400 Bad Request – The server can’t process the request because the URL address is incorrect.
- The browser can’t access the destination web page because of the user’s settings.
- The link itself is incorrect, for example because of a typo in the URL.
How to identify broken links in Selenium WebDriver
The process of checking for broken links in Selenium is simple. On a website, hyperlinks are made with the HTML Anchor (<a>) tag. The script needs to get the URLs for every anchor tag on a web page and check if any of them are broken.
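Not every anchor carries a URL that can be checked: the `href` attribute may be missing, or it may use a scheme such as `mailto:` or `javascript:`. The sketch below shows one possible way to reduce raw `href` values to unique HTTP(S) URLs before checking them. `LinkFilter` and `uniqueHttpLinks` are hypothetical names, and the sample list stands in for the values a WebDriver script would read with `getAttribute("href")`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: keeps only distinct http/https hrefs, in page order.
final class LinkFilter {

    private LinkFilter() {
    }

    static List<String> uniqueHttpLinks(List<String> hrefs) {
        // LinkedHashSet drops duplicates while preserving first-seen order.
        Set<String> unique = new LinkedHashSet<>();
        for (String href : hrefs) {
            if (href == null) {
                continue; // anchor without an href attribute
            }
            String trimmed = href.trim();
            String lower = trimmed.toLowerCase(Locale.ROOT);
            if (lower.startsWith("http://") || lower.startsWith("https://")) {
                unique.add(trimmed);
            }
        }
        return new ArrayList<>(unique);
    }

    public static void main(String[] args) {
        // Sample values are made up for illustration.
        List<String> hrefs = Arrays.asList(
                "https://example.com/a",
                null,
                "mailto:someone@example.com",
                "https://example.com/a",
                "javascript:void(0)",
                "http://example.com/b");
        System.out.println(uniqueHttpLinks(hrefs));
    }
}
```

With the sample list, this prints `[https://example.com/a, http://example.com/b]`. Removing duplicates is a design choice rather than a requirement; it simply avoids requesting the same URL several times when a page repeats its navigation links.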
To identify broken links in Selenium, please follow the steps below.
- Get all the links on a web page using the <a> tag.
- Store all the links in a List.
- Iterate over every link using a for loop.
- Send an HTTP request with the HEAD method for each link.
- Check the response code of the HTTP response.
- Use the HTTP response code to see whether the link is working or not and print it in the console.
- Repeat the process for all links captured.
- At the end of the script, close the browser.
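
The request and response-code steps above can be tried without a browser, using only `java.net.HttpURLConnection`. The sketch below is one possible shape, not a definitive implementation: `LinkChecker`, `check`, the verdict strings, and the 5-second timeout are all assumptions made for illustration.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLConnection;

// Hypothetical helper: sends one HEAD request and returns a verdict for the link.
final class LinkChecker {

    private static final int TIMEOUT_MILLIS = 5_000; // assumed value, tune as needed

    private LinkChecker() {
    }

    static String check(String link) {
        if (link == null) {
            return "malformed";
        }
        HttpURLConnection connection = null;
        try {
            // openConnection() only creates the object; nothing is sent yet.
            URLConnection raw = new URI(link).toURL().openConnection();
            if (!(raw instanceof HttpURLConnection)) {
                return "skipped (not an HTTP link)";
            }
            connection = (HttpURLConnection) raw;
            connection.setRequestMethod("HEAD");          // ask for headers only
            connection.setConnectTimeout(TIMEOUT_MILLIS);
            connection.setReadTimeout(TIMEOUT_MILLIS);
            int code = connection.getResponseCode();      // performs the request
            return code >= 400 ? "broken (" + code + ")" : "valid (" + code + ")";
        } catch (URISyntaxException | MalformedURLException | IllegalArgumentException e) {
            // The string is not a usable absolute URL.
            return "malformed";
        } catch (IOException e) {
            // DNS failure, refused connection, timeout, and similar errors.
            return "unreachable (" + e.getClass().getSimpleName() + ")";
        } finally {
            if (connection != null) {
                connection.disconnect();
            }
        }
    }

    public static void main(String[] args) {
        // Links are passed on the command line, e.g. java LinkChecker https://example.com
        for (String link : args) {
            System.out.println(link + " -> " + check(link));
        }
    }
}
```

Setting timeouts keeps a single slow server from stalling the whole run. Some servers may not answer HEAD the same way they answer GET, so retrying with GET before reporting a link as broken may be worth considering.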
Selenium WebDriver Script

    package testing.org;

    import java.io.IOException;
    import java.net.HttpURLConnection;
    import java.net.MalformedURLException;
    import java.net.URL;
    import java.util.List;
    import java.util.concurrent.TimeUnit;

    import org.openqa.selenium.By;
    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.WebElement;
    import org.openqa.selenium.chrome.ChromeDriver;

    public class BrokenLinks {

        // Returns true if the string can be parsed as a syntactically valid URL.
        public static boolean isValid(String url) {
            try {
                new URL(url).toURI();
                return true;
            } catch (Exception e) {
                // Thrown for a null, malformed, or otherwise unparsable URL.
                return false;
            }
        }

        public static void main(String[] args) {
            // Point Selenium at the local ChromeDriver binary; adjust the path for your machine.
            System.setProperty("webdriver.chrome.driver",
                    "C:\\Users\\Jayant\\Pictures\\qatraininimages\\chromedriver_win32\\chromedriver.exe");
            WebDriver driver = new ChromeDriver();
            driver.manage().window().maximize();
            driver.manage().timeouts().implicitlyWait(20, TimeUnit.SECONDS);

            String homePage = "http://demo.itlearn360.com";
            driver.get(homePage);

            // Collect every anchor element on the page.
            List<WebElement> allLinks = driver.findElements(By.tagName("a"));

            for (WebElement link : allLinks) {
                String url = link.getAttribute("href");
                if (!isValid(url)) {
                    continue; // skip anchors with a missing or malformed href
                }
                try {
                    HttpURLConnection huc = (HttpURLConnection) new URL(url).openConnection();
                    huc.setRequestMethod("HEAD"); // request headers only, not the page body
                    huc.connect();
                    int respCode = huc.getResponseCode();
                    if (respCode >= 400) {
                        System.out.println(url + " is a broken link");
                    } else {
                        System.out.println(url + " is a valid link");
                    }
                } catch (MalformedURLException e) {
                    System.out.println(url + " is malformed");
                } catch (ClassCastException e) {
                    // The URL uses a non-HTTP scheme, so the cast to HttpURLConnection failed.
                    System.out.println(url + " is an invalid link");
                } catch (IOException e) {
                    // The request could not be completed.
                    System.out.println(url + " is an invalid link");
                }
            }

            // Close the browser and end the WebDriver session.
            driver.quit();
        }
    }

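Output
When the script runs, the console prints one line per link, reporting it as valid, broken, malformed, or invalid.
This is how you can find broken links on any site.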