On a web page, we see lots of links that direct the user to other screens on the same or a different domain.
As a QA, it is always difficult to gather all the links of a particular page or a whole site and test them one by one to make sure they really work.
We can actually automate the process of collecting the links of a web page and checking whether they really resolve.
How to collect all links of a webpage?
We already know that links use the HTML tag a, and the actual link is assigned to its href attribute.
We can use Selenium to launch the URL and collect all the links using driver.findElements(By.tagName("a")).
Verify broken or empty links
Once all the links are collected, loop through them and verify that the href attribute is not null, blank, or void (for example javascript:void(0)).
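The href check can be sketched as a small helper. This is only an illustration: the class name HrefFilter and the method isCheckable are hypothetical, and treating any javascript: value as "void" is an assumption about what counts as an empty link on your site.

```java
class HrefFilter {

    // Hypothetical helper: decides whether an href value is worth checking at all.
    static boolean isCheckable(String href) {
        if (href == null) {
            return false; // the <a> tag has no href attribute
        }
        String value = href.trim();
        if (value.isEmpty()) {
            return false; // href="" or only whitespace
        }
        // Assumption: "void" placeholders such as javascript:void(0) point nowhere
        return !value.toLowerCase().startsWith("javascript:");
    }

    public static void main(String[] args) {
        String[] samples = {null, "", "   ", "javascript:void(0)",
                "https://qavbox.github.io/demo/links/"};
        for (String href : samples) {
            System.out.println(href + " -> " + isCheckable(href));
        }
    }
}
```

In a Selenium loop, the value passed in would be whatever link.getAttribute("href") returns for each element.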
Is it this simple?
Not really. Sometimes the href has a value, but the link is broken, has moved to another URL, requires authorisation, or the server is down when we try to access it.
Now the question is, how do we verify all of the above possible causes of a link not working?
The answer is to use Java's HttpURLConnection.
This class can provide the HTTP status code of each link as well as the status message. Based on the status code or message, we can decide whether the link is broken or working fine.
The basic logic we will apply here is: if the status code returned is 200, the link is working fine.
Learn more about HTTP status codes here
We will be using this demo site: https://qavbox.github.io/demo/links/
Under the broken links section, you will find 3 links that return 3 different status codes other than 200, which means those links are broken. All other links on the web page should return 200.
    import org.openqa.selenium.By;
    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.WebElement;
    import org.openqa.selenium.chrome.ChromeDriver;

    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.util.List;

    public class BrokenLinks {

        public static void main(String[] args) throws InterruptedException {
            System.setProperty("webdriver.chrome.driver", "/Users/skpatro/sel/chromedriver");
            WebDriver driver = new ChromeDriver();
            driver.get("https://qavbox.github.io/demo/links/");

            // collect the links on the web page
            List<WebElement> links = driver.findElements(By.tagName("a"));

            // print the number of links collected
            System.out.println("No of links are " + links.size());

            // skip links whose href is null or empty;
            // otherwise get the HTTP status code and message
            for (WebElement link : links) {
                String url = link.getAttribute("href");
                if (url != null && !url.isEmpty()) {
                    verifyLinks(url);
                }
            }

            Thread.sleep(2000);
            driver.quit();
        }

        public static void verifyLinks(String link) {
            try {
                // open an HTTP connection for the link and read the response
                HttpURLConnection httpURLConnect = (HttpURLConnection) new URL(link).openConnection();
                httpURLConnect.setConnectTimeout(5000);
                httpURLConnect.connect();
                String message = httpURLConnect.getResponseMessage();
                int status = httpURLConnect.getResponseCode();
                if (status != 200) {
                    System.out.println(link + " - is broken - " + status + " " + message);
                } else {
                    System.out.println(link + " - is working fine - " + status + " " + message);
                }
            } catch (Exception e) {
                // malformed URLs, non-HTTP links, timeouts and unreachable servers end up here
                System.out.println("can't verify the link " + link + " - " + e.getMessage());
            }
        }
    }
All the links on the web page are collected using Selenium's driver.findElements() and stored in a List<WebElement> variable.
Then we loop over the links and pass each href to the verifyLinks() method, which prints the HTTP status code and message of each link.
httpURLConnect.setConnectTimeout(5000);
// waits at most 5 seconds for the connection to be established
httpURLConnect.getResponseMessage()
// provides the status message: Not Found, OK, etc.
httpURLConnect.getResponseCode()
// returns the HTTP status code: 200, 300, 400, etc.
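If you would rather assert on the result in a test framework than read console output, a variant that returns the status code may be handy. The following is a minimal sketch, not part of the original program: the names LinkProbe and statusOf are hypothetical, the added read timeout is an assumption about what you want, and returning -1 for links that cannot be checked is just one possible convention.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

class LinkProbe {

    // Hypothetical helper: returns the HTTP status code of the link,
    // or -1 when no HTTP response could be obtained.
    static int statusOf(String link) {
        HttpURLConnection conn = null;
        try {
            conn = (HttpURLConnection) new URL(link).openConnection();
            conn.setConnectTimeout(5000); // max wait to establish the connection
            conn.setReadTimeout(5000);    // max wait for the server to respond (assumption)
            return conn.getResponseCode();
        } catch (IOException | ClassCastException e) {
            // IOException: malformed URL, timeout, unreachable server
            // ClassCastException: a non-HTTP link such as mailto:
            return -1;
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }

    public static void main(String[] args) {
        String link = "https://qavbox.github.io/demo/links/";
        System.out.println(link + " -> " + statusOf(link));
    }
}
```

The caller can then decide what to do with the number, for example collect every link whose status is not acceptable and fail the test at the end.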
The condition above treats 3xx status codes as broken links, but a 301 is not really a broken link [if your link uses http instead of https, the status can also show as 301].
You can adjust the if condition if(status!=200) that decides whether a link is reported as broken. For example, you can use status >= 400 as the condition, so that only those links are reported as broken.
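One way to express that alternative condition is sketched below. The class name StatusCheck and both method names are hypothetical, and the three labels are just one possible grouping of the status code ranges.

```java
class StatusCheck {

    // Hypothetical helper: only 4xx and 5xx responses count as broken,
    // so redirects such as 301 are no longer reported.
    static boolean isBroken(int status) {
        return status >= 400;
    }

    // Hypothetical helper: a slightly more detailed label for the report.
    static String describe(int status) {
        if (status >= 200 && status < 300) {
            return "working fine";
        }
        if (status >= 300 && status < 400) {
            return "redirected";
        }
        if (status >= 400) {
            return "broken";
        }
        return "unknown";
    }

    public static void main(String[] args) {
        int[] samples = {200, 301, 404, 500};
        for (int status : samples) {
            System.out.println(status + " -> " + describe(status)
                    + " (broken: " + isBroken(status) + ")");
        }
    }
}
```

Inside verifyLinks, the check would then read if (StatusCheck.isBroken(status)) instead of if (status != 200).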
We can also verify individual links by calling the verifyLinks method directly:
    verifyLinks("https://qavalidation.com/?page_id=5669123");
    verifyLinks("https://the-internet.herokuapp.com/status_codes/500");
    verifyLinks("https://the-internet.herokuapp.com/status_codes/301");
    verifyLinks("https://qavalidation.com");
Hope this helps!