Python Program to Find All Links on Webpage

In this article, I will discuss how the Python programming language can be used to extract all of the links in a webpage.
This program uses two Python modules: urllib.request and re.
Using the urlopen function from urllib.request, we can retrieve a webpage from a server as an HTTP response object. To get the HTML out of it, call the response's read() method, which returns the page's HTML as a bytes object. To search for URLs, these bytes first need to be converted to text (a Python str), which can be done by calling decode('utf-8') on the bytes object (this assumes the page is encoded as UTF-8). After this, the URLs in the text can be found with the re module, using '"((http|ftp)s?://.*?)"' as the pattern string. This pattern matches any double-quoted string that starts with http://, https://, ftp:// or ftps://.
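To see the decode and search steps without touching the network, here is a minimal sketch. The bytes literal `raw` is a made-up stand-in for what read() would return, and the example.com URLs are placeholders, not part of the original program.

```python
import re

# Made-up sample standing in for the bytes that webpage.read() would return
raw = b'<a href="https://example.com/a">A</a> <a href="ftp://example.com/f">F</a>'

# bytes -> str, so the text can be searched with a str pattern
html = raw.decode('utf-8')

# The pattern has two groups, so each match comes back as a (url, scheme) tuple
matches = re.findall('"((http|ftp)s?://.*?)"', html)

# Keep only the first element of each tuple, which is the full URL
urls = [url for url, scheme in matches]
print(urls)
```

Note that findall returns tuples here, not plain strings, because the pattern contains two pairs of parentheses.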

Steps to Find out all links on a Webpage using Python

# Importing the required modules
from urllib.request import urlopen
import re

# Connecting to a URL
webpage = urlopen("https://computersciencehub.io")

# Reading html code of Webpage
html = webpage.read().decode('utf-8')

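# Using the re module to extract all of the links in the webpage.
# The pattern has two groups, so findall returns a (url, scheme) tuple per match.
links = re.findall('"((http|ftp)s?://.*?)"', html)

# Printing the list of links in the webpage
for url, scheme in links:
    print(url)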
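As a possible variation, the search step could be wrapped in a small helper. This is only a sketch: the names `LINK_PATTERN` and `extract_links` are hypothetical, the sample HTML is made up, and the pattern is a slightly changed version of the one above that uses a non-capturing group `(?:...)` so that findall returns plain strings instead of tuples.

```python
import re

# Same idea as the article's pattern, but (?:...) does not create a group,
# so findall returns one string per match
LINK_PATTERN = re.compile(r'"((?:http|ftp)s?://.*?)"')


def extract_links(html):
    """Return the double-quoted http(s)/ftp(s) URLs in html, in order, without duplicates."""
    # dict.fromkeys keeps the first occurrence of each URL and preserves order
    return list(dict.fromkeys(LINK_PATTERN.findall(html)))


# Made-up sample page with one repeated link
sample = (
    '<a href="https://example.com/">Home</a> '
    '<img src="http://example.com/logo.png"> '
    '<a href="https://example.com/">Home again</a>'
)
print(extract_links(sample))
```

Like the original pattern, this only finds absolute URLs wrapped in double quotes; relative links such as "/about" and single-quoted attributes are not matched.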