
Web Scraping Using Python

Dhanush
3 min read · Jul 23, 2020

My first article about web scraping in Python, for absolute beginners.

Photo by Chris Ried on Unsplash

What is Web Scraping?
Web Scraping means extracting information from websites by parsing the HTML of the web page.

How do we do it?
Parsing an HTML web page is easy in Python. You can get the information you need with a few lines of code.

What do we need?

  1. Pandas
  2. Beautiful Soup
  3. Selenium

1. Pandas
Pandas is mainly used for data analysis. It can import data from various file formats such as comma-separated values (CSV), JSON, SQL, and Microsoft Excel.
It also supports data manipulation operations such as merging, reshaping, and selecting, as well as data cleaning and data wrangling.

To install pandas:

pip install pandas
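
As a quick illustration of the merging and selecting mentioned above, here is a minimal sketch. The two tables and their column names are made up for this example and are not part of the scraping code later in the article.

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration.
products = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["TV A", "TV B", "TV C"],
})
prices = pd.DataFrame({
    "id": [1, 2, 3],
    "price": [15000, 32000, 27500],
})

# Merging: combine the two tables on their shared "id" column.
merged = products.merge(prices, on="id")

# Selecting: keep only the rows whose price is above 20000.
expensive = merged[merged["price"] > 20000]

print(expensive)
```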

2. Beautiful Soup
A GET request will fetch the web page for you, but you still need to parse the HTML from the page to retrieve the data. That is done by BeautifulSoup.

To install BeautifulSoup:

pip install BeautifulSoup4
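
To get a feel for what parsing means here, the sketch below runs BeautifulSoup on a small HTML string instead of a real page. The HTML and its class names are invented for the example.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML, invented for this illustration.
html = """
<html><body>
  <h1 class="title">Televisions</h1>
  <ul>
    <li class="item">TV A</li>
    <li class="item">TV B</li>
  </ul>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching tag (or None if nothing matches).
title = soup.find("h1", attrs={"class": "title"}).text

# find_all() returns a list of every matching tag.
items = [li.text for li in soup.find_all("li", attrs={"class": "item"})]

print(title)
print(items)
```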

3. Selenium
Selenium is a portable framework for testing web applications. It supports many browsers such as Firefox, Chrome, IE, and Safari.

To install Selenium:

pip install selenium

Selenium also needs a driver for the browser it controls, such as ChromeDriver for Chrome. Other supported browsers have their own drivers available.

For more details about installing Selenium, visit the Selenium Installation Guide.

Let’s get started.

This tutorial stays at a basic level; there is much more that can be done using requests and BeautifulSoup.

In this tutorial, we scrape the details of available televisions from the Flipkart website.

Take a look at the link that we are going to scrape, here.

1. Import the libraries.
import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver

2. Create empty lists to store the details. In this case, we create two empty lists to store the product names and product prices.

productName = []
productPrice = []

3. Selenium will now start a browser session. For Selenium to work, it must access the browser driver.

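# Selenium 4 takes the driver path through a Service object
# (Selenium 3 accepted it as the first argument of webdriver.Chrome).
from selenium.webdriver.chrome.service import Service
driver = webdriver.Chrome(service=Service('Path to the chrome webdriver'))
driver.get("paste the url here")
content = driver.page_source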

In this case, Chrome web driver is used. We can also use other web drivers.
Refer to this Website.
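
The snippet above leaves the browser open once the page has loaded. One possible way to tidy this up, sketched here on the assumption that Selenium 4 and a matching ChromeDriver are installed, is to wrap the session in a small helper. The function name fetch_page_source and the headless option are my own choices for this illustration, not part of the steps above.

```python
def fetch_page_source(url, driver_path):
    """Open `url` in headless Chrome and return the rendered HTML."""
    # Imported inside the function so that it can be defined even on a
    # machine where Selenium is not installed yet.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")  # run Chrome without a window

    driver = webdriver.Chrome(
        service=Service(executable_path=driver_path),
        options=options,
    )
    try:
        driver.get(url)
        return driver.page_source
    finally:
        # Always close the browser, even if loading the page fails.
        driver.quit()
```

It would be called as content = fetch_page_source("paste the url here", "Path to the chrome webdriver"), after which the rest of the steps stay the same.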

4. Inspect the website and look for the class name and tag.

The red highlighted area shows the class and tag of the section. In this case, the tag we are looking for is <a>.

Since we are scraping only the name and price of the product, inspect those tags carefully.

The red highlighted area indicates the class and tag of the product name.

Do the same for the Price and copy the tags.

5. Using the find and find_all methods in BeautifulSoup (findAll is the older spelling of find_all), we extract the data and store it in variables. Refer to the code below.

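soup = BeautifulSoup(content, 'html.parser')
# Loop over every <a> tag that has an href and the product class.
for a in soup.find_all('a', href=True, attrs={'class': '_31qSD5'}):
    # Look up the name and price <div> tags inside the current <a> tag.
    name = a.find('div', attrs={'class': '_31qSD5'})
    price = a.find('div', attrs={'class': '_1vC4OE _2rQ-NK'})
    productName.append(name.text)
    productPrice.append(price.text)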

In the code above, the variables name and price are introduced. Each one holds the result of the find method, which is given the <div> tag and the class name as parameters. Using append, we store the details in the lists we created earlier.
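
One thing to watch for: find returns None when a tag is missing, and calling .text on None raises an error. The sketch below shows one way to skip incomplete entries. It runs on a made-up HTML string that reuses the class names from this step; the real page may look different, and the list names here are chosen just for the example.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that reuses the class names from the step above;
# the second product has no price on purpose.
sample = """
<a href="/tv-a" class="_31qSD5">
  <div class="_31qSD5">TV A</div>
  <div class="_1vC4OE _2rQ-NK">₹15,000</div>
</a>
<a href="/tv-b" class="_31qSD5">
  <div class="_31qSD5">TV B</div>
</a>
"""

names, prices = [], []
soup = BeautifulSoup(sample, "html.parser")
for a in soup.find_all("a", href=True, attrs={"class": "_31qSD5"}):
    name = a.find("div", attrs={"class": "_31qSD5"})
    price = a.find("div", attrs={"class": "_1vC4OE _2rQ-NK"})
    # Skip products where either tag is missing, so both lists
    # always end up with the same length.
    if name is None or price is None:
        continue
    names.append(name.text.strip())
    prices.append(price.text.strip())

print(len(names), "complete product(s) found")
```

Keeping the two lists the same length also matters for the next step, because pandas cannot build a DataFrame from columns of different lengths.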

6. Store the data in a Sheet.

df = pd.DataFrame({'Product Name':productName,'Price':productPrice})
df.to_csv('Products.csv',index= False, encoding = 'utf-8')

We store the data in comma-separated values (CSV) format.
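
The scraped prices are plain text, so they cannot be sorted or summed as numbers. If you want numeric prices in the sheet, one option is to clean them first. This is a minimal sketch that assumes whole-number prices; the sample values and the helper name to_number are made up for the illustration.

```python
import io
import re

import pandas as pd

# Hypothetical scraped values, invented for this illustration.
productName = ["TV A", "TV B"]
productPrice = ["₹15,000", "₹32,499"]


def to_number(price_text):
    """Drop every non-digit character and return the price as an int."""
    return int(re.sub(r"\D", "", price_text))


df = pd.DataFrame({
    "Product Name": productName,
    "Price": [to_number(p) for p in productPrice],
})

# Written to an in-memory buffer here; pass a file name such as
# 'Products.csv' to to_csv to write to disk instead.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```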

Now put the snippets from steps 1 to 6 together in one Python file and run the whole code.

All the data is stored as Products.csv in the directory the script is run from, which is usually the folder containing the Python file.

Results

I hope you enjoyed this article on “Web Scraping using Python”. I hope this blog was informative and has added value to your knowledge. If you like my work, follow me on Medium. Try experimenting with different modules. Have fun learning!

“The beautiful thing about learning is that nobody can take it away from you.”

— B.B. King

Thank you!
