Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works with your favorite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree. It commonly saves programmers hours or days of work.
Importing BeautifulSoup & Parsing HTML into a DOM Tree
from bs4 import BeautifulSoup
html_source = '<div class="sample"> this is sample text <span> I dont want this text</span></div>'
# Name the parser explicitly; "html.parser" ships with Python's standard library
soup = BeautifulSoup(html_source, "html.parser")
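Below is a slightly fuller, self-contained version of the snippet above. It assumes the built-in "html.parser" backend; lxml or html5lib should also work if they are installed. It shows that the parsed document can be navigated like a tree: `find()` returns the first matching tag, and `get_text()` collects every text node beneath it.

```python
from bs4 import BeautifulSoup

html_source = '<div class="sample"> this is sample text <span> I dont want this text</span></div>'

# Naming the parser explicitly avoids the warning BeautifulSoup emits
# when it has to guess which parser to use.
soup = BeautifulSoup(html_source, "html.parser")

div = soup.find("div")                  # first <div> in the tree
print(div["class"])                     # ['sample'] (class is multi-valued, so a list)
print(div.get_text(" ", strip=True))    # strip each text node, join with a space
```

Note that `get_text()` includes the `<span>` text as well. The third example further down shows how to remove that element first.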
# select() takes a CSS selector and returns a list of matching elements
# All elements named <div>
soup.select('div')
# The element with an id attribute of author
soup.select('#author')
# All elements that use a CSS class attribute named notice
soup.select('.notice')
# All elements named <span> that are within an element named <div>, at any depth
soup.select('div span')
# All elements named <span> that are directly within an element named <div>, with no other element in between
soup.select('div > span')
# All elements named <input> that have a name attribute with any value
soup.select('input[name]')
# All elements named <input> that have an attribute named type with value button
soup.select('input[type="button"]')
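The selectors above only match something if the document contains those elements. Here is a small runnable check against hypothetical markup, invented purely for illustration, that highlights the difference between the descendant (`div span`) and child (`div > span`) combinators. `select_one()` returns the first match or `None`.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, made up only to exercise the selectors.
html = """
<div id="author" class="notice">
  <p><span>nested span</span></p>
  <span>direct span</span>
</div>
<form>
  <input name="q" type="text">
  <input type="button" value="Go">
</form>
"""
soup = BeautifulSoup(html, "html.parser")

print(len(soup.select("div")))                             # 1
print(len(soup.select("div span")))                        # 2: any depth
print(len(soup.select("div > span")))                      # 1: direct children only
print(len(soup.select("input[name]")))                     # 1
print(soup.select_one('input[type="button"]')["value"])    # Go
print(soup.select_one("#author") is soup.select_one(".notice"))  # True: same <div>
```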
1. To extract a link's href
<a class="next" href="/apple-touch-icon-57x57.png"/>
soup.find("a", attrs={"class": "next"}).get("href")
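`find()` returns `None` when nothing matches, so chaining `.get('href')` directly raises `AttributeError` on pages without the link. The sketch below guards against that and also collects every `href` on the page with `find_all("a", href=True)`. Only the `class="next"` anchor comes from the example above. The other anchors are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; only the class="next" anchor mirrors the example above.
html = """
<a class="next" href="/apple-touch-icon-57x57.png"></a>
<a href="/about">About</a>
<a name="top">no href here</a>
"""
soup = BeautifulSoup(html, "html.parser")

# Guard against find() returning None before reading an attribute.
next_link = soup.find("a", class_="next")
next_href = next_link.get("href") if next_link else None

# href=True keeps only <a> tags that actually carry an href attribute.
all_hrefs = [a["href"] for a in soup.find_all("a", href=True)]

print(next_href)   # /apple-touch-icon-57x57.png
print(all_hrefs)   # ['/apple-touch-icon-57x57.png', '/about']
```

Relative hrefs like these can be resolved against the page URL with `urllib.parse.urljoin` if absolute URLs are needed.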
2. To extract the canonical URL
<link rel="canonical" href="https://www.flipkart.com/redmi-note-5-pro-gold-64-gb/p/itmf2fc3txmqwdkb"/>
soup.find_all('link', {'rel': 'canonical'})[0]['href']
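Indexing `[0]` raises `IndexError` when a page declares no canonical link. A more defensive variant uses a CSS selector and returns `None` instead. `canonical_url` is a hypothetical helper name, not part of bs4. The `~=` operator is used because BeautifulSoup treats `rel` as a multi-valued attribute, and `~=` matches one whitespace-separated word within it.

```python
from bs4 import BeautifulSoup


def canonical_url(soup):
    """Return the canonical URL, or None if the page declares none.

    Hypothetical helper for illustration; not a bs4 API.
    """
    # rel is multi-valued; ~= matches "canonical" as one word, [href] requires the attribute.
    link = soup.select_one('link[rel~="canonical"][href]')
    return link["href"] if link else None


page = BeautifulSoup(
    '<head><link rel="canonical" '
    'href="https://www.flipkart.com/redmi-note-5-pro-gold-64-gb/p/itmf2fc3txmqwdkb"/></head>',
    "html.parser",
)
print(canonical_url(page))
print(canonical_url(BeautifulSoup("<head></head>", "html.parser")))  # None
```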
3. To remove a selected element from the DOM
<div class="sample">Required this text alone.<span> I don't want this text</span></div>
>>> soup.get_text()  => Required this text alone. I don't want this text
>>> [span.extract() for span in soup('span')]  => [<span> I don't want this text</span>]
>>> soup  => <div class="sample">Required this text alone.</div>
>>> soup.get_text()  => Required this text alone.
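BeautifulSoup offers two ways to drop an element. `extract()` detaches the tag and returns it, so the removed content can still be inspected. `decompose()` removes the tag and destroys it, returning nothing. The runnable comparison below uses the same sample HTML as above. Note that `text` is a property and `get_text()` is a method, so `soup.text()` would fail with a `TypeError`.

```python
from bs4 import BeautifulSoup

HTML = '<div class="sample">Required this text alone.<span> I don\'t want this text</span></div>'

# extract(): detach the <span> and keep a reference to it.
soup_a = BeautifulSoup(HTML, "html.parser")
removed = [span.extract() for span in soup_a.find_all("span")]
print(soup_a.get_text())       # Required this text alone.
print(repr(removed[0].get_text()))  # the detached span still holds its text

# decompose(): remove and destroy the <span>; returns None.
soup_b = BeautifulSoup(HTML, "html.parser")
for span in soup_b.find_all("span"):
    span.decompose()
print(soup_b)                  # <div class="sample">Required this text alone.</div>
```

`find_all()` builds its result list before the loop starts, so removing tags while iterating over it is safe here.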