import requests
from bs4 import BeautifulSoup

# URL of the Wikipedia page
url = "https://en.wikipedia.org/wiki/Mathematics"

# Fetch the webpage
response = requests.get(url)

if response.status_code == 200:  # Check if the request was successful
    # Parse the HTML content
    soup = BeautifulSoup(response.text, 'html.parser')

    # Scrape the title
    title = soup.select_one("#firstHeading").text
    print(f"Title: {title}")

    # Scrape all paragraphs
    paragraphs = soup.select("p")
    for paragraph in paragraphs:
        print(paragraph.text)

    # Save the introductory paragraphs to a text file
    intro_text = '\n'.join(para.text for para in paragraphs[:5])
    with open('intro.txt', 'w', encoding='utf-8') as file:
        file.write(intro_text)
else:
    print("Failed to fetch the webpage")

The program prints the page title and the text of every paragraph on the given Wikipedia page. It also generates a text file, intro.txt, containing the text of the first five paragraphs.
1. Import Libraries: First, import the necessary libraries: requests for making HTTP requests to fetch the webpage, and bs4 (specifically, BeautifulSoup) for parsing the HTML content.
2. Fetch the Webpage: Use the requests.get(url) function to fetch the webpage. The url should be the full URL of the Wikipedia page you want to scrape. Before parsing, check that response.status_code is 200, which means the request was successful.
3. Parse the HTML Content: Once you have the webpage's HTML content, parse it by creating a new BeautifulSoup object. Pass the HTML content (response.text) and the parser ('html.parser') as arguments.
4. Scrape the Title: To get the title of the Wikipedia page, use the select_one method with the CSS selector #firstHeading. This selector targets the element with the ID firstHeading, which is where Wikipedia stores the title of the page. select_one returns only the first matching element, so you can read its .text attribute directly.
5. Scrape All Paragraphs: Use the select method with the "p" selector to get all paragraphs. This returns a list of all paragraph elements (<p> tags) on the page.
6. Print and Save Paragraphs: Loop through the list of paragraphs and print each paragraph's text by accessing its .text attribute. To save the introductory paragraphs, join the texts of the first few paragraphs (e.g., the first 5) with newline characters and write them to a text file.
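
The parsing steps above can also be tried without a network connection. The following is a minimal sketch, not part of the original program: SAMPLE_HTML is a made-up stand-in for a fetched page (it is not real Wikipedia markup), and extract_title_and_intro is a hypothetical helper name. It shows the difference between select_one, which returns a single element or None when nothing matches, and select, which returns a list of every match.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for a fetched page; not real Wikipedia markup.
SAMPLE_HTML = """
<html><body>
  <h1 id="firstHeading">Mathematics</h1>
  <p>First paragraph.</p>
  <p>Second paragraph.</p>
  <p>Third paragraph.</p>
</body></html>
"""


def extract_title_and_intro(html, max_paragraphs=5):
    """Return (title, intro_text) parsed from an HTML string."""
    soup = BeautifulSoup(html, "html.parser")

    # select_one returns the first match, or None when nothing matches,
    # so guard before reading the text.
    heading = soup.select_one("#firstHeading")
    title = heading.get_text(strip=True) if heading is not None else None

    # select returns a list of every matching element.
    paragraphs = soup.select("p")
    intro_text = "\n".join(
        p.get_text(strip=True) for p in paragraphs[:max_paragraphs]
    )
    return title, intro_text


title, intro = extract_title_and_intro(SAMPLE_HTML, max_paragraphs=2)
print(title)
print(intro)
```

Keeping the parsing in a function that takes an HTML string means the same logic could be applied to response.text from a real request, and a page without a #firstHeading element yields a title of None instead of raising an AttributeError.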