Here is how to scrape a Wikipedia page in Python.

import requests
from bs4 import BeautifulSoup

# URL of the Wikipedia page
url = "https://en.wikipedia.org/wiki/Mathematics"

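# Fetch the webpage. Wikimedia asks automated clients to send a descriptive
# User-Agent; the value below is only an example, so replace it with your own.
headers = {"User-Agent": "wiki-scrape-example/0.1 (educational script)"}
response = requests.get(url, headers=headers, timeout=10)
if response.status_code == 200:  # Check if the request was successful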
    # Parse the HTML content
    soup = BeautifulSoup(response.text, 'html.parser')
    
    # Scrape the title
    title = soup.select_one("#firstHeading").text
    print(f"Title: {title}")
    
    # Scrape all paragraphs
    paragraphs = soup.select("p")
    for paragraph in paragraphs:
        print(paragraph.text)
    
    # Save the introductory paragraphs to a text file
    intro_text = '\n'.join(para.text for para in paragraphs[:5])
    with open('intro.txt', 'w', encoding='utf-8') as file:
        file.write(intro_text)
else:
    print("Failed to fetch the webpage")
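
The status-code check above covers HTTP errors, but a network failure or a malformed URL raises an exception before that check is reached. A minimal sketch of one way to handle both cases follows; the function name fetch_html and the 10-second timeout are illustrative choices, not part of the original script.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the page HTML as a string, or None if the request fails."""
    try:
        response = requests.get(url, timeout=timeout)
        # Raise requests.HTTPError for 4xx and 5xx status codes.
        response.raise_for_status()
    except requests.RequestException as error:
        # Covers connection errors, timeouts, malformed URLs, and HTTP errors.
        print(f"Failed to fetch the webpage: {error}")
        return None
    return response.text
```

Calling fetch_html("https://en.wikipedia.org/wiki/Mathematics") would then return the HTML on success, and the caller only needs to check for None before parsing.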

Output

The program prints the page title and the text of every paragraph to the console. It also creates a text file, intro.txt, which contains the text of the first five <p> elements on the page, not the whole article.

Explanation

1. Import Libraries: First, you need to import the necessary libraries - requests for making HTTP requests to fetch the webpage and bs4 (specifically, BeautifulSoup) for parsing the HTML content.
2. Fetch the Webpage: Use the requests.get function to fetch the webpage. The url should be the full URL of the Wikipedia page you want to scrape. Before parsing, check that response.status_code is 200, which means the request succeeded.
3. Parse the HTML Content: Once you have the webpage's HTML content, parse it with BeautifulSoup by creating a new BeautifulSoup object. Pass the HTML content (response.text) and the parser ('html.parser') as arguments.
4. Scrape the Title: To get the title of the Wikipedia page, use the select_one method with the CSS selector #firstHeading. This selector targets the element with the ID firstHeading, which is where Wikipedia stores the title of the page. select_one returns the first matching element, and its .text attribute holds the title text.
5. Scrape All Paragraphs: Use the select method again with the "p" selector to get all paragraphs. This returns a list of all paragraph elements (<p> tags) on the page.
6. Print and Save Paragraphs: Loop through the list of paragraphs. Print each paragraph's text by accessing the .text attribute. To save the introductory paragraphs, join the texts of the first few paragraphs (e.g., first 5) with newline characters and write them to a text file.
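
Steps 3 to 6 can be tried without a network connection by parsing a small HTML string. The sketch below uses a made-up snippet (SAMPLE_HTML is a hypothetical stand-in for response.text, not real Wikipedia markup) and assumes that a page may contain <p> elements with no text, which it filters out before taking the first five.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for response.text; not real Wikipedia markup.
SAMPLE_HTML = """
<html><body>
  <h1 id="firstHeading">Mathematics</h1>
  <p class="mw-empty-elt"></p>
  <p>Mathematics is a field of study.</p>
  <p>It includes <b>number theory</b> and algebra.</p>
</body></html>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# select_one returns the first matching element, or None if nothing matches.
heading = soup.select_one("#firstHeading")
title = heading.get_text().strip() if heading is not None else ""

# select returns a list of every matching element.
paragraphs = soup.select("p")

# Drop paragraphs that contain no text before taking the first few.
texts = [p.get_text().strip() for p in paragraphs]
non_empty = [text for text in texts if text]
intro_text = "\n".join(non_empty[:5])

print(f"Title: {title}")
print(intro_text)
```

Checking the result of select_one for None avoids an AttributeError on pages that lack the expected heading, and filtering empty strings keeps blank lines out of the saved introduction.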
