This is a guide on handling CSV data using Python and the pandas
library. For this tutorial, we will use a remote CSV file located at:
https://pythonhow.com/media/django-summernote/2024-04-19/b24fcd3d-fc39-4891-8c53-71e7393fa5ac.csv
Let's go through the steps to read, process, and manipulate this data.
If you haven't already installed pandas, you can do so using pip. Run the following command in your terminal or command prompt:
pip install pandas
To begin, you'll need to import pandas and use it to read the CSV file. Here's how you can do it:
import pandas as pd

# Load the CSV file from a URL
url = "https://pythonhow.com/media/django-summernote/2024-04-19/b24fcd3d-fc39-4891-8c53-71e7393fa5ac.csv"
data = pd.read_csv(url)

# Display the first few rows of the DataFrame
print(data.head())
This code loads your CSV data into a DataFrame, a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). We passed the URL of the CSV file to the read_csv() function, but you can pass the path to a local file as well.
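If you want to try read_csv() without a network connection, the following minimal sketch reads CSV text from an in-memory buffer instead. The sample rows and the column names (Name, Age, City) are made up for illustration and are not taken from the tutorial's file.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the tutorial's CSV file;
# the column names and values are invented for illustration.
csv_text = """Name,Age,City
Alice,34,London
Bob,27,Paris
Carla,45,Madrid
"""

# read_csv accepts a URL, a local file path, or any file-like object.
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)          # (number of rows, number of columns)
print(list(sample.columns))  # column labels taken from the header row
```

Reading a local file works the same way: pass the file's path as a string in place of the buffer.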
After loading the data, you can manipulate it, compute statistics, and even save the result to a new CSV file. Let's get some basic statistics about the data in our CSV file:
# Display the first 5 rows of the DataFrame
print(data.head())

# Print a concise summary of the DataFrame (info() prints directly and returns None)
data.info()

# Generate descriptive statistics
print(data.describe())
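Because describe() returns a DataFrame, you can also pull individual statistics out of it rather than only printing it. Here is a small sketch on a hypothetical three-row frame (the names and ages are invented for illustration):

```python
import pandas as pd

# Hypothetical data; not taken from the tutorial's CSV file.
sample = pd.DataFrame({
    "Name": ["Alice", "Bob", "Carla"],
    "Age": [34, 27, 45],
})

# By default, describe() summarizes only the numeric columns.
stats = sample.describe()

# Rows are labeled by statistic, columns by the original column name.
mean_age = stats.loc["mean", "Age"]
max_age = stats.loc["max", "Age"]

print(stats)
print("Mean age:", mean_age)
print("Max age:", max_age)
```

Looking up a single value with .loc is handy when you need one number, such as the mean, for a later calculation.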
You can perform various data manipulations with pandas. For instance, let's say you want to filter the data to find only individuals who are older than 30:
older_than_30 = data[data['Age'] > 30]
print(older_than_30)
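The same boolean-mask technique extends to several conditions at once. The sketch below uses a hypothetical frame with Age and City columns (invented for illustration) and combines two conditions with the & operator. Each condition needs its own parentheses, because & binds more tightly than comparison operators in Python.

```python
import pandas as pd

# Hypothetical data; not taken from the tutorial's CSV file.
people = pd.DataFrame({
    "Name": ["Alice", "Bob", "Carla", "Dan"],
    "Age": [34, 27, 45, 30],
    "City": ["London", "Paris", "Madrid", "London"],
})

# A comparison on a column yields a boolean Series (the mask).
is_older = people["Age"] > 30
older_than_30 = people[is_older]

# Combine masks with & (and) or | (or); wrap each condition in parentheses.
older_in_london = people[(people["Age"] > 30) & (people["City"] == "London")]

print(older_than_30)
print(older_in_london)
```

Note that the comparison is strict: a person who is exactly 30 is not included in either result.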
Or, you might want to create a new column that categorizes individuals as either 'Young' or 'Old' based on their age:
data['Category'] = data['Age'].apply(lambda x: 'Young' if x < 30 else 'Old')
print(data)
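As an alternative to apply() with a lambda, the same two-way categorization can be written with numpy.where, which evaluates the condition on the whole column at once. This is a sketch of a swapped-in technique on hypothetical data, not the approach used above. NumPy is installed alongside pandas as a dependency.

```python
import numpy as np
import pandas as pd

# Hypothetical data; not taken from the tutorial's CSV file.
people = pd.DataFrame({"Age": [34, 27, 45, 30]})

# np.where(condition, value_if_true, value_if_false) works element-wise.
people["Category"] = np.where(people["Age"] < 30, "Young", "Old")

print(people)
```

Both versions apply the same rule: ages below 30 are labeled 'Young', and everything else, including exactly 30, is labeled 'Old'.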
After processing the data, you might want to save the modified DataFrame back to a CSV file:
data.to_csv('processed_data.csv', index=False)
Setting index=False ensures that the DataFrame's index (i.e., the row numbers) is not written into the file.
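To see what index=False changes, you can call to_csv() without a file path, in which case it returns the CSV text as a string instead of writing a file. The sketch below compares both outputs on a hypothetical two-row frame (invented for illustration).

```python
import pandas as pd

# Hypothetical data; not taken from the tutorial's CSV file.
people = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [34, 27]})

# With no path argument, to_csv() returns the CSV content as a string.
with_index = people.to_csv()
without_index = people.to_csv(index=False)

# Compare the header rows: the default output has an extra, unnamed
# leading column that holds the row numbers.
print(with_index.splitlines()[0])
print(without_index.splitlines()[0])
```

If you keep the index in the file and later reload it with read_csv(), the row numbers come back as an extra unnamed column, which is why index=False is a common choice.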
This was a simple tutorial on using pandas to handle CSV data. Pandas offers many more functions and capabilities for data analysis, making it a staple for Python data manipulation and analysis tasks.