Working with CSV files in Python


This is a guide on handling CSV data using Python and the pandas library. For this tutorial, we will use a remote CSV file located at:

https://pythonhow.com/media/django-summernote/2024-04-19/b24fcd3d-fc39-4891-8c53-71e7393fa5ac.csv

Let's go through the steps to read, process, and manipulate this data.

Step 1: Installing Pandas

If you haven't already installed pandas, you can do so using pip. Run the following command in your terminal or command prompt:

pip install pandas
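To confirm that the installation worked, you can import pandas and print the installed version. The exact version string will depend on your environment.

```python
# Verify that pandas can be imported and report the installed version
import pandas as pd

print(pd.__version__)
```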

Step 2: Reading the CSV File

To begin, you'll need to import pandas and use it to read the CSV file. Here's how you can do it:

import pandas as pd

# Load the CSV file from a URL into a DataFrame
url = "https://pythonhow.com/media/django-summernote/2024-04-19/b24fcd3d-fc39-4891-8c53-71e7393fa5ac.csv"
data = pd.read_csv(url)

# Display the first few rows of the DataFrame
print(data.head())

This code loads your CSV data into a DataFrame: a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). Here we passed the URL of the CSV file to the read_csv() function, but you can also pass the path to a local file.
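Because read_csv() accepts a local path or any file-like object as well as a URL, you can try it without a network connection. Here is a minimal sketch, assuming a small hypothetical dataset with Name and Age columns. The data and the people.csv file name are made up for illustration.

```python
import io
import os
import tempfile

import pandas as pd

# Hypothetical sample data; the Name and Age columns are assumed for illustration
csv_text = "Name,Age\nAlice,25\nBob,34\nCarol,41\n"

# read_csv() accepts any file-like object, such as an in-memory text buffer
df_from_buffer = pd.read_csv(io.StringIO(csv_text))

# It also accepts a path to a local file; write a temporary file and read it back
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "people.csv")
    with open(path, "w", encoding="utf-8") as f:
        f.write(csv_text)
    df_from_file = pd.read_csv(path)

print(df_from_file.equals(df_from_buffer))  # True
```

Both DataFrames are identical. Switching between a remote URL, a local path, and an in-memory buffer does not change how you work with the result.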

Step 3: Basic Data Examination

After loading the data, you can manipulate it, compute statistics, and save the modified data to a new CSV file. Let's start with some basic information and statistics about the data in our CSV file:

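# Display the first 5 rows of the DataFrame
print(data.head())

# Print a concise summary: column names, non-null counts, and dtypes
# (info() prints directly and returns None, so it is not wrapped in print())
data.info()

# Generate descriptive statistics for the numeric columns
print(data.describe())

Together, these give you a quick look at the structure of your data and help you spot obvious inconsistencies, such as missing values or columns with unexpected types.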
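A few more attributes are handy for spotting problems early. The sketch below uses a hypothetical three-row dataset with one deliberately missing Age value, standing in for the tutorial's CSV:

```python
import io

import pandas as pd

# Hypothetical sample with one missing Age value (Bob's row)
csv_text = "Name,Age\nAlice,25\nBob,\nCarol,41\n"
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)         # (rows, columns): (3, 2)
print(df.dtypes)        # Age is float64 because of the missing value
print(df.isna().sum())  # number of missing values per column
```

A single missing value changes the Age column from integers to floats, because NaN is a floating-point value. info() reveals the same issue through its non-null counts.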

Step 4: Data Manipulation

You can perform various data manipulations with pandas. For instance, let's say you want to filter the data to find only individuals who are older than 30:

older_than_30 = data[data['Age'] > 30]
print(older_than_30)
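It can help to see what the expression inside the brackets actually produces. In this sketch the DataFrame is built in code with hypothetical values rather than read from the URL:

```python
import pandas as pd

# Hypothetical data standing in for the tutorial's CSV
data = pd.DataFrame({"Name": ["Alice", "Bob", "Carol"], "Age": [25, 34, 41]})

# The comparison yields a boolean Series with one True/False per row
mask = data["Age"] > 30
print(mask.tolist())  # [False, True, True]

# Indexing with the mask keeps only the rows where it is True
older_than_30 = data[mask]
print(older_than_30["Name"].tolist())  # ['Bob', 'Carol']

# Conditions can be combined with & (and) or | (or); each needs its own parentheses
between_30_and_40 = data[(data["Age"] > 30) & (data["Age"] < 40)]
print(between_30_and_40["Name"].tolist())  # ['Bob']
```

The parentheses around each comparison are required because & binds more tightly than > and < in Python.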

Or, you might want to create a new column that categorizes individuals as either 'Young' or 'Old' based on their age:

data['Category'] = data['Age'].apply(lambda x: 'Young' if x < 30 else 'Old')
print(data)
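apply() calls the Python lambda once per row. That is easy to read but can be slow on large files. A vectorized alternative evaluates the whole column at once and applies the same rule. Here it is sketched with numpy.where() on hypothetical data; numpy is installed alongside pandas as a dependency.

```python
import numpy as np
import pandas as pd

# Hypothetical data; 30 is included to show how the boundary is handled
data = pd.DataFrame({"Name": ["Alice", "Bob", "Carol"], "Age": [25, 30, 41]})

# Same rule as the apply() version: under 30 is 'Young', 30 and above is 'Old'
data["Category"] = np.where(data["Age"] < 30, "Young", "Old")
print(data["Category"].tolist())  # ['Young', 'Old', 'Old']
```

With this rule an age of exactly 30 counts as 'Old', while the earlier filter (Age > 30) leaves 30-year-olds out. Adjust one of the comparisons if you want the two to agree.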

Step 5: Saving Processed Data

After processing the data, you might want to save the modified DataFrame back to a CSV file:

data.to_csv('processed_data.csv', index=False)

Setting index=False ensures that the DataFrame's index (the row labels, which by default are the row numbers 0, 1, 2, ...) is not written into the file.
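To see what index=False changes, you can call to_csv() without a file path. In that case it returns the CSV text as a string instead of writing a file. Here is a small sketch with hypothetical data:

```python
import io

import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 34]})

# With no path argument, to_csv() returns the CSV text as a string
with_index = df.to_csv()
without_index = df.to_csv(index=False)

print(with_index)
# ,Name,Age
# 0,Alice,25
# 1,Bob,34

print(without_index)
# Name,Age
# Alice,25
# Bob,34

# Reading the indexed version back turns the old index into an unnamed column
round_trip = pd.read_csv(io.StringIO(with_index))
print(list(round_trip.columns))  # ['Unnamed: 0', 'Name', 'Age']
```

The extra Unnamed: 0 column that appears on re-reading is usually what index=False is meant to avoid.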

This was a simple tutorial on using pandas to handle CSV data. Pandas offers many more functions and capabilities for data analysis, making it a staple for Python data manipulation and analysis tasks.

