Working with CSV files in Python

This is a guide to handling CSV data with Python and the pandas library. For this tutorial, we will use a remote CSV file located at:

https://pythonhow.com/media/django-summernote/2024-04-19/b24fcd3d-fc39-4891-8c53-71e7393fa5ac.csv

Let's go through the steps to read, process, and manipulate this data.

Step 1: Installing Pandas

If you haven't already installed pandas, you can do so using pip. Run the following command in your terminal or command prompt:

    pip install pandas

Step 2: Reading the CSV File

To begin, import pandas and use it to read the CSV file:

    import pandas as pd

    # Load the CSV file from a URL
    url = "https://pythonhow.com/media/django-summernote/2024-04-19/b24fcd3d-fc39-4891-8c53-71e7393fa5ac.csv"
    data = pd.read_csv(url)

    # Display the first few rows of the DataFrame
    print(data.head())

This code loads your CSV data into a DataFrame, a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). We passed the URL of the CSV file to the read_csv() function, but you can pass the path to a local file as well.

Step 3: Basic Data Examination

After loading the data, you can manipulate it, compute statistics, and even save the result to a new CSV file. Let's start by getting some basic information about the data in our CSV file:

    # Display the first 5 rows of the DataFrame
    print(data.head())

    # Get a concise summary of the DataFrame
    # (info() prints the summary itself and returns None, so no print() is needed)
    data.info()

    # Generate descriptive statistics
    print(data.describe())

These functions help you quickly understand the structure of your data and spot any obvious inconsistencies.

Step 4: Data Manipulation

You can perform various data manipulations with pandas.
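To experiment with these manipulations without depending on the remote file, one option is a small stand-in dataset. The sketch below is only an illustration: the names sample_csv and sample, and the Name and Age values, are made up and are not taken from the tutorial's file. It relies on read_csv() accepting a file-like object in addition to a URL or a path.

```python
import io

import pandas as pd

# Hypothetical stand-in data; the real file's columns may differ.
sample_csv = """Name,Age
Alice,25
Bob,30
Carol,41
"""

# read_csv() also accepts a file-like object, so no download is needed.
sample = pd.read_csv(io.StringIO(sample_csv))

# Confirm that the Age column exists and was parsed as a numeric type.
print(sample.columns.tolist())
print(sample["Age"].dtype)
```

Substituting sample for data in the snippets that follow lets them run offline.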
For instance, let's say you want to filter the data to find only individuals who are older than 30:

    older_than_30 = data[data['Age'] > 30]
    print(older_than_30)

Or, you might want to create a new column that categorizes individuals as either 'Young' or 'Old' based on their age:

    data['Category'] = data['Age'].apply(lambda x: 'Young' if x < 30 else 'Old')
    print(data)

Step 5: Saving Processed Data

After processing the data, you might want to save the modified DataFrame back to a CSV file:

    data.to_csv('processed_data.csv', index=False)

Setting index=False ensures that the DataFrame's index (i.e., the row numbers) is not written into the file.

This was a simple tutorial on using pandas to handle CSV data. Pandas offers many more functions and capabilities for data analysis, making it a staple for Python data manipulation and analysis tasks.
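The filtering, categorizing, and saving steps can be combined in one short sketch. It uses a made-up three-row DataFrame (the name people and its values are hypothetical, not from the tutorial's file) and calls to_csv() without a path, which returns the CSV text as a string instead of writing a file, so the effect of index=False is easy to inspect.

```python
import io

import pandas as pd

# Hypothetical data for illustration only.
people = pd.DataFrame({"Name": ["Alice", "Bob", "Carol"], "Age": [25, 30, 41]})

# Keep only rows where Age is strictly greater than 30.
older_than_30 = people[people["Age"] > 30]

# Same rule as in Step 4: under 30 is 'Young', 30 and above is 'Old'.
people["Category"] = people["Age"].apply(lambda x: "Young" if x < 30 else "Old")

# With no path given, to_csv() returns the CSV text instead of writing a file.
csv_text = people.to_csv(index=False)
print(csv_text)

# Reading the text back gives the same columns, with no extra index column.
round_trip = pd.read_csv(io.StringIO(csv_text))
print(round_trip.columns.tolist())
```

Note that the two conditions use different boundaries: a 30-year-old is labeled 'Old' by the categorization but is not matched by the Age > 30 filter.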
