Introduction to Pandas in Python

Pandas is a powerful, versatile Python library that provides data structures and functions designed to make data analysis fast and easy. It is built on top of NumPy and integrates well with many other data science and machine learning libraries.

1. Core Components of Pandas:

  • Series: One-dimensional labeled array.
  • DataFrame: Two-dimensional labeled data structure with columns of potentially different types.

2. Installing Pandas:

pip install pandas

3. Basics:

# Importing Pandas library
import pandas as pd

# Creating a Series
s = pd.Series([1, 2, 3, 4])
print(s)

# Creating a DataFrame from a dictionary
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'San Francisco', 'Los Angeles']
}
df = pd.DataFrame(data)
print(df)
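
As a small follow-up sketch (the names `scores` and `people` and their values are illustrative, not part of the tutorial's dataset), a Series can carry explicit labels instead of the default 0..n-1 index, and each DataFrame column is itself a Series:

```python
import pandas as pd

# A Series with explicit labels instead of the default integer index
scores = pd.Series([88, 92, 79], index=["alice", "bob", "charlie"])
print(scores["bob"])            # look up a value by its label

# Each DataFrame column is itself a Series sharing the frame's row index
people = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})
ages = people["Age"]
print(type(ages).__name__)      # the class name of a single column
print(people.shape)             # (number of rows, number of columns)
```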

4. Loading Data:

Pandas provides functions to load various types of files, including CSV, Excel, SQL databases, and more.

# Loading a CSV file
df = pd.read_csv('path_to_file.csv')

# Loading an Excel file (.xlsx files need an engine such as openpyxl installed)
df = pd.read_excel('path_to_file.xlsx')
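
The file paths above are placeholders. To try `read_csv` without a file on disk, one option is to read from an in-memory text buffer, as in this sketch (the CSV content is made up for illustration). It also shows two commonly used parameters, `usecols` and `nrows`:

```python
import io
import pandas as pd

# Illustrative CSV content held in memory instead of a file on disk
csv_text = "Name,Age,City\nAlice,25,New York\nBob,30,San Francisco\n"

# read_csv accepts any file-like object, not only a path
loaded = pd.read_csv(io.StringIO(csv_text))
print(loaded)

# Read only some columns and only the first data row
subset = pd.read_csv(io.StringIO(csv_text), usecols=["Name", "Age"], nrows=1)
print(subset)
```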

5. Data Exploration:

# Display the first 5 rows
df.head()

# Display the last 5 rows
df.tail()

# Summary statistics for the numeric columns
df.describe()

# Information about the DataFrame
df.info()
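
A short sketch of what these calls report, using the small DataFrame from the Basics section (rebuilt here so the example is self-contained). The `shape`, `dtypes`, and `value_counts` calls are additional inspection tools not listed above:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "San Francisco", "Los Angeles"],
})

print(df.shape)                     # (rows, columns)
print(df.dtypes)                    # data type of each column

# describe() summarizes only the numeric columns by default
summary = df.describe()
print(summary.loc["mean", "Age"])   # mean of the Age column

# value_counts() tallies how often each distinct value appears
print(df["City"].value_counts())
```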

6. Indexing and Selection:

# Selecting a column
df['Name']

# Selecting multiple columns
df[['Name', 'Age']]

# Row selection using loc (label-based) and iloc (position-based)
df.loc[0]     # Selects the row whose index label is 0
df.iloc[0]    # Selects the first row by integer position
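
With the default index, `loc[0]` and `iloc[0]` happen to return the same row, which hides the difference between them. The following sketch uses a made-up index of 10, 20, 30 to separate labels from positions:

```python
import pandas as pd

# Index labels chosen so that labels and positions differ
df = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]},
    index=[10, 20, 30],
)

first_by_position = df.iloc[0]   # first row, whatever its label is
row_by_label = df.loc[20]        # the row whose label is 20
print(first_by_position["Name"], row_by_label["Name"])

# Label slices include the end label; position slices exclude the end
label_slice = df.loc[10:20]      # rows labeled 10 and 20
position_slice = df.iloc[0:1]    # only the row at position 0

# A single cell: row label first, then column name
bob_age = df.loc[20, "Age"]
print(len(label_slice), len(position_slice), bob_age)
```

Here `df.loc[0]` would raise a `KeyError`, because no row carries the label 0.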

7. Filtering:

# Filter rows based on conditions
adults = df[df['Age'] > 18]
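
Conditions can also be combined. The sketch below, using the same illustrative data as the Basics section, shows `&` with parenthesized conditions, `isin`, and the `query` method as an alternative spelling:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "San Francisco", "Los Angeles"],
})

# Combine conditions with & (and) or | (or); each condition needs parentheses
mask = (df["Age"] > 26) & (df["City"] != "Los Angeles")
over_26_not_la = df[mask]

# Keep rows whose value is in a given list
coastal = df[df["City"].isin(["New York", "Los Angeles"])]

# query() expresses the same kind of filter as a string
thirty_plus = df.query("Age >= 30")

print(over_26_not_la["Name"].tolist())
print(coastal["Name"].tolist())
print(thirty_plus["Name"].tolist())
```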

8. Modifying DataFrames:

# Adding a new column
df['Salary'] = [50000, 60000, 70000]

# Dropping a column
df.drop('Salary', axis=1, inplace=True)

# Renaming columns
df.rename(columns={'Name': 'First Name'}, inplace=True)
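
The calls above change `df` directly. As an alternative sketch (the variable names are illustrative), the same steps can return new DataFrames and leave the original untouched, which makes it easier to chain operations:

```python
import pandas as pd

base = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})

# assign() returns a copy with the extra column; base is not modified
with_salary = base.assign(Salary=[50000, 60000])

# drop() and rename() also return new DataFrames unless inplace=True is passed
without_age = with_salary.drop(columns="Age")
renamed = without_age.rename(columns={"Name": "First Name"})

print(list(base.columns))      # original columns are unchanged
print(list(renamed.columns))
```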

9. Handling Missing Data:

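# Checking for missing values (True where a value is missing)
df.isnull()

# Dropping rows that contain missing values (returns a new DataFrame; df is unchanged)
df.dropna()

# Filling missing values with 0 (also returns a new DataFrame)
df.fillna(value=0)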
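
A minimal sketch on made-up data with gaps, showing how to count missing values per column and how to fill each column with a different value by passing a dictionary to `fillna`:

```python
import numpy as np
import pandas as pd

# Illustrative data with one missing value in each column
df = pd.DataFrame({
    "Age": [25, np.nan, 35],
    "City": ["New York", None, "Los Angeles"],
})

# Count missing values per column
missing_per_column = df.isnull().sum()
print(missing_per_column)

# Drop every row that has at least one missing value
dropped = df.dropna()

# Fill each column with its own replacement value
filled = df.fillna({"Age": df["Age"].mean(), "City": "Unknown"})
print(filled)
```

Neither `dropna` nor `fillna` modifies `df` here; the results are kept in new variables.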

10. Grouping and Aggregation:

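# Grouping by a column and aggregating a numeric column
# (calling mean() on the whole frame fails in recent pandas versions when text columns are present)
df.groupby('City')['Age'].mean()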
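
To see grouping produce more than one row per group, here is a sketch on a made-up table with a repeated city. It also uses named aggregation through `agg`, which computes several statistics at once; the output column names `mean_age`, `max_salary`, and `people` are arbitrary choices:

```python
import pandas as pd

# Illustrative data: New York appears twice so the grouping is visible
staff = pd.DataFrame({
    "City": ["New York", "New York", "Los Angeles"],
    "Age": [25, 35, 40],
    "Salary": [50000, 70000, 60000],
})

# One statistic for one column
mean_age = staff.groupby("City")["Age"].mean()
print(mean_age)

# Several statistics at once: new_column=(source_column, function)
summary = staff.groupby("City").agg(
    mean_age=("Age", "mean"),
    max_salary=("Salary", "max"),
    people=("Age", "count"),
)
print(summary)
```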

11. Merging, Joining, and Concatenating:

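# df1 and df2 are assumed to be existing DataFrames

# Concatenating two DataFrames (stacks rows by default)
result = pd.concat([df1, df2])

# Merging two DataFrames on a shared column (inner join by default)
merged_df = pd.merge(df1, df2, on='key_column')

# Joining two DataFrames on their index
# (overlapping column names require lsuffix/rsuffix)
joined_df = df1.join(df2, how='inner')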
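
The three operations differ in how they align rows. The sketch below builds two small made-up frames that share a `key_column` so the results can be compared directly; the contents of `df1` and `df2` are assumptions for illustration only:

```python
import pandas as pd

df1 = pd.DataFrame({"key_column": [1, 2, 3],
                    "Name": ["Alice", "Bob", "Charlie"]})
df2 = pd.DataFrame({"key_column": [2, 3, 4],
                    "City": ["San Francisco", "Los Angeles", "Chicago"]})

# concat stacks rows; columns missing from one frame are filled with NaN
stacked = pd.concat([df1, df2], ignore_index=True)

# merge matches rows on a column; the default keeps only keys present in both
merged_df = pd.merge(df1, df2, on="key_column")

# how="left" keeps every row of df1, with NaN where df2 has no match
left_merged = pd.merge(df1, df2, on="key_column", how="left")

# join matches on the index, so the key is moved into the index first
joined_df = df1.set_index("key_column").join(
    df2.set_index("key_column"), how="inner"
)

print(stacked.shape, merged_df["key_column"].tolist(), len(left_merged))
print(joined_df)
```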

12. Saving Data:

# Saving to a CSV file
df.to_csv('output.csv', index=False)

# Saving to an Excel file (.xlsx output needs an engine such as openpyxl installed)
df.to_excel('output.xlsx', index=False)
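
To illustrate what `index=False` changes, this sketch writes a small made-up DataFrame to an in-memory buffer rather than a file and reads it back. Calling `to_csv()` with no target returns the CSV text as a string:

```python
import io
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})

# Write without the row index, then read the text back in
buffer = io.StringIO()
df.to_csv(buffer, index=False)
restored = pd.read_csv(io.StringIO(buffer.getvalue()))
print(restored.equals(df))          # whether the round trip preserved the data

# With the index included, the header gains an unnamed leading column
with_index = df.to_csv()
header_line = with_index.splitlines()[0]
print(header_line)
```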

This introduction covers only the basics of Pandas. The library also supports time series analysis, pivot tables, multi-level indexing, and more.

