pandas | TNPSC Fuhrer Notes

Pandas library in Python:

Pandas Cheat Sheet for Data Science:

1. Importing Pandas:

import pandas as pd

2. Reading Data:

df = pd.read_csv('filename.csv')  # Read from CSV
df = pd.read_excel('filename.xlsx')  # Read from Excel (requires an Excel engine such as openpyxl)
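A minimal sketch of reading CSV data, assuming a small made-up table held in memory instead of a real 'filename.csv' (the column names and values are hypothetical). It relies on read_csv accepting any file-like object as well as a path:

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for 'filename.csv'
csv_text = "name,score\nA,82\nB,91\n"

# read_csv accepts a file path or a file-like object such as StringIO
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)    # (rows, columns)
print(df.dtypes)   # column types inferred from the data
```

Reading from a string like this is handy for trying out parsing options before pointing the same call at a file on disk.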

3. Data Exploration:

df.head()  # Display first 5 rows
df.info()  # Display data types and non-null counts
df.describe()  # Summary statistics (numeric columns by default)
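The exploration calls above can be tried on a toy DataFrame. This is only a sketch; the columns "x" and "label" are invented for illustration:

```python
import pandas as pd

# Toy data: one numeric column and one text column (both hypothetical)
df = pd.DataFrame({"x": [1, 2, 3, 4], "label": ["a", "b", "c", "d"]})

first_two = df.head(2)   # head takes an optional row count (default 5)
df.info()                # prints dtypes and non-null counts
summary = df.describe()  # count, mean, std, min, quartiles, max

print(first_two)
print(summary)
```

With the default arguments, describe() summarizes only the numeric column here, so "label" does not appear in the result.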

4. Selection and Indexing:

df['column_name']  # Select a single column
df[['col1', 'col2']]  # Select multiple columns
df.loc[row_indexer, column_indexer]  # Label-based indexing (slices include the end label)
df.iloc[row_indexer, column_indexer]  # Integer-position-based indexing (slices exclude the stop position)
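The difference between loc and iloc is easiest to see with a non-default index. A short sketch, assuming a made-up table with string row labels:

```python
import pandas as pd

# Hypothetical data with string labels as the index
df = pd.DataFrame(
    {"item": ["pen", "book", "bag"], "qty": [10, 4, 2]},
    index=["a", "b", "c"],
)

# loc works with labels; a label slice includes its end point
by_label = df.loc["a":"b", "item"]

# iloc works with integer positions; the stop position is excluded
by_position = df.iloc[0:2, 0]

# A single cell addressed both ways
cell_by_label = df.loc["c", "qty"]
cell_by_position = df.iloc[2, 1]
```

Both slices select the same two rows even though one is written "a":"b" and the other 0:2.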

5. Filtering Data:

df[df['column'] > value]  # Conditional filtering
df[(df['col1'] > val1) & (df['col2'] == val2)]  # Multiple conditions
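A small sketch of combining conditions, using invented values for col1 and col2. Each condition needs its own parentheses, and the operators are & (and), | (or) and ~ (not) instead of Python's and/or/not keywords:

```python
import pandas as pd

# Hypothetical data reusing the col1/col2 names from above
df = pd.DataFrame({"col1": [1, 5, 8, 3], "col2": ["x", "y", "y", "x"]})

# AND: both conditions must hold
both = df[(df["col1"] > 2) & (df["col2"] == "y")]

# OR: at least one condition must hold
either = df[(df["col1"] > 6) | (df["col2"] == "x")]

# NOT: invert a condition
negated = df[~(df["col2"] == "x")]
```

Filtering keeps the original index labels, so the selected rows can still be traced back to the source DataFrame.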

6. Handling Missing Data:

df.dropna()  # Drop rows with missing values (returns a new DataFrame)
df.fillna(value)  # Fill missing values (returns a new DataFrame)
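The sketch below illustrates both calls on made-up data and also counts missing values with isna(), which is not listed above but is part of pandas. Neither dropna nor fillna changes df in place, so the results are assigned to new names:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in each column
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": ["x", "y", None]})

# Count missing values per column before deciding what to do
missing_per_column = df.isna().sum()

# Keep only rows with no missing values
dropped = df.dropna()

# Fill each column with its own replacement value
filled = df.fillna({"a": 0.0, "b": "unknown"})
```

Passing a dict to fillna lets each column get a replacement that suits its type.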

7. Grouping and Aggregation:

df.groupby('column').mean(numeric_only=True)  # Group by and calculate mean of numeric columns (pandas 2.x raises TypeError on non-numeric columns otherwise)
df.groupby(['col1', 'col2']).agg({'col3': 'sum', 'col4': 'count'})  # Custom aggregation
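A worked sketch of the two grouping patterns above, on invented values for col1 through col4:

```python
import pandas as pd

# Hypothetical data reusing the column names from above
df = pd.DataFrame({
    "col1": ["a", "a", "b"],
    "col2": ["x", "x", "y"],
    "col3": [10, 20, 5],
    "col4": [1, 2, 3],
})

# Mean of the numeric columns per group; the text column col2 is skipped
means = df.groupby("col1").mean(numeric_only=True)

# A different aggregation per column, grouped by two keys
result = df.groupby(["col1", "col2"]).agg({"col3": "sum", "col4": "count"})

# The group keys become the index; reset_index turns them back into columns
flat = result.reset_index()
```

The reset_index step is optional, but it is often convenient before exporting or merging the aggregated table.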

8. Sorting Data:

df.sort_values(by='column', ascending=False)  # Sort by column in descending order
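sort_values also accepts a list of columns with a matching list of directions. A minimal sketch, assuming made-up "team" and "score" columns:

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({"team": ["b", "a", "a"], "score": [3, 7, 9]})

# Sort by team ascending, then by score descending within each team
sorted_df = df.sort_values(by=["team", "score"], ascending=[True, False])

# Rows keep their original index labels; renumber them if needed
renumbered = sorted_df.reset_index(drop=True)
```

The original df is left untouched, since sort_values returns a new DataFrame.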

9. Data Cleaning:

df.drop_duplicates()  # Remove duplicate rows
df.rename(columns={'old_name': 'new_name'})  # Rename columns
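Both cleaning calls return a new DataFrame, so they can be chained. A short sketch on invented data:

```python
import pandas as pd

# Hypothetical data with one exact duplicate row
df = pd.DataFrame({"old_name": [1, 1, 2], "other": ["p", "p", "q"]})

# Remove the duplicate row, then rename a column, in one chain
cleaned = df.drop_duplicates().rename(columns={"old_name": "new_name"})

print(cleaned)
```

To keep the result, assign it to a name as shown; calling the methods without assignment leaves df as it was.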

10. Exporting Data:

df.to_csv('new_file.csv', index=False)  # Save to CSV
df.to_excel('new_file.xlsx', index=False)  # Save to Excel
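A round-trip sketch for CSV export, using made-up data and no file on disk. When to_csv is called without a path it returns the CSV text as a string, which makes the effect of index=False easy to inspect:

```python
import io
import pandas as pd

# Hypothetical data
df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

# Without a path, to_csv returns the CSV text instead of writing a file
csv_text = df.to_csv(index=False)
print(csv_text)

# Reading the text back gives the same columns and values
round_trip = pd.read_csv(io.StringIO(csv_text))
```

With index=False the row labels are left out of the output, so reading the file back does not produce an extra unnamed column.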

This is a concise guide; see the official pandas documentation for more in-depth information.