Pandas library in Python:
Pandas Cheat Sheet for Data Science:
1. Importing Pandas:
import pandas as pd
2. Reading Data:
df = pd.read_csv('filename.csv') # Read from CSV
df = pd.read_excel('filename.xlsx') # Read from Excel (.xlsx needs the openpyxl package installed)
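A minimal sketch of a few commonly used read_csv options. The CSV text and column names are made up for illustration, and io.StringIO stands in for a real file so the snippet runs without anything on disk.

```python
import io

import pandas as pd

# Hypothetical CSV content; in practice this would be a file path.
csv_text = "date,city,sales\n2024-01-01,Oslo,100\n2024-01-02,Rome,150\n"

df = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["date", "sales"],  # load only the columns you need
    parse_dates=["date"],       # convert this column to datetime64
)
```

Limiting columns with usecols can cut memory use noticeably on wide files.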
3. Data Exploration:
df.head() # Display first 5 rows
df.info() # Display data types and non-null counts
df.describe() # Summary statistics (numeric columns only by default)
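As a rough illustration of the exploration calls above, the sketch below uses a small made-up DataFrame (the column names are assumptions) to show what describe() covers by default and how include="all" widens it.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"age": [20, 30, 40], "city": ["Oslo", "Rome", "Oslo"]})

first_two = df.head(2)                  # head() accepts a row count
shape = df.shape                        # (rows, columns)
stats = df.describe()                   # numeric columns only
all_stats = df.describe(include="all")  # also summarizes text columns
mean_age = stats.loc["mean", "age"]
```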
4. Selection and Indexing:
df['column_name'] # Select a single column
df[['col1', 'col2']] # Select multiple columns
df.loc[row_indexer, column_indexer] # Label-based indexing
df.iloc[row_indexer, column_indexer] # Integer-location based indexing
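The difference between loc and iloc is easiest to see with slices. The sketch below assumes a small made-up DataFrame with string index labels; the names are illustrative only.

```python
import pandas as pd

# Hypothetical sample data with string labels as the index.
df = pd.DataFrame(
    {"name": ["Ann", "Bob", "Cy"], "age": [31, 25, 40]},
    index=["a", "b", "c"],
)

# loc works on labels, and a label slice includes both endpoints.
by_label = df.loc["a":"b", "age"]

# iloc works on integer positions, and the slice end is excluded.
by_position = df.iloc[0:1, 1]

# A single row label plus a single column label returns a scalar.
single_value = df.loc["c", "name"]
```

Here by_label holds two rows while by_position holds one, even though both slices look similar.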
5. Filtering Data:
df[df['column'] > value] # Conditional filtering
df[(df['col1'] > val1) & (df['col2'] == val2)] # Multiple conditions
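A short sketch of the filtering patterns above on made-up data (the column names and values are assumptions). It also shows two related options, isin with ~ for negation and query, which some people find easier to read.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame(
    {"city": ["Oslo", "Rome", "Oslo", "Lima"], "temp": [3, 18, 7, 22]}
)

warm = df[df["temp"] > 5]

# Each condition needs its own parentheses; use & and |, not `and` / `or`.
warm_oslo = df[(df["temp"] > 5) & (df["city"] == "Oslo")]

# ~ negates a boolean mask; isin tests membership in a list.
not_oslo = df[~df["city"].isin(["Oslo"])]

# query expresses the same compound filter as a string.
same_as_warm_oslo = df.query("temp > 5 and city == 'Oslo'")
```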
6. Handling Missing Data:
df.dropna() # Drop rows with any missing value (returns a new DataFrame)
df.fillna(value) # Fill missing values with value (returns a new DataFrame)
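One possible workflow for missing data: count the gaps first, then decide whether to drop or fill. The data and column names below are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one gap in each column.
df = pd.DataFrame({"price": [10.0, np.nan, 30.0], "qty": [1.0, 2.0, np.nan]})

missing_per_column = df.isna().sum()

dropped = df.dropna()                         # drops any row with a gap
dropped_subset = df.dropna(subset=["price"])  # only checks the price column

# A dict fills each column with its own value.
filled = df.fillna({"price": df["price"].mean(), "qty": 0})
```

None of these calls modify df itself; assign the result if you want to keep it.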
7. Grouping and Aggregation:
df.groupby('column').mean(numeric_only=True) # Group by and calculate the mean of numeric columns (pandas 2.x raises an error on text columns otherwise)
df.groupby(['col1', 'col2']).agg({'col3': 'sum', 'col4': 'count'}) # Custom aggregation
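A small sketch of grouping on made-up data (the column names are assumptions). It selects the column before aggregating, and uses named aggregation, an alternative to the dict form that lets you choose the output column names.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame(
    {
        "team": ["x", "x", "y", "y"],
        "player": ["a", "b", "c", "d"],
        "points": [10, 20, 30, 50],
    }
)

# Selecting the column first avoids aggregating text columns.
mean_points = df.groupby("team")["points"].mean()

# Named aggregation: output_name=(input_column, function).
summary = df.groupby("team").agg(
    total=("points", "sum"),
    players=("player", "count"),
)
```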
8. Sorting Data:
df.sort_values(by='column', ascending=False) # Sort by column in descending order
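Sorting also works on several columns at once, each with its own direction. A minimal sketch on made-up data (the column names are assumptions):

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"dept": ["b", "a", "b", "a"], "salary": [50, 70, 90, 60]})

# dept ascending, then salary descending within each dept.
ranked = df.sort_values(by=["dept", "salary"], ascending=[True, False])

# Sorting keeps the old index labels; reset them if you want 0..n-1.
ranked = ranked.reset_index(drop=True)
```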
9. Data Cleaning:
df.drop_duplicates() # Remove duplicate rows (returns a new DataFrame)
df.rename(columns={'old_name': 'new_name'}) # Rename columns (returns a new DataFrame)
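Because these cleaning methods return new DataFrames rather than changing df in place, they chain naturally. A sketch with made-up column names:

```python
import pandas as pd

# Hypothetical sample data with one duplicated row.
df = pd.DataFrame({"ID": [1, 1, 2], "Full Name": ["Ann", "Ann", "Bob"]})

clean = (
    df.drop_duplicates()
    .rename(columns={"ID": "id", "Full Name": "full_name"})
    .reset_index(drop=True)
)
```

The original df still has three rows and its old column names; only clean carries the changes.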
10. Exporting Data:
df.to_csv('new_file.csv', index=False) # Save to CSV
df.to_excel('new_file.xlsx', index=False) # Save to Excel (needs the openpyxl package installed)
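A quick way to check an export is to read it straight back. The sketch below writes to an in-memory buffer instead of a file so it leaves nothing on disk; the data is made up for illustration.

```python
import io

import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"id": [1, 2], "score": [9.5, 7.0]})

# to_csv accepts a file path or any writable text buffer.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

# Reading the text back should reproduce the original DataFrame.
restored = pd.read_csv(io.StringIO(csv_text))
```

Without index=False, the index would be written as an extra unnamed column and show up again on reload.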
This is a concise guide; see the pandas documentation for more in-depth information.