Handling Missing Values

Handling missing values is an essential step in data preprocessing. The examples here assume a pandas DataFrame named df.

Detecting Missing Values:

# Detect missing values
print(df.isnull().sum())  # Count missing values in each column
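
As a minimal sketch of detection on concrete data, the snippet below builds a small DataFrame and inspects it three ways. The sample data and the column names "age" and "city" are made up for illustration; they are not part of the original example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; column names are illustrative only.
df = pd.DataFrame({
    "age": [25, np.nan, 31, np.nan],
    "city": ["Oslo", "Lima", None, "Pune"],
})

# Number of missing values in each column
missing_per_column = df.isnull().sum()

# Fraction of missing values in each column (mean of a boolean mask)
missing_share = df.isnull().mean()

# Rows that contain at least one missing value
rows_with_missing = df[df.isnull().any(axis=1)]

print(missing_per_column)
print(missing_share)
print(rows_with_missing)
```

Looking at the share of missing values per column, not just the count, can help when deciding between dropping and filling.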

Dropping Missing Values:

Drop rows with any missing value:

df_cleaned = df.dropna()

Drop rows with missing values in a specific column:

df_cleaned = df.dropna(subset=['col_name'])

Drop columns with any missing values:

df_cleaned = df.dropna(axis=1)
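
The dropping options above can be compared side by side on a toy DataFrame. This is only a sketch: the data and the column names "a", "b", "c" are assumptions made for illustration. It also shows the thresh parameter of dropna, which is not covered above and keeps rows that have at least the given number of non-missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 4.0],
    "b": [np.nan, np.nan, 6.0, 7.0],
    "c": [10, 20, 30, 40],
})

# Drop rows with any missing value
rows_any = df.dropna()

# Drop rows where column "a" is missing
rows_subset = df.dropna(subset=["a"])

# Drop columns that contain any missing value
cols_any = df.dropna(axis=1)

# Keep rows that have at least 2 non-missing values
rows_thresh = df.dropna(thresh=2)

print(rows_any.shape, rows_subset.shape, cols_any.shape, rows_thresh.shape)
```

None of these calls modify df; each returns a new DataFrame, so the original data stays available for comparison.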

Filling Missing Values:

df_filled = df.fillna(0)  # Replace all NaN with 0
df['col_name'] = df['col_name'].fillna(df['col_name'].mean())  # Fill with column mean (assign back; inplace on a selected column may not update df)

Forward fill or backward fill (use the previous or next value):

df_filled = df.ffill()  # Forward fill (fillna(method='ffill') is deprecated in recent pandas)
df_filled = df.bfill()  # Backward fill (fillna(method='bfill') is deprecated in recent pandas)
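
To make the difference between these filling strategies concrete, here is a small sketch on a single Series. The values are invented for illustration, and the limit argument is an extra option not discussed above.

```python
import numpy as np
import pandas as pd

# Hypothetical series with a leading gap and a two-value gap in the middle.
s = pd.Series([np.nan, 2.0, np.nan, np.nan, 8.0])

# Fill every gap with the mean of the observed values (2 and 8)
filled_mean = s.fillna(s.mean())

# Forward fill: a leading NaN has no previous value, so it stays missing
forward = s.ffill()

# Backward fill: each gap takes the next observed value
backward = s.bfill()

# Forward fill, but carry a value across at most one consecutive gap
forward_limited = s.ffill(limit=1)

print(filled_mean.tolist())
print(forward.tolist())
print(backward.tolist())
print(forward_limited.tolist())
```

Forward and backward fill depend on row order, so they tend to suit sorted data such as time series; for unordered rows a statistic like the mean or median is usually the safer choice.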

Replacing Missing Values with Interpolation:

You can use interpolation to estimate and fill missing values.

df_interpolated = df.interpolate(method='linear')
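
A short sketch of what linear interpolation does to a gap, using invented values. It treats the rows as equally spaced, so the two missing points between 1 and 4 are placed on a straight line. The limit_area="inside" argument is an optional extra shown here as an assumption about what a reader may want: it restricts filling to gaps that are surrounded by observed values.

```python
import numpy as np
import pandas as pd

# Hypothetical series with an interior gap and a trailing gap.
s = pd.Series([1.0, np.nan, np.nan, 4.0, np.nan])

# Linear interpolation: interior gaps are filled along a straight line
linear = s.interpolate(method="linear")

# Only fill gaps that lie between two observed values;
# the trailing NaN is left untouched
inside_only = s.interpolate(method="linear", limit_area="inside")

print(linear.tolist())
print(inside_only.tolist())
```

Interpolation assumes neighbouring rows are related, so it fits ordered numeric data better than categorical or shuffled data.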

Replacing Missing Values with Group-based Strategies:

Sometimes it’s better to replace missing values based on groups (e.g., filling with the mean of specific groups).

df['column_name'] = df.groupby('group_column')['column_name'].transform(lambda x: x.fillna(x.mean()))
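
The group-based fill can be sketched end to end on a small example. The data is made up, and the final fallback to the overall mean is an assumption added here to handle a group with no observed values at all; it is not part of the original recipe. This version uses transform("mean") followed by fillna, which gives the same group means as the lambda form.

```python
import numpy as np
import pandas as pd

# Hypothetical data: group "c" has no observed value at all.
df = pd.DataFrame({
    "group_column": ["a", "a", "a", "b", "b", "c"],
    "column_name": [1.0, np.nan, 3.0, 10.0, np.nan, np.nan],
})

# Overall mean of the observed values, computed before any filling
overall_mean = df["column_name"].mean()

# Mean of each row's group, aligned to the original index
group_means = df.groupby("group_column")["column_name"].transform("mean")

# Fill each gap with its group mean; group "c" stays NaN at this point
df["column_name"] = df["column_name"].fillna(group_means)

# Assumed fallback: use the overall mean where the group mean is undefined
df["column_name"] = df["column_name"].fillna(overall_mean)

print(df)
```

Without a fallback, a group made up entirely of missing values stays missing, which is easy to overlook when checking the result.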