import pandas as pd
df = pd.read_csv('data.csv')
df.drop_duplicates(inplace=True)
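# Fill missing numeric values with the column mean; numeric_only skips text columns
df.fillna(df.mean(numeric_only=True), inplace=True)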
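# Compute the IQR bounds on numeric columns only; quantiles are undefined for text
num = df.select_dtypes(include='number')
Q1 = num.quantile(0.25)
Q3 = num.quantile(0.75)
IQR = Q3 - Q1
# Keep rows where no numeric value lies more than 1.5 * IQR outside the quartiles
df = df[~((num < (Q1 - 1.5 * IQR)) | (num > (Q3 + 1.5 * IQR))).any(axis=1)]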
df = df.drop(['column1', 'column2'], axis=1)
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
# fit_transform returns a NumPy array; every column of df must be numeric
df_scaled = scaler.fit_transform(df)
df['column1'] = df['column1'].str.strip()
df1 = pd.read_csv('data1.csv')
df2 = pd.read_csv('data2.csv')
merged_df = pd.merge(df1, df2, on='column')
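The snippets above read from files such as data.csv, so they cannot be run as-is. As a rough illustration only, the sketch below applies the same steps to a small in-memory DataFrame. The column names (`id`, `name`, `score`, `group`), the `clean` helper, and the sample values are all hypothetical, and `how="left"` is a choice made here to keep unmatched rows, whereas the merge above uses the default inner join.

```python
import pandas as pd

# Hypothetical toy data; names and values are illustrative only.
raw = pd.DataFrame({
    "id": [1, 2, 2, 3, 4, 5, 6],
    "name": [" ann", "bob ", "bob ", "cy", "dee", "eve", "fay"],
    "score": [10.0, 12.0, 12.0, None, 11.0, 13.0, 100.0],
})


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the cleaning steps in order and return a new DataFrame."""
    # Remove exact duplicate rows.
    df = df.drop_duplicates()
    # Fill missing numeric values with the column mean.
    df = df.fillna(df.mean(numeric_only=True))
    # Drop rows where any numeric value lies outside 1.5 * IQR of the quartiles.
    num = df.select_dtypes(include="number")
    q1, q3 = num.quantile(0.25), num.quantile(0.75)
    iqr = q3 - q1
    outlier = ((num < q1 - 1.5 * iqr) | (num > q3 + 1.5 * iqr)).any(axis=1)
    df = df[~outlier].copy()
    # Trim leading and trailing whitespace from the text column.
    df["name"] = df["name"].str.strip()
    return df


cleaned = clean(raw)

# Hypothetical second table, merged on the shared "id" column.
extra = pd.DataFrame({"id": [1, 2, 3], "group": ["a", "b", "a"]})
merged = pd.merge(cleaned, extra, on="id", how="left")

print(merged)
```

Order matters in a sketch like this: the mean is computed before the outlier row is removed, so the value imputed for the missing score is pulled upward by the outlier. Filtering outliers first, or imputing with the median, would give a different result.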
These are just a few of the data cleaning methods available in Python. Many other methods and libraries exist for cleaning and preprocessing data, and the right choice depends on the requirements of the dataset.