Data Cleaning and Manipulation in Pandas:

  1. Handling missing values:

In real-world datasets, missing values are quite common. Pandas provides various functions to handle missing values, such as isna(), fillna(), and dropna().

For example:

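import pandas as pd
# 'Age' has one missing entry (None), which pandas stores as NaN
data = {'Name': ['John', 'Alice', 'Bob', 'Mary'], 'Age': [25, 30, None, 35]}
df = pd.DataFrame(data)
# Replace every missing value in the DataFrame with 0, modifying df in place
df.fillna(0, inplace=True)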

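This fills every missing value in the DataFrame with 0. Here only Bob's age is missing, so it becomes 0.0 (the 'Age' column is stored as floats because it contained a missing value).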
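The other two functions mentioned above, isna() and dropna(), can be sketched on the same toy data. This is a minimal illustration, not a recommendation: the variable names are made up for the example, and filling with the column mean is just one possible alternative to filling with 0.

```python
import pandas as pd

# Same toy data as above; None becomes NaN in the numeric 'Age' column.
df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob', 'Mary'],
                   'Age': [25, 30, None, 35]})

# isna() marks missing cells; summing the marks counts them per column.
missing_per_column = df.isna().sum()

# dropna() removes every row that contains at least one missing value.
dropped = df.dropna()

# Fill only the 'Age' column, using its mean instead of 0 (an assumption
# made for this example; the right fill value depends on the data).
filled = df.fillna({'Age': df['Age'].mean()})
```

None of these calls modify df itself; each returns a new object, which makes it easy to compare strategies before committing to one.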

  2. Data filtering:

Data filtering is the process of selecting rows or columns based on some condition. Pandas provides several ways to filter data, such as boolean indexing, the query() method, and the loc[] and iloc[] indexers.

For example:

import pandas as pd 
data = {'Name': ['John', 'Alice', 'Bob', 'Mary'], 'Age': [25, 30, 35, 40]} 
df = pd.DataFrame(data) 
filtered_data = df[df['Age'] > 30] 

This keeps only the rows where the age is greater than 30 (Bob and Mary) and drops the rest. The original df is left unchanged; the result is stored in filtered_data.
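The same kind of selection can be written with query(), loc[] and iloc[]. The following is a small sketch on the same data; the result variable names are invented for the example.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob', 'Mary'],
                   'Age': [25, 30, 35, 40]})

# query(): the condition is written as a string expression.
over_30 = df.query('Age > 30')

# loc[]: label-based; a boolean mask picks rows, a list picks columns.
names_over_30 = df.loc[df['Age'] > 30, ['Name']]

# iloc[]: position-based; first two rows, first column.
first_two_names = df.iloc[:2, 0]
```

As a rule of thumb, loc[] is convenient when rows and columns are filtered together, while iloc[] is useful when only positions are known.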

  3. Data transformation:

Data transformation is the process of converting data from one form to another. Pandas provides various functions to transform data, such as apply(), map(), and replace().

For example:

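import pandas as pd
data = {'Name': ['John', 'Alice', 'Bob', 'Mary'], 'Age': [25, 30, 35, 40]}
df = pd.DataFrame(data)
# Add a new column by assigning a list with one value per row
df['Gender'] = ['Male', 'Female', 'Male', 'Female']
# Apply a function to every value in the 'Age' column
df['Age'] = df['Age'].apply(lambda x: x + 10)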

This will add a new column 'Gender' to the DataFrame and increase the age of all individuals by 10.
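The map() and replace() functions mentioned above work along similar lines. Here is a minimal sketch; the 'GenderCode' and 'AgeIn10Years' columns and the Bob-to-Robert substitution are hypothetical and exist only to show the calls.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob', 'Mary'],
                   'Gender': ['Male', 'Female', 'Male', 'Female'],
                   'Age': [25, 30, 35, 40]})

# map(): translate each value through a dict; unmapped values become NaN.
df['GenderCode'] = df['Gender'].map({'Male': 'M', 'Female': 'F'})

# replace(): swap specific values and leave everything else untouched.
df['Name'] = df['Name'].replace({'Bob': 'Robert'})

# Plain column arithmetic gives the same result as
# apply(lambda x: x + 10) for a simple operation like this one.
df['AgeIn10Years'] = df['Age'] + 10
```

For simple arithmetic, the vectorized form in the last line is usually the more idiomatic choice; apply() is more useful when the per-value logic cannot be expressed as a column operation.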

  4. Data merging and joining:

Data merging and joining are the processes of combining data from different sources into a single DataFrame. Pandas provides various functions to merge and join data, such as merge(), concat(), and join().

For example:

import pandas as pd
data1 = {'Name': ['John', 'Alice', 'Bob'], 'Age': [25, 30, 35]}
data2 = {'Name': ['Alice', 'Bob', 'Mary'], 'Salary': [50000, 60000, 70000]}
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
# Combine rows whose 'Name' appears in both DataFrames (inner join by default)
merged_data = pd.merge(df1, df2, on='Name')

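This will merge the two DataFrames based on the 'Name' column and create a new DataFrame that contains the columns 'Name', 'Age', and 'Salary'. Because merge() performs an inner join by default, only the names present in both DataFrames (Alice and Bob) appear in the result; John and Mary are dropped.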
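To keep unmatched rows, or to use join() and concat() instead, something like the following sketch could be used. The choice of how='outer' and how='left', and stacking df1 on itself, are assumptions made purely for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['John', 'Alice', 'Bob'], 'Age': [25, 30, 35]})
df2 = pd.DataFrame({'Name': ['Alice', 'Bob', 'Mary'],
                    'Salary': [50000, 60000, 70000]})

# Outer merge keeps every name from both frames; gaps become NaN.
outer = pd.merge(df1, df2, on='Name', how='outer')

# join() aligns on the index, so 'Name' is set as the index first.
joined = df1.set_index('Name').join(df2.set_index('Name'), how='left')

# concat() stacks frames on top of each other;
# ignore_index=True renumbers the rows from 0.
stacked = pd.concat([df1, df1], ignore_index=True)
```

In short, merge() matches on column values, join() matches on the index, and concat() simply glues frames together along an axis.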

