Data Cleaning and Manipulation in Pandas:
- Handling missing values:
In real-world datasets, missing values are common. Pandas provides several functions for handling them, such as isna(), fillna(), and dropna(). For example:
import pandas as pd
data = {'Name': ['John', 'Alice', 'Bob', 'Mary'], 'Age': [25, 30, None, 35]}
df = pd.DataFrame(data)
df.fillna(0, inplace=True)
print(df)
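This replaces every missing value in the DataFrame with 0. Because the None in 'Age' is stored as NaN, the column is floating-point, so Bob's age is shown as 0.0.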
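The other two functions mentioned above, isna() and dropna(), can be sketched on the same data. This is a minimal illustration rather than a recommended cleaning recipe. The variable names (missing_counts, dropped, filled) and the choice of the column mean as a fill value are assumptions made for the example.

```python
import pandas as pd

data = {'Name': ['John', 'Alice', 'Bob', 'Mary'], 'Age': [25, 30, None, 35]}
df = pd.DataFrame(data)

# isna() marks missing cells; summing the mask counts them per column
missing_counts = df.isna().sum()
print(missing_counts)

# dropna() returns a new DataFrame without the rows that contain a missing value
dropped = df.dropna()
print(dropped)

# fillna() with a dict fills only the named column; the mean is an illustrative choice
filled = df.fillna({'Age': df['Age'].mean()})
print(filled)
```

Whether to drop or fill depends on the dataset. Filling an age with 0 or with the mean both change the data, so it is worth checking the counts from isna() first.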
- Data filtering:
Data filtering is the process of selecting rows or columns based on some condition. Pandas provides several ways to filter data, such as boolean indexing, the query() method, and the loc[] and iloc[] indexers. For example:
import pandas as pd
data = {'Name': ['John', 'Alice', 'Bob', 'Mary'], 'Age': [25, 30, 35, 40]}
df = pd.DataFrame(data)
filtered_data = df[df['Age'] > 30]
print(filtered_data)
This keeps only the rows where the age is greater than 30 (Bob and Mary) and filters out the rest.
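The same selection can also be written with query(), loc[], and iloc[], which the text lists but does not show. The sketch below uses the same data. The variable names are assumptions chosen for the example.

```python
import pandas as pd

data = {'Name': ['John', 'Alice', 'Bob', 'Mary'], 'Age': [25, 30, 35, 40]}
df = pd.DataFrame(data)

# query(): the condition is written as a string expression
via_query = df.query('Age > 30')

# loc[]: label-based; a boolean mask selects rows, a label selects the column
names_over_30 = df.loc[df['Age'] > 30, 'Name']

# iloc[]: position-based; first two rows of the first column
first_two_names = df.iloc[:2, 0]

print(via_query)
print(names_over_30.tolist())
print(first_two_names.tolist())
```

loc[] works with labels and boolean masks, while iloc[] works only with integer positions, so the two are not interchangeable once the index is no longer the default 0, 1, 2, ...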
- Data transformation:
Data transformation is the process of converting data from one form to another. Pandas provides several functions to transform data, such as apply(), map(), and replace(). For example:
import pandas as pd
data = {'Name': ['John', 'Alice', 'Bob', 'Mary'], 'Age': [25, 30, 35, 40]}
df = pd.DataFrame(data)
df['Gender'] = ['Male', 'Female', 'Male', 'Female']
df['Age'] = df['Age'].apply(lambda x: x + 10)
print(df)
This will add a new column 'Gender' to the DataFrame and increase the age of all individuals by 10.
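For comparison, here is a short sketch of map() and replace() on the same kind of data. The column names GenderCode and AgePlus10 and the name replacement are hypothetical, introduced only to illustrate the calls.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob', 'Mary'],
    'Gender': ['Male', 'Female', 'Male', 'Female'],
    'Age': [25, 30, 35, 40],
})

# map(): translate each value through a dict; values not in the dict become NaN
df['GenderCode'] = df['Gender'].map({'Male': 'M', 'Female': 'F'})

# replace(): swap the listed values and leave everything else untouched
df['Name'] = df['Name'].replace({'Bob': 'Robert'})

# Column arithmetic is vectorized, so simple math does not need apply()
df['AgePlus10'] = df['Age'] + 10

print(df)
```

The main difference is in how unmatched values are treated: map() turns them into NaN, whereas replace() keeps them as they are.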
- Data merging and joining:
Data merging and joining are the processes of combining data from different sources into a single DataFrame. Pandas provides several functions to merge and join data, such as merge(), concat(), and join(). For example:
import pandas as pd
data1 = {'Name': ['John', 'Alice', 'Bob'], 'Age': [25, 30, 35]}
data2 = {'Name': ['Alice', 'Bob', 'Mary'], 'Salary': [50000, 60000, 70000]}
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
merged_data = pd.merge(df1, df2, on='Name')
print(merged_data)
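This will merge the two DataFrames on the 'Name' column and create a new DataFrame that contains the columns 'Name', 'Age', and 'Salary'. Because merge() performs an inner join by default, only the names present in both DataFrames (Alice and Bob) appear in the result.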
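The join type can be changed, and concat() and join() cover slightly different cases. The sketch below reuses the two DataFrames from the example. The variable names (inner, outer, joined, stacked) are assumptions for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['John', 'Alice', 'Bob'], 'Age': [25, 30, 35]})
df2 = pd.DataFrame({'Name': ['Alice', 'Bob', 'Mary'], 'Salary': [50000, 60000, 70000]})

# Inner join (the default): only names present in both frames
inner = pd.merge(df1, df2, on='Name')

# Outer join: every name from either frame, with NaN where a value is absent
outer = pd.merge(df1, df2, on='Name', how='outer')

# join() aligns on the index, so 'Name' is set as the index on both sides first
joined = df1.set_index('Name').join(df2.set_index('Name'), how='left')

# concat() stacks frames on top of each other; ignore_index renumbers the rows
stacked = pd.concat([df1, df1], ignore_index=True)

print(inner)
print(outer)
print(joined)
print(stacked)
```

As a rough guide, merge() matches rows on column values, join() matches on the index, and concat() simply appends rows or columns without matching keys.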