Data Types and Structures in Pandas

May 02, 2023

Pandas provides three core data structures: Series, DataFrame, and Index.

Series:

A Series is a one-dimensional labeled array that can hold any data type, such as integers, strings, floating-point numbers, or arbitrary Python objects. It is similar to a single column in a spreadsheet. A Series is created with the pd.Series() constructor.

Example:

python
import pandas as pd

# Build a Series from a plain Python list; pandas assigns the labels 0..4.
data = [10, 20, 30, 40, 50]
s = pd.Series(data)
print(s)

Output:

0    10
1    20
2    30
3    40
4    50
dtype: int64
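To illustrate the point that a Series can hold different data types, here is a minimal sketch. The names prices and mixed and their values are made up for illustration. It also shows that the default 0..n-1 labels can be replaced with custom ones through the index argument.

```python
import pandas as pd

# A Series of floats with custom string labels instead of the default 0..n-1.
prices = pd.Series([1.5, 2.25, 3.0], index=["apple", "banana", "cherry"])

# A Series mixing an int, a str, and a float falls back to the generic
# "object" dtype, because no single numeric type fits every element.
mixed = pd.Series([1, "two", 3.0])

print(prices.dtype)      # float64
print(mixed.dtype)       # object
print(prices["banana"])  # 2.25
```

Looking a value up by its label, as in prices["banana"], is what distinguishes a Series from a plain list or array.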

DataFrame:

A DataFrame is a two-dimensional labeled table whose columns can each hold a different data type, such as integers, strings, floating-point numbers, or Python objects. It is similar to a spreadsheet or an SQL table, and each of its columns is a Series. A DataFrame is created with the pd.DataFrame() constructor.

Example:

python
import pandas as pd

# Each dictionary key becomes a column name; each list becomes that column's values.
data = {'Name': ['John', 'Mike', 'Sarah', 'Jasmine'],
        'Age': [25, 30, 27, 29],
        'Gender': ['Male', 'Male', 'Female', 'Female']}
df = pd.DataFrame(data)
print(df)

Output:

      Name  Age  Gender
0     John   25    Male
1     Mike   30    Male
2    Sarah   27  Female
3  Jasmine   29  Female
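The heterogeneous nature of a DataFrame can be seen by inspecting its columns. The sketch below reuses the data from the example above. The variable name ages is introduced here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Mike", "Sarah", "Jasmine"],
    "Age": [25, 30, 27, 29],
    "Gender": ["Male", "Male", "Female", "Female"],
})

# Each column keeps its own dtype; the exact dtype reported for the
# text columns depends on the pandas version.
print(df.dtypes)

# Selecting a single column returns a Series that shares the row index.
ages = df["Age"]
print(type(ages).__name__)  # Series
print(ages.mean())          # 27.75
print(df.shape)             # (4, 3): four rows, three columns
```

Because each column is a Series, numeric operations such as mean() apply to the Age column without affecting the text columns.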

Index:

An Index is an immutable, array-like object that holds the labels for the rows and columns of a Pandas DataFrame (a Series has one as well). If no index is specified, the rows are labeled with integers from 0 to n-1, where n is the number of rows in the DataFrame.

Example:

python
import pandas as pd

data = {'Name': ['John', 'Mike', 'Sarah', 'Jasmine'],
        'Age': [25, 30, 27, 29],
        'Gender': ['Male', 'Male', 'Female', 'Female']}
# Pass one label per row through the index parameter.
df = pd.DataFrame(data, index=['row1', 'row2', 'row3', 'row4'])
print(df)

Output:

         Name  Age  Gender
row1     John   25    Male
row2     Mike   30    Male
row3    Sarah   27  Female
row4  Jasmine   29  Female

In the above example, we have assigned custom row labels using the index parameter of the pd.DataFrame() function.
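The properties of an Index described in this section can be checked with a short sketch. It uses a shortened two-row version of the data. The names mutated and default_df, and the replacement labels "first" and "second", are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["John", "Mike"], "Age": [25, 30]},
    index=["row1", "row2"],
)

# Both axes are labeled by Index objects.
print(df.index.tolist())    # ['row1', 'row2']
print(df.columns.tolist())  # ['Name', 'Age']

# Label-based lookup goes through the index.
print(df.loc["row2", "Age"])  # 30

# An Index is immutable: assigning to a single element raises TypeError.
try:
    df.index[0] = "first"
    mutated = True
except TypeError:
    mutated = False
print(mutated)  # False

# Replacing the whole index with a new set of labels is allowed.
df.index = ["first", "second"]
print(df.index.tolist())    # ['first', 'second']

# Without an explicit index, pandas uses a RangeIndex running from 0 to n-1.
default_df = pd.DataFrame({"Age": [25, 30]})
print(type(default_df.index).__name__)  # RangeIndex
print(list(default_df.index))           # [0, 1]
```

Immutability applies to the elements of an existing Index. To relabel rows, assign a new index as a whole, as the sketch does.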


Module Title: From VBA to Pandas: A Comprehensive Guide for Data Analysts