Pandas Data Types: Understanding Effective Utilization

An overview of the core Pandas data types. Pandas is an efficient data analytics library: you read data into Series or DataFrames and analyze it with SQL-like operations.

Srinimf
2 min read · Jun 19, 2023

Pandas is a high-performance, high-level data analysis library written in Python, with its performance-critical parts implemented in Cython and C. It lets us work with large data sets through two core structures: Series and DataFrames.

Pandas: Its Usage in Data Analytics

  • Calculating statistics and answering questions about the data, such as the average, median, max, and min of each column.
  • Finding correlations between columns.
  • Tracking the distribution of one or more columns.
  • Visualizing the data with the help of Matplotlib, using bar plots, histograms, etc.
  • Cleaning and filtering data, for example handling missing or incomplete values, by applying a user-defined function (UDF) or a built-in function.
  • Transforming tabular data into Python objects to work with.
  • Exporting the data into a CSV, another file format, or a database.
  • Engineering new feature columns that can be applied to your analysis.
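
Several of the tasks above can be sketched in a few lines. This is a minimal illustration, not a complete workflow: the column names and values are invented for the example, and the CSV is written to an in-memory buffer rather than a real file.

```python
import io
import pandas as pd

# Hypothetical sample data, invented for illustration only.
df = pd.DataFrame({
    "height_cm": [150.0, 160.0, 170.0, 180.0, None],
    "weight_kg": [50.0, 60.0, 70.0, 80.0, 90.0],
})

# Statistics per column (missing values are skipped by default).
means = df.mean()
medians = df.median()

# Correlation between two columns (rows with a missing value are ignored).
corr = df["height_cm"].corr(df["weight_kg"])

# Cleaning: drop rows that contain missing values.
clean = df.dropna().copy()

# Feature engineering: derive a new column from existing ones.
clean["bmi"] = clean["weight_kg"] / (clean["height_cm"] / 100) ** 2

# Filtering with a boolean condition.
tall = clean[clean["height_cm"] > 165]

# Exporting to CSV (an in-memory buffer stands in for a file here).
buffer = io.StringIO()
clean.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

print(means)
print(tall)
```

For plotting, `df.hist()` or `df.plot()` would hand the same data to Matplotlib, provided Matplotlib is installed.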

Pandas data types

  • Series ➤ One-dimensional labeled array capable of holding data of any type
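  • DataFrame ➤ Two-dimensional labeled table of rows and columns, similar to a spreadsheet
  • Axis ➤ Direction an operation runs along; axis = 0 runs along the rows (down each column); axis = 1 runs along the columns (across each row)
  • Record ➤ A single row
  • dtype ➤ Data type of a Series or of each column in a DataFrame
  • Time Series ➤ Series indexed by timestamps, like tracking weather by the hour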
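
To make these terms concrete, here is a small sketch that touches each one. The variable names and values are made up for illustration.

```python
import pandas as pd

# Series: a one-dimensional labeled array.
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

# DataFrame: a two-dimensional table of labeled columns.
df = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})

# dtype: the data type of a Series; dtypes lists one per DataFrame column.
series_dtype = s.dtype
column_dtypes = df.dtypes

# Record: a single row, selected here by position.
first_record = df.iloc[0]

# Axis: axis=0 aggregates down the rows (one result per column),
# axis=1 aggregates across the columns (one result per row).
column_sums = df.sum(axis=0)
row_sums = df.sum(axis=1)

# Time series: a Series indexed by timestamps, here hourly readings.
hours = pd.to_datetime(
    ["2023-06-19 00:00", "2023-06-19 01:00", "2023-06-19 02:00"]
)
temps = pd.Series([18.5, 19.0, 20.2], index=hours)

print(column_sums)
print(row_sums)
```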

Pandas DataFrame: Example

The example below creates a DataFrame from a dictionary. It first calls random.seed(). What is the purpose of the seed() method? It initializes the random number generator, so the same seed produces the same sequence of random numbers on every run.

import random
import pandas as pd

# Seed the generator so the same random numbers are produced on every run;
# the particular seed value doesn't matter.
random.seed(3)

names = ["Jess", "Jordan", "Sandy", "Ted", "Barney", "Tyler", "Rebecca"]
# One random age between 18 and 35 (inclusive) per name.
ages = [random.randint(18, 35) for _ in names]
people = {"names": names, "ages": ages}

# Build a DataFrame from the dictionary: keys become column names.
df = pd.DataFrame.from_dict(people)
print(df)

The output

     names  ages
0     Jess    25
1   Jordan    35
2    Sandy    22
3      Ted    29
4   Barney    33
5    Tyler    20
6  Rebecca    18
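
The effect of seeding can be checked directly. In this short sketch, `draw_ages` is a hypothetical helper written for the illustration, not part of Pandas or the random module; it re-seeds the generator before drawing, so two calls with the same seed return the same list.

```python
import random

def draw_ages(seed, count):
    """Return `count` random ages in [18, 35] after seeding the generator."""
    random.seed(seed)
    return [random.randint(18, 35) for _ in range(count)]

first_run = draw_ages(3, 7)
second_run = draw_ages(3, 7)

# Same seed, same sequence.
print(first_run == second_run)  # True
```

The exact numbers drawn can vary between Python versions, but within one interpreter the same seed reproduces the same values.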
