Introduction to Pandas

Pandas is one of the most widely used Python libraries for data analysis. Its central structure, the DataFrame, is designed for working with tabular data.

What is Pandas?

Pandas builds on NumPy and provides two main data structures: Series (a labeled 1D array) and DataFrame (a labeled 2D table, like a spreadsheet, whose columns are Series). Install it with pip install pandas; by convention it is imported as import pandas as pd.
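A minimal sketch of the two structures follows. The names and ages are made-up sample data used only for illustration.

```python
import pandas as pd

# A Series is a one-dimensional array whose values carry index labels
ages = pd.Series([25, 32, 17], index=["alice", "bob", "cara"])

# A DataFrame is a two-dimensional table; each column is itself a Series
df = pd.DataFrame({"name": ["Alice", "Bob", "Cara"], "age": [25, 32, 17]})

print(ages["bob"])        # 32 (lookup by label, not position)
print(type(df["age"]))    # <class 'pandas.core.series.Series'>
print(df.shape)           # (3, 2): 3 rows, 2 columns
```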

Creating DataFrames

DataFrames can be built from a dictionary (keys become column names), from a list of lists (each inner list is a row), or by reading a file, such as a CSV file with pd.read_csv().
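The sketch below builds the same small table three ways. To keep it self-contained, io.StringIO stands in for a real CSV file; with an actual file you would pass a path such as "people.csv" (a hypothetical file name) to pd.read_csv().

```python
import io
import pandas as pd

# From a dictionary: keys become column names, lists become column values
df_dict = pd.DataFrame({"name": ["Alice", "Bob"], "age": [25, 32]})

# From a list of lists: each inner list is one row; column names are given separately
df_rows = pd.DataFrame([["Alice", 25], ["Bob", 32]], columns=["name", "age"])

# From CSV text; StringIO behaves like an open file, so read_csv accepts it
csv_text = "name,age\nAlice,25\nBob,32\n"
df_csv = pd.read_csv(io.StringIO(csv_text))

print(df_dict.shape, df_rows.shape, df_csv.shape)  # (2, 2) (2, 2) (2, 2)
```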

Exploring Data

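Use .head() and .tail() to view the first or last rows, .info() for column types and non-null counts, .describe() for summary statistics of the numeric columns, and the .shape attribute for the (rows, columns) size.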
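Here is a quick tour of these methods on a small, made-up dataset. Note that .info() prints its report directly rather than returning it, and .shape is an attribute, so it takes no parentheses.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Cara", "Dan", "Eve", "Finn"],
    "age": [25, 32, 17, 45, 29, 15],
})

print(df.head(3))     # first 3 rows (the default is 5)
print(df.tail(2))     # last 2 rows
df.info()             # column names, dtypes, non-null counts; prints and returns None
print(df.describe())  # count, mean, std, min, quartiles, max for numeric columns only
print(df.shape)       # (6, 2)
```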

Selecting Data

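Select a column with bracket notation (df["col"]) and rows with .loc[] (by label) or .iloc[] (by integer position). Both .loc[] and .iloc[] also accept a column selector as a second argument, e.g. df.loc[row_label, "col"].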
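The example below gives the DataFrame string index labels ("a", "b", "c", chosen only for illustration) so that the difference between label-based and position-based selection is visible.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["Alice", "Bob", "Cara"], "age": [25, 32, 17]},
    index=["a", "b", "c"],  # string labels, distinct from positions 0, 1, 2
)

ages = df["age"]              # one column name -> Series
subset = df[["name", "age"]]  # list of column names -> DataFrame

row_b = df.loc["b"]           # row by label
first_row = df.iloc[0]        # row by integer position
cell = df.loc["b", "age"]     # row label + column label
block = df.iloc[0:2, 1]       # rows at positions 0-1, column at position 1 ("age")

print(cell)                   # 32
print(block.tolist())         # [25, 32]
```

One difference worth remembering: label slices with .loc[] include the end label (df.loc["a":"b"] returns two rows), while position slices with .iloc[] exclude the end position, as with ordinary Python slicing.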

Filtering

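Indexing with a boolean condition keeps only the rows where the condition is True, e.g. df[df["age"] > 18]. Combine conditions with & (and) and | (or), wrapping each condition in parentheses.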
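A short sketch of single and combined filters, using made-up sample data:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Cara", "Dan"],
    "age": [25, 32, 17, 15],
    "city": ["Oslo", "Lima", "Oslo", "Lima"],
})

# A comparison produces a boolean Series; indexing with it keeps the True rows
adults = df[df["age"] > 18]

# & means "and", | means "or"; each condition needs its own parentheses
oslo_adults = df[(df["age"] > 18) & (df["city"] == "Oslo")]
teens_or_lima = df[(df["age"] < 20) | (df["city"] == "Lima")]

print(adults["name"].tolist())         # ['Alice', 'Bob']
print(oslo_adults["name"].tolist())    # ['Alice']
print(teens_or_lima["name"].tolist())  # ['Bob', 'Cara', 'Dan']
```

The parentheses matter because & and | bind more tightly than comparisons such as >. Python's own and / or keywords do not work here: they raise a ValueError because the truth value of a whole Series is ambiguous.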

Common Operations

Other everyday operations include sorting with .sort_values(), grouping and aggregating with .groupby(), handling missing data with .dropna() and .fillna(), and adding new columns by assignment, e.g. df["new_col"] = df["col"] * 2.
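The following sketch runs each of these operations on one small table with a single missing value (np.nan). The data and the new column name age_in_months are illustrative only. Note that these methods return new DataFrames rather than modifying df, while column assignment does change df.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Cara", "Dan"],
    "city": ["Oslo", "Lima", "Oslo", "Lima"],
    "age": [25, 32, np.nan, 40],
})

# Sort by a column, largest first; NaN is placed last by default
by_age = df.sort_values("age", ascending=False)

# Group rows by city and average the ages (mean skips NaN)
mean_age = df.groupby("city")["age"].mean()

# Missing data: drop rows that contain NaN, or replace NaN with a value
dropped = df.dropna()
filled = df.fillna({"age": 0})

# New column by assignment, computed from an existing column (NaN stays NaN)
df["age_in_months"] = df["age"] * 12

print(by_age["name"].tolist())  # ['Dan', 'Bob', 'Alice', 'Cara']
print(mean_age.tolist())        # [36.0, 25.0] (groups sorted: Lima, Oslo)
```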

Saving Data

Export a DataFrame with .to_csv() or .to_json().
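A minimal sketch of exporting and reading the data back is shown below. It writes into a temporary directory so nothing is left behind; the file names people.csv and people.json are hypothetical, and in practice you would pass your own paths.

```python
import os
import tempfile
import pandas as pd

df = pd.DataFrame({"name": ["Alice", "Bob"], "age": [25, 32]})

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "people.csv")
    json_path = os.path.join(tmp, "people.json")

    # index=False stops the row index from being written as an extra column
    df.to_csv(csv_path, index=False)

    # orient="records" writes a JSON list with one object per row
    df.to_json(json_path, orient="records")

    # Read both files back before the temporary directory is deleted
    round_trip = pd.read_csv(csv_path)
    with open(json_path) as f:
        json_text = f.read()

print(round_trip["name"].tolist(), round_trip["age"].tolist())  # ['Alice', 'Bob'] [25, 32]
```

Without index=False, reading the CSV back would produce an extra unnamed column holding the old index.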