Introduction to Pandas
Pandas is one of the most widely used Python libraries for data analysis. Its central structure, the DataFrame, is designed for working with tabular data.
What is Pandas?
Pandas builds on NumPy and provides two main data structures: Series (a labeled 1D array) and DataFrame (a labeled 2D table, like a spreadsheet). Install with pip install pandas.
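As a minimal sketch of the two structures, the example below builds a Series and a DataFrame by hand. The fruit names, prices, and stock counts are made-up illustration data, not part of any real dataset.

```python
import pandas as pd

# A Series is a one-dimensional array whose values are addressed by labels.
prices = pd.Series([1.5, 2.0, 0.75], index=["apple", "pear", "plum"])

# A DataFrame is a two-dimensional table; each column is itself a Series.
fruit = pd.DataFrame(
    {"price": [1.5, 2.0, 0.75], "stock": [10, 0, 25]},
    index=["apple", "pear", "plum"],
)

print(prices["pear"])        # look up a value by its label
print(type(fruit["price"]))  # selecting one column gives back a Series
print(fruit)
```

Both objects carry an index (here the fruit names), which is what distinguishes them from plain NumPy arrays.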
Creating DataFrames
DataFrames can be created from dictionaries, lists of lists, or by reading files such as CSV with pd.read_csv().
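The three routes mentioned above can be sketched as follows. The names and ages are invented sample data, and the CSV is read from an in-memory string (via io.StringIO) so the example runs without a file on disk; in practice pd.read_csv() is usually given a file path.

```python
import io

import pandas as pd

# From a dictionary: keys become column names, values become columns.
from_dict = pd.DataFrame({"name": ["Ada", "Linus"], "age": [36, 28]})

# From a list of lists: each inner list is one row, so the column
# names have to be supplied separately.
from_rows = pd.DataFrame([["Ada", 36], ["Linus", 28]], columns=["name", "age"])

# From CSV: read_csv accepts a path or any file-like object.
csv_text = "name,age\nAda,36\nLinus,28\n"
from_csv = pd.read_csv(io.StringIO(csv_text))

print(from_dict)
print(from_rows)
print(from_csv)
```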
Exploring Data
Use .head(), .tail(), .info(), .describe(), and .shape to quickly understand a dataset.
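A short sketch of these inspection tools on a small, made-up DataFrame is shown below. Note that .shape is an attribute rather than a method, and that .info() prints its summary directly instead of returning it.

```python
import pandas as pd

# Invented sample data for illustration.
df = pd.DataFrame(
    {
        "name": ["Ada", "Linus", "Grace", "Tim", "Margaret"],
        "age": [36, 28, 45, 17, 52],
        "score": [88.0, 92.5, 79.0, 65.5, 95.0],
    }
)

print(df.shape)       # (rows, columns) as a tuple
print(df.head(2))     # first two rows (default is five)
print(df.tail(2))     # last two rows
df.info()             # column names, dtypes, and non-null counts
print(df.describe())  # summary statistics for the numeric columns
```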
Selecting Data
Select columns with bracket notation (df["col"]), and rows with .loc[] (label-based) or .iloc[] (position-based).
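The difference between label-based and position-based selection is easiest to see when the index is not just 0, 1, 2. The sketch below uses invented data with string labels as the index.

```python
import pandas as pd

# Invented sample data; the index holds string labels.
df = pd.DataFrame(
    {"age": [36, 28, 45], "city": ["London", "Helsinki", "Oslo"]},
    index=["ada", "linus", "grace"],
)

ages = df["age"]                # one column, returned as a Series
subset = df[["city", "age"]]    # a list of columns returns a DataFrame

by_label = df.loc["linus"]      # row selected by its index label
by_position = df.iloc[0]        # row selected by its integer position
cell = df.loc["grace", "city"]  # single value: row label, column label

# Label slices include the end label; position slices exclude the end.
label_slice = df.loc["ada":"linus"]
position_slice = df.iloc[0:1]

print(cell)
print(label_slice)
print(position_slice)
```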
Filtering
Boolean conditions filter rows, e.g. df[df["age"] > 18].
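To illustrate, the comparison inside the brackets produces a boolean Series, and indexing with it keeps only the rows where the value is True. The sketch below uses invented data and also shows how conditions might be combined with & (and), | (or), and ~ (not); each condition needs its own parentheses.

```python
import pandas as pd

# Invented sample data for illustration.
df = pd.DataFrame(
    {
        "name": ["Ada", "Linus", "Grace", "Tim"],
        "age": [36, 17, 45, 18],
        "city": ["London", "Helsinki", "Oslo", "London"],
    }
)

# The condition alone is a boolean Series, one value per row.
mask = df["age"] > 18
adults = df[mask]

# Combine conditions element-wise; Python's `and`/`or` do not work here.
london_adults = df[(df["age"] > 18) & (df["city"] == "London")]

print(mask)
print(adults)
print(london_adults)
```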
Common Operations
Sort with .sort_values(), group with .groupby(), handle missing data with .dropna() and .fillna(), and add new columns by assignment.
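Each of these operations can be sketched on one small table. The team names, scores, and the passing threshold of 20 are assumptions made up for this example.

```python
import pandas as pd

# Invented sample data; None becomes a missing value (NaN).
df = pd.DataFrame(
    {
        "team": ["red", "blue", "red", "blue"],
        "score": [10.0, None, 40.0, 20.0],
    }
)

# Sorting: highest score first; missing values go last by default.
by_score = df.sort_values("score", ascending=False)

# Missing data: either drop incomplete rows or fill in a replacement.
complete = df.dropna()
filled = df.fillna({"score": 0.0})

# Grouping: mean score per team (missing values are skipped).
team_means = df.groupby("team")["score"].mean()

# New column by assignment; comparisons against NaN evaluate to False.
df["passed"] = df["score"] >= 20

print(by_score)
print(team_means)
print(df)
```

These methods return new objects by default, so df itself only changes in the final assignment.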
Saving Data
Export a DataFrame with .to_csv() or .to_json().
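A minimal sketch of both exports follows, using invented data. Called without a path, .to_csv() and .to_json() return the output as a string, which keeps this example free of file writes; passing a path such as "people.csv" (a hypothetical file name) writes to disk instead. The index=False and orient="records" arguments are choices made for this example, not requirements.

```python
import io

import pandas as pd

# Invented sample data for illustration.
df = pd.DataFrame({"name": ["Ada", "Linus"], "age": [36, 28]})

# index=False leaves out the row index, which is often not wanted in a file.
csv_text = df.to_csv(index=False)

# orient="records" produces a list with one JSON object per row.
json_text = df.to_json(orient="records")

# Reading the CSV text back should recover the same table.
restored = pd.read_csv(io.StringIO(csv_text))

print(csv_text)
print(json_text)
print(restored)
```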