1. Introduction to Pandas

Pandas (the name is derived from "panel data" and "Python data analysis") is an open-source data analysis library built on top of NumPy. It provides fast, flexible, and expressive data structures for working with structured (tabular) data.

import pandas as pd    # Standard import convention
import numpy  as np    # Often needed for array creation

Structure | Dimensions | Analogy | Example
--- | --- | --- | ---
Series | 1-D (one column) | Single column of a spreadsheet | Student marks in one subject
DataFrame | 2-D (rows + columns) | Full spreadsheet or SQL table | Student records with Name, Age, Marks
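A minimal sketch of the two structures side by side, checking their dimensions with the real `ndim` and `shape` attributes. The student names, ages and marks are made-up sample data.

```python
import pandas as pd

# A Series: one labelled column of values (sample data)
marks = pd.Series([95, 88, 76], name="Marks")

# A DataFrame: several named columns sharing one row index (sample data)
students = pd.DataFrame({
    "Name": ["Asha", "Ravi", "Meera"],
    "Age": [16, 17, 16],
    "Marks": [95, 88, 76],
})

print(marks.ndim, marks.shape)        # 1 (3,)
print(students.ndim, students.shape)  # 2 (3, 3)

# Selecting one column of a DataFrame gives back a Series
print(type(students["Marks"]).__name__)  # Series
```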

2. Pandas Series — Meaning and Structure

A Pandas Series is a one-dimensional labelled array capable of holding data of any type — integers, floats, strings, booleans, or Python objects. It has two components:

  • Index: Labels for each element. If not specified, defaults to 0, 1, 2, ...
  • Values: The actual data stored in the Series.
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50])
print(s)
# Output:
# 0    10
# 1    20
# 2    30
# 3    40
# 4    50
# dtype: int64

The left column (0, 1, 2...) is the index; the right column is the values. The last line shows the dtype (data type).

3. Creating a Series — Four Methods

Method 1 — From a Python List

s1 = pd.Series([10, 20, 30, 40, 50])
# Default index: 0, 1, 2, 3, 4

# With custom index
s2 = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print(s2)
# a    10
# b    20
# c    30
# dtype: int64
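When a custom index is supplied, it needs exactly one label per value. A small sketch of what happens when the lengths disagree (pandas raises a `ValueError`):

```python
import pandas as pd

values = [10, 20, 30]

# Matching lengths: one label per value
ok = pd.Series(values, index=['a', 'b', 'c'])

# Mismatched lengths: 3 values but only 2 labels
try:
    bad = pd.Series(values, index=['a', 'b'])
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)
```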

Method 2 — From a Python Dictionary

Dictionary keys become the index; values become the data.

marks = pd.Series({'Maths': 95, 'Science': 88, 'English': 76})
print(marks)
# Maths      95
# Science    88
# English    76
# dtype: int64
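If an index is passed along with a dictionary, pandas uses it to pick and order the keys; a label that is not a key gets NaN. The subject 'Hindi' below is a hypothetical missing key used only to show this.

```python
import pandas as pd

scores = {'Maths': 95, 'Science': 88, 'English': 76}

# index selects and orders the dictionary keys; 'Science' is left out.
# 'Hindi' is not a key, so its value is NaN and the dtype becomes float64.
subset = pd.Series(scores, index=['English', 'Maths', 'Hindi'])
print(subset)
# English    76.0
# Maths      95.0
# Hindi       NaN
# dtype: float64
```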

Method 3 — From a NumPy Array (ndarray)

import numpy as np

arr = np.array([100, 200, 300])
s3 = pd.Series(arr, index=['a', 'b', 'c'])
print(s3)
# a    100
# b    200
# c    300
# dtype: int64

Method 4 — From a Scalar Value

The scalar is repeated once for every label in the index, so an index is needed to get more than one element. Without an index, pd.Series(5) creates a Series with just one element (label 0).

s4 = pd.Series(5, index=[0, 1, 2, 3])
print(s4)
# 0    5
# 1    5
# 2    5
# 3    5
# dtype: int64
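A short sketch contrasting a scalar Series built with and without an index:

```python
import pandas as pd

# With an index: the scalar is repeated for every label
filled = pd.Series(5, index=['a', 'b', 'c'])
print(filled.tolist())   # [5, 5, 5]

# Without an index: a one-element Series with the default label 0
single = pd.Series(5)
print(single)
# 0    5
# dtype: int64
```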

4. Series Attributes

Example outputs are for s = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e']).

Attribute | Description | Example output
--- | --- | ---
s.dtype | Data type of elements | int64
s.size | Total number of elements | 5
s.shape | Tuple showing dimensions | (5,)
s.index | The index labels of the Series | Index(['a','b','c','d','e'])
s.values | The data as a NumPy array | [10 20 30 40 50]
s.name | Name of the Series (if assigned) | None by default
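The attributes in the table can be checked directly. This sketch also assigns a name afterwards; the name "Marks" is just an illustrative choice.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])

print(s.dtype)           # int64
print(s.size)            # 5
print(s.shape)           # (5,)
print(s.index.tolist())  # ['a', 'b', 'c', 'd', 'e']
print(s.values)          # [10 20 30 40 50]
print(s.name)            # None

# name can be assigned after creation
s.name = "Marks"
print(s.name)            # Marks
```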

5. head() and tail() Functions

s = pd.Series([10,20,30,40,50], index=['a','b','c','d','e'])

print(s.head(3))   # First 3 elements
# a    10
# b    20
# c    30
# dtype: int64

print(s.tail(2))   # Last 2 elements
# d    40
# e    50
# dtype: int64

print(s.head())    # Default: first 5 elements
print(s.tail())    # Default: last 5 elements

Key point: Default value of n in both head() and tail() is 5.
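Two edge cases of n worth knowing, sketched on the same Series: an n larger than the Series returns every element, and a negative n drops elements from the opposite end.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])

# n larger than the Series simply returns everything
print(s.head(10).tolist())   # [10, 20, 30, 40, 50]

# Negative n: head(-2) drops the last 2, tail(-2) drops the first 2
print(s.head(-2).tolist())   # [10, 20, 30]
print(s.tail(-2).tolist())   # [30, 40, 50]
```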

6. Indexing and Slicing a Series

There are three ways to access elements in a Series:

Method | Syntax | Based on | Slice includes end?
--- | --- | --- | ---
Label-based | s['label'] or s.loc['label'] | Index label name | Yes: end label included
Position-based | s.iloc[position] | Integer position (0-based) | No: end position excluded (like Python lists)
Direct position | s[0], s[1:4] | Reliable as positions only with the default 0, 1, 2, ... index, where labels and positions coincide | No: slicing excludes end

s = pd.Series([10,20,30,40,50], index=['a','b','c','d','e'])

# Single element access
print(s['b'])          # 20  (label-based)
print(s.loc['b'])      # 20  (same)
print(s.iloc[1])       # 20  (position: index 1)

# Slicing — label-based (END INCLUDED)
print(s.loc['b':'d'])
# b    20
# c    30
# d    40   ← 'd' IS included

# Slicing — position-based (END EXCLUDED)
print(s.iloc[1:4])
# b    20
# c    30
# d    40   ← position 4 (e) is NOT included

⚠️ Critical Difference — Most Tested in Board Exams:

  • s.loc['b':'d'] → includes both 'b' and 'd' (label-based — end inclusive)
  • s.iloc[1:4] → includes positions 1, 2, 3 — position 4 is excluded (position-based — end exclusive)
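The difference is easiest to see when the labels are integers that do not match the positions. A minimal sketch, with labels starting at 1 instead of 0:

```python
import pandas as pd

# Integer labels 1..4, so label k is NOT at position k
s = pd.Series([100, 200, 300, 400], index=[1, 2, 3, 4])

print(s.loc[1])    # 100  (the element whose LABEL is 1)
print(s.iloc[1])   # 200  (the element at POSITION 1)

# Slicing follows the same rules
print(s.loc[1:3].tolist())   # [100, 200, 300]  labels 1 to 3, end included
print(s.iloc[1:3].tolist())  # [200, 300]       positions 1 and 2, end excluded
```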

7. Boolean Indexing

Boolean indexing filters a Series by applying a condition — only elements satisfying the condition are returned.

marks = pd.Series([45, 78, 90, 55, 88],
                  index=['A', 'B', 'C', 'D', 'E'])

# Get all marks greater than 60
print(marks[marks > 60])
# B    78
# C    90
# E    88
# dtype: int64

# Multiple conditions using & (and) | (or)
print(marks[(marks >= 50) & (marks <= 80)])
# B    78
# D    55
# dtype: int64

Important: Use & (not and) and | (not or) for element-wise operations on Series. Always wrap each condition in parentheses ().
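A sketch of the | operator, plus ~ (element-wise "not"), on the same marks Series. It also shows that a condition on its own is simply a Series of True/False values.

```python
import pandas as pd

marks = pd.Series([45, 78, 90, 55, 88], index=['A', 'B', 'C', 'D', 'E'])

# | (or): below 50 or above 85
print(marks[(marks < 50) | (marks > 85)].tolist())   # [45, 90, 88]

# ~ (not): negate a condition
print(marks[~(marks > 60)].tolist())                 # [45, 55]

# The condition itself is a boolean Series aligned to the index
print((marks > 60).tolist())   # [False, True, True, False, True]
```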

8. Mathematical Operations on Series

Arithmetic Operations

Operations are applied element-wise — and aligned by index label, not position.

s1 = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
s2 = pd.Series([1,  2,  3],  index=['a', 'b', 'c'])

print(s1 + s2)   # a=11, b=22, c=33
print(s1 * s2)   # a=10, b=40, c=90
print(s1 - s2)   # a=9,  b=18, c=27
print(s1 / s2)   # a=10.0, b=10.0, c=10.0

# Scalar operations
print(s1 + 5)    # a=15, b=25, c=35
print(s1 * 2)    # a=20, b=40, c=60
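To see that matching is by label rather than position, here is a sketch where the second Series holds the same labels in reverse order:

```python
import pandas as pd

s1 = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
# Same labels, different order
s2 = pd.Series([3, 2, 1], index=['c', 'b', 'a'])

total = s1 + s2   # matched by label: a=10+1, b=20+2, c=30+3
print(total['a'], total['b'], total['c'])   # 11 22 33
```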

Index Alignment — NaN for Mismatched Labels

# When indices don't match, result is NaN
s3 = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
s4 = pd.Series([1,  2,  3],  index=['b', 'c', 'd'])

print(s3 + s4)
# a     NaN    ← 'a' only in s3
# b    21.0   ← 20 + 1
# c    32.0   ← 30 + 2
# d     NaN   ← 'd' only in s4
# dtype: float64
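If NaN is not wanted, the method form of + (`add`) accepts a `fill_value` that stands in for a label missing from one side. Here 0 is used. (`sub`, `mul` and `div` work the same way.)

```python
import pandas as pd

s3 = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
s4 = pd.Series([1,  2,  3],  index=['b', 'c', 'd'])

# fill_value=0 treats a label missing on one side as 0
result = s3.add(s4, fill_value=0)
print(result)
# a    10.0
# b    21.0
# c    32.0
# d     3.0
# dtype: float64
```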

Statistical Methods

Method | Description | Result for marks = [45, 78, 90, 55, 88] (Section 7)
--- | --- | ---
s.sum() | Sum of all values | 356
s.mean() | Arithmetic mean | 71.2
s.max() | Maximum value | 90
s.min() | Minimum value | 45
s.count() | Count of non-NaN values | 5
s.std() | Sample standard deviation (ddof=1 by default) | 20.19 (approx.)
s.median() | Median value | 78.0
s.describe() | Summary statistics (count, mean, std, min, quartiles, max) | Full stats table
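The table values can be reproduced with a short sketch. It also compares sample and population standard deviation via the `ddof` parameter, and shows count() skipping a missing value. The three-element Series `with_gap` is an illustrative example.

```python
import pandas as pd

marks = pd.Series([45, 78, 90, 55, 88], index=['A', 'B', 'C', 'D', 'E'])

print(marks.sum())                  # 356
print(marks.mean())                 # 71.2
print(marks.median())               # 78.0
print(round(marks.std(), 2))        # 20.19  sample std (ddof=1, the default)
print(round(marks.std(ddof=0), 2))  # 18.06  population std

# count() ignores missing values; size counts every element
with_gap = pd.Series([45, None, 90])
print(with_gap.count(), with_gap.size)   # 2 3
```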