1. Introduction to Pandas
Pandas (the name derives from "panel data" and is also a play on "Python data analysis") is an open-source data analysis library built on top of NumPy. It provides fast, flexible, and expressive data structures for working with structured (tabular) data.
import pandas as pd # Standard import convention
import numpy as np # Often needed for array creation
| Structure | Dimensions | Analogy | Example |
|---|---|---|---|
| Series | 1-D (one column) | Single column of a spreadsheet | Student marks in one subject |
| DataFrame | 2-D (rows + columns) | Full spreadsheet or SQL table | Student records with Name, Age, Marks |
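As a minimal sketch of the two structures in the table, the following builds one Series and one DataFrame. The student names and values are made up for illustration only.

```python
import pandas as pd

# 1-D: marks in one subject (illustrative values)
marks = pd.Series([85, 92, 78], name="Marks")

# 2-D: several columns sharing one row index (illustrative values)
students = pd.DataFrame({
    "Name": ["Asha", "Ravi", "Meera"],
    "Age": [16, 17, 16],
    "Marks": [85, 92, 78],
})

print(marks.ndim, marks.shape)        # 1 (3,)
print(students.ndim, students.shape)  # 2 (3, 3)

# Selecting a single column of a DataFrame gives back a Series
print(type(students["Marks"]).__name__)  # Series
```

Each column of a DataFrame behaves like a Series, which is why the Series topics that follow carry over to DataFrames.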
2. Pandas Series — Meaning and Structure
A Pandas Series is a one-dimensional labelled array capable of holding data of any type — integers, floats, strings, booleans, or Python objects. It has two components:
- Index: Labels for each element. If not specified, defaults to 0, 1, 2, ...
- Values: The actual data stored in the Series.
import pandas as pd
s = pd.Series([10, 20, 30, 40, 50])
print(s)
# Output:
# 0 10
# 1 20
# 2 30
# 3 40
# 4 50
# dtype: int64
The left column (0, 1, 2...) is the index; the right column is the values. The last line shows the dtype (data type).
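The dtype is inferred from the values supplied. The short sketch below, with illustrative variable names, shows how the reported dtype changes with the kind of data and how the default index can be inspected.

```python
import pandas as pd

ints = pd.Series([1, 2, 3])
floats = pd.Series([1.5, 2, 3])   # one float makes the whole Series float
flags = pd.Series([True, False])

print(ints.dtype)    # int64
print(floats.dtype)  # float64
print(flags.dtype)   # bool

# The default index is simply the positions 0, 1, 2, ...
print(list(ints.index))  # [0, 1, 2]
```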
3. Creating a Series — Four Methods
Method 1 — From a Python List
s1 = pd.Series([10, 20, 30, 40, 50])
# Default index: 0, 1, 2, 3, 4
# With custom index
s2 = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print(s2)
# a 10
# b 20
# c 30
# dtype: int64
Method 2 — From a Python Dictionary
Dictionary keys become the index; values become the data.
marks = pd.Series({'Maths': 95, 'Science': 88, 'English': 76})
print(marks)
# Maths 95
# Science 88
# English 76
# dtype: int64
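A dictionary can also be combined with an explicit index. In that case the index decides which keys are kept and in what order, and a label with no matching key gets NaN. The sketch below assumes an extra subject, 'Hindi', that is not in the dictionary.

```python
import pandas as pd

marks_dict = {'Maths': 95, 'Science': 88, 'English': 76}

# index= selects and orders the keys; 'Hindi' has no matching key
subset = pd.Series(marks_dict, index=['English', 'Maths', 'Hindi'])
print(subset)
# English    76.0
# Maths      95.0
# Hindi       NaN
# dtype: float64
```

The dtype becomes float64 because NaN is a floating-point value.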
Method 3 — From a NumPy Array (ndarray)
import numpy as np
arr = np.array([100, 200, 300])
s3 = pd.Series(arr, index=['a', 'b', 'c'])
print(s3)
# a 100
# b 200
# c 300
# dtype: int64
Method 4 — From a Scalar Value
A single value is repeated for every index position, so pass an index to set the length. Without an index, pandas creates a one-element Series.
s4 = pd.Series(5, index=[0, 1, 2, 3])
print(s4)
# 0 5
# 1 5
# 2 5
# 3 5
# dtype: int64
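To see why the index matters for a scalar, compare the two calls in this small sketch (variable names are illustrative).

```python
import pandas as pd

single = pd.Series(5)                     # no index: one element at label 0
repeated = pd.Series(5, index=range(4))   # value repeated for 4 labels

print(len(single))        # 1
print(len(repeated))      # 4
print(repeated.tolist())  # [5, 5, 5, 5]
```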
4. Series Attributes
| Attribute | Description | Example output (for s with 5 int elements) |
|---|---|---|
| s.dtype | Data type of elements | int64 |
| s.size | Total number of elements | 5 |
| s.shape | Tuple showing dimensions | (5,) |
| s.index | The index labels of the Series | Index(['a','b','c','d','e']) |
| s.values | The data as a NumPy array | [10 20 30 40 50] |
| s.name | Name of the Series (if assigned) | None by default |
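The attributes can be checked on a concrete Series. This sketch reuses the five-element Series with labels 'a' to 'e' that appears in the following sections.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])

print(s.size)         # 5
print(s.shape)        # (5,)
print(list(s.index))  # ['a', 'b', 'c', 'd', 'e']
print(s.values)       # [10 20 30 40 50]
print(s.name)         # None

s.name = 'Marks'      # name can be assigned after creation
print(s.name)         # Marks
```

Note that attributes are written without parentheses, unlike methods such as head().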
5. head() and tail() Functions
s = pd.Series([10,20,30,40,50], index=['a','b','c','d','e'])
print(s.head(3)) # First 3 elements
# a 10
# b 20
# c 30
# dtype: int64
print(s.tail(2)) # Last 2 elements
# d 40
# e 50
# dtype: int64
print(s.head()) # Default: first 5 elements
print(s.tail()) # Default: last 5 elements
Key point: Default value of n in both head() and tail() is 5.
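Two edge cases are worth trying out: a Series with fewer than 5 elements, and a negative n. The sketch below uses an illustrative three-element Series named short.

```python
import pandas as pd

short = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

# Fewer than 5 elements: head() and tail() return the whole Series
print(len(short.head()))  # 3
print(len(short.tail()))  # 3

s = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])

# Negative n: head(-2) drops the last 2, tail(-2) drops the first 2
print(s.head(-2).tolist())  # [10, 20, 30]
print(s.tail(-2).tolist())  # [30, 40, 50]
```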
6. Indexing and Slicing a Series
There are three ways to access elements in a Series:
| Method | Syntax | Based on | Slice includes end? |
|---|---|---|---|
| Label-based | s['label'] or s.loc['label'] | Index label name | Yes, end label included |
| Position-based | s.iloc[position] | Integer position (0-based) | No, end position excluded (like Python lists) |
| Direct position | s[0], s[1:4] | Dependable only with the default (integer) index; s[0] on a non-integer index is deprecated in recent pandas releases, so prefer s.iloc | No, slicing excludes end |
s = pd.Series([10,20,30,40,50], index=['a','b','c','d','e'])
# Single element access
print(s['b']) # 20 (label-based)
print(s.loc['b']) # 20 (same)
print(s.iloc[1]) # 20 (position: index 1)
# Slicing — label-based (END INCLUDED)
print(s.loc['b':'d'])
# b 20
# c 30
# d 40 ← 'd' IS included
# Slicing — position-based (END EXCLUDED)
print(s.iloc[1:4])
# b 20
# c 30
# d 40 ← position 4 (e) is NOT included
⚠️ Critical Difference — Most Tested in Board Exams:
- s.loc['b':'d'] → includes both 'b' and 'd' (label-based, end inclusive)
- s.iloc[1:4] → includes positions 1, 2, 3; position 4 is excluded (position-based, end exclusive)
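The difference between labels and positions is easiest to see when the labels are integers that do not start at 0. The following sketch assumes a made-up index of 5 to 8.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=[5, 6, 7, 8])

print(s.loc[5])    # 10  (label 5)
print(s.iloc[0])   # 10  (position 0)

print(s.loc[5:7].tolist())   # [10, 20, 30]  labels 5, 6, 7 (end included)
print(s.iloc[0:2].tolist())  # [10, 20]      positions 0, 1 (end excluded)
```

Writing .loc or .iloc explicitly removes any doubt about which of the two is meant.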
7. Boolean Indexing
Boolean indexing filters a Series by applying a condition — only elements satisfying the condition are returned.
marks = pd.Series([45, 78, 90, 55, 88],
index=['A', 'B', 'C', 'D', 'E'])
# Get all marks greater than 60
print(marks[marks > 60])
# B 78
# C 90
# E 88
# dtype: int64
# Multiple conditions: combine with & (and) or | (or)
print(marks[(marks >= 50) & (marks <= 80)])
# B 78
# D 55
# dtype: int64
Important: Use & (not and) and | (not or) for element-wise operations on Series. Always wrap each condition in parentheses ().
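A condition on a Series produces a boolean Series (a mask), and that mask is what does the filtering. The sketch below prints the mask, combines conditions with | and ~, and shows the error raised when Python's and is used instead of &. The variable names mask and raised are illustrative.

```python
import pandas as pd

marks = pd.Series([45, 78, 90, 55, 88], index=['A', 'B', 'C', 'D', 'E'])

mask = marks > 60        # boolean Series with the same index as marks
print(mask.tolist())     # [False, True, True, False, True]

# | (or) and ~ (not) work element-wise, just like &
print(marks[(marks < 50) | (marks > 85)].tolist())  # [45, 90, 88]
print(marks[~mask].tolist())                        # [45, 55]

# `and` needs a single True/False for the whole Series, which pandas refuses
raised = False
try:
    marks[(marks >= 50) and (marks <= 80)]
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```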
8. Mathematical Operations on Series
Arithmetic Operations
Operations are applied element-wise — and aligned by index label, not position.
s1 = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
s2 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
print(s1 + s2) # a=11, b=22, c=33
print(s1 * s2) # a=10, b=40, c=90
print(s1 - s2) # a=9, b=18, c=27
print(s1 / s2) # a=10.0, b=10.0, c=10.0
# Scalar operations
print(s1 + 5) # a=15, b=25, c=35
print(s1 * 2) # a=20, b=40, c=60
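To confirm that values are matched by label rather than by position, the sketch below adds a second Series whose labels are the same but listed in reverse order. The name reordered is illustrative.

```python
import pandas as pd

s1 = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
reordered = pd.Series([3, 2, 1], index=['c', 'b', 'a'])  # same labels, reversed

total = s1 + reordered
print(total['a'])  # 11 -> 10 + 1, matched on label 'a'
print(total['b'])  # 22 -> 20 + 2
print(total['c'])  # 33 -> 30 + 3
```

If the addition were positional, label 'a' would have received 10 + 3 = 13 instead.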
Index Alignment — NaN for Mismatched Labels
# When indices don't match, result is NaN
s3 = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
s4 = pd.Series([1, 2, 3], index=['b', 'c', 'd'])
print(s3 + s4)
# a NaN ← 'a' only in s3
# b 21.0 ← 20 + 1
# c 32.0 ← 30 + 2
# d NaN ← 'd' only in s4
# dtype: float64
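When NaN is not the desired result for unmatched labels, one option is the add() method with fill_value, which substitutes a value for the missing side before adding. A minimal sketch using the same two Series:

```python
import pandas as pd

s3 = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
s4 = pd.Series([1, 2, 3], index=['b', 'c', 'd'])

# Treat a label missing from one Series as 0 instead of producing NaN
filled = s3.add(s4, fill_value=0)
print(filled)
# a    10.0
# b    21.0
# c    32.0
# d     3.0
# dtype: float64
```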
Statistical Methods
| Method | Description | Example (marks Series) |
|---|---|---|
| s.sum() | Sum of all values | 356 |
| s.mean() | Arithmetic mean | 71.2 |
| s.max() | Maximum value | 90 |
| s.min() | Minimum value | 45 |
| s.count() | Count of non-NaN values | 5 |
| s.std() | Standard deviation (sample, ddof=1 by default) | 20.19... |
| s.median() | Median value | 78.0 |
| s.describe() | Summary statistics (count, mean, std, min, quartiles, max) | Full stats table |
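The figures in the table can be reproduced on the marks Series from the Boolean Indexing section. The sketch also prints the population standard deviation (ddof=0) for comparison, since pandas uses the sample formula by default.

```python
import pandas as pd

marks = pd.Series([45, 78, 90, 55, 88], index=['A', 'B', 'C', 'D', 'E'])

print(marks.sum())     # 356
print(marks.mean())    # 71.2
print(marks.max())     # 90
print(marks.min())     # 45
print(marks.count())   # 5
print(marks.median())  # 78.0

print(round(marks.std(), 2))        # 20.19 (sample std, ddof=1)
print(round(marks.std(ddof=0), 2))  # 18.06 (population std)
```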

