What Python is
Why Python dominates analytics
Python vs Excel vs SQL vs BI tools
Where Python fits in real analytics workflows
When Python is the wrong tool
Typical analyst tech stack (SQL → Python → BI)
Python distributions
Why Anaconda exists
Virtual environments (conceptual)
Package managers (pip vs conda)
Versioning awareness
What a Notebook is
Kernel & execution model
Cells:
Code cells
Markdown cells
Markdown essentials:
Headers
Lists
Code blocks
Running cells out of order (danger)
Restart kernel & clear outputs
Notebook best practices for analysts
Turning notebooks into reports
Text: str
Numeric: int, float, complex
Boolean: bool
None: NoneType
Sequence: list, tuple, range
Mapping: dict
Set: set, frozenset
Binary: bytes, bytearray, memoryview (conceptual)
Mutable vs immutable objects
Why strings are immutable
Variable assignment vs object creation
Reference behavior
Shallow copy vs deep copy
Common mutation bugs analysts make
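A minimal sketch of the reference and copy behavior listed above, using only the standard-library `copy` module. The `original` record and its values are made up for illustration.

```python
import copy

# Hypothetical nested record, invented for this example.
original = {"region": "North", "sales": [100, 200]}

alias = original                 # another name for the same object, no copy
shallow = copy.copy(original)    # new outer dict, inner list is still shared
deep = copy.deepcopy(original)   # fully independent copy, inner list included

original["sales"].append(300)    # mutate the nested list in place
original["region"] = "South"     # rebind a top-level key

print(alias["region"])    # South (alias sees every change)
print(shallow["sales"])   # [100, 200, 300] (shared inner list)
print(shallow["region"])  # North (top-level keys are independent)
print(deep["sales"])      # [100, 200]
```

A typical mutation bug is assuming `shallow` is safe to edit: that holds for top-level keys but not for nested lists or dicts.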
Integers
Floating point numbers
Precision & rounding issues
Type casting
Numeric operations
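The precision, rounding, and casting topics above can be illustrated with a short sketch. It uses `math.isclose` and `decimal.Decimal` from the standard library; the numbers are arbitrary examples.

```python
import math
from decimal import Decimal

# Binary floats cannot represent 0.1 or 0.2 exactly.
total = 0.1 + 0.2
print(total == 0.3)              # False
print(math.isclose(total, 0.3))  # True (compare with a tolerance instead)

# Built-in round() rounds exact halves to the nearest even integer.
print(round(2.5))                # 2

# Decimal built from strings keeps exact decimal values.
exact = Decimal("0.1") + Decimal("0.2")
print(exact == Decimal("0.3"))   # True

# Type casting: int() truncates toward zero, it does not round.
print(int(3.9))                  # 3
print(int("42") + 1)             # 43
print(float("3.5") * 2)          # 7.0
```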
Creating strings
Escape characters (\n, \t)
Raw strings (r"")
f-strings
.format()
Unicode & encoding (conceptual)
Boolean values
Truthy vs falsy objects
Boolean expressions in conditions
None
Difference between None, 0, "", False
Why None is critical in analytics pipelines
What variable assignment means
Dynamic typing in Python
Type reassignment
Valid variable names
Reserved keywords
Readability & PEP8 basics
== vs is
Correct None checks
Common interview traps
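A small sketch of the difference between equality and identity, and of why `None` checks are usually written with `is`. The `describe` helper is hypothetical and exists only for this example.

```python
a = [1, 2, 3]
b = [1, 2, 3]
c = a

print(a == b)  # True  (same contents)
print(a is b)  # False (two separate list objects)
print(a is c)  # True  (two names for one object)


def describe(value):
    """Hypothetical helper: separate 'missing' from 'empty or zero'."""
    if value is None:   # identity check, only None matches
        return "missing"
    if not value:       # falsy but not None: 0, "", [], False
        return "empty or zero"
    return "present"


print(describe(None))  # missing
print(describe(0))     # empty or zero
print(describe(""))    # empty or zero
print(describe(5))     # present
```

Writing `if not value:` as the only check would treat a legitimate `0` the same as a missing value, which is one common form of the trap.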
Indexing
Slicing
Length
Concatenation
Repetition
Case methods
Strip methods
Split & join
Replace & find
Count & checks
Why strings cannot be modified
Efficient string handling patterns
Creation
Indexing & slicing
List methods
Sorting (sort vs sorted)
Copying lists
Nested lists
Performance considerations
Tuple creation
Tuple unpacking
Immutability benefits
When to prefer tuples
Key-value structure
Access patterns
Dictionary methods
Nested dictionaries
Dictionary comprehensions
JSON-like data thinking
Set creation
Uniqueness
Set operations
Membership testing
Analytics use cases
Equality & relational operators
and, or, not
Precedence
in, not in
Ternary operator
Practical usage
if, elif, else
Indentation rules
Nested conditions
Readable condition design
for loop
while loop
Loop else
break
continue
pass
What iteration means
Why loops work
Range behavior
Memory efficiency
Parallel iteration
Practical analytics usage
Index-value iteration
len, sum, min, max
any, all
sorted, reversed
type, dir, help
Syntax
Filtering
Nested comprehensions
Unique transformations
Key-value transformations
Defining functions
Parameters & arguments
Return values
Multiple returns
Positional vs keyword arguments
Default arguments
Mutable default pitfalls
*args and **kwargs
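The mutable-default pitfall and the variadic parameters listed above, as a minimal sketch. All function names here (`add_tag_buggy`, `add_tag`, `summarize`) are hypothetical.

```python
def add_tag_buggy(tag, tags=[]):
    # The default list is created once, when the function is defined,
    # so every call without `tags` appends to the same list.
    tags.append(tag)
    return tags


def add_tag(tag, tags=None):
    # Common fix: use None as the default and build a fresh list per call.
    if tags is None:
        tags = []
    tags.append(tag)
    return tags


print(add_tag_buggy("a"))  # ['a']
print(add_tag_buggy("b"))  # ['a', 'b'] (state leaked from the first call)
print(add_tag("a"))        # ['a']
print(add_tag("b"))        # ['b']


def summarize(*args, **kwargs):
    # *args collects extra positional arguments into a tuple,
    # **kwargs collects extra keyword arguments into a dict.
    return {"positional": args, "keyword": kwargs}


print(summarize(1, 2, unit="USD"))
# {'positional': (1, 2), 'keyword': {'unit': 'USD'}}
```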
Purpose
Writing clean docstrings
Syntax
Use cases
Limitations
Scope resolution
Practical implications
Import styles
Aliasing
Best practices
math
random
datetime
statistics
os
pathlib
__name__ == "__main__"
Opening files
File modes
Encoding
Context managers
csv module
Reading & writing CSVs
Why analysts must know this
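A sketch of reading and writing CSV data with the standard-library `csv` module. To keep it self-contained it works on an in-memory string through `io.StringIO` rather than a real file; the column names and values are invented.

```python
import csv
import io

# Hypothetical CSV content, standing in for a file on disk.
raw = "region,sales\nNorth,100\nSouth,250\n"

# Reading: DictReader yields one dict per row, keyed by the header.
reader = csv.DictReader(io.StringIO(raw))
rows = list(reader)

# Every value arrives as a string, so cast before doing arithmetic.
total = sum(int(row["sales"]) for row in rows)
print(total)  # 350

# Writing: DictWriter needs the column order up front.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["region", "sales"], lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue() == raw)  # True
```

With a real file, the same reader and writer would typically be wrapped in a context manager such as `with open(path, newline="", encoding="utf-8") as f:`, where `path` is whatever file is being processed.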
Syntax vs runtime vs logical errors
try, except, else, finally
TypeError
ValueError
KeyError
IndexError
FileNotFoundError
raise
Custom messages
Reading tracebacks
Print debugging
Inspecting objects
Objects & classes
Why analysts should care
Attributes
Methods
self
Parent-child classes
Method overriding
Behavioral flexibility
Creating arrays
Data types
Vectorized operations
Mathematical operations
1D & 2D arrays
Concept & benefits
Conditional selection
What Pandas is and why analysts use it
Series vs DataFrame (real differences)
Index concept (and why it matters)
Row vs column orientation
Pandas vs Excel mindset
read_csv
read_excel
read_sql (conceptual + basic usage)
Encoding issues
Handling large files (chunksize – concept)
Writing data:
to_csv
to_excel
head, tail, sample
info
describe
Shape, columns, dtypes
Memory usage
When numbers lie in describe()
Column selection
Row slicing
Boolean filtering
loc vs iloc
Chained indexing (why it’s dangerous)
Resetting and setting index
Handling missing values:
isna, notna
fillna
dropna
Removing duplicates
Renaming columns
Type conversion:
astype
to_datetime
String operations with .str
Common data quality issues in real datasets
Creating new columns
Conditional columns (np.where / apply)
apply vs vectorized operations
map vs apply
Row-wise vs column-wise operations
Performance implications
sort_values
sort_index
Ranking:
rank
Top-N analysis
Tie handling
Split–Apply–Combine concept
Single & multiple aggregations
agg
Named aggregations
Grouping by multiple columns
GroupBy with conditions
Common GroupBy mistakes
merge
Types of joins:
inner
left
right
outer
Joining on keys vs index
concat
Append vs concat (DataFrame.append was removed in pandas 2.0)
Data mismatch problems
Parsing dates
Extracting year, month, day
Time-based filtering
Period vs timestamp
Resampling (conceptual)
Time-series pitfalls
pivot
pivot_table
melt
Wide vs long format
When reshaping is required
Category dtype
Memory & performance benefits
Ordering categories
Why analysts should care
Why loops are slow
Vectorization mindset
apply abuse
When Pandas breaks at scale
When to move to SQL / Spark
KeyError
SettingWithCopyWarning
Shape mismatch
Silent NaNs
Wrong aggregations
How to sanity-check results
Sales analysis
Customer segmentation logic
KPI calculation pipelines
Cleaning → Transforming → Aggregating → Output
Writing readable Pandas code
Intermediate variables vs chaining
Reproducibility
Notebook hygiene for analytics work
Choosing the right chart
Figure & axes
Core plots
Customization
Subplots
Saving figures
Statistical plots
Distributions
Relationships
Categorical comparisons
Heatmaps
Themes
Color palettes
Writing insights from visuals
Common visualization mistakes
Break a problem into functions
Write clean conditional logic
Use lists/dicts properly
Handle edge cases
Debug without panicking
Write readable code
Understanding a business problem
Defining analysis questions
Data cleaning and preparation
Performing analysis using Python
Writing insights and observations
Structuring analysis for stakeholders
Making decisions from data