My Machine Learning Journey: From NumPy to Production PyTorch
Why Machine Learning, Why Now
After years of building web applications and backend systems, I found myself increasingly curious about the systems that power recommendations, predictions, and intelligent automation. The turning point came when I realized that understanding ML wasn't just about career growth — it was about understanding the technology that's reshaping every industry I might work in.
This post documents my structured approach to learning machine learning from the ground up. Not a tutorial, but a roadmap — the decisions I made about what to learn, in what order, and why. My goal: build a portfolio of 6–10 production-quality projects while developing enough theoretical depth to read and reproduce research papers.
The Learning Philosophy
Before diving into the technical content, I want to share the principles guiding this journey:
- Build first, theory second: I learn best by doing. Mathematical intuition follows from implementing algorithms, not the other way around.
- Consistency over intensity: 30 minutes daily beats 8-hour weekend marathons. The compound effect of daily practice is real.
- Public learning: Writing about what I learn forces clarity. This blog post is as much for me as for anyone reading it.
- Production mindset: Every project should be deployable. No Jupyter notebooks left to rot — code gets structured, tested, and shipped.
The Roadmap Overview
I've structured my learning into four phases, each building on the previous. The timeline is flexible — some phases might take longer, others shorter — but the sequence matters.
Phase 1: Python Scientific Foundations
Before touching any ML library, I needed fluency in the tools that underpin everything: NumPy for numerical computing, Pandas for data manipulation, and Matplotlib/Seaborn for visualization. This isn't glamorous work, but it's foundational.
The tech stack:
- JupyterLab — Interactive development environment for experimentation
- Conda — Environment isolation (critical for reproducibility)
- NumPy — Vectorized computing, linear algebra, the backbone of everything
- Pandas — DataFrames, data cleaning, aggregation, joins
- Matplotlib + Seaborn — Visualization for EDA and communication
The key insight from this phase: vectorization is everything. Understanding why a * b on NumPy arrays is 100–500x faster than a Python loop changed how I think about numerical code. Broadcasting rules, memory layout (C vs Fortran order), and avoiding unnecessary copies — these details matter at scale.
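To make that concrete, here's a rough timing sketch (exact numbers depend on the machine and the array size) comparing a pure-Python dot product with the vectorized equivalent:

```python
import time
import numpy as np

a = np.random.rand(1_000_000)
b = np.random.rand(1_000_000)

# Pure-Python loop: one multiply-add per element, all in the interpreter.
start = time.perf_counter()
total = 0.0
for x, y in zip(a, b):
    total += x * y
loop_time = time.perf_counter() - start

# Vectorized: the same dot product runs in compiled code, no Python-level loop.
start = time.perf_counter()
total_vec = a @ b
vec_time = time.perf_counter() - start

print(f"loop: {loop_time:.3f}s  vectorized: {vec_time:.5f}s  "
      f"speedup: {loop_time / vec_time:.0f}x")
```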
Key exercises I worked through:
- Implementing dot products with and without NumPy to feel the performance difference
- Normalizing matrices using broadcasting (no loops allowed)
- Building gradient descent from scratch with vectorized operations (sketched after this list)
- Loading messy CSVs, handling missing data, GroupBy aggregations, multi-table joins
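The gradient-descent exercise, sketched here with plain linear regression (synthetic data and arbitrary hyperparameters, just to show the shape of the vectorized loop):

```python
import numpy as np

# Synthetic linear data: y = X @ w_true + noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=200)

# Vectorized batch gradient descent on mean squared error.
w = np.zeros(3)
lr = 0.1
for _ in range(500):
    residual = X @ w - y                     # shape (200,)
    grad = (2 / len(y)) * (X.T @ residual)   # gradient of MSE w.r.t. w
    w -= lr * grad

print(w)  # should land close to w_true
```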
Phase 2: Classical Machine Learning
With the foundation in place, I moved to scikit-learn and the classical ML algorithms. The goal wasn't just to call model.fit() — it was to understand the full pipeline: data splitting, preprocessing, cross-validation, hyperparameter tuning, and evaluation.
The tech stack (and the math I brushed up alongside it):
- scikit-learn — Pipelines, transformers, estimators, GridSearchCV
- XGBoost / LightGBM — Gradient boosting for tabular data
- Linear algebra refresher — Vectors, matrices, eigenvalues, SVD
- Calculus — Gradients, chain rule, backpropagation intuition
- Probability/Stats — Distributions, MLE, Bayesian thinking
The most valuable lesson here: pipelines prevent data leakage. Fitting a scaler on the full dataset before splitting? That's leakage. Cross-validation with preprocessing inside the fold? That's correct. Getting this wrong invalidates your entire evaluation.
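Here's a minimal leak-free sketch with scikit-learn (the dataset and estimator are just stand-ins; the point is that the scaler lives inside the pipeline, so each fold fits it on its own training split only):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Preprocessing sits inside the pipeline, so no information from a
# held-out fold ever reaches the scaler during cross-validation.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(pipeline, X, y, cv=5)
print(scores.mean())
```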
I spent significant time on the math — not to derive every theorem, but to build intuition. Why does PCA work? Because it finds directions of maximum variance via eigendecomposition. Why does gradient descent converge? Because we're following the negative gradient of a (hopefully) convex loss surface. This intuition pays dividends when debugging models that won't train.
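The PCA intuition is easy to verify in a few lines of NumPy: center the data, eigendecompose the covariance matrix, and sort by eigenvalue. This mirrors what sklearn.decomposition.PCA produces, up to the sign of each component:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))

# Center, then eigendecompose the covariance matrix.
X_centered = X - X.mean(axis=0)
cov = np.cov(X_centered, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh: covariance is symmetric

# Principal components = eigenvectors sorted by descending eigenvalue,
# i.e. by how much variance they explain.
order = np.argsort(eigvals)[::-1]
components = eigvecs[:, order]
explained_ratio = eigvals[order] / eigvals.sum()

X_reduced = X_centered @ components[:, :2]   # project onto the top 2 directions
print(explained_ratio[:2], X_reduced.shape)
```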
Models I implemented and compared (a quick comparison sketch follows the list):
- Logistic Regression (understand the baseline)
- Random Forest (ensemble intuition, feature importance)
- Gradient Boosting (sequential error correction)
- SVM with RBF kernel (the kernel trick)
- KNN (lazy learning, curse of dimensionality)
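The comparison loop itself is short. With preprocessing kept inside a pipeline it stays leak-free (scikit-learn's built-in wine dataset is used here purely as a stand-in):

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)

models = {
    "logreg": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200),
    "grad_boost": GradientBoostingClassifier(),
    "svm_rbf": SVC(kernel="rbf"),
    "knn": KNeighborsClassifier(),
}

# Scale inside each pipeline so every model sees identical, leak-free folds.
for name, model in models.items():
    pipe = make_pipeline(StandardScaler(), model)
    scores = cross_val_score(pipe, X, y, cv=5)
    print(f"{name:>14}: {scores.mean():.3f} ± {scores.std():.3f}")
```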
Phase 3: Deep Learning with PyTorch
This is where things get interesting. I chose PyTorch over TensorFlow for its Pythonic API and dynamic computation graphs — debugging feels natural, and the mental model is clearer.
The tech stack:
- PyTorch — Tensors, autograd, nn.Module, DataLoaders
- MPS backend — Apple Silicon GPU acceleration (game-changer for local training)
- torchvision — Image datasets, transforms, pretrained models
- Hugging Face Transformers — Pretrained NLP models, fine-tuning
Running on Apple Silicon's MPS backend was a revelation. Training CNNs on CIFAR-10 locally, without cloud costs, with 3–5x speedup over CPU — that's accessibility that wasn't possible a few years ago. The check is a single call: torch.backends.mps.is_available().
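The device-selection boilerplate I settled on, falling back gracefully when MPS (or CUDA) isn't available:

```python
import torch

# Prefer Apple Silicon's MPS backend when present, then CUDA, then CPU.
if torch.backends.mps.is_available():
    device = torch.device("mps")
elif torch.cuda.is_available():
    device = torch.device("cuda")
else:
    device = torch.device("cpu")

model = torch.nn.Linear(10, 2).to(device)   # any nn.Module moves the same way
x = torch.randn(32, 10, device=device)
print(device, model(x).shape)
```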
Architectures I studied and implemented:
- MLPs — The foundation. BatchNorm, Dropout, activation functions
- CNNs — Convolutions, pooling, ResNet-style skip connections
- Transfer Learning — Freezing backbones, fine-tuning classification heads (sketched below)
- RNNs/LSTMs — Sequential data, vanishing gradients, hidden states
- Transformers — Multi-head attention, positional encoding, the architecture that changed NLP
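The transfer-learning recipe from that list, sketched with torchvision's ResNet-18 (the weights argument assumes torchvision 0.13 or newer; the 10-class head is just an example size):

```python
import torch
from torchvision import models

# Load an ImageNet-pretrained ResNet-18 and freeze its backbone.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for param in model.parameters():
    param.requires_grad = False

# Replace the classification head with one sized for the new task;
# only this layer's weights will be updated during fine-tuning.
model.fc = torch.nn.Linear(model.fc.in_features, 10)

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
```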
The breakthrough moment: implementing multi-head attention from scratch. Once you see that attention is just a weighted sum of values, with weights computed from query-key dot products, the magic disappears and understanding takes its place.
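Here's the core of it, stripped down to a single head with no masking, to show there really is nothing more to it than a softmax-weighted sum of values:

```python
import math
import torch

def scaled_dot_product_attention(q, k, v):
    # Attention weights: softmax of query-key similarity, scaled by sqrt(d_k).
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    weights = torch.softmax(scores, dim=-1)
    # Output: a weighted sum of the values.
    return weights @ v

# One "head": batch of 2 sequences, 5 tokens each, 64-dim projections.
q = torch.randn(2, 5, 64)
k = torch.randn(2, 5, 64)
v = torch.randn(2, 5, 64)
out = scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([2, 5, 64])
```

Multi-head attention just runs this in parallel over several smaller projections of the same inputs and concatenates the results.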
Phase 4: Production ML
Training a model is maybe 20% of the work. The rest is everything that happens before and after: data pipelines, experiment tracking, deployment, monitoring, and maintenance. This phase is about building systems, not just models.
The tech stack:
- MLflow — Experiment tracking, model registry, reproducibility
- DVC — Data versioning (Git for datasets)
- FastAPI — Model serving with async Python
- Docker — Containerization for consistent deployment
- GitHub Actions — CI/CD for ML pipelines
The insight that changed my approach: treat ML projects like software projects. That means version control for data and models, automated testing for data quality, monitoring for model drift, and proper logging for debugging production issues.
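As a taste of what that looks like in practice, a minimal MLflow tracking sketch (the run name, parameters, and metric values here are all placeholders, not a real training run):

```python
import mlflow

# Log hyperparameters and per-epoch metrics so the run can be
# compared against others and reproduced later.
with mlflow.start_run(run_name="baseline-cnn"):
    mlflow.log_param("lr", 1e-3)
    mlflow.log_param("batch_size", 64)
    for epoch in range(3):
        mlflow.log_metric("val_accuracy", 0.70 + 0.05 * epoch, step=epoch)
```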
Project Structure That Works
After several false starts, I settled on a project structure that scales from exploration to production:
project/
├── notebooks/ # EDA, experimentation (numbered)
├── src/
│ ├── data/ # Dataset classes, preprocessing
│ ├── models/ # Architecture definitions
│ ├── training/ # Train loops, evaluation, callbacks
│ └── utils/ # Config, visualization
├── experiments/ # Tracked runs with configs and metrics
├── tests/ # Unit and integration tests
├── envs/ # Environment files
├── Dockerfile
└── pyproject.toml
The key principle: notebooks are for exploration, src/ is for production. Once code works in a notebook, it gets refactored into proper modules with tests.
The Math I Actually Needed
There's a lot of gatekeeping in ML about mathematical prerequisites. Here's what I actually found useful:
Linear Algebra (essential):
- Matrix multiplication and why order matters
- Transpose, inverse, and when they exist
- Eigenvalues/eigenvectors (for PCA, understanding attention)
- SVD (dimensionality reduction, latent factors)
- Norms (L1, L2 for regularization)
Calculus (practical subset):
- Derivatives and what they mean geometrically
- Chain rule (the foundation of backpropagation)
- Gradients as direction of steepest ascent
- Numerical gradient checking (a debugging tool, sketched at the end of this section)
Probability/Statistics (for intuition):
- Common distributions (Gaussian, Bernoulli, Poisson)
- Expectation and variance
- Maximum Likelihood Estimation (what loss functions optimize)
- Bayesian thinking (priors, posteriors, updating beliefs)
I didn't derive every formula — but I made sure I could explain intuitively why each technique works.
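The numerical gradient check mentioned above is worth showing because it's so cheap to write: central differences against any function you can evaluate, compared to the analytic gradient you think is correct.

```python
import numpy as np

def numerical_grad(f, x, eps=1e-6):
    # Central differences: perturb each coordinate and watch how f changes.
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad

f = lambda x: np.sum(x ** 2)        # analytic gradient is 2x
x = np.array([1.0, -2.0, 3.0])
print(numerical_grad(f, x), 2 * x)  # the two should match closely
```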
Resources That Actually Helped
Cutting through the noise of ML resources:
- 3Blue1Brown's Linear Algebra series — Visual intuition that textbooks lack
- CS231n (Stanford) — The definitive CNN course, lectures are gold
- CS224n (Stanford) — NLP and Transformers, rigorous but accessible
- PyTorch official tutorials — Surprisingly well-written, start here
- StatQuest (YouTube) — Josh Starmer explains stats concepts clearly
- Andrej Karpathy's blog/videos — Practical wisdom from a practitioner
What I avoided: courses that promise to make you an "ML expert in 30 days." This takes time. Accepting that made the process less frustrating.
Project Ideas by Difficulty
The portfolio I'm building, organized by complexity:
Beginner
- Titanic Survival Predictor — The classic. EDA, feature engineering, baseline models
- House Price Regression — Regression, cross-validation, handling outliers
Intermediate
- CIFAR-10 Image Classifier — CNN from scratch, then transfer learning with ResNet
- Sentiment Analysis — Fine-tuning DistilBERT on IMDB reviews
- Football Analytics — Applying ML to match data, player performance prediction
Advanced
- Stock Price Prediction — Time series, LSTMs, backtesting frameworks
- Object Detection — YOLO, mAP evaluation, real-time inference
- Paper Reproduction — Pick a seminal paper (ResNet, Attention Is All You Need), reproduce key results
Daily Habits That Compound
The system that keeps me consistent:
- Daily: 30–60 minutes of focused work (code or reading)
- Weekly: One paper read and summarized
- Weekly: Push something to GitHub (notebook, code, or documentation)
- Weekly: Update environment files (reproducibility)
- Bi-weekly: Write about what I learned (blog posts like this one)
The key: making it small enough to be sustainable. Missing a day happens — missing a week shouldn't.
Milestones I'm Tracking
Concrete goals for the next 12 months:
- Month 3: Complete 2 end-to-end projects (tabular + image classification)
- Month 6: Reproduce 1 research paper, deploy 1 model to production
- Month 9: Build a specialization project (likely NLP or computer vision)
- Month 12: Portfolio of 6–10 polished projects, contribute to an open-source ML library
What I Wish I Knew Earlier
A few lessons that would have saved time:
- Start with structured data. Tabular ML is underrated and immediately applicable. Don't rush to deep learning.
- Data quality > model complexity. A simple model on clean data beats a complex model on garbage.
- Read the scikit-learn docs. They're excellent and often answer questions faster than Stack Overflow.
- GPU access matters. Apple Silicon MPS, Google Colab, or cloud credits — don't suffer through CPU-only training.
- Version everything. Code, data, models, configs. Your future self will thank you.
The Path Forward
This roadmap isn't finished — it's a living document that evolves as I learn. The next posts in this series will dive deeper into specific projects: the CIFAR-10 classifier, the sentiment analysis pipeline, and eventually the paper reproductions.
If you're on a similar journey, I'd love to hear about it. What worked for you? What would you add to this roadmap?
The most important thing I've learned: start before you're ready. The best time to begin was yesterday. The second best time is now.
Resources: PyTorch Tutorials · scikit-learn Guide · CS231n (Stanford CV) · CS224n (Stanford NLP) · 3Blue1Brown Linear Algebra