My Machine Learning Journey: From NumPy to Production PyTorch
Why Machine Learning, Why Now
I spent two years building web applications and backend systems before seriously engaging with ML. The gap between deploying a Next.js app and understanding what is inside a recommendation model bothered me enough to do something about it.
This post documents how I structured that transition: what to learn, in what order, and why that sequence matters. The goal is a portfolio of 6–10 production-quality projects with enough theoretical depth to read and reproduce research papers.
How I Approach This
A few rules I set for myself before starting:
- Build first, theory second. I learn by implementing. The math clicks after you have code that either works or fails in a way you can trace.
- Consistency beats intensity. 30 minutes daily is more reliable than 8-hour weekend sessions. Progress looks slow week-to-week and fast month-to-month.
- Write about it. Writing forces precision. This post is partly for others, mostly to hold myself to account.
- Ship things. Every project gets structured, tested, and put somewhere public. No notebooks that only run on my machine.
The Roadmap
Four phases, each building on the previous. The timeline is flexible but the sequence matters.
Phase 1: Python Scientific Foundations
Before touching any ML library, I needed fluency in NumPy, Pandas, and Matplotlib. Not the most exciting phase, but skipping it makes everything harder later.
The tech stack:
- JupyterLab — Interactive development environment for experimentation
- Conda — Environment isolation (critical for reproducibility)
- NumPy — Vectorized computing, linear algebra, the backbone of everything
- Pandas — DataFrames, data cleaning, aggregation, joins
- Matplotlib + Seaborn — Visualization for EDA and communication
The main thing this phase taught me: vectorization is not optional. Understanding why a * b on NumPy arrays is 100–500x faster than a Python loop changed how I think about numerical code. Broadcasting rules, memory layout (C vs Fortran order), avoiding unnecessary copies—these details compound at scale.
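To make that concrete, here is a minimal timing sketch for a dot product, loop versus vectorized; the exact speedup depends on your machine and the array size, so treat the numbers as illustrative.

```python
import time
import numpy as np

a = np.random.rand(1_000_000)
b = np.random.rand(1_000_000)

# Pure-Python loop: one interpreter round-trip per element
t0 = time.perf_counter()
total = 0.0
for x, y in zip(a, b):
    total += x * y
loop_time = time.perf_counter() - t0

# Vectorized: the same dot product as a single call into optimized C code
t0 = time.perf_counter()
total_vec = float(a @ b)
vec_time = time.perf_counter() - t0

print(f"loop {loop_time:.3f}s, vectorized {vec_time:.5f}s, ~{loop_time / vec_time:.0f}x")
```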
Key exercises I worked through:
- Implementing dot products with and without NumPy to feel the performance difference
- Normalizing matrices using broadcasting (no loops allowed)
- Building gradient descent from scratch with vectorized operations (sketched after this list)
- Loading messy CSVs, handling missing data, GroupBy aggregations, multi-table joins
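As a taste of the gradient-descent exercise, here is a minimal vectorized sketch for least-squares linear regression on synthetic data; the learning rate, iteration count, and toy data are illustrative choices, not anything canonical.

```python
import numpy as np

# Toy linear regression problem: y = X @ w_true + noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=200)

w = np.zeros(3)
lr = 0.1
for _ in range(500):
    grad = 2 / len(X) * X.T @ (X @ w - y)  # gradient of the mean squared error
    w -= lr * grad                          # step against the gradient

print(w)  # should land close to w_true
```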
Phase 2: Classical Machine Learning
With the foundation in place, I moved to scikit-learn and the classical ML algorithms. The goal wasn't just to call model.fit() — it was to understand the full pipeline: data splitting, preprocessing, cross-validation, hyperparameter tuning, and evaluation.
The tech stack:
- scikit-learn — Pipelines, transformers, estimators, GridSearchCV
- XGBoost / LightGBM — Gradient boosting for tabular data
- Linear algebra refresher — Vectors, matrices, eigenvalues, SVD
- Calculus — Gradients, chain rule, backpropagation intuition
- Probability/Stats — Distributions, MLE, Bayesian thinking
The most valuable lesson here: pipelines prevent data leakage. Fitting a scaler on the full dataset before splitting? That's leakage. Cross-validation with preprocessing inside the fold? That's correct. Getting this wrong invalidates your entire evaluation.
I spent significant time on the math — not to derive every theorem, but to build intuition. Why does PCA work? Because it finds directions of maximum variance via eigendecomposition. Why does gradient descent converge? Because we're following the negative gradient of a (hopefully) convex loss surface. This intuition pays dividends when debugging models that won't train.
Models I implemented and compared (see the comparison sketch after this list):
- Logistic Regression (understand the baseline)
- Random Forest (ensemble intuition, feature importance)
- Gradient Boosting (sequential error correction)
- SVM with RBF kernel (the kernel trick)
- KNN (lazy learning, curse of dimensionality)
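A sketch of how such a comparison can be run on equal footing, with synthetic data from make_classification as a placeholder: every model gets the same scaled pipeline and the same cross-validation folds.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
models = {
    "logreg": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "svm_rbf": SVC(kernel="rbf"),
    "knn": KNeighborsClassifier(n_neighbors=5),
}
for name, model in models.items():
    # Scaling lives inside the pipeline so cross-validation stays leak-free
    pipe = make_pipeline(StandardScaler(), model)
    scores = cross_val_score(pipe, X, y, cv=5)
    print(f"{name:>14}: {scores.mean():.3f} +/- {scores.std():.3f}")
```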
Phase 3: Deep Learning with PyTorch
I chose PyTorch over TensorFlow for its Pythonic API and dynamic computation graphs. Debugging feels natural because you are just manipulating Python objects, not reasoning about a compiled static graph.
The tech stack:
- PyTorch — Tensors, autograd, nn.Module, DataLoaders
- MPS backend — Apple Silicon GPU acceleration (game-changer for local training)
- torchvision — Image datasets, transforms, pretrained models
- Hugging Face Transformers — Pretrained NLP models, fine-tuning
Running on Apple Silicon's MPS backend was genuinely useful: I trained CNNs on CIFAR-10 locally with a 3–5x speedup over CPU, no cloud credits needed. Check availability with torch.backends.mps.is_available() before each run.
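The check-then-fall-back pattern, sketched with a toy module standing in for a real network; on machines without MPS it silently runs on CPU.

```python
import torch
import torch.nn as nn

# Use the Apple Silicon GPU when present, otherwise fall back to CPU
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

model = nn.Linear(32, 10).to(device)        # toy stand-in for a real network
batch = torch.randn(64, 32, device=device)  # inputs must live on the same device
logits = model(batch)
print(device, logits.shape)
```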
Architectures I studied and implemented:
- MLPs — The foundation. BatchNorm, Dropout, activation functions
- CNNs — Convolutions, pooling, ResNet-style skip connections
- Transfer Learning — Freezing backbones, fine-tuning classification heads
- RNNs/LSTMs — Sequential data, vanishing gradients, hidden states
- Transformers — Multi-head attention, positional encoding, the architecture that changed NLP
The most clarifying exercise: implementing multi-head attention from scratch. Attention is a weighted sum of values, where the weights come from softmaxed query-key dot products. Once you have that in code, the rest of the transformer architecture follows directly.
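Here is roughly what that from-scratch implementation looks like: a minimal, unoptimized sketch of multi-head self-attention (no masking or dropout), not the exact code I wrote.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadAttention(nn.Module):
    def __init__(self, d_model: int, num_heads: int):
        super().__init__()
        assert d_model % num_heads == 0, "d_model must split evenly across heads"
        self.num_heads = num_heads
        self.d_head = d_model // num_heads
        self.w_q = nn.Linear(d_model, d_model)  # query projection
        self.w_k = nn.Linear(d_model, d_model)  # key projection
        self.w_v = nn.Linear(d_model, d_model)  # value projection
        self.w_o = nn.Linear(d_model, d_model)  # output projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        batch, seq_len, d_model = x.shape

        def split_heads(t):  # (batch, seq, d_model) -> (batch, heads, seq, d_head)
            return t.view(batch, seq_len, self.num_heads, self.d_head).transpose(1, 2)

        q, k, v = split_heads(self.w_q(x)), split_heads(self.w_k(x)), split_heads(self.w_v(x))

        # Attention weights: softmaxed, scaled query-key dot products
        scores = q @ k.transpose(-2, -1) / self.d_head**0.5
        weights = F.softmax(scores, dim=-1)

        # Weighted sum of values, then merge heads back into d_model
        out = (weights @ v).transpose(1, 2).contiguous().view(batch, seq_len, d_model)
        return self.w_o(out)

x = torch.randn(2, 16, 64)  # (batch, seq_len, d_model)
print(MultiHeadAttention(d_model=64, num_heads=8)(x).shape)  # torch.Size([2, 16, 64])
```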
Phase 4: Production ML
Training a model is maybe 20% of the work. The rest is data pipelines, experiment tracking, deployment, monitoring, and maintenance. The goal here is treating the whole thing as a software system, not a research notebook.
The tech stack:
- MLflow — Experiment tracking, model registry, reproducibility
- DVC — Data versioning (Git for datasets)
- FastAPI — Model serving with async Python
- Docker — Containerization for consistent deployment
- GitHub Actions — CI/CD for ML pipelines
The insight that changed my approach: treat ML projects like software projects. That means version control for data and models, automated testing for data quality, monitoring for model drift, and proper logging for debugging production issues.
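For example, the experiment-tracking piece with MLflow looks roughly like this; the run name, hyperparameters, metric value, and tag are all illustrative placeholders, not values from a real run.

```python
import mlflow

params = {"lr": 1e-3, "batch_size": 64, "epochs": 10}  # illustrative hyperparameters
val_accuracy = 0.91                                    # stand-in for a computed metric

with mlflow.start_run(run_name="baseline-cnn"):
    mlflow.log_params(params)                        # every hyperparameter, recorded
    mlflow.log_metric("val_accuracy", val_accuracy)  # headline metric for the run
    mlflow.set_tag("data_version", "v1")             # tie the run to a DVC data version
```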
Project Structure That Works
After several false starts, I settled on a project structure that scales from exploration to production:
```
project/
├── notebooks/     # EDA, experimentation (numbered)
├── src/
│   ├── data/      # Dataset classes, preprocessing
│   ├── models/    # Architecture definitions
│   ├── training/  # Train loops, evaluation, callbacks
│   └── utils/     # Config, visualization
├── experiments/   # Tracked runs with configs and metrics
├── tests/         # Unit and integration tests
├── envs/          # Environment files
├── Dockerfile
└── pyproject.toml
```

The key principle: notebooks are for exploration, src/ is for production. Once code works in a notebook, it gets refactored into proper modules with tests.
The Math I Actually Needed
Most ML courses over-sell the math prerequisites. Here is what actually came up repeatedly:
Linear Algebra (essential):
- Matrix multiplication and why order matters
- Transpose, inverse, and when they exist
- Eigenvalues/eigenvectors (for PCA, understanding attention)
- SVD (dimensionality reduction, latent factors)
- Norms (L1, L2 for regularization)
Calculus (practical subset):
- Derivatives and what they mean geometrically
- Chain rule (the foundation of backpropagation)
- Gradients as direction of steepest ascent
- Numerical gradient checking (debugging tool; sketched below)
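The gradient-checking trick is small enough to show in full: estimate each partial derivative with a central difference and compare against the analytic gradient. A sketch on a toy function:

```python
import numpy as np

def numerical_grad(f, x, eps=1e-5):
    """Central-difference estimate of the gradient of f at x."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step.flat[i] = eps
        grad.flat[i] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad

f = lambda v: float((v**2).sum())  # analytic gradient is 2v
x = np.array([1.0, -2.0, 3.0])
print(numerical_grad(f, x))                          # ~[ 2. -4.  6.]
print(np.allclose(numerical_grad(f, x), 2 * x))      # True
```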
Probability/Statistics (for intuition):
- Common distributions (Gaussian, Bernoulli, Poisson)
- Expectation and variance
- Maximum Likelihood Estimation (what loss functions optimize)
- Bayesian thinking (priors, posteriors, updating beliefs)
I didn't derive every formula — but I made sure I could explain intuitively why each technique works.
Resources That Actually Helped
Filtered from a lot of mediocre material:
- 3Blue1Brown's Linear Algebra series — Visual intuition that textbooks lack
- CS231n (Stanford) — The definitive CNN course, lectures are gold
- CS224n (Stanford) — NLP and Transformers, rigorous but accessible
- PyTorch official tutorials — Surprisingly well-written, start here
- StatQuest (YouTube) — Josh Starmer explains stats concepts clearly
- Andrej Karpathy's blog/videos — Practical wisdom from a practitioner
What I skipped: courses that promise to make you an "ML expert in 30 days." This takes longer than that. Accepting it early makes the rest less frustrating.
Project Ideas by Difficulty
The portfolio I'm building, organized by complexity:
Beginner
- Titanic Survival Predictor — The classic. EDA, feature engineering, baseline models
- House Price Regression — Regression, cross-validation, handling outliers
Intermediate
- CIFAR-10 Image Classifier — CNN from scratch, then transfer learning with ResNet
- Sentiment Analysis — Fine-tuning DistilBERT on IMDB reviews
- Football Analytics — Applying ML to match data, player performance prediction
Advanced
- Stock Price Prediction — Time series, LSTMs, backtesting frameworks
- Object Detection — YOLO, mAP evaluation, real-time inference
- Paper Reproduction — Pick a seminal paper (ResNet, Attention Is All You Need), reproduce key results
Daily Habits
What the actual routine looks like:
- Daily: 30–60 minutes of focused work (code or reading)
- Weekly: One paper read and summarized
- Weekly: Push something to GitHub (notebook, code, or documentation)
- Weekly: Update environment files (reproducibility)
- Bi-weekly: Write about what I learned (blog posts like this one)
The goal is something small enough to do every day. Missing a day is fine. Missing a week breaks the rhythm.
Milestones
Concrete targets for the next 12 months:
- Month 3: Complete 2 end-to-end projects (tabular + image classification)
- Month 6: Reproduce 1 research paper, deploy 1 model to production
- Month 9: Build a specialization project (likely NLP or computer vision)
- Month 12: Portfolio of 6–10 polished projects, contribute to an open-source ML library
What I Wish I Knew Earlier
Things that would have saved me time:
- Start with structured data. Tabular ML is underrated and immediately applicable. Don't rush to deep learning.
- Data quality > model complexity. A simple model on clean data beats a complex model on garbage.
- Read the scikit-learn docs. They're excellent and often answer questions faster than Stack Overflow.
- GPU access matters. Apple Silicon MPS, Google Colab, or cloud credits — don't suffer through CPU-only training.
- Version everything. Code, data, models, configs. Your future self will thank you.
The Path Forward
This roadmap will keep changing as I work through it. The next posts in this series will cover specific projects: the CIFAR-10 classifier, the sentiment analysis pipeline, and eventually the paper reproductions.
If you're at a similar point, the most practical thing I can say is: pick one dataset, build one end-to-end pipeline, and ship it. Everything else follows from that.
Resources: PyTorch Tutorials · scikit-learn Guide · CS231n (Stanford CV) · CS224n (Stanford NLP) · 3Blue1Brown Linear Algebra