Data Science

How Pandas chunking, Dask, and Polars help process millions of records when adding more compute…
9 min read

How Far Can Classical NLP Go? From Bag-of-Words to Stacking on Spooky Author Identification
Machine Learning
An end-to-end classical NLP experiment on Kaggle’s Spooky Author Identification task: from Vowpal Wabbit and…
17 min read

The tools I use for analytics and reporting have changed more than I expected, yet…
9 min read

A concrete bias–variance lesson: why the smallest model had the best cross-validated fit, and how…
10 min read

Why memorizing for the exam doesn’t mean you understand the subject
10 min read

Beyond the Straight Line: Choosing Between OLS, Interaction Terms, and Tweedie Regression
Data Science
Whether you should stick to a classic Ordinary Least Squares regression, introduce interaction terms, or…
14 min read

One Month Into Learning Data Engineering in Public: Here’s What I Didn’t Write About
Data Engineering
A reflection on the first month of learning data engineering in public, and what actually…
8 min read

Turning model coefficients into a 0–1000 score, with risk classes and stability checks
7 min read

Activation patching reveals how facts are stored, routed, and read out across transformer layers, and…
9 min read

Why one-hot encoding isn’t always the best approach, and which alternative encodings to consider
21 min read