2025

Causal Machine Learning
14 August 2025
It's difficult to measure the causal effect of actions or interventions. Here, I present three causal machine learning frameworks that we can use to estimate the effect of a treatment, ordered from the simplest to the most complex.
SHAP Values and Model Explainability
08 August 2025
An introduction to SHAP values and how they can be used to understand machine learning model predictions. Using a small climbing competition example, I show how SHAP differs from feature importance, how to read common SHAP plots, and a few things to watch out for when interpreting the results.
Multilevel (or Hierarchical, or Mixed) Modelling
27 July 2025
The one in which the author writes a surprisingly long introduction to multilevel modelling (with math and all) and then proceeds to show how to fit one in both R and Julia with frequentist and Bayesian methods, respectively, despite not being very proficient in either language (because every company she has ever worked at has been a Python house).
One-dimensional smoothing methods: LOESS and LOWESS
19 June 2025
If data is noisy, or we want to capture an underlying trend without assuming a fixed global shape, we can use smoothing methods. Here, I describe one such method and briefly discuss its usage in more complex cases than single-variable smoothing.
Metropolis-Hastings Algorithm
10 May 2025
An introduction to the Metropolis-Hastings algorithm, one of the core methods behind Bayesian inference via MCMC. I explore how it works, why it works, and how to implement it, with references for further learning.
Maximum a Posteriori and Regularization
28 April 2025
I derive the maximum a posteriori (MAP) estimator and show how, under different prior choices, MAP estimation leads naturally to L2-regularization (Ridge regression) with a normal prior and L1-regularization (LASSO regression) with a Laplace prior.
Bayesian Linear Regression
24 April 2025
I provide a complete Bayesian analysis for a standard linear regression model with a Normal-inverse-gamma prior, demonstrating how the conjugate structure yields closed-form expressions for both the posterior distribution and the posterior predictive.
Conjugate Prior for the Univariate Normal Distribution
16 March 2025
I show that the Normal-inverse-gamma distribution is a conjugate prior for the normal distribution with unknown mean and variance.
Bayesian Inference
07 March 2025
I give an introduction to Bayesian inference and Bayes' Theorem, and derive the posterior of the binomial model under a Beta prior, showing that the Beta prior is conjugate in the process.

2024

ROC Curve vs Precision-Recall Curve for Imbalanced Datasets
13 July 2024
I explain the difference between ROC and precision-recall curves, highlighting their use in evaluating classifiers with imbalanced data.
Creating a Custom Time-Series Cross-Validator in PySpark
02 July 2024
I show how to extend PySpark’s built-in cross-validation to support time-series data, with custom folds and windowing strategies.