causalsim

An R package for defining causal data generating processes with known ground truth and evaluating estimator performance against them.

Features

Structural causal model with explicit effect, propensity, and baseline functions
Named covariate roles: confounder, instrument, effect modifier, noise
Preset confounding levels ("low", "moderate", "high") or custom functions
True average treatment effect (ATE), exact or Monte Carlo, computed at construction time
Flexible estimator interface: named numeric vector, named list, or one-row data frame
Tidy performance metrics: bias, RMSE, coverage, and power with Monte Carlo standard errors
Grid evaluation over the Cartesian product of any DGP parameters
Reproducible via seed control at every stage
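
To illustrate the difference between an exact and a Monte Carlo true ATE, here is a minimal base R sketch. It does not use the package: the effect function `tau_fn` and the standard normal covariate are assumptions chosen so that the exact answer is known in closed form.

```r
# Hypothetical effect function: the individual effect depends on a covariate W.
tau_fn <- function(W) 1 + 0.5 * W^2

# With W ~ N(0, 1), E[W^2] = 1, so the exact ATE is 1 + 0.5 * 1 = 1.5.
ate_exact <- 1.5

# Monte Carlo approximation: average the effect over a large covariate sample.
set.seed(1)
W_big  <- rnorm(1e6)
ate_mc <- mean(tau_fn(W_big))

# Monte Carlo standard error of that approximation.
ate_mc_se <- sd(tau_fn(W_big)) / sqrt(length(W_big))

c(exact = ate_exact, monte_carlo = ate_mc, mc_se = ate_mc_se)
```

When the effect is a constant, the ATE is that constant and no simulation is needed. A Monte Carlo approximation is only required when the effect varies with covariates in a way that has no convenient closed form.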

Installation

Requires R 4.0 or higher. Install from GitHub:

# install.packages("devtools")
devtools::install_github("chaycereed/causalsim")

Usage

Define a data generating process

causalsim_dgp() specifies the structural model. Each parameter can be a scalar, a preset string, or a function of the covariates, referenced by name.

library(causalsim)

dgp <- causalsim_dgp(
  n             = 500,
  n_confounders = 1,
  effect        = 2,
  propensity    = "moderate",
  baseline      = "moderate"
)
dgp

Draw a dataset

causalsim_draw() simulates one dataset from the DGP. The returned data frame includes the covariate columns, the treatment A, the outcome Y, the individual treatment effect .tau, and the propensity score .p.

dat <- causalsim_draw(dgp, seed = 1L)
head(dat)
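
For intuition about what such a dataset contains, the following base R sketch builds a data frame with the same column roles. It is a stand-in, not the package's implementation: `draw_sketch()` is a hypothetical helper, and the propensity and baseline forms inside it are assumptions rather than the package's presets.

```r
# Illustrative stand-in for one draw. The functional forms are assumptions.
draw_sketch <- function(n, effect = 2, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  W   <- rnorm(n)                          # single confounder
  p   <- plogis(0.8 * W)                   # propensity: P(A = 1 | W)
  A   <- rbinom(n, size = 1, prob = p)     # treatment assignment
  tau <- rep(effect, n)                    # constant individual effect
  Y   <- 1 + 1.5 * W + tau * A + rnorm(n)  # baseline + effect + noise
  data.frame(W = W, A = A, Y = Y, .tau = tau, .p = p)
}

sim <- draw_sketch(500, seed = 1L)
head(sim)

# Because .tau is stored per row, the ground-truth ATE is just its mean.
mean(sim$.tau)
```

Keeping the individual effect and propensity alongside the observed columns is what makes ground-truth comparisons possible: an estimator only sees W, A, and Y, while the evaluation can consult .tau.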

Evaluate an estimator

An estimator is any function that accepts a data frame and returns a named numeric vector, named list, or one-row data frame containing at least an estimate element. Supplying ci_lower and ci_upper as well enables the coverage and power metrics.

ols_est <- function(data) {
  fit <- lm(Y ~ A + W, data = data)
  est <- coef(fit)[["A"]]
  se  <- sqrt(vcov(fit)["A", "A"])
  c(estimate = est, ci_lower = est - 1.96 * se, ci_upper = est + 1.96 * se)
}

result <- causalsim_eval(dgp, ols_est, reps = 200L, seed = 1L)
result
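
As a contrast to the regression-adjusted estimator, here is a sketch of an unadjusted difference-in-means estimator that returns a one-row data frame, one of the return shapes listed under Features. The name `dim_est` and the toy data are illustrative assumptions. The estimator relies only on the A and Y columns.

```r
# Unadjusted difference in means, returned as a one-row data frame.
# It ignores the confounder, so it is expected to be biased under confounding.
dim_est <- function(data) {
  y1  <- data$Y[data$A == 1]
  y0  <- data$Y[data$A == 0]
  est <- mean(y1) - mean(y0)
  # Unpooled (Welch-style) standard error for a difference of two means.
  se  <- sqrt(var(y1) / length(y1) + var(y0) / length(y0))
  data.frame(
    estimate = est,
    ci_lower = est - 1.96 * se,
    ci_upper = est + 1.96 * se
  )
}

# Quick check on a toy data frame with A and Y columns.
toy <- data.frame(A = c(0, 0, 0, 1, 1, 1), Y = c(1, 2, 3, 4, 5, 6))
dim_est(toy)
```

A function like this could presumably be passed to causalsim_eval() in place of ols_est, which would make the cost of ignoring confounding visible in the bias metric.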

summary(result)
plot(result)
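
The metrics named under Features have standard definitions, sketched below in base R on made-up replicate results. This is not the package's code and its exact definitions may differ. In particular, treating power as the share of confidence intervals that exclude zero is an assumption.

```r
# Made-up replicate results for illustration only (not package output).
set.seed(1)
truth    <- 2
reps     <- 200
estimate <- rnorm(reps, mean = truth, sd = 0.1)
se       <- rep(0.1, reps)
ci_lower <- estimate - 1.96 * se
ci_upper <- estimate + 1.96 * se

err     <- estimate - truth                     # per-replicate error
covered <- ci_lower <= truth & truth <= ci_upper
reject  <- ci_lower > 0 | ci_upper < 0          # CI excludes zero

metrics <- data.frame(
  metric = c("bias", "rmse", "coverage", "power"),
  value  = c(mean(err), sqrt(mean(err^2)), mean(covered), mean(reject)),
  mc_se  = c(
    sd(err) / sqrt(reps),                              # SE of a mean
    sd(err^2) / (2 * sqrt(mean(err^2)) * sqrt(reps)),  # delta method for RMSE
    sqrt(mean(covered) * (1 - mean(covered)) / reps),  # binomial SE
    sqrt(mean(reject) * (1 - mean(reject)) / reps)     # binomial SE
  )
)
metrics
```

The Monte Carlo standard errors indicate how much of an observed difference between two estimators could be simulation noise, which is a useful guide when choosing the number of replications.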

Evaluate across a parameter grid

causalsim_grid() runs the evaluator over the Cartesian product of any DGP parameters, returning a tidy data frame of metrics for each cell.

grid_result <- causalsim_grid(
  dgp       = dgp,
  estimator = ols_est,
  vary      = list(n = c(100L, 250L, 500L, 1000L)),
  reps      = 200L,
  metrics   = c("bias", "rmse"),
  seed      = 1L
)
grid_result
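
When more than one parameter is varied, the number of cells is the product of the lengths of the supplied vectors. The base R sketch below shows what a Cartesian product over a hypothetical `vary` list looks like. It uses `expand.grid()` for illustration and makes no claim about how the package builds its grid internally.

```r
# Hypothetical 'vary' list with two DGP parameters.
vary <- list(n = c(100L, 500L), effect = c(0, 1, 2))

# expand.grid() enumerates every combination: 2 x 3 = 6 cells.
cells <- expand.grid(vary, KEEP.OUT.ATTRS = FALSE, stringsAsFactors = FALSE)
cells
nrow(cells)
```

Since each cell is evaluated with the requested number of replications, the total cost grows multiplicatively with every added parameter.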

License

MIT License. See LICENSE for details.