An R package for defining causal data generating processes with known ground truth and evaluating estimator performance against them.
Features
- Structural causal model with explicit effect, propensity, and baseline functions
- Named covariate roles: confounder, instrument, effect modifier, noise
- Preset confounding levels ("low", "moderate", "high") or custom functions
- Exact or Monte Carlo true ATE computed at construction time
- Flexible estimator interface: named numeric vector, named list, or one-row data frame
- Tidy performance metrics: bias, RMSE, coverage, and power with Monte Carlo standard errors
- Grid evaluation over the Cartesian product of any DGP parameters
- Reproducible via seed control at every stage
Installation
Requires R 4.0 or higher. Install from GitHub:
# install.packages("devtools")
devtools::install_github("chaycereed/causalsim")
Usage
Define a data generating process
causalsim_dgp() specifies the structural model. Parameters can be scalars, preset strings, or functions of the covariate names.
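As a rough illustration of the function form, the base R sketch below writes an effect and a propensity as functions of a single confounder. The covariate name X1, the functions effect_fn and propensity_fn, and the standard normal distribution of X1 are all assumptions made for this example; they are not taken from the package, and the exact signature causalsim_dgp() expects may differ.

```r
# Hypothetical function-valued parameters; X1 is an assumed covariate name.
effect_fn     <- function(X1) 2 + 0.5 * X1       # individual effect tau(X1)
propensity_fn <- function(X1) plogis(0.8 * X1)   # P(A = 1 | X1)

# Assumption: the confounder is standard normal.
set.seed(1)
X1 <- rnorm(1e5)

# Monte Carlo approximation of the true ATE, E[tau(X1)].
# Under the assumed distribution the exact value is 2, because E[X1] = 0.
ate_mc <- mean(effect_fn(X1))

# Propensities stay strictly inside (0, 1), so every unit can be treated or untreated.
p <- propensity_fn(X1)
range(p)
```

When the effect is a scalar the true ATE is that scalar; with a heterogeneous effect like the one above it has to be averaged over the covariate distribution, which is where a Monte Carlo computation becomes useful.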
library(causalsim)
dgp <- causalsim_dgp(
n = 500,
n_confounders = 1,
effect = 2,
propensity = "moderate",
baseline = "moderate"
)
dgp
Draw a dataset
causalsim_draw() simulates one dataset from the DGP. The returned data frame includes covariate columns, treatment A, outcome Y, individual effect .tau, and propensity .p.
dat <- causalsim_draw(dgp, seed = 1L)
head(dat)
Evaluate an estimator
An estimator is any function that accepts a data frame and returns a named numeric vector (a named list or one-row data frame also works) containing at least an estimate element. Supplying ci_lower and ci_upper as well enables the coverage and power metrics.
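A minimal sketch of such an estimator, using ordinary least squares from base R. The name ols_est matches the one passed to causalsim_grid() in the grid example, but its body here is illustrative and not taken from the package. It assumes that every column other than A, Y, .tau, and .p is a covariate to adjust for. The toy data frame is a stand-in with the same column layout as the output of causalsim_draw(), so the block runs without the package installed.

```r
# Hypothetical OLS estimator: regress Y on A plus every covariate column.
ols_est <- function(data) {
  # Exclude treatment, outcome, and the ground-truth columns from the adjustment set.
  covars <- setdiff(names(data), c("A", "Y", ".tau", ".p"))
  fit <- lm(reformulate(c("A", covars), response = "Y"), data = data)
  ci <- confint(fit, "A", level = 0.95)
  c(
    estimate = unname(coef(fit)["A"]),
    ci_lower = unname(ci[1, 1]),
    ci_upper = unname(ci[1, 2])
  )
}

# Stand-in data (assumed layout): one confounder X1, treatment A, outcome Y.
set.seed(1)
n  <- 200
X1 <- rnorm(n)
p  <- plogis(0.5 * X1)
A  <- rbinom(n, 1, p)
toy <- data.frame(X1 = X1, A = A, Y = 2 * A + X1 + rnorm(n), .tau = 2, .p = p)

res <- ols_est(toy)
res
```

Dropping .tau and .p before fitting matters: they carry ground truth that a real analysis would never observe, so using them as regressors would make the estimator look better than it is.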
Evaluate across a parameter grid
causalsim_grid() runs the evaluator over the Cartesian product of any DGP parameters, returning a tidy data frame of metrics for each cell.
grid_result <- causalsim_grid(
dgp = dgp,
estimator = ols_est,
vary = list(n = c(100L, 250L, 500L, 1000L)),
reps = 200L,
metrics = c("bias", "rmse"),
seed = 1L
)
grid_result
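For readers who want to see what the reported metrics measure, the sketch below computes bias, RMSE, coverage, and power by hand in base R for a single toy setting. It uses the standard simulation-study definitions, which are an assumption here: the package's internal formulas and output columns may differ. The toy process, the helper one_rep, and the metrics data frame are all hypothetical.

```r
set.seed(1)
truth <- 2     # known ATE of the toy process
reps  <- 200   # Monte Carlo replications
n     <- 250   # sample size per replication

# One replication: simulate confounded data, fit OLS, return estimate and 95% CI.
one_rep <- function() {
  x <- rnorm(n)
  a <- rbinom(n, 1, plogis(0.5 * x))
  y <- truth * a + x + rnorm(n)
  fit <- lm(y ~ a + x)
  ci <- confint(fit, "a")
  c(estimate = unname(coef(fit)["a"]),
    ci_lower = unname(ci[1, 1]),
    ci_upper = unname(ci[1, 2]))
}

res <- as.data.frame(t(replicate(reps, one_rep())))

covered  <- res$ci_lower <= truth & truth <= res$ci_upper  # CI contains the truth
rejected <- res$ci_lower > 0 | res$ci_upper < 0            # CI excludes zero

metrics <- data.frame(
  metric = c("bias", "rmse", "coverage", "power"),
  value  = c(mean(res$estimate) - truth,
             sqrt(mean((res$estimate - truth)^2)),
             mean(covered),
             mean(rejected)),
  # Monte Carlo standard errors; the RMSE one is omitted in this sketch.
  mcse   = c(sd(res$estimate) / sqrt(reps),
             NA_real_,
             sqrt(mean(covered) * (1 - mean(covered)) / reps),
             sqrt(mean(rejected) * (1 - mean(rejected)) / reps))
)
metrics
```

The Monte Carlo standard errors shrink with the square root of the number of replications, which gives a practical way to choose reps: increase it until the standard error is small relative to the differences being compared across grid cells.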