# condensier and sl3
For a moment, we will go back to simple data structures: we have observations which are realizations of univariate random variables, $$X_1, \ldots, X_n \sim F,$$ where $F$ denotes an unknown cumulative distribution function (CDF). The goal is to estimate the distribution $F$. In particular, we are interested in estimating the density $f = F'$, assuming that it exists.
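Before turning to densities, note that $F$ itself has a canonical nonparametric estimator, the empirical CDF; a quick sketch (the standard-normal data here are purely illustrative, since in practice $F$ is unknown):

```r
# simulate n = 100 draws from an (in practice unknown) F; here N(0, 1)
set.seed(1)
x <- rnorm(100)

# the empirical CDF, a nonparametric estimate of F
F_hat <- ecdf(x)
F_hat(0)     # estimated P(X <= 0); close to pnorm(0) = 0.5
plot(F_hat)  # step function converging to F as n grows
```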
The histogram is the oldest and most popular density estimator. We need to specify an "origin" $x_0$ and the class width $h$ for the specification of the intervals: $$I_j = (x_0 + j \cdot h, x_0 + (j + 1) \cdot h), \quad (j = \ldots, -1, 0, 1, \ldots).$$ The histogram then estimates the density on each interval by the proportion of observations falling into it, scaled by the interval width. A closely related estimator that dispenses with the choice of origin is the naive (moving-window) estimator, $$\hat{f}(x) = \frac{1}{nh} \sum_{i = 1}^{n} w\left(\frac{x - X_i}{h}\right),$$ where $$w(x) = \begin{cases} \frac{1}{2}, & |x| \leq 1 \\ 0, & \text{otherwise.} \end{cases}$$ This is merely a simple weight function that places a rectangular box around each observation $X_i$, so that $\hat{f}(x)$ is proportional to the number of observations falling in the window $(x - h, x + h)$ around $x$.
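To make this concrete, the naive estimator can be coded directly (the function name `naive_density` and the simulated data are illustrative, not from the text):

```r
# naive ("moving-window") density estimator evaluated at points xs
naive_density <- function(xs, obs, h) {
  sapply(xs, function(x) {
    # (1 / (n * h)) * sum_i w((x - obs_i) / h), with w = 1/2 on |x| <= 1:
    # i.e., the fraction of observations within (x - h, x + h), scaled by 2h
    sum(abs((x - obs) / h) <= 1) / (2 * length(obs) * h)
  })
}

set.seed(1)
obs <- rnorm(500)
xs <- seq(-3, 3, length.out = 200)
plot(xs, naive_density(xs, obs, h = 0.5), type = "l")  # compare to dnorm(xs)
```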
By replacing $w$ with a general smooth kernel function, we get the definition of the kernel density estimator: $$\hat{f}(x) = \frac{1}{nh} \sum_{i = 1}^{n} K \left(\frac{x - X_i}{h}\right),$$ where $$K(x) \geq 0, \quad \int_{-\infty}^{\infty} K(x) \, dx = 1, \quad K(x) = K(-x).$$
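To connect the formula to code, here is a minimal from-scratch kernel density estimator using the Gaussian kernel $K = \varphi$ (the function name `kde` and the data are illustrative; base R's `density()` implements the same idea with far better numerics):

```r
# kernel density estimate: (1 / (n * h)) * sum_i K((x - X_i) / h), K = dnorm
kde <- function(xs, obs, h) {
  sapply(xs, function(x) mean(dnorm((x - obs) / h)) / h)
}

set.seed(1)
obs <- rnorm(500)
xs <- seq(-3, 3, length.out = 200)
plot(xs, kde(xs, obs, h = 0.3), type = "l")  # closely tracks dnorm(xs)
```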
The positivity of the kernel function $K(\cdot)$ guarantees a positive density estimate $\hat{f}(\cdot)$, and the normalization $\int K(x) \, dx = 1$ implies that $\int \hat{f}(x) \, dx = 1$, which is necessary for $\hat{f}(\cdot)$ to be a density. Typically, the kernel function $K(\cdot)$ is chosen as a probability density which is symmetric around $0$. Additionally, the smoothness of $\hat{f}(\cdot)$ is inherited from the smoothness of the kernel.
In the above definition, we leave the bandwidth $h$ as a tuning parameter, which can be chosen to minimize a suitable risk criterion (e.g., the mean integrated squared error) given the available data. For a large bandwidth $h$, the estimate $\hat{f}(x)$ tends to be very slowly varying as a function of $x$, while small bandwidths produce a more variable function estimate.
The bandwidth $h$ is often also called the "smoothing parameter". It should be clear that for $h \to 0$, we will have spikes at every observation $X_i$, whereas $\hat{f}(\cdot) = \hat{f}_h(\cdot)$ becomes smoother as $h$ increases. In the above, we use a global bandwidth, which we might choose optimally using cross-validation; but we can also use variable bandwidths (locally changing bandwidths $h(x)$), the general idea here being to use a larger bandwidth in regions where the data are sparse. With respect to the bias-variance tradeoff: the (absolute value of the) bias of $\hat{f}$ increases and the variance of $\hat{f}$ decreases as $h$ increases.
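The role of $h$ is easy to visualize by re-estimating the same density at several bandwidths with base R's `density()`; the particular bandwidth values below are arbitrary choices for illustration:

```r
set.seed(1)
obs <- rnorm(500)

# undersmoothed (spiky), moderate, and oversmoothed (flat) estimates
f_small <- density(obs, bw = 0.05)
f_mid   <- density(obs, bw = 0.3)
f_large <- density(obs, bw = 2)

# a data-driven global bandwidth, chosen by unbiased (least-squares)
# cross-validation
bw.ucv(obs)
```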
## condensier
First, let's load the packages we'll be using and some core project management tools.
# project management tools
library(usethis)
usethis::create_project(".")
library(here)

# core data-handling packages
library(tidyverse)
library(data.table, quietly = TRUE)
Changing active project to lab_09
✔ Writing a sentinel file '.here'
● Build robust paths within your project via `here::here()`
● Learn more at https://krlmlr.github.io/here/
here() starts at /Users/nimahejazi/Dropbox/UC_Berkeley-grad/teaching/2018_Spring/tlbbd-labs/lab_09
── Attaching packages ─────────────────────────── tidyverse 1.2.1 ──
✔ ggplot2 2.2.1     ✔ purrr   0.2.4
✔ tibble  1.4.2     ✔ dplyr   0.7.4
✔ tidyr   0.8.0     ✔ stringr 1.3.0
✔ readr   1.1.1     ✔ forcats 0.3.0
── Conflicts ────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
Attaching package: ‘data.table’
The following objects are masked from ‘package:dplyr’: between, first, last
The following object is masked from ‘package:purrr’: transpose
# packages for simulating data and for (conditional) density estimation
library(simcausal)
library(condensier)
library(sl3)
The condensier package is still in beta testing. Interpret results with caution.
We begin by simulating a simple data set and illustrating a basic use of condensier to perform conditional density estimation:
# specify the data-generating process as a DAG: three binary baseline
# covariates W1, W2, W3 and a continuous node A whose mean depends on them
D <- DAG.empty()
D <- D + node("W1", distr = "rbern", prob = 0.5) +
  node("W2", distr = "rbern", prob = 0.3) +
  node("W3", distr = "rbern", prob = 0.3) +
  node("A", distr = "rnorm", mean = (0.98 * W1 + 0.58 * W2 + 0.33 * W3), sd = 1)
D <- set.DAG(D, n.test = 10)
...automatically assigning order attribute to some nodes...
node W1, order:1
node W2, order:2
node W3, order:3
node A, order:4
plotDAG(D, xjitter = 0.3, yjitter = 0.04,
        edge_attrs = list(width = 0.5, arrow.width = 0.4, arrow.size = 0.8),
        vertex_attrs = list(size = 12, label.cex = 0.8))
(plot: the DAG of the data-generating process, with edges from W1, W2, and W3 into A)
Now that we've taken a look at the structure of the data-generating process (the DAG), let's generate some data and take a quick look at the data set:
data_O <- as.data.table(sim(D, n = 10000, rndseed = 57192))
head(data_O)
simulating observed dataset from the DAG object
ID | W1 | W2 | W3 | A |
---|---|---|---|---|
1 | 0 | 0 | 0 | 0.264097302 |
2 | 1 | 0 | 1 | 1.947282783 |
3 | 1 | 0 | 0 | 0.343522286 |
4 | 0 | 0 | 0 | -1.342266575 |
5 | 0 | 0 | 0 | 1.446429830 |
6 | 0 | 0 | 0 | -0.001315845 |
# draw a random subsample of 1,000 observations to use as a test set
newdata <- data_O[sample(nrow(data_O), 1000), c("W1", "W2", "W3", "A"), with = FALSE]
Now that we've generated some data, we will construct an sl3_Task object for use with sl3:
task <- sl3_Task$new(data_O, covariates = c("W1", "W2", "W3"),
                     outcome = "A")
task
A sl3 Task with 10000 obs and these nodes:
$covariates
[1] "W1" "W2" "W3"

$outcome
[1] "A"

$id
NULL

$weights
NULL

$offset
NULL
We'll do this same conversion for the sub-sampled testing data as well:
new_task <- sl3_Task$new(newdata, covariates = c("W1", "W2", "W3"),
                         outcome = "A")
To see which properties (e.g., "density") are supported by the learners available in sl3, we can list them:

sl3::sl3_list_properties()

Next, we instantiate a condensier-based density learner, with xgboost estimating the pooled bin hazards:
lrn <- Lrnr_condensier$new(nbins = 10, bin_method = "equal.len",
                           bin_estimator = Lrnr_xgboost$new(nrounds = 50, objective = "reg:logistic"))
trained_lrn <- lrn$train(task)
pred_probs <- trained_lrn$predict(new_task)
head(pred_probs)
likelihood |
---|
0.1054223 |
0.3310383 |
0.3310383 |
0.1044063 |
0.1429661 |
0.1044063 |
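These predicted values are the estimated conditional density $\hat{f}(A \mid W)$ evaluated at each test observation's outcome. As an illustrative aside (assuming pred_probs stores the likelihood column shown above), the mean negative log-likelihood gives a natural one-number summary of fit:

```r
# empirical negative log-likelihood loss on the test set; lower is better
mean(-log(pred_probs$likelihood))
```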
Next, we combine several conditional density estimators into a super learner, using a density-appropriate meta-learner:
lrn1 <- Lrnr_condensier$new(nbins = 25, bin_method = "equal.len",
                            bin_estimator = Lrnr_glm_fast$new())
lrn2 <- Lrnr_condensier$new(nbins = 35, bin_method = "equal.len",
                            bin_estimator = Lrnr_xgboost$new(nrounds = 50, objective = "reg:logistic"))
sl <- Lrnr_sl$new(learners = list(lrn1, lrn2),
                  metalearner = Lrnr_solnp_density$new())
sl_fit <- sl$train(task)
head(sl_fit$predict(new_task))
Iter: 1 fn: 17575.2858  Pars: 0.38205 0.61795
Iter: 2 fn: 17575.2858  Pars: 0.38205 0.61795
solnp--> Completed in 2 iterations
density meta-learner fit:
Lrnr_condensier_equal.len_25_20_FALSE_NA_FALSE_NULL 0.3820534
Lrnr_condensier_equal.len_35_20_FALSE_NA_FALSE_NULL 0.6179466
likelihood |
---|
0.15692777 |
0.32778128 |
0.29269600 |
0.13861217 |
0.21158060 |
0.04837965 |
To understand the meta-learner used above, we can consult its documentation:

?Lrnr_solnp_density
Lrnr_solnp_density {sl3}: R Documentation

This meta-learner provides fitting procedures for density estimation, finding convex combinations of candidate density estimators by minimizing the cross-validated negative log-likelihood loss of each candidate density. The optimization problem is solved by making use of solnp, using Lagrange multipliers. For further details, consult the documentation of the Rsolnp package.
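In symbols (our paraphrase of the description above, not taken verbatim from the documentation), the meta-learner finds weights $$\min_{\alpha} \; -\frac{1}{n} \sum_{i=1}^{n} \log \left( \sum_{k} \alpha_k \hat{f}_k(A_i \mid W_i) \right) \quad \text{subject to} \quad \alpha_k \geq 0, \; \sum_k \alpha_k = 1,$$ where the $\hat{f}_k$ are the cross-validated predicted likelihoods of the candidate density estimators.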
Format

An R6Class object.

Value

Learner object with methods for training and prediction. See Lrnr_base for documentation on learners.

Parameters

...: Not currently used.

Individual learners have their own sets of parameters. Below is a list of shared parameters, implemented by Lrnr_base and shared by all learners.

covariates: A character vector of covariates. The learner will use this to subset the covariates for any specified task.

outcome_type: A variable_type object used to control the outcome_type used by the learner. Overrides the task outcome_type if specified.

...: All other parameters should be handled by the individual learner classes. See the documentation for the learner class you're instantiating.
Other Learners: Custom_chain, Lrnr_HarmonicReg, Lrnr_arima, Lrnr_base, Lrnr_bilstm, Lrnr_condensier, Lrnr_cv, Lrnr_define_interactions, Lrnr_expSmooth, Lrnr_glm_fast, Lrnr_glmnet, Lrnr_glm, Lrnr_h2o_grid, Lrnr_independent_binomial, Lrnr_lstm, Lrnr_mean, Lrnr_nnls, Lrnr_optim, Lrnr_pkg_SuperLearner, Lrnr_randomForest, Lrnr_rugarch, Lrnr_sl, Lrnr_solnp, Lrnr_subset_covariates, Lrnr_tsDyn, Lrnr_xgboost, Pipeline, Stack, define_h2o_X, undocumented_learner
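As an aside, the shared covariates parameter described above can be used to restrict any learner to a subset of the task's covariates; a quick illustrative sketch using the task defined earlier (the object names are ours):

```r
# fit a GLM using only W1 and W2, ignoring W3
lrn_sub <- Lrnr_glm$new(covariates = c("W1", "W2"))
fit_sub <- lrn_sub$train(task)
head(fit_sub$predict(new_task))
```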
Note that bin_estimator can also be a super learner object from sl3, in which case the bin hazards will be estimated by stacking several candidate estimators. For example, below we define a single density learner lrn, with the hazard estimator defined by a super learner that stacks two candidates (a GLM and an xgboost GBM). Note that, in contrast to the above example, this super learner fit will be optimized for the logistic regression problem (estimating the pooled bin hazards), still using internal 10-fold cross-validation.
lrn <- Lrnr_condensier$new(
  nbins = 35, bin_method = "equal.len",
  bin_estimator = Lrnr_sl$new(
    learners = list(
      Lrnr_glm_fast$new(family = "binomial"),
      Lrnr_xgboost$new(nrounds = 50, objective = "reg:logistic")
    ),
    metalearner = Lrnr_glm$new()
  )
)
binSL_fit <- lrn$train(task)
head(binSL_fit$predict(new_task))
likelihood |
---|
0.10960448 |
0.30398384 |
0.30398384 |
0.14400828 |
0.20825493 |
0.06986179 |
## sl3

- What learners to contribute: https://github.com/tlverse/sl3/issues/114
- How do learners work: https://sl3.tlverse.org/articles/custom_lrnrs.html
Tips and best practices:

1. Fork and clone the sl3 repo from https://github.com/tlverse/sl3.
2. Create a template file for your new learner (e.g., Lrnr_ranger) via sl3::write_learner_template(here::here("R", "Lrnr_ranger.R")); a rough sketch of what such a learner class looks like follows this list.
3. Write unit tests in tests/testthat/test_Lrnr_ranger.R that ensure that your new learner works as intended/expected.
4. Submit a pull request to the sl3 package.
5. The tlverse ecosystem uses the "Tidyverse Style Guide". Make sure to follow the formatting guidelines for your code and for pull requests. You can use the styler R package to automatically reformat your code.
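To give a feel for the shape of an sl3 learner class, here is a rough skeleton in the style of the custom-learner vignette linked above. The field names (.properties, .train, .predict, .required_packages) follow that vignette, but the ranger calls, defaults, and supported outcome types shown here are illustrative assumptions, not the actual Lrnr_ranger implementation:

```r
library(R6)
library(sl3)

Lrnr_ranger <- R6Class(
  classname = "Lrnr_ranger", inherit = Lrnr_base,
  portable = TRUE, class = TRUE,
  public = list(
    initialize = function(num.trees = 500, ...) {
      # capture the constructor arguments as the learner's parameter list
      params <- args_to_list()
      super$initialize(params = params, ...)
    }
  ),
  private = list(
    # outcome types this learner claims to handle (assumed for illustration)
    .properties = c("continuous", "binomial"),

    # fit the underlying model on the task's covariates and outcome
    .train = function(task) {
      args <- self$params
      # illustrative: regress task$Y on task$X with a ranger forest
      fit_object <- ranger::ranger(
        x = task$X, y = task$Y, num.trees = args$num.trees
      )
      return(fit_object)
    },

    # generate predictions for a (possibly new) task
    .predict = function(task = NULL) {
      predictions <- predict(private$.fit_object, data = task$X)$predictions
      return(predictions)
    },

    .required_packages = c("ranger")
  )
)
```

Once such a class is defined (and its package dependencies installed), it can be instantiated, trained, and used for prediction exactly like the built-in learners shown earlier, e.g., Lrnr_ranger$new()$train(task).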