Title: | Policy Learning |
---|---|
Description: | Package for evaluating user-specified finite stage policies and learning optimal treatment policies via doubly robust loss functions. Policy learning methods include doubly robust learning of the blip/conditional average treatment effect and sequential policy tree learning. The package also includes methods for optimal subgroup analysis. See Nordland and Holst (2022) <doi:10.48550/arXiv.2212.02335> for documentation and references. |
Authors: | Andreas Nordland [aut, cre],
Klaus Holst [aut] |
Maintainer: | Andreas Nordland <[email protected]> |
License: | Apache License (>= 2) |
Version: | 1.5.1 |
Built: | 2025-03-07 12:54:05 UTC |
Source: | https://github.com/andreasnordland/polle |
conditional()
is used to calculate the
policy value for each group defined by a given baseline variable.
conditional(object, policy_data, baseline)
object |
Policy evaluation object created by |
policy_data |
Policy data object created by |
baseline |
Character string. |
object of inherited class 'estimate', see lava::estimate.default. The object is a list with elements 'coef' (policy value estimate for each group) and 'IC' (influence curve estimate matrix).
library("polle") library("data.table") setDTthreads(1) d <- sim_single_stage(n=2e3) pd <- policy_data(d, action = "A", baseline = c("B"), covariates = c("Z","L"), utility = "U") # static policy: p <- policy_def(1) pe <- policy_eval(pd, policy = p) # conditional value for each group defined by B conditional(pe, pd, "B")
library("polle") library("data.table") setDTthreads(1) d <- sim_single_stage(n=2e3) pd <- policy_data(d, action = "A", baseline = c("B"), covariates = c("Z","L"), utility = "U") # static policy: p <- policy_def(1) pe <- policy_eval(pd, policy = p) # conditional value for each group defined by B conditional(pe, pd, "B")
control_blip
sets the default control arguments
for doubly robust blip-learning, type = "blip"
.
control_blip(blip_models = q_glm(~.))
blip_models |
Single element or list of V-restricted blip-models created
by |
list of (default) control arguments.
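For example, the control object is passed to policy_learn() via the control argument. A minimal sketch, assuming the single-stage simulated data from sim_single_stage() (with covariates Z, B and L) used elsewhere on this page:

library("polle")
d <- sim_single_stage(5e2, seed = 1)
pd <- policy_data(d, action = "A", covariates = c("Z", "B", "L"), utility = "U")
# blip-learner with a V-restricted blip-model:
pl_blip <- policy_learn(
  type = "blip",
  control = control_blip(blip_models = q_glm(~ Z + L))
)
# fitting the policy object and applying the learned policy:
po_blip <- pl_blip(policy_data = pd, q_models = q_glm(), g_models = g_glm())
head(get_policy(po_blip)(pd))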
control_drql
sets the default control arguments
for doubly robust Q-learning, type = "drql"
.
control_drql(qv_models = q_glm(~.))
qv_models |
Single element or list of V-restricted Q-models created
by |
list of (default) control arguments.
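A brief sketch, mirroring the drql examples under get_policy() and policy further down this page and assuming the two-stage simulated data:

library("polle")
d <- sim_two_stage(5e2, seed = 1)
pd <- policy_data(d,
                  action = c("A_1", "A_2"),
                  covariates = list(L = c("L_1", "L_2"), C = c("C_1", "C_2")),
                  utility = c("U_1", "U_2", "U_3"))
# doubly robust Q-learning with a V-restricted QV-model:
pl_drql <- policy_learn(type = "drql",
                        control = control_drql(qv_models = q_glm(~ C)))
po_drql <- pl_drql(policy_data = pd, q_models = q_glm(), g_models = g_glm())
head(get_policy(po_drql)(pd))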
control_earl
sets the default control arguments
for efficient augmentation and relaxation learning, type = "earl"
.
The arguments are passed directly to DynTxRegime::earl()
if not
specified otherwise.
control_earl( moPropen, moMain, moCont, regime, iter = 0L, fSet = NULL, lambdas = 0.5, cvFolds = 0L, surrogate = "hinge", kernel = "linear", kparam = NULL, verbose = 0L )
moPropen |
Propensity model of class "ModelObj", see modelObj::modelObj. |
moMain |
Main effects outcome model of class "ModelObj". |
moCont |
Contrast outcome model of class "ModelObj". |
regime |
An object of class formula specifying the design of the policy/regime. |
iter |
Maximum number of iterations for outcome regression. |
fSet |
A function or NULL defining subset structure. |
lambdas |
Numeric or numeric vector. Penalty parameter. |
cvFolds |
Integer. Number of folds for cross-validation of the parameters. |
surrogate |
The surrogate 0-1 loss function. The options are
|
kernel |
The options are |
kparam |
Numeric. Kernel parameter |
verbose |
Integer. |
list of (default) control arguments.
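The propensity and outcome models must be of class "ModelObj". A minimal sketch, assuming the modelObj package is installed and using the single-stage simulated data (covariates Z, B, L); the formulas and solver choices below are illustrative only:

library("polle")
d <- sim_single_stage(5e2, seed = 1)
pd <- policy_data(d, action = "A", covariates = c("Z", "B", "L"), utility = "U")
# propensity and outcome models of class "ModelObj":
mo_propen <- modelObj::buildModelObj(model = ~ Z + L,
                                     solver.method = "glm",
                                     solver.args = list(family = "binomial"),
                                     predict.method = "predict.glm",
                                     predict.args = list(type = "response"))
mo_main <- modelObj::buildModelObj(model = ~ Z + L,
                                   solver.method = "lm",
                                   predict.method = "predict.lm")
# the control arguments are passed on to DynTxRegime::earl():
pl_earl <- policy_learn(
  type = "earl",
  control = control_earl(moPropen = mo_propen,
                         moMain = mo_main,
                         moCont = mo_main,
                         regime = ~ Z + L)
)
# the learner is then applied to the policy_data object as in the other
# policy_learn examples on this page.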
control_owl()
sets the default control arguments
for backwards outcome weighted learning, type = "owl"
.
The arguments are passed directly to DTRlearn2::owl()
if not
specified otherwise.
control_owl( policy_vars = NULL, reuse_scales = TRUE, res.lasso = TRUE, loss = "hinge", kernel = "linear", augment = FALSE, c = 2^(-2:2), sigma = c(0.03, 0.05, 0.07), s = 2^(-2:2), m = 4 )
policy_vars |
Character vector/string or list of character
vectors/strings. Variable names used to restrict the policy.
The names must be a subset of the history names, see get_history_names().
Not passed to |
reuse_scales |
The history matrix passed to |
res.lasso |
If |
loss |
Loss function. The options are |
kernel |
Type of kernel used by the support vector machine. The
options are |
augment |
If |
c |
Regularization parameter. |
sigma |
Tuning parameter. |
s |
Slope parameter. |
m |
Number of folds for cross-validation of the parameters. |
list of (default) control arguments.
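A minimal sketch of how the control arguments reach the learner, assuming the DTRlearn2 package is installed and using the two-stage simulated data with the default nuisance models used throughout this page:

library("polle")
d <- sim_two_stage(5e2, seed = 1)
pd <- policy_data(d,
                  action = c("A_1", "A_2"),
                  covariates = list(L = c("L_1", "L_2"), C = c("C_1", "C_2")),
                  utility = c("U_1", "U_2", "U_3"))
# backwards outcome weighted learning; arguments are passed on to DTRlearn2::owl():
pl_owl <- policy_learn(type = "owl",
                       control = control_owl(loss = "hinge", kernel = "linear"))
# evaluating the learner with the default g-models and Q-models:
policy_eval(pd, policy_learn = pl_owl)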
control_ptl
sets the default control arguments
for doubly robust policy tree learning, type = "ptl"
.
The arguments are passed directly to policytree::policy_tree()
(or
policytree::hybrid_policy_tree()
) if not specified otherwise.
control_ptl( policy_vars = NULL, hybrid = FALSE, depth = 2, search.depth = 2, split.step = 1, min.node.size = 1 )
policy_vars |
Character vector/string or list of character
vectors/strings. Variable names used to
construct the V-restricted policy tree.
The names must be a subset of the history names, see get_history_names().
Not passed to |
hybrid |
If |
depth |
Integer or integer vector. The depth of the fitted policy tree for each stage. |
search.depth |
(only used if |
split.step |
Integer or integer vector. The number of possible splits to consider when performing policy tree search at each stage. |
min.node.size |
Integer or integer vector. The smallest terminal node size permitted at each stage. |
list of (default) control arguments.
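A minimal sketch, assuming the policytree package is installed; compare the full-history example under get_policy_functions() further down this page:

library("polle")
d <- sim_two_stage(5e2, seed = 1)
pd <- policy_data(d,
                  action = c("A_1", "A_2"),
                  baseline = "BB",
                  covariates = list(L = c("L_1", "L_2"), C = c("C_1", "C_2")),
                  utility = c("U_1", "U_2", "U_3"))
# V-restricted policy tree of depth 1 based on the state/Markov-type history:
pl_ptl <- policy_learn(
  type = "ptl",
  control = control_ptl(policy_vars = c("C", "BB"), depth = 1)
)
po_ptl <- pl_ptl(policy_data = pd, q_models = q_glm(), g_models = g_glm())
head(get_policy(po_ptl)(pd))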
control_rwl
sets the default control arguments
for residual learning, type = "rwl"
.
The arguments are passed directly to DynTxRegime::rwl()
if not
specified otherwise.
control_rwl( moPropen, moMain, regime, fSet = NULL, lambdas = 2, cvFolds = 0L, kernel = "linear", kparam = NULL, responseType = "continuous", verbose = 2L )
moPropen |
Propensity model of class "ModelObj", see modelObj::modelObj. |
moMain |
Main effects outcome model of class "ModelObj". |
regime |
An object of class formula specifying the design of the policy/regime. |
fSet |
A function or NULL defining subset structure. |
lambdas |
Numeric or numeric vector. Penalty parameter. |
cvFolds |
Integer. Number of folds for cross-validation of the parameters.
|
kernel |
The options are |
kparam |
Numeric. Kernel parameter |
responseType |
Character string. Options are |
verbose |
Integer. |
list of (default) control arguments.
Objects of class policy_data contain elements of class data.table::data.table. data.table provides functions that operate on objects by reference. Thus, the policy_data object is not copied when modified by reference, see examples. An explicit copy can be made with copy_policy_data. The function is a wrapper of data.table::copy().
copy_policy_data(object)
object |
Object of class policy_data. |
Object of class policy_data.
library("polle") ### Single stage case: Wide data d1 <- sim_single_stage(5e2, seed=1) head(d1, 5) # constructing policy_data object: pd1 <- policy_data(d1, action="A", covariates=c("Z", "B", "L"), utility="U") pd1 # True copy pd2 <- copy_policy_data(pd1) # manipulating the data.table by reference: pd2$baseline_data[, id := id + 1] head(pd2$baseline_data$id - pd1$baseline_data$id) # False copy pd2 <- pd1 # manipulating the data.table by reference: pd2$baseline_data[, id := id + 1] head(pd2$baseline_data$id - pd1$baseline_data$id)
library("polle") ### Single stage case: Wide data d1 <- sim_single_stage(5e2, seed=1) head(d1, 5) # constructing policy_data object: pd1 <- policy_data(d1, action="A", covariates=c("Z", "B", "L"), utility="U") pd1 # True copy pd2 <- copy_policy_data(pd1) # manipulating the data.table by reference: pd2$baseline_data[, id := id + 1] head(pd2$baseline_data$id - pd1$baseline_data$id) # False copy pd2 <- pd1 # manipulating the data.table by reference: pd2$baseline_data[, id := id + 1] head(pd2$baseline_data$id - pd1$baseline_data$id)
fit_g_functions
is used to fit a list of g-models.
fit_g_functions(policy_data, g_models, full_history = FALSE)
policy_data |
Policy data object created by |
g_models |
List of action probability models/g-models for each stage
created by |
full_history |
If TRUE, the full history is used to fit each g-model. If FALSE, the single stage/"Markov type" history is used to fit each g-model. |
library("polle") ### Simulating two-stage policy data d <- sim_two_stage(2e3, seed=1) pd <- policy_data(d, action = c("A_1", "A_2"), covariates = list(L = c("L_1", "L_2"), C = c("C_1", "C_2")), utility = c("U_1", "U_2", "U_3")) pd # fitting a single g-model across all stages: g_functions <- fit_g_functions(policy_data = pd, g_models = g_glm(), full_history = FALSE) g_functions # fitting a g-model for each stage: g_functions <- fit_g_functions(policy_data = pd, g_models = list(g_glm(), g_glm()), full_history = TRUE) g_functions
library("polle") ### Simulating two-stage policy data d <- sim_two_stage(2e3, seed=1) pd <- policy_data(d, action = c("A_1", "A_2"), covariates = list(L = c("L_1", "L_2"), C = c("C_1", "C_2")), utility = c("U_1", "U_2", "U_3")) pd # fitting a single g-model across all stages: g_functions <- fit_g_functions(policy_data = pd, g_models = g_glm(), full_history = FALSE) g_functions # fitting a g-model for each stage: g_functions <- fit_g_functions(policy_data = pd, g_models = list(g_glm(), g_glm()), full_history = TRUE) g_functions
Use g_glm()
, g_empir()
,
g_glmnet()
, g_rf()
, g_sl()
, or g_xgboost()
to construct
an action probability model/g-model object.
The constructors are used as input for policy_eval()
and policy_learn()
.
g_empir(formula = ~1, ...) g_glm( formula = ~., family = "binomial", model = FALSE, na.action = na.pass, ... ) g_glmnet(formula = ~., family = "binomial", alpha = 1, s = "lambda.min", ...) g_rf( formula = ~., num.trees = c(500), mtry = NULL, cv_args = list(nfolds = 5, rep = 1), ... ) g_sl( formula = ~., SL.library = c("SL.mean", "SL.glm"), family = binomial(), env = parent.frame(), onlySL = TRUE, ... ) g_xgboost( formula = ~., objective = "binary:logistic", params = list(), nrounds, max_depth = 6, eta = 0.3, nthread = 1, cv_args = list(nfolds = 3, rep = 1) )
formula |
An object of class formula specifying the design matrix for
the propensity model/g-model. Use |
... |
Additional arguments passed to |
family |
A description of the error distribution and link function to be used in the model. |
model |
(Only used by |
na.action |
(Only used by |
alpha |
(Only used by |
s |
(Only used by |
num.trees |
(Only used by |
mtry |
(Only used by |
cv_args |
(Only used by |
SL.library |
(Only used by |
env |
(Only used by |
onlySL |
(Only used by |
objective |
(Only used by |
params |
(Only used by |
nrounds |
(Only used by |
max_depth |
(Only used by |
eta |
(Only used by |
nthread |
(Only used by |
g_glm() is a wrapper of glm() (generalized linear model).
g_empir() calculates the empirical probabilities within the groups defined by the formula.
g_glmnet() is a wrapper of glmnet::glmnet() (generalized linear model via penalized maximum likelihood).
g_rf() is a wrapper of ranger::ranger() (random forest). When multiple hyper-parameters are given, the model with the lowest cross-validation error is selected.
g_sl() is a wrapper of SuperLearner::SuperLearner (ensemble model).
g_xgboost() is a wrapper of xgboost::xgboost.
g-model object: function with arguments 'A' (action vector), 'H' (history matrix) and 'action_set'.
get_history_names()
, get_g_functions()
.
library("polle") ### Two stages: d <- sim_two_stage(2e2, seed=1) pd <- policy_data(d, action = c("A_1", "A_2"), baseline = c("B"), covariates = list(L = c("L_1", "L_2"), C = c("C_1", "C_2")), utility = c("U_1", "U_2", "U_3")) pd # available state history variable names: get_history_names(pd) # defining a g-model: g_model <- g_glm(formula = ~B+C) # evaluating the static policy (A=1) using inverse propensity weighting # based on a state glm model across all stages: pe <- policy_eval(type = "ipw", policy_data = pd, policy = policy_def(1, reuse = TRUE), g_models = g_model) # inspecting the fitted g-model: get_g_functions(pe) # available full history variable names at each stage: get_history_names(pd, stage = 1) get_history_names(pd, stage = 2) # evaluating the same policy based on a full history # glm model for each stage: pe <- policy_eval(type = "ipw", policy_data = pd, policy = policy_def(1, reuse = TRUE), g_models = list(g_glm(~ L_1 + B), g_glm(~ A_1 + L_2 + B)), g_full_history = TRUE) # inspecting the fitted g-models: get_g_functions(pe)
library("polle") ### Two stages: d <- sim_two_stage(2e2, seed=1) pd <- policy_data(d, action = c("A_1", "A_2"), baseline = c("B"), covariates = list(L = c("L_1", "L_2"), C = c("C_1", "C_2")), utility = c("U_1", "U_2", "U_3")) pd # available state history variable names: get_history_names(pd) # defining a g-model: g_model <- g_glm(formula = ~B+C) # evaluating the static policy (A=1) using inverse propensity weighting # based on a state glm model across all stages: pe <- policy_eval(type = "ipw", policy_data = pd, policy = policy_def(1, reuse = TRUE), g_models = g_model) # inspecting the fitted g-model: get_g_functions(pe) # available full history variable names at each stage: get_history_names(pd, stage = 1) get_history_names(pd, stage = 2) # evaluating the same policy based on a full history # glm model for each stage: pe <- policy_eval(type = "ipw", policy_data = pd, policy = policy_def(1, reuse = TRUE), g_models = list(g_glm(~ L_1 + B), g_glm(~ A_1 + L_2 + B)), g_full_history = TRUE) # inspecting the fitted g-models: get_g_functions(pe)
get_action_set
returns the action set, i.e., the possible
actions at each stage for the policy data object.
get_action_set(object)
object |
Object of class policy_data. |
Character vector.
### Two stages: d <- sim_two_stage(5e2, seed=1) # constructing policy_data object: pd <- policy_data(d, action = c("A_1", "A_2"), baseline = c("B"), covariates = list(L = c("L_1", "L_2"), C = c("C_1", "C_2")), utility = c("U_1", "U_2", "U_3")) pd # getting the actions set: get_action_set(pd)
get_actions
returns the actions at every stage for every observation
in the policy data object.
get_actions(object)
object |
Object of class policy_data. |
data.table::data.table with keys id and stage and character variable A.
### Two stages: d <- sim_two_stage(5e2, seed=1) # constructing policy_data object: pd <- policy_data(d, action = c("A_1", "A_2"), baseline = c("B"), covariates = list(L = c("L_1", "L_2"), C = c("C_1", "C_2")), utility = c("U_1", "U_2", "U_3")) pd # getting the actions: head(get_actions(pd))
get_g_functions()
returns a list of (fitted) g-functions
associated with each stage.
get_g_functions(object)
object |
Object of class policy_eval or policy_object. |
List of class nuisance_functions.
### Two stages: d <- sim_two_stage(5e2, seed=1) pd <- policy_data(d, action = c("A_1", "A_2"), baseline = c("B"), covariates = list(L = c("L_1", "L_2"), C = c("C_1", "C_2")), utility = c("U_1", "U_2", "U_3")) pd # evaluating the static policy a=1 using inverse propensity weighting # based on a GLM model at each stage pe <- policy_eval(type = "ipw", policy_data = pd, policy = policy_def(1, reuse = TRUE, name = "A=1"), g_models = list(g_glm(), g_glm())) pe # getting the g-functions g_functions <- get_g_functions(pe) g_functions # getting the fitted g-function values head(predict(g_functions, pd))
get_history_names()
returns the state covariate names of the history data
table for a given stage. The function is useful when specifying
the design matrix for g_model and q_model objects.
get_history_names(object, stage)
object |
Policy data object created by |
stage |
Stage number. If NULL, the state/Markov-type history variable names are returned. |
Character vector.
library("polle") ### Multiple stages: d3 <- sim_multi_stage(5e2, seed = 1) pd3 <- policy_data(data = d3$stage_data, baseline_data = d3$baseline_data, type = "long", id = "id", stage = "stage", event = "event", action = "A", utility = "U") pd3 # state/Markov type history variable names (H): get_history_names(pd3) # full history variable names (H_k) at stage 2: get_history_names(pd3, stage = 2)
library("polle") ### Multiple stages: d3 <- sim_multi_stage(5e2, seed = 1) pd3 <- policy_data(data = d3$stage_data, baseline_data = d3$baseline_data, type = "long", id = "id", stage = "stage", event = "event", action = "A", utility = "U") pd3 # state/Markov type history variable names (H): get_history_names(pd3) # full history variable names (H_k) at stage 2: get_history_names(pd3, stage = 2)
get_id
returns the ID for every observation in the policy data object.
get_id(object)
object |
Object of class policy_data or history. |
Character vector.
### Two stages: d <- sim_two_stage(5e2, seed=1) # constructing policy_data object: pd <- policy_data(d, action = c("A_1", "A_2"), baseline = c("B"), covariates = list(L = c("L_1", "L_2"), C = c("C_1", "C_2")), utility = c("U_1", "U_2", "U_3")) pd # getting the IDs: head(get_id(pd))
get_id_stage
returns the ID and stage number for every observation in the policy data object.
get_id_stage(object)
object |
Object of class policy_data or history. |
data.table::data.table with keys id and stage.
### Two stages: d <- sim_two_stage(5e2, seed=1) # constructing policy_data object: pd <- policy_data(d, action = c("A_1", "A_2"), baseline = c("B"), covariates = list(L = c("L_1", "L_2"), C = c("C_1", "C_2")), utility = c("U_1", "U_2", "U_3")) pd # getting the IDs and stages: head(get_id_stage(pd))
get_K
returns the maximal number of stages for the observations in
the policy data object.
get_K(object)
object |
Object of class policy_data. |
Integer.
d <- sim_multi_stage(5e2, seed = 1) pd <- policy_data(data = d$stage_data, baseline_data = d$baseline_data, type = "long", id = "id", stage = "stage", event = "event", action = "A", utility = "U") pd # getting the maximal number of stages: get_K(pd)
get_n
returns the number of observations in
the policy data object.
get_n(object)
object |
Object of class policy_data. |
Integer.
### Two stages: d <- sim_two_stage(5e2, seed=1) # constructing policy_data object: pd <- policy_data(d, action = c("A_1", "A_2"), baseline = c("B"), covariates = list(L = c("L_1", "L_2"), C = c("C_1", "C_2")), utility = c("U_1", "U_2", "U_3")) pd # getting the number of observations: get_n(pd)
get_policy
extracts the policy from a policy object
or a policy evaluation object. The policy is a function which takes a
policy data object as input and returns the policy actions.
get_policy(object, threshold = NULL)
object |
Object of class policy_object or policy_eval. |
threshold |
Numeric vector. Thresholds for the first stage policy function. |
function of class policy.
library("polle") ### Two stages: d <- sim_two_stage(5e2, seed = 1) pd <- policy_data(d, action = c("A_1", "A_2"), baseline = c("BB"), covariates = list( L = c("L_1", "L_2"), C = c("C_1", "C_2") ), utility = c("U_1", "U_2", "U_3") ) pd ### V-restricted (Doubly Robust) Q-learning # specifying the learner: pl <- policy_learn( type = "drql", control = control_drql(qv_models = q_glm(formula = ~C)) ) # fitting the policy (object): po <- pl( policy_data = pd, q_models = q_glm(), g_models = g_glm() ) # getting and applying the policy: head(get_policy(po)(pd)) # the policy learner can also be evaluated directly: pe <- policy_eval( policy_data = pd, policy_learn = pl, q_models = q_glm(), g_models = g_glm() ) # getting and applying the policy again: head(get_policy(pe)(pd))
library("polle") ### Two stages: d <- sim_two_stage(5e2, seed = 1) pd <- policy_data(d, action = c("A_1", "A_2"), baseline = c("BB"), covariates = list( L = c("L_1", "L_2"), C = c("C_1", "C_2") ), utility = c("U_1", "U_2", "U_3") ) pd ### V-restricted (Doubly Robust) Q-learning # specifying the learner: pl <- policy_learn( type = "drql", control = control_drql(qv_models = q_glm(formula = ~C)) ) # fitting the policy (object): po <- pl( policy_data = pd, q_models = q_glm(), g_models = g_glm() ) # getting and applying the policy: head(get_policy(po)(pd)) # the policy learner can also be evaluated directly: pe <- policy_eval( policy_data = pd, policy_learn = pl, q_models = q_glm(), g_models = g_glm() ) # getting and applying the policy again: head(get_policy(pe)(pd))
get_policy_actions()
extracts the actions dictated by the
(learned and possibly cross-fitted) policy at every stage.
get_policy_actions(object)
object |
Object of class policy_eval. |
data.table::data.table with keys id and stage and action variable d.
### Two stages: d <- sim_two_stage(5e2, seed=1) pd <- policy_data(d, action = c("A_1", "A_2"), covariates = list(L = c("L_1", "L_2"), C = c("C_1", "C_2")), utility = c("U_1", "U_2", "U_3")) pd # defining a policy learner based on cross-fitted doubly robust Q-learning: pl <- policy_learn(type = "drql", control = control_drql(qv_models = list(q_glm(~C_1), q_glm(~C_1+C_2))), full_history = TRUE, L = 2) # number of folds for cross-fitting # evaluating the policy learner using 2-fold cross fitting: pe <- policy_eval(type = "dr", policy_data = pd, policy_learn = pl, q_models = q_glm(), g_models = g_glm(), M = 2) # number of folds for cross-fitting # Getting the cross-fitted actions dictated by the fitted policy: head(get_policy_actions(pe))
get_policy_functions()
returns a function defining the policy at
the given stage. get_policy_functions()
is useful when implementing
the learned policy.
## S3 method for class 'blip' get_policy_functions( object, stage, threshold = NULL, include_g_values = FALSE, ... ) ## S3 method for class 'drql' get_policy_functions( object, stage, threshold = NULL, include_g_values = FALSE, ... ) get_policy_functions(object, stage, threshold, ...) ## S3 method for class 'ptl' get_policy_functions(object, stage, threshold = NULL, ...) ## S3 method for class 'ql' get_policy_functions( object, stage, threshold = NULL, include_g_values = FALSE, ... )
object |
Object of class "policy_object" or "policy_eval", see policy_learn and policy_eval. |
stage |
Integer. Stage number. |
threshold |
Numeric, threshold for not choosing the reference action at stage 1. |
include_g_values |
If TRUE, the g-values are included as an attribute. |
... |
Additional arguments. |
Functions with arguments:
H: data.table::data.table containing the variables needed to evaluate the policy (and g-function).
library("polle") ### Two stages: d <- sim_two_stage(5e2, seed=1) pd <- policy_data(d, action = c("A_1", "A_2"), baseline = "BB", covariates = list(L = c("L_1", "L_2"), C = c("C_1", "C_2")), utility = c("U_1", "U_2", "U_3")) pd ### Realistic V-restricted Policy Tree Learning # specifying the learner: pl <- policy_learn(type = "ptl", control = control_ptl(policy_vars = list(c("C_1", "BB"), c("L_1", "BB"))), full_history = TRUE, alpha = 0.05) # evaluating the learner: pe <- policy_eval(policy_data = pd, policy_learn = pl, q_models = q_glm(), g_models = g_glm()) # getting the policy function at stage 2: pf2 <- get_policy_functions(pe, stage = 2) args(pf2) # applying the policy function to new data: set.seed(1) L_1 <- rnorm(n = 10) new_H <- data.frame(C = rnorm(n = 10), L = L_1, L_1 = L_1, BB = "group1") d2 <- pf2(H = new_H) head(d2)
library("polle") ### Two stages: d <- sim_two_stage(5e2, seed=1) pd <- policy_data(d, action = c("A_1", "A_2"), baseline = "BB", covariates = list(L = c("L_1", "L_2"), C = c("C_1", "C_2")), utility = c("U_1", "U_2", "U_3")) pd ### Realistic V-restricted Policy Tree Learning # specifying the learner: pl <- policy_learn(type = "ptl", control = control_ptl(policy_vars = list(c("C_1", "BB"), c("L_1", "BB"))), full_history = TRUE, alpha = 0.05) # evaluating the learner: pe <- policy_eval(policy_data = pd, policy_learn = pl, q_models = q_glm(), g_models = g_glm()) # getting the policy function at stage 2: pf2 <- get_policy_functions(pe, stage = 2) args(pf2) # applying the policy function to new data: set.seed(1) L_1 <- rnorm(n = 10) new_H <- data.frame(C = rnorm(n = 10), L = L_1, L_1 = L_1, BB = "group1") d2 <- pf2(H = new_H) head(d2)
Extract the fitted policy object.
get_policy_object(object)
object |
Object of class policy_eval. |
Object of class policy_object.
library("polle") ### Single stage: d1 <- sim_single_stage(5e2, seed=1) pd1 <- policy_data(d1, action="A", covariates=list("Z", "B", "L"), utility="U") pd1 # evaluating the policy: pe1 <- policy_eval(policy_data = pd1, policy_learn = policy_learn(type = "drql", control = control_drql(qv_models = q_glm(~.))), g_models = g_glm(), q_models = q_glm()) # extracting the policy object: get_policy_object(pe1)
library("polle") ### Single stage: d1 <- sim_single_stage(5e2, seed=1) pd1 <- policy_data(d1, action="A", covariates=list("Z", "B", "L"), utility="U") pd1 # evaluating the policy: pe1 <- policy_eval(policy_data = pd1, policy_learn = policy_learn(type = "drql", control = control_drql(qv_models = q_glm(~.))), g_models = g_glm(), q_models = q_glm()) # extracting the policy object: get_policy_object(pe1)
get_q_functions()
returns a list of (fitted) Q-functions
associated with each stage.
get_q_functions(object)
object |
Object of class policy_eval or policy_object. |
List of class nuisance_functions.
### Two stages: d <- sim_two_stage(5e2, seed = 1) pd <- policy_data(d, action = c("A_1", "A_2"), baseline = c("B"), covariates = list( L = c("L_1", "L_2"), C = c("C_1", "C_2") ), utility = c("U_1", "U_2", "U_3") ) pd # evaluating the static policy a=1 using outcome regression # based on a GLM model at each stage. pe <- policy_eval( type = "or", policy_data = pd, policy = policy_def(1, reuse = TRUE, name = "A=1"), q_models = list(q_glm(), q_glm()) ) pe # getting the Q-functions q_functions <- get_q_functions(pe) # getting the fitted g-function values head(predict(q_functions, pd))
get_stage_action_sets
returns the action sets at each stage, i.e.,
the possible actions at each stage for the policy data object.
get_stage_action_sets(object)
object |
Object of class policy_data. |
List of character vectors.
### Two stages: d <- sim_two_stage_multi_actions(5e2, seed=1) # constructing policy_data object: pd <- policy_data(d, action = c("A_1", "A_2"), baseline = c("B"), covariates = list(L = c("L_1", "L_2"), C = c("C_1", "C_2")), utility = c("U_1", "U_2", "U_3")) pd # getting the stage actions set: get_stage_action_sets(pd)
get_utility()
returns the utility, i.e., the sum of the rewards,
for every observation in the policy data object.
get_utility(object)
object |
Object of class policy_data. |
data.table::data.table with key id and numeric variable U.
### Two stages: d <- sim_two_stage(5e2, seed=1) # constructing policy_data object: pd <- policy_data(d, action = c("A_1", "A_2"), baseline = c("B"), covariates = list(L = c("L_1", "L_2"), C = c("C_1", "C_2")), utility = c("U_1", "U_2", "U_3")) pd # getting the utility: head(get_utility(pd))
get_history
summarizes the history and action at a given stage from a
policy_data object.
get_history(object, stage = NULL, full_history = FALSE)
object |
Object of class policy_data. |
stage |
Stage number. If NULL, the state/Markov-type history across all stages is returned. |
full_history |
Logical. If TRUE, the full history is returned. If FALSE, only the state/Markov-type history is returned. |
Each observation has the sequential form
O = {B, U_1, X_1, A_1, ..., U_K, X_K, A_K, U_{K+1}}
for a possibly stochastic number of stages K.
B is a vector of baseline covariates.
U_k is the reward at stage k (not influenced by the action A_k).
X_k is a vector of state covariates summarizing the state at stage k.
A_k is the categorical action at stage k.
Object of class history. The object is a list containing the following elements:
H |
data.table::data.table with keys id and stage and with variables
{ |
A |
data.table::data.table with keys id and stage and variable |
action_name |
Name of the action variable in |
action_set |
Sorted character vector defining the action set. |
U |
(If |
library("polle") ### Single stage: d1 <- sim_single_stage(5e2, seed=1) # constructing policy_data object: pd1 <- policy_data(d1, action="A", covariates=list("Z", "B", "L"), utility="U") pd1 # In the single stage case, set stage = NULL h1 <- get_history(pd1) head(h1$H) head(h1$A) ### Two stages: d2 <- sim_two_stage(5e2, seed=1) # constructing policy_data object: pd2 <- policy_data(d2, action = c("A_1", "A_2"), baseline = c("B"), covariates = list(L = c("L_1", "L_2"), C = c("C_1", "C_2")), utility = c("U_1", "U_2", "U_3")) pd2 # getting the state/Markov-type history across all stages: h2 <- get_history(pd2) head(h2$H) head(h2$A) # getting the full history at stage 2: h2 <- get_history(pd2, stage = 2, full_history = TRUE) head(h2$H) head(h2$A) head(h2$U) # getting the state/Markov-type history at stage 2: h2 <- get_history(pd2, stage = 2, full_history = FALSE) head(h2$H) head(h2$A) ### Multiple stages d3 <- sim_multi_stage(5e2, seed = 1) # constructing policy_data object: pd3 <- policy_data(data = d3$stage_data, baseline_data = d3$baseline_data, type = "long", id = "id", stage = "stage", event = "event", action = "A", utility = "U") pd3 # getting the full history at stage 2: h3 <- get_history(pd3, stage = 2, full_history = TRUE) head(h3$H) # note that not all observations have two stages: nrow(h3$H) # number of observations with two stages. get_n(pd3) # number of observations in total.
library("polle") ### Single stage: d1 <- sim_single_stage(5e2, seed=1) # constructing policy_data object: pd1 <- policy_data(d1, action="A", covariates=list("Z", "B", "L"), utility="U") pd1 # In the single stage case, set stage = NULL h1 <- get_history(pd1) head(h1$H) head(h1$A) ### Two stages: d2 <- sim_two_stage(5e2, seed=1) # constructing policy_data object: pd2 <- policy_data(d2, action = c("A_1", "A_2"), baseline = c("B"), covariates = list(L = c("L_1", "L_2"), C = c("C_1", "C_2")), utility = c("U_1", "U_2", "U_3")) pd2 # getting the state/Markov-type history across all stages: h2 <- get_history(pd2) head(h2$H) head(h2$A) # getting the full history at stage 2: h2 <- get_history(pd2, stage = 2, full_history = TRUE) head(h2$H) head(h2$A) head(h2$U) # getting the state/Markov-type history at stage 2: h2 <- get_history(pd2, stage = 2, full_history = FALSE) head(h2$H) head(h2$A) ### Multiple stages d3 <- sim_multi_stage(5e2, seed = 1) # constructing policy_data object: pd3 <- policy_data(data = d3$stage_data, baseline_data = d3$baseline_data, type = "long", id = "id", stage = "stage", event = "event", action = "A", utility = "U") pd3 # getting the full history at stage 2: h3 <- get_history(pd3, stage = 2, full_history = TRUE) head(h3$H) # note that not all observations have two stages: nrow(h3$H) # number of observations with two stages. get_n(pd3) # number of observations in total.
The fitted g-functions and Q-functions are stored in an object of class "nuisance_functions". The object is a list with a fitted model object for every stage. Whether the full history or the state/Markov-type history is used is stored as an attribute ("full_history").
The following S3 generic functions are available for an object of class
nuisance_functions
:
predict
Predict the values of the g- or Q-functions based on a policy_data object.
### Two stages: d <- sim_two_stage(5e2, seed=1) pd <- policy_data(d, action = c("A_1", "A_2"), covariates = list(L = c("L_1", "L_2"), C = c("C_1", "C_2")), utility = c("U_1", "U_2", "U_3")) pd # evaluating the static policy a=1: pe <- policy_eval(policy_data = pd, policy = policy_def(1, reuse = TRUE), g_models = g_glm(), q_models = q_glm()) # getting the fitted g-functions: (g_functions <- get_g_functions(pe)) # getting the fitted Q-functions: (q_functions <- get_q_functions(pe)) # getting the fitted values: head(predict(g_functions, pd)) head(predict(q_functions, pd))
partial
creates a partial policy data object by trimming
the maximum number of stages in the policy data object to a given
fixed number K.
partial(object, K)
object |
Object of class policy_data. |
K |
Maximum number of stages. |
Object of class policy_data.
library("polle") ### Multiple stage case d <- sim_multi_stage(5e2, seed = 1) # constructing policy_data object: pd <- policy_data(data = d$stage_data, baseline_data = d$baseline_data, type = "long", id = "id", stage = "stage", event = "event", action = "A", utility = "U") pd # Creating a partial policy data object with 3 stages pd3 <- partial(pd, K = 3) pd3
library("polle") ### Multiple stage case d <- sim_multi_stage(5e2, seed = 1) # constructing policy_data object: pd <- policy_data(data = d$stage_data, baseline_data = d$baseline_data, type = "long", id = "id", stage = "stage", event = "event", action = "A", utility = "U") pd # Creating a partial policy data object with 3 stages pd3 <- partial(pd, K = 3) pd3
Plot policy data for given policies
## S3 method for class 'policy_data' plot( x, policy = NULL, which = c(1), stage = 1, history_variables = NULL, jitter = 0.05, ... )
x |
Object of class policy_data |
policy |
An object or list of objects of class policy |
which |
A subset of the numbers 1:2
|
stage |
Stage number for plot 2 |
history_variables |
character vector of length 2 for plot 2 |
jitter |
numeric |
... |
Additional arguments |
library("polle") library("data.table") setDTthreads(1) d3 <- sim_multi_stage(2e2, seed = 1) pd3 <- policy_data(data = d3$stage_data, baseline_data = d3$baseline_data, type = "long", id = "id", stage = "stage", event = "event", action = "A", utility = "U") # specifying two static policies: p0 <- policy_def(c(1,1,0,0), name = "p0") p1 <- policy_def(c(1,0,0,0), name = "p1") plot(pd3) plot(pd3, policy = list(p0, p1)) # learning and plotting a policy: pe3 <- policy_eval(pd3, policy_learn = policy_learn(), q_models = q_glm(formula = ~t + X + X_lead)) plot(pd3, list(get_policy(pe3), p0)) # plotting the recommended actions at a specific stage: plot(pd3, get_policy(pe3), which = 2, stage = 2, history_variables = c("t","X"))
library("polle") library("data.table") setDTthreads(1) d3 <- sim_multi_stage(2e2, seed = 1) pd3 <- policy_data(data = d3$stage_data, baseline_data = d3$baseline_data, type = "long", id = "id", stage = "stage", event = "event", action = "A", utility = "U") # specifying two static policies: p0 <- policy_def(c(1,1,0,0), name = "p0") p1 <- policy_def(c(1,0,0,0), name = "p1") plot(pd3) plot(pd3, policy = list(p0, p1)) # learning and plotting a policy: pe3 <- policy_eval(pd3, policy_learn = policy_learn(), q_models = q_glm(formula = ~t + X + X_lead)) plot(pd3, list(get_policy(pe3), p0)) # plotting the recommended actions at a specific stage: plot(pd3, get_policy(pe3), which = 2, stage = 2, history_variables = c("t","X"))
Plot histogram of the influence curve for a policy_eval object.
## S3 method for class 'policy_eval' plot(x, ...)
x |
Object of class policy_eval |
... |
Additional arguments |
d <- sim_two_stage(2e3, seed=1) pd <- policy_data(d, action = c("A_1", "A_2"), baseline = "BB", covariates = list(L = c("L_1", "L_2"), C = c("C_1", "C_2")), utility = c("U_1", "U_2", "U_3")) pe <- policy_eval(pd, policy_learn = policy_learn()) plot(pe)
A function of inherited class "policy" takes a policy data object as input and returns the policy actions for every observation for every (observed) stage.
A policy can either be defined directly by the user using policy_def or a policy can be fitted using policy_learn (or policy_eval). policy_learn returns a policy_object from which the policy can be extracted using get_policy.
data.table::data.table with keys id and stage and action variable d.
The following S3 generic functions are available for an object of class
policy
:
print
Basic print function.
### Two stages: d <- sim_two_stage(5e2, seed=1) pd <- policy_data(d, action = c("A_1", "A_2"), covariates = list(L = c("L_1", "L_2"), C = c("C_1", "C_2")), utility = c("U_1", "U_2", "U_3")) # defining a dynamic policy: p <- policy_def( function(L) (L>0)*1, reuse = TRUE ) p head(p(pd), 5) # V-restricted (Doubly Robust) Q-learning: # specifying the learner: pl <- policy_learn(type = "drql", control = control_drql(qv_models = q_glm(formula = ~ C))) # fitting the policy (object): po <- pl(policy_data = pd, q_models = q_glm(), g_models = g_glm()) p <- get_policy(po) p head(p(pd))
policy_data()
creates a policy data object which
is used as input to policy_eval()
and policy_learn()
for policy
evaluation and data adaptive policy learning.
policy_data( data, baseline_data, type = "wide", action, covariates, utility, baseline = NULL, deterministic_rewards = NULL, id = NULL, stage = NULL, event = NULL, action_set = NULL, verbose = FALSE ) ## S3 method for class 'policy_data' print(x, digits = 2, ...) ## S3 method for class 'policy_data' summary(object, probs = seq(0, 1, 0.25), ...)
data |
data.frame or data.table::data.table; see Examples. |
baseline_data |
data.frame or data.table::data.table; see Examples. |
type |
Character string. If "wide", |
action |
Action variable name(s). Character vector or character string.
|
covariates |
Stage specific covariate name(s). Character vector or named list of character vectors.
|
utility |
Utility/Reward variable name(s). Character string or vector.
|
baseline |
Baseline covariate name(s). Character vector. |
deterministic_rewards |
Deterministic reward variable name(s). Named list of character vectors of length K. The name of each element must be on the form "U_Aa" where "a" corresponds to an action in the action set. |
id |
ID variable name. Character string. |
stage |
Stage number variable name. |
event |
Event indicator name. |
action_set |
Character string. Action set across all stages. |
verbose |
Logical. If TRUE, formatting comments are printed to the console. |
x |
Object to be printed. |
digits |
Minimum number of digits to be printed. |
... |
Additional arguments passed to print. |
object |
Object of class policy_data |
probs |
numeric vector (probabilities) |
Each observation has the sequential form
O = {B, U_1, X_1, A_1, ..., U_K, X_K, A_K, U_{K+1}}
for a possibly stochastic number of stages K.
B is a vector of baseline covariates.
U_k is the reward at stage k (not influenced by the action A_k).
X_k is a vector of state covariates summarizing the state at stage k.
A_k is the categorical action at stage k.
The utility is given by the sum of the rewards, i.e.,
U = U_1 + U_2 + ... + U_{K+1}.
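The relation between the utility and the stage-wise rewards can be inspected directly on simulated data; a small sketch using the two-stage example data from below:

library("polle")
d <- sim_two_stage(5e2, seed = 1)
pd <- policy_data(d,
                  action = c("A_1", "A_2"),
                  covariates = list(L = c("L_1", "L_2"), C = c("C_1", "C_2")),
                  utility = c("U_1", "U_2", "U_3"))
# the utility is the sum of the rewards U_1 + U_2 + U_3:
head(get_utility(pd), 3)
head(d$U_1 + d$U_2 + d$U_3, 3)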
policy_data()
returns an object of class "policy_data".
The object is a list containing the following elements:
stage_data |
data.table::data.table containing the id, stage number, event
indicator, action ( |
baseline_data |
data.table::data.table containing the id and baseline
covariates ( |
colnames |
List containing the state covariate names, baseline covariate names, and the deterministic reward variable names. |
action_set |
Sorted character vector describing the action set, i.e., the possible actions at all stages. |
stage_action_sets |
List of sorted character vectors describing the observed actions at each stage. |
dim |
List containing the number of observations (n) and the number of stages (K). |
The following S3 generic functions are available for an object of
class policy_data
:
partial()
Trim the maximum number
of stages in a policy_data
object.
subset_id()
Subset a policy_data
object on ID.
get_history()
Summarize the history and action at a given stage.
get_history_names()
Get history variable names.
get_actions()
Get the action at every stage.
get_utility()
Get the utility.
plot()
Plot method.
policy_eval()
, policy_learn()
, copy_policy_data()
library("polle") ### Single stage: Wide data d1 <- sim_single_stage(n = 5e2, seed=1) head(d1, 5) # constructing policy_data object: pd1 <- policy_data(d1, action="A", covariates=c("Z", "B", "L"), utility="U") pd1 # associated S3 methods: methods(class = "policy_data") head(get_actions(pd1), 5) head(get_utility(pd1), 5) head(get_history(pd1)$H, 5) ### Two stage: Wide data d2 <- sim_two_stage(5e2, seed=1) head(d2, 5) # constructing policy_data object: pd2 <- policy_data(d2, action = c("A_1", "A_2"), baseline = c("B"), covariates = list(L = c("L_1", "L_2"), C = c("C_1", "C_2")), utility = c("U_1", "U_2", "U_3")) pd2 head(get_history(pd2, stage = 2)$H, 5) # state/Markov type history and action, (H_k,A_k). head(get_history(pd2, stage = 2, full_history = TRUE)$H, 5) # Full history and action, (H_k,A_k). ### Multiple stages: Long data d3 <- sim_multi_stage(5e2, seed = 1) head(d3$stage_data, 10) # constructing policy_data object: pd3 <- policy_data(data = d3$stage_data, baseline_data = d3$baseline_data, type = "long", id = "id", stage = "stage", event = "event", action = "A", utility = "U") pd3 head(get_history(pd3, stage = 3)$H, 5) # state/Markov type history and action, (H_k,A_k). head(get_history(pd3, stage = 2, full_history = TRUE)$H, 5) # Full history and action, (H_k,A_k).
library("polle") ### Single stage: Wide data d1 <- sim_single_stage(n = 5e2, seed=1) head(d1, 5) # constructing policy_data object: pd1 <- policy_data(d1, action="A", covariates=c("Z", "B", "L"), utility="U") pd1 # associated S3 methods: methods(class = "policy_data") head(get_actions(pd1), 5) head(get_utility(pd1), 5) head(get_history(pd1)$H, 5) ### Two stage: Wide data d2 <- sim_two_stage(5e2, seed=1) head(d2, 5) # constructing policy_data object: pd2 <- policy_data(d2, action = c("A_1", "A_2"), baseline = c("B"), covariates = list(L = c("L_1", "L_2"), C = c("C_1", "C_2")), utility = c("U_1", "U_2", "U_3")) pd2 head(get_history(pd2, stage = 2)$H, 5) # state/Markov type history and action, (H_k,A_k). head(get_history(pd2, stage = 2, full_history = TRUE)$H, 5) # Full history and action, (H_k,A_k). ### Multiple stages: Long data d3 <- sim_multi_stage(5e2, seed = 1) head(d3$stage_data, 10) # constructing policy_data object: pd3 <- policy_data(data = d3$stage_data, baseline_data = d3$baseline_data, type = "long", id = "id", stage = "stage", event = "event", action = "A", utility = "U") pd3 head(get_history(pd3, stage = 3)$H, 5) # state/Markov type history and action, (H_k,A_k). head(get_history(pd3, stage = 2, full_history = TRUE)$H, 5) # Full history and action, (H_k,A_k).
policy_def
returns a function of class policy.
The function input is a policy_data object and it returns a data.table::data.table
with keys id and stage and action variable d.
policy_def(policy_functions, full_history = FALSE, reuse = FALSE, name = NULL)
policy_functions |
A single function/character string or a list of functions/character strings. The list must have the same length as the number of stages. |
full_history |
If TRUE, the full history at each stage is used as input to the policy functions. |
reuse |
If TRUE, the policy function is reused at every stage. |
name |
Character string. |
Function of class "policy". The function takes a policy_data object as input and returns a data.table::data.table with keys id and stage and action variable d.
get_history_names()
, get_history()
.
library("polle") ### Single stage" d1 <- sim_single_stage(5e2, seed=1) pd1 <- policy_data(d1, action="A", covariates=list("Z", "B", "L"), utility="U") pd1 # defining a static policy (A=1): p1_static <- policy_def(1) # applying the policy: p1_static(pd1) # defining a dynamic policy: p1_dynamic <- policy_def( function(Z, L) ((3*Z + 1*L -2.5)>0)*1 ) p1_dynamic(pd1) ### Two stages: d2 <- sim_two_stage(5e2, seed = 1) pd2 <- policy_data(d2, action = c("A_1", "A_2"), covariates = list(L = c("L_1", "L_2"), C = c("C_1", "C_2")), utility = c("U_1", "U_2", "U_3")) # defining a static policy (A=0): p2_static <- policy_def(0, reuse = TRUE) p2_static(pd2) # defining a reused dynamic policy: p2_dynamic_reuse <- policy_def( function(L) (L > 0)*1, reuse = TRUE ) p2_dynamic_reuse(pd2) # defining a dynamic policy for each stage based on the full history: # available variable names at each stage: get_history_names(pd2, stage = 1) get_history_names(pd2, stage = 2) p2_dynamic <- policy_def( policy_functions = list( function(L_1) (L_1 > 0)*1, function(L_1, L_2) (L_1 + L_2 > 0)*1 ), full_history = TRUE ) p2_dynamic(pd2)
library("polle") ### Single stage" d1 <- sim_single_stage(5e2, seed=1) pd1 <- policy_data(d1, action="A", covariates=list("Z", "B", "L"), utility="U") pd1 # defining a static policy (A=1): p1_static <- policy_def(1) # applying the policy: p1_static(pd1) # defining a dynamic policy: p1_dynamic <- policy_def( function(Z, L) ((3*Z + 1*L -2.5)>0)*1 ) p1_dynamic(pd1) ### Two stages: d2 <- sim_two_stage(5e2, seed = 1) pd2 <- policy_data(d2, action = c("A_1", "A_2"), covariates = list(L = c("L_1", "L_2"), C = c("C_1", "C_2")), utility = c("U_1", "U_2", "U_3")) # defining a static policy (A=0): p2_static <- policy_def(0, reuse = TRUE) p2_static(pd2) # defining a reused dynamic policy: p2_dynamic_reuse <- policy_def( function(L) (L > 0)*1, reuse = TRUE ) p2_dynamic_reuse(pd2) # defining a dynamic policy for each stage based on the full history: # available variable names at each stage: get_history_names(pd2, stage = 1) get_history_names(pd2, stage = 2) p2_dynamic <- policy_def( policy_functions = list( function(L_1) (L_1 > 0)*1, function(L_1, L_2) (L_1 + L_2 > 0)*1 ), full_history = TRUE ) p2_dynamic(pd2)
policy_eval()
is used to estimate
the value of a given fixed policy
or a data adaptive policy (e.g. a policy
learned from the data). policy_eval()
is also used to estimate the average
treatment effect among the subjects who would
get the treatment under the policy.
policy_eval( policy_data, policy = NULL, policy_learn = NULL, g_functions = NULL, g_models = g_glm(), g_full_history = FALSE, save_g_functions = TRUE, q_functions = NULL, q_models = q_glm(), q_full_history = FALSE, save_q_functions = TRUE, target = "value", type = "dr", cross_fit_type = "pooled", variance_type = "pooled", M = 1, future_args = list(future.seed = TRUE), name = NULL ) ## S3 method for class 'policy_eval' coef(object, ...) ## S3 method for class 'policy_eval' IC(x, ...) ## S3 method for class 'policy_eval' vcov(object, ...) ## S3 method for class 'policy_eval' print( x, digits = 4L, width = 35L, std.error = TRUE, level = 0.95, p.value = TRUE, ... ) ## S3 method for class 'policy_eval' summary(object, ...) ## S3 method for class 'policy_eval' estimate( x, labels = get_element(x, "name", check_name = FALSE), level = 0.95, ... ) ## S3 method for class 'policy_eval' merge(x, y, ..., paired = TRUE) ## S3 method for class 'policy_eval' x + ...
policy_data |
Policy data object created by |
policy |
Policy object created by |
policy_learn |
Policy learner object created by |
g_functions |
Fitted g-model objects, see nuisance_functions.
Preferably, use |
g_models |
List of action probability models/g-models for each stage
created by |
g_full_history |
If TRUE, the full history is used to fit each g-model. If FALSE, the state/Markov type history is used to fit each g-model. |
save_g_functions |
If TRUE, the fitted g-functions are saved. |
q_functions |
Fitted Q-model objects, see nuisance_functions.
Only valid if the Q-functions are fitted using the same policy.
Preferably, use |
q_models |
Outcome regression models/Q-models created by
|
q_full_history |
Similar to g_full_history. |
save_q_functions |
Similar to save_g_functions. |
target |
Character string. Either "value" or "subgroup". If "value",
the target parameter is the policy value.
If "subgroup", the target parameter
is the average treatment effect among
the subgroup of subjects that would receive
treatment under the policy, see details.
"subgroup" is only implemented for |
type |
Character string. Type of evaluation. Either |
cross_fit_type |
Character string.
Either "stacked", or "pooled", see details. (Only used if |
variance_type |
Character string. Either "pooled" (default),
"stacked" or "complete", see details. (Only used if |
M |
Number of folds for cross-fitting. |
future_args |
Arguments passed to |
name |
Character string. |
object , x , y
|
Objects of class "policy_eval". |
... |
Additional arguments. |
digits |
Integer. Number of printed digits. |
width |
Integer. Width of printed parameter name. |
std.error |
Logical. Should the std.error be printed. |
level |
Numeric. Level of confidence limits. |
p.value |
Logical. Should the p-value for the associated confidence level be printed. |
labels |
Name(s) of the estimate(s). |
paired |
|
Each observation has the sequential form

O = {B, U_1, X_1, A_1, ..., U_K, X_K, A_K, U_{K+1}}

for a possibly stochastic number of stages K. Here B is a vector of baseline covariates,
U_k is the reward at stage k (not influenced by the action A_k),
X_k is a vector of state covariates summarizing the state at stage k, and
A_k is the categorical action within the action set at stage k.
The utility is given by the sum of the rewards, i.e., U = U_1 + ... + U_{K+1}.

A policy is a set of functions

d = {d_1, ..., d_K},

where d_k for k in {1, ..., K} maps {B, X_1, A_1, ..., A_{k-1}, X_k} into the action set.

Recursively define the Q-models (q_models):

Q_K^d(h_K, a_K) = E[U | H_K = h_K, A_K = a_K],
Q_k^d(h_k, a_k) = E[Q_{k+1}^d(H_{k+1}, d_{k+1}(H_{k+1})) | H_k = h_k, A_k = a_k], for k = K-1, ..., 1.

If q_full_history = TRUE, H_k is the full history {B, X_1, A_1, ..., A_{k-1}, X_k},
and if q_full_history = FALSE, H_k is the state/Markov-type history.

The g-models (g_models) are defined as

g_k(h_k, a_k) = P(A_k = a_k | H_k = h_k).

If g_full_history = TRUE, H_k is the full history {B, X_1, A_1, ..., A_{k-1}, X_k},
and if g_full_history = FALSE, H_k is the state/Markov-type history. Furthermore, if
g_full_history = FALSE and g_models is a single model, it is assumed that
g_1(h_1, a_1) = ... = g_K(h_K, a_K).
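As an illustration of the two history options and the shared g-model assumption above, the following sketch evaluates the same static policy with a single pooled g-model and, alternatively, with one g-model per stage fitted on the full history. The two-stage simulation, the model formulas, and the names pe_pooled/pe_stagewise are illustrative choices, not package defaults:

library("polle")
d <- sim_two_stage(5e2, seed = 1)
pd <- policy_data(d,
                  action = c("A_1", "A_2"),
                  covariates = list(L = c("L_1", "L_2"), C = c("C_1", "C_2")),
                  utility = c("U_1", "U_2", "U_3"))
p <- policy_def(1, reuse = TRUE, name = "A=1")

# single g-model fitted on the pooled state/Markov-type history (g_1 = g_2 assumed):
pe_pooled <- policy_eval(pd, policy = p,
                         g_models = g_glm(), q_models = q_glm(),
                         name = "pooled g")

# one g-model per stage, each fitted on the full history:
pe_stagewise <- policy_eval(pd, policy = p,
                            g_models = list(g_glm(~ C_1), g_glm(~ C_1 + C_2)),
                            g_full_history = TRUE,
                            q_models = q_glm(),
                            name = "stage-wise g")

# comparing the two value estimates:
merge(pe_pooled, pe_stagewise)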
If target = "value"
and type = "or"
policy_eval()
returns the empirical estimate of
the value (coef
):
If target = "value"
and type = "ipw"
policy_eval()
returns the empirical estimates of
the value (coef
) and influence curve (IC
):
If target = "value"
and
type = "dr"
policy_eval
returns the empirical estimates of
the value (coef
) and influence curve (IC
):
where
If target = "subgroup"
, type = "dr"
, K = 1
,
and ,
policy_eval()
returns the empirical estimates of the subgroup average
treatment effect (coef
) and influence curve (IC
):
Applying -fold cross-fitting using the {M} argument, let
If target = "subgroup"
, type = "dr"
, K = 1
,
, and
cross_fit_type = "pooled"
,
policy_eval()
returns the estimate
If
cross_fit_type = "stacked"
the returned estimate is
where for ease of notation we let
the integer be the number of oberservations in each fold.
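To make the subgroup target concrete, here is a hedged single-stage sketch. The blip learner, the model formulas, and the object names are illustrative choices rather than package defaults:

library("polle")
d <- sim_single_stage(5e2, seed = 1)
pd <- policy_data(d, action = "A", covariates = list("Z", "B", "L"), utility = "U")

# learn a policy via doubly robust blip-learning:
pl <- policy_learn(type = "blip",
                   control = control_blip(blip_models = q_glm(~ Z + L)))

# estimate the average treatment effect among the subjects
# who would receive treatment under the learned policy:
pe_sub <- policy_eval(pd,
                      policy_learn = pl,
                      g_models = g_glm(), q_models = q_glm(),
                      target = "subgroup", type = "dr")
pe_sub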
policy_eval()
returns an object of class "policy_eval".
The object is a list containing the following elements:
coef |
Numeric vector. The estimated target parameter: policy value or subgroup average treatment effect. |
IC |
Numeric matrix. Estimated influence curve associated with
|
type |
Character string. The type of evaluation ("dr", "ipw", "or"). |
target |
Character string. The target parameter ("value" or "subgroup") |
id |
Character vector. The IDs of the observations. |
name |
Character vector. Names for each element in |
coef_ipw |
(only if |
coef_or |
(only if |
policy_actions |
data.table::data.table with keys id and stage. Actions associated with the policy for every observation and stage. |
policy_object |
(only if |
g_functions |
(only if |
g_values |
The fitted g-function values. |
q_functions |
(only if |
q_values |
The fitted Q-function values. |
Z |
(only if |
subgroup_indicator |
(only if |
cross_fits |
(only if |
folds |
(only if |
cross_fit_type |
Character string. |
variance_type |
Character string. |
The following S3 generic functions are available for an object of
class policy_eval
:
get_g_functions()
Extract the fitted g-functions.
get_q_functions()
Extract the fitted Q-functions.
get_policy()
Extract the fitted policy object.
get_policy_functions()
Extract the fitted policy function for a given stage.
get_policy_actions()
Extract the (fitted) policy actions.
plot.policy_eval()
Plot diagnostics.
van der Laan, Mark J., and Alexander R. Luedtke.
"Targeted learning of the mean outcome under an optimal dynamic treatment rule."
Journal of causal inference 3.1 (2015): 61-95.
doi:10.1515/jci-2013-0022
Tsiatis, Anastasios A., et al. Dynamic
treatment regimes: Statistical methods for precision medicine. Chapman and
Hall/CRC, 2019. doi:10.1201/9780429192692.
Victor Chernozhukov, Denis
Chetverikov, Mert Demirer, Esther Duflo, Christian Hansen, Whitney Newey,
James Robins, Double/debiased machine learning for treatment and structural
parameters, The Econometrics Journal, Volume 21, Issue 1, 1 February 2018,
Pages C1–C68, doi:10.1111/ectj.12097.
lava::IC, lava::estimate.default.
library("polle") ### Single stage: d1 <- sim_single_stage(5e2, seed=1) pd1 <- policy_data(d1, action = "A", covariates = list("Z", "B", "L"), utility = "U") pd1 # defining a static policy (A=1): pl1 <- policy_def(1) # evaluating the policy: pe1 <- policy_eval(policy_data = pd1, policy = pl1, g_models = g_glm(), q_models = q_glm(), name = "A=1 (glm)") # summarizing the estimated value of the policy: # (equivalent to summary(pe1)): pe1 coef(pe1) # value coefficient sqrt(vcov(pe1)) # value standard error # getting the g-function and Q-function values: head(predict(get_g_functions(pe1), pd1)) head(predict(get_q_functions(pe1), pd1)) # getting the fitted influence curve (IC) for the value: head(IC(pe1)) # evaluating the policy using random forest nuisance models: set.seed(1) pe1_rf <- policy_eval(policy_data = pd1, policy = pl1, g_models = g_rf(), q_models = q_rf(), name = "A=1 (rf)") # merging the two estimates (equivalent to pe1 + pe1_rf): (est1 <- merge(pe1, pe1_rf)) coef(est1) head(IC(est1)) ### Two stages: d2 <- sim_two_stage(5e2, seed=1) pd2 <- policy_data(d2, action = c("A_1", "A_2"), covariates = list(L = c("L_1", "L_2"), C = c("C_1", "C_2")), utility = c("U_1", "U_2", "U_3")) pd2 # defining a policy learner based on cross-fitted doubly robust Q-learning: pl2 <- policy_learn( type = "drql", control = control_drql(qv_models = list(q_glm(~C_1), q_glm(~C_1+C_2))), full_history = TRUE, L = 2) # number of folds for cross-fitting # evaluating the policy learner using 2-fold cross fitting: pe2 <- policy_eval(type = "dr", policy_data = pd2, policy_learn = pl2, q_models = q_glm(), g_models = g_glm(), M = 2, # number of folds for cross-fitting name = "drql") # summarizing the estimated value of the policy: pe2 # getting the cross-fitted policy actions: head(get_policy_actions(pe2))
library("polle") ### Single stage: d1 <- sim_single_stage(5e2, seed=1) pd1 <- policy_data(d1, action = "A", covariates = list("Z", "B", "L"), utility = "U") pd1 # defining a static policy (A=1): pl1 <- policy_def(1) # evaluating the policy: pe1 <- policy_eval(policy_data = pd1, policy = pl1, g_models = g_glm(), q_models = q_glm(), name = "A=1 (glm)") # summarizing the estimated value of the policy: # (equivalent to summary(pe1)): pe1 coef(pe1) # value coefficient sqrt(vcov(pe1)) # value standard error # getting the g-function and Q-function values: head(predict(get_g_functions(pe1), pd1)) head(predict(get_q_functions(pe1), pd1)) # getting the fitted influence curve (IC) for the value: head(IC(pe1)) # evaluating the policy using random forest nuisance models: set.seed(1) pe1_rf <- policy_eval(policy_data = pd1, policy = pl1, g_models = g_rf(), q_models = q_rf(), name = "A=1 (rf)") # merging the two estimates (equivalent to pe1 + pe1_rf): (est1 <- merge(pe1, pe1_rf)) coef(est1) head(IC(est1)) ### Two stages: d2 <- sim_two_stage(5e2, seed=1) pd2 <- policy_data(d2, action = c("A_1", "A_2"), covariates = list(L = c("L_1", "L_2"), C = c("C_1", "C_2")), utility = c("U_1", "U_2", "U_3")) pd2 # defining a policy learner based on cross-fitted doubly robust Q-learning: pl2 <- policy_learn( type = "drql", control = control_drql(qv_models = list(q_glm(~C_1), q_glm(~C_1+C_2))), full_history = TRUE, L = 2) # number of folds for cross-fitting # evaluating the policy learner using 2-fold cross fitting: pe2 <- policy_eval(type = "dr", policy_data = pd2, policy_learn = pl2, q_models = q_glm(), g_models = g_glm(), M = 2, # number of folds for cross-fitting name = "drql") # summarizing the estimated value of the policy: pe2 # getting the cross-fitted policy actions: head(get_policy_actions(pe2))
policy_learn()
is used to specify a policy
learning method (Q-learning,
doubly robust Q-learning, policy tree
learning and outcome weighted learning).
Evaluating the policy learner returns a policy object.
policy_learn( type = "ql", control = list(), alpha = 0, threshold = NULL, full_history = FALSE, L = 1, cross_fit_g_models = TRUE, save_cross_fit_models = FALSE, future_args = list(future.seed = TRUE), name = type ) ## S3 method for class 'policy_learn' print(x, ...) ## S3 method for class 'policy_object' print(x, ...)
policy_learn( type = "ql", control = list(), alpha = 0, threshold = NULL, full_history = FALSE, L = 1, cross_fit_g_models = TRUE, save_cross_fit_models = FALSE, future_args = list(future.seed = TRUE), name = type ) ## S3 method for class 'policy_learn' print(x, ...) ## S3 method for class 'policy_object' print(x, ...)
type |
Type of policy learner method:
|
control |
List of control arguments.
Values (and default values) are set using
|
alpha |
Probability threshold for determining realistic actions. |
threshold |
Numeric vector, thresholds for not choosing the reference action at stage 1. |
full_history |
If |
L |
Number of folds for cross-fitting nuisance models. |
cross_fit_g_models |
If |
save_cross_fit_models |
If |
future_args |
Arguments passed to |
name |
Character string. |
x |
Object of class "policy_object" or "policy_learn". |
... |
Additional arguments passed to print. |
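As a rough sketch of how the alpha and L arguments above interact with policy_eval(), consider the following single-stage example; the 0.05 cut-off, the drql specification, and the model formulas are illustrative assumptions, not defaults:

library("polle")
d <- sim_single_stage(5e2, seed = 1)
pd <- policy_data(d, action = "A", covariates = list("Z", "B", "L"), utility = "U")

# learner restricted to realistic actions (propensity >= 0.05),
# with 2-fold cross-fitting of the nuisance models:
pl_alpha <- policy_learn(
  type = "drql",
  control = control_drql(qv_models = q_glm(~ Z + L)),
  alpha = 0.05,
  L = 2
)

pe <- policy_eval(pd, policy_learn = pl_alpha,
                  g_models = g_glm(), q_models = q_glm())
pe
get_policy_object(pe)$alpha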
Function of inherited class "policy_learn"
.
Evaluating the function on a policy_data object returns an object of
class policy_object. A policy object is a list containing all or
some of the following elements:
q_functions
Fitted Q-functions. Object of class "nuisance_functions".
g_functions
Fitted g-functions. Object of class "nuisance_functions".
action_set
Sorted character vector describing the action set, i.e., the possible actions at each stage.
alpha
Numeric. Probability threshold to determine realistic actions.
K
Integer. Maximal number of stages.
qv_functions
(only if type = "drql"
) Fitted
V-restricted Q-functions. Contains a fitted model for each stage and action.
ptl_objects
(only if type = "ptl"
) Fitted V-restricted
policy trees. Contains a policytree::policy_tree for each stage.
ptl_designs
(only if type = "ptl"
) Specification of the
V-restricted design matrix for each stage.
The following S3 generic functions are available for an object of class "policy_object":
get_g_functions()
Extract the fitted g-functions.
get_q_functions()
Extract the fitted Q-functions.
get_policy()
Extract the fitted policy object.
get_policy_functions()
Extract the fitted policy function for a given stage.
get_policy_actions()
Extract the (fitted) policy actions.
Doubly Robust Q-learning (type = "drql"
): Luedtke, Alexander R., and
Mark J. van der Laan. "Super-learning of an optimal dynamic treatment rule."
The international journal of biostatistics 12.1 (2016): 305-332.
doi:10.1515/ijb-2015-0052.
Policy Tree Learning (type = "ptl"
): Zhou, Zhengyuan, Susan Athey,
and Stefan Wager. "Offline multi-action policy learning: Generalization and
optimization." Operations Research (2022). doi:10.1287/opre.2022.2271.
(Augmented) Outcome Weighted Learning: Liu, Ying, et al. "Augmented
outcome‐weighted learning for estimating optimal dynamic treatment regimens."
Statistics in medicine 37.26 (2018): 3776-3788. doi:10.1002/sim.7844.
library("polle") ### Two stages: d <- sim_two_stage(5e2, seed=1) pd <- policy_data(d, action = c("A_1", "A_2"), baseline = c("BB"), covariates = list(L = c("L_1", "L_2"), C = c("C_1", "C_2")), utility = c("U_1", "U_2", "U_3")) pd ### V-restricted (Doubly Robust) Q-learning # specifying the learner: pl <- policy_learn( type = "drql", control = control_drql(qv_models = list(q_glm(formula = ~ C_1 + BB), q_glm(formula = ~ L_1 + BB))), full_history = TRUE ) # evaluating the learned policy pe <- policy_eval(policy_data = pd, policy_learn = pl, q_models = q_glm(), g_models = g_glm()) pe # getting the policy object: po <- get_policy_object(pe) # inspecting the fitted QV-model for each action strata at stage 1: po$qv_functions$stage_1 head(get_policy(pe)(pd))
library("polle") ### Two stages: d <- sim_two_stage(5e2, seed=1) pd <- policy_data(d, action = c("A_1", "A_2"), baseline = c("BB"), covariates = list(L = c("L_1", "L_2"), C = c("C_1", "C_2")), utility = c("U_1", "U_2", "U_3")) pd ### V-restricted (Doubly Robust) Q-learning # specifying the learner: pl <- policy_learn( type = "drql", control = control_drql(qv_models = list(q_glm(formula = ~ C_1 + BB), q_glm(formula = ~ L_1 + BB))), full_history = TRUE ) # evaluating the learned policy pe <- policy_eval(policy_data = pd, policy_learn = pl, q_models = q_glm(), g_models = g_glm()) pe # getting the policy object: po <- get_policy_object(pe) # inspecting the fitted QV-model for each action strata at stage 1: po$qv_functions$stage_1 head(get_policy(pe)(pd))
predict()
returns the fitted values of the g-functions and
Q-functions when applied to a (new) policy data object.
## S3 method for class 'nuisance_functions'
predict(object, new_policy_data, ...)
object |
Object of class "nuisance_functions". Either |
new_policy_data |
Policy data object created by |
... |
Additional arguments. |
data.table::data.table with keys id
and stage
and variables g_a
or Q_a
for
each action a in the action set.
library("polle") ### Single stage: d <- sim_single_stage(5e2, seed=1) pd <- policy_data(d, action="A", covariates=list("Z", "B", "L"), utility="U") pd # defining a static policy (A=1): pl <- policy_def(1, name = "A=1") # doubly robust evaluation of the policy: pe <- policy_eval(policy_data = pd, policy = pl, g_models = g_glm(), q_models = q_glm()) # summarizing the estimated value of the policy: pe # getting the fitted g-function values: head(predict(get_g_functions(pe), pd)) # getting the fitted Q-function values: head(predict(get_q_functions(pe), pd))
library("polle") ### Single stage: d <- sim_single_stage(5e2, seed=1) pd <- policy_data(d, action="A", covariates=list("Z", "B", "L"), utility="U") pd # defining a static policy (A=1): pl <- policy_def(1, name = "A=1") # doubly robust evaluation of the policy: pe <- policy_eval(policy_data = pd, policy = pl, g_models = g_glm(), q_models = q_glm()) # summarizing the estimated value of the policy: pe # getting the fitted g-function values: head(predict(get_g_functions(pe), pd)) # getting the fitted Q-function values: head(predict(get_q_functions(pe), pd))
Use q_glm()
, q_glmnet()
, q_rf()
, q_sl()
, and q_xgboost()
to construct
an outcome regression model/Q-model object.
The constructors are used as input for policy_eval()
and policy_learn()
.
q_glm(
  formula = ~A * .,
  family = gaussian(),
  model = FALSE,
  na.action = na.pass,
  ...
)

q_glmnet(
  formula = ~A * .,
  family = "gaussian",
  alpha = 1,
  s = "lambda.min",
  ...
)

q_rf(
  formula = ~.,
  num.trees = c(250, 500, 750),
  mtry = NULL,
  cv_args = list(nfolds = 3, rep = 1),
  ...
)

q_sl(
  formula = ~.,
  SL.library = c("SL.mean", "SL.glm"),
  env = parent.frame(),
  onlySL = TRUE,
  discreteSL = FALSE,
  ...
)

q_xgboost(
  formula = ~.,
  objective = "reg:squarederror",
  params = list(),
  nrounds,
  max_depth = 6,
  eta = 0.3,
  nthread = 1,
  cv_args = list(nfolds = 3, rep = 1)
)
formula |
An object of class formula specifying the design matrix for
the outcome regression model/Q-model at the given stage. The action at the
given stage is always denoted 'A', see examples. Use
|
family |
A description of the error distribution and link function to be used in the model. |
model |
(Only used by |
na.action |
(Only used by |
... |
Additional arguments passed to |
alpha |
(Only used by |
s |
(Only used by |
num.trees |
(Only used by |
mtry |
(Only used by |
cv_args |
(Only used by |
SL.library |
(Only used by |
env |
(Only used by |
onlySL |
(Only used by |
discreteSL |
(Only used by |
objective |
(Only used by |
params |
(Only used by |
nrounds |
(Only used by |
max_depth |
(Only used by |
eta |
(Only used by |
nthread |
(Only used by |
q_glm() is a wrapper of glm() (generalized linear model).

q_glmnet() is a wrapper of glmnet::glmnet() (generalized linear model via
penalized maximum likelihood).

q_rf() is a wrapper of ranger::ranger() (random forest).
When multiple hyper-parameters are given, the
model with the lowest cross-validation error is selected.

q_sl() is a wrapper of SuperLearner::SuperLearner (ensemble model).

q_xgboost() is a wrapper of xgboost::xgboost.
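As a brief sketch of the cross-validated hyper-parameter selection mentioned for q_rf(), the following example passes two candidate values of num.trees; the chosen grid is an arbitrary assumption, and the ranger package must be installed:

library("polle")
d <- sim_single_stage(5e2, seed = 1)
pd <- policy_data(d, action = "A", covariates = list("Z", "B", "L"), utility = "U")

# two candidate forests; the one with the lowest cross-validation error is used:
pe_rf <- policy_eval(policy_data = pd,
                     policy = policy_def(1, name = "A=1"),
                     g_models = g_glm(),
                     q_models = q_rf(num.trees = c(250, 500),
                                     cv_args = list(nfolds = 3, rep = 1)))
pe_rf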
q_model object: function with arguments 'AH' (combined action and history matrix) and 'V_res' (residual value/expected utility).
get_history_names()
, get_q_functions()
.
library("polle") ### Single stage case d1 <- sim_single_stage(5e2, seed=1) pd1 <- policy_data(d1, action="A", covariates=list("Z", "B", "L"), utility="U") pd1 # available history variable names for the outcome regression: get_history_names(pd1) # evaluating the static policy a=1 using inverse # propensity weighting based on the given Q-model: pe1 <- policy_eval(type = "or", policy_data = pd1, policy = policy_def(1, name = "A=1"), q_model = q_glm(formula = ~A*.)) pe1 # getting the fitted Q-function values head(predict(get_q_functions(pe1), pd1)) ### Two stages: d2 <- sim_two_stage(5e2, seed=1) pd2 <- policy_data(d2, action = c("A_1", "A_2"), covariates = list(L = c("L_1", "L_2"), C = c("C_1", "C_2")), utility = c("U_1", "U_2", "U_3")) pd2 # available full history variable names at each stage: get_history_names(pd2, stage = 1) get_history_names(pd2, stage = 2) # evaluating the static policy a=1 using outcome # regression based on a glm model for each stage: pe2 <- policy_eval(type = "or", policy_data = pd2, policy = policy_def(1, reuse = TRUE, name = "A=1"), q_model = list(q_glm(~ A * L_1), q_glm(~ A * (L_1 + L_2))), q_full_history = TRUE) pe2 # getting the fitted Q-function values head(predict(get_q_functions(pe2), pd2))
library("polle") ### Single stage case d1 <- sim_single_stage(5e2, seed=1) pd1 <- policy_data(d1, action="A", covariates=list("Z", "B", "L"), utility="U") pd1 # available history variable names for the outcome regression: get_history_names(pd1) # evaluating the static policy a=1 using inverse # propensity weighting based on the given Q-model: pe1 <- policy_eval(type = "or", policy_data = pd1, policy = policy_def(1, name = "A=1"), q_model = q_glm(formula = ~A*.)) pe1 # getting the fitted Q-function values head(predict(get_q_functions(pe1), pd1)) ### Two stages: d2 <- sim_two_stage(5e2, seed=1) pd2 <- policy_data(d2, action = c("A_1", "A_2"), covariates = list(L = c("L_1", "L_2"), C = c("C_1", "C_2")), utility = c("U_1", "U_2", "U_3")) pd2 # available full history variable names at each stage: get_history_names(pd2, stage = 1) get_history_names(pd2, stage = 2) # evaluating the static policy a=1 using outcome # regression based on a glm model for each stage: pe2 <- policy_eval(type = "or", policy_data = pd2, policy = policy_def(1, reuse = TRUE, name = "A=1"), q_model = list(q_glm(~ A * L_1), q_glm(~ A * (L_1 + L_2))), q_full_history = TRUE) pe2 # getting the fitted Q-function values head(predict(get_q_functions(pe2), pd2))
Simulate Multi-Stage Data
sim_multi_stage(
  n,
  par = list(
    tau = 10,
    gamma = c(0, -0.2, 0.3),
    alpha = c(0, 0.5, 0.2, -0.5, 0.4),
    beta = c(3, -0.5, -0.5),
    psi = 1,
    xi = 0.3
  ),
  a = function(t, x, beta, ...) {
    prob <- lava::expit(beta[1] + (beta[2] * t^2) + (beta[3] * x))
    stats::rbinom(n = 1, size = 1, prob = prob)
  },
  seed = NULL
)
n |
Number of observations. |
par |
Named list with distributional parameters.
|
a |
Function used to specify the action/treatment at every stage. |
seed |
Integer. |
sim_multi_stage
samples n iid observations from a multi-stage model
with a possibly stochastic number of stages.
The distribution is governed by the parameters in par;
in particular, psi is the minimum increment between stage times.
list with elements stage_data
(data.table::data.table) and
baseline_data
(data.table::data.table).
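A minimal sketch of simulating multi-stage data and inspecting the two returned components; the chosen sample size is arbitrary:

library("polle")
library("data.table")
sim <- sim_multi_stage(2e2, seed = 1)

# long-format stage data and baseline data:
head(sim$stage_data)
head(sim$baseline_data)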
Simulate Single-Stage Data
sim_single_stage(
  n = 10000,
  par = c(k = 0.1, d = 0.5, a = 1, b = -2.5, c = 3, p = 0.3),
  action_model = function(Z, L, B, k, d) {
    k * (Z + L - 1) * Z^(-2) + d * (B == 1)
  },
  utility_model = function(Z, L, A, a, b, c) {
    Z + L + A * (c * Z + a * L + b)
  },
  seed = NULL,
  return_model = FALSE,
  ...
)
n |
Number of observations. |
par |
Named vector with distributional parameters.
|
action_model |
Function used to specify the action/treatment probability (logit link). |
utility_model |
Function used to specify the conditional mean utility. |
seed |
Integer. |
return_model |
If TRUE, the lava::lvm model is returned. |
... |
Additional arguments passed to |
sim_single_stage
samples n iid observations (Z, L, B, A, U).
The treatment probability P(A = 1 | Z, L, B) is given by action_model on the logit scale,
and the conditional mean utility E[U | Z, L, B, A] is given by utility_model.
data.frame with n rows and columns Z, L, B, A, and U.
Simulate Single-Stage Multi-Action Data
sim_single_stage_multi_actions(n = 1000, seed = NULL)
n |
Number of observations. |
seed |
Integer. |
sim_single_stage_multi_actions
samples n iid observations (z, x, a, u)
from a single-stage model in which the action a takes more than two values.
data.frame with n rows and columns z, x, a, and u.
Simulate Two-Stage Data
sim_two_stage(
  n = 10000,
  par = c(gamma = 0.5, beta = 1),
  seed = NULL,
  action_model_1 = function(C_1, beta, ...)
    stats::rbinom(n = NROW(C_1), size = 1, prob = lava::expit(beta * C_1)),
  action_model_2 = function(C_2, beta, ...)
    stats::rbinom(n = NROW(C_1), size = 1, prob = lava::expit(beta * C_2)),
  deterministic_rewards = FALSE
)
n |
Number of observations. |
par |
Named vector with distributional parameters.
|
seed |
Integer. |
action_model_1 |
Function used to specify the action/treatment at stage 1. |
action_model_2 |
Function used to specify the action/treatment at stage 2. |
deterministic_rewards |
Logical. If TRUE, the deterministic reward contributions are returned as well (columns U_1_A0, U_1_A1, U_2_A0, U_2_A1). |
sim_two_stage
samples n iid observations.
The baseline covariate BB is a random categorical variable with levels
group1, group2, and group3.
The actions A_1 and A_2 are drawn according to action_model_1 and action_model_2,
and the utility is the sum of the rewards U_1 + U_2 + U_3.
data.table::data.table with n rows and columns B, BB, L_1, C_1, A_1, L_2, C_2, A_2, L_3, U_1, U_2, U_3 (,U_1_A0, U_1_A1, U_2_A0, U_2_A1).
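A minimal sketch showing the effect of deterministic_rewards; the sample size is arbitrary, and the four extra columns hold the action-specific reward contributions:

library("polle")
library("data.table")
d <- sim_two_stage(2e2, seed = 1, deterministic_rewards = TRUE)
colnames(d)

# action-specific reward contributions at stage 1:
head(d[, .(U_1, U_1_A0, U_1_A1)])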
Simulate Two-Stage Multi-Action Data
sim_two_stage_multi_actions(
  n = 1000,
  par = list(gamma = 0.5, beta = 1, prob = c(0.2, 0.4, 0.4)),
  seed = NULL,
  action_model_1 = function(C_1, beta, ...)
    stats::rbinom(n = NROW(C_1), size = 1, prob = lava::expit(beta * C_1))
)
n |
Number of observations. |
par |
Named vector with distributional parameters.
|
seed |
Integer. |
action_model_1 |
Function used to specify the dichotomous action/treatment at stage 1. |
sim_two_stage_multi_actions
samples n iid observations.
As in sim_two_stage, the baseline covariate BB is a random categorical variable with levels
group1, group2, and group3.
The dichotomous action at stage 1 is drawn according to action_model_1, while the action
at stage 2 takes more than two values with probabilities given by prob.
The utility is the sum of the rewards U_1 + U_2 + U_3.
data.table::data.table with n rows and columns B, BB, L_1, C_1, A_1, L_2, C_2, A_2, L_3, U_1, U_2, U_3.
subset_id
returns a policy data object containing the given IDs.
subset_id(object, id, preserve_action_set = TRUE)
object |
Object of class policy_data. |
id |
Character vector of IDs. |
preserve_action_set |
If TRUE, the action sets must be preserved. |
Object of class policy_data.
library("polle") ### Single stage: d <- sim_single_stage(5e2, seed=1) # constructing policy_data object: pd <- policy_data(d, action="A", covariates=list("Z", "B", "L"), utility="U") pd # getting the observation IDs: get_id(pd)[1:10] # subsetting on IDs: pdsub <- subset_id(pd, id = 250:500) pdsub get_id(pdsub)[1:10]
library("polle") ### Single stage: d <- sim_single_stage(5e2, seed=1) # constructing policy_data object: pd <- policy_data(d, action="A", covariates=list("Z", "B", "L"), utility="U") pd # getting the observation IDs: get_id(pd)[1:10] # subsetting on IDs: pdsub <- subset_id(pd, id = 250:500) pdsub get_id(pdsub)[1:10]