Title: | Propensity Score Weighting for Causal Inference with Observational Studies and Randomized Trials |
---|---|
Description: | Supports propensity score weighting analysis of observational studies and randomized trials. Enables the estimation and inference of average causal effects with binary and multiple treatments using overlap weights (ATO), inverse probability of treatment weights (ATE), average treatment effect among the treated weights (ATT), matching weights (ATM) and entropy weights (ATEN), with and without propensity score trimming. These weights are members of the family of balancing weights introduced in Li, Morgan and Zaslavsky (2018) <doi:10.1080/01621459.2016.1260466> and Li and Li (2019) <doi:10.1214/19-AOAS1282>. |
Authors: | Tianhui Zhou [aut], Guangyu Tong [aut], Fan Li [aut], Laine Thomas [aut], Fan Li [aut], Yukang Zeng [cre] |
Maintainer: | Yukang Zeng <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.2.0 |
Built: | 2025-02-23 04:02:36 UTC |
Source: | https://github.com/thuizhou/psweight |
The coef
method for class "PSweight"
## S3 method for class 'PSweight' coef(object, ...)
## S3 method for class 'PSweight' coef(object, ...)
object |
an object for which the extraction of model coefficients is meaningful. |
... |
other arguments. |
The output from coef
This is a real observational study with binary and multiple treatment groups to illustrate the utility of PSweight functions.
data(NCDS)
data(NCDS)
A data frame with 3642 rows and 16 columns.
An dataset from the National Child Development Survey (NCDS) of the United Kingdom (UK). This dataset is obtained through the CC0 waiver from the work by Battistin and Sianesi (2011). For illustration, missing entries are imputed once with Multiple Imputation by Chained Equations (MICE).
This data frame contains the following columns:
white: self-identified as white.
wage: gross hourly wage in pound in log scale.
Dany: whether received any education before.
Dmult: three levels of educational attainment.
maemp: employment status of mother.
scht: school type.
qmab: math score at age 7.
qmab2: math score at age 11.
qvab: reading score at age 7.
qvab2: reading score at age 11.
paed_u: father's years of education.
maed_u: mother's years of education.
age_pa: age of father.
age_ma: age of mother.
sub_u: number of siblings.
wagebin: dichotomized wage obtained with a cutoff of average hourly wage.
https://cls.ucl.ac.uk/cls-studies/1958-national-child-development-study/
Battistin E, Sianesi B. (2011). Misclassified Treatment Status and Treatment Effects: an Application to Returns to Education in the United Kindom. Review of Economics and Statistics, 93(2), 495-509.
Battistin E, Sianesi B. (2012). Replication data for: Misclassified Treatment Status and Treatment Effects: an Application to Returns to Education in the United Kindom. URL https://doi.org.10.7910/DVN/EPCYUL.
data("NCDS")
data("NCDS")
The function OUTmethod
is an internal function to estimate the potential outcomes given a specified model through formula.
It is built into function PSweight
, and is used for constructing the augmented estimators.
OUTmethod( out.formula = out.formula, y = y, out.method = "glm", family = "gaussian", datain = datain, dataout = dataout, out.control = list() )
OUTmethod( out.formula = out.formula, y = y, out.method = "glm", family = "gaussian", datain = datain, dataout = dataout, out.control = list() )
out.formula |
an object of class |
y |
a vector of the observed outcome in the training data ( |
out.method |
a character to specify the method for estimating the outcome regression model. |
family |
a description of the error distribution and link function to be used in the outcome model. Supported distributional families include
|
datain |
The training data for the outcome model. In the context of |
dataout |
The prediction data for the outcome model. In the context of |
out.control |
a list to specify additional options when |
A typical form for out.formula
is y ~ terms
where y
is the outcome
variable and terms
is a series of terms which specifies linear predictors (on the link function scale). out.formula
by default specifies generalized
linear model given the gaussian error through the default arguments method = "glm"
and family = "gaussian"
. It fits the logistic regression when family = "binomal"
,and poisson
regression when family = "poisson"
. The argument out.method
allows user to choose
model other than glm to fit the outcome regression models for constructing the augmented estimator. We have included gbm
and SuperLearner
as alternative machine learning estimators.
Additional argument in them can be supplied through the ...
argument. Please refer to the user manual of the gbm
and SuperLearner
packages for all the
allowed arguments.
m.est
a vector of predicted outcome on the dataout
.
gamma.h
estimated coefficient of the outcome model when method = "glm"
.
#' the outcome model out.formula <- Y~cov1+cov2+cov3+cov4+cov5+cov6 y <- psdata$Y #train on model with treatment group 1 datain <- psdata[psdata$trt==1, ] outfit <- OUTmethod(out.formula = out.formula, y=y, datain = datain, dataout = psdata)
#' the outcome model out.formula <- Y~cov1+cov2+cov3+cov4+cov5+cov6 y <- psdata$Y #train on model with treatment group 1 datain <- psdata[psdata$trt==1, ] outfit <- OUTmethod(out.formula = out.formula, y=y, datain = datain, dataout = psdata)
Summarize the SumStat object, generate histogram or density of estimated propensity scores and plot the balance statistics under weighting versus no weighting.
## S3 method for class 'SumStat' plot( x, type = "balance", weighted.var = TRUE, threshold = 0.1, metric = "ASD", breaks = 50, ... )
## S3 method for class 'SumStat' plot( x, type = "balance", weighted.var = TRUE, threshold = 0.1, metric = "ASD", breaks = 50, ... )
x |
a |
type |
a character indicating the type of plot to produce, including histogram of estimated propensity scores ( |
weighted.var |
logical. Indicating whether weighted variance should be used in calculating the balance statistics. Default is |
threshold |
an optional numeric value indicating the balance threshold for the balance plot. Default is 0.1. Only valid when |
metric |
a character indicating the type of metric used in balance plot. Only |
breaks |
a single number giving the number of cells for the histogram. Default is 50. |
... |
further arguments passed to or from other methods. |
For the balance plot, a vertical line at threshold
is used to define balance on covariates.
The default value is threshold = 0.1
following Austin and Stuart (2015). If more than 2 treatments
are considered, only density of the estimated generalized propensity scores will be produced, regardless of
whether type = "density"
or type = "hist"
.
Plot of the indicated type.
Austin, P.C. and Stuart, E.A. (2015). Moving towards best practice when using inverse probability of treatment weighting (IPTW) using the propensity score to estimate causal treatment effects in observational studies. Statistics in Medicine, 34(28), 3661-3679.
data("psdata") ps.formula<-trt~cov1+cov2+cov3+cov4+cov5+cov6 msstat <- SumStat(ps.formula, trtgrp="2", data=subset(psdata,trt>1), weight=c("IPW","overlap","treated","entropy","matching")) plot(msstat, type="hist") plot(msstat, type="balance", weighted.var=TRUE, threshold=0.1, metric="ASD")
data("psdata") ps.formula<-trt~cov1+cov2+cov3+cov4+cov5+cov6 msstat <- SumStat(ps.formula, trtgrp="2", data=subset(psdata,trt>1), weight=c("IPW","overlap","treated","entropy","matching")) plot(msstat, type="hist") plot(msstat, type="balance", weighted.var=TRUE, threshold=0.1, metric="ASD")
The print
method for class "PStrim"
## S3 method for class 'PStrim' print(x, ...)
## S3 method for class 'PStrim' print(x, ...)
x |
an object used to select a method. |
... |
further arguments passed to or from other methods. |
The output from print
The print
method for class "PSweight"
## S3 method for class 'PSweight' print(x, ...)
## S3 method for class 'PSweight' print(x, ...)
x |
an object used to select a method. |
... |
further arguments passed to or from other methods. |
The output from print
The print
method for class "PSweightsum"
## S3 method for class 'PSweightsum' print(x, ...)
## S3 method for class 'PSweightsum' print(x, ...)
x |
an object used to select a method. |
... |
further arguments passed to or from other methods. |
The output from print
The print
method for class "SumStat"
## S3 method for class 'SumStat' print(x, ...)
## S3 method for class 'SumStat' print(x, ...)
x |
an object used to select a method. |
... |
further arguments passed to or from other methods. |
The output from print
The print
method for class "SumSumStat"
## S3 method for class 'SumSumStat' print(x, ...)
## S3 method for class 'SumSumStat' print(x, ...)
x |
an object used to select a method. |
... |
further arguments passed to or from other methods. |
The output from print
This is a simulated observational study with three treatment groups to illustated the ulity of PSweight.
data(psdata)
data(psdata)
A data frame with 1500 rows and 8 columns.
The simulated dataset includes 1500 rows, with each row represents information recorded from each individual. There are 8 variables (columns). The treatment is the variable trt, which has three treatment arms. The outcome of interest is variable Y. cov1-cov6 are pre-treatment covariates among which cov1-cov5 are continous, and cov6 is binary.
data("psdata")
data("psdata")
This is a simulated observational study with three treatment groups to illustated the ulity of PSweight.
data(psdata_cl)
data(psdata_cl)
A data frame with 1500 rows and 9 columns.
The simulated dataset includes 1500 rows, with each row represents information recorded from each individual. There are 9 variables (columns). The treatment is the variable trt, which has two treatment arms. The clt is the cluster level. The outcome of interest is variable Y. cov1-cov6 are pre-treatment covariates among which cov1-cov5 are continous, and cov6 is binary.
# data("psdata_cl")
# data("psdata_cl")
The function PSmethod
is an internal function to estimate the propensity scores given a specified model through formula.
It is built into function Sumstat
, PStrim
and PSweight
.
PSmethod( ps.formula = ps.formula, method = "glm", data = data, ncate = ncate, ps.control = list() )
PSmethod( ps.formula = ps.formula, method = "glm", data = data, ncate = ncate, ps.control = list() )
ps.formula |
an object of class |
method |
a character to specify the method for estimating propensity scores. |
data |
an optional data frame containing the variables in the propensity score model. |
ncate |
a numeric to specify the number of treatment groups present in the given data. |
ps.control |
a list to specify additional options when |
A typical form for ps.formula
is treatment ~ terms
where treatment
is the treatment
variable and terms
is a series of terms which specifies a linear predictor. ps.formula
by default specifies generalized
linear models given the default argument method = "glm"
. It fits the logistic regression when ncate = 2
,and multinomial
logistic regression when ncate > 2
. The argument method
allows user to choose
model other than glm to fit the propensity score models. We have included gbm
and SuperLearner
as two alternative machine learning methods.
Additional arguments of the machine learning estimators can be supplied through the ...
argument. Note that SuperLearner does not handle multiple groups and the current version of multinomial
logistic regression is not supported by gbm. We suggest user to use them with extra caution. Please refer to the user manual of the gbm
and SuperLearner
packages for all the
allowed arguments.
e.h
a data frame of estimated propensity scores.
ps.fitObjects
the fitted propensity model details
beta.h
estimated coefficient of the propensity model when method = "glm"
.
# the propensity model ps.formula <- trt~cov1+cov2+cov3+cov4+cov5+cov6 psfit <- PSmethod(ps.formula = ps.formula,data = psdata,ncate=3)
# the propensity model ps.formula <- trt~cov1+cov2+cov3+cov4+cov5+cov6 psfit <- PSmethod(ps.formula = ps.formula,data = psdata,ncate=3)
Trim the original data and propensity estimate according to symmetric propensity score trimming rules.
PStrim( data, ps.formula = NULL, zname = NULL, ps.estimate = NULL, delta = 0, optimal = FALSE, out.estimate = NULL, method = "glm", ps.control = list() )
PStrim( data, ps.formula = NULL, zname = NULL, ps.estimate = NULL, delta = 0, optimal = FALSE, out.estimate = NULL, method = "glm", ps.control = list() )
data |
an optional data frame containing the variables required by |
ps.formula |
an object of class |
zname |
an optional character specifying the name of the treatment variable in |
ps.estimate |
an optional matrix or data frame containing estimated (generalized) propensity scores for each observation. Typically, this is an N by J matrix, where N is the number of observations and J is the total number of treatment levels. Preferably, the column name of this matrix should match the name of treatment level, if column name is missing or there is a mismatch, the column names would be assigned according to alphabatic order of the treatment levels. A vector of propensity score estimates is also allowed in |
delta |
trimming threshold for estimated (generalized) propensity scores. Should be no larger than 1 / number of treatment groups. Default is 0, corresponding to no trimming. |
optimal |
an logical argument indicating if optimal trimming should be used. Default is |
out.estimate |
an optional matrix or data frame containing estimated potential outcomes
for each observation. Typically, this is an N by J matrix, where N is the number of observations
and J is the total number of treatment levels. Preferably, the column name of this matrix should
match the name of treatment level, if column name is missing or there is a mismatch,
the column names would be assigned according to alphabatic order of the treatment levels, with a
similar mechanism as in |
method |
a character to specify the method for estimating propensity scores. |
ps.control |
a list to specify additional options when |
A typical form for ps.formula
is treatment ~ terms
where treatment
is the treatment
variable (identical to the variable name used to specify zname
) and terms
is a series of terms
which specifies a linear predictor for treatment
. ps.formula
specifies a
model for estimating the propensity scores, when ps.estimate
is NULL
.
"glm"
is the default method for propensity score estimation. Logistic regression will be used for binary outcomes,
and multinomial logistic regression will be used for outcomes with more than two categories. The alternative method option of "gbm"
serves as an API to call the gbm()
function from the
gbm
package. Additional argument in the gbm()
function can be supplied through the ps.control=list()
argument in SumStat()
. Please refer to the user manual of the "gbm"
package for all the
allowed arguments. Currently, models for binary or multinomial treatment will be automatically chosen based on the number of treatment categories.
"SuperLearner"
is also allowed in the method
argument to call the SuperLearner()
function in SuperLearner
package.
Currently, the SuperLearner method only support binary treatment with the default method set to "SL.glm"
. The estimation approach is default to "method.NNLS"
.
Prediction algorithm and other tuning parameters can also be passed through ps.control=list()
. Please refer to the user manual of the SuperLearner
package for all the allowed specifications.
When comparing two treatments, ps.estimate
can either be a vector or a two-column matrix of estimated
propensity scores. If a vector is supplied, it is assumed to be the propensity scores to receive the treatment, and
the treatment group corresponds to the last group in the alphebatic order, unless otherwise specified by trtgrp
.
When comparing multiple (J>=3) treatments, ps.estimate
needs to be specified as an N by J matrix,
where N indicates the number of observations, and J indicates the total number of treatments.
This matrix specifies the estimated generalized propensity scores to receive each of the J treatments.
The same mechanism applies to out.estimate
, except that the input for out.estimate
must be an N by J matrix, where each row corresponds to the estimated potential outcomes (corresponding to each treatment)
for each observation.
With binary treatments, delta
defines the symmetric propensity score trimming rule following Crump et al. (2009).
With multiple treatments, delta
defines the symmetric multinomial trimming rule introduced in Yoshida et al. (2019).
With binary treatments and when optimal
equals TRUE
, the trimming function implements the optimal
symmetric trimming rule in Crump et al. (2009). The optimal trimming threshold delta
is then returned.
With multiple treatments and optimal
equals TRUE
, the trimming function implements the optimal trimming rule in Yang et al. (2016).
The optimal cutoff lambda
, which defines the acceptable upper bound for the sum of inverse generalized propensity scores, is
returned. See Yang et al. (2016) and Li and Li (2019) for details.
The argument zname
is required when ps.estimate
is not NULL
.
PStrim returns a list of the following values:
data
a data frame of trimmed data.
trim_sum
a table summrizing the number of cases by treatment groups before and after trimming.
ps.estimate
a data frame of propensity estimate after trimming.
delta
an optional output of trimming threshold for symmetric trimming.
lambda
an optional output trimming threshold for optimal trimming with multiple treatment groups.
out.estimate
a data frame of estimated potential outcomes after trimming.
Crump, R. K., Hotz, V. J., Imbens, G. W., Mitnik, O. A. (2009). Dealing with limited overlap in estimation of average treatment effects. Biometrika, 96(1), 187-199.
Yoshida, K., Solomon, D.H., Haneuse, S., Kim, S.C., Patorno, E., Tedeschi, S.K., Lyu, H., Franklin, J.M., Stürmer, T., Hernández-Díaz, S. and Glynn, R.J. (2019). Multinomial extension of propensity score trimming methods: A simulation study. American Journal of Epidemiology, 188(3), 609-616.
Yang, S., Imbens, G. W., Cui, Z., Faries, D. E., Kadziola, Z. (2016). Propensity score matching and subclassification in observational studies with multi-level treatments. Biometrics, 72(4), 1055-1065.
Li, F., Li, F. (2019). Propensity score weighting for causal inference with multiple treatments. The Annals of Applied Statistics, 13(4), 2389-2415.
data("psdata") # the propensity model ps.formula<-trt~cov1+cov2+cov3+cov4+cov5+cov6 # trim the original data by setting the threshold of propensity as 0.05 PStrim(data=psdata, ps.formula=ps.formula, delta=0.05) PStrim(data=psdata, ps.formula=ps.formula, optimal=TRUE)
data("psdata") # the propensity model ps.formula<-trt~cov1+cov2+cov3+cov4+cov5+cov6 # trim the original data by setting the threshold of propensity as 0.05 PStrim(data=psdata, ps.formula=ps.formula, delta=0.05) PStrim(data=psdata, ps.formula=ps.formula, optimal=TRUE)
The function PSweight
is used to estimate the average potential outcomes corresponding to
each treatment group among the target population. The function currently implements
the following types of weights: the inverse probability of treatment weights (IPW: target population is the combined population),
average treatment effect among the treated weights (treated: target population is the population receiving a specified treatment),
overlap weights (overlap: target population is the overlap population at clinical equipoise), matching weights (matching: target population
is population obtained under 1:1 matching), entropy weights (entropy: target population is the population weighted by the entropy function).
Augmented propensity score weighting estimators are also allowed, with propensity scores and outcome model estimates either estimated
within the function, or supplied by external routines.
PSweight( ps.formula = NULL, ps.estimate = NULL, trtgrp = NULL, zname = NULL, yname, data, weight = "overlap", delta = 0, augmentation = FALSE, bootstrap = FALSE, R = 50, out.formula = NULL, out.estimate = NULL, family = "gaussian", ps.method = "glm", ps.control = list(), out.method = "glm", out.control = list() )
PSweight( ps.formula = NULL, ps.estimate = NULL, trtgrp = NULL, zname = NULL, yname, data, weight = "overlap", delta = 0, augmentation = FALSE, bootstrap = FALSE, R = 50, out.formula = NULL, out.estimate = NULL, family = "gaussian", ps.method = "glm", ps.control = list(), out.method = "glm", out.control = list() )
ps.formula |
an object of class |
ps.estimate |
an optional matrix or data frame containing estimated (generalized) propensity scores for
each observation. Typically, this is an N by J matrix, where N is the number of observations and J is the
total number of treatment levels. Preferably, the column name of this matrix should match the name of treatment level,
if column name is missing or there is a mismatch, the column names would be assigned according to alphabatic order
of the treatment levels. A vector of propensity score estimates is also allowed in |
trtgrp |
an optional character defining the "treated" population for estimating the average treatment
effect among the treated (ATT). Only necessary if |
zname |
an optional character specifying the name of the treatment variable in |
yname |
an optional character specifying the name of the outcome variable in |
data |
an optional data frame containing the variables in the propensity score model
and outcome model (if augmented estimator is used). If not found in data, the variables are
taken from |
weight |
a character or vector of characters including the types of weights to be used.
|
delta |
trimming threshold for estimated (generalized) propensity scores. Should be no larger than 1 / number of treatment groups. Default is 0, corresponding to no trimming. |
augmentation |
logical. Indicate whether augmented weighting estimators should be used.
Default is |
bootstrap |
logical. Indaicate whether bootstrap is used to estimate the standard error
of the point estimates. Default is |
R |
an optional integer indicating number of bootstrap replicates. Default is |
out.formula |
an object of class |
out.estimate |
an optional matrix or data frame containing estimated potential outcomes
for each observation. Typically, this is an N by J matrix, where N is the number of observations
and J is the total number of treatment levels. Preferably, the column name of this matrix should
match the name of treatment level, if column name is missing or there is a mismatch,
the column names would be assigned according to alphabatic order of the treatment levels, with a
similar mechanism as in |
family |
a description of the error distribution and link function to be used in the outcome model.
Only required if |
ps.method |
a character to specify the method for estimating propensity scores. |
ps.control |
a list to specify additional options when |
out.method |
a character to specify the method for estimating the outcome regression model. |
out.control |
a list to specify additional options when |
A typical form for ps.formula
is treatment ~ terms
where treatment
is the treatment
variable (identical to the variable name used to specify zname
) and terms
is a series of terms
which specifies a linear predictor for treatment
. Similarly, a typical form for out.formula
is
outcome ~ terms
where outcome
is the outcome variable (identical to the variable name
used to specify yname
) and terms
is a series of terms which specifies a linear
predictor for outcome
. Both ps.formula
and out.formula
by default specify generalized
linear models when ps.estimate
and/or out.estimate
is NULL
. The argument ps.method
and out.method
allow user to choose
model other than glm to fit the propensity score and outcome regression models for augmentation. Additional argument in the gbm()
function can be supplied through the ps.control
and out.control
argument. Please refer to the user manual of the gbm
package for all the
allowed arguments. "SuperLearner"
is also allowed in the ps.method
and out.method
arguments. Currently, the SuperLearner method only supports binary treatment with the default method set to "SL.glm"
. The estimation approach is default to "method.NNLS"
for both propensity and outcome regression models.
Prediction algorithm and other tuning parameters can also be passed throughps.control
and out.control
. Please refer to the user manual of the SuperLearner
package for all the allowed specifications.
When comparing two treatments, ps.estimate
can either be a vector or a two-column matrix of estimated
propensity scores. If a vector is supplied, it is assumed to be the propensity scores to receive the treatment, and
the treatment group corresponds to the last group in the alphebatic order, unless otherwise specified by trtgrp
.
When comparing multiple (J>=3) treatments, ps.estimate
needs to be specified as an N by J matrix,
where N indicates the number of observations, and J indicates the total number of treatments.
This matrix specifies the estimated generalized propensity scores to receive each of the J treatments.
In general, ps.estimate
should have column names that indicate the level of the treatment variable,
which should match the levels given in Z
.
If column names are empty or there is a mismatch, the column names will be created following
the alphebatic order of values in Z
, and the rightmost coulmn of ps.estimate
is assumed
to be the treatment group, when estimating ATT. trtgrp
can also be used to specify the treatment
group for estimating ATT. The same mechanism applies to out.estimate
, except that the input for out.estimate
must be an N by J matrix, where each row corresponds to the estimated potential outcomes (corresponding to each treatment)
for each observation.
The argument zname
and/or yname
is required when ps.estimate
and/or out.estimate
is not NULL
.
Current version of PSweight
allows for five types of propensity score weights used to estimate ATE (IPW), ATT (treated) and
ATO (overlap), ATM (matching) and ATEN (entropy). These weights are members of larger class of balancing weights defined in Li, Morgan, and Zaslavsky (2018).
Specific definitions of these weights are provided in Li, Morgan, and Zaslavsky (2018), Li and Greene (2013), Zhou, Matsouaka and Thomas (2020).
When there is a practical violation of the positivity assumption, delta
defines the symmetric
propensity score trimming rule following Crump et al. (2009). With multiple treatments, delta
defines the
multinomial trimming rule introduced in Yoshida et al. (2019). The overlap weights can also be considered as
a data-driven continuous trimming strategy without specifying trimming rules, see Li, Thomas and Li (2019).
Additional details on balancing weights and generalized overlap weights for multiple treatment groups are provided in
Li and Li (2019).
If augmentation = TRUE
, an augmented weighting estimator will be implemented. For binary treatments, the augmented
weighting estimator is presented in Mao, Li and Greene (2018). For multiple treatments, the augmented weighting estimator is
mentioned in Li and Li (2019), and additional details will appear in our ongoing work (Zhou et al. 2020+). When
weight = "IPW"
, the augmented estimator is also referred to as a doubly-robust (DR) estimator.
When bootstrap = TRUE
, the variance will be calculated by nonparametric bootstrap, with R
bootstrap
replications. The default of R
is 50. Otherwise, the variance will be calculated using the sandwich variance
formula obtained in the M-estimation framework.
PSweight returns a PSweight
object containing a list of the following values:
estimated propensity scores, average potential outcomes corresponding to each treatment,
variance-covariance matrix of the point estimates, the label for each treatment group,
and estimates in each bootstrap replicate if bootstrap = TRUE
.
A summary of PSweight can be obtained with summary.PSweight
.
trtgrp
a character indicating the treatment group.
propensity
a data frame of estimated propensity scores.
muhat
average potential outcomes by treatment groups, with reference to specific target populations.
covmu
variance-covariance matrix of muhat
.
muboot
an optional list of point estimates in each bootstrap replicate bootstrap = TRUE
.
group
a table of treatment group labels corresponding to the output point estimates muhat
.
Crump, R. K., Hotz, V. J., Imbens, G. W., Mitnik, O. A. (2009). Dealing with limited overlap in estimation of average treatment effects. Biometrika, 96(1), 187-199.
Li, L., Greene, T. (2013). A weighting analogue to pair matching in propensity score analysis. The International Journal of Biostatistics, 9(2), 215-234.
Li, F., Morgan, K. L., Zaslavsky, A. M. (2018). Balancing covariates via propensity score weighting. Journal of the American Statistical Association, 113(521), 390-400.
Mao, H., Li, L., Greene, T. (2019). Propensity score weighting analysis and treatment effect discovery. Statistical Methods in Medical Research, 28(8), 2439-2454.
Li, F., Thomas, L. E., Li, F. (2019). Addressing extreme propensity scores via the overlap weights. American Journal of Epidemiology, 188(1), 250-257.
Yoshida, K., Solomon, D.H., Haneuse, S., Kim, S.C., Patorno, E., Tedeschi, S.K., Lyu, H., Franklin, J.M., Stürmer, T., Hernández-Díaz, S. and Glynn, R.J. (2019). Multinomial extension of propensity score trimming methods: A simulation study. American Journal of Epidemiology, 188(3), 609-616.
Li, F., Li, F. (2019). Propensity score weighting for causal inference with multiple treatments. The Annals of Applied Statistics, 13(4), 2389-2415.
Zhou, Y., Matsouaka, R. A., Thomas, L. (2020). Propensity score weighting under limited overlap and model misspecification. Statistical Methods in Medical Research, 29(12), 3721-3756.
data("psdata") # the propensity and outcome models ps.formula<-trt~cov1+cov2+cov3+cov4+cov5+cov6 out.formula<-Y~cov1+cov2+cov3+cov4+cov5+cov6 # without augmentation ato1<-PSweight(ps.formula = ps.formula,yname = 'Y',data = psdata,weight = 'overlap') summary(ato1) # augmented weighting estimator, takes longer time to calculate sandwich variance # ato2<-PSweight(ps.formula = ps.formula,yname = 'Y',data = psdata, # augmentation = TRUE,out.formula = out.formula,family = 'gaussian',weight = 'overlap') # summary(ato2)
data("psdata") # the propensity and outcome models ps.formula<-trt~cov1+cov2+cov3+cov4+cov5+cov6 out.formula<-Y~cov1+cov2+cov3+cov4+cov5+cov6 # without augmentation ato1<-PSweight(ps.formula = ps.formula,yname = 'Y',data = psdata,weight = 'overlap') summary(ato1) # augmented weighting estimator, takes longer time to calculate sandwich variance # ato2<-PSweight(ps.formula = ps.formula,yname = 'Y',data = psdata, # augmentation = TRUE,out.formula = out.formula,family = 'gaussian',weight = 'overlap') # summary(ato2)
The function PSweight_cl
is used to estimate the average potential outcomes corresponding to
each treatment group among the target population with two-level data. The function currently implements
the following types of weights: the inverse probability of treatment weights (IPW: target population is the combined population),
average treatment effect among the treated weights (treated: target population is the population receiving a specified treatment),
overlap weights (overlap: target population is the overlap population at clinical equipoise), matching weights (matching: target population
is population obtained under 1:1 matching), entropy weights (entropy: target population is the population weighted by the entropy function).
Augmented propensity score weighting estimators are also allowed, with propensity scores and outcome model estimated
within the function through mixed effect model.
PSweight_cl( ps.formula = NULL, trtgrp = NULL, yname, data, weight = "overlap", delta = 0, augmentation = FALSE, bootstrap = FALSE, bs_level = NULL, R = 50, out.formula = NULL, family = "gaussian", nAGQ = 1L )
PSweight_cl( ps.formula = NULL, trtgrp = NULL, yname, data, weight = "overlap", delta = 0, augmentation = FALSE, bootstrap = FALSE, bs_level = NULL, R = 50, out.formula = NULL, family = "gaussian", nAGQ = 1L )
ps.formula |
an object of class |
trtgrp |
an optional character defining the "treated" population for estimating the average treatment
effect among the treated (ATT). Only necessary if |
yname |
an optional character specifying the name of the outcome variable in |
data |
an optional data frame containing the variables in the propensity score model
and outcome model (if augmented estimator is used). If not found in data, the variables are
taken from |
weight |
a character or vector of characters including the types of weights to be used.
|
delta |
trimming threshold for estimated (generalized) propensity scores. Should be no larger than 1 / number of treatment groups. Default is 0, corresponding to no trimming. |
augmentation |
logical. Indicate whether augmented weighting estimators should be used.
Default is |
bootstrap |
logical. Indaicate whether bootstrap is used to estimate the standard error
of the point estimates. Default is |
bs_level |
an optional character defining the cluster level (name of the variable) for each bootstrap resampling.
Default is |
R |
an optional integer indicating number of bootstrap replicates. Default is |
out.formula |
an object of class |
family |
a description of the error distribution and link function to be used in the outcome model.
Only required if |
nAGQ |
integer scalar - the number of points per axis for evaluating the adaptive Gauss-Hermite approximation to the log-likelihood. Defaults to 1, corresponding to the Laplace approximation. Please refer to lme4 package for more details. |
A typical form for ps.formula
is treatment ~ terms+1|clusters
where treatment
is the treatment
variable and terms
is a series of terms
which specifies a linear predictor for treatment
and cluster level effects. Similarly, a typical form for out.formula
is
outcome ~ treatment+terms+1|cluster
where outcome
is the outcome variable (identical to the variable name
used to specify yname
); terms
is a series of terms which specifies a linear
predictor for outcome
; clusters
is the random effects term for clusters. Both ps.formula
and out.formula
by default specify generalized
linear mixed effect models.
Current version of PSweight_cl
allows for five types of propensity score weights used to estimate ATE (IPW), ATT (treated) and
ATO (overlap), ATM (matching) and ATEN (entropy). These weights are members of larger class of balancing weights defined in Li, Morgan, and Zaslavsky (2018).
Specific definitions of these weights are provided in Li, Morgan, and Zaslavsky (2018), Li and Greene (2013), Zhou, Matsouaka and Thomas (2020).
When there is a practical violation of the positivity assumption, delta
defines the symmetric
propensity score trimming rule following Crump et al. (2009). The overlap weights can also be considered as
a data-driven continuous trimming strategy without specifying trimming rules, see Li, Thomas and Li (2019).
Additional details on balancing weights and generalized overlap weights for multiple treatment groups are provided in
Li and Li (2019).
If augmentation = TRUE
, an augmented weighting estimator will be implemented. For binary treatments, the augmented
weighting estimator is presented in Mao, Li and Greene (2018). When
weight = "IPW"
, the augmented estimator is also referred to as a doubly-robust (DR) estimator.
When bootstrap = TRUE
, the variance will be calculated by nonparametric bootstrap, with R
bootstrap
replications. bs_level
needs to be specified as the variable name for the cluster in order to conduct cluster
level resampling and maintaining the cluster level coorelation. The default value NULL
treat each observation independently.
The default of R
is 50. Otherwise, the variance will be calculated using the sandwich variance
formula obtained in the M-estimation framework.
PSweight_cl returns a PSweight
object containing a list of the following values:
estimated propensity scores, average potential outcomes corresponding to each treatment,
variance-covariance matrix of the point estimates, the label for each treatment group,
and estimates in each bootstrap replicate if bootstrap = TRUE
.
A summary of PSweight_cl can be obtained with summary.PSweight
.
trtgrp
a character indicating the treatment group.
propensity
a data frame of estimated propensity scores.
muhat
average potential outcomes by treatment groups, with reference to specific target populations.
covmu
variance-covariance matrix of muhat
.
muboot
an optional list of point estimates in each bootstrap replicate bootstrap = TRUE
.
group
a table of treatment group labels corresponding to the output point estimates muhat
.
Li, F., Zaslavsky, A. M., Landrum, M. B. (2013). Propensity score weighting with multilevel data. Statistics in Medicine, 32(19), 3373-3387.
Fuentes, A., Lüdtke, O., Robitzsch, A. (2021). Causal inference with multilevel data: A comparison of different propensit score weighting appropaches. Multivariate Behavioral Research, 1-24.
Crump, R. K., Hotz, V. J., Imbens, G. W., Mitnik, O. A. (2009). Dealing with limited overlap in estimation of average treatment effects. Biometrika, 96(1), 187-199.
Li, L., Greene, T. (2013). A weighting analogue to pair matching in propensity score analysis. The International Journal of Biostatistics, 9(2), 215-234.
Li, F., Morgan, K. L., Zaslavsky, A. M. (2018). Balancing covariates via propensity score weighting. Journal of the American Statistical Association, 113(521), 390-400.
Mao, H., Li, L., Greene, T. (2019). Propensity score weighting analysis and treatment effect discovery. Statistical Methods in Medical Research, 28(8), 2439-2454.
Li, F., Thomas, L. E., Li, F. (2019). Addressing extreme propensity scores via the overlap weights. American Journal of Epidemiology, 188(1), 250-257.
Yoshida, K., Solomon, D.H., Haneuse, S., Kim, S.C., Patorno, E., Tedeschi, S.K., Lyu, H., Franklin, J.M., Stürmer, T., Hernández-Díaz, S. and Glynn, R.J. (2019). Multinomial extension of propensity score trimming methods: A simulation study. American Journal of Epidemiology, 188(3), 609-616.
Li, F., Li, F. (2019). Propensity score weighting for causal inference with multiple treatments. The Annals of Applied Statistics, 13(4), 2389-2415.
Zhou, Y., Matsouaka, R. A., Thomas, L. (2020). Propensity score weighting under limited overlap and model misspecification. Statistical Methods in Medical Research 29(12), 3721-3756.
#data("psdata_cl") #ps.formula<-trt~cov1+cov2+cov3+cov4+cov5+cov6+(1|clt) #ato_cl<-PSweight(ps.formula = ps.formula,yname = 'Y',data = psdata_cl) #summary(ato_cl)
#data("psdata_cl") #ps.formula<-trt~cov1+cov2+cov3+cov4+cov5+cov6+(1|clt) #ato_cl<-PSweight(ps.formula = ps.formula,yname = 'Y',data = psdata_cl) #summary(ato_cl)
summary.PSweight
is used to summarize the results from PSweight
.
The output contains the average causal effects defined by specific contrasts, as well as their
standard error estimates.
## S3 method for class 'PSweight' summary(object, contrast = NULL, type = "DIF", CI = TRUE, ...)
## S3 method for class 'PSweight' summary(object, contrast = NULL, type = "DIF", CI = TRUE, ...)
object |
a PSweight object obtained from the |
contrast |
a vector or matrix specifying the causal contrast of interest. The average causal effects will be defined by such contrats. For multiple treatments, the contrast parameters are explained in Li and Li (2019) for estimating general causal effects. Default is all pairwise contrasts between any two treatment groups. |
type |
a character specifying the target estimand. The most commonly seen additive estimand is specified
by |
CI |
a logical argument indicates whether confidence interval should be calculated. Default is |
... |
further arguments passed to or from other methods. |
For the contrast
argument, one specifies the contrast of interest and thus defines the target estimand
for comparing treatments. For example, if there are three treatment levels: A, B, and C, the contrast A-C
(i.e., E[Y(A)] - E[Y(C)]) can be specified by c(1,0,-1)
. The contrasts of A-C and B-C can be
jointly specified by rbind(c(1,0,-1), c(0,1,-1))
.
For estimating the causal relative risk (type = "RR"
), the contrast is specified at the log scale. For example,
the contrast A-C (specified by c(1,0,-1)
) implies the estimation of log{E[Y(A)]} - log{E[Y(C)]}. For estimating the causal odds
ratio, the contrast is specified at the log odds scale. For example, the contrast A-C (specified by c(1,0,-1)
)
implies the estimation of log{E[Y(A)]/E[1-Y(A)]} - log{E[Y(C)]/E[1-Y(C)]}.
The variance of the contrasts will be estimated by the delta method (if sandwich variance is used, or
bootstrap = FALSE
), or nonparametric bootstrap (if bootstrap = TRUE
). Details will be given in
Zhou et al. (2020+).
The argument type
takes one of three options: "DIF"
, "RR"
, or "RR"
, with "DIF"
as
the default option. Typically, "RR"
is relavent for binary or count outcomes, and "OR"
is relavent
only for binary outcomes. "DIF"
applies to all types of outcomes.
A list of following values:
estimates
a matrix of point estimates, standard errors, test statistics, 95 for contrasts of interest.
bootestimates
a list of data frames containing estimated contrasts in each bootstrap replicate, if bootstrap is used to estimate standard errors.
contrast
a table listing the specified contrasts of interest.
group
a table of treatment group labels corresponding to the output point estimates, provided in results
obtained from PSweight
.
trtgrp
a character indicating the treatment group, or target population under ATT weights.
type
a character specifying the target estimand.
CI
a logical indaicator of whether confidence interval should be reported.
Li, F., Morgan, K. L., Zaslavsky, A. M. (2018). Balancing covariates via propensity score weighting. Journal of the American Statistical Association, 113(521), 390-400.
Li, F., Li, F. (2019). Propensity score weighting for causal inference with multiple treatments. The Annals of Applied Statistics, 13(4), 2389-2415.
## For examples, run: example(PSweight).
## For examples, run: example(PSweight).
summary.SumStat
is used to summarize results obtained from function
SumStat
. The output includes effective sample sizes and tables for balance statistics.
## S3 method for class 'SumStat' summary(object, weighted.var = TRUE, metric = "ASD", ...)
## S3 method for class 'SumStat' summary(object, weighted.var = TRUE, metric = "ASD", ...)
object |
a |
weighted.var |
logical. Indicate whether the propensity score weighted variance should be used in calculating the balance metrics. Default is |
metric |
a chatacter indicating the type of balance metrics. |
... |
further arguments passed to or from other methods. |
For metric
, the two options "ASD"
and "PSD"
are defined in Li and Li (2019)
for the general family of balancing weights. Similar definitions are also given in McCaffrey et al. (2013)
for inverse probability weighting. weighted.var
specifies whether weighted or unweighted variance
should be used in calculating ASD or PSD. An example of weighted variance with two treatment groups is given in
Austin and Stuart (2015). For more than two treatment groups, the maximum of ASD (across all pairs of treatments)
and maximum of PSD (across all treatments) are calcualted, as explained in Li and Li (2019).
A list of tables containing effective sample sizes and balance statistics on covariates for specified propensity score weighting schemes.
effective.sample.size
a table of effective sample sizes. This serves as a conservative measure to characterize the variance inflation or precision loss due to weighting, see Li and Li (2019).
unweighted
A table summarizing mean, variance by treatment groups, and standardized mean difference.
IPW
If "IPW"
is specified, this is a data table summarizing mean, variance by treatment groups,
and standardized mean difference under inverse probability of treatment weighting.
treated
If "treated"
is specified, this is a data table summarizing mean, variance by treatment groups,
and standardized mean difference under the ATT weights.
overlap
If "overlap"
is specified, this is a data table summarizing mean, variance by treatment groups,
and standardized mean difference under the (generalized) overlap weights.
matching
If "matching"
is specified, this is a data table summarizing mean, variance by treatment groups,
and standardized mean difference under the (generalized) matching weights.
entropy
If "entropy"
is specified, this is a data table summarizing mean, variance by treatment groups,
and standardized mean difference under the (generalized) entropy weights.
Crump, R. K., Hotz, V. J., Imbens, G. W., Mitnik, O. A. (2009). Dealing with limited overlap in estimation of average treatment effects. Biometrika, 96(1), 187-199.
Li, L., Greene, T. (2013). A weighting analogue to pair matching in propensity score analysis. The International Journal of Biostatistics, 9(2), 215-234.
Austin, P.C. and Stuart, E.A. (2015). Moving towards best practice when using inverse probability of treatment weighting (IPTW) using the propensity score to estimate causal treatment effects in observational studies. Statistics in Medicine, 34(28), 3661-3679.
Li, F., Li, F. (2019). Propensity score weighting for causal inference with multiple treatments. The Annals of Applied Statistics, 13(4), 2389-2415.
Zhou, Y., Matsouaka, R. A., Thomas, L. (2020). Propensity score weighting under limited overlap and model misspecification. Statistical Methods in Medical Research (Online)
## For examples, run: example(SumStat).
## For examples, run: example(SumStat).
SumStat
is used to generate distributional plots of the estimated propensity scores and balance
diagnostics after propensity score weighting.
SumStat( ps.formula = NULL, ps.estimate = NULL, trtgrp = NULL, Z = NULL, covM = NULL, zname = NULL, xname = NULL, data = NULL, weight = "overlap", delta = 0, method = "glm", ps.control = list() )
SumStat( ps.formula = NULL, ps.estimate = NULL, trtgrp = NULL, Z = NULL, covM = NULL, zname = NULL, xname = NULL, data = NULL, weight = "overlap", delta = 0, method = "glm", ps.control = list() )
ps.formula |
an object of class |
ps.estimate |
an optional matrix or data frame containing estimated (generalized) propensity scores for each observation. Typically, this is an N by J matrix, where N is the number of observations and J is the total number of treatment levels. Preferably, the column names of this matrix should match the names of treatment level, if column names are missing or there is a mismatch, the column names would be assigned according to the alphabatic order of treatment levels. A vector of propensity score estimates is also allowed in |
trtgrp |
an optional character defining the "treated" population for estimating the average treatment effect among the treated (ATT). Only necessary if |
Z |
an optional vector specifying the values of treatment, only necessary when the covariate matrix |
covM |
an optional covariate matrix or data frame including covariates, their interactions and higher-order terms. When the covariate matrix |
zname |
an optional character specifying the name of the treatment variable in |
xname |
an optional vector of characters including the names of covariates in |
data |
an optional data frame containing the variables in the propensity score model. If not found in data, the variables are taken from |
weight |
a character or vector of characters including the types of weights to be used. |
delta |
trimming threshold for estimated (generalized) propensity scores. Should be no larger than 1 / number of treatment groups. Default is 0, corresponding to no trimming. |
method |
a character to specify the method for estimating propensity scores. |
ps.control |
a list to specify additional options when |
A typical form for ps.formula
is treatment ~ terms
where treatment
is the treatment
variable (identical to the variable name used to specify zname
) and terms
is a series of terms
which specifies a linear predictor for treatment
. ps.formula
specifies logistic or multinomial logistic
models for estimating the propensity scores, when ps.estimate
is NULL
.
When comparing two treatments, ps.estimate
can either be a vector or a two-column matrix of estimated
propensity scores. If a vector is supplied, it is assumed to be the propensity scores to receive the treatment, and
the treatment group corresponds to the last group in the alphebatic order, unless otherwise specified by trtgrp
.
When comparing multiple (J>=3) treatments, ps.estimate
needs to be specified as an N by J matrix,
where N indicates the number of observations, and J indicates the total number of treatments.
This matrix specifies the estimated generalized propensity scores to receive each of the J treatments.
In general, ps.estimate
should have column names that indicate the level of the treatment variable,
which should match the levels given in Z
.
If column names are empty or there is a mismatch, the column names will be created following
the alphebatic order of treatmentlevels. The rightmost coulmn of ps.estimate
is then assumed
to be the treatment group when estimating ATT ("treated"
). trtgrp
can also be used to specify the treatment
group for estimating ATT.
To generate balance statistics, one can directly specify Z
and covM
to indicate the treatment levels and
covariate matrix. Alternatively, one can supply data
, zname
, and xname
to indicate the
same information. When both are specified, the function will prioritize inputs from Z
and covM
.
When ps.estimate
is not NULL
, argument zname
.
Current version of PSweight
allows for five types of propensity score weights used to estimate ATE ("IPW"
), ATT ("treated"
), and
ATO("overlap"
), ATM ("matching"
) and ATEN ("entropy"
). These weights are members of a larger class of balancing weights defined in Li, Morgan, and Zaslavsky (2018).
When there is a practical violation of the positivity assumption, delta
defines the symmetric
propensity score trimming rule following Crump et al. (2009). With multiple treatments, delta
defines the
multinomial trimming rule introduced in Yoshida et al. (2019). The overlap weights can also be considered as
a data-driven continuous trimming strategy without specifying trimming rules, see Li, Thomas and Li (2019).
Additional details on balancing weights and generalized overlap weights for multiple treatment groups are provided in
Li and Li (2019). For details about matching weights and entropy weights, please refer to Li and Greene (2013) and Zhou, Matsouaka and Thomas (2020).
"glm"
is the default method for propensity score estimation. Logistic regression will be used for binary outcomes,
and multinomial logistic regression will be used for outcomes with more than two categories. The alternative method option of "gbm"
serves as an API to call the gbm()
function from the
gbm
package. Additional argument in the gbm()
function can be supplied through the ps.control=list()
argument in SumStat()
. Please refer to the user manual of the gbm package for all the
allowed arguments. Currently, models for binary or multinomial treatment will be automatically chosen based on the number of treatment categories.
"SuperLearner"
is also allowed in the method
argument to pass the propensity score estimation to the SuperLearner()
function in SuperLearner package.
Currently, the SuperLearner method only supports binary treatment with the default method set to "SL.glm"
. The estimation approach is default to "method.NNLS"
in the SumStat()
function.
Prediction algorithm and other tuning parameters can also be passed through ps.control=list()
to SumStat()
. Please refer to the user manual of the SuperLearner
package for all the allowed specifications.
SumStat returns a SumStat
object including a list of the following value:
treatment group, propensity scores, fitted propensity model, propensity score weights, effective sample sizes,
and balance statistics. A summary of SumStat
can be obtained with summary.SumStat
.
trtgrp
a character indicating the treatment group.
propensity
a data frame of estimated propensity scores.
ps.fitObjects
the fitted propensity model details
ps.weights
a data frame of propensity score weights.
ess
a table of effective sample sizes. This serves as a conservative measure to characterize the variance inflation or precision loss due to weighting, see Li and Li (2019).
unweighted.sumstat
A list of tables including covariate means and variances by treatment group and standardized mean differences.
ATE.sumstat
If "IPW"
is included in weight
, this is a list of summary statistics using inverse probability weighting.
ATT.sumstat
If "treated"
is included in weight
, this is a list of summary statistics using the ATT weights.
ATO.sumstat
If "overlap"
is included in weight
, this is a list of summary statistics using the overlap weights.
ATM.sumstat
If "matching"
is included in weight
, this is a list of summary statistics using the matching weights.
ATEN.sumstat
If "entropy"
is included in weight
, this is a list of summary statistics using the entropy weights.
trim
If delta > 0
, this is a table summarizing the number of observations before and after trimming.
Crump, R. K., Hotz, V. J., Imbens, G. W., Mitnik, O. A. (2009). Dealing with limited overlap in estimation of average treatment effects. Biometrika, 96(1), 187-199.
Greenwell B., Boehmke B.,Cunningham J, GBM Developers (2020) gbm: Generalized Boosted Regression Models. Cran: https://cran.r-project.org/web/packages/gbm/index.html
Li, L., Greene, T. (2013). A weighting analogue to pair matching in propensity score analysis. The International Journal of Biostatistics, 9(2), 215-234.
Li, F., Morgan, K. L., Zaslavsky, A. M. (2018). Balancing covariates via propensity score weighting. Journal of the American Statistical Association, 113(521), 390-400.
Li, F., Thomas, L. E., Li, F. (2019). Addressing extreme propensity scores via the overlap weights. American Journal of Epidemiology, 188(1), 250-257.
Polley E., LeDell E., Kennedy C., Lendle S., van der Laan M. (2019) SuperLearner: Super Learner Prediction. Cran: https://cran.r-project.org/web/packages/SuperLearner/index.html
Yoshida, K., Solomon, D.H., Haneuse, S., Kim, S.C., Patorno, E., Tedeschi, S.K., Lyu, H., Franklin, J.M., Stürmer, T., Hernández-Díaz, S. and Glynn, R.J. (2019). Multinomial extension of propensity score trimming methods: A simulation study. American Journal of Epidemiology, 188(3), 609-616.
Li, F., Li, F. (2019). Propensity score weighting for causal inference with multiple treatments. The Annals of Applied Statistics, 13(4), 2389-2415.
Zhou, Y., Matsouaka, R. A., Thomas, L. (2020). Propensity score weighting under limited overlap and model misspecification. Statistical Methods in Medical Research (Online)
data("psdata") # the propensity model ps.formula<-trt~cov1+cov2+cov3+cov4+cov5+cov6 # using SumStat to estimate propensity scores msstat <- SumStat(ps.formula, trtgrp="2", data=psdata, weight=c("IPW","overlap","treated","entropy","matching")) #summary(msstat) # importing user-supplied propensity scores "e.h" # fit <- nnet::multinom(formula=ps.formula, data=psdata, maxit=500, trace=FALSE) # e.h <- fit$fitted.values # varname <- c("cov1","cov2","cov3","cov4","cov5","cov6") # msstat0 <- SumStat(zname="trt", xname=varname, data=psdata, ps.estimate=e.h, # trtgrp="2", weight=c("IPW","overlap","treated","entropy","matching")) # summary(msstat0)
data("psdata") # the propensity model ps.formula<-trt~cov1+cov2+cov3+cov4+cov5+cov6 # using SumStat to estimate propensity scores msstat <- SumStat(ps.formula, trtgrp="2", data=psdata, weight=c("IPW","overlap","treated","entropy","matching")) #summary(msstat) # importing user-supplied propensity scores "e.h" # fit <- nnet::multinom(formula=ps.formula, data=psdata, maxit=500, trace=FALSE) # e.h <- fit$fitted.values # varname <- c("cov1","cov2","cov3","cov4","cov5","cov6") # msstat0 <- SumStat(zname="trt", xname=varname, data=psdata, ps.estimate=e.h, # trtgrp="2", weight=c("IPW","overlap","treated","entropy","matching")) # summary(msstat0)
SumStat_cl
is used to generate distributional plots of the estimated propensity scores and balance
diagnostics after propensity score weighting with two-level data.
SumStat_cl( ps.formula = NULL, trtgrp = NULL, data = NULL, weight = "overlap", delta = 0, nAGQ = 1L )
SumStat_cl( ps.formula = NULL, trtgrp = NULL, data = NULL, weight = "overlap", delta = 0, nAGQ = 1L )
ps.formula |
an object of class |
trtgrp |
an optional character defining the "treated" population for estimating the average treatment effect among the treated (ATT). Only necessary if |
data |
an data frame containing the variables in the propensity score model. If not found in data, the variables are taken from |
weight |
a character or vector of characters including the types of weights to be used. |
delta |
trimming threshold for estimated (generalized) propensity scores. Should be no larger than 1 / number of treatment groups. Default is 0, corresponding to no trimming. |
nAGQ |
integer scalar - the number of points per axis for evaluating the adaptive Gauss-Hermite approximation to the log-likelihood. Defaults to 1, corresponding to the Laplace approximation. Please refer to lme4 package for more details. |
A typical form for ps.formula
is treatment ~ terms+1|clusters
where treatment
is the treatment
variable, terms
is a series of terms
which specifies a linear predictor for treatment
, and clusters
is the cluster indicator. The current version supports two-level models and the random-effects term is required to be the last piece in the formula. ps.formula
specifies a mixed-effects logistic regression
model for estimating propensity scores. The treatment group corresponds to the last group in the alphebatic order, unless otherwise specified by trtgrp
.
Current version of PSweight
allows for five types of propensity score weights used to estimate ATE ("IPW"
), ATT ("treated"
), and
ATO("overlap"
), ATM ("matching"
) and ATEN ("entropy"
). These weights are members of a larger class of balancing weights defined in Li, Morgan, and Zaslavsky (2018).
When there is a practical violation of the positivity assumption, delta
defines the symmetric
propensity score trimming rule following Crump et al. (2009). With multiple treatments, delta
defines the
multinomial trimming rule introduced in Yoshida et al. (2019). The overlap weights can also be considered as
a data-driven continuous trimming strategy without specifying trimming rules, see Li, Thomas and Li (2019).
Additional details on balancing weights and generalized overlap weights for multiple treatment groups are provided in
Li and Li (2019). For details about matching weights and entropy weights, please refer to Li and Greene (2013) and Zhou, Matsouaka and Thomas (2020).
SumStat_cl returns a SumStat
object including a list of the following value:
treatment group, propensity scores, fitted propensity model, propensity score weights, effective sample sizes,
and balance statistics. A summary of SumStat
can be obtained with summary.SumStat
.
trtgrp
a character indicating the treatment group.
propensity
a data frame of estimated propensity scores.
ps.fitObjects
the fitted propensity model details
ps.weights
a data frame of propensity score weights.
ess
a table of effective sample sizes. This serves as a conservative measure to characterize the variance inflation or precision loss due to weighting, see Li and Li (2019).
unweighted.sumstat
A list of tables including covariate means and variances by treatment group and standardized mean differences.
ATE.sumstat
If "IPW"
is included in weight
, this is a list of summary statistics using inverse probability weighting.
ATT.sumstat
If "treated"
is included in weight
, this is a list of summary statistics using the ATT weights.
ATO.sumstat
If "overlap"
is included in weight
, this is a list of summary statistics using the overlap weights.
ATM.sumstat
If "matching"
is included in weight
, this is a list of summary statistics using the matching weights.
ATEN.sumstat
If "entropy"
is included in weight
, this is a list of summary statistics using the entropy weights.
trim
If delta > 0
, this is a table summarizing the number of observations before and after trimming.
Crump, R. K., Hotz, V. J., Imbens, G. W., Mitnik, O. A. (2009). Dealing with limited overlap in estimation of average treatment effects. Biometrika, 96(1), 187-199.
Li, L., Greene, T. (2013). A weighting analogue to pair matching in propensity score analysis. The International Journal of Biostatistics, 9(2), 215-234.
Li, F., Morgan, K. L., Zaslavsky, A. M. (2018). Balancing covariates via propensity score weighting. Journal of the American Statistical Association, 113(521), 390-400.
Li, F., Thomas, L. E., Li, F. (2019). Addressing extreme propensity scores via the overlap weights. American Journal of Epidemiology, 188(1), 250-257.
Li, F., Li, F. (2019). Propensity score weighting for causal inference with multiple treatments. The Annals of Applied Statistics, 13(4), 2389-2415.
Zhou, Y., Matsouaka, R. A., Thomas, L. (2020). Propensity score weighting under limited overlap and model misspecification. Statistical Methods in Medical Research 29(12), 3721-3756.
Li, F., Zaslavsky, A. M., & Landrum, M. B. (2013). Propensity score weighting with multilevel data. Statistics in Medicine, 32(19), 3373-3387.
data("psdata_cl") # the propensity model # ps.formula<-trt~cov1+cov2+cov3+cov4+cov5+cov6+(1|clt) # using SumStat to estimate propensity scores # msstat <- SumStat_cl(ps.formula, trtgrp="1", data=psdata_cl, # weight=c("IPW","overlap","treated","entropy","matching")) #summary(msstat)
data("psdata_cl") # the propensity model # ps.formula<-trt~cov1+cov2+cov3+cov4+cov5+cov6+(1|clt) # using SumStat to estimate propensity scores # msstat <- SumStat_cl(ps.formula, trtgrp="1", data=psdata_cl, # weight=c("IPW","overlap","treated","entropy","matching")) #summary(msstat)
The vcov
method for class "PSweight"
## S3 method for class 'PSweight' vcov(object, ...)
## S3 method for class 'PSweight' vcov(object, ...)
object |
an object used to select a method. |
... |
further arguments passed to or from other methods. |
The output from vcov