ssci function to construct the sparsified simultaneous confidence intervals.
Usage
ssci(
x,
y,
intercept = FALSE,
family = "gaussian",
standardize = FALSE,
bootstrap_rep = 1000,
selection = c("adalasso_refit", "SPSP_adalassoCV", "SPSP_adalasso", "SPSP_lasso"),
alpha = 0.05,
cutting = c("unweighted", "weighted"),
shrink = c("No", "nzero_prob"),
outlyingness = c("StdCoef", "MaxMin", "F"),
cut_F = c("small", "large", "both"),
dispersion = c("Std", "MeanAD_mean", "MediAD_median", "IQR"),
displayMesg = FALSE,
parallel = FALSE,
dataX = NULL,
dataY = NULL,
...
)
Arguments
- x
The input matrix of dimension nobs x nvars: n rows (observations) and p columns (covariates).
- y
Response variable. Currently, SSCI only supports family="gaussian" with continuous responses y.
- intercept
Should intercept(s) be fitted (TRUE) or set to zero (FALSE)? Default is FALSE, as shown in Usage.
- family
Response type. Either a character string representing one of the built-in families, or else a glm() family object.
- standardize
Logical. Should standardization be conducted before the estimation? Default is FALSE.
- bootstrap_rep
Number of bootstrap replications. Default is 1000.
- selection
The selection approach adopted for the estimation. The available selection approaches include "adalasso_refit" and "SPSP_adalasso".
"adalasso_refit" is a two-stage selection and estimation approach that conducts the adaptive lasso selection and then a least-squares refit to obtain unbiased estimates.
"SPSP_adalasso" is an SPSP approach that partitions the solution paths generated by the adaptive lasso selection method. This approach is more stable than many other selection methods.
- alpha
The significance level of the confidence intervals. Default is 0.05.
- cutting
Cutting method (whether to use weights for each line of bootstrap estimates):
"unweighted" - all weights are 1 for each cell of estimates;
"weighted" - weights are the inverse probability of zero estimates (for example, if only 1 of r = 1000 bootstrap models estimates X1 as 0, the weight for that zero estimate is r/1 = 1000, and all other weights are 1).
- shrink
An argument for interval shrinkage. "No", the first option listed in Usage, applies no shrinkage. "nzero_prob" uses the probability of non-zero estimates of the coefficients as weights to shrink the width of the confidence interval of each covariate. This strongly shrinks the intervals of truly zero covariates and only mildly shrinks those of truly non-zero covariates.
- outlyingness
Outlyingness score used to construct the simultaneous confidence intervals:
"StdCoef" - remove bootstrap models by the overall outlyingness of their standardized estimates (standardize and order them, then remove from both sides line-by-line until 5% of lines are removed). This is equivalent to Dezeure et al.'s method.
"MaxMin" - the outlyingness in Dezeure, Bühlmann & Zhang (TEST, 2017), which removes outlying selected models w.r.t. the widest variable only, using two distributions: the largest and the smallest standardized variable (only alpha/2% is removed from each side of this widest bootstrap distribution);
"F" - calculate the outlyingness score as the F-statistic of a model nested in the largest model among all bootstrap models (the union of all selected covariates).
- cut_F
The cut-off rule for the MCS by F-score. The default, "small", tests small models against the large full model and removes models with a large F-statistic. "large" removes large models with a small F-statistic. "both" keeps 1-alpha percent by cutting on both tails.
- dispersion
The measure of dispersion used to construct the outlyingness score:
"Std" - \((\beta - mean(\beta))/(\sigma)\);
"MeanAD_mean" - \((\beta - mean(\beta))/mean(|\beta - mean(\beta)|)\);
"MediAD_median" - \((\beta - median(\beta))/median(|\beta - median(\beta)|)\);
"IQR" - \((\beta - median(\beta))/IQR\).
- displayMesg
Logical flag for displaying messages or not.
- parallel
Should parallel foreach (default=FALSE) be used to conduct bootstrapping? If TRUE, a parallel backend, such as doParallel or others, must be registered beforehand.
- dataX
A new/split design matrix for the calculation of F-stat.
- dataY
An input response for the calculation of F-stat.
- ...
Additional optional arguments.
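As an illustration of the dispersion options listed above, the four standardizations can be sketched in base R for the bootstrap estimates of a single covariate. This is a minimal sketch, not the package's internal code: the helper name dispersion_scores and the simulated vector beta_boot are hypothetical, \(\sigma\) is assumed to be the sample standard deviation (sd()), and IQR is assumed to be stats::IQR() with its default quantile type.

```r
# Hypothetical helper (not part of the package): standardize one covariate's
# bootstrap estimates with each of the four documented dispersion measures.
dispersion_scores <- function(beta) {
  list(
    Std           = (beta - mean(beta)) / sd(beta),
    MeanAD_mean   = (beta - mean(beta)) / mean(abs(beta - mean(beta))),
    MediAD_median = (beta - median(beta)) / median(abs(beta - median(beta))),
    IQR           = (beta - median(beta)) / IQR(beta)
  )
}

set.seed(1)
beta_boot <- rnorm(200, mean = 0.5, sd = 0.1)  # simulated bootstrap estimates
scores <- dispersion_scores(beta_boot)

# Largest absolute score under each measure: a simple view of how far the
# most outlying bootstrap estimate sits from the center.
sapply(scores, function(s) max(abs(s)))
```

When most bootstrap estimates of a covariate are exactly zero, the median absolute deviation or the IQR can itself be zero, so the corresponding scores may be undefined. How the package treats that case is not stated on this page.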
Value
An object of class "ssci" is a list containing at least the following components. This object has an "arguments" attribute saved as c(association, method, resids.type), a "responses" attribute, and an "adjustments" attribute.
The list contains:
intvls
The constructed confidence intervals;
ests
All of the bootstrap estimates;
index_alpha
The set of indexes of the bootstrap estimates that have been removed as outlying estimates.
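To make the relationship between the three components concrete, here is a minimal sketch on simulated data. It rests on assumptions that this page does not confirm: that ests stores one bootstrap replicate per row and one covariate per column, and that each interval is the range of the estimates remaining after the rows in index_alpha are dropped. All objects below are simulated stand-ins, not output from ssci().

```r
set.seed(42)

# Simulated stand-in for `ests`: 100 bootstrap replicates of two coefficients.
# x2 is estimated as exactly zero in 70 of the 100 replicates.
ests <- cbind(
  x1 = rnorm(100, mean = 0.6, sd = 0.05),
  x2 = c(rep(0, 70), rnorm(30, mean = 0.1, sd = 0.02))
)

# Hypothetical stand-in for `index_alpha`: rows flagged as outlying (5%).
index_alpha <- c(3, 17, 58, 81, 99)

# Drop the flagged replicates, then take the range per covariate (assumed rule).
kept <- ests[-index_alpha, , drop = FALSE]
intvls <- t(apply(kept, 2, range))
colnames(intvls) <- c("lower_int", "upper_int")
print(round(intvls, 3))
```

Under these assumptions, a covariate that is frequently estimated as zero gets an interval with one endpoint at exactly 0, which matches the pattern seen for several covariates in the example output.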
Examples
# Load the Boston housing data; column 14 is the response.
data(Boston2)
Xdata <- as.matrix(Boston2[, -14])
Ydata <- Boston2[, 14]

# Log-transform selected covariates.
Xdata[, "crim"] <- log(Xdata[, "crim"])
Xdata[, "tax"] <- log(Xdata[, "tax"])
Xdata[, "lstat"] <- log(Xdata[, "lstat"])
Xdata[, "dis"] <- log(Xdata[, "dis"])
Xdata[, "age"] <- log(Xdata[, "age"])

# Center and scale the response and every covariate.
Ydata <- scale(Ydata)
for (i in seq_len(ncol(Xdata))) {
  Xdata[, i] <- scale(Xdata[, i])
}
BostonClean <- data.frame(medv = Ydata, Xdata)

# Construct the intervals from 100 bootstrap replications.
ssci_Boston <- ssci(x = Xdata, y = Ydata,
                    intercept = FALSE,
                    family = "gaussian",
                    standardize = FALSE,
                    parallel = FALSE,
                    bootstrap_rep = 100,
                    alpha = 0.05, cutting = "unweighted", shrink = "No"
)
#> Warning: executing %dopar% sequentially: no parallel backend registered
print(ssci_Boston, digits = 3)
#> The sparsified simultaneous confidence intervals (Adalasso-refit):
#> --------------------------------------------
#>         lower_int upper_int
#> crim        0.000     0.168
#> zn         -0.039     0.103
#> indus      -0.077     0.058
#> chas        0.000     0.072
#> nox        -0.246     0.000
#> rm          0.517     0.698
#> age        -0.158     0.000
#> dis        -0.342    -0.080
#> rad         0.027     0.133
#> tax        -0.202    -0.099
#> ptratio    -0.236    -0.114
#> black       0.000     0.114
#> lstat      -0.296    -0.115
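The %dopar% warning in the output above appears because no foreach backend was registered, so the bootstrap ran sequentially. A minimal sketch of registering a doParallel backend before calling ssci() with parallel = TRUE follows. The worker count of 2 is an arbitrary choice, and the ssci() call is left commented out because it depends on the Xdata and Ydata objects built in the example above.

```r
library(doParallel)  # also attaches foreach and parallel

cl <- parallel::makeCluster(2)           # start two local worker processes
doParallel::registerDoParallel(cl)       # register them as the foreach backend
n_workers <- foreach::getDoParWorkers()  # number of workers now registered
n_workers

# With a backend registered, the bootstrap can be run in parallel, e.g.:
# ssci_par <- ssci(x = Xdata, y = Ydata, bootstrap_rep = 100, parallel = TRUE)

parallel::stopCluster(cl)  # release the workers when finished
foreach::registerDoSEQ()   # return foreach to sequential execution
```

Stopping the cluster and re-registering the sequential backend keeps later %dopar% calls from trying to reach workers that no longer exist.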