
The ssci function constructs sparsified simultaneous confidence intervals.

Usage

ssci(
  x,
  y,
  intercept = FALSE,
  family = "gaussian",
  standardize = FALSE,
  bootstrap_rep = 1000,
  selection = c("adalasso_refit", "SPSP_adalassoCV", "SPSP_adalasso", "SPSP_lasso"),
  alpha = 0.05,
  cutting = c("unweighted", "weighted"),
  shrink = c("No", "nzero_prob"),
  outlyingness = c("StdCoef", "MaxMin", "F"),
  cut_F = c("small", "large", "both"),
  dispersion = c("Std", "MeanAD_mean", "MediAD_median", "IQR"),
  displayMesg = FALSE,
  parallel = FALSE,
  dataX = NULL,
  dataY = NULL,
  ...
)

Arguments

x

The input matrix of dimension nobs × nvars, i.e., n rows (observations) and p columns (covariates).

y

Response variable. Currently, SSCI only supports family = "gaussian" with a continuous response y.

intercept

Should intercept(s) be fitted (TRUE) or set to zero (default = FALSE)?

family

Response type. Either a character string representing one of the built-in families, or else a glm() family object.

standardize

Logical. Should standardization be conducted before estimation?

bootstrap_rep

Number of bootstrap replicates.

selection

The selection approach adopted for the estimation; the default is "adalasso_refit". Two of the approaches are described here. "adalasso_refit" is a two-stage selection and estimation approach that conducts adaptive lasso selection followed by least-squares refitting to obtain unbiased estimates. "SPSP_adalasso" is an SPSP approach that partitions the solution paths generated by the adaptive lasso selection method; it is more stable than many other selection methods.

alpha

The significance level of the confidence intervals. Default is 0.05.

cutting

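Cutting method (whether to use weights for each line, i.e., each bootstrap estimate):

  • unweighted - all weights are 1 for every cell of the estimates;

  • weighted - each zero estimate is weighted by the inverse of the probability that the covariate is estimated as 0, and all other weights are 1. For example, with r = 1000 bootstrap replicates, if only one bootstrap model estimates X1 as 0, the weight for that zero estimate is r/1 = 1000.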
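The "weighted" rule can be sketched in base R as follows. This is a minimal illustration, assuming the bootstrap estimates are stored as an r × p matrix with one row (line) per bootstrap replicate; the helper `cutting_weights` is hypothetical and is not the package's internal code.

```r
# Hypothetical helper (not the package's internal code): weights for the
# "weighted" cutting rule. `ests` is an r x p matrix of bootstrap estimates,
# one row (line) per bootstrap replicate.
cutting_weights <- function(ests) {
  r <- nrow(ests)
  w <- matrix(1, nrow = r, ncol = ncol(ests))  # "unweighted": every cell has weight 1
  for (j in seq_len(ncol(ests))) {
    is_zero <- ests[, j] == 0
    n_zero <- sum(is_zero)
    if (n_zero > 0) {
      # inverse of the estimated probability of a zero estimate: 1 / (n_zero / r)
      w[is_zero, j] <- r / n_zero
    }
  }
  w
}

# Example from the text: r = 1000 replicates, only one estimates X1 as 0
ests <- matrix(0.3, nrow = 1000, ncol = 2)
ests[1, 1] <- 0
w <- cutting_weights(ests)
w[1, 1]  # 1000
```

All other cells of `w` stay at 1, so only the rare zero estimate is up-weighted.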

shrink

An argument for interval shrinkage. Default is "No" (no shrinkage). "nzero_prob" uses the probability of a non-zero estimate of each coefficient as a weight to shrink the width of that covariate's confidence interval. This strongly shrinks the intervals of truly zero covariates and mildly shrinks those of truly non-zero covariates.
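A rough sketch of "nzero_prob" shrinkage, assuming (this is not stated above) that each interval is scaled toward 0 by the covariate's non-zero probability, so that its width is multiplied by that probability. The helper `shrink_nzero_prob` and the scaling rule are hypothetical; the package may shrink intervals differently.

```r
# Hypothetical sketch of "nzero_prob" shrinkage. ASSUMPTION: each interval is
# scaled toward 0 by its non-zero probability, which multiplies its width by
# that probability. The package may implement the shrinkage differently.
shrink_nzero_prob <- function(intvls, ests) {
  # intvls: p x 2 matrix (lower, upper); ests: r x p bootstrap estimates
  pi_nz <- colMeans(ests != 0)  # proportion of replicates with a non-zero estimate
  intvls * pi_nz                # row j is multiplied by pi_nz[j]
}

ests <- cbind(c(0, 0, 0, 0.1),        # rarely selected covariate: pi = 0.25
              c(0.5, 0.4, 0.6, 0.3))  # always selected covariate: pi = 1
intvls <- rbind(c(-0.2, 0.2), c(0.3, 0.7))
shrunk <- shrink_nzero_prob(intvls, ests)
```

Here the rarely selected covariate's interval shrinks from width 0.4 to 0.1, while the always-selected covariate's interval is unchanged.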

outlyingness

Outlyingness score used to construct the simultaneous confidence intervals:

  • "StdCoef" - removes bootstrap models by the overall outlyingness of their standardized estimates: the estimates are standardized and ordered, and lines are removed from both sides one by one until a proportion alpha of the lines (5% by default) has been removed. This is equivalent to the method of Dezeure et al.

  • "MaxMin" - represents the method of Dezeure, Bühlmann & Zhang (TEST, 2017), which removes outlying bootstrap models with respect to two distributions, those of the largest and the smallest standardized estimates across variables (only alpha/2 is removed from each side);

  • "F" - computes the outlyingness score as the F-statistic of each bootstrap model, treated as nested in the largest model among all bootstrap models (the union of all selected covariates).
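One possible reading of the "MaxMin" rule, sketched in base R. Assumptions not confirmed by this page: the "Std" dispersion is used for standardization, the alpha/2 cut on each distribution uses empirical quantiles, and the final intervals are the column-wise ranges of the bootstrap estimates that remain after trimming. The helper `maxmin_intervals` is hypothetical; its output names only mirror the intvls and index_alpha components listed under Value.

```r
# Hypothetical sketch of "MaxMin"-style trimming (one reading of the description,
# not the package code). ests: r x p matrix of bootstrap estimates.
maxmin_intervals <- function(ests, alpha = 0.05) {
  # Standardize each column ("Std" dispersion); constant columns get z = 0
  z <- apply(ests, 2, function(b) {
    s <- sd(b)
    if (s == 0) rep(0, length(b)) else (b - mean(b)) / s
  })
  z_max <- apply(z, 1, max)  # largest standardized value in each line
  z_min <- apply(z, 1, min)  # smallest standardized value in each line
  # Remove alpha/2 from the upper tail of z_max and the lower tail of z_min
  index_alpha <- which(z_max > quantile(z_max, 1 - alpha / 2) |
                       z_min < quantile(z_min, alpha / 2))
  # Guard against ests[-integer(0), ], which would drop every row
  kept <- if (length(index_alpha) > 0) ests[-index_alpha, , drop = FALSE] else ests
  intvls <- cbind(lower = apply(kept, 2, min), upper = apply(kept, 2, max))
  list(intvls = intvls, index_alpha = index_alpha)
}

set.seed(1)
ests <- matrix(rnorm(200 * 3), nrow = 200, ncol = 3)
res <- maxmin_intervals(ests, alpha = 0.05)
```

Because the max and min cuts can remove the same line, between alpha/2 and alpha of the lines are dropped.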

cut_F

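The cut-off rule for the MCS based on the F-score. The default, "small", compares small models against the large full model and removes models with large F-statistics. If "large", large models are removed, i.e., those with small F-statistics. If "both", both tails are cut so that the central (1 - alpha) × 100% of the F-statistics are kept.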
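A sketch of the "F" outlyingness score and the three cut_F options. It assumes that each bootstrap model's selected columns form a subset of the union model, that no intercept is fitted (matching intercept = FALSE), and that cuts use empirical quantiles of the F-statistics. `nested_F` and `cut_by_F` are hypothetical helpers, not package functions.

```r
# Hypothetical sketch (not the package code) of the "F" outlyingness score.
# X: n x p design matrix, y: response; `sel` and `union` are column indices
# of one bootstrap model and of the largest (union) model, with sel in union.
nested_F <- function(X, y, sel, union) {
  rss <- function(cols) {
    if (length(cols) == 0) return(sum(y^2))  # empty model, no intercept
    sum(lm.fit(X[, cols, drop = FALSE], y)$residuals^2)
  }
  df1 <- length(union) - length(sel)
  if (df1 == 0) return(0)                    # the model is the union model
  df2 <- nrow(X) - length(union)
  ((rss(sel) - rss(union)) / df1) / (rss(union) / df2)
}

# Indices of bootstrap models removed under each cut_F option
cut_by_F <- function(Fstats, alpha = 0.05, cut_F = c("small", "large", "both")) {
  cut_F <- match.arg(cut_F)
  switch(cut_F,
    small = which(Fstats > quantile(Fstats, 1 - alpha)),  # drop large F
    large = which(Fstats < quantile(Fstats, alpha)),      # drop small F
    both  = which(Fstats > quantile(Fstats, 1 - alpha / 2) |
                  Fstats < quantile(Fstats, alpha / 2)))  # keep central 1 - alpha
}

x1 <- 1:6
x2 <- c(1, -1, 1, -1, 1, -1)
X <- cbind(x1, x2)
y <- 2 * x1 + 0.6 * x2 + c(0.1, -0.1, 0.05, 0, -0.05, 0.02)
F_full  <- nested_F(X, y, sel = 1:2, union = 1:2)  # 0: same as the union model
F_small <- nested_F(X, y, sel = 1, union = 1:2)    # large: x2 is omitted
```

Dropping a covariate that genuinely matters (x2 here) yields a large F-statistic, which is what the default "small" rule treats as outlying.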

dispersion

The measure of dispersion used to construct the outlyingness score:

  • "Std" - \((\beta - mean(\beta))/(\sigma)\);

  • "MeanAD_mean" - \((\beta - mean(\beta))/mean(|\beta - mean(\beta)|)\);

  • "MediAD_median" - \((\beta - median(\beta))/median(|\beta - median(\beta)|)\);

  • "IQR" - \((\beta - median(\beta))/IQR\).
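The four dispersion measures can be written directly in base R for a single vector of bootstrap estimates. This sketch assumes that sigma in "Std" is the sample standard deviation; the helper `standardize_boot` is hypothetical.

```r
# Hypothetical sketch of the four dispersion-based standardizations for one
# vector of bootstrap estimates `beta`. ASSUMPTION: sigma in "Std" is the sample
# standard deviation sd(); mad() is avoided because it rescales by 1.4826.
standardize_boot <- function(beta,
                             dispersion = c("Std", "MeanAD_mean", "MediAD_median", "IQR")) {
  dispersion <- match.arg(dispersion)
  switch(dispersion,
    Std           = (beta - mean(beta)) / sd(beta),
    MeanAD_mean   = (beta - mean(beta)) / mean(abs(beta - mean(beta))),
    MediAD_median = (beta - median(beta)) / median(abs(beta - median(beta))),
    IQR           = (beta - median(beta)) / IQR(beta)
  )
}

beta <- c(1, 2, 3, 4, 5)
standardize_boot(beta, "MediAD_median")  # c(-2, -1, 0, 1, 2)
standardize_boot(beta, "IQR")            # c(-1, -0.5, 0, 0.5, 1)
```

A covariate whose bootstrap estimates are all 0 has zero dispersion under every option, so its score is undefined (NaN) and would need special handling.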

displayMesg

Logical flag for displaying messages or not.

parallel

Should a parallel foreach loop be used for bootstrapping (default = FALSE)? If TRUE, a parallel backend, such as doParallel, must be registered beforehand.

dataX

A new (or split) design matrix used to calculate the F-statistic.

dataY

The response used to calculate the F-statistic.

...

Additional optional arguments.

Value

An object of class "ssci", a list containing at least the following components. The object also carries an "arguments" attribute saved as c(association, method, resids.type), a "responses" attribute, and an "adjustments" attribute. The list contains:

intvls

The constructed confidence intervals;

ests

All of the bootstrap estimates;

index_alpha

The indices of the bootstrap estimates that were removed as outlying estimates.

Examples

data(Boston2)
# Covariates: every column except the 14th (medv); response: medv
Xdata <- as.matrix(Boston2[,-14]); Ydata <- Boston2[,14]
# Log-transform selected covariates
Xdata[,"crim"] <- log(Xdata[,"crim"])
Xdata[,"tax"] <- log(Xdata[,"tax"])
Xdata[,"lstat"] <- log(Xdata[,"lstat"])
Xdata[,"dis"] <- log(Xdata[,"dis"])
Xdata[,"age"] <- log(Xdata[,"age"])
# Standardize the response and each covariate
Ydata <- scale(Ydata)
for (i in 1:(ncol(Xdata))){
   Xdata[,i] <- scale(Xdata[,i])
}
BostonClean <- data.frame(medv = Ydata, Xdata)

ssci_Boston <- ssci(x = Xdata, y = Ydata,
                    intercept = FALSE,
                    family = "gaussian",
                    standardize = FALSE,
                    parallel = FALSE,
                    bootstrap_rep = 100,
                    alpha=0.05, cutting = "unweighted", shrink = "No"
                    )
#> Warning: executing %dopar% sequentially: no parallel backend registered

print(ssci_Boston, digits = 3)
#> The sparsified simultaneous confidence intervals (Adalasso-refit): 
#> -------------------------------------------- 
#>          lower_int  upper_int
#> crim      0.000      0.168   
#> zn       -0.039      0.103   
#> indus    -0.077      0.058   
#> chas      0.000      0.072   
#> nox      -0.246      0.000   
#> rm        0.517      0.698   
#> age      -0.158      0.000   
#> dis      -0.342     -0.080   
#> rad       0.027      0.133   
#> tax      -0.202     -0.099   
#> ptratio  -0.236     -0.114   
#> black     0.000      0.114   
#> lstat    -0.296     -0.115