High-dimensional Linear Regression (family = 'reg_hd')¶
In this module, the model for each source domain is a high-dimensional linear regression. For details of the method, please refer to CGDRO-Regression.
Use cgdro_() with family = 'reg_hd' to fit high-dimensional linear regressions.
Example¶
Data Generating Process¶
In this example, we generate high-dimensional multi-source data with $2$ source domains, with $100$ samples in each source domain and $100$ samples in the target domain. The dimension of the parameters is $p=100$.
# two source groups, each with 100 samples, and 100 target samples
data <- simu_linear_reg_highd(n_list = c(100, 100), N = 100, p = 100, seed = 123)
Xlist <- data$X_list  # list of source-domain covariates
Ylist <- data$Y_list  # list of source-domain responses
X0 <- data$X0         # target-domain covariates
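To make the expected input layout concrete, the sketch below builds stand-in data in base R and checks its dimensions. It is an illustration under assumptions, not the generator used by simu_linear_reg_highd: the Gaussian covariates, the sparse coefficient vectors in beta_list, and all names ending in _demo are hypothetical. It also assumes that each source contributes one n x p covariate matrix and one response vector of length n.

```r
# Hypothetical stand-in data (NOT the package's data generating process)
set.seed(123)
n_list <- c(100, 100)  # sample size of each source domain
N <- 100               # sample size of the target domain
p <- 100               # dimension of the parameters

# Assumed sparse coefficient vector for each source domain
beta_list <- list(c(rep(0.5, 5), rep(0, p - 5)),
                  c(rep(-0.5, 5), rep(0, p - 5)))

# One n x p covariate matrix and one length-n response per source domain
X_list_demo <- lapply(n_list, function(n) matrix(rnorm(n * p), n, p))
Y_list_demo <- mapply(function(X, b) as.vector(X %*% b) + rnorm(nrow(X)),
                      X_list_demo, beta_list, SIMPLIFY = FALSE)

# Target-domain covariates only; no target responses are generated
X0_demo <- matrix(rnorm(N * p), N, p)

# Dimension checks on the assumed layout
stopifnot(length(X_list_demo) == length(Y_list_demo))
stopifnot(all(sapply(X_list_demo, ncol) == ncol(X0_demo)))
```

Running the same dimension checks on Xlist, Ylist, and X0 before fitting is a cheap way to catch a mismatch between source and target covariates, provided the simulated objects follow this layout.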
Implementation & Results¶
## Fit CGDRO model and do inference on coefficients with index 1,10,45,99 (no intercept)
fit <- cgdro_(Xlist, Ylist, X0 = X0,
              family = "reg_hd",
              index = c(1, 10, 45, 99), intercept = FALSE,
              delta = 0, lambda = "CV.min", verbose = TRUE)
inf <- infer_cgdro_(fit, M = 200, alpha = 0.05)
start high-dimensional fitting-----
---> Computing for loading (1/1)... The projection direction is identified at mu = 0.02671at step =6
---> Computing for loading (1/1)... The projection direction is identified at mu = 0.040065at step =5
---> Computing for loading (1/1)... The projection direction is identified at mu = 0.02671at step =6
---> Computing for loading (1/1)... The projection direction is identified at mu = 0.040065at step =5
---> Computing for loading (1/4)... The projection direction is identified at mu = 0.060097at step =4
---> Computing for loading (2/4)... The projection direction is identified at mu = 0.040065at step =5
---> Computing for loading (3/4)... The projection direction is identified at mu = 0.060097at step =4
---> Computing for loading (4/4)... The projection direction is identified at mu = 0.040065at step =5
---> Computing for loading (1/4)... The projection direction is identified at mu = 0.060097at step =4
---> Computing for loading (2/4)... The projection direction is identified at mu = 0.040065at step =5
---> Computing for loading (3/4)... The projection direction is identified at mu = 0.040065at step =5
---> Computing for loading (4/4)... The projection direction is identified at mu = 0.040065at step =5
summary_cgdro_(fit, infer=inf)
Model Summary:
=================================
CGDRO Aggregated Weights:
group   |      1       2
weight_ | 0.2718  0.7282
=================================
Plug-in Estimators:
index |       1       10       45       99
coef_ |  0.0272  -0.0586   0.0000  -0.0069
=================================
Debiased Estimators:
index |       1       10       45       99
coef_ |  0.2640  -0.2182  -0.0675  -0.1037
=================================
Confidence Intervals:
index |                1                10                45                99
CI    | (-0.2504,0.6375)  (-0.9175,0.4246)  (-0.4859,0.3941)  (-0.6657,0.3350)
The summary reports the statistical inference results from CGDRO: CGDRO Aggregated Weights (the learned weight for each source-domain group), Coefficient Estimators (the worst-case coefficient estimators on the target domain, reported as plug-in and debiased estimators), and Confidence Intervals (valid confidence intervals for the target-domain coefficients). In the summary above, group indexes the source-domain groups and index indexes the coefficients, starting from the intercept if intercept=TRUE and from the first coefficient otherwise.
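As a reading aid, the sketch below copies the numbers from the printed summary by hand into plain R vectors and runs two simple checks: whether the aggregated weights sum to one, and whether each confidence interval excludes zero. It does not access any field of fit or inf, since their internal structure is not shown here, and the variable names are chosen only for this illustration.

```r
# Values copied by hand from the printed summary (not extracted from `fit`)
weights  <- c(0.2718, 0.7282)
index    <- c(1, 10, 45, 99)
debiased <- c(0.2640, -0.2182, -0.0675, -0.1037)
ci_lower <- c(-0.2504, -0.9175, -0.4859, -0.6657)
ci_upper <- c( 0.6375,  0.4246,  0.3941,  0.3350)

# The printed weights sum to one, up to rounding in the printout
weights_sum_to_one <- isTRUE(all.equal(sum(weights), 1, tolerance = 1e-3))

# A confidence interval that excludes 0 indicates a coefficient that is
# significantly different from 0 at the chosen level (alpha = 0.05 here)
excludes_zero <- ci_lower > 0 | ci_upper < 0

# Each debiased estimate should fall inside its own interval
inside_ci <- debiased > ci_lower & debiased < ci_upper

result <- data.frame(index, debiased, ci_lower, ci_upper, excludes_zero, inside_ci)
print(result)
```

In this run all four intervals contain zero, so none of the selected coefficients is significantly different from zero at the 5% level, and each debiased estimate lies inside its interval.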
Prediction¶
Make predictions on the target data and show the first 6 predicted values. You do not need to specify the covariates used for prediction, since the target data is the default choice.
pred <- predict_cgdro_(fit) # N x 1 vector of predicted values
head(pred)
- -0.0614912721286922
- 0.188315681650812
- -1.1826602470143
- -0.371092358300653
- 0.720142772734227
- 0.794176188386739
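For intuition, a linear model without an intercept predicts by multiplying the covariate matrix with the coefficient vector. The sketch below shows that computation in base R on stand-in data. It is an assumption that predict_cgdro_ reduces to this form for family = "reg_hd"; the names X0_demo and beta_hat_demo and their values are hypothetical and are not taken from fit.

```r
# Hypothetical illustration of a no-intercept linear predictor
set.seed(1)
N <- 100
p <- 100

# Stand-in target covariates (N x p)
X0_demo <- matrix(rnorm(N * p), N, p)

# Hypothetical sparse coefficient vector of length p
beta_hat_demo <- c(0.3, -0.2, rep(0, p - 2))

# Linear predictor without an intercept term: one prediction per target sample
pred_demo <- as.vector(X0_demo %*% beta_hat_demo)

head(pred_demo)
```

With intercept = TRUE, the analogous computation would add the intercept estimate to every element of the linear predictor.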