Low-dimensional Linear Regression (family = 'reg_ld')¶

In this module, we assume that the conditional outcome model in each source domain is a low-dimensional linear regression. For more details on the methods, please refer to CGDRO-Regression.

We can use cgdro_() with family = 'reg_ld' for low-dimensional linear regressions.

Now we give an example showing how to implement family = 'reg_ld' with three different loss functions:

Reward-based loss
Squared loss
Regret-based loss

Example¶

Data Generating Process¶

In this example, we generate multi-source data with $3$ source domains, with $1,000$ samples in each source domain and $10,000$ samples in the target domain. The dimension of the parameters is $p=5$.

# number of source groups = 3, with 1000 samples each
# sigma: source group 1,3: 0.5; source group 2: 2
# target sample size = 10000
# dimension p = 5
data <- simu_linear_reg_lowd(n_list = list(1000,1000,1000), N=10000, p = 5, seed = 123)
Xlist = data$X_list
Ylist = data$Y_list
X0 = data$X0

Implementation & Results¶

We fit three loss functions with family = 'reg_ld': reward, squaredloss, and regret. Geometrically, reward: $f^*$ is the point closest to the origin within the convex hull of $\{f^{(l)}\}_{l\in[L]}$; squaredloss: $f^{sq}$ corresponds to the source model with the highest noise level when this noise is substantially higher than that in other sources; regret: $f^{reg}$ is the center of the smallest circle enclosing all individual source models.
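The geometric picture above can be checked on a toy example. The sketch below is not the package's solver: it uses three made-up two-dimensional source models (`b1`, `b2`, `b3` are hypothetical), treats distances as plain Euclidean, and brute-forces the weights on a grid over the simplex in base R.

```r
# Hypothetical source models, one per column (illustration only)
B <- cbind(b1 = c(1, 0), b2 = c(3, 0), b3 = c(2, 1))

# All weight vectors (w1, w2, w3) on the simplex with resolution 0.01
g <- expand.grid(i = 0:100, j = 0:100)
g <- g[g$i + g$j <= 100, ]
W <- cbind(g$i, g$j, 100 - g$i - g$j) / 100

# Each row of `cand` is one convex combination of the source models
cand <- W %*% t(B)

# reward: the convex combination closest to the origin
k_reward <- which.min(rowSums(cand^2))
w_reward <- W[k_reward, ]
f_reward <- cand[k_reward, ]

# regret: the convex combination minimizing the largest distance to any
# source model, i.e. the center of the smallest enclosing circle
max_dist <- apply(cand, 1, function(f) max(sqrt(colSums((B - f)^2))))
k_regret <- which.min(max_dist)
w_regret <- W[k_regret, ]
f_regret <- cand[k_regret, ]

print(rbind(reward = w_reward, regret = w_regret))
print(rbind(reward = f_reward, regret = f_regret))
```

In this toy case, reward puts all of its weight on `b1`, the source model nearest the origin, while regret returns the midpoint of `b1` and `b2`, which is equidistant from all three source models.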

Note: In Regression.linear.ld, inference for confidence intervals is available only when loss_type=reward; for the other loss types, only point estimation and prediction are available.

loss_type = reward¶

## Note: only when loss_type='reward', infer_cgdro_() can be called to get confidence intervals
## For other loss_type, only point estimation and prediction can be done

fit <- cgdro_(Xlist, Ylist, X0, loss_type = "reward",
             family = "reg_ld",  intercept = TRUE,
             delta = 0,  verbose = FALSE)
inf <- infer_cgdro_(fit, M = 200, alpha = 0.05)

summary_cgdro_(fit, infer=inf)

Model Summary:
=================================
CGDRO Aggregated Weights:

group     |        1        2        3
weight_   |   0.5523   0.2813   0.1665

=================================
CGDRO Aggregated Estimators:

index     |        1        2        3        4        5        6
coef_     |   0.0232  -0.0653  -0.0449   0.0333  -0.0104   0.1307

=================================
Confidence Intervals:

index     |              1              2              3              4              5
CI        | (-0.0467,0.1067) (-0.2273,0.0549) (-0.1714,0.0781) (-0.1005,0.1320) (-0.1502,0.1224)
index     |              6
CI        | (0.0247,0.2149)

We can get statistical inference results from CGDRO, including CGDRO Aggregated Weights (the learned weight for each source domain group), CGDRO Aggregated Estimators (the worst-case coefficient estimators on the target domain), and Confidence Intervals (valid confidence intervals for the target domain coefficients). In the summary above, group indexes the source domain groups, and index indexes the coefficients, starting from the intercept if intercept=TRUE and otherwise from the first coefficient dimension.
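To read the summary programmatically, one option is to tabulate the printed estimates and intervals and flag which intervals exclude zero. The sketch below copies the numbers from the reward summary above into base R vectors; the labels `x1` to `x5` are hypothetical names for the five covariates, and index 1 is taken to be the intercept because intercept = TRUE.

```r
# Values copied from the reward summary output above
coef_  <- c(0.0232, -0.0653, -0.0449, 0.0333, -0.0104, 0.1307)
ci_low <- c(-0.0467, -0.2273, -0.1714, -0.1005, -0.1502, 0.0247)
ci_up  <- c(0.1067, 0.0549, 0.0781, 0.1320, 0.1224, 0.2149)

res <- data.frame(
  index = seq_along(coef_),
  term  = c("intercept", paste0("x", 1:5)),  # hypothetical labels
  coef  = coef_,
  lower = ci_low,
  upper = ci_up
)

# An interval excludes zero when both endpoints share the same sign
res$excludes_zero <- res$lower > 0 | res$upper < 0
print(res)
```

With these numbers, only index 6 has an interval that excludes zero.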

Make predictions on the target data (you do not have to specify the covariates used for prediction, since the target data is the default) and show the first 6 predicted values.

pred <- predict_cgdro_(fit)  # N x 1 vector of predicted values
head(pred)

-0.102162650146038
-0.239544671660532
0.052196477757759
-0.146508834066964
-0.0294789674672407
0.0131166513817656
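For a linear model with an intercept, a prediction is the intercept plus the inner product of the covariates with the slope coefficients. The sketch below illustrates that arithmetic in base R with the reward coefficients printed above and a small made-up covariate matrix (`X0_toy` is hypothetical). It assumes predict_cgdro_() applies the aggregated linear model in this way, which the text above does not state explicitly.

```r
set.seed(1)

# Hypothetical target covariates: 4 rows, p = 5 columns
X0_toy <- matrix(rnorm(4 * 5), nrow = 4, ncol = 5)

# Aggregated coefficients copied from the reward summary (index 1 = intercept)
coef_ <- c(0.0232, -0.0653, -0.0449, 0.0333, -0.0104, 0.1307)

# Prepend a column of ones so the intercept is handled by the matrix product
pred_toy <- drop(cbind(1, X0_toy) %*% coef_)

# Same computation written as intercept + X %*% slopes
manual <- coef_[1] + drop(X0_toy %*% coef_[-1])

print(pred_toy)
```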

loss_type = squaredloss¶

fit <- cgdro_(Xlist, Ylist, X0, loss_type = "squaredloss",
             family = "reg_ld",  intercept = TRUE,
             delta = 0,  verbose = FALSE)

summary_cgdro_(fit)

Model Summary:
=================================
CGDRO Aggregated Weights:

group     |        1        2        3
weight_   |   0.0000   1.0000   0.0000

=================================
CGDRO Aggregated Estimators:

index     |        1        2        3        4        5        6
coef_     |   0.0533  -0.4480  -0.2918  -0.2533  -0.2652  -0.1019

Confidence Intervals not provided. Run infer_reg_ld() and pass its result via infer=.

pred <- predict_cgdro_(fit)  
head(pred)

-0.556374680737251
0.322332748788102
-0.68009197828785
0.625852942095968
-0.0404564171014494
-0.403366750896784
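The squaredloss weights above sit entirely on group 2, which the data-generation comments describe as the group with noise level 2 (versus 0.5 for groups 1 and 3). One way to see which source is noisiest is to compare residual standard errors from per-group least-squares fits. The sketch below does this in base R on freshly simulated toy data; it does not reproduce simu_linear_reg_lowd(), and `beta_toy` is a hypothetical coefficient vector.

```r
set.seed(123)
n <- 1000
p <- 5
sigmas   <- c(0.5, 2, 0.5)  # noise levels taken from the comments above
beta_toy <- rep(0.2, p)     # hypothetical coefficients, shared by all groups

# Residual standard error of an ordinary least-squares fit in each group
noise_hat <- sapply(sigmas, function(s) {
  X <- matrix(rnorm(n * p), n, p)
  y <- drop(X %*% beta_toy) + rnorm(n, sd = s)
  summary(lm(y ~ X))$sigma
})

print(round(noise_hat, 2))
print(which.max(noise_hat))  # index of the noisiest group
```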

loss_type = regret¶

fit <- cgdro_(Xlist, Ylist, X0, loss_type = "regret",
             family = "reg_ld",  intercept = TRUE,
             delta = 0,  verbose = FALSE)

summary_cgdro_(fit)

Model Summary:
=================================
CGDRO Aggregated Weights:

group     |        1        2        3
weight_   |   0.4788   0.4924   0.0288

=================================
CGDRO Aggregated Estimators:

index     |        1        2        3        4        5        6
coef_     |   0.0370  -0.1873  -0.0903  -0.0557  -0.0531   0.0704

Confidence Intervals not provided. Run infer_reg_ld() and pass its result via infer=.

pred <- predict_cgdro_(fit)  # N x 1 vector of predicted values
head(pred)

-0.143457526625113
-0.102260302388424
-0.176093814112722
-0.00923385476387486
0.0281128189924826
-0.128837656162272