Regression.linear.hd¶

In this module, we consider the setting where the model in each source domain is a high-dimensional linear regression. For more details on the methods, please refer to CGDRO-Regression.

We can import the Regression.linear module with the code below:

# import Regression.linear module
from cgdro.Regression import linear

Module Arguments & Outputs¶

Regression.linear.hd¶

intercept (bool, optional): whether to include intercept in outcome models. Defaults to False.
loading_intercept (bool, optional): whether to include intercept in loading matrix. Defaults to False.
delta (float, optional): ridge penalty level, must be non-negative. Defaults to 0.
lam (float, optional): Lasso penalty level for high-dimensional regression. Defaults to None.
verbose (bool, optional): whether to print out the fitting information. Defaults to False.

Built-in functions in Regression.linear.hd:

Built-in Functions	Description
`fit()`	Fit robust linear regression (high-dim) in the target domain.
`predict()`	Make robust prediction in the target domain.
`infer()`	Build confidence intervals of the target linear regression coefficients.
`summary()`	Summarize the results.

fit()¶

Arguments:

X_list (list of array-like): list of source domain features, each element is n_i x d.
y_list (list of array-like): list of source domain labels, each element is n_i x 1.
index (int or list of int): 1-based index (or indexes) of the loading vector; the index-th coefficient is the coefficient of interest.
X0 (array-like, optional): target domain features, n0 x d. If None, use all sources' data. Defaults to None.
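
The input layout described above can be sketched as follows. This is only an illustration of the expected shapes and of the 1-based indexing convention; `to_zero_based` is a hypothetical helper written for this sketch and is not part of cgdro, and the data are random placeholders.

```python
import numpy as np

def to_zero_based(index, d):
    """Hypothetical helper: map 1-based coefficient indexes to 0-based positions."""
    idx = np.atleast_1d(np.asarray(index, dtype=int))
    if idx.min() < 1 or idx.max() > d:
        raise ValueError(f"indexes must lie in 1..{d}")
    return idx - 1

rng = np.random.default_rng(0)
n_list, d = [100, 100], 100

# One (n_i x d) feature matrix and one (n_i x 1) label array per source domain
X_list = [rng.normal(size=(n, d)) for n in n_list]
y_list = [rng.normal(size=(n, 1)) for n in n_list]

# Every source shares the feature dimension d, and labels match features row by row
assert all(X.shape == (n, d) for X, n in zip(X_list, n_list))
assert all(len(y) == X.shape[0] for X, y in zip(X_list, y_list))

positions = to_zero_based([1, 5, 10, 98], d)
print(positions.tolist())  # [0, 4, 9, 97]
```

A single integer is handled the same way, for example `to_zero_based(3, 10)` gives the one-element array `[2]`.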

Outputs: enables the following attributes:

.parameters : "est_bc": CGDRO aggregated debiased loaded coefficient estimators in the target domain; "est_plug": CGDRO aggregated plug-in loaded coefficient estimators in the target domain; "weight_": CGDRO aggregated weights of the source domains.
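
One way to read "weight_" is as a set of non-negative weights that sum to 1 and combine the per-source coefficient vectors into a single aggregated vector. The sketch below assumes exactly that convex-combination form; the coefficient values and weights are hypothetical, and the code does not reproduce how cgdro actually computes the weights.

```python
import numpy as np

# Hypothetical per-source coefficient estimates (2 sources, 3 coefficients)
beta_sources = np.array([
    [1.0, 0.0, 2.0],  # source 1
    [0.0, 1.0, 2.0],  # source 2
])

# Hypothetical aggregation weights: non-negative and summing to 1
weights = np.array([0.25, 0.75])
assert np.all(weights >= 0) and np.isclose(weights.sum(), 1.0)

# Aggregated coefficients: sum over sources of weights[l] * beta_sources[l]
beta_agg = weights @ beta_sources
print(beta_agg.tolist())  # [0.25, 0.75, 2.0]
```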

predict()¶

Arguments:

X : Input features for prediction. If None, uses the training data. Defaults to None.

Outputs:

pred : linear prediction in the target domain.
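
As a rough picture of what a linear prediction is, the sketch below multiplies a feature matrix by a coefficient vector and adds an optional intercept. `linear_predict` is a hypothetical helper for illustration only; it is not the internals of `predict()`, and the numbers are made up.

```python
import numpy as np

def linear_predict(X, coef, intercept=0.0):
    """Hypothetical helper: linear prediction X @ coef + intercept."""
    X = np.asarray(X, dtype=float)
    coef = np.asarray(coef, dtype=float)
    if X.shape[1] != coef.shape[0]:
        raise ValueError("X must have one column per coefficient")
    return X @ coef + intercept

# Two new observations with two features each (illustrative values)
X_new = np.array([[1.0, 2.0], [0.0, -1.0]])
coef = np.array([0.5, 0.25])
print(linear_predict(X_new, coef).tolist())  # [1.0, -0.25]
```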

infer()¶

Arguments:

M (int, optional): number of resampling iterations. Defaults to 200.
alpha (float, optional): significance level for confidence intervals. Defaults to 0.05.
alpha_thres (float, optional): threshold for generating samples. Defaults to 0.01.
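
To make the role of alpha concrete: a significance level alpha corresponds to a nominal coverage of 1 - alpha, so alpha=0.05 targets 95% intervals. The sketch below uses a generic normal-quantile interval purely to show this relationship; it is not the resampling procedure that `infer()` runs, and `normal_ci`, the estimate, and the standard error are all hypothetical.

```python
from statistics import NormalDist

def normal_ci(estimate, std_err, alpha=0.05):
    """Generic two-sided normal interval with nominal coverage 1 - alpha."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return estimate - z * std_err, estimate + z * std_err

alpha = 0.05
lower, upper = normal_ci(0.5, 0.1, alpha=alpha)
print(f"{1 - alpha:.0%} interval: ({lower:.4f}, {upper:.4f})")
# 95% interval: (0.3040, 0.6960)
```

A smaller alpha asks for higher coverage and therefore gives a wider interval.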

Outputs: enables the following attributes:

.CI : CGDRO aggregated debiased confidence intervals of the target domain coefficients.

summary()¶

Outputs:

Summary of the CGDRO aggregated weights, estimators, and confidence intervals of interest.

Example¶

Data Generating Process¶

In this example, we generate high-dimensional multi-source data with $2$ source domains, with $100$ samples in each source domain and $100$ samples in the target domain. The dimension of the parameters is $p=100$.

from cgdro.data import DataContainerSimu_linear_reg_highd

# two source groups, each with 100 samples, and 100 target samples
n_list = [100, 100]
N = 100

data = DataContainerSimu_linear_reg_highd(n_list=n_list, N=N, p=100)
data.generate_funcs_list(seed=0)
data.generate_data(seed=0)

Xlist = data.X_sources_list
Ylist = data.Y_sources_list
X0 = data.X_target

Implementation & Results¶

## First instantiate the module,
## then call the functions fit() and infer().
## We run inference on the coefficients with indexes [1, 5, 10, 98].
## Note: input indexes are 1-based.
reg = linear.hd(verbose=True)
reg.fit(Xlist, Ylist, [1,5,10,98], X0=X0)
reg.infer(M=200, alpha=0.05, alpha_thres=0.01)

Argument 'loading_intercept' set to False because intercept is False
start fitting-----
======> Bias Correction for initial estimators....
---> Computing for loading (1/4)...
The projection direction is identified at xi = 0.040065 at step = 5.0
---> Computing for loading (2/4)...
The projection direction is identified at xi = 0.060097 at step = 4.0
---> Computing for loading (3/4)...
The projection direction is identified at xi = 0.060097 at step = 4.0
---> Computing for loading (4/4)...
The projection direction is identified at xi = 0.060097 at step = 4.0
---> Computing for loading (1/4)...
The projection direction is identified at xi = 0.040065 at step = 5.0
---> Computing for loading (2/4)...
The projection direction is identified at xi = 0.040065 at step = 5.0
---> Computing for loading (3/4)...
The projection direction is identified at xi = 0.060097 at step = 4.0
---> Computing for loading (4/4)...
The projection direction is identified at xi = 0.040065 at step = 5.0
======> Bias Correction for matrix Gamma....
---> Computing for loading (1/1)...
The projection direction is identified at xi = 0.040065 at step = 5.0
---> Computing for loading (1/1)...
The projection direction is identified at xi = 0.026710 at step = 6.0
---> Computing for loading (1/1)...
The projection direction is identified at xi = 0.026710 at step = 6.0
---> Computing for loading (1/1)...
The projection direction is identified at xi = 0.040065 at step = 5.0

reg.summary()

Model Summary:
=================================
CGDRO Aggregated Weights:

group     |        1        2
weight_   |   0.1574   0.8426

=================================
Plug-in Estimators:

index     |        1        5       10       98
coef_     |   0.0079   0.0803   0.1087   0.0359

=================================
Debiased Estimators:

index     |        1        5       10       98
coef_     |   0.2073  -0.0132   0.0497   0.5338

=================================
Confidence Intervals:

index     |              1              5             10             98
CI        | (-0.1542,0.5687) (-0.4427,0.4164) (-0.3308,0.4301) (0.0375,1.0301)

CGDRO returns statistical inference results, including the CGDRO aggregated weights (the learned weight for each source domain), the coefficient estimators (the worst-case estimators of the target-domain coefficients), and the confidence intervals (valid confidence intervals for the target-domain coefficients). In the summary above, group indexes the source domains and index is the 1-based index of each coefficient, starting from the intercept if intercept=True and otherwise from the first coefficient.
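
A quick way to read the confidence intervals is to check which of them exclude zero. The sketch below copies the four intervals from the summary output above into a plain dictionary (the variable names are illustrative and not cgdro attributes) and lists the indexes whose interval does not contain zero.

```python
# Confidence intervals copied from the summary output, keyed by 1-based coefficient index
ci = {
    1: (-0.1542, 0.5687),
    5: (-0.4427, 0.4164),
    10: (-0.3308, 0.4301),
    98: (0.0375, 1.0301),
}

# An interval that excludes zero suggests a nonzero coefficient at the chosen level
excludes_zero = [idx for idx, (lower, upper) in ci.items() if lower > 0 or upper < 0]
print(excludes_zero)  # [98]
```

In this run only the coefficient with index 98 has an interval that lies entirely above zero.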

Prediction¶

Make predictions on the target data and show the first 10 predicted values. You do not need to pass the covariates explicitly, since the target data is the default choice.

pred = reg.predict()
print(pred[:10])

[ 0.5931408   0.01983438 -0.12045868  0.12917504 -0.68239194 -0.49449755
 -0.18172753  0.15242893 -0.76921899  1.13411843]