Fit CGDRO in high-dimensional linear regression

Regression.linear.hd(
    self,
    intercept=False, 
    loading_intercept=False, 
    delta=0, 
    lam=None, 
    verbose=False
)

Arguments:

intercept (bool, optional): whether to include intercept in outcome models. Defaults to False.
loading_intercept (bool, optional): whether to include intercept in loading matrix. Defaults to False.
delta (float, optional): ridge penalty level, non-negative. Defaults to 0.
lam (float, optional): Lasso penalty level for high-dimensional regression. Defaults to None.
verbose (bool, optional): whether to print out the fitting information. Defaults to False.

Built-in functions in Regression.linear.hd:

Built-in Functions	Description
`fit()`	Fit robust linear regression (high-dim) in the target domain.
`predict()`	Make robust prediction in the target domain.
`infer()`	Build confidence intervals for the target linear regression coefficients.
`summary()`	Summarize the results.

fit(
    self, 
    X_list, 
    y_list, 
    index, 
    X0=None
)

Arguments:

X_list (list of array-like): list of source domain features, each element is n_i x d.
y_list (list of array-like): list of source domain labels, each element is n_i x 1.
index (int or list of int): index of the loading vector (1-based); the index-th coefficient is of interest. A list selects several coefficients, as in the example at the end of this page.
X0 (array-like, optional): target domain features, n0 x d. If None, use all sources' data. Defaults to None.
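Because `index` is 1-based while Python containers are 0-based, it can help to see the mapping spelled out. The sketch below is only an illustration: `to_zero_based` is a hypothetical helper, not part of cgdro, and it assumes `index` may be either a single int or a list of ints (the example on this page passes a list).

```python
from typing import List, Sequence, Union


def to_zero_based(index: Union[int, Sequence[int]], d: int) -> List[int]:
    """Convert 1-based coefficient indices to 0-based column positions.

    Hypothetical helper for illustration only; not part of cgdro.
    """
    # Accept a single int or a sequence of ints.
    indices = [index] if isinstance(index, int) else list(index)
    for i in indices:
        # Valid 1-based indices run from 1 to d inclusive.
        if not 1 <= i <= d:
            raise ValueError(f"index {i} is outside the 1-based range 1..{d}")
    return [i - 1 for i in indices]


# With d = 100 features, index 1 is the first column and index 98 is column 97.
columns = to_zero_based([1, 5, 10, 98], d=100)
print(columns)  # [0, 4, 9, 97]
```

An out-of-range value such as 0 raises a `ValueError` here, which is one way to catch an accidental 0-based index early.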

Outputs: enables the following attribute:

.parameters :
"est_bc": CGDRO aggregated debiased loaded coefficient estimators in the target domain.
"est_plug": CGDRO aggregated plug-in loaded coefficient estimators in the target domain.
"weight_": CGDRO aggregated weights of the source domains.

predict(
    self,
    X=None
)

Arguments:

X : Input features for prediction. If None, uses the training data. Defaults to None.

Outputs:

pred : linear prediction in the target domain.
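As a rough picture of what a linear prediction from aggregated source models looks like, the sketch below combines per-source coefficient vectors with weights and then computes the inner product of each feature row with the result. This is an assumption made for illustration: the page does not spell out how `predict()` works internally, and `aggregate` and `linear_predict` are hypothetical names, not cgdro functions.

```python
from typing import List, Sequence


def aggregate(betas: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Weighted combination of per-source coefficient vectors (illustrative only)."""
    d = len(betas[0])
    return [sum(w * b[j] for w, b in zip(weights, betas)) for j in range(d)]


def linear_predict(X: Sequence[Sequence[float]], beta: Sequence[float]) -> List[float]:
    """Linear prediction (inner product with beta) for each row of X, no intercept."""
    return [sum(x_j * b_j for x_j, b_j in zip(row, beta)) for row in X]


# Two hypothetical source models with d = 2 features and equal weights.
betas = [[1.0, 0.0], [0.0, 2.0]]
weights = [0.5, 0.5]
beta_agg = aggregate(betas, weights)

X = [[2.0, 1.0], [0.0, 4.0]]
pred = linear_predict(X, beta_agg)
print(beta_agg)  # [0.5, 1.0]
print(pred)      # [2.0, 4.0]
```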

infer(
    self, 
    M=200, 
    alpha=0.05, 
    alpha_thres=0.01
)

Arguments:

M (int, optional): number of resampling iterations. Defaults to 200.
alpha (float, optional): significance level for confidence intervals. Defaults to 0.05.
alpha_thres (float, optional): threshold for generating samples. Defaults to 0.01.

Outputs: enables the following attribute:

.CI : CGDRO aggregated debiased confidence intervals of the target domain coefficients.
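To make the role of `alpha` concrete: a significance level of 0.05 corresponds to a 95% confidence interval. The sketch below shows that relationship with a plain two-sided normal interval. It is not the procedure `infer()` uses (which, per the arguments above, relies on `M` resampling iterations); `normal_interval` is a hypothetical helper and the numbers are made up for illustration.

```python
from statistics import NormalDist
from typing import Tuple


def normal_interval(estimate: float, std_error: float, alpha: float = 0.05) -> Tuple[float, float]:
    """Two-sided (1 - alpha) normal interval: estimate +/- z * std_error.

    Illustrative only; not the resampling-based interval computed by cgdro.
    """
    # z is the (1 - alpha/2) quantile of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return estimate - z * std_error, estimate + z * std_error


lower, upper = normal_interval(estimate=1.0, std_error=0.1, alpha=0.05)
print(round(lower, 3), round(upper, 3))  # 0.804 1.196
```

A smaller `alpha` gives a wider interval, since the quantile `z` grows as the required coverage increases.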

summary(
    self
)

Outputs:

Summary of CGDRO aggregated weights, estimators, and confidence intervals of interest.

Example

from cgdro.Regression import linear
from cgdro.data import DataContainerSimu_linear_reg_highd

# two source groups, each with 100 samples, and 100 target samples
n_list = [100, 100]
N = 100

data = DataContainerSimu_linear_reg_highd(n_list=n_list, N=N, p=100)
data.generate_funcs_list(seed=0)
data.generate_data(seed=0)

Xlist = data.X_sources_list
Ylist = data.Y_sources_list
X0 = data.X_target

## First instantiate the model,
## then call fit() and infer().
## Note: input indices are 1-based.
reg = linear.hd(verbose=True)
reg.fit(Xlist, Ylist, [1,5,10,98], X0=X0)
reg.infer(M=200, alpha=0.05, alpha_thres=0.01)

## Summary
reg.summary()

# Making predictions
pred = reg.predict()
print(pred[:10])