Fit CGDRO in high-dimensional linear regression
Regression.linear.hd(
self,
intercept=False,
loading_intercept=False,
delta=0,
lam=None,
verbose=False
)
Arguments:
- intercept (bool, optional): whether to include an intercept in the outcome models. Defaults to False.
- loading_intercept (bool, optional): whether to include an intercept in the loading matrix. Defaults to False.
- delta (float, optional): ridge penalty level; must be non-negative. Defaults to 0.
- lam (float, optional): Lasso penalty level for the high-dimensional regression. Defaults to None.
- verbose (bool, optional): whether to print fitting information. Defaults to False.
Built-in functions in Regression.linear.hd:
| Built-in Functions | Description |
|---|---|
| fit() | Fit robust linear regression (high-dim) in the target domain. |
| predict() | Make robust predictions in the target domain. |
| infer() | Build confidence intervals for the target linear regression coefficients. |
| summary() | Summarize the results. |
fit(
self,
X_list,
y_list,
index,
X0=None
)
Arguments:
- X_list (list of array-like): list of source domain features; each element is n_i x d.
- y_list (list of array-like): list of source domain labels; each element is n_i x 1.
- index (int or list of int): 1-based index (or indices) of the loading vector; the index-th coefficient is the coefficient of interest.
- X0 (array-like, optional): target domain features, n0 x d. If None, the data from all sources are used. Defaults to None.
Outputs: enables the following attributes:
- .parameters:
  - "est_bc": CGDRO aggregated debiased loaded coefficient estimators in the target domain;
  - "est_plug": CGDRO aggregated plug-in loaded coefficient estimators in the target domain;
  - "weight_": CGDRO aggregated weights of the source domains.
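The sketch below illustrates two conventions from the fit() description: the 1-based index argument and the role of the aggregated source weights. It is a minimal pure-Python illustration, not the library's implementation. The helper names to_zero_based and aggregate are hypothetical, and it assumes that the aggregated estimator is a weighted combination of per-source coefficient vectors with non-negative weights summing to one, which this page suggests but does not state.

```python
def to_zero_based(index):
    """Convert a 1-based index (int or list of int) to 0-based positions."""
    indices = [index] if isinstance(index, int) else list(index)
    if any(i < 1 for i in indices):
        raise ValueError("indices are 1-based and must be >= 1")
    return [i - 1 for i in indices]


def aggregate(coefs_by_source, weights):
    """Weighted combination of per-source coefficient vectors (assumed form)."""
    if len(coefs_by_source) != len(weights):
        raise ValueError("need exactly one weight per source")
    if min(weights) < 0 or abs(sum(weights) - 1.0) > 1e-8:
        raise ValueError("weights must be non-negative and sum to one")
    d = len(coefs_by_source[0])
    return [
        sum(w * coef[j] for w, coef in zip(weights, coefs_by_source))
        for j in range(d)
    ]


# Toy inputs (made up for illustration): two sources, three coefficients each.
coefs = [[1.0, 0.0, 2.0], [3.0, 1.0, 0.0]]
weights = [0.25, 0.75]

agg = aggregate(coefs, weights)
# Pick the 1st and 3rd coefficients, as index=[1, 3] would request.
loaded = [agg[j] for j in to_zero_based([1, 3])]
print(loaded)
```

With real data, the weights would come from the fitted object's "weight_" entry rather than being chosen by hand.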
predict(
self,
X=None
)
Arguments:
X: Input features for prediction. If None, uses the training data. Defaults to None.
Outputs:
pred: linear prediction in the target domain.
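As a rough picture of what a linear prediction means here, the sketch below computes the inner product of each feature row with a coefficient vector, plus an optional intercept. The function linear_predict and the toy numbers are hypothetical. How predict() obtains its coefficients internally is not described on this page.

```python
def linear_predict(X, coef, intercept=0.0):
    """Return intercept + x . coef for every row x of X."""
    if any(len(row) != len(coef) for row in X):
        raise ValueError("each row of X must have len(coef) features")
    return [intercept + sum(x * b for x, b in zip(row, coef)) for row in X]


# Toy inputs (made up for illustration): two samples, two features.
X_new = [[1.0, 2.0], [0.0, 1.0]]
coef = [0.5, -1.0]

pred_toy = linear_predict(X_new, coef)
print(pred_toy)
```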
infer(
self,
M=200,
alpha=0.05,
alpha_thres=0.01
)
Arguments:
- M (int, optional): number of resampling iterations. Defaults to 200.
- alpha (float, optional): significance level for confidence intervals. Defaults to 0.05.
- alpha_thres (float, optional): threshold for generating samples. Defaults to 0.01.
Outputs: enables the following attributes:
.CI: CGDRO aggregated debiased confidence intervals of the target domain coefficients.
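This page does not describe how infer() turns its M resampled draws into an interval. To show how alpha relates to a confidence level, the sketch below builds a two-sided normal interval at level 1 - alpha for each draw and then takes the union of those intervals. This is one common resampling-based construction, used here purely as an assumption. The function names and the toy (estimate, standard error) pairs are hypothetical.

```python
from statistics import NormalDist


def normal_interval(estimate, std_error, alpha=0.05):
    """Two-sided normal interval with nominal coverage 1 - alpha."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    return (estimate - z * std_error, estimate + z * std_error)


def union_interval(intervals):
    """Smallest single interval that covers every interval in the list."""
    lows, highs = zip(*intervals)
    return (min(lows), max(highs))


# Toy (estimate, standard error) pairs, standing in for resampled draws.
resampled = [(1.00, 0.10), (1.05, 0.12), (0.95, 0.11)]

per_draw = [normal_interval(est, se, alpha=0.05) for est, se in resampled]
ci = union_interval(per_draw)
print(ci)
```

A smaller alpha widens every per-draw interval, and therefore the union as well.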
summary(
self
)
Outputs:
- Summary of the CGDRO aggregated weights, estimators, and confidence intervals of interest.
Example
from cgdro.Regression import linear
from cgdro.data import DataContainerSimu_linear_reg_highd
# two source groups, each with 100 samples, and 100 target samples
n_list = [100, 100]
N = 100
data = DataContainerSimu_linear_reg_highd(n_list=n_list, N=N, p=100)
data.generate_funcs_list(seed=0)
data.generate_data(seed=0)
Xlist = data.X_sources_list
Ylist = data.Y_sources_list
X0 = data.X_target
## First instantiate the model,
## then call fit() and infer().
## Note: the input indices are 1-based.
reg = linear.hd(verbose=True)
reg.fit(Xlist, Ylist, [1,5,10,98], X0=X0)
reg.infer(M=200, alpha=0.05, alpha_thres=0.01)
## Summary
reg.summary()
# Making predictions
pred = reg.predict()
print(pred[:10])