Fit CGDRO in low-dimensional linear regression
Regression.linear.ld(
self,
intercept=False,
delta=0,
verbose=False
)
Arguments:
- intercept (bool, optional): whether to include an intercept in the outcome models. Defaults to False.
- delta (float, optional): ridge penalty level, non-negative. Defaults to 0.
- verbose (bool, optional): whether to print the fitting information. Defaults to False.
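To build intuition for what a ridge penalty such as delta can do to aggregation weights, the sketch below solves a two-source toy problem in closed form. It assumes, and this page does not confirm it, that the weights minimize a quadratic form w^T (G + delta*I) w over the probability simplex. The helper `two_source_weights` and the entries of G are hypothetical and are not part of cgdro.

```python
def two_source_weights(g11, g12, g22, delta=0.0):
    """Minimize w^T (G + delta*I) w over w = (w1, 1 - w1) with 0 <= w1 <= 1.

    G = [[g11, g12], [g12, g22]] is a hypothetical symmetric positive
    semidefinite matrix. Illustrative assumption only, not the documented
    cgdro algorithm.
    """
    # Half of the second derivative of the objective with respect to w1.
    denom = g11 + g22 + 2.0 * delta - 2.0 * g12
    if denom <= 0.0:
        # For a positive semidefinite G this means the objective is flat,
        # so any weights are optimal; return equal weights.
        return 0.5, 0.5
    # Unconstrained minimizer, then clipped to the interval [0, 1].
    w1 = (g22 + delta - g12) / denom
    w1 = min(1.0, max(0.0, w1))
    return w1, 1.0 - w1


# A larger penalty pulls the two weights toward each other.
for delta in (0.0, 10.0, 1000.0):
    w1, w2 = two_source_weights(1.0, 0.0, 4.0, delta=delta)
    print(delta, round(w1, 4), round(w2, 4))
```

In this toy setting, delta = 0 puts most of the weight on the first source, while a very large delta drives both weights toward 1/2.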
Built-in functions in Regression.linear.ld:
| Built-in Functions | Description |
|---|---|
| fit() | Fit robust linear regression (low-dim) in the target domain. |
| predict() | Make robust prediction in the target domain. |
| infer() | Build confidence intervals of the target linear regression coefficients. |
| summary() | Summarize the results. |
fit(
self,
X_list,
y_list,
X0=None,
loss_type='reward'
)
Arguments:
- X_list (list of array-like): list of source domain features; each element is n_i x d.
- y_list (list of array-like): list of source domain labels; each element is n_i x 1.
- X0 (array-like, optional): target domain features, n0 x d. If None, the data from all sources are used. Defaults to None.
- loss_type (str, optional): type of the loss function used to compute the optimal aggregation weights. Options are 'reward', 'squaredloss', and 'regret'. Defaults to 'reward'.
Outputs: enables the following attributes:
- .parameters: "coef_": CGDRO aggregated coefficient estimators in the target domain; "weight_": CGDRO aggregated weights of the source domains.
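As a rough illustration of how "coef_" and "weight_" might relate, the sketch below forms an aggregated coefficient vector as a weighted combination of per-source coefficient vectors. This relationship is an assumption suggested by the word "aggregated" above, not something this page states. The helper `aggregate_coefficients` and all numbers are hypothetical.

```python
def aggregate_coefficients(source_coefs, weights):
    """Return sum_l weights[l] * source_coefs[l] as a list of length d.

    source_coefs: one coefficient list of length d per source (hypothetical).
    weights: non-negative numbers summing to 1, one per source.
    """
    if len(source_coefs) != len(weights):
        raise ValueError("need exactly one weight per source")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-8:
        raise ValueError("weights must be non-negative and sum to 1")
    d = len(source_coefs[0])
    return [
        sum(w * b[j] for w, b in zip(weights, source_coefs))
        for j in range(d)
    ]


# Three hypothetical sources with d = 2 coefficients each.
source_coefs = [[1.0, 0.0], [0.0, 2.0], [1.0, 2.0]]
weights = [0.5, 0.25, 0.25]
coef = aggregate_coefficients(source_coefs, weights)
print(coef)
```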
predict(
self,
X=None
)
Arguments:
- X: input features for prediction. If None, uses the training data. Defaults to None.
Outputs:
- pred: linear prediction in the target domain.
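A linear prediction is the inner product of each feature row with a coefficient vector, plus an intercept when one is fitted. The sketch below shows that computation in plain Python. The helper `linear_predict` and the values are hypothetical, and how predict() handles the intercept internally is not described on this page.

```python
def linear_predict(X, coef, intercept=0.0):
    """Return intercept + (row . coef) for every row of X."""
    preds = []
    for row in X:
        if len(row) != len(coef):
            raise ValueError("each row of X must have len(coef) features")
        preds.append(intercept + sum(x * b for x, b in zip(row, coef)))
    return preds


# Two hypothetical feature rows and a hypothetical coefficient vector.
X_new = [[1.0, 2.0], [0.0, 4.0]]
coef = [0.75, 1.0]
pred = linear_predict(X_new, coef)
print(pred)
```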
infer(
self,
M=200,
alpha=0.05,
alpha_thres=0.01
)
Arguments:
- M (int, optional): number of resampling iterations. Defaults to 200.
- alpha (float, optional): significance level for confidence intervals. Defaults to 0.05.
- alpha_thres (float, optional): threshold for generating samples. Defaults to 0.01.
Outputs: enables the following attributes:
- .CI: CGDRO aggregated confidence intervals of the target domain coefficients.
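To make the role of the significance level alpha concrete, the sketch below builds a generic two-sided normal-based interval at level 1 - alpha. It is not the resampling procedure that infer() runs, which this page does not detail. The helper `normal_interval`, the estimate, and the standard error are hypothetical.

```python
from statistics import NormalDist


def normal_interval(estimate, std_error, alpha=0.05):
    """Two-sided (1 - alpha) interval: estimate +/- z * std_error."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    # Upper alpha/2 quantile of the standard normal distribution.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return estimate - z * std_error, estimate + z * std_error


# Hypothetical coefficient estimate and standard error.
lower, upper = normal_interval(1.0, 0.1, alpha=0.05)
print(round(lower, 3), round(upper, 3))
```

A smaller alpha gives a wider interval, since it demands higher coverage.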
summary(
self,
index=None
)
Arguments:
- index (list or int, optional): index of interest in the coefficients. Defaults to None.
Outputs:
- Summary of CGDRO aggregated weights, estimators, and confidence intervals.
Example
from cgdro.Regression import linear
from cgdro.data import DataContainerSimu_linear_reg_lowd
# number of source groups = 3, with 1000 samples each
# sigma: source group 1,3: 0.5; source group 2: 2
# target sample size = 10000
# dimension p = 5
n_list = [1000, 1000, 1000]
N = 10000 # target sample size
data = DataContainerSimu_linear_reg_lowd(n_list=n_list, N=N, p=5)
data.generate_funcs_list(seed=0)
data.generate_data(seed=0)
Xlist = data.X_sources_list
Ylist = data.Y_sources_list
X0 = data.X_target
## Instantiate the model, then call fit() and infer()
## Note: infer() can build confidence intervals only when loss_type='reward';
## for other loss types, only point estimation and prediction are available
reg = linear.ld()
reg.fit(Xlist, Ylist, X0, loss_type='reward')
reg.infer(alpha=0.05)
## summarize the fitted model
reg.summary()
## prediction on target data
pred = reg.predict()
print(pred[:10])
## Instantiate a new model, then call fit() with the squared loss
reg = linear.ld()
reg.fit(Xlist, Ylist, X0, loss_type='squaredloss')
## summarize the fitted model
reg.summary()
## prediction on target data
pred = reg.predict()
print(pred[:10])
## Instantiate a new model, then call fit() with the regret loss
reg = linear.ld()
reg.fit(Xlist, Ylist, X0, loss_type='regret')
## summarize the fitted model
reg.summary()
## prediction on target data
pred = reg.predict()
print(pred[:10])