Fit CGDRO in low-dimensional linear regression

Regression.linear.ld(
    self, 
    intercept=False,
    delta=0,
    verbose=False
)

intercept (bool, optional): whether to include an intercept in the outcome models. Defaults to False.
delta (float, optional): ridge penalty level, non-negative. Defaults to 0.
verbose (bool, optional): whether to print the fitting information. Defaults to False.
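
To make the roles of `intercept` and `delta` concrete, here is a minimal NumPy sketch of a ridge-penalized least-squares fit for a single source domain. The helper `ridge_fit`, its normalization by the sample size, and the choice to leave the intercept unpenalized are assumptions for illustration; they are not taken from the cgdro source code.

```python
import numpy as np

def ridge_fit(X, y, intercept=False, delta=0.0):
    """Hypothetical helper: ridge-penalized least squares for one source domain."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).ravel()
    if intercept:
        # Prepend a column of ones so the first coefficient is the intercept.
        X = np.column_stack([np.ones(X.shape[0]), X])
    n, d = X.shape
    penalty = delta * np.eye(d)
    if intercept:
        penalty[0, 0] = 0.0  # assumption: the intercept is not penalized
    # Solve (X'X / n + penalty) b = X'y / n
    return np.linalg.solve(X.T @ X / n + penalty, X.T @ y / n)

# Simulated data with a known intercept (0.7) and slopes (beta).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
beta = np.array([1.0, -2.0, 0.5])
y = 0.7 + X @ beta + 0.1 * rng.normal(size=200)

b_ols = ridge_fit(X, y, intercept=True, delta=0.0)    # delta=0: ordinary least squares
b_ridge = ridge_fit(X, y, intercept=True, delta=1.0)  # delta>0: slopes shrink toward zero
print(b_ols)
print(b_ridge)
```

With `delta=0` the fit reduces to ordinary least squares; a positive `delta` shrinks the slope coefficients toward zero.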

Built-in functions in Regression.linear.ld:

Built-in function	Description
`fit()`	Fit robust linear regression (low-dim) in the target domain.
`predict()`	Make robust prediction in the target domain.
`infer()`	Build confidence intervals of the target linear regression coefficients.
`summary()`	Summarize the results.

fit(
    self, 
    X_list, 
    y_list, 
    X0=None, 
    loss_type='reward'
)

Arguments:

X_list (list of array-like): source domain features; the i-th element has shape n_i x d.
y_list (list of array-like): source domain labels; the i-th element has shape n_i x 1.
X0 (array-like, optional): target domain features of shape n0 x d. If None, the data from all sources is used. Defaults to None.
loss_type (str, optional): loss function used to compute the optimal aggregation weights. One of 'reward', 'squaredloss', or 'regret'. Defaults to 'reward'.

Outputs: sets the following attributes:

.parameters : "coef_": CGDRO aggregated coefficient estimators in the target domain; "weight_": CGDRO aggregated weights of the source domains.
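
As a rough illustration of what aggregated weights and coefficients can look like, the sketch below assumes the reward-type aggregation takes the maximin form common in the multi-source distributionally robust literature: the weights minimize a quadratic form over the probability simplex, and the aggregated coefficient is the weighted combination of the per-source coefficients. The helpers `project_simplex` and `maximin_aggregate` are hypothetical, and the projected-gradient solver is a swapped-in technique chosen for brevity; none of this is taken from the cgdro implementation.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1), 0.0)

def maximin_aggregate(B, Sigma0, n_iter=2000):
    """Hypothetical helper.

    B: d x L matrix whose columns are per-source coefficient estimates.
    Sigma0: d x d second-moment matrix of the target features.
    Returns the aggregated coefficient and the simplex weights.
    """
    Gamma = B.T @ Sigma0 @ B                 # L x L quadratic form
    n_sources = B.shape[1]
    gamma = np.full(n_sources, 1.0 / n_sources)
    # Step size 1 / Lipschitz constant of the gradient 2 * Gamma @ gamma.
    step = 1.0 / (2.0 * np.linalg.eigvalsh(Gamma).max())
    for _ in range(n_iter):
        gamma = project_simplex(gamma - step * 2.0 * Gamma @ gamma)
    return B @ gamma, gamma

# Toy example: three sources in two dimensions, identity target covariance.
B = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
Sigma0 = np.eye(2)
coef, weights = maximin_aggregate(B, Sigma0)
print(weights, coef)
```

In this toy case the third source receives essentially zero weight, because the point in the convex hull of the source coefficients closest to the origin lies midway between the first two.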

predict(
    self,
    X=None
)

Arguments:

X : Input features for prediction. If None, uses the training data. Defaults to None.

Outputs:

pred : linear prediction in the target domain.

infer(
    self, 
    M=200, 
    alpha=0.05, 
    alpha_thres=0.01
)

Arguments:

M (int, optional): number of resampling iterations. Defaults to 200.
alpha (float, optional): significance level for confidence intervals. Defaults to 0.05.
alpha_thres (float, optional): threshold for generating samples. Defaults to 0.01.

Outputs: sets the following attributes:

.CI : CGDRO aggregated confidence intervals of the target domain coefficients.

summary(
    self, 
    index=None
)

Arguments:

index (list or int, optional): index or indices of the coefficients of interest. Defaults to None.

Outputs:

Summary of the CGDRO aggregated weights, estimators, and confidence intervals.

Example

from cgdro.Regression import linear
from cgdro.data import DataContainerSimu_linear_reg_lowd

# number of source groups = 3, with 1000 samples each
# sigma: source group 1,3: 0.5; source group 2: 2
# target sample size = 10000
# dimension p = 5
n_list = [1000, 1000, 1000]
N = 10000  # target sample size
data = DataContainerSimu_linear_reg_lowd(n_list=n_list, N=N, p=5)
data.generate_funcs_list(seed=0)
data.generate_data(seed=0)

Xlist = data.X_sources_list
Ylist = data.Y_sources_list
X0 = data.X_target

## First instantiate the model,
## then call fit() and infer()
## Note: infer() can only be called to get confidence intervals when loss_type='reward'
## For other loss types, only point estimation and prediction are available
reg = linear.ld()
reg.fit(Xlist, Ylist, X0, loss_type='reward')
reg.infer(alpha=0.05)

## summarize the fitted model
reg.summary()

## prediction on target data
pred = reg.predict()
print(pred[:10])

## Instantiate a new model,
## then call fit() with the squared loss
reg = linear.ld()
reg.fit(Xlist, Ylist, X0, loss_type='squaredloss')

## summarize the fitted model
reg.summary()

## prediction on target data
pred = reg.predict()
print(pred[:10])

## Instantiate a new model,
## then call fit() with the regret loss
reg = linear.ld()
reg.fit(Xlist, Ylist, X0, loss_type='regret')

## summarize the fitted model
reg.summary()

## prediction on target data
pred = reg.predict()
print(pred[:10])