Fit CGDRO in a machine learning prediction model¶
Regression.ml(
self,
f_learner = 'xgb',
w_learner = 'logistic',
seed = 123,
verbose = False
)
Arguments:

- `f_learner` (str, optional): method used to fit the outcome models on each source. Options include 'linear', 'xgb', 'mlp', and 'rf'. Defaults to 'xgb'.
- `w_learner` (str, optional): method used to fit the density models on each source. Options include 'logistic', 'xgb', and 'kliep'. Defaults to 'logistic'.
- `seed` (int, optional): random seed for data splitting. Defaults to 123.
- `verbose` (bool, optional): whether to print fitting information. Defaults to False.
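As a sketch of how these learner options might be validated before fitting, the hypothetical helper below (not part of the `cgdro` API) simply checks the names against the documented option sets:

```python
# Hypothetical validation helper illustrating the accepted learner names;
# not part of the cgdro API.
F_LEARNERS = {'linear', 'xgb', 'mlp', 'rf'}   # outcome-model options
W_LEARNERS = {'logistic', 'xgb', 'kliep'}     # density-model options

def check_learners(f_learner='xgb', w_learner='logistic'):
    """Raise ValueError for an unsupported learner name."""
    if f_learner not in F_LEARNERS:
        raise ValueError(f"unknown f_learner: {f_learner!r}")
    if w_learner not in W_LEARNERS:
        raise ValueError(f"unknown w_learner: {w_learner!r}")
    return f_learner, w_learner
```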
Built-in functions in Regression.ml:

| Built-in Functions | Description |
|---|---|
| fit() | Fit robust machine learning regression in the target domain. |
| predict() | Make robust prediction in the target domain. |
fit(
self,
X_list,
y_list,
X0=None,
loss_type='reward',
bias_correct=True,
prior=None
)
Arguments:
- `X_list` (list): list of feature matrices on each source domain.
- `y_list` (list): list of label arrays on each source domain.
- `X0` (array, optional): feature matrix on the target domain. If None, the pooled source data are used as the target data. Defaults to None.
- `loss_type` (str, optional): type of loss function used to compute the optimal aggregation weights. Options include 'reward', 'squaredloss', and 'regret'. Defaults to 'reward'.
- `bias_correct` (bool, optional): whether to use the bias-corrected estimator of the Gamma matrix. Defaults to True.
- `prior` (tuple, optional): prior information on the aggregation weights, given as (prior_weight, rho), where prior_weight is the prior weight vector and rho is the radius of the L2-norm ball around prior_weight. If None, no prior information is used. Defaults to None.
Outputs: enables the following attributes:
weight_: CGDRO aggregated weights of the source domains.
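To illustrate the `prior` constraint described above, the pure-Python sketch below (hypothetical, not `cgdro` internals) checks whether a candidate weight vector lies in the L2 ball of radius `rho` around `prior_weight` while remaining on the probability simplex:

```python
import math

# Illustration (not cgdro code) of the prior constraint on aggregation
# weights: within an L2 ball of radius rho around prior_weight, and on
# the probability simplex (nonnegative, summing to one).
def in_prior_ball(weights, prior_weight, rho, tol=1e-9):
    """Check ||weights - prior_weight||_2 <= rho and simplex membership."""
    dist = math.sqrt(sum((w - p) ** 2 for w, p in zip(weights, prior_weight)))
    on_simplex = all(w >= -tol for w in weights) and abs(sum(weights) - 1.0) <= tol
    return dist <= rho + tol and on_simplex

# Example: a uniform prior over 3 sources with radius 0.2
prior = ([1/3, 1/3, 1/3], 0.2)
```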
predict(
self,
X=None
)
Arguments:
- `X` (array, optional): input features for prediction. If None, uses the training data. Defaults to None.
Outputs:
pred: prediction in the target domain.
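Conceptually, the robust prediction aggregates the per-source outcome models using the fitted weights in `weight_`. The pure-Python sketch below (hypothetical, not `cgdro` internals) shows such a weighted combination of per-source predictions:

```python
# Conceptual sketch (not cgdro internals): combine per-source model
# predictions using the aggregation weights stored in weight_.
def aggregate_predictions(source_preds, weights):
    """Weighted combination of per-source predictions for each sample."""
    n = len(source_preds[0])
    return [sum(w * preds[i] for w, preds in zip(weights, source_preds))
            for i in range(n)]

# Example: 3 source models, 2 samples
preds = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
weights = [0.5, 0.25, 0.25]
aggregate_predictions(preds, weights)   # [2.5, 3.5]
```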
Example¶
from cgdro.Regression import ml
from cgdro.data import DataContainerSimu_Nonlinear_reg
# number of source groups = 3, each with 10000 samples, and 100000 target samples
# dimension p = 5
# sigma: source group 1,3: 0.5; source group 2: 3.
data = DataContainerSimu_Nonlinear_reg(n=10000, N=100000)
data.generate_funcs_list(L=3, seed=0)
data.generate_data()
Xlist = data.X_sources_list
Ylist = data.Y_sources_list
X0 = data.X_target
## First instantiate the model with the chosen learners,
## then call fit() with the desired loss type.
drol = ml(f_learner='xgb', w_learner='kliep')
drol.fit(Xlist, Ylist, X0, loss_type='reward')
drol.weight_
## prediction
drol.predict()

## Repeat with the squared-loss criterion
drol = ml(f_learner='xgb', w_learner='kliep')
drol.fit(Xlist, Ylist, X0, loss_type='squaredloss')
drol.weight_
## prediction
drol.predict()

## Repeat with the regret criterion
drol = ml(f_learner='xgb', w_learner='kliep')
drol.fit(Xlist, Ylist, X0, loss_type='regret')
drol.weight_
## prediction
drol.predict()