Fit CGDRO in a classification model¶
```python
Classification(
    self,
    f_learner='linear',
    w_learner='logistic',
    split=True,
    seed=123
)
```
Arguments:

- f_learner (str, optional): method used to fit outcome models on each source. Defaults to 'linear'.
- w_learner (str, optional): method used to fit density models on each source. Defaults to 'logistic'.
- split (bool, optional): whether to split the source data into two halves for fitting the outcome and density models. Defaults to True.
- seed (int, optional): random seed for data splitting. Defaults to 123.
Built-in functions in Classification:

| Built-in Functions | Description |
|---|---|
| fit() | Fit a robust classification model with cross-entropy loss in the target domain. |
| predict_proba() | Make robust classification probability predictions in the target domain. |
| predict() | Make robust label predictions in the target domain. |
| infer() | Build debiased confidence intervals of the target coefficients. |
| summary() | Summarize the results. |
```python
fit(
    self,
    X_list,
    y_list,
    X0=None,
    max_iter=1000,
    tol=1e-6,
    check_dual=False,
    verbose=False
)
```
Arguments:
- X_list (list): list of feature matrices, one per source domain.
- y_list (list): list of label arrays, one per source domain.
- X0 (array, optional): feature matrix on the target domain. If None, the pooled source data are used as the target data. Defaults to None.
- max_iter (int, optional): maximum number of iterations. Defaults to 1000.
- tol (float, optional): tolerance for convergence. Defaults to 1e-6.
- check_dual (bool, optional): whether to check the duality gap. Defaults to False.
- verbose (bool, optional): whether to print fitting information. Defaults to False.
Outputs: enables the following attributes:

- .parameters:
    - "coef_": CGDRO aggregated debiased coefficient estimators in the target domain;
    - "weight_": CGDRO aggregated weights of the source domains.
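The role of the two fitted attributes can be pictured with a toy sketch. The following is hypothetical and not the cgdro internals: per-source coefficient estimates are combined with simplex weights, and a multiplicative-weights update, in the spirit of group DRO, shifts weight toward the sources that the current aggregated coefficient fits worst.

```python
# Hypothetical sketch of worst-case-aware aggregation (NOT cgdro code):
# combine per-source coefficients with weights that are repeatedly
# up-weighted on the highest-loss sources, then renormalized.
import numpy as np

def aggregate_group_dro(coef_list, loss_fn, n_iter=100, eta=0.5):
    """coef_list: per-source coefficient vectors.
    loss_fn: maps an aggregated coefficient to a per-source loss array."""
    n_sources = len(coef_list)
    w = np.full(n_sources, 1.0 / n_sources)   # start from uniform weights
    for _ in range(n_iter):
        coef = sum(wl * c for wl, c in zip(w, coef_list))
        losses = loss_fn(coef)                # one loss per source domain
        w = w * np.exp(eta * losses)          # up-weight hard sources
        w = w / w.sum()                       # renormalize onto the simplex
    return coef, w

# toy check: source 1 always incurs the larger loss,
# so its weight should dominate
coefs = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
coef, w = aggregate_group_dro(coefs, lambda c: np.array([0.1, 0.9]))
```

The returned pair mirrors the shape of "coef_" (an aggregated coefficient vector) and "weight_" (a probability vector over sources).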
```python
predict_proba(
    self,
    X=None
)
```
Arguments:
- X (optional): input features for prediction. If None, the training data are used. Defaults to None.
Outputs:
- proba: classification probability predictions in the target domain.
```python
predict(
    self,
    X=None
)
```
Arguments:
- X (optional): input features for prediction. If None, the training data are used. Defaults to None.
Outputs:
- pred: label predictions in the target domain.
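The relation between the two prediction methods is presumably the usual one: the predicted label is the most probable class. The mapping can be sketched on a plain array (illustrative only; the real methods use the fitted model):

```python
# Illustration of how probability predictions map to label predictions:
# take the argmax over the class axis of a probability matrix.
import numpy as np

# toy probability matrix: rows are samples, columns are classes
proba = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6]])
pred = proba.argmax(axis=1)  # most probable class per sample
print(pred)  # [0 2]
```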
```python
infer(
    self,
    M=200,
    alpha=0.05,
    diag=True,
    parallel=False,
    n_workers=4
)
```
Arguments:
- M (int, optional): number of resampling iterations. Defaults to 200.
- alpha (float, optional): significance level for confidence intervals. Defaults to 0.05.
- diag (bool, optional): whether to use a diagonal approximation for the covariance matrices. Defaults to True.
- parallel (bool, optional): whether to use parallel computing. Defaults to False.
- n_workers (int, optional): number of workers for parallel computing. Defaults to 4.
Outputs: enables the following attributes:

- .CI: CGDRO debiased aggregated confidence intervals of the target domain coefficients.
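The M resampling iterations behind infer() produce a collection of coefficient estimates from which intervals at level alpha are formed. The generic percentile idea can be sketched as follows; this is a simplification with made-up data, not the actual CGDRO debiased construction:

```python
# Generic percentile-interval sketch (hypothetical stand-in for the
# resampling step): given M resampled estimates of one coefficient,
# take the alpha/2 and 1 - alpha/2 sample quantiles as the interval.
import numpy as np

rng = np.random.default_rng(123)
# stand-in for M = 200 resampled estimates of a single coefficient
estimates = rng.normal(loc=2.0, scale=0.1, size=200)

alpha = 0.05  # 95% interval
lo, hi = np.quantile(estimates, [alpha / 2, 1 - alpha / 2])
print(lo, hi)
```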
```python
summary(
    self,
    index=None,
    class_index=None
)
```
Arguments:

- index (array-like or None): 1-based indices of the dimensions to print (a subset of 1..d). Defaults to all dimensions.
- class_index (array-like or None): class labels to print (a subset of 1..self.num_class-1). Defaults to all (1..self.num_class-1).
Outputs:

- Summary of the CGDRO aggregated weights, estimators, and confidence intervals.
Example¶
```python
from cgdro import Classification
from cgdro.data import DataContainerSimu_linear_Cl

# two source groups, each with 100 samples, and 1000 target samples
n = 100; p = 5; L = 2; N = 1000; K = 2
data = DataContainerSimu_linear_Cl(n=n, N=N, p=p, L=L, K=K)
data.generate_funcs_list(seed=123)
data.generate_data(seed=123)
Xlist = data.X_sources_list
Ylist = data.Y_sources_list
X0 = data.X_target

## first instantiate the class, then run fit() and infer()
cc = Classification(f_learner='linear', w_learner='logistic')
cc.fit(Xlist, Ylist, X0)
cc.infer()

## summary
cc.summary()
cc.summary(index=[3, 5], class_index=2)

## predict_proba() and predict()
pred_proba = cc.predict_proba()
print(pred_proba[:10, :])
pred = cc.predict()
print(pred[:10])
```