Fit CGDRO in a classification model¶
```python
Classification(
    self,
    f_learner='linear',
    w_learner='logistic',
    split=True,
    seed=123
)
```
Arguments:

- f_learner (str, optional): method used to fit outcome models on each source. Defaults to 'linear'.
- w_learner (str, optional): method used to fit density models on each source. Defaults to 'logistic'.
- split (bool, optional): whether to split the source data into two halves for fitting the outcome and density models. Defaults to True.
- seed (int, optional): random seed for data splitting. Defaults to 123.
Built-in functions in Classification:

| Built-in Functions | Description |
|---|---|
| fit() | Fit a robust classification model with cross-entropy loss in the target domain. |
| predict_proba() | Make robust classification probability predictions in the target domain. |
| predict() | Make robust label predictions in the target domain. |
| infer() | Build debiased confidence intervals of the target coefficients. |
| summary() | Summarize the results. |
```python
fit(
    self,
    X_list,
    y_list,
    X0=None,
    max_iter=1000,
    tol=1e-6,
    check_dual=False,
    verbose=False
)
```
Arguments:
- X_list (list): list of feature matrices, one per source domain.
- y_list (list): list of label arrays, one per source domain.
- X0 (array, optional): feature matrix on the target domain. If None, the pooled source data are used as the target data. Defaults to None.
- max_iter (int, optional): maximum number of iterations. Defaults to 1000.
- tol (float, optional): tolerance for convergence. Defaults to 1e-6.
- check_dual (bool, optional): whether to check the duality gap. Defaults to False.
- verbose (bool, optional): whether to print fitting information. Defaults to False.
Outputs: enables the following attributes:

- .parameters:
    - "coef_": CGDRO aggregated debiased coefficient estimators in the target domain;
    - "weight_": CGDRO aggregated weights of the source domains.
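The role of the two fitted attributes can be pictured with a toy sketch. The following is hypothetical and not the cgdro internals: per-source coefficient estimates are combined with simplex weights, and a multiplicative-weights update, in the spirit of group DRO, shifts weight toward the sources that the current aggregated coefficient fits worst.

```python
# Hypothetical sketch of worst-case-aware aggregation (NOT cgdro code):
# combine per-source coefficients with weights that are repeatedly
# up-weighted on the highest-loss sources, then renormalized.
import numpy as np

def aggregate_group_dro(coef_list, loss_fn, n_iter=100, eta=0.5):
    """coef_list: per-source coefficient vectors.
    loss_fn: maps an aggregated coefficient to a per-source loss array."""
    n_sources = len(coef_list)
    w = np.full(n_sources, 1.0 / n_sources)   # start from uniform weights
    for _ in range(n_iter):
        coef = sum(wl * c for wl, c in zip(w, coef_list))
        losses = loss_fn(coef)                # one loss per source domain
        w = w * np.exp(eta * losses)          # up-weight hard sources
        w = w / w.sum()                       # renormalize onto the simplex
    return coef, w

# toy check: source 1 always incurs the larger loss,
# so its weight should dominate
coefs = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
coef, w = aggregate_group_dro(coefs, lambda c: np.array([0.1, 0.9]))
```

The returned pair mirrors the shape of "coef_" (an aggregated coefficient vector) and "weight_" (a probability vector over sources).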
```python
predict_proba(
    self,
    X=None
)
```
Arguments:
- X (optional): input features for prediction. If None, the training data are used. Defaults to None.
Outputs:
- proba: classification probability predictions in the target domain.
```python
predict(
    self,
    X=None
)
```
Arguments:
- X (optional): input features for prediction. If None, the training data are used. Defaults to None.
Outputs:
- pred: label predictions in the target domain.
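The relation between the two prediction methods is presumably the usual one: the predicted label is the most probable class. The mapping can be sketched on a plain array (illustrative only; the real methods use the fitted model):

```python
# Illustration of how probability predictions map to label predictions:
# take the argmax over the class axis of a probability matrix.
import numpy as np

# toy probability matrix: rows are samples, columns are classes
proba = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6]])
pred = proba.argmax(axis=1)  # most probable class per sample
print(pred)  # [0 2]
```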
```python
infer(
    self,
    M=200,
    alpha=0.05,
    diag=True,
    parallel=False,
    n_workers=4
)
```
Arguments:
- M (int, optional): number of resampling iterations. Defaults to 200.
- alpha (float, optional): significance level for confidence intervals. Defaults to 0.05.
- diag (bool, optional): whether to use a diagonal approximation for the covariance matrices. Defaults to True.
- parallel (bool, optional): whether to use parallel computing. Defaults to False.
- n_workers (int, optional): number of workers for parallel computing. Defaults to 4.
Outputs: enables the following attributes:

- .CI: CGDRO debiased aggregated confidence intervals of the target domain coefficients.
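The M resampling iterations behind infer() produce a collection of coefficient estimates from which intervals at level alpha are formed. The generic percentile idea can be sketched as follows; this is a simplification with made-up data, not the actual CGDRO debiased construction:

```python
# Generic percentile-interval sketch (hypothetical stand-in for the
# resampling step): given M resampled estimates of one coefficient,
# take the alpha/2 and 1 - alpha/2 sample quantiles as the interval.
import numpy as np

rng = np.random.default_rng(123)
# stand-in for M = 200 resampled estimates of a single coefficient
estimates = rng.normal(loc=2.0, scale=0.1, size=200)

alpha = 0.05  # 95% interval
lo, hi = np.quantile(estimates, [alpha / 2, 1 - alpha / 2])
print(lo, hi)
```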
```python
summary(
    self,
    index=None,
    class_index=None
)
```
Arguments:

- index (array-like or None): 1-based indices of the dimensions to print (a subset of 1..d). Defaults to all dimensions.
- class_index (array-like or None): class labels to print (a subset of 1..self.num_class-1). Defaults to all (1..self.num_class-1).
Outputs:

- Summary of the CGDRO aggregated weights, estimators, and confidence intervals.
Example¶
```python
from cgdro import Classification
from cgdro.data import DataContainerSimu_linear_Cl

# two source groups, each with 100 samples, and 1000 target samples
n = 100; p = 5; L = 2; N = 1000; K = 2
data = DataContainerSimu_linear_Cl(n=n, N=N, p=p, L=L, K=K)
data.generate_funcs_list(seed=123)
data.generate_data(seed=123)
Xlist = data.X_sources_list
Ylist = data.Y_sources_list
X0 = data.X_target

## first instantiate the class, then run fit() and infer()
cc = Classification(f_learner='linear', w_learner='logistic')
cc.fit(Xlist, Ylist, X0)
cc.infer()

## summary
cc.summary()
cc.summary(index=[3, 5], class_index=2)

## predict_proba() and predict()
pred_proba = cc.predict_proba()
print(pred_proba[:10, :])
pred = cc.predict()
print(pred[:10])
```