Regression.ml¶
In this module, we assume that the conditional outcome model in each source domain is a flexible machine learning model. For more details of the methods, please refer to CGDRO-Regression.
We can import the Regression.ml module with the code below:
from cgdro.Regression import ml
Now we give an example showing how to use Regression.ml with three different loss functions:
- Reward-based loss
- Squared loss
- Regret-based loss
Module Arguments & Outputs¶
Regression.ml¶
- f_learner (str, optional): method used to fit outcome models on each source. Options include linear, xgb, mlp, and rf. Defaults to 'xgb'.
- w_learner (str, optional): method used to fit density models on each source. Options include linear, xgb, and kliep. Defaults to 'xgb'.
- seed (int, optional): random seed for data splitting. Defaults to 123.
- verbose (bool, optional): whether to print out the fitting information. Defaults to False.
Built-in functions in Regression.ml:
| Built-in Functions | Description |
|---|---|
| fit() | Fit robust machine learning regression in the target domain. |
| predict() | Make robust predictions in the target domain. |
fit()¶
Arguments:
- X_list (list): list of feature matrices on each source domain.
- y_list (list): list of label arrays on each source domain.
- X0 (array, optional): feature matrix on the target domain. If None, the pooled source data are used as the target data. Defaults to None.
- loss_type (str, optional): type of the loss function used to compute the optimal aggregation weights. Options include 'reward', 'squaredloss', and 'regret'. Defaults to 'reward'.
- bias_correct (bool, optional): whether to use the bias-corrected estimator of the Gamma matrix. Defaults to True.
- priors (tuple, optional): prior information on the aggregation weights, given as (prior_weight, rho), where prior_weight is the prior weight vector and rho is the radius of the L2-norm ball around prior_weight. If None, no prior information is used. Defaults to None.
Outputs: enables the following attributes:
- weight_: CGDRO aggregated weights of the source domains.
predict()¶
Arguments:
- X (array, optional): input features for prediction. If None, uses the training data. Defaults to None.
Outputs:
- pred: predictions in the target domain.
Example¶
Data Generating Process¶
In this example, we generate non-linear multi-source data with $L=3$ source domains, $10{,}000$ samples in each source domain, and $100{,}000$ samples in the target domain. The dimension of the features is $p=5$:
from cgdro.data import DataContainerSimu_Nonlinear_reg
from cgdro.geometry import *
# number of source groups = 3, each with 10000 samples, and 100000 target samples
# dimension p = 5
# sigma: source group 1,3: 0.5; source group 2: 3.
data = DataContainerSimu_Nonlinear_reg(n=10000, N=100000)
data.generate_funcs_list(L=3, seed=0)
data.generate_data()
Xlist = data.X_sources_list
Ylist = data.Y_sources_list
X0 = data.X_target
Implementation & Prediction¶
We implement the three loss functions with Regression.ml: reward, squaredloss, and regret. Geometrically:
- reward: $f^*$ is the point closest to the origin within the convex hull of $\{f^{(l)}\}_{l\in[L]}$;
- squaredloss: $f^{sq}$ coincides with the source model with the highest noise level when that noise is substantially higher than in the other sources;
- regret: $f^{reg}$ is the center of the smallest ball enclosing all individual source models.
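To make the reward geometry concrete, here is a self-contained sketch using toy two-dimensional prediction vectors and scipy.optimize (an illustration only, not the cgdro internals): it finds the point of the convex hull of the source vectors closest to the origin.

```python
import numpy as np
from scipy.optimize import minimize

# Toy stand-ins for L = 3 source prediction vectors (columns of F);
# real predictions live in a much higher-dimensional space.
F = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])   # shape (n, L)

def reward_weights(F):
    """Simplex weights q minimizing ||F @ q||_2, i.e. locating the point
    of the convex hull of the columns of F closest to the origin."""
    L = F.shape[1]
    res = minimize(lambda q: np.sum((F @ q) ** 2),
                   np.full(L, 1.0 / L),
                   bounds=[(0.0, 1.0)] * L,
                   constraints=({'type': 'eq', 'fun': lambda q: q.sum() - 1.0},),
                   method='SLSQP')
    return res.x

q = reward_weights(F)
f_star = F @ q   # analytically (0.5, 0.5), the hull point nearest the origin
```

With these toy columns the closest hull point is the midpoint of the first two vectors, so the third (redundant) source receives zero weight.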
loss_type = reward¶
## First instantiate the module,
## then call fit():
## choose xgboost for f_learner and kliep for w_learner,
## and use the 'reward' loss type
drol = ml(f_learner='xgb', w_learner='kliep')
drol.fit(Xlist, Ylist, X0, loss_type='reward')
## Get CGDRO Aggregated Weights
drol.weight_
array([0.23619293, 0.29104051, 0.47276655])
## Prediction
drol.predict()
array([-1.37911566, -1.01748387, 1.44855369, ..., -1.79589733,
-0.42999676, 2.98552634])
# Geometry view: the closest point to the origin within the convex hull of the source predictions
pred_source = drol.pred_full_mat.T
pred_ch, w_ch = nearest_on_convex_hull(pred_source)
print("Predictions on convex hull:", pred_ch)
print("Weights:", w_ch)
Predictions on convex hull: [-1.38436167 -1.00976801 1.47364845 ... -1.81097766 -0.44569888 2.96259133] Weights: [0.23864968 0.28245441 0.4788959 ]
We can see from the results above that $f^*$ is indeed the point closest to the origin within the convex hull of $\{f^{(l)}\}_{l\in[L]}$: the convex-hull weights closely match the CGDRO weights.
loss_type = squaredloss¶
drol = ml(f_learner='xgb', w_learner='kliep')
drol.fit(Xlist,Ylist,X0, loss_type='squaredloss')
drol.weight_
array([0., 1., 0.])
drol.predict()
array([-0.8600474 , -1.5655055 , -0.60189897, ..., -0.58915877,
0.9634735 , 5.21788502])
# Geometry view: the sufficiently large noise group dominates
pred_source = drol.pred_full_mat.T
pred_source[1]
array([-0.8600474 , -1.5655055 , -0.60189897, ..., -0.58915877,
0.9634735 , 5.21788502])
We can see from the results above that $f^{sq}$ coincides with the source model with the highest noise level (source group 2, with $\sigma = 3$), since its noise is substantially higher than in the other sources.
loss_type = regret¶
drol = ml(f_learner='xgb', w_learner='kliep')
drol.fit(Xlist,Ylist,X0, loss_type='regret')
drol.weight_
array([0.41122666, 0.21230474, 0.3764686 ])
drol.predict()
array([-1.82018649, -1.35423715, 1.57969483, ..., -1.75827626,
-1.01742939, 1.22626068])
# Geometry view: the center of the minimum enclosing ball of the source coefficients
pred_source = drol.pred_full_mat.T
pred_cr, r_cr, w_cr = circumcenter_3vectors(pred_source)
print("Predictions on center of minimum enclosing ball:", pred_cr)
print("Weights:", w_cr)
Predictions on center of minimum enclosing ball: [ 1.19534202 -0.7734592 0.13471152 ... 1.21178966 0.54131003 -0.07881488] Weights: [0.39353541 0.26546146 0.34100313]
We can see from the results above that $f^{reg}$ is the center of the smallest ball enclosing all individual source models.
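As a standalone illustration of this geometry (again with toy vectors and scipy.optimize rather than the package's geometry helpers such as circumcenter_3vectors), the minimum enclosing ball can be computed in epigraph form:

```python
import numpy as np
from scipy.optimize import minimize

# Toy stand-ins for the L = 3 source prediction vectors (one per row).
preds = np.array([[0.0, 0.0],
                  [4.0, 0.0],
                  [2.0, 3.0]])

def min_enclosing_ball(points):
    """Center c and radius r of the smallest ball containing all rows,
    via the epigraph form: min r  s.t.  ||p_l - c|| <= r for all l."""
    x0 = np.append(points.mean(axis=0), 10.0)   # start: [c, r], r large enough
    cons = [{'type': 'ineq',
             'fun': lambda x, p=p: x[-1] - np.linalg.norm(p - x[:-1])}
            for p in points]
    res = minimize(lambda x: x[-1], x0, constraints=cons, method='SLSQP')
    return res.x[:-1], res.x[-1]

c, r = min_enclosing_ball(preds)   # for this acute triangle: its circumcircle
```

For three points forming an acute triangle, as here, the minimum enclosing ball coincides with the circumcircle, mirroring the circumcenter computation in the example above.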
Learning with Prior¶
This experiment demonstrates how prior information can improve the reward objective and produce better aggregated models.
We use the same data-generating process as before, with XGBoost as the outcome learner (f_learner='xgb') and KLIEP as the density-ratio estimator (w_learner='kliep').
We evaluate performance across:
- target label sizes $N_{\text{label}} \in \{20, 50, 100\}$,
- prior radius $\rho \in [0, 0.95]$.
For each combination, we compute the reward $$ \mathbb{E}\bigl[Y^2 - (Y - \hat f(X))^2\bigr], $$ for four different methods, described below.
- Naive ML (No Prior)
This is the standard CGDRO reward estimator:
- learns source models independently,
- aggregates them using the CGDRO optimizer,
- no prior information is used.
This method serves as the baseline.
- Uniform Prior
We add a prior that shrinks aggregation weights toward the uniform distribution: $$ q_{\text{prior}} = \left(\tfrac{1}{L}, \dots, \tfrac{1}{L}\right). $$
The prior strength is controlled by the radius parameter $\rho$. When $\rho=0$, the solution is exactly uniform; when $\rho$ increases, the model relaxes toward the unconstrained CGDRO solution.
This stabilizes estimation when labeled target data is scarce.
- Prior from Labeled Target Data
We estimate a prior weight vector $q_{\text{label}}$ using the few labeled target points.
We solve: $$ q_{\text{label}} = \arg\min_{q\in \Delta^L} \frac{1}{N_{\text{label}}}\bigl\|Y_{\text{label}} - \hat F_{\text{src}}\, q\bigr\|_2^2, $$ where $\hat F_{\text{src}}$ collects source-model predictions on labeled target covariates.
This $q_{\text{label}}$ becomes a data-informed prior, with radius $\rho$ controlling how strongly it shapes the final aggregation.
This method is particularly effective when the target domain differs systematically from source domains.
- Target-Only Model
We train an XGBoost model purely on the labeled target data:
$$ \hat f_{\text{target}}(x) = \text{xgb.fit}(X_{\text{label}}, Y_{\text{label}}). $$
Because the number of labeled target points is small, this model tends to overfit, but it provides a useful benchmark: “How well can we do using only target data, without borrowing from sources?”
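How the radius $\rho$ trades off between a prior and the unconstrained solution can be sketched as a projection onto the L2 ball around the prior. This is a simplification for intuition only; the actual CGDRO optimizer enforces the constraint jointly with its objective, and the helper name below is hypothetical.

```python
import numpy as np

def shrink_to_prior(w, q_prior, rho):
    """Project weights w onto the L2 ball of radius rho around q_prior,
    then renormalize back onto the probability simplex (clip + rescale).

    Illustrative simplification of the prior constraint in `priors=(q_prior, rho)`.
    """
    d = w - q_prior
    norm = np.linalg.norm(d)
    if norm <= rho:
        return w                     # already inside the ball: unconstrained
    w_proj = q_prior + rho * d / norm
    w_proj = np.clip(w_proj, 0.0, None)
    return w_proj / w_proj.sum()

q_prior = np.full(3, 1.0 / 3.0)      # uniform prior over L = 3 sources
w = np.array([0.7, 0.2, 0.1])        # hypothetical unconstrained weights
print(shrink_to_prior(w, q_prior, 0.0))   # rho = 0 recovers the prior
```

As $\rho$ grows, the projected weights move from the prior toward the unconstrained solution, matching the behavior described for the uniform prior above.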
Across all combinations of $N_{\text{label}}$ and $\rho$, the results show that appropriate prior information consistently improves reward performance, especially when labeled target data is limited.
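Before running the full experiment, the labeled-prior step in isolation reduces to simplex-constrained least squares. Here is a sketch on hypothetical toy data using scipy.optimize (the package example below uses cvxpy for the same problem):

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical toy data standing in for the labeled target points.
rng = np.random.default_rng(0)
L, n_label = 3, 50
pred_label = rng.normal(size=(n_label, L))   # source predictions at labeled points
true_q = np.array([0.2, 0.5, 0.3])
y_label = pred_label @ true_q + 0.01 * rng.normal(size=n_label)

# min over q in the simplex of (1 / n_label) * ||y_label - pred_label @ q||^2
res = minimize(lambda q: np.mean((y_label - pred_label @ q) ** 2),
               np.full(L, 1.0 / L),
               bounds=[(0.0, 1.0)] * L,
               constraints=({'type': 'eq', 'fun': lambda q: q.sum() - 1.0},),
               method='SLSQP')
q_label = res.x
```

With low noise the recovered q_label is close to the weights that generated the toy labels, which is exactly why it makes a useful data-informed prior.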
import numpy as np
import cvxpy as cp
from cgdro.Regression import ml
# DataContainerSimu_Nonlinear_reg_prior is assumed to live alongside
# DataContainerSimu_Nonlinear_reg; UtilModels is cgdro's single-model helper.
from cgdro.data import DataContainerSimu_Nonlinear_reg_prior

N_labels = [20, 50, 100]
rhos = np.arange(0.0, 0.95, 0.05)
reward_matrix = np.zeros((len(N_labels), len(rhos), 4))  # 4 methods

for i, N_label in enumerate(N_labels):
    data = DataContainerSimu_Nonlinear_reg_prior(n=10000, N=100000, N_label=N_label)
    data.generate_funcs_list(L=3, seed=0)
    data.generate_data()
    Xlist = data.X_sources_list
    Ylist = data.Y_sources_list
    X0 = data.X_target
    Y0 = data.Y_target
    for j, rho in enumerate(rhos):
        print("N_label:", N_label, "rho:", rho)
        ## Naive ml without prior
        drol = ml(f_learner='xgb', w_learner='kliep')
        drol.fit(Xlist, Ylist, X0, loss_type='reward', priors=None)
        pred = drol.predict()
        reward_matrix[i, j, 0] = np.mean(Y0**2 - (Y0 - pred) ** 2)
        ## Uniform prior
        drol = ml(f_learner='xgb', w_learner='kliep')
        uni_weight = np.ones(len(Xlist)) / len(Xlist)
        drol.fit(Xlist, Ylist, X0, loss_type='reward', priors=(uni_weight, rho))
        pred = drol.predict()
        reward_matrix[i, j, 1] = np.mean(Y0**2 - (Y0 - pred) ** 2)
        ## Prior from labeled target data:
        ## obtain q_label by simplex-constrained least squares on the labeled points
        pred_label = np.zeros((N_label, data.L))
        for l in range(data.L):
            pred_label[:, l] = drol.source_full_models[l].model_f.predict(data.X_target_label)
        q_label = cp.Variable(data.L, nonneg=True)
        objective = cp.Minimize(cp.sum_squares(data.Y_target_label - pred_label @ q_label) / N_label)
        prob = cp.Problem(objective, [cp.sum(q_label) == 1])
        prob.solve()
        q_label = q_label.value
        drol = ml(f_learner='xgb', w_learner='kliep')
        drol.fit(Xlist, Ylist, X0, loss_type='reward', priors=(q_label, rho))
        pred = drol.predict()
        reward_matrix[i, j, 2] = np.mean(Y0**2 - (Y0 - pred) ** 2)
        ## Target only: an XGBoost model fit purely on the labeled target data
        umodel = UtilModels(mode='reg', f_learner=drol.f_learner, w_learner=drol.w_learner,
                            split=False, seed=drol.seed, verbose=False)
        umodel.fit_f(data.X_target_label, data.Y_target_label)
        pred0 = umodel.model_f.predict(data.X_target)
        reward_matrix[i, j, 3] = np.mean(Y0**2 - (Y0 - pred0) ** 2)
import numpy as np
import matplotlib.pyplot as plt
methods = ['No Prior', 'Uniform Prior', 'Labeled Prior', 'Target Only']
colors = ['tab:blue', 'tab:orange', 'tab:green', 'tab:red']
plt.figure(figsize=(16, 6))
# One subplot per target label size
for i, N in enumerate(N_labels):
    plt.subplot(1, len(N_labels), i + 1)
    for m in range(4):
        plt.plot(rhos, reward_matrix[i, :, m], label=methods[m], color=colors[m])
    plt.title(f'N_label = {N}')
    plt.xlabel('ρ (rho)')
    plt.ylabel('Reward')
    plt.ylim(-0.4, 1.1 * reward_matrix.max())
    plt.legend(fontsize=8)
    plt.grid(True, linestyle='--', alpha=0.5)
plt.tight_layout()
plt.show()