1. CGDRO: multi-source learning without labeled target data

Integrate diverse data sources to obtain robust predictions and statistical inference for your target domain.

In many real-world applications, we need to make predictions in a new environment where no labeled data are available. For instance, a model may be trained on patient records from several hospitals and then deployed in a new hospital whose patient population differs. Each source hospital offers valuable information, but its data distribution may differ from the target's, which makes direct model transfer unreliable. This setting, known as multi-source unsupervised domain adaptation (MSDA), presents a fundamental challenge:

How can we learn a model that performs reliably on the unlabeled target domain,
by leveraging the labeled source domains?
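To make the question concrete, the available data can be written as follows. The notation here is ours, chosen for illustration, and may differ from the symbols used in the CGDRO documentation: there are $L$ labeled source samples and one unlabeled target sample, and the goal is a predictor $f$ whose risk is small under the target distribution $\mathbb{P}^{(0)}$, even though no target label $Y^{(0)}$ is ever observed.

$$
\begin{aligned}
\text{source } l \in \{1,\dots,L\}:&\quad \{(X_i^{(l)}, Y_i^{(l)})\}_{i=1}^{n_l} \overset{\text{i.i.d.}}{\sim} \mathbb{P}^{(l)}, \\
\text{target}:&\quad \{X_j^{(0)}\}_{j=1}^{N} \overset{\text{i.i.d.}}{\sim} \mathbb{P}^{(0)}_X, \\
\text{goal}:&\quad \text{find } f \text{ with small } \mathbb{E}_{(X,Y)\sim\mathbb{P}^{(0)}}\big[\ell(f(X), Y)\big].
\end{aligned}
$$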

Figure 1 provides a visual illustration of this setup.


Figure 1. Illustration of Multi-source Unsupervised Domain Adaptation. Each source domain provides labeled data, while the target domain contains only unlabeled data. (Guo et al., 2025)

The CGDRO package is designed to tackle exactly this problem. CGDRO stands for Conditional Group Distributionally Robust Optimization, a principled framework for constructing prediction models that optimize worst-case performance over a family of plausible target distributions built from the source domains. Beyond learning models that generalize to unseen target distributions, it includes built-in statistical inference tools that let users quantify uncertainty and test hypotheses about model parameters.
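The group-DRO idea behind the name can be illustrated with a small, self-contained sketch. It is not the CGDRO package's estimator, API, or inference procedure. It assumes (our assumption, for illustration) that the target's conditional label distribution P(Y=1 | x) is an unknown mixture of the source conditionals, and that each source's P_l(Y=1 | x) has already been estimated and evaluated at the unlabeled target covariates. A logistic model θ is then fit to min over θ of max over mixture weights q in the simplex of Σ_l q_l L_l(θ), where L_l(θ) is the cross-entropy against source l's label probabilities averaged over the target covariates. The solver is a generic swapped-in choice (gradient descent on θ, exponentiated-gradient ascent on q), and every function name and data value below is hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_entropy(p, s, eps=1e-12):
    # Cross-entropy of predicted probability s against soft label p = P(Y=1 | x).
    s = min(max(s, eps), 1.0 - eps)
    return -(p * math.log(s) + (1.0 - p) * math.log(1.0 - s))


def group_losses(theta, X_target, source_probs):
    # Hypothetical helper: loss of the logistic model theta under each source's
    # conditional label probabilities, averaged over the unlabeled target covariates.
    preds = [sigmoid(dot(theta, x)) for x in X_target]
    return [sum(cross_entropy(p, s) for p, s in zip(probs, preds)) / len(preds)
            for probs in source_probs]


def fit_group_dro(X_target, source_probs, steps=2000, lr_theta=0.5, lr_q=0.1):
    # Hypothetical solver: gradient descent-ascent on sum_l q_l * L_l(theta),
    # with theta minimizing and the mixture weights q maximizing over the simplex.
    n, d, L = len(X_target), len(X_target[0]), len(source_probs)
    theta, q = [0.0] * d, [1.0 / L] * L
    for _ in range(steps):
        # Ascent on q (exponentiated gradient): upweight sources with larger loss.
        losses = group_losses(theta, X_target, source_probs)
        w = [qi * math.exp(lr_q * li) for qi, li in zip(q, losses)]
        total = sum(w)
        q = [wi / total for wi in w]
        # Descent on theta: cross-entropy is linear in the label probability,
        # so the mixture loss equals the loss against the mixed soft labels.
        grad = [0.0] * d
        for j, x in enumerate(X_target):
            p_mix = sum(q[l] * source_probs[l][j] for l in range(L))
            resid = sigmoid(dot(theta, x)) - p_mix
            for k in range(d):
                grad[k] += resid * x[k] / n
        theta = [t - lr_theta * g for t, g in zip(theta, grad)]
    return theta, q


# Synthetic toy data (illustration only): target covariates on a grid and two
# hypothetical sources whose P(Y=1 | u) are logistic with slopes 3 and 1.
grid = [-1.0 + 2.0 * j / 40 for j in range(41)]
X_target = [(1.0, u) for u in grid]  # intercept + one feature
source_probs = [[sigmoid(3.0 * u) for u in grid],
                [sigmoid(1.0 * u) for u in grid]]

theta, q = fit_group_dro(X_target, source_probs)
print("theta:", theta, "weights:", q)
print("worst-case loss:", max(group_losses(theta, X_target, source_probs)))
```

In this toy setup the weights should concentrate on the noisier slope-1 source and θ should approach (0, 1): because cross-entropy is linear in the label probability, the worst case over mixtures sits at the source whose labels are hardest to predict, and the robust model hedges against it rather than fitting the sharper slope-3 source.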


© 2025 CGDRO • Authors: Team