Disentangled Multi-Context Meta-Learning
Unlocking Robust and Generalized Task Learning

Agency for Defense Development (ADD)
CoRL 2025

*Equal contribution

Video Presentation

Abstract

In meta-learning and its downstream tasks, many methods rely on implicit adaptation to task variations, where multiple factors are mixed together in a single entangled representation. This makes it difficult to interpret which factors drive performance and can hinder generalization. In this work, we introduce a disentangled multi-context meta-learning framework that explicitly assigns each task factor to a distinct context vector. By decoupling these variations, our approach improves robustness through deeper task understanding and enhances generalization by enabling context-vector sharing across tasks that share factors. We evaluate our approach in two domains. First, on a sinusoidal regression task, our model outperforms baselines on out-of-distribution tasks and generalizes to unseen sine functions by sharing the context vectors associated with common amplitudes or phase shifts. Second, in a quadruped robot locomotion task, we disentangle robot-specific properties from terrain characteristics in the robot dynamics model. By transferring the disentangled context vectors from the dynamics model into reinforcement learning, the resulting policy achieves improved robustness under out-of-distribution conditions, surpassing a baseline that relies on a single unified context. Furthermore, by sharing context vectors effectively, our model achieves successful sim-to-real policy transfer to challenging terrains with out-of-distribution robot-specific properties, using just 20 seconds of real data collected on flat terrain, which is not achievable with single-task adaptation.
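To make the mechanism concrete, below is a minimal sketch in the spirit of CAVIA-style context adaptation: the inner loop adapts one context vector per task factor while the shared network body is updated only in the outer loop. All names here (ContextNet, adapt_contexts, ctx_dim, inner_lr) and the architecture are illustrative assumptions, not the authors' implementation. Note that the sketch alone does not enforce which slot captures which factor; as the abstract describes, disentanglement comes from the meta-training scheme, e.g., sharing a context vector across tasks with a common factor.

import torch
import torch.nn as nn

class ContextNet(nn.Module):
    # Regressor conditioned on one context vector per task factor
    # (amplitude, phase shift, y-shift for the three-factor sine task).
    def __init__(self, n_factors=3, ctx_dim=8, hidden=64):
        super().__init__()
        self.n_factors, self.ctx_dim = n_factors, ctx_dim
        self.net = nn.Sequential(
            nn.Linear(1 + n_factors * ctx_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x, contexts):
        # x: (N, 1); contexts: list of n_factors vectors, each (ctx_dim,)
        ctx = torch.cat(contexts).expand(x.shape[0], -1)
        return self.net(torch.cat([x, ctx], dim=-1))

def sample_sine_task(n=10):
    # One task = a sine with random amplitude, phase shift, and y-shift.
    a, p, b = (torch.rand(3) * torch.tensor([4.0, 3.14, 2.0])).unbind()
    x = torch.rand(2 * n, 1) * 10 - 5
    y = a * torch.sin(x - p) + b
    return x[:n], y[:n], x[n:], y[n:]  # support / query split

def adapt_contexts(model, x, y, n_steps=5, inner_lr=0.1):
    # Inner loop: adapt only the per-factor context vectors;
    # the shared network body is left untouched here.
    contexts = [torch.zeros(model.ctx_dim, requires_grad=True)
                for _ in range(model.n_factors)]
    for _ in range(n_steps):
        loss = nn.functional.mse_loss(model(x, contexts), y)
        grads = torch.autograd.grad(loss, contexts, create_graph=True)
        contexts = [c - inner_lr * g for c, g in zip(contexts, grads)]
    return contexts

# Outer loop: update the shared body so that a few inner steps on the
# support set yield low loss on the query set.
model = ContextNet()
meta_opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(1000):
    x_s, y_s, x_q, y_q = sample_sine_task()
    contexts = adapt_contexts(model, x_s, y_s)
    meta_loss = nn.functional.mse_loss(model(x_q, contexts), y_q)
    meta_opt.zero_grad()
    meta_loss.backward()
    meta_opt.step()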

Recombination of disentangled context vectors on the three-factor sine task

Recombination results for the three contexts. Each dotted blue, green, or orange line is the model's prediction after adapting to the points shown; the solid line is the ground truth. The red dotted lines are predictions made with the recombined amplitude (blue), phase-shift (green), and y-shift (orange) context vectors, without any further adaptation.
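Given the sketch above, this recombination amounts to a swap of adapted slots: adapt on three tasks that each identify one factor, take one context vector from each, and predict with no further gradient steps. Again, every name is hypothetical and the sampled tasks are stand-ins for the paper's controlled ones.

# Hypothetical recombination on top of the sketch above: adapt on three
# tasks, mix one context vector from each, and predict directly.
x1, y1, _, _ = sample_sine_task()  # stand-in for the task fixing the amplitude
x2, y2, _, _ = sample_sine_task()  # stand-in for the task fixing the phase shift
x3, y3, _, _ = sample_sine_task()  # stand-in for the task fixing the y-shift
ctx_amp   = adapt_contexts(model, x1, y1)
ctx_phase = adapt_contexts(model, x2, y2)
ctx_shift = adapt_contexts(model, x3, y3)
mixed = [ctx_amp[0], ctx_phase[1], ctx_shift[2]]  # one slot per factor
x_test = torch.linspace(-5, 5, 100).unsqueeze(-1)
with torch.no_grad():
    y_pred = model(x_test, mixed)  # the red dotted curves: no further adaptation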

Locomotion on stairs with LiDAR payload (In-Distribution)

Multi-DMCM policy - Success
Vanilla Baseline policy - Success
Single-CAVIA policy - Success

Locomotion with low P gain (Out-Of-Distribution)

Multi-DMCM policy - 80% Success
Vanilla Baseline policy - 40% Success
Single-CAVIA policy - 0% Success

Multi-DMCM with asymmetric water bottle payload (Out-Of-Distribution)

Multi-DMCM policy on wavy terrain with asymmetric payload
Multi-DMCM policy on stair terrain with asymmetric payload

BibTeX

@article{dmcm,
  title   = {Disentangled Multi-Context Meta-Learning: Unlocking Robust and Generalized Task Learning},
  author  = {Seonsoo Kim and Jun-Gill Kang and Taehong Kim and Seongil Hong},
  journal = {arXiv preprint arXiv:2509.01297},
  year    = {2025}
}