
Reinforcement Learning for Risk Adaptation via Differentiable CVaR Barrier Functions

Anonymous

TL;DR

Learning to adapt risk online is critical for navigation in uncertain crowds. We present an end-to-end risk-adaptive framework that combines reinforcement learning with a differentiable CVaR (conditional value-at-risk) barrier-function safety layer, jointly learning the nominal control, the risk level, and the safety margin while enforcing probabilistic safety guarantees. This lets the robot stay cautious in difficult interactions without becoming unnecessarily conservative, and the framework performs strongly across varying obstacle densities, robot models, and out-of-distribution (OOD) crowd shifts.
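To make the roles of the risk level and the safety margin concrete, here is a minimal sketch under stated assumptions, not the authors' implementation. It estimates a CVaR of a distance-based barrier from sampled obstacle positions. The barrier form $h = \|p_r - p_o\| - (R_r + R_o + \Delta R)$, the tail convention (β = 0 gives the plain mean, larger β is more conservative), the robot radius R_r = 0.3 m, and all function names are assumptions made for illustration. It does not reproduce the differentiable safety layer itself; it only shows how raising β or ΔR tightens the barrier check.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def barrier_samples(p_robot: Point, obstacle_samples: Sequence[Point],
                    r_robot: float, r_obs: float, delta_r: float) -> List[float]:
    """Hypothetical distance barrier h = ||p_r - p_o|| - (R_r + R_o + dR) per sampled obstacle."""
    margin = r_robot + r_obs + delta_r
    return [math.hypot(p_robot[0] - ox, p_robot[1] - oy) - margin
            for ox, oy in obstacle_samples]


def cvar_lower_tail(h: Sequence[float], beta: float) -> float:
    """Empirical -CVaR_beta(-h): mean of the smallest ceil((1 - beta) * N) barrier values.

    Assumed convention: beta = 0 returns the plain mean, and a larger beta
    averages a thinner, worse tail, so the check becomes more conservative.
    """
    if not h:
        raise ValueError("need at least one barrier sample")
    if not 0.0 <= beta < 1.0:
        raise ValueError("beta must lie in [0, 1)")
    k = max(1, math.ceil((1.0 - beta) * len(h)))
    return sum(sorted(h)[:k]) / k


def is_cvar_safe(h: Sequence[float], beta: float) -> bool:
    """Sampled CVaR barrier condition: the risk-adjusted barrier value must be non-negative."""
    return cvar_lower_tail(h, beta) >= 0.0


if __name__ == "__main__":
    robot = (0.0, 0.0)
    # Four hypothetical predicted obstacle positions (e.g. drawn from an uncertainty model).
    samples = [(1.6, 0.0), (1.4, 0.0), (1.0, 0.0), (0.7, 0.0)]
    for delta_r in (0.1, 0.2):
        h = barrier_samples(robot, samples, r_robot=0.3, r_obs=0.4, delta_r=delta_r)
        for beta in (0.0, 0.5):
            print(f"dR={delta_r}, beta={beta}: {cvar_lower_tail(h, beta):+.3f}, "
                  f"safe={is_cvar_safe(h, beta)}")
```

With these numbers every check passes except ΔR = 0.2 at β = 0.5, where the mean of the two worst barrier samples is -0.05. Either knob alone can flip the decision, which is why a fixed, hand-tuned pair tends to be either too loose or too conservative.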

Method Overview

[Figure: Method overview]

Simulation Settings

Obstacle Uncertainty Models
  Obstacle model: 20 dynamic circular obstacles, each with radius $R_o = 0.4$ m, speeds sampled in [0.5, 1.5] m/s, and randomized initial positions and velocities.
  Uncertainty model: obstacle action uncertainty is modeled as a 3-mode Gaussian mixture model (GMM) with one forward mode and two lateral modes. The mixture weights [0.6, 0.2, 0.2] give the probability of each mode, and the standard deviations [0.1, 0.2, 0.2] give the spread of each mode.
Control Limits
  Single-integrator: $\|u\| \le 1.5$ m/s.
  Unicycle: $v \in [-1.5, 1.5]$ m/s and $\omega \in [-\pi/2, \pi/2]$ rad/s.
Policy Action
  Action form: $a_t = (u_{\mathrm{nom},t}, \beta_t, \Delta R_t)$.
  Single-integrator: $u_{\mathrm{nom},t} = [u_x, u_y]^\top \in \mathbb{R}^2$.
  Unicycle: $u_{\mathrm{nom},t} = [v_t, \omega_t]^\top \in \mathbb{R}^2$.
  Adaptive parameters: $\beta_t \in [0, \beta_u]$ is the adaptive CVaR risk level, with user-defined upper bound $\beta_u = 0.5$; $\Delta R_t \in [0, 1.5(R_r + R_o)]$ is the adaptive safety margin, where $R_r$ and $R_o$ are the robot and obstacle radii.
Baselines
  Optimization: ORCA, CBF-QP, CVaR-BF-QP, Adaptive-CVaR-BF
  RL: Vanilla RL, CrowdNav++ (const vel), CrowdNav++ (inferred)
  RL + Optimization: Safety-filtered RL variants, BarrierNet, Proposed
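The simulation settings can be made concrete with a small sketch under assumptions: it draws obstacle velocities from the 3-mode GMM and maps raw policy outputs onto the action $a_t = (u_{\mathrm{nom},t}, \beta_t, \Delta R_t)$ within the stated bounds. The mode means (forward mode centred on the nominal velocity, lateral modes rotated by ±90°), the isotropic per-mode noise, the [-1, 1] raw-output range with affine rescaling, the radial rescaling of the single-integrator command, and every function name are hypothetical; the page does not specify them.

```python
import math
import random
from typing import Tuple

Vec = Tuple[float, float]

GMM_WEIGHTS = (0.6, 0.2, 0.2)  # forward, left-lateral, right-lateral mode probabilities
GMM_STDS = (0.1, 0.2, 0.2)     # per-mode standard deviations (assumed isotropic, m/s)
BETA_U = 0.5                   # user-defined upper bound on the CVaR risk level
R_OBS = 0.4                    # obstacle radius R_o in metres


def rotate(v: Vec, angle: float) -> Vec:
    """Rotate a 2-D vector counter-clockwise by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])


def sample_obstacle_velocity(v_nominal: Vec, rng: random.Random,
                             lateral_angle: float = math.pi / 2) -> Vec:
    """Draw one obstacle velocity from the 3-mode GMM.

    Hypothetical choice: the forward mode is centred on v_nominal and the two
    lateral modes on v_nominal rotated by +/- lateral_angle.
    """
    means = (v_nominal, rotate(v_nominal, lateral_angle), rotate(v_nominal, -lateral_angle))
    mode = rng.choices(range(3), weights=GMM_WEIGHTS, k=1)[0]
    (mx, my), sd = means[mode], GMM_STDS[mode]
    return (rng.gauss(mx, sd), rng.gauss(my, sd))


def _clip_unit(x: float) -> float:
    """Clip a raw policy output to [-1, 1]."""
    return max(-1.0, min(1.0, x))


def bound_action(raw: Tuple[float, float, float, float], r_robot: float,
                 model: str = "single_integrator") -> Tuple[Vec, float, float]:
    """Map raw policy outputs (assumed in [-1, 1]) to a_t = (u_nom, beta, delta_r)."""
    a0, a1, a2, a3 = (_clip_unit(x) for x in raw)
    if model == "single_integrator":
        ux, uy = 1.5 * a0, 1.5 * a1
        norm = math.hypot(ux, uy)
        if norm > 1.5:  # enforce ||u|| <= 1.5 m/s by radial rescaling
            ux, uy = ux * 1.5 / norm, uy * 1.5 / norm
        u_nom = (ux, uy)
    elif model == "unicycle":
        # v in [-1.5, 1.5] m/s, omega in [-pi/2, pi/2] rad/s
        u_nom = (1.5 * a0, (math.pi / 2) * a1)
    else:
        raise ValueError(f"unknown robot model: {model}")
    beta = BETA_U * (a2 + 1.0) / 2.0                      # beta_t in [0, beta_u]
    delta_r = 1.5 * (r_robot + R_OBS) * (a3 + 1.0) / 2.0  # dR_t in [0, 1.5(R_r + R_o)]
    return u_nom, beta, delta_r


if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_obstacle_velocity((1.0, 0.0), rng))
    print(bound_action((0.9, -0.2, 0.4, -1.0), r_robot=0.3))
```

Clipping followed by affine rescaling is only one way to keep β_t and ΔR_t inside their bounds; a squashing nonlinearity such as tanh or a sigmoid on the policy head would serve the same purpose.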
[Figure panels: Stochastic & Unpredictable; Uncooperative & Dense; Constrained & Diverse Models]

In-Distribution Demos

Out-of-Distribution Demos

Statistical Results

Table I: Baseline comparison in 20-obstacle environments.
[Figure: Success rate versus obstacle count. Panels: single-integrator model; unicycle model.]
Q1: How do different method types compare in dynamic environments?
  • RL + Optimization: best safety and efficiency.
  • Optimization only: lower success rate, because the optimization can become infeasible.
  • RL only: efficient, but without formal safety guarantees.
Q2: How do different ways of integrating safety into RL affect performance?
  • End-to-end training: outperforms decoupled post-hoc safety filtering.
  • Joint learning eliminates the optimality gap introduced by post-hoc filtering.
  • Outperforms BarrierNet by explicitly modeling stochastic uncertainty.
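One way to see why end-to-end training can differ from post-hoc filtering is to look at the gradient of a safety layer. The following is a minimal sketch, not the authors' layer: a single linear constraint a·u ≥ b stands in for the differentiable CVaR barrier-function safety layer. For one constraint, the projection problem min ||u - u_nom||² subject to a·u ≥ b has a closed form, and its Jacobian with respect to u_nom is what lets a training signal reach the policy through the filter. `project_halfspace`, `jacobian_wrt_unom`, and the example constraint are hypothetical.

```python
from typing import List, Tuple

Vec = Tuple[float, float]


def dot(x: Vec, y: Vec) -> float:
    """2-D inner product."""
    return x[0] * y[0] + x[1] * y[1]


def project_halfspace(u_nom: Vec, a: Vec, b: float) -> Vec:
    """Closed-form solution of min ||u - u_nom||^2 subject to a . u >= b."""
    violation = b - dot(a, u_nom)
    if violation <= 0.0:
        return u_nom                   # constraint inactive: nominal action is already safe
    scale = violation / dot(a, a)      # minimal step along the constraint normal a
    return (u_nom[0] + scale * a[0], u_nom[1] + scale * a[1])


def jacobian_wrt_unom(u_nom: Vec, a: Vec, b: float) -> List[List[float]]:
    """d u_safe / d u_nom: identity if inactive, I - a a^T / ||a||^2 if active."""
    if b - dot(a, u_nom) <= 0.0:
        return [[1.0, 0.0], [0.0, 1.0]]
    n2 = dot(a, a)
    return [[1.0 - a[0] * a[0] / n2, -a[0] * a[1] / n2],
            [-a[1] * a[0] / n2, 1.0 - a[1] * a[1] / n2]]


if __name__ == "__main__":
    # Hypothetical linearised constraint u_x <= 0.5, written as -u_x >= -0.5.
    a, b = (-1.0, 0.0), -0.5
    u_nom = (1.2, 0.3)
    print(project_halfspace(u_nom, a, b))
    print(jacobian_wrt_unom(u_nom, a, b))
```

In the example, the nominal u_x = 1.2 is pulled back to the boundary u_x = 0.5 while u_y is untouched, and the Jacobian zeroes only the direction along a. A filter attached only after training gives the policy no such gradient, so the policy never learns how its proposals interact with the constraint.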
Q3: How robust are the methods across different robot models and obstacle densities?
  • Proposed method: the most robust of all methods, degrading gracefully.
  • Robot models: trends remain consistent across the single-integrator and unicycle models.
  • Obstacle density: performance degrades gracefully as obstacle density increases.
Table II: Out-of-distribution generalization under environment shifts.
Q4: How robust are different types of methods under OOD environment changes?
  • Proposed method: risk adaptation yields the best OOD success rate.
  • RL only: degrades the most under unseen crowd-policy shifts.
  • RL + Safety Filter: recovers robustness but still lags behind the proposed method.

BibTeX