Reinforcement Learning for Risk Adaptation via Differentiable CVaR Barrier Functions

TL;DR

Learning to adapt risk online is critical for navigation in uncertain crowds. We present an end-to-end risk-adaptive framework that combines reinforcement learning with a differentiable CVaR (conditional value-at-risk) barrier-function safety layer, jointly learning the nominal control, risk level, and safety margin while enforcing probabilistic safety guarantees. This lets the robot stay cautious in difficult interactions without becoming unnecessarily conservative, and it performs strongly across varying obstacle densities, robot models, and out-of-distribution (OOD) crowd shifts.

How Does Risk Adaptation Work?

The coupling between risk level β and safety margin ΔR improves behavior in two key ways:

Avoiding bad situations: the robot stays more conservative early on, which helps it avoid crowded or high-risk regions in the first place.
Staying stable when constraints become unavoidable: adapting β and ΔR together reduces oscillation, hesitation, and unsafe reactions, leading to smoother and more consistent decisions.

Policy (u_nom): drives maximum efficiency.
Risk Level (β): adjusts sensitivity to high-risk interactions within [0, β_u].
Safety Margin (ΔR): adjusts an adaptive spatial buffer when extra caution is needed.
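
To make the role of the risk level β concrete, here is a minimal sample-based sketch of a CVaR estimate. It assumes a lower-tail convention (the mean of the worst β-fraction of barrier samples, where larger values mean safer); the page does not state which convention the method uses, and the sample values and the name `lower_tail_cvar` are hypothetical.

```python
import math
import numpy as np

def lower_tail_cvar(samples, beta):
    """Mean of the worst (smallest) beta-fraction of the samples.

    Assumed convention: samples are barrier values where larger means safer,
    so the risky tail is the lower one.
    """
    x = np.sort(np.asarray(samples, dtype=float))  # ascending: worst first
    k = max(1, math.ceil(beta * x.size))           # always keep at least one sample
    return float(x[:k].mean())

# Hypothetical barrier samples, e.g. signed clearance in metres.
h = np.array([0.9, 0.5, -0.1, 0.3, 0.7, 0.2, 0.6, 0.4, 0.8, 0.1])

print(lower_tail_cvar(h, 1.0))  # plain mean of all samples
print(lower_tail_cvar(h, 0.5))  # mean of the 5 smallest samples
print(lower_tail_cvar(h, 0.1))  # the single worst sample
```

Under this assumed convention, a smaller β looks only at the worst outcomes, so a constraint of the form CVaR_β(·) >= 0 becomes harder to satisfy and the resulting behavior more cautious.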

Differentiable CVaR-BF-QP Layer

min_u 1/2 ||u - u_nom||² subject to CVaR_β(H_m(ΔR)) >= 0
Provides a probabilistic safety guarantee.
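
The structure of this QP can be illustrated with a simplified case. The sketch below assumes the safety constraint has already been reduced to a single affine half-space a·u >= b (an assumption made for illustration; the actual layer presumably handles the CVaR constraints with a differentiable QP solver). In that case the minimizer has a closed form: keep u_nom if it is feasible, otherwise shift it the minimal distance onto the constraint boundary. The function name and the numbers are hypothetical.

```python
import numpy as np

def project_onto_halfspace(u_nom, a, b):
    """Solve min 0.5 * ||u - u_nom||^2  s.t.  a @ u >= b  in closed form.

    Assumes a is a nonzero vector.
    """
    u_nom = np.asarray(u_nom, dtype=float)
    a = np.asarray(a, dtype=float)
    violation = b - a @ u_nom
    if violation <= 0.0:
        return u_nom.copy()                   # nominal control is already feasible
    return u_nom + (violation / (a @ a)) * a  # minimal shift onto the boundary

# Hypothetical constraint: -u_x >= -0.5, i.e. u_x <= 0.5.
u_safe = project_onto_halfspace([1.0, 0.0], a=[-1.0, 0.0], b=-0.5)
print(u_safe)
```

Because this solution is piecewise linear in u_nom and b, gradients can flow through it almost everywhere, which is the property that lets a safety layer of this kind sit inside an end-to-end training loop.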

Demo videos: Episodes 1 to 5.

Simulation Settings

Aspect	Setting

Obstacle Uncertainty Models

Obstacle model	20 dynamic circular obstacles, each with radius R_o = 0.4 m, speeds sampled in [0.5, 1.5] m/s, and randomized initial positions and velocities.
Uncertainty model	Obstacle action uncertainty is modeled as a 3-mode GMM with one forward mode and two lateral modes. The weights [0.6, 0.2, 0.2] indicate how likely each mode is, while the standard deviations [0.1, 0.2, 0.2] indicate how spread out each mode is.
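
As a rough illustration of this uncertainty model, the sketch below samples 2-D action noise from a 3-mode Gaussian mixture using the weights and standard deviations from the table. The mode means (`MODE_MEANS`) and the isotropic covariance are assumptions made for illustration; the page does not specify them.

```python
import numpy as np

WEIGHTS = np.array([0.6, 0.2, 0.2])  # mode probabilities, from the table
STDS = np.array([0.1, 0.2, 0.2])     # per-mode standard deviations, from the table
# Hypothetical mode means in the obstacle frame (forward, left, right);
# these values are NOT given on the page.
MODE_MEANS = np.array([[0.0, 0.0], [0.0, 0.3], [0.0, -0.3]])

def sample_action_noise(n, rng):
    """Draw n noise vectors: pick a mode by weight, then add isotropic Gaussian noise."""
    modes = rng.choice(len(WEIGHTS), size=n, p=WEIGHTS)
    noise = rng.standard_normal((n, 2)) * STDS[modes, None]
    return MODE_MEANS[modes] + noise, modes

rng = np.random.default_rng(0)
samples, modes = sample_action_noise(10_000, rng)
print(samples.shape)
print(np.bincount(modes, minlength=3) / len(modes))  # empirical mode frequencies
```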

Control Limits

Single-integrator	||u|| <= 1.5 m/s.
Unicycle	v ∈ [-1.5, 1.5] m/s and ω ∈ [-π/2, π/2] rad/s.
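
For illustration only, the two control limits can be written as simple saturation functions. The page does not say how the limits are enforced (they may instead appear as constraints inside the QP), so the function names and the clipping approach here are assumptions.

```python
import numpy as np

U_MAX = 1.5          # m/s, speed bound for both models (from the table)
W_MAX = np.pi / 2    # rad/s, turn-rate bound for the unicycle (from the table)

def clip_single_integrator(u):
    """Scale u so that ||u|| <= U_MAX, preserving its direction."""
    u = np.asarray(u, dtype=float)
    norm = np.linalg.norm(u)
    return u if norm <= U_MAX else u * (U_MAX / norm)

def clip_unicycle(u):
    """Clip v to [-U_MAX, U_MAX] and omega to [-W_MAX, W_MAX] independently."""
    v, omega = u
    return np.array([np.clip(v, -U_MAX, U_MAX), np.clip(omega, -W_MAX, W_MAX)])

print(clip_single_integrator([3.0, 4.0]))
print(clip_unicycle([2.0, -3.0]))
```

Note the difference in shape: the single-integrator limit is a disc in velocity space, while the unicycle limit is a box in (v, ω).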

Policy Action

Action form	a_t = (u_nom,t, β_t, ΔR_t).
Single-integrator	u_nom,t = [u_x, u_y]^T ∈ R².
Unicycle	u_nom,t = [v_t, ω_t]^T ∈ R².
Adaptive parameters	β_t ∈ [0, β_u] is the adaptive CVaR risk level, with user-defined upper bound β_u = 0.5. ΔR_t ∈ [0, 1.5(R_r + R_o)] is the adaptive safety margin, where R_r and R_o are the robot and obstacle radii.
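
One way to keep the adaptive parameters inside their ranges is to squash unbounded policy outputs. The sketch below is a hypothetical decoding step, not the authors' implementation: the sigmoid squashing, the 4-vector layout, and the robot radius `R_R` are all assumptions (the page gives R_o = 0.4 m and β_u = 0.5 but no value for R_r).

```python
import numpy as np

BETA_U = 0.5                     # upper bound on the risk level (from the table)
R_O = 0.4                        # obstacle radius in metres (from the table)
R_R = 0.3                        # robot radius in metres: hypothetical value
DELTA_R_MAX = 1.5 * (R_R + R_O)  # upper bound on the safety margin

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_action(raw):
    """Map an unbounded 4-vector to a_t = (u_nom, beta, delta_r)."""
    raw = np.asarray(raw, dtype=float)
    u_nom = raw[:2]                           # [u_x, u_y] or [v, omega]
    beta = BETA_U * sigmoid(raw[2])           # lies strictly inside (0, BETA_U)
    delta_r = DELTA_R_MAX * sigmoid(raw[3])   # lies strictly inside (0, DELTA_R_MAX)
    return u_nom, float(beta), float(delta_r)

u_nom, beta, delta_r = decode_action([1.0, -0.5, 0.0, 0.0])
print(u_nom, beta, delta_r)
```

A sigmoid only approaches the interval endpoints; if the exact bounds 0 and β_u must be reachable, clipping a linear output would be an alternative.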

Baselines

Optimization	ORCA, CBF-QP, CVaR-BF-QP, Adaptive-CVaR-BF
RL	Vanilla RL, CrowdNav++ (const vel), CrowdNav++ (inferred)
RL + Optimization	Safety-filtered RL variants, BarrierNet, Proposed

Stochastic & Unpredictable

Uncooperative & Dense

Constrained & Diverse Models

Statistical Results

Table I: baseline comparison in 20-obstacle environments

Success rate versus obstacle count for the single-integrator model

Success rate versus obstacle count for the unicycle model

Q1: How do different method types compare in dynamic environments?

RL + Optimization: best safety & efficiency.
Optimization only: lower success rate due to infeasibility.
RL only: efficient but no formal safety guarantee.

Table I

Q2: How do different ways of integrating safety into RL affect performance?

End-to-end training: outperforms decoupled post-hoc safety filtering.
Joint learning eliminates the optimality gap.
Outperforms BarrierNet by modeling stochastic uncertainty.

Table I

Q3: How robust are the methods across different robot models and obstacle densities?

Proposed method: most robust of all methods, with graceful degradation.
Robot models: trends remain consistent across the single-integrator and unicycle models.
Obstacle density: performance degrades gracefully as the number of obstacles grows.

Table I, Figure I

Table II: out-of-distribution generalization under environment shifts

Q4: How robust are the different types of methods under OOD environment changes?

Proposed method: adaptation gives the best OOD success rate.
RL only: degrades most under unseen policy shifts.
RL + Safety Filter: regains robustness but still lags behind.

Table II
