PCIe Heterogeneity and Smart RNIC Selection (SRS)

In current servers based on PCIe heterogeneous architectures (e.g., CPU, GPU, RNIC), we encounter network performance bottlenecks caused by topology differences. Manifestations include failed cross-device communication and imbalanced traffic across multi-NIC ports, resulting in RDMA performance falling short of expectations. To systematically address these issues, we designed corresponding optimization schemes for different hardware configurations.

Feature Overview

  • Heterogeneity support: topology-aware device binding
  • Traffic balancing: RDMA QP scheduling that balances across ports
  • Adaptivity: heterogeneity-aware adaptive strategies

Supported Heterogeneous Scenarios

PCIe heterogeneity scenarioSolution
One GPU, one RNIC, one portSupported
One GPU, one RNIC, two portsRDMA QPs use the two ports evenly
Two GPUs, two RNICs, two portsSupported
Two GPUs, two RNICs, four portsOne GPU uses one RNIC; RDMA QPs evenly use all ports

Configuration

Default behavior

SRS is enabled by default and automatically makes the optimal choice:

# Enabled by default — no extra configuration required
# VCCL will automatically discover and select the optimal RNIC