PXN
Environment variable (read as NCCL_PXN_DISABLE; it defaults to 0, so PXN is enabled by default):
NCCL_PARAM(PxnDisable, "PXN_DISABLE", 0);

PXN topology check logic:
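- The peer GPU must be connected to the NIC through PCI (path type no worse than PATH_PXB)
- The peer GPU must be connected to the local GPU through NVLink
- The peer GPU must be on the same node as the local GPU
- In addition, at least one of the following must hold:
  - The peer GPU has higher bandwidth to that NIC than the local GPU
  - The local GPU's own path to the NIC is worse than PATH_PXB, so relaying avoids going through the CPU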
ncclResult_t ncclTopoComputePaths(...) {
  ...
  if (peerNode->paths[NET][n].type <= PATH_PXB &&                                // Is connected to the NIC through PCI
      peerNode->paths[GPU][g].type <= PATH_NVL &&                                // Is connected to us through NVLink
      NCCL_TOPO_ID_SYSTEM_ID(peerNode->id) == NCCL_TOPO_ID_SYSTEM_ID(gpu->id) && // Is on the same node as us
      (peerNode->paths[NET][n].bw > gpu->paths[NET][n].bw ||                     // Has either higher BW to that NIC
       gpu->paths[NET][n].type > PATH_PXB))                                      // or avoids going through a CPU
  ...
}
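
The condition above can be restated as a small standalone predicate, which may make the AND/OR structure easier to follow. This is only a sketch: the types GpuView and Path, the function canRelayThroughPeer, and the numeric values of the path-type enum are hypothetical simplifications, not NCCL's actual data structures. The only property assumed about the path types is that a smaller value means a closer connection.

```cpp
// Hypothetical, simplified path types. Smaller means "closer".
// The names echo NCCL's PATH_* constants, but the values are assumptions.
enum PathType : int {
  PATH_LOC = 0,  // same device
  PATH_NVL = 1,  // direct NVLink
  PATH_PIX = 2,  // a single PCIe bridge
  PATH_PXB = 3,  // several PCIe bridges, still no host bridge
  PATH_PHB = 4,  // through the PCIe host bridge (CPU)
  PATH_SYS = 5   // across the inter-socket link
};

// Hypothetical description of one path: its type and bandwidth.
struct Path {
  PathType type;
  float bw;
};

// Hypothetical view of one GPU with respect to a single NIC.
struct GpuView {
  int systemId;     // identifies the node the GPU belongs to
  Path toNic;       // path from this GPU to the NIC
  Path toLocalGpu;  // path from this GPU to the local GPU (used for the peer)
};

// Returns true when the peer GPU qualifies as a relay to the NIC,
// following the same four-part condition as the excerpt above.
bool canRelayThroughPeer(const GpuView& gpu, const GpuView& peer) {
  const bool peerReachesNicOverPci = peer.toNic.type <= PATH_PXB;
  const bool peerIsOnNvlink = peer.toLocalGpu.type <= PATH_NVL;
  const bool sameNode = peer.systemId == gpu.systemId;
  const bool peerHasMoreBw = peer.toNic.bw > gpu.toNic.bw;
  const bool localPathCrossesCpu = gpu.toNic.type > PATH_PXB;
  return peerReachesNicOverPci && peerIsOnNvlink && sameNode &&
         (peerHasMoreBw || localPathCrossesCpu);
}
```

The first three checks are all mandatory, while the last two are alternatives: a peer qualifies either because it offers more bandwidth to the NIC or because the local GPU would otherwise have to cross the CPU.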