
Excalidraw Data

Text Elements

group.cc

1. task enqueue/dequeue 2. fill op message
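The two group.cc steps (queue tasks while the group is open, then dequeue them and fill one op message per task) can be pictured with a small sketch. `Task`, `OpMessage` and `GroupQueue` are hypothetical types for illustration only. They are not NCCL's task or proxy-op structures.

```python
from collections import deque
from dataclasses import dataclass


# Hypothetical task record queued while the group is open.
@dataclass
class Task:
    peer: int
    nbytes: int
    is_send: bool


# Hypothetical op message filled from a task at group end.
@dataclass
class OpMessage:
    peer: int
    nbytes: int
    kind: str  # "send" or "recv"


class GroupQueue:
    """Step 1: enqueue tasks during the group. Step 2: dequeue and fill op messages."""

    def __init__(self) -> None:
        self._tasks = deque()  # holds Task objects in submission order

    def enqueue(self, task: Task) -> None:
        self._tasks.append(task)

    def drain(self) -> list[OpMessage]:
        ops = []
        while self._tasks:
            t = self._tasks.popleft()  # FIFO keeps the user's submission order
            ops.append(OpMessage(t.peer, t.nbytes, "send" if t.is_send else "recv"))
        return ops
```

The FIFO choice simply preserves submission order; how NCCL actually orders and groups P2P tasks is not shown in this diagram.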

hostfunc

Device

Host

proxy progress

sender

receiver

semaphore poll

semaphore signalCounter

semaphore poll

semaphore signalCounter
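The sender/receiver "semaphore poll" and "semaphore signalCounter" boxes suggest a monotonic counter: one side bumps it, the other spins until it reaches an expected value. A minimal host-side sketch follows, assuming that counter semantics. Python threads stand in for the device-side waiter and the host function that signals. `CounterSemaphore` is a hypothetical name.

```python
import threading
import time


class CounterSemaphore:
    """Monotonic counter: signal() increments it, poll(target) spins until value >= target."""

    def __init__(self) -> None:
        self._value = 0
        self._lock = threading.Lock()

    def signal(self) -> int:
        # Equivalent of "signalCounter": publish one more completed step.
        with self._lock:
            self._value += 1
            return self._value

    def value(self) -> int:
        with self._lock:
            return self._value

    def poll(self, target: int, timeout: float = 5.0) -> bool:
        # Equivalent of "poll": busy-wait until the counter reaches target.
        deadline = time.monotonic() + timeout
        while self.value() < target:
            if time.monotonic() > deadline:
                return False
            time.sleep(0)  # yield the CPU between checks
        return True
```

Waiting on a target value rather than on "one more signal" lets a poller that falls behind catch up in a single check.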

deviceAdaptor hostfunc(semaphore)

event/stream + copy(H2D)+devicefunc

sendBuff

recvBuff

same process

RecvFifo

HBM

sendrecv per op

sender/receiver

Shared Proxy Info

1. desc 2. SHM (sendHead, recvTail)
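The shared head/tail pair in SHM is the usual flow-control pattern for a fixed ring of step slots. This sketch assumes the NCCL-style convention: the sender advances `tail` after filling a slot, and the receiver advances `head` after draining one. Mapping `head`/`tail` to the diagram's "sendHead"/"recvTail" is an assumption. `StepFifo` is a hypothetical, single-threaded illustration.

```python
class StepFifo:
    """Ring of nslots shared by one sender and one receiver via head/tail counters."""

    def __init__(self, nslots: int) -> None:
        self.nslots = nslots
        self.slots = [None] * nslots
        self.head = 0  # advanced by the receiver (assumed "sendHead")
        self.tail = 0  # advanced by the sender (assumed "recvTail")

    def can_send(self) -> bool:
        # The sender may run at most nslots steps ahead of the receiver.
        return self.tail - self.head < self.nslots

    def send(self, item) -> bool:
        if not self.can_send():
            return False  # receiver has not freed a slot yet
        self.slots[self.tail % self.nslots] = item
        self.tail += 1  # publish only after the slot is written
        return True

    def recv(self):
        if self.head == self.tail:
            return None  # nothing posted yet
        item = self.slots[self.head % self.nslots]
        self.head += 1  # hand the slot back to the sender
        return item
```

Because each counter has a single writer, the two sides can share them through plain shared memory without a lock. Real cross-process code would also need memory-ordering guarantees that this sketch does not model.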

ncclShmAllocateShareableBuffer

ncclShmImportShareableBuffer

Shared Proxy Info

1. desc 2. SHM (sendHead, recvTail)

connectInfo (desc)

CPU Mem

bootstrapSend

cross process

setup

proxy

cuMemcpy

cuMemMap

cuMemCreate

cuMemAddressReserve

cuMemGetAllocationGranularity (granularity, prop)

cuMemSetAccess x 2 (CUmemAccessDesc.location.type)

virtual Address

handle

CUdeviceptr
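In the cuMem path, `cuMemCreate` and `cuMemAddressReserve` expect sizes that are multiples of the granularity that `cuMemGetAllocationGranularity` reports for the chosen allocation properties. The requested buffer size is therefore rounded up first. Here is a minimal sketch of that rounding. The 2 MiB figure in the test is only an example value; the real one comes from the driver query.

```python
def align_up(size: int, granularity: int) -> int:
    """Round size up to the next multiple of granularity (granularity must be > 0)."""
    if granularity <= 0:
        raise ValueError("granularity must be positive")
    return ((size + granularity - 1) // granularity) * granularity
```

The same aligned size would then be used for the physical allocation, the virtual address reservation and the mapping, so all three agree.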

desc

1. setup + connect via bootstrapSend/Recv

ncclShmAllocateShareableBuffer

ncclShmImportShareableBuffer

cuMemImportFromShareableHandle (fd becomes handle)

cuMemSetAccess x 2 (CUmemAccessDesc.location.type)

ncclProxyClientGetFdBlocking(…): send the part of desc that represents the handle to the peer; the peer exports it as an fd and passes it back over UDS (Unix domain socket)
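A file descriptor cannot travel as plain bytes between processes. Over a Unix domain socket it is passed as `SCM_RIGHTS` ancillary data, and the kernel installs a duplicate descriptor in the receiver. The sketch below illustrates that mechanism with Python's `socket.send_fds` / `socket.recv_fds` (Python 3.9+, Unix only). It is not NCCL's own socket code. `send_fd` and `recv_fd` are hypothetical helper names.

```python
import socket


def send_fd(sock: socket.socket, fd: int) -> None:
    # SCM_RIGHTS needs at least one byte of regular payload alongside the fd.
    socket.send_fds(sock, [b"\x00"], [fd])


def recv_fd(sock: socket.socket) -> int:
    # Read 1 payload byte and at most 1 descriptor from the ancillary data.
    _msg, fds, _flags, _addr = socket.recv_fds(sock, 1, 1)
    if not fds:
        raise RuntimeError("no file descriptor received")
    return fds[0]
```

In the diagram's flow, the received fd would then be handed to `cuMemImportFromShareableHandle` to obtain an allocation handle in the importing process.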

use cumem

use cudaIpc to allocate/import the shared buffer

Shared Proxy Info

RecvFifo

receiver(process2)

sender(process1)

cudaDeviceEnablePeerAccess (directPtr becomes devMem)

p2pMap

p2pMap

psmP2pAllocateShareableBuffer

cudaIpcGetMemHandle(*p2pBuffdirectPtr)

cudaMalloc(**p2pBuffdirectPtr)

cudaIpcOpenMemHandle(devMemPtr)

psmP2pImportShareableBuffer
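The diagram draws three ways for p2pMap to obtain a usable device pointer: peer access within the same process, cuMem export/import across processes, and legacy cudaIpc handles. This sketch encodes only the branches as drawn here. The real selection logic has more conditions. `MapPath` and `choose_map_path` are hypothetical names.

```python
from enum import Enum


class MapPath(Enum):
    PEER_ACCESS = "cudaDeviceEnablePeerAccess, reuse directPtr as devMem"
    CUMEM_FD = "cuMem export to fd, pass over UDS, cuMemImportFromShareableHandle"
    CUDA_IPC = "cudaIpcGetMemHandle on the owner, cudaIpcOpenMemHandle on the peer"


def choose_map_path(same_process: bool, use_cumem: bool) -> MapPath:
    # Same process: the peer's pointer is directly usable once peer access is on.
    if same_process:
        return MapPath.PEER_ACCESS
    # Cross process with cuMem: share the allocation as a POSIX fd.
    if use_cumem:
        return MapPath.CUMEM_FD
    # Cross process without cuMem: fall back to CUDA IPC memory handles.
    return MapPath.CUDA_IPC
```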

ncclShmAllocateShareableBuffer

ShmOpen

Shared Proxy Info

p2pShm

ConnectInfo.desc

fill

bootstrapSend p2pBuff/desc

ncclShmImportShareableBuffer

ShmOpen

reOpen

ConnectInfo.desc

/dev/shm/

mmap

mmap

Shared Proxy Info

p2pShm
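The ShmOpen / reOpen / mmap boxes describe two steps. One side creates a named segment under /dev/shm, and the other side opens the same name and maps it. The sketch below uses Python's `multiprocessing.shared_memory`, which on Linux is backed by `shm_open` and `mmap`, as a stand-in for the native calls. The two-counter layout (sendHead at offset 0, recvTail at offset 8) is a hypothetical layout chosen for illustration, not NCCL's actual shared structure.

```python
import struct
from multiprocessing import shared_memory

# Hypothetical layout: two little-endian uint64 counters (sendHead, recvTail).
_LAYOUT = struct.Struct("<QQ")


def create_proxy_shm() -> shared_memory.SharedMemory:
    """Creator side: allocate a named segment and zero both counters."""
    shm = shared_memory.SharedMemory(create=True, size=_LAYOUT.size)
    _LAYOUT.pack_into(shm.buf, 0, 0, 0)
    return shm


def open_proxy_shm(name: str) -> shared_memory.SharedMemory:
    """Importer side ("reOpen"): attach to an existing segment by name."""
    return shared_memory.SharedMemory(name=name)


def read_counters(shm: shared_memory.SharedMemory) -> tuple[int, int]:
    return _LAYOUT.unpack_from(shm.buf, 0)


def write_counters(shm: shared_memory.SharedMemory, send_head: int, recv_tail: int) -> None:
    _LAYOUT.pack_into(shm.buf, 0, send_head, recv_tail)
```

Only the segment name needs to reach the peer (for example inside the connect descriptor). Both mappings then see the same bytes, and only the creator should unlink the segment.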