Excalidraw Data
Text Elements
group.cc
- task enqueue/dequeue 2. fill op message
hostfunc
Device
Host
proxy progress
sender
receiver
semaphore→poll
semaphore→signalCounter
semaphore→poll
semaphore→signalCounter
deviceAdaptor→ hostfunc(semaphore)
event/stream + copy(H2D)+devicefunc
sendBuff
recvBuff
same process
RecvFifo
HBM
sendrecv per op
sender/receiver
Shared Proxy Info
1. desc 2. SHM (sendHead, recvTail)
ncclShmAllocateShareableBuffer
ncclShmImportShareableBuffer
Shared Proxy Info
1. desc 2. SHM (sendHead, recvTail)
connectInfo (desc)
CPU Mem
bootstrapSend
cross process
setup
proxy
cuMemcpy
cuMemMap
cuMemCreate
cuMemAddressReserve
cuMemGetAllocationGranularity (granularity, prop)
cuMemSetAccess x 2 (CUmemAccessDesc.location.type)
virtual Address
handle
CUdeviceptr
desc
- setup+connect bootstrapSend/Recv
ncclShmAllocateShareableBuffer
ncclShmImportShareableBuffer
cuMemImportFromShareableHandle (turns the fd into a handle)
cuMemSetAccess x 2 (CUmemAccessDesc.location.type)
ncclProxyClientGetFdBlocking(…): send the part of desc that identifies the handle to the peer; the peer exports an fd and passes it back over UDS
use cumem
use cudaIpc finish allocate/import shared buffer
Shared Proxy Info
RecvFifo
receiver(process2)
sender(process1)
cudaDeviceEnablePeerAccess (directPtr becomes devMem)
p2pMap
p2pMap
psmP2pAllocateShareableBuffer
cudaIpcGetMemHandle(*p2pBuff→directPtr)
cudaMalloc(**p2pBuff→directPtr)
cudaIpcOpenMemHandle(devMemPtr)
psmP2pImportShareableBuffer
ncclShmAllocateShareableBuffer
ShmOpen
Shared Proxy Info
p2pShm
ConnectInfo.desc
fill
bootstrapSend p2pBuff/desc
ncclShmImportShareableBuffer
ShmOpen
reOpen
ConnectInfo.desc
dev/shm/
mmap
mmap
Shared Proxy Info
p2pShm
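
The ShmOpen / reOpen / mmap labels above follow the usual POSIX pattern for sharing host memory between two processes through /dev/shm. The following is a minimal sketch of that generic pattern, not NCCL's actual code: the helper names `shm_create_and_map` and `shm_reopen_and_map` are hypothetical, and it assumes the segment name and size reach the peer out of band (in the diagram, via ConnectInfo.desc sent with bootstrapSend).

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (allocating side): create a named segment under
 * /dev/shm, size it, and map it. Returns NULL on failure. */
static void *shm_create_and_map(const char *name, size_t size) {
    assert(size > 0);
    int fd = shm_open(name, O_CREAT | O_EXCL | O_RDWR, 0600);
    if (fd < 0) return NULL;
    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        shm_unlink(name);
        return NULL;
    }
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the fd is closed */
    if (p == MAP_FAILED) {
        shm_unlink(name);
        return NULL;
    }
    return p;
}

/* Hypothetical helper (importing side): reopen an existing segment by the
 * name received from the peer and map the same pages. */
static void *shm_reopen_and_map(const char *name, size_t size) {
    assert(size > 0);
    int fd = shm_open(name, O_RDWR, 0);
    if (fd < 0) return NULL;
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    return p == MAP_FAILED ? NULL : p;
}
```

Both mappings refer to the same physical pages, so a write through one pointer (for example a sendHead or recvTail counter) is visible through the other. Once both sides have mapped the segment, the name can be removed with `shm_unlink` and the memory lives until the last `munmap`.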