Hello team,
I use a cluster equipped with A100 GPUs, which don’t support torch 1.8.1, so I had to install torch 1.9.1 instead. What confuses me is that when I run the code as the tutorial describes, it crashes with this error:
‘RuntimeError: NCCL error in: /pytorch/torch/lib/c10d/ProcessGroupNCCL.cpp:911, unhandled cuda error, NCCL version 2.7.8. ncclUnhandledCudaError: Call to CUDA function failed.’
Could this be a version mismatch problem? I’d appreciate any advice. Thank you in advance.
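
In case it helps narrow this down, here is a minimal sketch of an environment check that could confirm or rule out a version mismatch. It is not from the tutorial; the helper names `cuda_supports_a100` and `report` are made up for illustration, and the only assumption baked in is that A100 GPUs (compute capability 8.0) need a torch build compiled against CUDA 11.0 or newer.

```python
import importlib


def cuda_supports_a100(cuda_version):
    """Return True if a CUDA version string such as '11.1' is 11.0 or newer.

    Hypothetical helper: A100 (compute capability 8.0) needs CUDA >= 11.0.
    """
    if not cuda_version:
        # torch.version.cuda is None on CPU-only builds
        return False
    major = int(cuda_version.split(".")[0])
    return major >= 11


def report():
    """Print the versions that matter for an NCCL/CUDA mismatch."""
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        print("torch is not installed in this environment")
        return None

    cuda = torch.version.cuda  # CUDA version torch was built against
    print("torch:", torch.__version__)
    print("CUDA (build):", cuda)

    if torch.cuda.is_available():
        print("NCCL:", torch.cuda.nccl.version())
        print("GPU:", torch.cuda.get_device_name(0))
        print("compute capability:", torch.cuda.get_device_capability(0))
    else:
        print("CUDA is not available to this torch build")

    return cuda_supports_a100(cuda)


if __name__ == "__main__":
    report()
```

If the reported CUDA build is older than 11.0, the installed wheel probably cannot drive the A100s. Setting the environment variable `NCCL_DEBUG=INFO` before launching should also make NCCL log more detail about the failing CUDA call.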