Hi,
I'm encountering an issue when running PyTorch on an NVIDIA A100 GPU using the "106a CUDA 12.3 (GPU)" software stack. Below is the test code I used:
import torch
import subprocess

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA version: {torch.version.cuda}")
print(f"Device name: {torch.cuda.get_device_name(0)}")

try:
    nvidia_smi_output = subprocess.check_output(["nvidia-smi"]).decode("utf-8")
    print("\nnvidia-smi output:")
    print(nvidia_smi_output)
except subprocess.CalledProcessError:
    print("nvidia-smi command not available")

x = torch.rand(5, 3).to("cuda")
print(f"Tensor on GPU: {x}")
Here’s the output I got:
PyTorch version: 2.3.1
CUDA version: 12.3
Device name: NVIDIA A100-PCIE-40GB

nvidia-smi output:
Thu Dec 19 15:21:03 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A100-PCIE-40GB          On  | 00000000:00:06.0   Off |                    0 |
| N/A   36C    P0             43W / 250W  |    422MiB /  40960MiB  |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
The error occurs at the last line:
---> 17 print(f"Tensor on GPU: {x}")
RuntimeError: CUDA error: no kernel image is available for execution on the device.
This error only happens when I’m assigned an A100 GPU. When I’m assigned a Tesla T4, the code works fine.
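In case it helps anyone reproduce this: the "no kernel image" error typically means the installed PyTorch binary was not compiled with kernels for the GPU's compute capability (the A100 is sm_80, while the T4 is sm_75), which would explain why only the A100 fails. Here is a small diagnostic sketch I put together to compare the device architecture against the architectures the PyTorch build supports; the `arch_supported` helper is just my own illustration, not a PyTorch API:

```python
import torch


def arch_supported(device_sm: str, compiled_archs: list) -> bool:
    """Check whether the device's SM architecture is among the
    architectures the PyTorch binary was compiled for."""
    return device_sm in compiled_archs


if torch.cuda.is_available():
    # e.g. (8, 0) on an A100, (7, 5) on a T4
    major, minor = torch.cuda.get_device_capability(0)
    device_sm = f"sm_{major}{minor}"
    compiled = torch.cuda.get_arch_list()  # archs baked into this build
    print(f"Device arch:    {device_sm}")
    print(f"Compiled archs: {compiled}")
    print(f"Supported:      {arch_supported(device_sm, compiled)}")

# Illustration: a wheel built only up to sm_75 (T4) has no A100 kernels,
# which matches the "no kernel image" error.
print(arch_supported("sm_80", ["sm_50", "sm_60", "sm_70", "sm_75"]))  # False
```

If `sm_80` is missing from the compiled arch list on the A100 node, the fix would be a PyTorch build with A100 support rather than a code change.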
Has anyone run into the same problem? Is this an issue with the PyTorch version?
Any help or suggestions would be greatly appreciated. Thanks in advance!