Check your CUDA driver and device.
!nvidia-smi
Wed Jul 3 22:10:58 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67       Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  Off  | 00000000:00:1B.0 Off |                    0 |
| N/A   70C    P0   228W / 300W |   7684MiB / 16130MiB |     78%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  Off  | 00000000:00:1C.0 Off |                    0 |
| N/A   44C    P0    38W / 300W |     11MiB / 16130MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100-SXM2...  Off  | 00000000:00:1D.0 Off |                    0 |
| N/A   43C    P0    59W / 300W |    978MiB / 16130MiB |     14%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla V100-SXM2...  Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   40C    P0    40W / 300W |     11MiB / 16130MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0    118587      C   ...iconda3/envs/d2l-en-numpy2-0/bin/python  7673MiB |
|    2    119109      C   ...iconda3/envs/d2l-en-numpy2-1/bin/python   967MiB |
+-----------------------------------------------------------------------------+
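The table above is meant for people to read. For scripts, `nvidia-smi` can instead print selected fields as CSV using `--query-gpu=...` together with `--format=csv,noheader,nounits`. The sketch below assumes that invocation; the chosen field list and the helper names `parse_gpu_csv` and `query_gpus` are our own, and the `sample` text is hand-copied from GPUs 0 and 1 in the table above rather than captured from a real run.

```python
import csv
import io
import subprocess

# Fields to request from `nvidia-smi --query-gpu`; this selection is our choice.
FIELDS = ["index", "name", "memory.used", "memory.total", "utilization.gpu"]
NUMERIC = ("index", "memory.used", "memory.total", "utilization.gpu")


def parse_gpu_csv(text):
    """Parse `--format=csv,noheader,nounits` output into one dict per GPU."""
    gpus = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        record = dict(zip(FIELDS, (value.strip() for value in row)))
        # With `nounits`, memory is a bare MiB count and utilization a bare percent.
        for key in NUMERIC:
            record[key] = int(record[key])
        gpus.append(record)
    return gpus


def query_gpus():
    """Run nvidia-smi (needs an NVIDIA driver) and parse its CSV output."""
    out = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=" + ",".join(FIELDS),
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_csv(out)


# Hand-written sample shaped like GPUs 0 and 1 in the table above.
sample = """0, Tesla V100-SXM2..., 7684, 16130, 78
1, Tesla V100-SXM2..., 11, 16130, 0
"""
for gpu in parse_gpu_csv(sample):
    free = gpu["memory.total"] - gpu["memory.used"]
    print(gpu["index"], gpu["name"], f"{free} MiB free")
```

Only `parse_gpu_csv` runs here, so the example works on a machine without a GPU; call `query_gpus()` on a machine with the NVIDIA driver installed.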
Number of available GPUs
from mxnet import np, npx
from mxnet.gluon import nn
npx.set_np()
npx.num_gpus()
2
Computation devices
print(npx.cpu(), npx.gpu(), npx.gpu(1))
def try_gpu(i=0):
    """Return gpu(i) if it exists, otherwise return cpu()."""
    return npx.gpu(i) if npx.num_gpus() >= i + 1 else npx.cpu()
def try_all_gpus():
    """Return all available GPUs, or [cpu()] if there is no GPU."""
    ctxes = [npx.gpu(i) for i in range(npx.num_gpus())]
    return ctxes if ctxes else [npx.cpu()]
try_gpu(), try_gpu(3), try_all_gpus()
cpu(0) gpu(0) gpu(1)
(gpu(0), cpu(0), [gpu(0), gpu(1)])
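`try_gpu` and `try_all_gpus` fall back to the CPU when the requested GPU does not exist. The library-free sketch below reproduces only that selection logic, so it can be checked on a machine without MXNet or a GPU. The names `pick_device` and `all_devices` are hypothetical, and representing devices as strings with the GPU count passed in explicitly are simplifications of ours.

```python
def pick_device(i=0, num_gpus=0):
    """Same rule as try_gpu: gpu(i) only if at least i + 1 GPUs exist."""
    return f"gpu({i})" if num_gpus >= i + 1 else "cpu(0)"


def all_devices(num_gpus=0):
    """Same rule as try_all_gpus: every GPU, or the CPU if there are none."""
    devices = [f"gpu({i})" for i in range(num_gpus)]
    return devices if devices else ["cpu(0)"]


# Two visible GPUs, as reported by npx.num_gpus() above.
print(pick_device(num_gpus=2), pick_device(3, num_gpus=2), all_devices(num_gpus=2))
# → gpu(0) cpu(0) ['gpu(0)', 'gpu(1)']

# A CPU-only machine.
print(pick_device(num_gpus=0), all_devices(num_gpus=0))
# → cpu(0) ['cpu(0)']
```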
Create an ndarray on the first GPU
x = np.ones((2, 3), ctx=try_gpu())
print(x.context)
x
gpu(0)
array([[1., 1., 1.],
       [1., 1., 1.]], ctx=gpu(0))
Create a random ndarray on the second GPU
y = np.random.uniform(size=(2, 3), ctx=try_gpu(1))
y
array([[0.59119   , 0.313164  , 0.76352036],
       [0.9731786 , 0.35454726, 0.11677533]], ctx=gpu(1))
Copying between devices
z = x.copyto(try_gpu(1))
print(x)
print(z)
[[1. 1. 1.]
 [1. 1. 1.]] @gpu(0)
[[1. 1. 1.]
 [1. 1. 1.]] @gpu(1)
All inputs of an operator must be on the same device; the computation then runs on that device.
y + z
array([[1.59119  , 1.313164 , 1.7635204],
       [1.9731786, 1.3545473, 1.1167753]], ctx=gpu(1))
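The same-device rule can be modeled with a toy class. `ToyArray` below is hypothetical and not part of MXNet: it stores a flat list of floats plus a device tag, refuses to add operands whose tags differ, and places the result on the shared device. Mixing devices is also rejected by MXNet itself, but with its own error type rather than the `ValueError` used here. The values are the first rows of `x` and `y` above.

```python
class ToyArray:
    """Hypothetical stand-in for an ndarray: flat float data plus a device tag."""

    def __init__(self, data, ctx):
        self.data = list(data)
        self.ctx = ctx

    def copyto(self, ctx):
        # Copying always yields a new array that lives on the target device.
        return ToyArray(self.data, ctx)

    def __add__(self, other):
        # Operands must live on the same device, mirroring the rule above.
        if self.ctx != other.ctx:
            raise ValueError(f"inputs on {self.ctx} and {other.ctx}; copy one first")
        # The result is placed on the operands' shared device.
        return ToyArray([a + b for a, b in zip(self.data, other.data)], self.ctx)


x = ToyArray([1.0, 1.0, 1.0], "gpu(0)")
y = ToyArray([0.59119, 0.313164, 0.76352036], "gpu(1)")

try:
    x + y  # different devices: rejected
except ValueError as err:
    print("error:", err)

z = x.copyto("gpu(1)")  # move x's data next to y first
print((y + z).ctx)  # → gpu(1)
```

Note that `copyto` leaves `x` on `gpu(0)`, just as the real `x.copyto(try_gpu(1))` above leaves the original `x` in place and returns a new array `z`.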
Initialize parameters on the first GPU.
net = nn.Sequential()
net.add(nn.Dense(1))
net.initialize(ctx=try_gpu())
When the input is an ndarray on the GPU, Gluon will calculate the result on the same GPU.
net(x)
array([[0.04995865],
       [0.04995865]], ctx=gpu(0))
Let us confirm that the model parameters are stored on the same GPU.
net[0].weight.data()
array([[0.0068339 , 0.01299825, 0.0301265 ]], ctx=gpu(0))
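The two printed results can be cross-checked by hand. The layer has one output unit, so each row of `net(x)` is w · x + b. Because every row of `x` is all ones, that reduces to w1 + w2 + w3 + b. The check below assumes the bias starts at zero, which is the default `bias_initializer='zeros'` of Gluon's `nn.Dense`. It uses plain Python with the weight values copied from the output above.

```python
import math

# Weight row printed by net[0].weight.data() above.
w = [0.0068339, 0.01299825, 0.0301265]
b = 0.0                # assumed: Dense's default bias initializer is zeros
x_row = [1.0, 1.0, 1.0]  # one row of np.ones((2, 3))

# One-unit dense layer on one input row: w . x + b
out = sum(wj * xj for wj, xj in zip(w, x_row)) + b
print(out)
print(math.isclose(out, 0.04995865, rel_tol=1e-6))  # → True, matching net(x) above
```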