This notebook shows you how to use the GPU functionality of PyHEADTAIL. Created on 19 February 2016 by Stefan Hegglin.
In order to use the GPU module, you will need the following:
- An NVIDIA GPU: tested on Tesla C2075 and Kepler K20
- CUDA version > 6.5
- PyCUDA version 2015.1.3 (earlier versions may work but are not tested)
- scikit-cuda 0.5.1
The usual imports: numpy and physical constants from scipy
from __future__ import division, print_function
import numpy as np
from scipy.constants import c, e, m_p
In order to use the GPU, initialise it via the following statement. If this fails, the GPU or PyCUDA is not set up correctly. [This could also be performed automatically inside the context module; however, that is less safe since we do not know what happens if the user creates another context.]
Note: it is important to initialise the GPU before PyHEADTAIL is imported for the first time.
import pycuda.autoinit
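To double-check that the initialisation picked up a device, you can query PyCUDA directly. This is only an optional sanity check (a sketch using the standard pycuda.driver API, not required by PyHEADTAIL):

# optional sanity check: list the CUDA devices and show which one
# pycuda.autoinit selected
import pycuda.autoinit
import pycuda.driver as drv
print('Found', drv.Device.count(), 'CUDA device(s)')
print('Using device:', pycuda.autoinit.device.name())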
Add PyHEADTAIL to the path:
# sets the PyHEADTAIL directory etc.
try:
    from settings import *
except:
    pass
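If you do not have a settings module at hand, a minimal alternative is to extend sys.path manually (a sketch; the path below is a placeholder for wherever your local PyHEADTAIL checkout lives):

# a sketch: make a local PyHEADTAIL checkout importable
# (replace the placeholder path with the location of your copy)
import sys
sys.path.append('/path/to/PyHEADTAIL')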
Import the GPU and CPU context managers and the PyHEADTAIL Synchrotron:
from PyHEADTAIL.machines.synchrotron import Synchrotron
from PyHEADTAIL.general.contextmanager import GPU
from PyHEADTAIL.general.contextmanager import CPU
PyHEADTAIL v1.12.4.7
Define machine parameters and create a machine object and a corresponding matched bunch:
# machine parameters
circumference = 26658.8832
n_segments = 10
charge = e
mass = m_p
beta_x = 92.7
D_x = 0
beta_y = 93.2
D_y = 0
Q_x = 64.28
Q_y = 59.31
Qp_x = 10.
Qp_y = 15.
app_x = 0.0000e-9
app_y = 0.0000e-9
app_xy = 0
alpha = 3.225e-04
h1, h2 = 35640, 35640*2
V1, V2 = 6e6, 0.
dphi1, dphi2 = 0, np.pi
longitudinal_mode = 'non-linear'
p0 = 450e9 * e / c
p_increment = 0
machine = Synchrotron(
    optics_mode='smooth', circumference=circumference,
    n_segments=n_segments,
    beta_x=beta_x, D_x=D_x, beta_y=beta_y, D_y=D_y,
    accQ_x=Q_x, accQ_y=Q_y, Qp_x=Qp_x, Qp_y=Qp_y,
    app_x=app_x, app_y=app_y, app_xy=app_xy,
    alpha_mom_compaction=alpha, longitudinal_mode=longitudinal_mode,
    h_RF=[h1, h2], V_RF=[V1, V2], dphi_RF=[dphi1, dphi2],
    p0=p0, p_increment=p_increment, charge=charge, mass=mass,
    use_cython=False
)
# bunch parameters
macroparticlenumber = 100000
intensity = 1e11
epsn_x = 2.5e-6
epsn_y = 3.5e-6
sigma_z = 0.05
bunch = machine.generate_6D_Gaussian_bunch_matched(
    macroparticlenumber, intensity, epsn_x, epsn_y, sigma_z=sigma_z
)
# simulation parameters
n_turns = 10
*** Maximum RMS bunch length 0.117895151015m.
/home/oeftiger/anaconda/lib/python2.7/site-packages/scipy/integrate/quadpack.py:356: IntegrationWarning: The integral is probably divergent, or slowly convergent. warnings.warn(msg, IntegrationWarning)
... distance to target bunch length: -5.0000e-02
... distance to target bunch length: 6.4638e-02
... distance to target bunch length: 4.8815e-02
... distance to target bunch length: 5.6104e-03
... distance to target bunch length: -1.3673e-03
... distance to target bunch length: -2.1248e-05
... distance to target bunch length: 8.4927e-09
... distance to target bunch length: -2.6939e-07
--> Bunch length: 0.0500000084927
--> Emittance: 0.163402703633
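Before tracking, it can be useful to verify that the generated bunch indeed matches the requested parameters. The sketch below assumes the usual PyHEADTAIL statistics methods (sigma_z, epsn_x, epsn_y) on the particle object:

# quick sanity check of the matched bunch (a sketch)
print('RMS bunch length:', bunch.sigma_z(), 'm (target', sigma_z, 'm)')
print('Normalised emittances:', bunch.epsn_x(), 'm,', bunch.epsn_y(), 'm')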
Up to this point everything has run on the CPU, and the setup script is identical for CPU and GPU runs (except for the use_cython=False parameter and the import pycuda.autoinit statement). Next, we create a GPU context to enclose the main tracking loop.
with GPU(bunch) as context:
    for n in range(n_turns):
        machine.track(bunch)
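To get a feeling for the speed-up, you can time the same loop inside the CPU and the GPU context manager. This is only a rough wall-clock sketch: it includes the host-device transfers performed when entering and leaving the GPU context, and short runs with few macroparticles may not show any gain.

# rough timing comparison between the CPU and GPU context managers (a sketch)
import time

start = time.time()
with CPU(bunch) as context:
    for n in range(n_turns):
        machine.track(bunch)
print('CPU tracking took', time.time() - start, 's')

start = time.time()
with GPU(bunch) as context:
    for n in range(n_turns):
        machine.track(bunch)
print('GPU tracking took', time.time() - start, 's')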
Ok, this seems to work. How do we know it's actually running on the GPU? We can check the type of the bunch phase-space arrays inside of the with statement:
print('The type of bunch.x before entering the with-statement is', type(bunch.x))
with GPU(bunch) as context:
    machine.track(bunch)
    print('The type of bunch.x inside of the with-statement is', type(bunch.x))
print('The type of bunch.x after the with-region is', type(bunch.x))
The type of bunch.x before entering the with-statement is <type 'numpy.ndarray'>
The type of bunch.x inside of the with-statement is <class 'pycuda.gpuarray.GPUArray'>
The type of bunch.x after the with-region is <type 'numpy.ndarray'>
You can also use the CPU context manager so that GPU and CPU scripts look alike:
print('The type of bunch.x before entering the with-statement is ', type(bunch.x))
with CPU(bunch) as context:
    machine.track(bunch)
    print('The type of bunch.x inside of the with-statement is ', type(bunch.x))
print('The type of bunch.x after the with-region is ', type(bunch.x))
The type of bunch.x before entering the with-statement is <type 'numpy.ndarray'>
The type of bunch.x inside of the with-statement is <type 'numpy.ndarray'>
The type of bunch.x after the with-region is <type 'numpy.ndarray'>
That's it! If you need access to the bunch phase-space arrays during the simulation, you can move a copy back to the CPU using bunch.x.get(). Printing GPUArrays works out of the box if you need it for debugging:
with GPU(bunch) as context:
    print('The type of bunch.x inside of the with-statement is ', type(bunch.x))
    print('\nThe first three entries of bunch.x are ', bunch.x[0:3], ' (note: the array sits in GPU memory!)\n')
    cpu_bunch_x = bunch.x.get()
    print('A CPU copy of bunch.x inside the with-statement has type ', type(cpu_bunch_x))
The type of bunch.x inside of the with-statement is <class 'pycuda.gpuarray.GPUArray'>

The first three entries of bunch.x are [ 0.00019505 -0.00044431 -0.00050942] (note: the array sits in GPU memory!)

A CPU copy of bunch.x inside the with-statement has type <type 'numpy.ndarray'>
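As a small example of this pattern, the sketch below records the turn-by-turn horizontal centroid during GPU tracking by copying bunch.x back to the host once per turn (each copy costs a device-to-host transfer, so do this sparingly in production runs):

# a sketch: record the horizontal centroid each turn via bunch.x.get()
mean_x_per_turn = []
with GPU(bunch) as context:
    for n in range(n_turns):
        machine.track(bunch)
        mean_x_per_turn.append(bunch.x.get().mean())
print('Horizontal centroid over', n_turns, 'turns:', mean_x_per_turn)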