This is a set of notes on my successful install of CUDA Toolkit 8.0 and PGI 16.9 for OpenACC development on a machine running Fedora 24. Because Fedora ships the latest software, the install is nontrivial and unlikely to succeed by following the PGI install instructions alone.
The OpenACC Toolkit can, in theory, install CUDA and the other components PGI needs with one install script. However, beyond the special concerns Fedora 24 raises, I prefer to install the CUDA Toolkit separately so that Fedora's package manager can maintain it. You should still sign up for the OpenACC Toolkit, as it immediately gives you a license for running PGI and also lets you request a University developer license. University researchers can get free licenses for the PGI compilers (for one year, but renewable at no cost), making it easy to do local development.
In what follows, I'm assuming all commands are run as root. This process is not possible without root access. I will assume you have a PGI license (e.g. via the OpenACC Toolkit).
NVIDIA's CUDA Toolkit website has good install guides. I'll be following the one entitled "NVIDIA CUDA Installation Guide for Linux", which at time of writing was available from https://developer.nvidia.com/cuda-downloads
CUDA requires kernel headers and development packages that may not be installed by default. For Fedora, these are installed via
$ dnf install kernel-devel-$(uname -r) kernel-headers-$(uname -r)
For my machine at time of writing, uname -r gives 4.7.2-201.fc24.x86_64. These packages were already installed in my case.
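As a quick sanity check, here's a small sketch (assuming Fedora's standard layout, where kernel-devel installs into /usr/src/kernels/<version>) that verifies headers for the running kernel are present:

```shell
# Check whether development headers for the running kernel are installed.
# Assumes Fedora's layout: kernel-devel populates /usr/src/kernels/<version>.
KVER=$(uname -r)
if [ -d "/usr/src/kernels/$KVER" ]; then
  echo "kernel-devel for $KVER: found"
else
  echo "kernel-devel for $KVER: missing (try: dnf install kernel-devel-$KVER)"
fi
```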
Get CUDA Toolkit rpm from https://developer.nvidia.com/cuda-downloads
Make sure you don't have any past installs of toolkit components. For my machine, this meant verifying that dnf list | grep cuda showed no conflicts.
If you have a /etc/X11/xorg.conf file, this may cause problems with the driver.
Ensure RPMFusion free repository is enabled
$ dnf repolist
Using the .rpm file downloaded previously, install metadata and clean dnf:
$ rpm --install cuda-repo-fedora23-8.0.44-1.x86_64.rpm
$ dnf clean expire-cache
The non-free repo can cause conflicts, so install cuda with it temporarily disabled:
$ dnf --disablerepo="rpmfusion-nonfree*" install cuda
Note that this installs X11 drivers, which I already had a version of from the nonfree repo. It seems to have properly removed the nonfree version of the driver in favor of the cuda repo version, but watch for possible conflicts.
Note that PGI Workstation 16.9 defaults to assuming CUDA 7.0, while I just installed CUDA 8.0. So far I've not had problems, but if you need them, archives of past versions 7.0 and 7.5 can be found here: https://developer.nvidia.com/cuda-toolkit-70 https://developer.nvidia.com/cuda-75-downloads-archive
For the current shell session, add CUDA to your PATH:
export PATH=/usr/local/cuda-8.0/bin:${PATH}
Based on the Puget Systems discussion, you can configure your system permanently by creating the file /etc/profile.d/cuda.sh containing
export PATH=$PATH:/usr/local/cuda/bin
The CUDA Toolkit install should've done this, but just for reference that discussion also recommends creating /etc/ld.so.conf.d/cuda.conf
containing
/usr/local/cuda/lib64
then run
$ ldconfig
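Those two files can be scripted as follows. This is just a sketch; the dry-run fallback (writing under ./cuda-dryrun when not run as root) is my addition so you can inspect the result before touching /etc:

```shell
# Create the CUDA PATH and loader-config files described above.
# When not run as root, write into ./cuda-dryrun instead of /etc (my addition),
# so the result can be inspected first.
if [ "$(id -u)" -eq 0 ]; then DESTDIR=""; else DESTDIR="./cuda-dryrun"; fi
mkdir -p "$DESTDIR/etc/profile.d" "$DESTDIR/etc/ld.so.conf.d"
printf 'export PATH=$PATH:/usr/local/cuda/bin\n' > "$DESTDIR/etc/profile.d/cuda.sh"
printf '/usr/local/cuda/lib64\n' > "$DESTDIR/etc/ld.so.conf.d/cuda.conf"
# After writing the real files, run (as root): ldconfig
```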
CUDA is only compatible with gcc 5.3.1 (on Fedora). While a guru in Fedora + CUDA + gcc + PGI could possibly work out a fairly robust workaround allowing the use of F24's native gcc 6.1.x (for example, nvcc will often compile correctly with gcc 6.1.x as the host compiler if you pass the right collection of flags), the only solution I can manage is to install gcc 5.3.1 and access it using environment modules.
To get environment modules do
$ dnf install environment-modules
Now you will be able to manage an install of gcc 5.3.1 in parallel with the system's default gcc (6.2.1 at time of writing).
First, get gcc 5.3.1. I don't think this is a GNU release version, but a Fedora one. To be safe, I got it from Fedora with
$ wget http://pkgs.fedoraproject.org/repo/pkgs/gcc/gcc-5.3.1-20151207.tar.bz2/1458ebcc302cb4ac6bab5cbf8b3f3fcc/gcc-5.3.1-20151207.tar.bz2
Unpack the source and download its prerequisites:
$ tar xjf gcc-5.3.1-20151207.tar.bz2
$ cd gcc-5.3.1-20151207
$ ./contrib/download_prerequisites
Configure it to use a different install location, skip 32-bit support on our 64-bit machine, and build only the languages we want (otherwise it builds java and some other things, costing time and disk space). You also need to configure the build so that gcc 6 can compile gcc 5, by setting the standard to gnu++98.
$ export CXX="g++ -std=gnu++98"
$ ./configure --prefix=/opt/gcc/gcc-5.3.1 --disable-multilib --enable-languages=c,c++,fortran
You'll need to create the install location as root:
$ mkdir -p /opt/gcc/gcc-5.3.1
Build (this will take a while):
$ make -j 6
Then, as root, install it into the prefix chosen above:
$ make install
Now let's set up a modulefile for gcc 5.3.1. As root, make a directory for it in $MODULEPATH (/usr/share/modulefiles for us):
$ mkdir /usr/share/modulefiles/gcc
In this directory, create a file named 5.3.1 with the following contents. I wish there were a standard template, but honestly all I can say is that I developed these contents by looking at various examples online and those used by others. See man module and man modulefile for details.
#%Module1.0
## /usr/share/modulefiles/gcc/5.3.1
##
## Provides gcc version 5.3.1, installed at /opt/gcc/gcc-5.3.1
proc ModulesHelp { } {
global GCC_VER modfile indir
puts stderr "Module file: $modfile"
puts stderr ""
puts stderr "This module modifies the shell environment to use gcc version"
puts stderr "$GCC_VER installed at $indir"
}
module-whatis "Sets environment to use GCC 5.3.1"
conflict gcc
set GCC_VER 5.3.1
set modfile /usr/share/modulefiles/gcc/5.3.1
set indir /opt/gcc/gcc-5.3.1
## Start modifying the environment.
# Prepend environment variables
prepend-path PATH $indir/bin
prepend-path LD_LIBRARY_PATH $indir/lib64
prepend-path LIBRARY_PATH $indir/lib64
prepend-path MANPATH $indir/share/man
# Set environment variables
setenv CC gcc
setenv CXX g++
setenv FC gfortran
setenv F77 gfortran
setenv F90 gfortran
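For reference, loading this module is roughly equivalent to the following pure-shell setup. This just illustrates what the prepend-path and setenv lines above do; it is not a substitute for using the module (which can also cleanly undo these changes on unload):

```shell
# Roughly what `module load gcc/5.3.1` does to the environment.
indir=/opt/gcc/gcc-5.3.1
export PATH="$indir/bin:$PATH"
export LD_LIBRARY_PATH="$indir/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export LIBRARY_PATH="$indir/lib64${LIBRARY_PATH:+:$LIBRARY_PATH}"
export MANPATH="$indir/share/man${MANPATH:+:$MANPATH}"
export CC=gcc CXX=g++ FC=gfortran F77=gfortran F90=gfortran
```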
Here's a template I wrote with some comments:
#%Module1.0
## The above line/cookie is required for this to be recognized as a modulefile.
## Without it, the module won't work.
##
## Put some comments here with any explanation of the module you'd like. I
## prefer something like:
##
## /usr/share/modulefiles/gcc/5.3.1
##
## Provides gcc version 5.3.1, installed at /opt/gcc/gcc-5.3.1
# Define ModulesHelp so that `module help mymodule` does something. The
# convention in modulefiles seems to be to write to stderr
proc ModulesHelp { } {
# People often put global variable declarations here so that they can use them
# in the displayed help message. You must define these variables later if
# you print them, or errors will occur when help is requested.
global GCC_VER modfile indir
puts stderr "Module file: $modfile"
puts stderr ""
puts stderr "This module modifies the shell environment to use gcc version"
puts stderr "$GCC_VER installed at $indir"
}
# Define whatis so that `module whatis mymodule` does something. This is
# basically a shorter version of the previous help text.
module-whatis "Sets environment to use GCC 5.3.1"
# Declare conflicts here. These are modules that cannot be loaded at the same
# time as this one (e.g. another version of gcc, in this case)
conflict gcc
# Define variables here
set GCC_VER 5.3.1
set modfile /usr/share/modulefiles/gcc/5.3.1
set indir /opt/gcc/gcc-5.3.1
## Start modifying the environment. There are many ways to do this, some of
## which are demonstrated below.
# Prepend an environment variable, often PATH
prepend-path PATH $indir/bin
prepend-path LD_LIBRARY_PATH $indir/lib64
prepend-path LIBRARY_PATH $indir/lib64
prepend-path MANPATH $indir/man
# Set environment variables
setenv CC gcc
setenv CXX g++
setenv FC gfortran
setenv F77 gfortran
setenv F90 gfortran
Useful sites discussing modulefiles: http://www.admin-magazine.com/HPC/Articles/Environment-Modules http://nickgeoghegan.net/linux/installing-environment-modules https://wiki.scinet.utoronto.ca/wiki/index.php/Installing_your_own_modules https://www.sharcnet.ca/help/index.php/Configuring_your_software_environment_with_Modules
The CUDA Toolkit will provide a script to install a set of examples into a folder. Execute it:
$ cuda-install-samples-8.0.sh CUDA-samples
The driver version can be checked with
$ cat /proc/driver/nvidia/version
The toolkit version can be checked with
$ nvcc -V
Change to the samples directory and run make to compile them. Be sure to load the gcc/5.3.1 module first. Once built, cd to the deviceQuery directory and run the binary:
$ cd NVIDIA_CUDA-8.0_Samples/1_Utilities/deviceQuery/
$ ./deviceQuery
This should output something like
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce GTX 1060 6GB"
CUDA Driver Version / Runtime Version 8.0 / 8.0
CUDA Capability Major/Minor version number: 6.1
Total amount of global memory: 6072 MBytes (6366756864 bytes)
(10) Multiprocessors, (128) CUDA Cores/MP: 1280 CUDA Cores
GPU Max Clock rate: 1848 MHz (1.85 GHz)
Memory Clock rate: 4104 Mhz
Memory Bus Width: 192-bit
L2 Cache Size: 1572864 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = GeForce GTX 1060 6GB
Result = PASS
The key signs that the install worked are that the correct device was detected and that you get Result = PASS.
To verify your CPU and GPU are communicating well, do a bandwidth test
$ cd NVIDIA_CUDA-8.0_Samples/1_Utilities/bandwidthTest/
$ ./bandwidthTest
This yields something like
[CUDA Bandwidth Test] - Starting...
Running on...
Device 0: GeForce GTX 1060 6GB
Quick Mode
Host to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 11583.9
Device to Host Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 12910.8
Device to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 144277.7
Result = PASS
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
Again, the keys are that measurements were obtained and that you get Result = PASS. If you got this far, your install is verified. If problems arose, consult the CUDA Toolkit documentation.
I will outline the install process below. I am following the install guide provided here: http://www.pgroup.com/doc/pgiinstall169.pdf
PGI needs the Linux Standard Base, version 3 or greater. To check if you have it, do
$ lsb_release
To install this on Fedora:
$ dnf install redhat-lsb
Go to pgroup.com to download a copy of the PGI Accelerator Fortran/C/C++ Workstation compilers.
While anyone can download this, note that you will not be able to use the compilers without a license. To obtain a free license, university researchers can go to https://developer.nvidia.com/openacc-toolkit . Anyone can get an immediately usable 90-day license this way, and academics can get a renewable 1-year license for free. We'll cover licensing after the install.
Untar the download (I like to do this in a directory of the same name)
$ mkdir pgilinux-2016-169-x86_64
$ cd pgilinux-2016-169-x86_64
$ mv ../pgilinux-2016-169-x86_64.tar.gz .
$ tar xvzf pgilinux-2016-169-x86_64.tar.gz
Execute the install script as root, and follow the prompts
$ ./install
Make PGI accessible by modifying your environment, e.g. by putting the following in ~/.bashrc:
export PGI=/opt/pgi
export PATH=/opt/pgi/linux86-64/16.9/bin:$PATH
export MANPATH=$MANPATH:/opt/pgi/linux86-64/16.9/man
export LM_LICENSE_FILE=$LM_LICENSE_FILE:/opt/pgi/license.dat
The install script offers to setup licensing, but I prefer to do it as a separate step.
If you got a university developer license via the OpenACC Toolkit (details would've been emailed to you when you downloaded the kit), you should be able to go to your pgroup.com account and click "Create permanent keys".
You'll need a hostid, which you can get with
$ lmutil lmhostid
Your hostname can be obtained with
$ lmutil lmhostid -hostname
You'll be asked for these two bits of information and can use them to generate a key. Once you've generated the key, simply copy it into /opt/pgi/license.dat.
For trial licenses, that's all you need. For permanent licenses, you need to start the license service. To do so manually, simply do:
$ lmgrd
You likely want this started automatically on boot. For that, as root do:
$ cp $PGI/linux86-64/16.9/bin/lmgrd.rc /etc/rc.d/init.d/lmgrd
$ ln -s /etc/rc.d/init.d/lmgrd /etc/rc.d/rc5.d/S90lmgrd
The above was for a Fedora 24 system. For other distros, the # in "rc#.d" should match the runlevel reported by /sbin/runlevel. Your traditional init files (rc files) may also be in a different location, e.g. /etc/init.d . I really wish PGI would support systemd directly, as many distros are moving to it over the traditional init.d framework. But on Fedora 24, traditional init scripts in /etc/rc.d/init.d/ are run by systemd in the same way as on distros not deploying systemd.
If you try to build code including C++ source, you may run into errors you don't see with other C++ compilers. This is because PGI's pgc++ makes use of your system's C++ STL. The reason for this, as I understand it, is that PGI wants pgc++ to be object-compatible with g++, so they need to use the GNU STL. In the case of Fedora 24, your default GCC (6.2.1) is too new.
We've already installed GCC 5.3.1 to build CUDA, and in my limited testing this works fine with pgc++, so we'll tell PGI to use its STL.
To do this we can invoke a PGI command for creating a local configuration. As root, do
$ cd /opt/pgi/linux86-64/16.9/bin
To see your current configuration, you can run ./makelocalrc -n or cat localrc. It's a good idea now to back up the current localrc:
$ mv localrc localrc.orig
To tell PGI to use a different GNU C++ STL for the current host, you can do
$ ./makelocalrc -gpp /opt/gcc/gcc-5.3.1/bin/g++ -x -net
The -x -net options will create a localrc.<hostname> file in the install directory (/opt/pgi/linux86-64/16.9/bin). If you simply pass -x, you will overwrite the current localrc. Either option should work.
If you cat localrc.<hostname>, you should now see that PGI will look for C++ libraries in our install of GCC 5.3.1. For a few test builds, this was sufficient to get PGI to compile code including C++ source.
By default, PGI 16.9 will assume you're using CUDA 7.0. To set the default to
8.0, add the following to the localrc.<hostname>
file you generated:
set DEFCUDAVERSION=8.0;
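Scripted, that edit looks like the sketch below. The fallback path ./localrc.example is hypothetical, used only when the PGI install directory isn't present, so the snippet can be tried on a machine without PGI installed:

```shell
# Append the CUDA default to the generated localrc.<hostname> (idempotently).
if [ -d /opt/pgi/linux86-64/16.9/bin ]; then
  LOCALRC="/opt/pgi/linux86-64/16.9/bin/localrc.$(hostname)"
else
  LOCALRC="./localrc.example"   # hypothetical fallback when PGI isn't installed
fi
grep -q 'DEFCUDAVERSION' "$LOCALRC" 2>/dev/null || \
  echo 'set DEFCUDAVERSION=8.0;' >> "$LOCALRC"
```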
You should now be able to compile OpenACC code with the PGI 16.9 compilers! I did notice one issue where the link step wants object files ordered according to their dependencies. This seems to be an issue with nvlink that may eventually be addressed. Until then, you may need to order object files on the link line based on dependencies. When I order files this way, I am able to successfully build non-trivial OpenACC code including C, C++, and Fortran source.