Installing CUDA Toolkit 8.0 and PGI 16.9 on Fedora 24

This is a set of notes on my successful install of CUDA Toolkit 8.0 and PGI 16.9 for OpenACC development on a machine running Fedora 24. Due to Fedora's model of having the latest software, the install is nontrivial and is unlikely to succeed if you only follow the PGI install instructions.

The OpenACC Toolkit, in theory, can install CUDA and the other components PGI needs with one install script. However, in addition to the special concerns Fedora 24 raises, I prefer to install the CUDA Toolkit separately since it allows Fedora's package manager to maintain it. You should still sign up for the OpenACC Toolkit, as it will give you a license for running PGI immediately as well as the opportunity to request a University developer license. University researchers can get free licenses for PGI compilers (for one year, but renewable at no cost), making it easy to do local development.

In what follows, I'm assuming all commands are run as root. This process is not possible without root access. I will assume you have a PGI license (e.g. via the OpenACC Toolkit).

Install CUDA Components

NVIDIA's CUDA Toolkit website has good install guides. I'll be following the one entitled "NVIDIA CUDA INSTALLATION GUIDE FOR LINUX". At the time of writing, it was linked from https://developer.nvidia.com/cuda-downloads

1) install headers and dev packages

CUDA requires kernel headers and development packages that may not be installed by default. For Fedora, these are installed via

$ dnf install kernel-devel-$(uname -r) kernel-headers-$(uname -r)

For my machine at time of writing, uname -r gives 4.7.2-201.fc24.x86_64

These were already installed in my case.
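As a quick way to confirm whether the packages matching the running kernel are already present, something like the following sketch can help. The helper name kernel_pkgs is my own invention for illustration, and the check assumes rpm is available (as it is on Fedora).

```shell
# kernel_pkgs [RELEASE]: print the two package names that must match the
# kernel release. Hypothetical helper, not part of any Fedora tool.
kernel_pkgs() {
    # Default to the running kernel when no release string is given.
    rel="${1:-$(uname -r)}"
    printf 'kernel-devel-%s kernel-headers-%s\n' "$rel" "$rel"
}

# Report any of those packages that rpm does not know about.
# Skipped entirely on systems without rpm.
if command -v rpm >/dev/null 2>&1; then
    for pkg in $(kernel_pkgs); do
        rpm -q "$pkg" >/dev/null 2>&1 || echo "missing: $pkg"
    done
fi
```

If nothing is printed, both packages are installed for the kernel you are currently running.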

2) Install distro-specific toolkit

Get CUDA Toolkit rpm from https://developer.nvidia.com/cuda-downloads

Make sure you don't have any past installs of toolkit components. For my machine, this meant verifying no conflicts in dnf list | grep cuda.

If you have a /etc/X11/xorg.conf file, this may cause problems with the driver.

Ensure RPMFusion free repository is enabled

$ dnf repolist

Using the .rpm file downloaded previously, install metadata and clean dnf:

$ rpm --install cuda-repo-fedora23-8.0.44-1.x86_64.rpm
$ dnf clean expire-cache

The non-free repo can cause conflicts, so install cuda with it temporarily disabled:

$ dnf --disablerepo="rpmfusion-nonfree*" install cuda

Note that this installs X11 drivers, which I already had a version of from the nonfree repo. It seems to have properly removed the nonfree version of the driver in favor of the cuda repo version, but watch for possible conflicts.

Note that PGI Workstation 16.9 defaults to assuming CUDA 7.0, while I just installed CUDA 8.0. So far I've not had problems, but if you need them, archives of past versions 7.0 and 7.5 can be found here: https://developer.nvidia.com/cuda-toolkit-70 https://developer.nvidia.com/cuda-75-downloads-archive

3) Get CUDA into your path, config system

export PATH=/usr/local/cuda-8.0/bin:${PATH}

Based on the Puget Systems discussion, you can configure your system by creating the file /etc/profile.d/cuda.sh containing

export PATH=$PATH:/usr/local/cuda/bin

The CUDA Toolkit install should've done this, but just for reference that discussion also recommends creating /etc/ld.so.conf.d/cuda.conf containing

/usr/local/cuda/lib64

then run

$ ldconfig
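The two files above can also be written by a small script. Here is a minimal sketch that writes them under a directory of your choosing, so you can inspect the result in a scratch location before touching /etc. The function name write_cuda_config is hypothetical; the file names and contents are the ones given above.

```shell
# write_cuda_config ROOT: create ROOT/etc/profile.d/cuda.sh and
# ROOT/etc/ld.so.conf.d/cuda.conf. Hypothetical helper for illustration.
write_cuda_config() {
    root="$1"
    mkdir -p "$root/etc/profile.d" "$root/etc/ld.so.conf.d" || return 1
    # Single quotes keep $PATH unexpanded so it is evaluated at login time.
    printf '%s\n' 'export PATH=$PATH:/usr/local/cuda/bin' > "$root/etc/profile.d/cuda.sh"
    printf '%s\n' '/usr/local/cuda/lib64' > "$root/etc/ld.so.conf.d/cuda.conf"
}

# Dry run into a scratch directory and show what was written.
scratch=$(mktemp -d)
write_cuda_config "$scratch"
cat "$scratch/etc/profile.d/cuda.sh" "$scratch/etc/ld.so.conf.d/cuda.conf"
```

To apply it for real, run write_cuda_config "" as root (an empty ROOT means the files land in /etc) and then run ldconfig as described above.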

4) Install a gcc 5.3.1 module

CUDA is only compatible with gcc 5.3.1 (on Fedora). While a guru in Fedora + CUDA + gcc + PGI could possibly work out a fairly robust workaround allowing the use of F24's native gcc 6.1.x (for example, nvcc will often compile correctly with gcc 6.1.x as the host compiler if you pass the right collection of flags), the only solution that I can manage is to install gcc 5.3.1 and access it using environment modules.

To get environment modules do

$ dnf install environment-modules

Now you will be able to manage an install of gcc 5.3.1 in parallel with the system's default gcc (6.2.1 at time of writing).

First, get gcc 5.3.1. I don't think this is a GNU release version, but a Fedora one. To be safe, I got it from Fedora with

$ wget http://pkgs.fedoraproject.org/repo/pkgs/gcc/gcc-5.3.1-20151207.tar.bz2/1458ebcc302cb4ac6bab5cbf8b3f3fcc/gcc-5.3.1-20151207.tar.bz2

Unpack the tarball and download the prerequisites:

$ tar xjf gcc-5.3.1-20151207.tar.bz2
$ cd gcc-5.3.1-20151207
$ ./contrib/download_prerequisites

Configure to use a different install location, don't bother with 32-bit on our 64-bit machine, and only install languages we want (otherwise it installs java and some other stuff, taking up time/memory). You also need to configure the compiler so that gcc6 can compile gcc5 by setting the standard to gnu++98.

$ export CXX="g++ -std=gnu++98"
$ ./configure --prefix=/opt/gcc/gcc-5.3.1 --disable-multilib --enable-languages=c,c++,fortran

You'll need to mkdir -p /opt/gcc/gcc-5.3.1 as root.

Build

$ make -j 6

This will take a while. Once the build finishes, install into the prefix as root:

$ make install

Now let's set up a modulefile for gcc 5.3.1. As root, make a directory for it in $MODULEPATH (/usr/share/modulefiles for us)

$ mkdir /usr/share/modulefiles/gcc

In this directory, create a file 5.3.1 with the following contents. I wish there were a standard template, but all I can say is that I developed these contents by looking at various examples online and those used by others. See man module and man modulefile.

#%Module1.0
## /usr/share/modulefiles/gcc/5.3.1
##
## Provides gcc version 5.3.1, installed at /opt/gcc/gcc-5.3.1

proc ModulesHelp { } {
   global GCC_VER modfile indir

   puts stderr "Module file:      $modfile"
   puts stderr ""
   puts stderr "This module modifies the shell environment to use gcc version"
   puts stderr "$GCC_VER installed at $indir"
}

module-whatis "Sets environment to use GCC 5.3.1"

conflict gcc

set  GCC_VER  5.3.1
set  modfile  /usr/share/modulefiles/gcc/5.3.1
set  indir    /opt/gcc/gcc-5.3.1

## Start modifying the environment.

# Prepend environment variables
prepend-path    PATH            $indir/bin
prepend-path    LD_LIBRARY_PATH $indir/lib64
prepend-path    LIBRARY_PATH    $indir/lib64
prepend-path    MANPATH         $indir/share/man

# Set environment variables
setenv CC  gcc
setenv CXX g++
setenv FC  gfortran
setenv F77 gfortran
setenv F90 gfortran

Here's a template I wrote with some comments:

#%Module1.0
## The above line/cookie is required for this to be recognized as a modulefile.
## Without it, the module won't work.
##
## Put some comments here with any explanation of the module you'd like.  I
## prefer something like:
## 
## /usr/share/modulefiles/gcc/5.3.1
##
## Provides gcc version 5.3.1, installed at /opt/gcc/gcc-5.3.1

# Define ModulesHelp so that `module help mymodule` does something.  The
# convention in modulefiles seems to be to write to stderr
proc ModulesHelp { } {
   # People often put global variable declarations here so that they can use them
   # in the displayed help message.  You must define these variables later if
   # you print them, or errors will occur when help is requested.
   global GCC_VER modfile indir

   puts stderr "Module file:      $modfile"
   puts stderr ""
   puts stderr "This module modifies the shell environment to use gcc version"
   puts stderr "$GCC_VER installed at $indir"
}

# Define whatis so that `module whatis mymodule` does something.  This is
# basically a shorter version of the previous help text.
module-whatis "Sets environment to use GCC 5.3.1"

# Declare conflicts here.  These are modules that cannot be loaded at the same
# time as this one (e.g. another version of gcc, in this case)
conflict gcc

# Define variables here
set  GCC_VER  5.3.1
set  modfile  /usr/share/modulefiles/gcc/5.3.1
set  indir    /opt/gcc/gcc-5.3.1

## Start modifying the environment.  There are many ways to do this, some of
## which are demonstrated below.

# Prepend an environment variable, often PATH
prepend-path    PATH            $indir/bin
prepend-path    LD_LIBRARY_PATH $indir/lib64
prepend-path    LIBRARY_PATH    $indir/lib64
prepend-path    MANPATH         $indir/share/man

# Set environment variables
setenv CC  gcc
setenv CXX g++
setenv FC  gfortran
setenv F77 gfortran
setenv F90 gfortran

Useful sites discussing modulefiles:

http://www.admin-magazine.com/HPC/Articles/Environment-Modules
http://nickgeoghegan.net/linux/installing-environment-modules
https://wiki.scinet.utoronto.ca/wiki/index.php/Installing_your_own_modules
https://www.sharcnet.ca/help/index.php/Configuring_your_software_environment_with_Modules
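To make it clearer what the modulefile actually changes, here is a rough plain-shell equivalent of loading gcc/5.3.1. This is only a sketch: prepend_path is a hypothetical stand-in I wrote for the modulefile's prepend-path command, and the real module command also tracks state so it can undo these changes on unload, which this does not.

```shell
# prepend_path VAR DIR: put DIR at the front of the colon-separated VAR.
# Hypothetical helper; not part of environment-modules.
prepend_path() {
    eval "cur=\${$1:-}"
    if [ -n "$cur" ]; then
        eval "export $1=\"\$2:\$cur\""
    else
        eval "export $1=\"\$2\""
    fi
}

indir=/opt/gcc/gcc-5.3.1

# Same variables the modulefile prepends to.
prepend_path PATH            "$indir/bin"
prepend_path LD_LIBRARY_PATH "$indir/lib64"
prepend_path LIBRARY_PATH    "$indir/lib64"
prepend_path MANPATH         "$indir/share/man"

# Same variables the modulefile sets with setenv.
export CC=gcc CXX=g++ FC=gfortran F77=gfortran F90=gfortran

# Show the directory that is now searched first for executables.
echo "$PATH" | cut -d: -f1
```

In practice you would just run module load gcc/5.3.1, check the result with module list, and undo it with module unload gcc/5.3.1.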

5) Verify the install

The CUDA Toolkit will provide a script to install a set of examples into a folder. Execute it:

$ cuda-install-samples-8.0.sh CUDA-samples

Driver version can be checked with

$ cat /proc/driver/nvidia/version

The toolkit version can be checked with

$ nvcc -V

Change to the samples directory and do a make to compile them. Be sure to load the gcc/5.3.1 module.
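Before running make, it can be worth confirming which gcc is first in your PATH. The sketch below assumes, based on the discussion above, that a gcc major version of 5 or lower is acceptable and that anything newer is not; the function name gcc_ok_for_cuda8 is hypothetical.

```shell
# gcc_ok_for_cuda8 VERSION: succeed if the major version is 5 or lower.
# The threshold is an assumption taken from these notes, not an official rule.
gcc_ok_for_cuda8() {
    major=${1%%.*}
    [ "$major" -le 5 ] 2>/dev/null
}

# Check whichever gcc is currently first in PATH, if there is one.
if command -v gcc >/dev/null 2>&1; then
    ver=$(gcc -dumpversion)
    if gcc_ok_for_cuda8 "$ver"; then
        echo "gcc $ver: OK for building the samples"
    else
        echo "gcc $ver: too new, load the gcc/5.3.1 module first"
    fi
fi
```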

Once built, cd to the device query binary and run it

$ cd NVIDIA_CUDA-8.0_Samples/1_Utilities/deviceQuery/
$ ./deviceQuery

This should output something like

./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 1060 6GB"
  CUDA Driver Version / Runtime Version          8.0 / 8.0
  CUDA Capability Major/Minor version number:    6.1
  Total amount of global memory:                 6072 MBytes (6366756864 bytes)
  (10) Multiprocessors, (128) CUDA Cores/MP:     1280 CUDA Cores
  GPU Max Clock rate:                            1848 MHz (1.85 GHz)
  Memory Clock rate:                             4104 Mhz
  Memory Bus Width:                              192-bit
  L2 Cache Size:                                 1572864 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = GeForce GTX 1060 6GB
Result = PASS

The signs that the install worked are that the correct device was detected and that you get Result = PASS.

To verify your CPU and GPU are communicating well, do a bandwidth test

$ cd NVIDIA_CUDA-8.0_Samples/1_Utilities/bandwidthTest/
$ ./bandwidthTest

This yields something like

[CUDA Bandwidth Test] - Starting...
Running on...

 Device 0: GeForce GTX 1060 6GB
 Quick Mode

 Host to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)    Bandwidth(MB/s)
   33554432         11583.9

 Device to Host Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)    Bandwidth(MB/s)
   33554432         12910.8

 Device to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)    Bandwidth(MB/s)
   33554432         144277.7

Result = PASS

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

Again the key is that measurements were obtained and that you get Result = PASS. If you got this far, your install is verified. If problems arose, consult the CUDA Toolkit documentation.
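If you'd rather not eyeball the output, the two checks can be scripted. This is a minimal sketch that assumes the samples were installed into CUDA-samples as above; the SAMPLES_DIR variable and the sample_passed function are names I made up for illustration.

```shell
# sample_passed: read a sample's output on stdin, succeed if it reports PASS.
sample_passed() {
    grep -q '^Result = PASS'
}

# Location of the built samples; override SAMPLES_DIR if yours differ.
samples_dir=${SAMPLES_DIR:-CUDA-samples/NVIDIA_CUDA-8.0_Samples}

# Run each verification sample if it has been built and report the result.
for name in deviceQuery bandwidthTest; do
    bin="$samples_dir/1_Utilities/$name/$name"
    if [ -x "$bin" ]; then
        if "$bin" | sample_passed; then
            echo "$name: PASS"
        else
            echo "$name: FAIL"
        fi
    else
        echo "$name: not built at $bin"
    fi
done
```

This only looks for the Result = PASS line, so you should still glance at the deviceQuery output once to confirm the detected device is the one you expect.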

Install PGI Accelerator Workstation

I will outline the install process below. I am following the install guide provided here: http://www.pgroup.com/doc/pgiinstall169.pdf

Pre-install checks

PGI needs the Linux Standard Base, version 3 or greater. To check if you have it, do

$ lsb_release

To install this on Fedora:

$ dnf install redhat-lsb

Install components

Go to pgroup.com to download a copy of the PGI Accelerator Fortran/C/C++ Workstation compilers.

While anyone can download these, note that you will not be able to use them without a license. To obtain a free license, university researchers can go to https://developer.nvidia.com/openacc-toolkit . Anyone can get a 90-day license from this that will be immediately usable, and academics can get a renewable 1-year license for free. We'll cover licensing after the install.

Untar the download (I like to do this in a directory of the same name)

$ mkdir pgilinux-2016-169-x86_64
$ cd pgilinux-2016-169-x86_64
$ mv ../pgilinux-2016-169-x86_64.tar.gz .
$ tar xvzf pgilinux-2016-169-x86_64.tar.gz

Execute the install script as root, and follow the prompts

$ ./install

Make PGI accessible by modifying your environment, e.g. by putting the following in ~/.bashrc:

export PGI=/opt/pgi
export PATH=/opt/pgi/linux86-64/16.9/bin:$PATH
export MANPATH=$MANPATH:/opt/pgi/linux86-64/16.9/man
export LM_LICENSE_FILE=$LM_LICENSE_FILE:/opt/pgi/license.dat

Set up licensing

The install script offers to set up licensing, but I prefer to do it as a separate step.

If you got a university developer license via the OpenACC Toolkit (details for this would've been emailed to you when you downloaded the kit), you should be able to go to your pgroup.com account and click "Create permanent keys".

You'll need a hostid that you can get with

$ lmutil lmhostid

Your hostname can be obtained with

$ lmutil lmhostid -hostname

You'll be asked for these two bits of information, and can use them to generate a key. Once you've generated the key, simply copy it into /opt/pgi/license.dat .

For trial licenses, that's all you need. For permanent licenses, you need to start the license service. To do so manually, simply do:

$ lmgrd

You likely want this started automatically on boot. For that, as root do:

$ cp $PGI/linux86-64/16.9/bin/lmgrd.rc /etc/rc.d/init.d/lmgrd
$ ln -s /etc/rc.d/init.d/lmgrd /etc/rc.d/rc5.d/S90lmgrd

The above was for a Fedora 24 system. For other distros, "rc#.d" should have the # be the same as that given by /sbin/runlevel. Your traditional init files (rc files) may also be in a different location, e.g. /etc/init.d . I really wish PGI would support just using systemd directly, as many distros are moving to it over the traditional init.d framework. But on Fedora 24, traditional init executable scripts in /etc/rc.d/init.d/ should be run by systemd in the same way as they are run on distros not deploying systemd.

Tell PGI where to find compatible GCC and that you're using CUDA 8.0

If you try to build code including C++ source, you may run into errors you don't see with other C++ compilers. This is because PGI's pgc++ makes use of your system's C++ STL. The reason for this, as I understand it, is that PGI wants pgc++ to be object-compatible with g++, so they need to use the GNU STL. In the case of Fedora 24, your default GCC (6.2.1) is too new.

We've already installed GCC 5.3.1 to build CUDA, and in my limited testing this works fine with pgc++, so we'll tell PGI to use its STL.

To do this we can invoke a PGI command for creating a local configuration. As root, do

$ cd /opt/pgi/linux86-64/16.9/bin

To see your current configuration, you can do ./makelocalrc -n or cat localrc. It's a good idea now to back up the current localrc

$ mv localrc localrc.orig

To tell PGI to use a different GNU C++ STL for the current host, you can do

$ ./makelocalrc -gpp /opt/gcc/gcc-5.3.1/bin/g++ -x -net

The -x -net options will create in the install directory (/opt/pgi/linux86-64/16.9/bin) a localrc.<hostname> file. If you simply do -x, you will overwrite the current localrc. Either option should work.

If you do cat localrc.<hostname> you should now see that PGI will look for C++ libraries in our install of GCC 5.3.1. For a few test builds, this was sufficient to get PGI to compile code including C++ source.

By default, PGI 16.9 will assume you're using CUDA 7.0. To set the default to 8.0, add the following to the localrc.<hostname> file you generated:

set DEFCUDAVERSION=8.0;
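If you end up regenerating the localrc file more than once, a small helper that adds the line only when it is missing can save some hand editing. This is a sketch; set_default_cuda is a hypothetical name, and it is demonstrated on a scratch file rather than the real localrc.<hostname>.

```shell
# set_default_cuda FILE VERSION: append "set DEFCUDAVERSION=VERSION;" to FILE
# unless a DEFCUDAVERSION line is already present. Hypothetical helper.
set_default_cuda() {
    file="$1"
    version="$2"
    if grep -q '^set DEFCUDAVERSION=' "$file" 2>/dev/null; then
        echo "DEFCUDAVERSION already set in $file"
    else
        printf 'set DEFCUDAVERSION=%s;\n' "$version" >> "$file"
    fi
}

# Demonstrate on a scratch file instead of the real configuration.
demo=$(mktemp)
set_default_cuda "$demo" 8.0
set_default_cuda "$demo" 8.0   # second call leaves the file unchanged
cat "$demo"
```

For the real thing, pass the path of the localrc.<hostname> file you generated in /opt/pgi/linux86-64/16.9/bin as the first argument, as root.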

You should now be able to compile OpenACC code with PGI 16.9 compilers! I did notice one issue where the linking wants object files to be ordered according to dependencies. It seems this is an issue with nvlink that may eventually be addressed. Until then, you may need to order object files on the link line based on dependencies. When I order files this way, I am able to successfully build non-trivial OpenACC code including C, C++, and Fortran source.