#!/usr/bin/env python
# coding: utf-8

# # Installing CUDA Toolkit 8.0 and PGI 16.9 on Fedora 24
#
# This is a set of notes on my successful install of CUDA Toolkit 8.0 and PGI 16.9
# for OpenACC development on a machine running Fedora 24. Because Fedora ships the
# latest software, the install is nontrivial and unlikely to succeed purely by
# following the PGI install instructions.
#
# The [OpenACC Toolkit](https://developer.nvidia.com/openacc-toolkit) can, in
# theory, install CUDA and the other components PGI needs with one install script.
# However, in addition to the special concerns Fedora 24 raises, I prefer to
# install the CUDA Toolkit separately, since that lets Fedora's package manager
# maintain it. You should still sign up for the OpenACC Toolkit, as it gives you a
# license for running PGI immediately, as well as the opportunity to request a
# university developer license. University researchers can get free licenses for
# the PGI compilers (for one year, but renewable at no cost), making it easy to do
# local development.
#
# In what follows, I assume all commands are run as root; this process is not
# possible without root access. I also assume you have a PGI license (e.g. via the
# OpenACC Toolkit).
#
# ## Install CUDA Components
#
# NVIDIA's CUDA Toolkit website has good install guides. I'll be following the one
# entitled "NVIDIA CUDA INSTALLATION GUIDE FOR LINUX", available at time of
# writing from https://developer.nvidia.com/cuda-downloads
#
# #### 1) Install headers and dev packages
#
# CUDA requires kernel headers and development packages that may not be installed
# by default. For Fedora, these are installed via
#
# ```
# $ dnf install kernel-devel-$(uname -r) kernel-headers-$(uname -r)
# ```
#
# For my machine at time of writing, `uname -r` gives 4.7.2-201.fc24.x86_64.
# These packages were already installed in my case.
#
# #### 2) Install distro-specific toolkit
#
# Get the CUDA Toolkit rpm from https://developer.nvidia.com/cuda-downloads
#
# Make sure you don't have any past installs of toolkit components. For my
# machine, this meant verifying that `dnf list | grep cuda` showed no conflicting
# packages.
#
# If you have an /etc/X11/xorg.conf file, it may cause problems with the driver.
#
# Ensure the RPM Fusion free repository is enabled:
# ```
# $ dnf repolist
# ```
#
# Using the .rpm file downloaded previously, install the repository metadata and
# clean dnf's cache:
# ```
# $ rpm --install cuda-repo-fedora23-8.0.44-1.x86_64.rpm
# $ dnf clean expire-cache
# ```
#
# The non-free repo can cause conflicts, so install cuda with it temporarily
# disabled:
# ```
# $ dnf --disablerepo="rpmfusion-nonfree*" install cuda
# ```
#
# Note that this installs X11 drivers, of which I already had a version from the
# non-free repo. It seems to have properly removed the non-free version of the
# driver in favor of the cuda repo version, but watch for possible conflicts.
#
# Note that PGI Workstation 16.9 defaults to assuming CUDA 7.0, while I just
# installed CUDA 8.0. So far I've not had problems, but if you need them, archives
# of versions 7.0 and 7.5 can be found here:
# https://developer.nvidia.com/cuda-toolkit-70
# https://developer.nvidia.com/cuda-75-downloads-archive
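#
# As a sanity check (my own habit, using standard Fedora and kernel tooling
# rather than anything from NVIDIA's guide), you can confirm which driver
# packages ended up installed and that the expected kernel module is loaded:
#
# ```
# $ dnf list installed | grep -i nvidia
# $ lsmod | grep nvidia
# ```
#
# If driver packages from both RPM Fusion and the cuda repo show up, sort that
# out before proceeding.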
#
# #### 3) Get CUDA into your path and configure the system
#
# ```
# export PATH=/usr/local/cuda-8.0/bin:${PATH}
# ```
#
# Based on the [Puget Systems discussion](https://www.pugetsystems.com/labs/hpc/Install-NVIDIA-CUDA-on-Fedora-22-with-gcc-5-1-654/),
# you can configure your system by creating the file `/etc/profile.d/cuda.sh` containing
#
# ```
# export PATH=$PATH:/usr/local/cuda/bin
# ```
#
# The CUDA Toolkit install should've done this already, but for reference that
# discussion also recommends creating `/etc/ld.so.conf.d/cuda.conf` containing
# ```
# /usr/local/cuda/lib64
# ```
# and then running
# ```
# $ ldconfig
# ```
#
# #### 4) Install a gcc 5.3.1 module
#
# On Fedora, CUDA 8.0 is only compatible with gcc 5.3.1. While a guru in Fedora +
# CUDA + gcc + PGI could possibly work out a fairly robust workaround allowing the
# use of F24's native gcc 6.2.x (for example, nvcc will often compile correctly
# with gcc 6.2.x as the host compiler if you pass the right collection of flags),
# the only solution I can manage is to install gcc 5.3.1 and access it using
# environment modules.
#
# To get environment modules, do
# ```
# $ dnf install environment-modules
# ```
#
# Now you will be able to manage an install of gcc 5.3.1 in parallel with the
# system's default gcc (6.2.1 at time of writing).
#
# First, get gcc 5.3.1. I don't think this is a GNU release version, but a Fedora
# one. To be safe, I got it from Fedora with
# ```
# $ wget http://pkgs.fedoraproject.org/repo/pkgs/gcc/gcc-5.3.1-20151207.tar.bz2/1458ebcc302cb4ac6bab5cbf8b3f3fcc/gcc-5.3.1-20151207.tar.bz2
# ```
#
# Unpack the source and download its prerequisites:
# ```
# $ tar xjf gcc-5.3.1-20151207.tar.bz2
# $ cd gcc-5.3.1-20151207
# $ ./contrib/download_prerequisites
# ```
#
# Configure to use a different install location, skip 32-bit support on our
# 64-bit machine, and build only the languages we want (otherwise it builds Java
# and some other things, taking extra time and memory). You also need to set the
# standard to gnu++98 so that gcc 6 can compile gcc 5:
# ```
# $ export CXX="g++ -std=gnu++98"
# $ ./configure --prefix=/opt/gcc/gcc-5.3.1 --disable-multilib --enable-languages=c,c++,fortran
# ```
#
# You'll need to `mkdir -p /opt/gcc/gcc-5.3.1` as root.
#
# Build:
# ```
# $ make -j 6
# ```
# This will take a while.
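#
# The configure step above set `--prefix=/opt/gcc/gcc-5.3.1`, so once the build
# finishes, install into that prefix (as root) and sanity-check the result. This
# is just the standard GNU install step, which the notes above glossed over:
#
# ```
# $ make install
# $ /opt/gcc/gcc-5.3.1/bin/gcc --version
# ```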
#
# Now let's set up a modulefile for gcc 5.3.1. As root, make a directory for it in
# `$MODULEPATH` (`/usr/share/modulefiles` for us):
# ```
# $ mkdir /usr/share/modulefiles/gcc
# ```
# In this directory, create a file `5.3.1` with the following contents. I wish
# there were a standard template, but honestly all I can say is that I developed
# these contents by looking at various examples online and at those used by
# others. See `man module` and `man modulefile`.
#
# ```
# #%Module1.0
# ## /usr/share/modulefiles/gcc/5.3.1
# ##
# ## Provides gcc version 5.3.1, installed at /opt/gcc/gcc-5.3.1
#
# proc ModulesHelp { } {
#     global GCC_VER modfile indir
#
#     puts stderr "Module file: $modfile"
#     puts stderr ""
#     puts stderr "This module modifies the shell environment to use gcc version"
#     puts stderr "$GCC_VER installed at $indir"
# }
#
# module-whatis "Sets environment to use GCC 5.3.1"
#
# conflict gcc
#
# set GCC_VER 5.3.1
# set modfile /usr/share/modulefiles/gcc/5.3.1
# set indir /opt/gcc/gcc-5.3.1
#
# ## Start modifying the environment.
#
# # Prepend environment variables
# prepend-path PATH $indir/bin
# prepend-path LD_LIBRARY_PATH $indir/lib64
# prepend-path LIBRARY_PATH $indir/lib64
# prepend-path MANPATH $indir/share/man
#
# # Set environment variables
# setenv CC gcc
# setenv CXX g++
# setenv FC gfortran
# setenv F77 gfortran
# setenv F90 gfortran
# ```
#
# Here's a template I wrote with some comments:
# ```
# #%Module1.0
# ## The above line/cookie is required for this to be recognized as a modulefile.
# ## Without it, the module won't work.
# ##
# ## Put some comments here with any explanation of the module you'd like. I
# ## prefer something like:
# ##
# ## /usr/share/modulefiles/gcc/5.3.1
# ##
# ## Provides gcc version 5.3.1, installed at /opt/gcc/gcc-5.3.1
#
# # Define ModulesHelp so that `module help mymodule` does something. The
# # convention in modulefiles seems to be to write to stderr.
# proc ModulesHelp { } {
#     # People often put global variable declarations here so that they can use
#     # them in the displayed help message. You must define these variables later
#     # if you print them, or errors will occur when help is requested.
#     global GCC_VER modfile indir
#
#     puts stderr "Module file: $modfile"
#     puts stderr ""
#     puts stderr "This module modifies the shell environment to use gcc version"
#     puts stderr "$GCC_VER installed at $indir"
# }
#
# # Define whatis so that `module whatis mymodule` does something. This is
# # basically a shorter version of the previous help text.
# module-whatis "Sets environment to use GCC 5.3.1"
#
# # Declare conflicts here. These are modules that cannot be loaded at the same
# # time as this one (e.g. another version of gcc, in this case).
# conflict gcc
#
# # Define variables here
# set GCC_VER 5.3.1
# set modfile /usr/share/modulefiles/gcc/5.3.1
# set indir /opt/gcc/gcc-5.3.1
#
# ## Start modifying the environment. There are many ways to do this, some of
# ## which are demonstrated below.
#
# # Prepend an environment variable, often PATH
# prepend-path PATH $indir/bin
# prepend-path LD_LIBRARY_PATH $indir/lib64
# prepend-path LIBRARY_PATH $indir/lib64
# prepend-path MANPATH $indir/share/man
#
# # Set environment variables
# setenv CC gcc
# setenv CXX g++
# setenv FC gfortran
# setenv F77 gfortran
# setenv F90 gfortran
# ```
#
# Useful sites discussing modulefiles:
# http://www.admin-magazine.com/HPC/Articles/Environment-Modules
# http://nickgeoghegan.net/linux/installing-environment-modules
# https://wiki.scinet.utoronto.ca/wiki/index.php/Installing_your_own_modules
# https://www.sharcnet.ca/help/index.php/Configuring_your_software_environment_with_Modules
#
# #### 5) Verify the install
#
# The CUDA Toolkit provides a script that installs a set of examples into a
# folder. Execute it:
# ```
# $ cuda-install-samples-8.0.sh CUDA-samples
# ```
#
# The driver version can be checked with
# ```
# $ cat /proc/driver/nvidia/version
# ```
#
# The toolkit version can be checked with
# ```
# $ nvcc -V
# ```
#
# Change to the samples directory and do a `make` to compile them. Be sure to
# load the `gcc/5.3.1` module first, as in the sketch below.
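#
# Concretely, a build session might look like the following (the samples path
# comes from the install script above; `make -j 6` just mirrors the parallelism
# used earlier):
#
# ```
# $ module load gcc/5.3.1
# $ gcc --version        # should now report 5.3.1
# $ cd CUDA-samples/NVIDIA_CUDA-8.0_Samples
# $ make -j 6
# ```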
#
# Once built, `cd` to the device query binary and run it:
# ```
# $ cd NVIDIA_CUDA-8.0_Samples/1_Utilities/deviceQuery/
# $ ./deviceQuery
# ```
#
# This should output something like
#
# ```
# ./deviceQuery Starting...
#
#  CUDA Device Query (Runtime API) version (CUDART static linking)
#
# Detected 1 CUDA Capable device(s)
#
# Device 0: "GeForce GTX 1060 6GB"
#   CUDA Driver Version / Runtime Version          8.0 / 8.0
#   CUDA Capability Major/Minor version number:    6.1
#   Total amount of global memory:                 6072 MBytes (6366756864 bytes)
#   (10) Multiprocessors, (128) CUDA Cores/MP:     1280 CUDA Cores
#   GPU Max Clock rate:                            1848 MHz (1.85 GHz)
#   Memory Clock rate:                             4104 Mhz
#   Memory Bus Width:                              192-bit
#   L2 Cache Size:                                 1572864 bytes
#   Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
#   Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
#   Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
#   Total amount of constant memory:               65536 bytes
#   Total amount of shared memory per block:       49152 bytes
#   Total number of registers available per block: 65536
#   Warp size:                                     32
#   Maximum number of threads per multiprocessor:  2048
#   Maximum number of threads per block:           1024
#   Max dimension size of a thread block (x,y,z):  (1024, 1024, 64)
#   Max dimension size of a grid size   (x,y,z):   (2147483647, 65535, 65535)
#   Maximum memory pitch:                          2147483647 bytes
#   Texture alignment:                             512 bytes
#   Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
#   Run time limit on kernels:                     Yes
#   Integrated GPU sharing Host Memory:            No
#   Support host page-locked memory mapping:       Yes
#   Alignment requirement for Surfaces:            Yes
#   Device has ECC support:                        Disabled
#   Device supports Unified Addressing (UVA):      Yes
#   Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
#   Compute Mode:
#      < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
#
# deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = GeForce GTX 1060 6GB
# Result = PASS
# ```
#
# The key indication that the install worked is that the correct device was
# detected and that you get `Result = PASS`.
#
# To verify your CPU and GPU are communicating well, do a bandwidth test:
# ```
# $ cd NVIDIA_CUDA-8.0_Samples/1_Utilities/bandwidthTest/
# $ ./bandwidthTest
# ```
#
# This yields something like
# ```
# [CUDA Bandwidth Test] - Starting...
# Running on...
#
#  Device 0: GeForce GTX 1060 6GB
#  Quick Mode
#
#  Host to Device Bandwidth, 1 Device(s)
#  PINNED Memory Transfers
#    Transfer Size (Bytes)    Bandwidth(MB/s)
#    33554432                 11583.9
#
#  Device to Host Bandwidth, 1 Device(s)
#  PINNED Memory Transfers
#    Transfer Size (Bytes)    Bandwidth(MB/s)
#    33554432                 12910.8
#
#  Device to Device Bandwidth, 1 Device(s)
#  PINNED Memory Transfers
#    Transfer Size (Bytes)    Bandwidth(MB/s)
#    33554432                 144277.7
#
# Result = PASS
#
# NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
# ```
#
# Again, the key is that measurements were obtained and that you get
# `Result = PASS`. If you got this far, your install is verified. If problems
# arose, consult the CUDA Toolkit documentation.
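#
# One more optional check, my own addition rather than part of NVIDIA's
# verification steps: `nvidia-smi`, which ships with the NVIDIA driver, reports
# the driver version, detected GPUs, and current utilization in one shot:
#
# ```
# $ nvidia-smi
# ```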
#
# ## Install PGI Accelerator Workstation
#
# I will outline the install process below. I am following the install guide
# provided here: http://www.pgroup.com/doc/pgiinstall169.pdf
#
# #### Pre-install checks
#
# PGI needs the Linux Standard Base, version 3 or greater. To check whether you
# have it, do
# ```
# $ lsb_release
# ```
#
# To install it on Fedora:
# ```
# $ dnf install redhat-lsb
# ```
#
# #### Install components
#
# Go to pgroup.com to download a copy of the PGI Accelerator Fortran/C/C++
# Workstation compilers. While anyone can download these, note that you will not
# be able to use them without a license. To obtain a free license, university
# researchers can go to https://developer.nvidia.com/openacc-toolkit . Anyone can
# get an immediately usable 90-day license there, and academics can get a
# renewable one-year license for free. We'll cover licensing after the install.
#
# Untar the download (I like to do this in a directory of the same name):
# ```
# $ mkdir pgilinux-2016-169-x86_64
# $ cd pgilinux-2016-169-x86_64
# $ mv ../pgilinux-2016-169-x86_64.tar.gz .
# $ tar xvzf pgilinux-2016-169-x86_64.tar.gz
# ```
#
# Execute the install script as root, and follow the prompts:
# ```
# $ ./install
# ```
#
# Make PGI accessible by modifying your environment, e.g. by putting the
# following in ~/.bashrc:
#
# ```
# export PGI=/opt/pgi
# export PATH=/opt/pgi/linux86-64/16.9/bin:$PATH
# export MANPATH=$MANPATH:/opt/pgi/linux86-64/16.9/man
# export LM_LICENSE_FILE=$LM_LICENSE_FILE:/opt/pgi/license.dat
# ```
#
# #### Set up licensing
#
# The install script offers to set up licensing, but I prefer to do it as a
# separate step.
#
# If you got a university developer license via the OpenACC Toolkit (details
# would've been emailed to you when you downloaded the kit), you should be able
# to go to your pgroup.com account and click "Create permanent keys".
#
# You'll need a hostid, which you can get with
# ```
# $ lmutil lmhostid
# ```
#
# Your hostname can be obtained with
# ```
# $ lmutil lmhostid -hostname
# ```
#
# You'll be asked for these two bits of information and can use them to generate
# a key. Once you've generated the key, simply copy it into `/opt/pgi/license.dat`.
#
# For trial licenses, that's all you need. For permanent licenses, you need to
# start the license service. To do so manually, simply do:
# ```
# $ lmgrd
# ```
#
# You likely want this started automatically on boot. For that, as root do:
# ```
# $ cp $PGI/linux86-64/16.9/bin/lmgrd.rc /etc/rc.d/init.d/lmgrd
# $ ln -s /etc/rc.d/init.d/lmgrd /etc/rc.d/rc5.d/S90lmgrd
# ```
#
# The above was for a Fedora 24 system. For other distros, the number in `rc#.d`
# should match the runlevel given by `/sbin/runlevel`. Your traditional init
# files (rc files) may also be in a different location, e.g. /etc/init.d . I
# really wish PGI would support just using systemd directly, as many distros are
# moving to it over the traditional init.d framework. But on Fedora 24,
# executable init scripts in /etc/rc.d/init.d/ are run by systemd's SysV
# compatibility support just as they would be on distros without systemd. For the
# curious, a sketch of what a native unit might look like follows.
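#
# Purely as an illustration (untested; the service name, the log path, and the
# use of Type=forking, since lmgrd backgrounds itself, are my assumptions), a
# native systemd unit could be created like so instead of using the init script:
#
# ```
# $ cat > /etc/systemd/system/lmgrd.service <<'EOF'
# [Unit]
# Description=FlexNet license daemon for PGI compilers
# After=network.target
#
# [Service]
# Type=forking
# ExecStart=/opt/pgi/linux86-64/16.9/bin/lmgrd -c /opt/pgi/license.dat -l /var/log/lmgrd.log
#
# [Install]
# WantedBy=multi-user.target
# EOF
# $ systemctl daemon-reload
# $ systemctl enable --now lmgrd.service
# ```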
#
# #### Tell PGI where to find a compatible GCC and that you're using CUDA 8.0
#
# If you try to build code including C++ source, you may run into errors you
# don't see with other C++ compilers. This is because PGI's pgc++ makes use of
# your system's C++ STL. The reason for this, as I understand it, is that PGI
# wants pgc++ to be object-compatible with g++, so they need to use the GNU STL.
# In the case of Fedora 24, your default GCC (6.2.1) is too new.
#
# We've already installed GCC 5.3.1 to build CUDA, and in my limited testing it
# works fine with pgc++, so we'll tell PGI to use its STL.
#
# To do this, we can invoke a PGI command for creating a local configuration. As
# root, do
# ```
# $ cd /opt/pgi/linux86-64/16.9/bin
# ```
#
# To see your current configuration, you can do `./makelocalrc -n` or
# `cat localrc`. It's a good idea now to back up the current localrc:
# ```
# $ mv localrc localrc.orig
# ```
#
# To tell PGI to use a different GNU C++ STL for the current host, you can do
# ```
# $ ./makelocalrc -gpp /opt/gcc/gcc-5.3.1/bin/g++ -x -net
# ```
#
# The `-x -net` options will create a `localrc.<hostname>` file in the install
# directory (/opt/pgi/linux86-64/16.9/bin). If you simply do `-x`, you will
# overwrite the current `localrc` instead. Either option should work.
#
# If you do `cat localrc.<hostname>`, you should now see that PGI will look for
# C++ libraries in our install of GCC 5.3.1. For a few test builds, this was
# sufficient to get PGI to compile code including C++ source.
#
# By default, PGI 16.9 will assume you're using CUDA 7.0. To set the default to
# 8.0, add the following to the `localrc.<hostname>` file you generated:
# ```
# set DEFCUDAVERSION=8.0;
# ```
#
# You should now be able to compile OpenACC code with PGI 16.9 compilers! I did
# notice one issue where linking wants object files ordered according to their
# dependencies. This seems to be an issue with `nvlink` that may eventually be
# addressed; until then, you may need to order object files on the link line by
# dependency. When I order files this way, I am able to successfully build
# non-trivial OpenACC code including C, C++, and Fortran source.
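#
# As a final smoke test (my own addition, not from the PGI guide; the file name
# and program are just an illustration), compile and run a tiny OpenACC program.
# `-Minfo=accel` asks the compiler to report what it offloaded:
#
# ```
# $ cat > saxpy.c <<'EOF'
# #include <stdio.h>
#
# #define N (1 << 20)
# static float x[N], y[N];
#
# int main(void)
# {
#     int i;
#
#     for (i = 0; i < N; i++) { x[i] = 1.0f; y[i] = 2.0f; }
#
#     /* Offload the SAXPY loop to the GPU via OpenACC */
#     #pragma acc parallel loop
#     for (i = 0; i < N; i++)
#         y[i] = 2.0f * x[i] + y[i];
#
#     printf("y[0] = %f (expect 4.0)\n", y[0]);
#     return 0;
# }
# EOF
# $ pgcc -acc -Minfo=accel saxpy.c -o saxpy
# $ ./saxpy
# ```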