[CUDA] Setting up a Keras/TensorFlow environment on Ubuntu 16.04 (GPU)

These are my notes on setting up a Keras/TensorFlow environment on Ubuntu 16.04 LTS (a GPU instance).

NVIDIA Driver
CUDA Toolkit
NVIDIA cuDNN
Python
Keras/TensorFlow

NVIDIA Driver

Install the NVIDIA driver.

$ sudo add-apt-repository ppa:graphics-drivers/ppa -y
$ sudo apt-get update
$ sudo apt-get install -y nvidia-375 nvidia-settings
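
Once the driver is installed (a reboot may be needed before it loads), the version can be read back from nvidia-smi. The following is a minimal sketch, not part of the original procedure: the helper names parse_version and driver_version are my own, and the 384.81 minimum for CUDA 9.0 is my recollection of NVIDIA's release notes, so please verify it. For reference, the nvidia-smi output near the end of this post reports driver 396.26.

```python
import subprocess


def parse_version(text):
    """Turn a version string such as '396.26' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_version():
    """Return the driver version reported by nvidia-smi, or None if unavailable."""
    try:
        out = subprocess.check_output(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"]
        )
    except (OSError, subprocess.CalledProcessError):
        return None  # nvidia-smi is missing or the driver is not loaded
    lines = out.decode().strip().splitlines()
    return lines[0].strip() if lines else None


# Assumed minimum driver for CUDA 9.0; check NVIDIA's release notes.
MIN_DRIVER = "384.81"

if __name__ == "__main__":
    found = driver_version()
    if found is None:
        print("nvidia-smi not available")
    else:
        ok = parse_version(found) >= parse_version(MIN_DRIVER)
        print("driver {} ({})".format(found, "ok" if ok else "too old for CUDA 9.0"))
```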

CUDA Toolkit

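Next, install the CUDA Toolkit, which provides the development environment for GPU applications (the runtime for low-level GPU programming). I chose version 9.0 here; other versions can be selected on the Download Page. The commands below assume that cuda-repo-ubuntu1604_9.0.176-1_amd64.deb has already been downloaded from that page into the current directory.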

$ sudo dpkg -i cuda-repo-ubuntu1604_9.0.176-1_amd64.deb
$ sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
$ sudo apt-get update
$ sudo apt-get install cuda=9.0.176-1

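Note that TensorFlow depends on specific versions of the CUDA Toolkit and cuDNN. I initially installed CUDA Toolkit 9.2, but the prebuilt TensorFlow binary could not find the CUDA 9.0 libraries it was linked against, so I downgraded to 9.0. See [1].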
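
The ImportError in [1] means the dynamic linker could not find libcublas.so.9.0. A quick way to test for that before importing TensorFlow is to try loading the libraries directly. This is a hedged sketch: can_load is a hypothetical helper, libcublas.so.9.0 comes from [1], and the other two sonames are my assumption about what the TensorFlow 1.8 wheel links against. Run it from a shell where LD_LIBRARY_PATH already includes the CUDA library directory.

```python
import ctypes


def can_load(soname):
    """Return True if the dynamic linker can load the named shared library."""
    try:
        ctypes.CDLL(soname)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    # libcublas.so.9.0 is taken from [1]; the other two names are assumptions.
    for soname in ("libcublas.so.9.0", "libcudart.so.9.0", "libcudnn.so.7"):
        print("{:<20} {}".format(soname, "found" if can_load(soname) else "MISSING"))
```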

NVIDIA cuDNN

Next, install NVIDIA cuDNN, a GPU-accelerated library of primitives for deep neural networks. Register for the NVIDIA Developer Program on the NVIDIA Developer site, log in, and download the following packages.

libcudnn7_7.1.4.18-1+cuda9.0_amd64.deb
libcudnn7-dev_7.1.4.18-1+cuda9.0_amd64.deb
libcudnn7-doc_7.1.4.18-1+cuda9.0_amd64.deb

Install cuDNN.

$ sudo dpkg -i libcudnn7_7.1.4.18-1+cuda9.0_amd64.deb
$ sudo dpkg -i libcudnn7-dev_7.1.4.18-1+cuda9.0_amd64.deb
$ sudo dpkg -i libcudnn7-doc_7.1.4.18-1+cuda9.0_amd64.deb
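
To double-check which cuDNN version ended up on the system, the version macros in the installed header can be parsed. A minimal sketch, assuming the -dev package places the header at /usr/include/cudnn.h (that path and the helper name cudnn_version are my assumptions, not something the packages above document here):

```python
import re


def cudnn_version(header_text):
    """Extract (major, minor, patch) from the #define lines of cudnn.h."""
    found = {}
    for key in ("MAJOR", "MINOR", "PATCHLEVEL"):
        match = re.search(r"#define\s+CUDNN_{}\s+(\d+)".format(key), header_text)
        if match is None:
            return None  # not a cuDNN header, or an unexpected layout
        found[key] = int(match.group(1))
    return found["MAJOR"], found["MINOR"], found["PATCHLEVEL"]


if __name__ == "__main__":
    path = "/usr/include/cudnn.h"  # assumed location for the .deb install
    try:
        with open(path, errors="replace") as handle:
            print(cudnn_version(handle.read()))
    except IOError:
        print("{} not found".format(path))
```

With the packages listed above, I would expect this to report 7, 1, 4.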

Run the following to add the environment variables to .bashrc.

$ echo 'export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64"' >> ~/.bashrc
$ echo 'export CUDA_HOME=/usr/local/cuda' >> ~/.bashrc
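
After opening a new shell, it is worth confirming that the variables were actually picked up. The sketch below only inspects the environment; missing_dirs is a hypothetical helper name.

```python
import os


def missing_dirs(path_value, required):
    """Return the required directories absent from a colon-separated path list."""
    entries = [entry.rstrip("/") for entry in path_value.split(":") if entry]
    return [d for d in required if d.rstrip("/") not in entries]


if __name__ == "__main__":
    ld_path = os.environ.get("LD_LIBRARY_PATH", "")
    print("CUDA_HOME =", os.environ.get("CUDA_HOME"))
    print("missing from LD_LIBRARY_PATH:",
          missing_dirs(ld_path, ["/usr/local/cuda/lib64"]))
```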

Python environment

Install pyenv.

$ git clone https://github.com/yyuu/pyenv.git ~/.pyenv

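Run the following to add the environment variables and the pyenv initialization to .bashrc, then reload it (for example with source ~/.bashrc) so that the pyenv command is found.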

$ echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.bashrc
$ echo 'export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.bashrc
$ echo -e 'if command -v pyenv 1>/dev/null 2>&1; then\n  eval "$(pyenv init -)"\nfi' >> ~/.bashrc

Install Anaconda3 and create a Python 3.4 environment.

$ pyenv install anaconda3-4.4.0
$ pyenv global anaconda3-4.4.0
$ conda create -n Python34 anaconda python=3.4
$ source activate Python34
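
Before installing packages, a quick look at the interpreter confirms that the new environment is the active one. This is a small sketch; it assumes conda's activate script exports CONDA_DEFAULT_ENV (my understanding of conda's behaviour), and describe_env is a hypothetical helper.

```python
import os
import sys


def describe_env(environ, version_info):
    """Summarise the active conda environment name and the Python version."""
    name = environ.get("CONDA_DEFAULT_ENV", "(no conda environment active)")
    version = ".".join(str(part) for part in version_info[:3])
    return "{} / Python {}".format(name, version)


if __name__ == "__main__":
    print(describe_env(os.environ, sys.version_info))
```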

Keras/TensorFlow

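Install TensorFlow-GPU 1.8 and Keras 2.1.6. The versions are pinned so that pip does not pull in a newer release that expects a different CUDA Toolkit.

$ pip install tensorflow-gpu==1.8.0 keras==2.1.6 pillow h5py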
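
To see which versions pip actually installed, the modules can be imported and their version strings printed. A minimal sketch, with module_version as a hypothetical helper; a module that fails to import (for example because of the CUDA mismatch described in [1]) is reported as None.

```python
import importlib


def module_version(name):
    """Return a module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(name)
    except Exception:
        # A broken install can fail with more than a plain ImportError.
        return None
    return getattr(module, "__version__", "unknown")


if __name__ == "__main__":
    for name in ("tensorflow", "keras", "h5py", "PIL"):
        print("{:<12} {}".format(name, module_version(name)))
```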

Check that the GPU is available.

$ ipython
Python 3.4.5 |Anaconda 4.3.1 (64-bit)| (default, Jul  2 2016, 17:47:47)
Type "copyright", "credits" or "license" for more information.

IPython 5.1.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: from tensorflow.python.client import device_lib

In [2]: device_lib.list_local_devices()
2018-05-27 07:21:05.669941: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-05-27 07:21:08.200170: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-05-27 07:21:08.200610: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:00:1e.0
totalMemory: 11.17GiB freeMemory: 11.10GiB
2018-05-27 07:21:08.200641: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-05-27 07:21:08.472218: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-05-27 07:21:08.472266: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929]      0
2018-05-27 07:21:08.472289: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0:   N
2018-05-27 07:21:08.472599: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/device:GPU:0 with 10764 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:00:1e.0, compute capability: 3.7)
Out[2]:
[name: "/device:CPU:0"
 device_type: "CPU"
 memory_limit: 268435456
 locality {
 }
 incarnation: 13555977001484831058, name: "/device:GPU:0"
 device_type: "GPU"
 memory_limit: 11287530701
 locality {
   bus_id: 1
   links {
   }
 }
 incarnation: 17595250577370987790
 physical_device_desc: "device: 0, name: Tesla K80, pci bus id: 0000:00:1e.0, compute capability: 3.7"]
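
The same check can be scripted so that it prints only the GPU entries instead of the full device list. This sketch uses the same device_lib.list_local_devices() call as the session above; gpu_devices is a hypothetical helper that filters on the device_type field shown in the output.

```python
def gpu_devices(devices):
    """Keep only the entries whose device_type is 'GPU'."""
    return [device for device in devices if device.device_type == "GPU"]


if __name__ == "__main__":
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        print("TensorFlow is not importable in this environment")
    else:
        gpus = gpu_devices(device_lib.list_local_devices())
        for gpu in gpus:
            print(gpu.name, gpu.physical_device_desc)
        print("GPU available" if gpus else "no GPU visible to TensorFlow")
```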

Verify that Keras works.

$ git clone https://github.com/fchollet/keras.git
$ cd keras/examples
$ python mnist_cnn.py
Using TensorFlow backend.
Downloading data from https://s3.amazonaws.com/img-datasets/mnist.npz

11493376/11490434 [==============================] - 14s 1us/step
x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
Train on 60000 samples, validate on 10000 samples
Epoch 1/12
2018-05-27 07:23:32.705762: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-05-27 07:23:35.238854: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-05-27 07:23:35.239260: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:00:1e.0
totalMemory: 11.17GiB freeMemory: 11.10GiB
2018-05-27 07:23:35.239289: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-05-27 07:23:35.510533: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-05-27 07:23:35.510581: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929]      0
2018-05-27 07:23:35.510603: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0:   N
2018-05-27 07:23:35.510912: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10764 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:00:1e.0, compute capability: 3.7)
60000/60000 [==============================] - 14s 231us/step - loss: 0.2843 - acc: 0.9127 - val_loss: 0.0657 - val_acc: 0.9798
Epoch 2/12
60000/60000 [==============================] - 8s 135us/step - loss: 0.0952 - acc: 0.9719 - val_loss: 0.0435 - val_acc: 0.9850
Epoch 3/12
60000/60000 [==============================] - 8s 135us/step - loss: 0.0704 - acc: 0.9788 - val_loss: 0.0340 - val_acc: 0.9879
Epoch 4/12
60000/60000 [==============================] - 8s 135us/step - loss: 0.0566 - acc: 0.9835 - val_loss: 0.0333 - val_acc: 0.9884
Epoch 5/12
60000/60000 [==============================] - 8s 135us/step - loss: 0.0487 - acc: 0.9853 - val_loss: 0.0296 - val_acc: 0.9903
Epoch 6/12
60000/60000 [==============================] - 8s 135us/step - loss: 0.0427 - acc: 0.9872 - val_loss: 0.0319 - val_acc: 0.9887
Epoch 7/12
60000/60000 [==============================] - 8s 134us/step - loss: 0.0392 - acc: 0.9881 - val_loss: 0.0311 - val_acc: 0.9893
Epoch 8/12
60000/60000 [==============================] - 8s 135us/step - loss: 0.0372 - acc: 0.9889 - val_loss: 0.0303 - val_acc: 0.9898
Epoch 9/12
60000/60000 [==============================] - 8s 134us/step - loss: 0.0348 - acc: 0.9893 - val_loss: 0.0319 - val_acc: 0.9906
Epoch 10/12
60000/60000 [==============================] - 8s 135us/step - loss: 0.0310 - acc: 0.9908 - val_loss: 0.0282 - val_acc: 0.9908
Epoch 11/12
60000/60000 [==============================] - 8s 135us/step - loss: 0.0301 - acc: 0.9907 - val_loss: 0.0270 - val_acc: 0.9909
Epoch 12/12
60000/60000 [==============================] - 8s 134us/step - loss: 0.0282 - acc: 0.9916 - val_loss: 0.0275 - val_acc: 0.9910
Test loss: 0.027509962708507918
Test accuracy: 0.991
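
The timings in the log are easy to sanity-check by hand. In this output the progress bar counts the 60000 training samples, so "us/step" is the time per sample. The sketch below redoes that arithmetic; the function names are my own, and my reading that the slower first epoch includes one-time GPU initialization is an assumption based on the device messages logged during it.

```python
def epoch_seconds(samples, us_per_sample):
    """Wall-clock seconds for one epoch given the per-sample time in microseconds."""
    return samples * us_per_sample / 1e6


def samples_per_second(us_per_sample):
    """Throughput implied by a per-sample time in microseconds."""
    return 1e6 / us_per_sample


if __name__ == "__main__":
    print(epoch_seconds(60000, 231))  # first epoch (logged as 14s)
    print(epoch_seconds(60000, 135))  # later epochs (logged as 8s)
    print(round(samples_per_second(135)))  # samples per second in later epochs
```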

While the above was running, I checked GPU utilization with the nvidia-smi command.

$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.26                 Driver Version: 396.26                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   84C    P0   135W / 149W |  10956MiB / 11441MiB |     76%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     27904      C   python                                     10943MiB |
+-----------------------------------------------------------------------------+
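
Memory usage is close to the card's total even for this small model because TensorFlow 1.x reserves almost all GPU memory by default. For monitoring from a script, nvidia-smi can emit the same numbers as CSV. A minimal sketch, assuming the --query-gpu fields below are supported by the installed driver; parse_gpu_line and query_gpus are hypothetical helpers.

```python
import subprocess

QUERY = ["nvidia-smi",
         "--query-gpu=utilization.gpu,memory.used,memory.total",
         "--format=csv,noheader,nounits"]


def parse_gpu_line(line):
    """Parse one 'util, used, total' CSV line into a dict of ints."""
    util, used, total = (int(field) for field in line.split(","))
    return {"util_percent": util, "used_mib": used, "total_mib": total}


def query_gpus():
    """Return one dict per GPU, or an empty list if nvidia-smi cannot be run."""
    try:
        out = subprocess.check_output(QUERY).decode()
    except (OSError, subprocess.CalledProcessError):
        return []
    gpus = []
    for line in out.splitlines():
        try:
            gpus.append(parse_gpu_line(line))
        except ValueError:
            continue  # skip blank lines or fields the driver cannot report
    return gpus


if __name__ == "__main__":
    for index, gpu in enumerate(query_gpus()):
        print("GPU {}: {}% util, {} / {} MiB".format(
            index, gpu["util_percent"], gpu["used_mib"], gpu["total_mib"]))
```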

[1] ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory
[2] Error while install tensorflow with pip install and python 3.5.2
[3] Python For Data Science Cheat Sheet