These are my notes on setting up a Keras/TensorFlow environment on Ubuntu 16.04 LTS (a GPU instance).
NVIDIA Driver
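Install the NVIDIA driver. Note that CUDA 9.0 requires driver version 384.81 or later, which nvidia-375 does not meet; the nvidia-smi output at the end of this post shows driver 396.26 in use.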
$ sudo add-apt-repository ppa:graphics-drivers/ppa -y
$ sudo apt-get update
$ sudo apt-get install -y nvidia-375 nvidia-settings
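To confirm that the driver is new enough for CUDA 9.0, the Python sketch below queries nvidia-smi and compares version numbers as integer tuples, so that, for example, 384.130 correctly counts as newer than 384.81. The 384.81 minimum is taken from NVIDIA's CUDA release notes and hard-coded as an assumption. parse_version and driver_satisfies are illustrative helper names, not part of any library. The code avoids f-strings so it also runs under the Python 3.4 environment created later.

```python
# Minimal sketch: check that the NVIDIA driver is new enough for CUDA 9.0.
# ASSUMPTION: 384.81 is the minimum Linux driver for CUDA 9.0, per NVIDIA's release notes.
import subprocess

CUDA_9_0_MIN_DRIVER = "384.81"


def parse_version(text):
    """Turn a dotted version string such as '396.26' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_satisfies(installed, minimum=CUDA_9_0_MIN_DRIVER):
    """Return True if the installed driver version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def installed_driver_version():
    """Ask nvidia-smi for the driver version of the first GPU."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"])
    return out.decode("utf-8").splitlines()[0].strip()


if __name__ == "__main__":
    version = installed_driver_version()
    print("driver {}: OK for CUDA 9.0? {}".format(version, driver_satisfies(version)))
```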
CUDA Toolkit
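Install the CUDA Toolkit, which provides the development environment for GPU applications (the nvcc compiler, GPU-accelerated libraries such as cuBLAS, and the CUDA runtime). I chose version 9.0 here; other versions can be selected on the CUDA Toolkit Download Page. Download the network installer package cuda-repo-ubuntu1604_9.0.176-1_amd64.deb for Ubuntu 16.04 from that page, then run: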
$ sudo dpkg -i cuda-repo-ubuntu1604_9.0.176-1_amd64.deb
$ sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
$ sudo apt-get update
$ sudo apt-get install cuda=9.0.176-1
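Note that TensorFlow binaries are built against specific versions of the CUDA Toolkit and cuDNN. I initially installed CUDA Toolkit 9.2, but it did not match the version TensorFlow expected, so I downgraded to 9.0 (see [1]).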
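In [1] the version mismatch shows up as an ImportError because TensorFlow cannot load libcublas.so.9.0. A minimal Python sketch for checking this before importing TensorFlow reads the installed CUDA version and asks the dynamic loader to open the library. It assumes that CUDA 9.0 writes /usr/local/cuda/version.txt containing a line like "CUDA Version 9.0.176"; the helper names are hypothetical. The loader only finds libcublas.so.9.0 if LD_LIBRARY_PATH includes /usr/local/cuda/lib64.

```python
# Minimal sketch: check for the CUDA 9.0 runtime that TensorFlow 1.8 expects.
# ASSUMPTIONS: CUDA 9.0 writes /usr/local/cuda/version.txt ("CUDA Version 9.0.176"),
# and TensorFlow needs libcublas.so.9.0 at import time (the error quoted in [1]).
import ctypes
import re

VERSION_FILE = "/usr/local/cuda/version.txt"


def cuda_version_from_text(text):
    """Extract (major, minor, patch) from version.txt contents, or None."""
    match = re.search(r"CUDA Version (\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())


def can_load(libname):
    """Return True if the dynamic loader can find and open libname."""
    try:
        ctypes.CDLL(libname)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    with open(VERSION_FILE) as f:
        print("installed CUDA:", cuda_version_from_text(f.read()))
    # dlopen searches LD_LIBRARY_PATH, so run this after .bashrc has been updated.
    print("libcublas.so.9.0 loadable:", can_load("libcublas.so.9.0"))
```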
NVIDIA cuDNN
Install NVIDIA cuDNN, a GPU-accelerated library of primitives for deep neural networks. Register for the NVIDIA Developer Program on the NVIDIA Developer site, log in, and download the following packages:
- libcudnn7_7.1.4.18-1+cuda9.0_amd64.deb
- libcudnn7-dev_7.1.4.18-1+cuda9.0_amd64.deb
- libcudnn7-doc_7.1.4.18-1+cuda9.0_amd64.deb
Install cuDNN:
$ sudo dpkg -i libcudnn7_7.1.4.18-1+cuda9.0_amd64.deb
$ sudo dpkg -i libcudnn7-dev_7.1.4.18-1+cuda9.0_amd64.deb
$ sudo dpkg -i libcudnn7-doc_7.1.4.18-1+cuda9.0_amd64.deb
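One way to confirm which cuDNN version ended up installed is to read the version macros from its header. The sketch below assumes that the libcudnn7-dev package makes the header available as /usr/include/cudnn.h and that the header defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL. For the 7.1.4.18 packages above it should print (7, 1, 4). The function name is hypothetical.

```python
# Minimal sketch: read the installed cuDNN version from its header.
# ASSUMPTION: libcudnn7-dev exposes the header as /usr/include/cudnn.h.
import re

CUDNN_HEADER = "/usr/include/cudnn.h"


def cudnn_version_from_header(text):
    """Return (major, minor, patch) from the CUDNN_* #defines, or None if absent."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        # Match only the plain numeric #define, not e.g. CUDNN_VERSION's expression.
        match = re.search(r"#define\s+" + name + r"\s+(\d+)", text)
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


if __name__ == "__main__":
    with open(CUDNN_HEADER) as f:
        print("cuDNN", cudnn_version_from_header(f.read()))
```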
Run the following to append the CUDA environment variables to .bashrc:
$ echo 'export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}/usr/local/cuda/lib64"' >> ~/.bashrc
$ echo 'export CUDA_HOME=/usr/local/cuda' >> ~/.bashrc
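A quick way to confirm that the variables are in effect (in a new shell, or after source ~/.bashrc) is to inspect them from Python. This sketch checks that LD_LIBRARY_PATH contains /usr/local/cuda/lib64 and that CUDA_HOME is set. It also warns about an empty entry, which appears when a colon-separated path is appended to a previously unset variable and makes the dynamic loader search the current directory as well. cuda_env_problems is a hypothetical helper.

```python
# Minimal sketch: verify the CUDA variables appended to ~/.bashrc are in effect.
# Run in a new shell (or after "source ~/.bashrc"); it only inspects the environment.
import os

CUDA_HOME = "/usr/local/cuda"
CUDA_LIB_DIR = CUDA_HOME + "/lib64"


def cuda_env_problems(environ):
    """Return a list of problems found in the CUDA-related environment variables."""
    problems = []
    raw = environ.get("LD_LIBRARY_PATH", "")
    entries = raw.split(":") if raw else []
    if CUDA_LIB_DIR not in entries:
        problems.append("LD_LIBRARY_PATH does not contain " + CUDA_LIB_DIR)
    if "" in entries:
        # An empty entry (e.g. from a leading colon) makes the loader search the current directory.
        problems.append("LD_LIBRARY_PATH has an empty entry")
    if environ.get("CUDA_HOME") != CUDA_HOME:
        problems.append("CUDA_HOME is not " + CUDA_HOME)
    return problems


if __name__ == "__main__":
    for problem in cuda_env_problems(os.environ) or ["CUDA environment looks OK"]:
        print(problem)
```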
Python environment
Install pyenv:
$ git clone https://github.com/yyuu/pyenv.git ~/.pyenv
Run the following to append the pyenv settings to .bashrc:
$ echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.bashrc
$ echo 'export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.bashrc
$ echo -e 'if command -v pyenv 1>/dev/null 2>&1; then\n eval "$(pyenv init -)"\nfi' >> ~/.bashrc
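Reload .bashrc so that the pyenv and CUDA settings take effect in the current shell, then install Anaconda3 and create a Python 3.4 environment named Python34:
$ source ~/.bashrc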
$ pyenv install anaconda3-4.4.0
$ pyenv global anaconda3-4.4.0
$ conda create -n Python34 anaconda python=3.4
$ source activate Python34
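To make sure the following pip install goes into the new environment rather than the base Anaconda installation, the sketch below checks sys.prefix and the interpreter version. It assumes that conda places named environments under <anaconda root>/envs/<name>; in_expected_env is a hypothetical helper.

```python
# Minimal sketch: confirm that "source activate Python34" switched interpreters.
# ASSUMPTION: conda keeps named environments under <anaconda root>/envs/<name>.
import os
import sys


def in_expected_env(prefix, version_info, env_name="Python34", version=(3, 4)):
    """True if prefix is <root>/envs/<env_name> and the interpreter is version."""
    prefix = os.path.normpath(prefix)
    return (os.path.basename(prefix) == env_name
            and os.path.basename(os.path.dirname(prefix)) == "envs"
            and tuple(version_info[:2]) == tuple(version))


if __name__ == "__main__":
    print(sys.prefix, sys.version.split()[0])
    print("Python34 env active:", in_expected_env(sys.prefix, sys.version_info))
```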
Keras/TensorFlow
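Install TensorFlow-GPU 1.8 and Keras 2.1.6, together with Pillow and h5py (used by Keras for image loading and for saving models in HDF5 format):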
$ pip install tensorflow-gpu==1.8.0 keras==2.1.6 pillow h5py
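Since TensorFlow only works with the CUDA/cuDNN versions it was built against, it is worth confirming which versions pip actually installed. The sketch compares installed versions against the expected pair using pkg_resources from setuptools (bundled with Anaconda). The EXPECTED mapping and the helper name are illustrative assumptions.

```python
# Minimal sketch: check that pip installed the versions this setup targets.
# ASSUMPTION: the PyPI distribution names are "tensorflow-gpu" and "keras".
EXPECTED = {"tensorflow-gpu": "1.8.0", "keras": "2.1.6"}


def version_mismatches(installed, expected=EXPECTED):
    """Compare {name: version} dicts; return {name: (found, wanted)} for mismatches."""
    mismatches = {}
    for name, wanted in expected.items():
        found = installed.get(name)
        if found != wanted:
            mismatches[name] = (found, wanted)
    return mismatches


if __name__ == "__main__":
    import pkg_resources  # part of setuptools, which Anaconda environments include

    installed = {}
    for name in EXPECTED:
        try:
            installed[name] = pkg_resources.get_distribution(name).version
        except pkg_resources.DistributionNotFound:
            installed[name] = None
    print(version_mismatches(installed) or "versions OK")
```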
Check that TensorFlow can see the GPU:
$ ipython
Python 3.4.5 |Anaconda 4.3.1 (64-bit)| (default, Jul 2 2016, 17:47:47)
Type "copyright", "credits" or "license" for more information.
IPython 5.1.0 -- An enhanced Interactive Python.
? -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help -> Python's own help system.
object? -> Details about 'object', use 'object??' for extra details.
In [1]: from tensorflow.python.client import device_lib
In [2]: device_lib.list_local_devices()
2018-05-27 07:21:05.669941: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-05-27 07:21:08.200170: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-05-27 07:21:08.200610: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:00:1e.0
totalMemory: 11.17GiB freeMemory: 11.10GiB
2018-05-27 07:21:08.200641: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-05-27 07:21:08.472218: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-05-27 07:21:08.472266: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2018-05-27 07:21:08.472289: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
2018-05-27 07:21:08.472599: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/device:GPU:0 with 10764 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:00:1e.0, compute capability: 3.7)
Out[2]:
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 13555977001484831058, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 11287530701
locality {
  bus_id: 1
  links {
  }
}
incarnation: 17595250577370987790
physical_device_desc: "device: 0, name: Tesla K80, pci bus id: 0000:00:1e.0, compute capability: 3.7"]
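The interactive check above can also be written as a script that exits with an error when no GPU is visible, which is convenient on a fresh instance. This sketch uses the same device_lib.list_local_devices() call (an internal TF 1.x module, as in the session above) and filters on the device_type field shown in its output; gpu_device_names is a hypothetical helper.

```python
# Minimal sketch: fail fast if TensorFlow 1.x does not see a GPU.
# Each entry from device_lib.list_local_devices() has .name and .device_type fields.
import sys


def gpu_device_names(devices):
    """Return the names of devices whose device_type is "GPU"."""
    return [d.name for d in devices if d.device_type == "GPU"]


if __name__ == "__main__":
    from tensorflow.python.client import device_lib

    gpus = gpu_device_names(device_lib.list_local_devices())
    if not gpus:
        sys.exit("TensorFlow does not see a GPU")
    print("GPUs:", gpus)
```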
Check that Keras works by running the MNIST CNN example:
$ git clone -b 2.1.6 https://github.com/fchollet/keras.git
$ cd keras/examples
$ python mnist_cnn.py
Using TensorFlow backend.
Downloading data from https://s3.amazonaws.com/img-datasets/mnist.npz
11493376/11490434 [==============================] - 14s 1us/step
x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
Train on 60000 samples, validate on 10000 samples
Epoch 1/12
2018-05-27 07:23:32.705762: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-05-27 07:23:35.238854: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-05-27 07:23:35.239260: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:00:1e.0
totalMemory: 11.17GiB freeMemory: 11.10GiB
2018-05-27 07:23:35.239289: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-05-27 07:23:35.510533: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-05-27 07:23:35.510581: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2018-05-27 07:23:35.510603: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
2018-05-27 07:23:35.510912: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10764 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:00:1e.0, compute capability: 3.7)
60000/60000 [==============================] - 14s 231us/step - loss: 0.2843 - acc: 0.9127 - val_loss: 0.0657 - val_acc: 0.9798
Epoch 2/12
60000/60000 [==============================] - 8s 135us/step - loss: 0.0952 - acc: 0.9719 - val_loss: 0.0435 - val_acc: 0.9850
Epoch 3/12
60000/60000 [==============================] - 8s 135us/step - loss: 0.0704 - acc: 0.9788 - val_loss: 0.0340 - val_acc: 0.9879
Epoch 4/12
60000/60000 [==============================] - 8s 135us/step - loss: 0.0566 - acc: 0.9835 - val_loss: 0.0333 - val_acc: 0.9884
Epoch 5/12
60000/60000 [==============================] - 8s 135us/step - loss: 0.0487 - acc: 0.9853 - val_loss: 0.0296 - val_acc: 0.9903
Epoch 6/12
60000/60000 [==============================] - 8s 135us/step - loss: 0.0427 - acc: 0.9872 - val_loss: 0.0319 - val_acc: 0.9887
Epoch 7/12
60000/60000 [==============================] - 8s 134us/step - loss: 0.0392 - acc: 0.9881 - val_loss: 0.0311 - val_acc: 0.9893
Epoch 8/12
60000/60000 [==============================] - 8s 135us/step - loss: 0.0372 - acc: 0.9889 - val_loss: 0.0303 - val_acc: 0.9898
Epoch 9/12
60000/60000 [==============================] - 8s 134us/step - loss: 0.0348 - acc: 0.9893 - val_loss: 0.0319 - val_acc: 0.9906
Epoch 10/12
60000/60000 [==============================] - 8s 135us/step - loss: 0.0310 - acc: 0.9908 - val_loss: 0.0282 - val_acc: 0.9908
Epoch 11/12
60000/60000 [==============================] - 8s 135us/step - loss: 0.0301 - acc: 0.9907 - val_loss: 0.0270 - val_acc: 0.9909
Epoch 12/12
60000/60000 [==============================] - 8s 134us/step - loss: 0.0282 - acc: 0.9916 - val_loss: 0.0275 - val_acc: 0.9910
Test loss: 0.027509962708507918
Test accuracy: 0.991
While the script was running, I checked GPU utilization with the nvidia-smi command:
$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.26                 Driver Version: 396.26                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   84C    P0   135W / 149W |  10956MiB / 11441MiB |     76%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     27904      C   python                                     10943MiB |
+-----------------------------------------------------------------------------+
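Instead of reading the full table, nvidia-smi can report selected fields in CSV form via --query-gpu together with --format=csv,noheader,nounits. The sketch samples utilization and memory a few times while training runs; the field list, sample count and interval are arbitrary choices, and the helper names are hypothetical. With TensorFlow 1.x's default settings the process reserves most of the GPU memory up front (the log line "Created TensorFlow device" reports 10764 MB), so memory.used reflects that reservation rather than what the MNIST model actually needs.

```python
# Minimal sketch: sample GPU utilization and memory from nvidia-smi's CSV query mode.
import subprocess
import time

QUERY = ["nvidia-smi",
         "--query-gpu=utilization.gpu,memory.used,memory.total",
         "--format=csv,noheader,nounits"]


def parse_query_line(line):
    """Parse '76, 10956, 11441' into (util_percent, used_mib, total_mib)."""
    util, used, total = (int(field) for field in line.split(","))
    return util, used, total


def sample(n=5, interval=2.0):
    """Print n samples for the first GPU, one every interval seconds."""
    for _ in range(n):
        line = subprocess.check_output(QUERY).decode("utf-8").splitlines()[0]
        util, used, total = parse_query_line(line)
        print("util {}%  memory {}/{} MiB".format(util, used, total))
        time.sleep(interval)


if __name__ == "__main__":
    sample()
```

Run it in a second terminal while mnist_cnn.py is training.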
References
[1] ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory
[2] Error while install tensorflow with pip install and python 3.5.2
[3] Python For Data Science Cheat Sheet