TensorFlow and Google Cloud GPU Instances
By Eric Antoine Scuccimarra
I decided to try a Google Cloud GPU instance as well as EC2. Once I had my quotas set properly and was able to start the instance, it took me all day to get TensorFlow running with GPU support. The instructions Google provides are for CUDA 8.0, and the latest version of TensorFlow requires CUDA 9.0.
To get everything running, follow these steps:
curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_9.0.176-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1604_9.0.176-1_amd64.deb
sudo apt-get update
sudo apt-get install cuda-9-0
sudo nvidia-smi -pm 1
These are the steps from Google's instructions, with the CUDA 9.0 repo substituted for the CUDA 8.0 one.
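Before moving on, it can help to confirm which toolkit version actually landed on the instance. The following is a minimal sketch, assuming the toolkit leaves a version.txt file whose first line looks like "CUDA Version 9.0.176" in its install directory (the CUDA 9.0 packages do this under /usr/local/cuda). The function name cuda_version_from_file is hypothetical, not part of any NVIDIA tool.

```shell
# Hypothetical helper: print "major.minor" from a CUDA version.txt file.
# Assumes the first line looks like "CUDA Version 9.0.176".
cuda_version_from_file() {
    local file="${1:-/usr/local/cuda/version.txt}"
    [ -r "$file" ] || return 1
    # The third whitespace-separated field is the full version; keep major.minor.
    awk 'NR == 1 { split($3, v, "."); print v[1] "." v[2] }' "$file"
}
```

Calling cuda_version_from_file with no argument reads /usr/local/cuda/version.txt; after the steps above it should report 9.0 rather than 8.0.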
Then I had to install cuDNN, which isn't mentioned at all in Google's instructions. I downloaded libcudnn7_7.0.4.31-1+cuda9.0_amd64.deb from the NVIDIA cuDNN site and uploaded it to the instance with scp. Install it with:
sudo dpkg -i libcudnn7_7.0.4.31-1+cuda9.0_amd64.deb
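Because the cuDNN package has to match both the cuDNN version TensorFlow expects and the installed CUDA version, it is worth checking the file name before installing it. This is only a sketch: it assumes the file is named like the one above (libcudnn7_<cudnn version>-1+cuda<cuda version>_amd64.deb), and both function names are hypothetical.

```shell
# Hypothetical helpers: pull version strings out of a cuDNN .deb file name.
# Assumes names shaped like libcudnn7_7.0.4.31-1+cuda9.0_amd64.deb.
cudnn_version_from_deb() {
    local name="${1##*/}"          # drop any leading directory
    name="${name#*_}"              # drop everything through the first "_"
    printf '%s\n' "${name%%-*}"    # keep the text before the first "-"
}

cuda_version_from_deb() {
    local name="${1##*/}"          # drop any leading directory
    name="${name#*+cuda}"          # drop everything through "+cuda"
    printf '%s\n' "${name%%_*}"    # keep the text before the next "_"
}
```

For the file used here, the first function prints 7.0.4.31 and the second prints 9.0.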
Then you need to export the path with:
echo 'export CUDA_HOME=/usr/local/cuda' >> ~/.bashrc
echo 'export PATH=$PATH:$CUDA_HOME/bin' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=$CUDA_HOME/lib64' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda/extras/CUPTI/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc
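One caveat with echo ... >> ~/.bashrc is that running the setup a second time appends the same lines again. A possible way around that, sketched here with a hypothetical helper named append_once, is to append a line only when that exact line is not already in the file.

```shell
# Hypothetical helper: append a line to a file only if that exact line
# is not already present, so re-running the setup does not duplicate exports.
append_once() {
    local line="$1" file="$2"
    touch "$file"
    # -q: quiet, -x: match the whole line, -F: fixed string (no regex)
    grep -qxF -e "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example (same export as above, safe to run repeatedly):
# append_once 'export CUDA_HOME=/usr/local/cuda' ~/.bashrc
```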
And finally install TensorFlow:
sudo apt-get install python-dev python-pip libcupti-dev
sudo pip install tensorflow-gpu
I used pip3 and python3, but the rest is the same.
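To check that TensorFlow actually sees the GPU after the install, one option is to ask it to list its devices. This is a rough sketch rather than an official check: it assumes tensorflow-gpu is importable by the chosen interpreter, it relies on the device_lib module inside the TensorFlow package, and the wrapper name list_tf_devices is hypothetical.

```shell
# Hypothetical helper: ask TensorFlow which devices it can see.
# Pass the interpreter to use (python or python3); defaults to python3.
list_tf_devices() {
    local py="${1:-python3}"
    "$py" -c 'from tensorflow.python.client import device_lib
for d in device_lib.list_local_devices():
    print("%s %s" % (d.device_type, d.name))'
}
```

If the setup worked, the output should include a line whose device type is GPU alongside the CPU entry. If only a CPU line appears, TensorFlow is most likely failing to load one of the CUDA or cuDNN libraries.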
Update: I thought it was working fine but I was still getting errors about locating libcupti.so.9.0. That was fixed by making symlinks as described here.
I ran these commands and now it seems to be working...
# Put symlinks in /usr/local/cuda
sudo mkdir -p /usr/local/cuda
cd /usr/local/cuda
sudo ln -s /usr/lib/x86_64-linux-gnu/ lib64
sudo ln -s /usr/include/ include
sudo ln -s /usr/bin/ bin
sudo ln -s /usr/lib/x86_64-linux-gnu/ nvvm
sudo mkdir -p extras/CUPTI
cd extras/CUPTI
sudo ln -s /usr/lib/x86_64-linux-gnu/ lib64
sudo ln -s /usr/include/ include
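The libcupti.so.9.0 error comes down to the loader not finding that file in any directory it searches. A quick way to see whether the exported paths cover it is to walk the colon-separated list and report the first directory that contains the file. This is a minimal sketch with a hypothetical function name; it only looks at the directories it is given (by default $LD_LIBRARY_PATH) and does not consult the ldconfig cache.

```shell
# Hypothetical helper: print the first directory in a colon-separated
# list (default: $LD_LIBRARY_PATH) that contains the named library file.
# This only checks the listed directories, not the ldconfig cache.
find_lib_in_path() {
    local lib="$1" dirs="${2:-${LD_LIBRARY_PATH:-}}" dir
    local old_ifs="$IFS"
    IFS=:
    for dir in $dirs; do
        # Skip empty entries, then test for the file (symlinks are followed).
        if [ -n "$dir" ] && [ -e "$dir/$lib" ]; then
            IFS="$old_ifs"
            printf '%s\n' "$dir"
            return 0
        fi
    done
    IFS="$old_ifs"
    return 1
}
```

After sourcing ~/.bashrc, running find_lib_in_path libcupti.so.9.0 should print one of the exported directories; a non-zero exit status with no output means none of them contains the library.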
Another Update: TensorFlow requires version 7.0.4 of cuDNN. I had originally downloaded 7.1.2; the commands above have been updated accordingly.
Final Update: I set up another instance and followed this process, and it almost worked. I needed to export another path, which I have added above. The commands to export the paths were originally temporary and had to be repeated every time the instance was booted; I changed them to append the exports to .bashrc so they are set automatically.
Labels: coding, machine_learning, tensorflow, google_cloud