在ubuntu 18.04(GeForce GTX 760 4GB显存)使用Pytorch Pix2PixGAN(CUDA-10.1)

1. 参照 pytorch 1.0.1在ubuntu 18.04(GeForce GTX 760)编译(CUDA-10.1) 建立 `pytorch 1.0.1` 的编译环境,并解决编译时遇到的问题。

2. 依旧是推荐在 Anaconda 上建立独立的编译环境,然后执行编译:

$ sudo apt-get install git

# conda remove -n Pix2Pix --all

$ conda create -n Pix2Pix -y python=3.6.8 pip

$ source activate Pix2Pix

$ conda install numpy pyyaml mkl=2019.1 mkl-include=2019.1 setuptools cmake cffi typing pybind11

$ conda install ninja

# magma-cuda90 magma-cuda91 magma-cuda92 会编译失败 
$ conda install -c pytorch magma-cuda101

$ git clone https://github.com/pytorch/pytorch

$ cd pytorch

# pytorch 1.0.1 版本支持“Compute Capability” 低于3.0版本的硬件,pytorch 1.2.0需要至少3.5版本的硬件才可以正常运行
# https://github.com/pytorch/pytorch/blob/v1.3.0/torch/utils/cpp_extension.py
$ git checkout v1.0.1 -b v1.0.1

$ git submodule sync

$ git submodule update --init --recursive

$ export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}

# 如果不需要使用cuda的话,这里还要加上一句:export NO_CUDA=1

$ python setup.py clean

# 卸载以前安装的pytorch
$ conda uninstall pytorch

# 从Nvidia开发网站查询到自己硬件对应的“Compute Capability” 
# 比如 “GeForce GTX 760” 对应 “3.0” 计算能力,能力不正确会导致运行异常
# RuntimeError: cuda runtime error (48) : no kernel image is available for execution on the device

$ python setup.py install

# 对于开发者模式,可以使用
# python setup.py build develop

# 一定要退出 pytorch 的编译目录,在pytorch代码目录下执行命令会出现异常
$ cd ..

# 退出环境 
$ conda deactivate

编译出错信息,参考 pytorch 1.0.1在ubuntu 18.04(GeForce GTX 760)编译(CUDA-10.1) 里面的介绍解决。

3. 编译安装 TorchVision

$ sudo apt-get install git

# 进入运行环境
$ source activate Pix2Pix

$ git clone https://github.com/pytorch/vision.git

# 也可本站下载一份拷贝 wget https://www.mobibrw.com/wp-content/uploads/2019/11/vision.zip

$ cd vision

$ git checkout v0.2.1 -b v0.2.1

$ python setup.py install

# 退出环境 
$ conda deactivate

4. 检出 CycleGAN and pix2pix in PyTorch 的代码,并安装依赖

# 进入运行环境
$ source activate Pix2Pix

$ git clone https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix.git

# 也可本站下载 wget https://www.mobibrw.com/wp-content/uploads/2019/12/pytorch-CycleGAN-and-pix2pix.zip

$ cd pytorch-CycleGAN-and-pix2pix

# 下载人脸替换部分的数据集
$ bash datasets/download_pix2pix_dataset.sh facades

# 也可本站下载然后自己参照脚本解压缩到指定目录 https://www.mobibrw.com/wp-content/uploads/2019/12/facades.tar.gz

# 安装依赖
$ pip install pillow==6.2.1
$ pip install dominate==2.4.0
$ pip install visdom==0.1.8.9

# 修正错误 models/networks.py
# TypeError: cuda() got an unexpected keyword argument 'device_id'
$ sed -i "s/netG\.cuda(device_id=gpu_ids\[0\])/netG.cuda(gpu_ids[0])/g" models/networks.py

$ sed -i "s/netD\.cuda(device_id=gpu_ids\[0\])/netD.cuda(gpu_ids[0])/g" models/networks.py

$ sed -i "s/network\.cuda(device_id=gpu_ids\[0\])/network.cuda(gpu_ids[0])/g" models/base_model.py

# 开启WEB服务,主要是第一次运行需要下载部分辅助软件包,
# 训练之前需要执行,否则下面训练的时候会报错
$ python -m visdom.server & 

# 等待屏幕上出现 “You can navigate to http://localhost:8097” 代表服务启动成功

# 执行训练
$ bash scripts/train_pix2pix.sh

执行训练的时候,如果出现如下错误:

Traceback (most recent call last):
  File "train.py", line 47, in <module>
    errors = model.get_current_errors()
  File "~/pytorch-CycleGAN-and-pix2pix/models/pix2pix_model.py", line 122, in get_current_errors
    return OrderedDict([('G_GAN', self.loss_G_GAN.data[0]),
IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number

这个原因是由于 PyTorch 版本差异造成的,(作者在 `Pytorch 0.4.1` 版本上测试,我们在 `Pytorch 1.0.1` 版本上测试),执行如下命令修复:

#loss_G_GAN.data[0] 替换为 loss_G_GAN.item()

$ sed -i "s/self\.loss_G_GAN\.data\[0]/self.loss_G_GAN.item()/g" models/pix2pix_model.py

$ sed -i "s/self\.loss_G_L1\.data\[0]/self.loss_G_L1.item()/g" models/pix2pix_model.py

$ sed -i "s/self\.loss_D_real\.data\[0]/self.loss_D_real.item()/g" models/pix2pix_model.py

$ sed -i "s/self\.loss_D_fake\.data\[0]/self.loss_D_fake.item()/g" models/pix2pix_model.py

5. 测试训练结果

$ bash scripts/test_pix2pix.sh

# 观察结果需要打开 ./results/facades_pix2pix/test_latest/index.html

参考链接