



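Once the corpus has been downloaded, the text has to be extracted from it. Wikipedia dumps are organized as XML, so a dedicated tool is needed. There are two main options: the WikiCorpus class in gensim, and Wikipedia Extractor.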


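Wikipedia Extractor is a Wikipedia extraction tool written in Python and is very easy to use. After cloning it, a single command performs the extraction, and it runs quickly. Run the following commands: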



$ git clone https://github.com/attardi/wikiextractor.git

$ python ./wikiextractor/WikiExtractor.py -b 2048M -o extracted zhwiki-latest-pages-articles.xml.bz2


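  • -b 2048M: split the output into files of at most 2048M each; the default is 1M.
  • extracted: the directory in which the extracted files are stored;
  • zhwiki-latest-pages-articles.xml.bz2: the path of the .bz2 dump to extract.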


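Wikipedia Extractor strips some specially marked-up content during extraction, which usually does not matter for our use case. A second pass only needs to remove the remaining <doc> tags, empty parentheses, empty book-title marks (《》) and empty 『』, and to turn 「」 and 『』 into ordinary quotation marks.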

import re
import sys
import codecs


def filte(input_file):
    # Empty parentheses that contain nothing but punctuation or whitespace
    p1 = re.compile(r'[(\(][,;。?!\s]*[)\)]')
    # Empty book-title marks
    p2 = re.compile('《》')
    # Corner brackets, converted to ordinary quotation marks below
    p3 = re.compile('「')
    p4 = re.compile('」')
    # <doc ...> and </doc> tags emitted by Wikipedia Extractor
    p5 = re.compile('<doc (.*)>')
    p6 = re.compile('</doc>')
    p7 = re.compile('『』')
    p8 = re.compile('『')
    p9 = re.compile('』')
    # Language-variant markup such as -{zh-hans:...;zh-hant:...}-
    p10 = re.compile(r'-\{.*?(zh-hans|zh-cn):([^;]*?)(;.*?)?\}-')
    with codecs.open(input_file, 'r', 'utf-8') as myfile, \
            codecs.open('std_' + input_file, 'w', 'utf-8') as outfile:
        for line in myfile:
            line = p1.sub('', line)
            line = p2.sub('', line)
            line = p3.sub('“', line)
            line = p4.sub('”', line)
            line = p5.sub('', line)
            line = p6.sub('', line)
            line = p7.sub('', line)
            line = p8.sub('“', line)
            line = p9.sub('”', line)
            line = p10.sub('', line)
            # Write the cleaned line to std_<input_file>
            outfile.write(line)


if __name__ == '__main__':
    input_file = sys.argv[1]
    filte(input_file)

Save the script as filte.py, then run python filte.py wiki_00 to perform this second cleaning pass.



# -*- coding: utf-8 -*-
from gensim.corpora import WikiCorpus
import os


class Config:
    data_path = '/home/qw/CodeHub/Word2Vec/zhwiki'
    zhwiki_bz2 = 'zhwiki-latest-pages-articles.xml.bz2'
    zhwiki_raw = 'zhwiki_raw.txt'


def data_process(_config):
    i = 0
    # lemmatize=False matches the gensim version used here;
    # gensim 4.x dropped this argument, so omit it there.
    wiki = WikiCorpus(os.path.join(_config.data_path, _config.zhwiki_bz2), lemmatize=False, dictionary={})
    # Write one article per line, tokens separated by spaces
    with open(os.path.join(_config.data_path, _config.zhwiki_raw), 'w', encoding='utf-8') as output:
        for text in wiki.get_texts():
            output.write(' '.join(text) + '\n')
            i += 1
            if i % 10000 == 0:
                print('Saved ' + str(i) + ' articles')
    print('Finished Saved ' + str(i) + ' articles')


config = Config()
data_process(config)



$ opencc -i zhwiki_raw.txt -o zhswiki_raw.txt -c t2s.json



$ python -m jieba -d " " ./zhswiki_raw.txt >./zhswiki_cut.txt

Convert to UTF-8

Characters that cannot be converted to valid UTF-8 are dropped (this is what the -c flag does):

$ iconv -c -t UTF-8 -o zhwiki.utf8.txt zhwiki.zhs.txt
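The effect of the -c flag can be illustrated with a small Python sketch. It assumes the input is mostly UTF-8 with a few stray invalid bytes; the function name drop_invalid_utf8 and the sample bytes are made up for illustration and are not part of the original workflow.

```python
def drop_invalid_utf8(raw: bytes) -> str:
    """Decode bytes as UTF-8, silently discarding any invalid byte sequences."""
    return raw.decode("utf-8", errors="ignore")


# Two valid words with two invalid bytes (0xff, 0xfe) inserted between them.
sample = "维基".encode("utf-8") + b"\xff\xfe" + "百科".encode("utf-8")
cleaned = drop_invalid_utf8(sample)
print(cleaned)
```

For a multi-gigabyte corpus the same decoding would be applied while streaming the file in chunks or lines rather than loading it into memory at once.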




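The ImageNet dataset is currently the largest image-recognition database in the world. It is organized according to the WordNet hierarchy (at present only for objects) and is mainly used for image classification and object detection in computer vision. Each node of the hierarchy is depicted by hundreds to thousands of images, with more than 500 images per node on average; in total there are about 15 million images in roughly 22,000 categories. ImageNet was first released in 2009 by Fei-Fei Li and colleagues at Stanford University, at the Vision Sciences Society (VSS) meeting, and since 2010 the annual ImageNet Large Scale Visual Recognition Challenge (ILSVRC) has continued to refine the dataset.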

ImageNet.torrent: the download requires 860.55 GB of disk space


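Using PyTorch Pix2PixGAN (CUDA-10.1) on Ubuntu 18.04 (GeForce GTX 760, 4 GB VRAM)

1. Follow "Building PyTorch 1.0.1 on Ubuntu 18.04 (GeForce GTX 760) with CUDA-10.1" to set up the `pytorch 1.0.1` build environment and to resolve any problems encountered during the build.

2. As before, we recommend creating a dedicated build environment in Anaconda and then running the build: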

$ sudo apt-get install git

# conda remove -n Pix2Pix --all

$ conda create -n Pix2Pix -y python=3.6.8 pip

$ source activate Pix2Pix

$ conda install numpy pyyaml mkl=2019.1 mkl-include=2019.1 setuptools cmake cffi typing pybind11

$ conda install ninja

# magma-cuda90 magma-cuda91 magma-cuda92 会编译失败 
$ conda install -c pytorch magma-cuda101

$ git clone https://github.com/pytorch/pytorch

$ cd pytorch

# pytorch 1.0.1 版本支持“Compute Capability” 低于3.0版本的硬件,pytorch 1.2.0需要至少3.5版本的硬件才可以正常运行
# https://github.com/pytorch/pytorch/blob/v1.3.0/torch/utils/cpp_extension.py
$ git checkout v1.0.1 -b v1.0.1

$ git submodule sync

$ git submodule update --init --recursive

$ export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}

# 如果不需要使用cuda的话,这里还要加上一句:export NO_CUDA=1

$ python setup.py clean

# 卸载以前安装的pytorch
$ conda uninstall pytorch

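# Look up the "Compute Capability" of your hardware on the NVIDIA developer site.
# For example, the "GeForce GTX 760" has compute capability "3.0"; a wrong value causes runtime errors such as
# RuntimeError: cuda runtime error (48) : no kernel image is available for execution on the device

$ TORCH_CUDA_ARCH_LIST="3.0" python setup.py install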

# 对于开发者模式,可以使用
# python setup.py build develop

# 一定要退出 pytorch 的编译目录,在pytorch代码目录下执行命令会出现异常
$ cd ..

# 退出环境 
$ conda deactivate

For build errors, see the fixes described in "Building PyTorch 1.0.1 on Ubuntu 18.04 (GeForce GTX 760) with CUDA-10.1".

3. Build and install TorchVision

$ sudo apt-get install git

# 进入运行环境
$ source activate Pix2Pix

$ git clone https://github.com/pytorch/vision.git

# 也可本站下载一份拷贝 wget https://www.mobibrw.com/wp-content/uploads/2019/11/vision.zip

$ cd vision

$ git checkout v0.2.1 -b v0.2.1

$ python setup.py install

# 退出环境 
$ conda deactivate

4. Check out the CycleGAN and pix2pix in PyTorch code and install its dependencies

# 进入运行环境
$ source activate Pix2Pix

$ git clone https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix.git

# 也可本站下载 wget https://www.mobibrw.com/wp-content/uploads/2019/12/pytorch-CycleGAN-and-pix2pix.zip

$ cd pytorch-CycleGAN-and-pix2pix

# Download the facades dataset (building facades) used for pix2pix training
$ bash datasets/download_pix2pix_dataset.sh facades

# 也可本站下载然后自己参照脚本解压缩到指定目录 https://www.mobibrw.com/wp-content/uploads/2019/12/facades.tar.gz

# 安装依赖
$ pip install pillow==6.2.1
$ pip install dominate==2.4.0
$ pip install visdom==

# 修正错误 models/networks.py
# TypeError: cuda() got an unexpected keyword argument 'device_id'
$ sed -i "s/netG\.cuda(device_id=gpu_ids\[0\])/netG.cuda(gpu_ids[0])/g" models/networks.py

$ sed -i "s/netD\.cuda(device_id=gpu_ids\[0\])/netD.cuda(gpu_ids[0])/g" models/networks.py

$ sed -i "s/network\.cuda(device_id=gpu_ids\[0\])/network.cuda(gpu_ids[0])/g" models/base_model.py

# 开启WEB服务,主要是第一次运行需要下载部分辅助软件包,
# 训练之前需要执行,否则下面训练的时候会报错
$ python -m visdom.server & 

# 等待屏幕上出现 “You can navigate to http://localhost:8097” 代表服务启动成功

# 执行训练
$ bash scripts/train_pix2pix.sh


Traceback (most recent call last):
  File "train.py", line 47, in <module>
    errors = model.get_current_errors()
  File "~/pytorch-CycleGAN-and-pix2pix/models/pix2pix_model.py", line 122, in get_current_errors
    return OrderedDict([('G_GAN', self.loss_G_GAN.data[0]),
IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number

This is caused by a PyTorch version difference (the authors tested on `Pytorch 0.4.1`, whereas we are running `Pytorch 1.0.1`). Run the following commands to fix it:

# replace loss_G_GAN.data[0] with loss_G_GAN.item()

$ sed -i "s/self\.loss_G_GAN\.data\[0]/self.loss_G_GAN.item()/g" models/pix2pix_model.py

$ sed -i "s/self\.loss_G_L1\.data\[0]/self.loss_G_L1.item()/g" models/pix2pix_model.py

$ sed -i "s/self\.loss_D_real\.data\[0]/self.loss_D_real.item()/g" models/pix2pix_model.py

$ sed -i "s/self\.loss_D_fake\.data\[0]/self.loss_D_fake.item()/g" models/pix2pix_model.py
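The four sed commands above all apply the same rewrite. As an illustration only, here is a minimal Python sketch of that substitution on a sample string. The names PATTERN and use_item are hypothetical, and the regular expression is slightly broader than the sed commands because it matches any attribute whose name starts with loss_.

```python
import re

# Matches e.g. "self.loss_G_GAN.data[0]" and captures the attribute name.
PATTERN = re.compile(r"self\.(loss_\w+)\.data\[0\]")


def use_item(source: str) -> str:
    """Rewrite self.loss_X.data[0] into self.loss_X.item()."""
    return PATTERN.sub(r"self.\1.item()", source)


sample = ("return OrderedDict([('G_GAN', self.loss_G_GAN.data[0]), "
          "('G_L1', self.loss_G_L1.data[0])])")
print(use_item(sample))
```

The `.item()` form is what the error message itself recommends for converting a 0-dim tensor to a Python number.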

5. Test the trained model

$ bash scripts/test_pix2pix.sh

# To view the results, open ./results/facades_pix2pix/test_latest/index.html


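Training MaskTextSpotter (CUDA-10.1) on Ubuntu 18.04 (GeForce GTX 760, 4 GB VRAM)

Follow "Building and testing MaskTextSpotter (CUDA-10.1) on Ubuntu 18.04 (GeForce GTX 760, 4 GB VRAM)" to set up a working test environment.

The test set is icdar2013, so make sure that testing on the icdar2013 dataset already works before going further.


1. Modify the training script. By default it trains on 8 GPUs; we have only one, so the training parameters must be adjusted: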

$ cd MaskTextSpotter

$ export ROOT_PATH=`pwd`

$ sed -i 's/nproc_per_node=8/nproc_per_node=1/g' train.sh

2. Download the training set. MaskTextSpotter trains on the SynthText dataset by default, so it has to be downloaded first (about 40 GB):

$ mkdir datasets

$ cd datasets

$ sudo apt-get install aria2

$ aria2c -c -j16 -s16 -x16 --follow-torrent=mem -o 'hyperai.torrent' 'https://hyper.ai/tracker/download?torrent=7783'

# 也可下载种子文件 wget https://www.mobibrw.com/wp-content/uploads/2019/11/SynthText.zip

3. 解压缩 SynthText 数据集到指定目录

$ mkdir synthtext

$ unzip SynthText/data/SynthText.zip -d synthtext

# 目录改名
$ mv synthtext/SynthText synthtext/train_images

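4. Download the converted SynthText index files. The index extracted above is a .mat file, which has to be converted into the index format that MaskTextSpotter expects. The authors provide an already converted copy, so we simply download and use it; the file is about 1.6 GB.

$ aria2c -c 'https://1drv.ms/u/s!ArsnjfK83FbXgb5vgOOVPYywgCWuQw?e=UPuNTa'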

# 解压缩到指定目录
$ tar -xvf SynthText_GT_E2E.tar.gz -C synthtext

# 目录改名
$ mv synthtext/SynthText_GT_E2E synthtext/train_gts

5. Generate the training list train_list.txt. Save the following script as gen_train.py in the synthtext directory:

import os

path = 'train_images'
train_list = 'train_list.txt'

with open(train_list, 'w') as tf:
    for root, dirs, files in os.walk(path):
        # Skip hidden files and do not descend into hidden directories
        files = [f for f in files if not f[0] == '.']
        dirs[:] = [d for d in dirs if not d[0] == '.']
        for file_name in files:
            fn = os.path.join(root, file_name)
            # Store paths relative to train_images
            fn = fn.replace('./', '')
            fn = fn.replace(path + '/', '')
            ext = os.path.splitext(fn)[1]
            # Only .jpg images are listed
            if '.jpg' == ext:
                tf.write(fn + '\n')



$ cd synthtext

$ python gen_train.py



# 减少一次性加载图片数量,解决“OSError: [Errno 24] Too many open files”
# 参数设置为 0 代表从主进程加载图片资源
$ sed -i "s/NUM_WORKERS: 4/NUM_WORKERS: 0/g" configs/pretrain.yaml

# 调整训练参数,对于单个GPU来说,默认参数太大了,会导致GPU内存不足
# 解决 “RuntimeError: CUDA out of memory.”
$ sed -i "s/IMS_PER_BATCH: 8/IMS_PER_BATCH: 1/g" configs/pretrain.yaml

# Fix the error "AttributeError: module 'torch' has no attribute 'bool'"
# Starting with PyTorch 1.2, torch.uint8 was replaced by torch.bool here;
# on versions below PyTorch 1.2 it has to be changed back to torch.uint8
$ sed -i "s/torch\.bool/torch.uint8/g" maskrcnn_benchmark/modeling/rpn/inference.py
$ sed -i "s/torch\.bool/torch.uint8/g" maskrcnn_benchmark/modeling/balanced_positive_negative_sampler.py

# 修改SOLVER设置上的GPU相关参数
# https://github.com/facebookresearch/Detectron/blob/master/configs/getting_started/tutorial_1gpu_e2e_faster_rcnn_R-50-FPN.yaml#L14
# 官方参考建议单个GPU的学习速率是0.0025但是实际运行中会报错,调整为0.0015可以正常运行
$ sed -i "s/BASE_LR: 0.01/BASE_LR: 0.0015/g" configs/pretrain.yaml

# 4GB 显存设置为 8 ,8GB显存可以设置为64/128 
$ sed -i "s/MASK_BATCH_SIZE_PER_IM: 512/MASK_BATCH_SIZE_PER_IM: 8/g" configs/pretrain.yaml

# 目前在RTX 2070 Super 8GB显存版本上测试来看,使用
# “WEIGHT: https://dl.fbaipublicfiles.com/detectron/ImageNetPretrained/MSRA/R-50.pkl” 
# 的配置情况下,BASE_LR可以设置为 0.0025 , MASK_BATCH_SIZE_PER_IM 可以设置为 128 

# 进入运行环境
$ source activate MaskTextSpotter

$ bash train.sh
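One weakness of patching configs/pretrain.yaml with sed is that a pattern that does not match is silently ignored. The following is a minimal Python sketch of the same four value changes that fails loudly instead. The function name apply_replacements and the flat sample text are hypothetical stand-ins; the real file is nested YAML and would be read from disk.

```python
def apply_replacements(text, replacements):
    """Apply each (old, new) pair and raise if `old` does not occur in the text."""
    for old, new in replacements:
        if old not in text:
            raise ValueError(f"pattern not found: {old!r}")
        text = text.replace(old, new)
    return text


# The same edits as the sed commands above (values for a single 4 GB GPU).
REPLACEMENTS = [
    ("NUM_WORKERS: 4", "NUM_WORKERS: 0"),
    ("IMS_PER_BATCH: 8", "IMS_PER_BATCH: 1"),
    ("BASE_LR: 0.01", "BASE_LR: 0.0015"),
    ("MASK_BATCH_SIZE_PER_IM: 512", "MASK_BATCH_SIZE_PER_IM: 8"),
]

# Hypothetical stand-in for the relevant lines of configs/pretrain.yaml.
sample = (
    "NUM_WORKERS: 4\n"
    "IMS_PER_BATCH: 8\n"
    "BASE_LR: 0.01\n"
    "MASK_BATCH_SIZE_PER_IM: 512\n"
)
patched = apply_replacements(sample, REPLACEMENTS)
print(patched)
```

Like the sed version, this is a plain substring replacement, so it should only be run once against an unmodified config file.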

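Note that configs/pretrain.yaml loads the weight file `WEIGHT: "./outputs/finetune/model_finetune.pth"`, which was itself obtained by training on SynthText. So how was "model_finetune.pth" generated in the first place?

The authors do not explain this in detail, but the configuration file of the masktextspotter.caffe2 project shows that training actually starts from "WEIGHTS: https://dl.fbaipublicfiles.com/detectron/ImageNetPretrained/MSRA/R-50.pkl". A copy of R-50.pkl can also be downloaded from this site.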

R-50.pkl: converted copy of MSRA’s original ResNet-50 model


  TYPE: generalized_rcnn
  CONV_BODY: FPN.add_fpn_ResNet50_conv5_body
  MASK_ON: True
  NAME: shrink++
  WEIGHT_DECAY: 0.0001
  LR_POLICY: steps_with_decay
  BASE_LR: 0.005   #synth
  GAMMA: 0.1
  MAX_ITER: 200000
  STEPS: [0, 120000]
  FPN_ON: True
  RPN_ASPECT_RATIOS: (0.5, 1, 2)
  ROI_BOX_HEAD: fast_rcnn_heads.add_roi_2mlp_head
  ROI_MASK_HEAD: text_mask_rcnn_heads.mask_rcnn_fcn_head_v1up4convs
  RESOLUTION: 28  # (output mask resolution) default 14
  ROI_XFORM_RESOLUTION: 14  # default 7
  ROI_XFORM_SAMPLING_RATIO: 2  # default 0
  DILATION: 1  # default 2
  CONV_INIT: MSRAFill  # default GaussianFill
  IS_E2E: True
  WEIGHT_WH: True  ## default is false 

  RPN_PRE_NMS_TOP_N: 2000  # Per FPN level

  ##################### pre-train on synth ##########################
  WEIGHTS: https://dl.fbaipublicfiles.com/detectron/ImageNetPretrained/MSRA/R-50.pkl
  DATASETS: ('synth_train', )
  SCALES: (800,)
  MAX_SIZE: 1333
  MIX_TRAIN: False

  ######################  Fine tune   #################################
  # MIX_TRAIN: True
  # WEIGHTS: ./train/synth_train/pretrain_model/model_iter159999.pkl
  # DATASETS: ('totaltext_train', 'scut-eng-char_train', 'synth_train', 'icdar2013_train', 'icdar2015_train')
  # USE_CHARANNS: [False, True, True, True, False]
  # # the ratios of synth, icdar2013, icdar2015 is 2:1:1, defaultly
  # # MIX_RATIOS: [0.125, 0.125, 0.5, 0.125, 0.125]
  # MIX_RATIOS: [1.0 / 6, 1.0 / 6, 1.0 / 3, 1.0 / 6, 1.0 / 6]
  # SCALES: (600, 800, 1000)
  # MAX_SIZE: 1333
  # # # SCALES: (800,)
  # # # MAX_SIZE: 1333

  aug: False
  saturation_prob: 0.5
  saturation_lower: 0.5
  saturation_upper: 1.5
  hue_prob: 0.5
  hue_delta: 18
  lighting_noise_prob: 0.5
  contrast_prob: 0.5
  contrast_lower: 0.5
  contrast_upper: 1.5
  brightness_prob: 0.5
  brightness_delta: 32
  rotate_prob: 0.5
  rotate_delta: 15
  OUTPUT_POLYGON: False # only set to True for totaltext
  WEIGHTS: ./train/shrink++_finetune/model_iter79999.pkl
  DATASETS: ('icdar2015_test',)
  SCALES: (1000,)
  MAX_SIZE: 3333
  NMS: 0.5
  RPN_PRE_NMS_TOP_N: 1000  # Per FPN level
  VIS: False
    ENABLED: False
    SCORE_HEUR: UNION  # AVG NOTE: cannot use AVG for e2e model
    COORD_HEUR: UNION  # AVG NOTE: cannot use AVG for e2e model
    H_FLIP: False
    SCALES: (800,)
    MAX_SIZE: 2000
    SCALE_H_FLIP: False
    AREA_TH_LO: 2500   # 50^2
    AREA_TH_HI: 32400  # 180^2
    ENABLED: False
    H_FLIP: False
    SCALES: (1600,)
    MAX_SIZE: 3333
    SCALE_H_FLIP: False
    AREA_TH: 32400  # 180^2
    ENABLED: True
    VOTE_TH: 0.9
    ENABLED: False


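On a machine with 4 GB of VRAM, memory is so limited that "RuntimeError: CUDA out of memory." is quite likely to occur partway through training. In our tests so far, simply running the command again is enough to continue.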
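Re-running the command by hand can be automated. The sketch below is one possible wrapper, assuming (as the tests above suggest) that restarting the training command picks up where the previous run stopped. The function name run_with_retries and its parameters are made up for illustration, and a harmless stand-in command is used so the sketch runs anywhere.

```python
import subprocess
import sys
import time


def run_with_retries(command, max_attempts=5, delay_seconds=0.0):
    """Run `command` until it exits with status 0 or the attempts run out.

    Returns the number of attempts used; raises RuntimeError if all fail.
    """
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(command)
        if result.returncode == 0:
            return attempt
        print(f"attempt {attempt} failed with exit code {result.returncode}",
              file=sys.stderr)
        time.sleep(delay_seconds)
    raise RuntimeError(f"command still failing after {max_attempts} attempts")


# Stand-in command; for real training this would be ["bash", "train.sh"].
attempts = run_with_retries([sys.executable, "-c", "pass"])
print(f"succeeded after {attempts} attempt(s)")
```

A bounded number of attempts is used so that a persistent failure, such as the BASE_LR problem described below, does not loop forever.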

Training results are stored in the outputs/pretrain directory; they are written there each time training reaches a certain stage.

If an error like the following occurs, reduce the learning rate BASE_LR:

~/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:53: lambda [](int)->auto::operator()(int)->auto: block: [2,0,0], thread: [32,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
[... similar error messages repeated many times ...]
Traceback (most recent call last):
  File "tools/train_net.py", line 173, in <module>
  File "tools/train_net.py", line 166, in main
    model = train(cfg, args.local_rank, args.distributed)
  File "tools/train_net.py", line 76, in train
  File "~/MaskTextSpotter/maskrcnn_benchmark/engine/trainer.py", line 66, in do_train
    loss_dict = model(images, targets)
  File "~/.conda/envs/MaskTextSpotter/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "~/MaskTextSpotter/maskrcnn_benchmark/modeling/detector/generalized_rcnn.py", line 50, in forward
    proposals, proposal_losses = self.rpn(images, features, targets)
  File "~/.conda/envs/MaskTextSpotter/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "~/MaskTextSpotter/maskrcnn_benchmark/modeling/rpn/rpn.py", line 94, in forward
    return self._forward_train(anchors, objectness, rpn_box_regression, targets)
  File "~/MaskTextSpotter/maskrcnn_benchmark/modeling/rpn/rpn.py", line 110, in _forward_train
    anchors, objectness, rpn_box_regression, targets
  File "~/.conda/envs/MaskTextSpotter/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "~/MaskTextSpotter/maskrcnn_benchmark/modeling/rpn/inference.py", line 138, in forward
    sampled_boxes.append(self.forward_for_single_feature_map(a, o, b))
  File "~/MaskTextSpotter/maskrcnn_benchmark/modeling/rpn/inference.py", line 113, in forward_for_single_feature_map
    boxlist = remove_small_boxes(boxlist, self.min_size)
  File "~/MaskTextSpotter/maskrcnn_benchmark/structures/boxlist_ops.py", line 46, in remove_small_boxes
    (ws >= min_size) & (hs >= min_size)
RuntimeError: copy_if failed to synchronize: cudaErrorAssert: device-side assert triggered


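Building PyTorch 1.0.1 on Ubuntu 18.04 (Lenovo ThinkPad T440) with CUDA-9.1.85

`CUDA-9.1.85` no longer supports the `Fermi` architecture.

As a result, every `.cu` file fails to compile, so the build has to be done against `CUDA-8.x`.

The practical solution is to install `ubuntu 16.04` and build there, either on a physical machine or in `nvidia-docker`.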

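At present, `sudo  apt-get install nvidia-cuda-toolkit` on `ubuntu 18.04` installs version `9.1.85` of `nvidia cuda`. It is fairly old, but it is stable and widely compatible.

When a project requires a specific version of `pytorch`, the officially provided prebuilt packages with `nvidia cuda` support are not compatible with all hardware, which is troublesome in practice.


If the graphics card is several years old (GeForce GTX 760, Compute Capability = 3.0; GeForce GT 720M in the Lenovo ThinkPad T440, Compute Capability = 2.1), the following message appears at runtime: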

Found GPU0 GeForce GTX 760 which is of cuda capability 3.0.
PyTorch no longer supports this GPU because it is too old.
The minimum cuda capability that we support is 3.5.


RuntimeError: cuda runtime error (48) : no kernel image is available for execution on the device

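To look up the compute capability of your hardware, see Recommended GPU for Developers


Install `cuda-9.1.85` from the official software repository; newer driver versions do not support this GPU: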

# 卸载 nvidia-340 驱动,切换到开源的Nouveau驱动,否则在后面安装 nvidia-cuda-toolkit 会存在冲突
$ sudo apt-get remove nvidia-340

# 安装系统自带的cuda
$ sudo apt-get install nvidia-cuda-toolkit

# 安装390版本驱动
$ sudo apt-get install nvidia-driver-390

# 更新驱动之后,一定要重启系统,否则可能会出现各种莫名的异常
$ sudo reboot


$ sudo apt-get install nvidia-cuda-toolkit 
正在读取软件包列表... 完成
正在读取状态信息... 完成       
nvidia-cuda-toolkit 已经是最新版 (9.1.85-3ubuntu1)。
您也许需要运行“apt --fix-broken install”来修正上面的错误。
 libcuinj64-9.1 : 依赖: libcuda1 (>= 387.26) 或
E: 有未能满足的依赖关系。请尝试不指明软件包的名字来运行“apt --fix-broken install”(也可以指定一个解决办法)。

If `sudo apt --fix-broken install` does not help either, force-purge the package:

$ sudo dpkg -P nvidia-340

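The GPU in the Lenovo T440 (Compute Capability = 2.1) does not support `cuDNN`, so there is no point in installing it. In fact, even the latest `CUDA-10.1` cannot be installed: the driver for the `NVIDIA GT 720M` is only supported up to version `390`, whereas `CUDA-10.1` requires version `418` or later. The symptom is that the GPU driver is not loaded after boot, and `dmesg` shows the following: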

[   72.533870] NVRM: The NVIDIA GeForce GT 720M GPU installed in this system is
               NVRM:  supported through the NVIDIA 390.xx Legacy drivers. Please
               NVRM:  visit http://www.nvidia.com/object/unix.html for more
               NVRM:  information.  The 430.50 NVIDIA driver will ignore
               NVRM:  this GPU.  Continuing probe...
[   72.533875] NVRM: No NVIDIA graphics adapter found!


Switch `GCC` to `GCC-5`

$ sudo apt install gcc-5

$ sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-7 70

$ sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-6 60

$ sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-5 50

$ sudo apt install g++-5

$ sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-7 70 

$ sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-6 60 

$ sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-5 50

$ sudo update-alternatives --config g++

# 一定要退出当前运行的SHELL,否则环境变量可能没有刷新
$ exit


As before, we recommend creating a dedicated build environment in Anaconda and then running the build:

$ sudo apt-get install git

# conda remove -n pytorch --all

$ conda create -n pytorch -y python=3.6.8 pip

$ source activate pytorch

$ conda install numpy pyyaml mkl=2019.1 mkl-include=2019.1 setuptools cmake cffi typing pybind11

$ conda install ninja
$ conda install -c soumith magma-cuda80 cudatoolkit=8.0

$ git clone https://github.com/pytorch/pytorch

$ cd pytorch

# pytorch 1.0.1 版本支持“Compute Capability” 低于3.0版本的硬件,pytorch 1.2.0需要至少3.5版本的硬件才可以正常运行
# https://github.com/pytorch/pytorch/blob/v1.3.0/torch/utils/cpp_extension.py
$ git checkout v1.0.1 -b v1.0.1

$ git submodule sync

$ git submodule update --init --recursive

$ export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}

# 如果不需要使用cuda的话,这里还要加上一句:export NO_CUDA=1

$ python setup.py clean

# 卸载以前安装的pytorch
$ conda uninstall pytorch

$ export CUDA_HOST_COMPILER=/usr/bin/gcc-5

$ export CUDAHOSTCXX=/usr/bin/gcc-5

$ export CMAKE_CXX_COMPILER=/usr/bin/gcc-5

# 调整代码,修正一系列已知的编译问题,代码要求6.0以上的GCC编译,否则报错,我们直接把这个要求降级到5.0
$ sed -i "s/6.0.0/5.0.0/g" cmake/MiscCheck.cmake

# 从Nvidia开发网站查询到自己硬件对应的“Compute Capability” 
# 比如 “GeForce GTX 760” 对应 “3.0” 计算能力,能力不正确会导致运行异常
# RuntimeError: cuda runtime error (48) : no kernel image is available for execution on the device

$ python setup.py install

# 对于开发者模式,可以使用
# python setup.py build develop

# 一定要退出 pytorch 的编译目录,在pytorch代码目录下执行命令会出现异常
$ cd ..


[ 68%] Building NVCC (Device) object caffe2/CMakeFiles/caffe2_gpu.dir/__/aten/src/ATen/native/sparse/cuda/caffe2_gpu_generated_SparseCUDABlas.cu.o
~/pytorch/aten/src/ATen/native/sparse/cuda/SparseCUDABlas.cu(58): error: more than one instance of function "at::native::sparse::cuda::cusparseGetErrorString" matches the argument list:
            function "cusparseGetErrorString(cusparseStatus_t)"
            function "at::native::sparse::cuda::cusparseGetErrorString(cusparseStatus_t)"
            argument types are: (cusparseStatus_t)

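If the build fails with the error above, edit `aten/src/ATen/native/sparse/cuda/SparseCUDABlas.cu` and wrap the `cusparseGetErrorString` function in `#if (!((CUSPARSE_VER_MAJOR >= 10) && (CUSPARSE_VER_MINOR >= 2)))`:


#if (!((CUSPARSE_VER_MAJOR >= 10) && (CUSPARSE_VER_MINOR >= 2)))
// Only compiled when the cuSPARSE headers do not already provide this function
const char* cusparseGetErrorString(cusparseStatus_t status) {
  switch(status)
  {
    case CUSPARSE_STATUS_SUCCESS:
      return "success";

    case CUSPARSE_STATUS_NOT_INITIALIZED:
      return "library not initialized";

    case CUSPARSE_STATUS_ALLOC_FAILED:
      return "resource allocation failed";

    case CUSPARSE_STATUS_INVALID_VALUE:
      return "an invalid numeric value was used as an argument";

    case CUSPARSE_STATUS_ARCH_MISMATCH:
      return "an absent device architectural feature is required";

    case CUSPARSE_STATUS_MAPPING_ERROR:
      return "an access to GPU memory space failed";

    case CUSPARSE_STATUS_EXECUTION_FAILED:
      return "the GPU program failed to execute";

    case CUSPARSE_STATUS_INTERNAL_ERROR:
      return "an internal operation failed";

    case CUSPARSE_STATUS_MATRIX_TYPE_NOT_SUPPORTED:
      return "the matrix type is not supported by this function";

    case CUSPARSE_STATUS_ZERO_PIVOT:
      return "an entry of the matrix is either structural zero or numerical zero (singular block)";

    default:
      return "unknown error";
  }
}
#endif

This resolves the conflict with the function of the same name that ships with `CUDA-10.1`.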

具体参考: https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/sparse/cuda/SparseCUDABlas.cu


# conda uninstall pytorch

$ pip uninstall torch

$ python setup.py clean

Downloading the PyTorch source is very slow; an already synchronized copy of the pytorch source code can be downloaded from this site.


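Building and testing MaskTextSpotter (CUDA-10.1) on Ubuntu 18.04 (GeForce GTX 760, 4 GB VRAM)

`MaskTextSpotter` needs at least `4GB` of VRAM; with less than that it will not run.

Install the latest `cuda-10.1`; building with older versions runs into problems: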

# 卸载之前已经安装的cuda
$ sudo apt-get remove nvidia-cuda-toolkit

$ wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin

$ sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600

$ wget http://developer.download.nvidia.com/compute/cuda/10.1/Prod/local_installers/cuda-repo-ubuntu1804-10-1-local-10.1.243-418.87.00_1.0-1_amd64.deb

$ sudo dpkg -i cuda-repo-ubuntu1804-10-1-local-10.1.243-418.87.00_1.0-1_amd64.deb

$ sudo apt-key add /var/cuda-repo-10-1-local-10.1.243-418.87.00/7fa2af80.pub

$ sudo apt-get update

$ sudo apt-get -y install cuda

# 部分驱动可能会更新,需要执行更新,否则可能依旧不正常
$ sudo apt-get dist-upgrade

$ sudo apt-get autoremove

# 可能需要删除一下XWindow的配置文件,否则驱动可能不能正常加载
$ sudo rm -rf ~/.Xauthority 

# 如果出现如下错误
# ubuntu 18.04 "nvidia-340 导致 /usr/lib/x86_64-linux-gnu/libGL.so.1 
# 转移到 /usr/lib/x86_64-linux-gnu/libGL.so.1.distrib"
# 参考 http://www.mobibrw.com/?p=21739 

# 删除安装源,可以节约几个GB的磁盘,安装完成后这部分已经用不上了
$ sudo apt-get remove --purge cuda-repo-ubuntu1804-10-1-local-10.1.243-418.87.00 

$ sudo apt-get update
# 部分驱动可能会更新,需要执行更新,否则可能依旧不正常
$ sudo apt-get dist-upgrade
$ sudo apt-get autoremove


# first, make sure that your conda is setup properly with the right environment
# for that, check that `which conda`, `which pip` and `which python` points to the
# right path. From a clean conda env, this is what you need to do

# conda remove -n MaskTextSpotter --all

$ conda create -n MaskTextSpotter -y python=3.6.8 pip

Build and install PyTorch

$ sudo apt-get install git

# 进入运行环境
$ source activate MaskTextSpotter

$ conda install numpy pyyaml mkl=2019.1 mkl-include=2019.1 setuptools cmake cffi typing pybind11

$ conda install ninja

# magma-cuda90 magma-cuda91 magma-cuda92 会编译失败 
$ conda install -c pytorch magma-cuda101

$ git clone https://github.com/pytorch/pytorch

# 也可直接本站下载一份同步好的代码 wget https://www.mobibrw.com/wp-content/uploads/2019/11/pytorch.zip

$ cd pytorch

# pytorch 1.0.1 版本支持“Compute Capability” 低于3.0版本的硬件,pytorch 1.2.0需要至少3.5版本的硬件才可以正常运行 
# https://github.com/pytorch/pytorch/blob/v1.3.0/torch/utils/cpp_extension.py
$ git checkout v1.0.1 -b v1.0.1

$ git submodule sync

$ git submodule update --init --recursive

$ export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}

$ python setup.py clean

# Uninstall any previously installed pytorch (the pip package is named torch)
$ conda uninstall pytorch

$ pip uninstall torch

# 从Nvidia开发网站查询到自己硬件对应的“Compute Capability” 
# 比如 “GeForce GTX 760” 对应 “3.0” 计算能力,能力不正确会导致运行异常
# RuntimeError: cuda runtime error (48) : no kernel image is available for execution on the device

$ TORCH_CUDA_ARCH_LIST="3.0" python setup.py install

# 一定要退出 pytorch 的编译目录,在pytorch代码目录下执行命令会出现异常
$ cd ..

# 退出环境
$ conda deactivate 


[ 68%] Building NVCC (Device) object caffe2/CMakeFiles/caffe2_gpu.dir/__/aten/src/ATen/native/sparse/cuda/caffe2_gpu_generated_SparseCUDABlas.cu.o
~/pytorch/aten/src/ATen/native/sparse/cuda/SparseCUDABlas.cu(58): error: more than one instance of function "at::native::sparse::cuda::cusparseGetErrorString" matches the argument list:
            function "cusparseGetErrorString(cusparseStatus_t)"
            function "at::native::sparse::cuda::cusparseGetErrorString(cusparseStatus_t)"
            argument types are: (cusparseStatus_t)

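If the build fails with the error above, edit `aten/src/ATen/native/sparse/cuda/SparseCUDABlas.cu` and wrap the `cusparseGetErrorString` function in `#if (!((CUSPARSE_VER_MAJOR >= 10) && (CUSPARSE_VER_MINOR >= 2)))`:


#if (!((CUSPARSE_VER_MAJOR >= 10) && (CUSPARSE_VER_MINOR >= 2)))
// Only compiled when the cuSPARSE headers do not already provide this function
const char* cusparseGetErrorString(cusparseStatus_t status) {
  switch(status)
  {
    case CUSPARSE_STATUS_SUCCESS:
      return "success";

    case CUSPARSE_STATUS_NOT_INITIALIZED:
      return "library not initialized";

    case CUSPARSE_STATUS_ALLOC_FAILED:
      return "resource allocation failed";

    case CUSPARSE_STATUS_INVALID_VALUE:
      return "an invalid numeric value was used as an argument";

    case CUSPARSE_STATUS_ARCH_MISMATCH:
      return "an absent device architectural feature is required";

    case CUSPARSE_STATUS_MAPPING_ERROR:
      return "an access to GPU memory space failed";

    case CUSPARSE_STATUS_EXECUTION_FAILED:
      return "the GPU program failed to execute";

    case CUSPARSE_STATUS_INTERNAL_ERROR:
      return "an internal operation failed";

    case CUSPARSE_STATUS_MATRIX_TYPE_NOT_SUPPORTED:
      return "the matrix type is not supported by this function";

    case CUSPARSE_STATUS_ZERO_PIVOT:
      return "an entry of the matrix is either structural zero or numerical zero (singular block)";

    default:
      return "unknown error";
  }
}
#endif

This resolves the conflict with the function of the same name that ships with `CUDA-10.1`.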

具体参考: https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/sparse/cuda/SparseCUDABlas.cu

Build and install TorchVision

$ sudo apt-get install git

# 进入运行环境
$ source activate MaskTextSpotter

$ git clone https://github.com/pytorch/vision.git

# 也可本站下载一份拷贝 wget https://www.mobibrw.com/wp-content/uploads/2019/11/vision.zip

$ cd vision

$ git checkout v0.2.1 -b v0.2.1

$ python setup.py install

# 退出环境 
$ conda deactivate


$ source activate MaskTextSpotter

# this installs the right pip and dependencies for the fresh python
$ conda install ipython pip

# python dependencies
$ pip install ninja yacs cython matplotlib tqdm opencv-python shapely scipy tensorboardX


# install pycocotools
$ git clone https://github.com/cocodataset/cocoapi.git
$ cd cocoapi/PythonAPI
$ python setup.py build_ext install

# 本站下载 https://www.mobibrw.com/wp-content/uploads/2019/11/cocoapi.zip

# install apex (optional)
$ git clone https://github.com/NVIDIA/apex.git
$ cd apex
$ python setup.py install --cuda_ext --cpp_ext

# 本站下载 wget https://www.mobibrw.com/wp-content/uploads/2019/11/apex.zip

# clone repo
$ git clone https://github.com/MhLiao/MaskTextSpotter.git
$ cd MaskTextSpotter

# 本站下载 wget https://www.mobibrw.com/wp-content/uploads/2019/11/MaskTextSpotter.zip

# build
$ python setup.py build develop



# 创建目录(源代码根目录)
$ mkdir outputs

$ cd outputs

$ mkdir finetune

$ cd finetune

# 下载已经训练好的模型 https://drive.google.com/open?id=1pPRS7qS_K1keXjSye0kksqhvoyD0SARz

# 本站下载
$ wget https://www.mobibrw.com/wp-content/uploads/2019/11/model_finetune.zip

$ unzip model_finetune.zip

$ cd ../../

$ mkdir datasets

$ cd datasets

# 下载 icdar2013 数据集
$ wget https://www.mobibrw.com/wp-content/uploads/2019/11/icdar2013.zip

$ unzip icdar2013.zip

$ cd icdar2013

# 下载测试集文件
$ git clone https://github.com/zazaliu/ICDAR2PASCAL_VOC.git

# 本站下载 wget https://www.mobibrw.com/wp-content/uploads/2019/11/ICDAR2PASCAL_VOC.zip

$ cp -r ICDAR2PASCAL_VOC/ICDAR2015/ch4_training_localization_transcription_gt/ test_gts

# 执行测试

$ cd ../../

# 预先删除生成的文件,否则可能会启动之后就崩溃退出
$ rm -rf outputs/finetune/inference/

$ bash test.sh


  File "tools/test_net.py", line 95, in <module>
  File "tools/test_net.py", line 89, in main
  File "~/MaskTextSpotter/maskrcnn_benchmark/engine/text_inference.py", line 380, in inference
    predictions = compute_on_dataset(model, data_loader, device)
  File "~/MaskTextSpotter/maskrcnn_benchmark/engine/text_inference.py", line 55, in compute_on_dataset
    for i, batch in tqdm(enumerate(data_loader)):
  File "~.conda/envs/MaskTextSpotter/lib/python3.6/site-packages/tqdm/std.py", line 1091, in __iter__
    for obj in iterable:
  File "~.conda/envs/MaskTextSpotter/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 637, in __next__
    return self._process_next_batch(batch)
  File "~.conda/envs/MaskTextSpotter/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 658, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
ValueError: Traceback (most recent call last):
  File "~.conda/envs/MaskTextSpotter/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 138, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "~.conda/envs/MaskTextSpotter/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 138, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "~/MaskTextSpotter/maskrcnn_benchmark/data/datasets/icdar.py", line 32, in __getitem__
  File "~/MaskTextSpotter/maskrcnn_benchmark/data/datasets/icdar.py", line 94, in load_gt_from_txt
    strs, loc = self.line2boxes(line)
  File "~/MaskTextSpotter/maskrcnn_benchmark/data/datasets/icdar.py", line 153, in line2boxes
    loc = np.vstack(v).transpose()
  File "<__array_function__ internals>", line 6, in vstack
  File "~.conda/envs/MaskTextSpotter/lib/python3.6/site-packages/numpy/core/shape_base.py", line 282, in vstack
    return _nx.concatenate(arrs, 0)
  File "<__array_function__ internals>", line 6, in concatenate
ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 2 and the array at index 1 has size 1


import numpy as np

line = '478,239,511,241,511,255,478,253,$5,000'
def line2boxes(line):
    parts = line.strip().split(',')
    if '\xef\xbb\xbf' in parts[0]:
        parts[0] = parts[0][3:]
    if '\ufeff' in parts[0]:
        parts[0] = parts[0].replace('\ufeff', '')
    x1 = np.array([int(float(x)) for x in parts[::9]])
    y1 = np.array([int(float(x)) for x in parts[1::9]])
    x2 = np.array([int(float(x)) for x in parts[2::9]])
    y2 = np.array([int(float(x)) for x in parts[3::9]])
    x3 = np.array([int(float(x)) for x in parts[4::9]])
    y3 = np.array([int(float(x)) for x in parts[5::9]])
    x4 = np.array([int(float(x)) for x in parts[6::9]])
    y4 = np.array([int(float(x)) for x in parts[7::9]])
    strs = parts[8::9]
    loc = np.vstack((x1, y1, x2, y2, x3, y3, x4, y4)).transpose()
    return strs, loc



import numpy as np

line = '478,239,511,241,511,255,478,253,$5,000'
def line2boxes(line):
    parts = line.strip().split(',', 8)
    if '\xef\xbb\xbf' in parts[0]:
        parts[0] = parts[0][3:]
    if '\ufeff' in parts[0]:
        parts[0] = parts[0].replace('\ufeff', '')
    x1 = np.array([int(float(x)) for x in parts[::9]])
    y1 = np.array([int(float(x)) for x in parts[1::9]])
    x2 = np.array([int(float(x)) for x in parts[2::9]])
    y2 = np.array([int(float(x)) for x in parts[3::9]])
    x3 = np.array([int(float(x)) for x in parts[4::9]])
    y3 = np.array([int(float(x)) for x in parts[5::9]])
    x4 = np.array([int(float(x)) for x in parts[6::9]])
    y4 = np.array([int(float(x)) for x in parts[7::9]])
    strs = parts[8::9]
    loc = np.vstack((x1, y1, x2, y2, x3, y3, x4, y4)).transpose()
    return strs, loc
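The only difference between the two versions of line2boxes above is the maxsplit argument passed to split. A minimal sketch of why that matters, using the same sample line: the transcription `$5,000` itself contains a comma, so an unbounded split yields ten fields, and the stride-9 slices then have different lengths, which matches the size mismatch reported in the ValueError.

```python
line = '478,239,511,241,511,255,478,253,$5,000'

# Unbounded split: the comma inside "$5,000" creates a tenth field.
unbounded = line.strip().split(',')
# At most 8 splits: 8 coordinates plus the whole transcription.
bounded = line.strip().split(',', 8)

print(len(unbounded), unbounded[8:])
print(len(bounded), bounded[8:])

# With ten fields, the x1 slice picks two elements but the y1 slice only one,
# so the coordinate arrays can no longer be stacked.
print(unbounded[::9], unbounded[1::9])
```

Note that with maxsplit set to 8 the list never has more than nine elements, so this form handles one box per line.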




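Building PyTorch 1.0.1 on Ubuntu 18.04 (GeForce GTX 760) with CUDA-10.1

At present, `sudo  apt-get install nvidia-cuda-toolkit` on `ubuntu 18.04` installs version `9.1.85` of `nvidia cuda`. It is fairly old, but it is stable and widely compatible.

When a project requires a specific version of `pytorch`, the officially provided prebuilt packages with `nvidia cuda` support are not compatible with all hardware, which is troublesome in practice.


If the graphics card is several years old (GeForce GTX 760, Compute Capability = 3.0; GeForce GT 720M in the Lenovo ThinkPad T440, Compute Capability = 2.1), the following message appears at runtime: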

Found GPU0 GeForce GTX 760 which is of cuda capability 3.0.
PyTorch no longer supports this GPU because it is too old.
The minimum cuda capability that we support is 3.5.


RuntimeError: cuda runtime error (48) : no kernel image is available for execution on the device

To look up the compute capability of your hardware, see Recommended GPU for Developers
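The check behind the warning above boils down to comparing version pairs. The following minimal sketch uses the two cards mentioned in this article and the 3.5 minimum quoted in the message; the helper names parse_capability and meets_minimum are made up for illustration.

```python
def parse_capability(text):
    """Turn a string such as "3.0" into a (major, minor) tuple of ints."""
    major, minor = text.strip().split(".")
    return int(major), int(minor)


def meets_minimum(capability, minimum="3.5"):
    """Compare as integer tuples so that "10.0" ranks above "3.5"."""
    return parse_capability(capability) >= parse_capability(minimum)


for name, cap in [("GeForce GTX 760", "3.0"), ("GeForce GT 720M", "2.1")]:
    status = "ok" if meets_minimum(cap) else "below the prebuilt minimum"
    print(f"{name}: capability {cap} -> {status}")
```

With PyTorch installed and a working driver, `torch.cuda.get_device_capability(0)` returns the (major, minor) pair for the first GPU, which can be compared in the same way.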


Install the latest `cuda-10.1`; building with older versions runs into problems:

# 卸载之前已经安装的cuda
$ sudo apt-get remove nvidia-cuda-toolkit

$ wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin

$ sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600

$ wget http://developer.download.nvidia.com/compute/cuda/10.1/Prod/local_installers/cuda-repo-ubuntu1804-10-1-local-10.1.243-418.87.00_1.0-1_amd64.deb

$ sudo dpkg -i cuda-repo-ubuntu1804-10-1-local-10.1.243-418.87.00_1.0-1_amd64.deb

$ sudo apt-key add /var/cuda-repo-10-1-local-10.1.243-418.87.00/7fa2af80.pub

$ sudo apt-get update

$ sudo apt-get -y install cuda

# Some drivers may have been updated; run the upgrade, otherwise things may still not work properly
$ sudo apt-get dist-upgrade

$ sudo apt-get autoremove

# You may need to delete the X Window configuration file, otherwise the driver may fail to load
$ sudo rm -rf ~/.Xauthority 

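# If the following error appears:
# ubuntu 18.04 "diversion of /usr/lib/x86_64-linux-gnu/libGL.so.1 
# to /usr/lib/x86_64-linux-gnu/libGL.so.1.distrib by nvidia-340"
# see http://www.mobibrw.com/?p=21739 

# Remove the local installer repository to free several GB of disk space; it is no longer needed once installation has finished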
$ sudo apt-get remove --purge cuda-repo-ubuntu1804-10-1-local-10.1.243-418.87.00 

$ sudo apt-get update

# Some drivers may have been updated; run the upgrade, otherwise things may still not work properly
$ sudo apt-get dist-upgrade

$ sudo apt-get autoremove

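To install `cuDNN`, download the matching version from the official website. There are three packages in total; install them one by one.

The Lenovo T440 (Compute Capability = 2.1) does not support `cuDNN`, so there is no need to install it there. In fact, even the latest `CUDA-10.1` cannot be installed on it: the driver for the `NVIDIA GT 720M` is only supported up to version `390`, while `CUDA-10.1` requires version `418` or later. The symptom is that the graphics driver is not loaded after boot, and `dmesg` shows the following: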

[   72.533870] NVRM: The NVIDIA GeForce GT 720M GPU installed in this system is
               NVRM:  supported through the NVIDIA 390.xx Legacy drivers. Please
               NVRM:  visit http://www.nvidia.com/object/unix.html for more
               NVRM:  information.  The 430.50 NVIDIA driver will ignore
               NVRM:  this GPU.  Continuing probe...
[   72.533875] NVRM: No NVIDIA graphics adapter found!
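
The two version numbers in that log are what explain the failure: the GPU belongs to the 390.xx legacy branch, while the installed driver is 430.50. The following is a minimal sketch that pulls both numbers out of `dmesg`-style text; `legacy_driver_info` is a hypothetical helper, and the regular expressions are written against the message shown above only.

```python
import re

LEGACY_RE = re.compile(r'supported through the NVIDIA (\d+)\.xx Legacy drivers')
IGNORED_RE = re.compile(r'The (\d+)\.(\d+) NVIDIA driver will ignore')


def legacy_driver_info(dmesg_text):
    """Return (legacy_branch, installed_major), or None if the message is absent."""
    # The kernel log wraps the message over several "NVRM:" lines; flatten it first.
    flat = re.sub(r'\s*NVRM:\s*', ' ', dmesg_text)
    flat = ' '.join(flat.split())
    legacy = LEGACY_RE.search(flat)
    ignored = IGNORED_RE.search(flat)
    if not legacy or not ignored:
        return None
    return int(legacy.group(1)), int(ignored.group(1))


sample = """[   72.533870] NVRM: The NVIDIA GeForce GT 720M GPU installed in this system is
               NVRM:  supported through the NVIDIA 390.xx Legacy drivers. Please
               NVRM:  visit http://www.nvidia.com/object/unix.html for more
               NVRM:  information.  The 430.50 NVIDIA driver will ignore
               NVRM:  this GPU.  Continuing probe..."""

print(legacy_driver_info(sample))  # (390, 430)
```

Since the text states that `CUDA-10.1` needs driver `418` or later, a legacy branch of 390 means that this GPU cannot be used with it.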


As before, it is recommended to create a separate build environment in Anaconda and then run the build:

$ sudo apt-get install git

# conda remove -n pytorch --all

$ conda create -n pytorch -y python=3.6.8 pip

$ source activate pytorch

$ conda install numpy pyyaml mkl=2019.1 mkl-include=2019.1 setuptools cmake cffi typing pybind11

$ conda install ninja

# magma-cuda90, magma-cuda91 and magma-cuda92 cause the build to fail
$ conda install -c pytorch magma-cuda101

$ git clone https://github.com/pytorch/pytorch

$ cd pytorch

# pytorch 1.0.1 supports hardware whose “Compute Capability” is as low as 3.0; pytorch 1.2.0 needs at least 3.5 to run properly
# https://github.com/pytorch/pytorch/blob/v1.3.0/torch/utils/cpp_extension.py
$ git checkout v1.0.1 -b v1.0.1

$ git submodule sync

$ git submodule update --init --recursive

$ export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}

# If you do not need cuda, also add this line here: export NO_CUDA=1

$ python setup.py clean

# Uninstall any previously installed pytorch
$ conda uninstall pytorch

# Look up the “Compute Capability” of your hardware on the Nvidia developer site 
# For example, “GeForce GTX 760” has compute capability “3.0”; a wrong value causes a runtime failure:
# RuntimeError: cuda runtime error (48) : no kernel image is available for execution on the device

$ python setup.py install

# For developer mode, you can use
# python setup.py build develop

# Be sure to leave the pytorch build directory; running commands from inside the pytorch source directory causes errors
$ cd ..
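
The listing mentions looking up the compute capability but does not show how the value reaches the build. PyTorch source builds normally take it from the `TORCH_CUDA_ARCH_LIST` environment variable; using it here is an assumption, since the original commands do not include it. The following is a minimal sketch, with `build_env` as a hypothetical helper:

```python
import os


def build_env(capability, base_env=None):
    """Return a copy of the environment with the target GPU architecture set."""
    env = dict(os.environ if base_env is None else base_env)
    major, minor = capability
    # Assumption: the build reads TORCH_CUDA_ARCH_LIST, e.g. "3.0" for a GTX 760.
    env['TORCH_CUDA_ARCH_LIST'] = '{}.{}'.format(major, minor)
    return env


env = build_env((3, 0), base_env={})
print(env['TORCH_CUDA_ARCH_LIST'])  # 3.0

# To launch the build with it (run from inside the pytorch checkout):
# import subprocess, sys
# subprocess.check_call([sys.executable, 'setup.py', 'install'], env=build_env((3, 0)))
```

The shell equivalent would be exporting the variable before `python setup.py install`.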


[ 68%] Building NVCC (Device) object caffe2/CMakeFiles/caffe2_gpu.dir/__/aten/src/ATen/native/sparse/cuda/caffe2_gpu_generated_SparseCUDABlas.cu.o
~/pytorch/aten/src/ATen/native/sparse/cuda/SparseCUDABlas.cu(58): error: more than one instance of function "at::native::sparse::cuda::cusparseGetErrorString" matches the argument list:
            function "cusparseGetErrorString(cusparseStatus_t)"
            function "at::native::sparse::cuda::cusparseGetErrorString(cusparseStatus_t)"
            argument types are: (cusparseStatus_t)

If the build fails with the error above, edit `aten/src/ATen/native/sparse/cuda/SparseCUDABlas.cu` and wrap the `cusparseGetErrorString` function in `#if (!((CUSPARSE_VER_MAJOR >= 10) && (CUSPARSE_VER_MINOR >= 2)))`


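#if (!((CUSPARSE_VER_MAJOR >= 10) && (CUSPARSE_VER_MINOR >= 2)))
const char* cusparseGetErrorString(cusparseStatus_t status) {
  switch (status) {
    case CUSPARSE_STATUS_SUCCESS:
      return "success";
    case CUSPARSE_STATUS_NOT_INITIALIZED:
      return "library not initialized";
    case CUSPARSE_STATUS_ALLOC_FAILED:
      return "resource allocation failed";
    case CUSPARSE_STATUS_INVALID_VALUE:
      return "an invalid numeric value was used as an argument";
    case CUSPARSE_STATUS_ARCH_MISMATCH:
      return "an absent device architectural feature is required";
    case CUSPARSE_STATUS_MAPPING_ERROR:
      return "an access to GPU memory space failed";
    case CUSPARSE_STATUS_EXECUTION_FAILED:
      return "the GPU program failed to execute";
    case CUSPARSE_STATUS_INTERNAL_ERROR:
      return "an internal operation failed";
    case CUSPARSE_STATUS_MATRIX_TYPE_NOT_SUPPORTED:
      return "the matrix type is not supported by this function";
    case CUSPARSE_STATUS_ZERO_PIVOT:
      return "an entry of the matrix is either structural zero or numerical zero (singular block)";
    default:
      return "unknown error";
  }
}
#endif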

This resolves the conflict with the function of the same name that ships with `CUDA-10.1`.

For details, see: https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/sparse/cuda/SparseCUDABlas.cu
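
One caveat about the guard as written: it tests the major and minor numbers separately, so a later release such as 11.0 would fail the minor test and the local definition would be compiled again. The following sketch mirrors the expression in Python to show the difference from a plain version comparison. Both function names are hypothetical, and this is an observation about the expression itself, not a tested build result.

```python
def guard_as_written(major, minor):
    """Mirror of !((MAJOR >= 10) && (MINOR >= 2)); True means the local function is compiled."""
    return not (major >= 10 and minor >= 2)


def guard_by_version(major, minor):
    """Compile the local function only for versions strictly older than 10.2."""
    return (major, minor) < (10, 2)


for version in [(9, 1), (10, 1), (10, 2), (11, 0)]:
    print(version, guard_as_written(*version), guard_by_version(*version))
```

The two functions agree for 9.1, 10.1 and 10.2, and differ only for versions like 11.0, where the minor number resets.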


# conda uninstall pytorch

$ pip uninstall torch

$ python setup.py clean

Downloading the PyTorch source code is very slow; a synced copy of the pytorch source can be downloaded from this site.


Running AlexNet on the Raspberry Pi 4B with the ARM Compute Library

# Install Build Tools 
$ pip install scons 

# Reload Environment
$ source ~/.profile

# Clone Compute Library 
$ git clone https://github.com/Arm-software/ComputeLibrary.git 

# or wget https://www.mobibrw.com/wp-content/uploads/2019/10/ComputeLibrary.zip

# Enter ComputeLibrary folder 
$ cd ComputeLibrary  

# Build the library and the examples 
$ scons Werror=1 debug=0 asserts=0 neon=1 opencl=1 examples=1 os=linux arch=armv7a -j4 

# Run on the Raspberry Pi
$ export LD_LIBRARY_PATH=build/ 

# Download AlexNet

# Install unzip
$ sudo apt-get install unzip

# Download the zip file with the AlexNet model, input images and labels
$ wget https://armkeil.blob.core.windows.net/developer/developer/technologies/Machine%20learning%20on%20Arm/Tutorials/Running%20AlexNet%20on%20Pi%20with%20Compute%20Library/compute_library_alexnet.zip

# or wget https://www.mobibrw.com/wp-content/uploads/2019/10/compute_library_alexnet.zip

# Create a new folder
$ mkdir assets_alexnet

# Unzip
$ unzip compute_library_alexnet.zip -d assets_alexnet

$ PATH_ASSETS=./assets_alexnet 

$ ./build/examples/graph_alexnet 0 $PATH_ASSETS  $PATH_ASSETS/go_kart.ppm $PATH_ASSETS/labels.txt
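
Before launching the example, it can help to confirm that the unzipped assets are where the command expects them. The following is a minimal sketch, assuming the archive unpacks `go_kart.ppm` and `labels.txt` directly into `assets_alexnet`, as the command above implies; `missing_assets` is a hypothetical helper.

```python
import os

REQUIRED = ('go_kart.ppm', 'labels.txt')  # files named in the command above


def missing_assets(assets_dir, required=REQUIRED):
    """Return the required files that are not present in assets_dir."""
    return [name for name in required
            if not os.path.isfile(os.path.join(assets_dir, name))]


# An empty list means both files were found.
print(missing_assets('./assets_alexnet'))
```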

Why GEMM is at the heart of deep learning

I spend most of my time worrying about how to make deep learning with neural networks faster and more power efficient. In practice that means focusing on a function called GEMM. It’s part of the BLAS (Basic Linear Algebra Subprograms) library that was first created in 1979, and until I started trying to optimize neural networks I’d never heard of it.
