Installing and testing Docker and nvidia-docker on Ubuntu 22.04, and running GUI applications


Docker documentation: https://docs.docker.com/; Docker installation guide: Install Docker Engine on Ubuntu


Uninstall old versions
~$ sudo apt-get remove docker docker-engine docker.io containerd runc

~$ sudo apt-get install curl
Install using the convenience script (get.docker.com)

~$ curl https://get.docker.com | sh \
  && sudo systemctl --now enable docker
Verify that Docker Engine is installed correctly by running the hello-world image.
~$ sudo docker run hello-world


# Start the docker service
$ sudo service docker start

# Docker: hello-world
$ sudo docker run hello-world


Usage: service docker {start|stop|restart|status}

$ sudo docker images

$ sudo docker container ls -a

$ nvidia-smi
Thu Nov 26 10:34:37 2020       
| NVIDIA-SMI 450.80.02    Driver Version: 450.80.02    CUDA Version: 11.0     |
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  GeForce RTX 2060    Off  | 00000000:01:00.0  On |                  N/A |
|  0%   37C    P8     9W / 190W |    301MiB /  5931MiB |      2%      Default |
|                               |                      |                  N/A |
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|    0   N/A  N/A       942      G   /usr/lib/xorg/Xorg                 35MiB |
|    0   N/A  N/A      2278      G   /usr/lib/xorg/Xorg                 96MiB |
|    0   N/A  N/A      2404      G   /usr/bin/gnome-shell              150MiB |
|    0   N/A  N/A      4051      G   /usr/lib/firefox/firefox            3MiB |

$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
      && curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
      && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
            sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
            sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
$ sudo apt-get update

$ sudo apt-get install -y nvidia-container-toolkit

$ sudo nvidia-ctk runtime configure --runtime=docker
$ sudo systemctl restart docker

$ docker pull nvidia/cudagl:11.0-base
# Test
$ docker run --rm --gpus all nvidia/cudagl:11.0-base nvidia-smi
Thu Nov 26 02:30:34 2020       
| NVIDIA-SMI 450.80.02    Driver Version: 450.80.02    CUDA Version: 11.0     |
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  GeForce RTX 2060    Off  | 00000000:01:00.0  On |                  N/A |
|  0%   37C    P8    10W / 190W |    307MiB /  5931MiB |     13%      Default |
|                               |                      |                  N/A |
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |


$ sudo docker run --runtime=nvidia --rm nvidia/cudagl:11.0-base nvidia-smi

Running GUI applications in a Docker container

$ sudo apt-get install x11-xserver-utils

# Disable X access control so that other X clients (such as the container) can draw on the display
$ xhost +
access control disabled, clients can connect from any host

$ docker run -e DISPLAY=$DISPLAY -e GDK_SCALE -e GDK_DPI_SCALE -v /tmp/.X11-unix:/tmp/.X11-unix --rm -it image-name-or-id

$ sudo apt-get install x11-xserver-utils

$ nvidia-smi

$ docker pull nvidia/cudagl:11.0-base

# Disable X access control so that other X clients (such as the container) can draw on the display
$ xhost +
access control disabled, clients can connect from any host

$ sudo docker run --rm --runtime=nvidia -it -e DISPLAY=$DISPLAY -e GDK_SCALE -e GDK_DPI_SCALE -v /tmp/.X11-unix:/tmp/.X11-unix nvidia/cudagl:11.0-base

$ apt-get update

$ apt-get install mesa-utils

$ glxgears
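If you start GUI containers often, it can help to assemble the long `docker run` line in a script. The sketch below is a hypothetical helper, not part of Docker: `gui_docker_command` only builds the argument list from the manual command above (forward `DISPLAY`, pass through `GDK_SCALE`/`GDK_DPI_SCALE`, bind-mount `/tmp/.X11-unix`). For the GPU it uses `--gpus all`, the form used in the earlier test command, instead of the legacy `--runtime=nvidia`. It assumes Docker 19.03+ with the NVIDIA Container Toolkit configured as described above.

```python
import os
import shlex


def gui_docker_command(image, display=None, use_gpu=True, extra_volumes=()):
    """Build a `docker run` argument list that forwards the host X11 display.

    Hypothetical helper mirroring the manual command: DISPLAY, GDK_SCALE and
    GDK_DPI_SCALE are passed through, and the X11 socket directory is mounted.
    """
    display = display if display is not None else os.environ.get("DISPLAY", ":0")
    cmd = ["docker", "run", "--rm", "-it"]
    if use_gpu:
        # Needs the NVIDIA Container Toolkit (Docker 19.03+ syntax).
        cmd += ["--gpus", "all"]
    cmd += [
        "-e", f"DISPLAY={display}",
        "-e", "GDK_SCALE",        # value taken from the host environment if set
        "-e", "GDK_DPI_SCALE",
        "-v", "/tmp/.X11-unix:/tmp/.X11-unix",
    ]
    for host_path, container_path in extra_volumes:
        cmd += ["-v", f"{host_path}:{container_path}"]
    cmd.append(image)
    return cmd


if __name__ == "__main__":
    # Print a copy-pasteable shell command for the image used above.
    print(shlex.join(gui_docker_command("nvidia/cudagl:11.0-base", display=":1")))
```

The list form can be handed straight to `subprocess.run(cmd)`, which avoids shell-quoting problems with paths that contain spaces.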


$ sudo apt-get install x11-xserver-utils

$ nvidia-smi

$ docker pull nvidia/cudagl:10.2-base

# Disable X access control so that other X clients (such as the container) can draw on the display
$ xhost +
access control disabled, clients can connect from any host

$ sudo docker run -it --name isaac --runtime=nvidia -e DISPLAY=$DISPLAY -e GDK_SCALE -e GDK_DPI_SCALE -v /tmp/.X11-unix:/tmp/.X11-unix -v /data/isaac:/data/isaac nvidia/cudagl:10.2-base

$ apt-get update

$ apt-get install mesa-utils

$ glxgears

$ exit

$ sudo docker ps -a

$ sudo docker start isaac

$ sudo docker attach isaac


Nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory





The figure below shows the pinhole camera model (figure and formulas based on the official OpenCV documentation).


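  1. World coordinate system: the origin can be chosen as convenient. It describes objects in space in length units such as mm (millimeters) and is written $\begin{bmatrix} X_W \\ Y_W \\Z_W \end{bmatrix}$;
  2. Camera coordinate system: the origin is the camera's optical center (the pinhole in the pinhole model). The z axis coincides with the optical axis, i.e. it points forward from the camera, perpendicular to the image plane; the x and y axes are parallel to the x and y axes of the image plane. Units are lengths such as mm, written $\begin{bmatrix}X_c \\ Y_c \\ Z_c\end{bmatrix}$;
  3. Physical image coordinate system (also called the image plane coordinate system): positions on the image are expressed in physical length units such as mm. The origin is the point where the optical axis intersects the image plane. It is written $\begin{bmatrix}x \\ y \end{bmatrix}$;
  4. Pixel coordinate system: the origin is at the top-left corner and the unit is the pixel. Its range is bounded by the image width and height in pixels. It is written $\begin{bmatrix}u \\ v \end{bmatrix}$.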

The following formula describes the transformations between $\begin{bmatrix}u & v \end{bmatrix}^T$, $\begin{bmatrix}x & y \end{bmatrix}^T$, $\begin{bmatrix}X_c & Y_c & Z_c\end{bmatrix}^T$ and $\begin{bmatrix}X_W & Y_W & Z_W \end{bmatrix}^T$.

$z\begin{bmatrix}u \\ v\\ 1 \end{bmatrix}= \begin{bmatrix}1/d_x&0&c_x\\0&1/d_y&c_y\\0&0&1 \end{bmatrix} \begin{bmatrix}f&0&0\\ 0&f&0\\ 0&0&1 \end{bmatrix} \begin{bmatrix}r11&r12&r13&t1\\ r21&r22&r23&t2\\ r31&r32&r33&t3 \end{bmatrix} \begin{bmatrix}X_W \\ Y_W \\Z_W \\ 1\end{bmatrix}$

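In this formula, $z$ is the depth $Z_c$ of the point in the camera frame. $d_x$ and $d_y$ give the physical length of one pixel, i.e. the sensor size divided by the number of pixels. For example, a 2928.384 µm × 2205.216 µm sensor with a resolution of 2592 × 1944 has pixels of about 1.13 µm (2928.384 / 2592 ≈ 1.130 µm, 2205.216 / 1944 ≈ 1.134 µm).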
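The pixel-size arithmetic can be checked in a few lines. The sketch uses the sensor numbers quoted above. It also shows how $f_x = f/d_x$ and $f_y = f/d_y$ express the focal length in pixels. The focal length `f_mm` is a hypothetical value chosen only for illustration; it is not given in the text.

```python
# Pixel pitch = sensor size / pixel count, for the sensor quoted above.
sensor_w_um, sensor_h_um = 2928.384, 2205.216   # active area in micrometres
res_w, res_h = 2592, 1944                        # resolution in pixels

d_x = sensor_w_um / res_w   # micrometres per pixel, horizontal
d_y = sensor_h_um / res_h   # micrometres per pixel, vertical
print(f"d_x = {d_x:.4f} um, d_y = {d_y:.4f} um")

# Focal length in pixels: f_x = f / d_x, f_y = f / d_y.
# f_mm is a hypothetical lens focal length used only for illustration.
f_mm = 3.6
f_x = f_mm * 1000 / d_x
f_y = f_mm * 1000 / d_y
print(f"f_x = {f_x:.1f} px, f_y = {f_y:.1f} px")
```

Because $d_x$ and $d_y$ differ slightly here, $f_x$ and $f_y$ differ slightly too. This is why the intrinsic matrix keeps two separate focal terms.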

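The camera frame and the world frame are both three-dimensional, so converting between them only requires a rotation and a translation. An object's absolute size does not change with the coordinate system, so no scaling is involved. The transformation matrix $\begin{bmatrix}r11&r12&r13&t1\\ r21&r22&r23&t2\\ r31&r32&r33&t3 \end{bmatrix}$ consists of a 3×3 rotation matrix r (rotation) and a 3×1 translation vector t (translation).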


By similar triangles, $\frac{X_c}{x} = \frac{Y_c}{y} = \frac{Z_c}{f}$, i.e. $x=X_c/(\frac{Z_c}{f})$ and $y=Y_c/(\frac{Z_c}{f})$. Hence the larger $f$, the larger $x$ and $y$; the larger $Z_c$, the smaller $x$ and $y$.
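Below is a minimal NumPy sketch of the forward pinhole projection $z[u\ v\ 1]^T = K(R X_w + t)$, without lens distortion. It makes the relation above concrete: moving the same point twice as far from the camera halves its offset from the principal point. `project_points` is an illustrative helper, not an OpenCV function. The intrinsics and extrinsics are hypothetical values.

```python
import numpy as np


def project_points(K, R, t, world_pts):
    """Project Nx3 world points to Nx2 pixel coordinates (pinhole, no distortion)."""
    world_pts = np.asarray(world_pts, dtype=np.float64)
    cam = world_pts @ R.T + t.reshape(1, 3)   # camera coordinates X_c, Y_c, Z_c
    x = cam[:, 0] / cam[:, 2]                  # X_c / Z_c
    y = cam[:, 1] / cam[:, 2]                  # Y_c / Z_c
    u = K[0, 0] * x + K[0, 2]                  # u = f_x * x + c_x
    v = K[1, 1] * y + K[1, 2]                  # v = f_y * y + c_y
    return np.stack([u, v], axis=1)


# Hypothetical camera: f_x = f_y = 800 px, principal point (320, 240).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([0.0, 0.0, 1000.0])   # world origin 1000 mm in front of the camera

pts = [[100.0, 50.0, 0.0]]
print(project_points(K, R, t, pts))       # Z_c = 1000
print(project_points(K, R, 2 * t, pts))   # Z_c = 2000: half the offset from (c_x, c_y)
```

For real work, OpenCV's `cv::projectPoints` also applies the distortion model. The sketch only shows the linear part of the formula.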




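Write $f_x = f/d_x$ and $f_y = f/d_y$ (the focal length in pixels) and $z = Z_c$. From here on, $x = X_c/Z_c$ and $y = Y_c/Z_c$ denote the normalized image coordinates:

$z\begin{bmatrix}u \\ v\\ 1 \end{bmatrix} = \begin{bmatrix}f_x&0&c_x\\0&f_y&c_y\\0&0&1 \end{bmatrix} \begin{bmatrix}X_c \\ Y_c\\ Z_c \end{bmatrix} = zK\begin{bmatrix}x \\ y\\ 1 \end{bmatrix}$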


$\begin{bmatrix}x \\ y\\ 1 \end{bmatrix}=K^{-1} \begin{bmatrix}u \\ v\\ 1 \end{bmatrix}$


$\begin{bmatrix}X_c \\ Y_c\\ Z_c \end{bmatrix} = R \begin{bmatrix}X \\ Y\\ Z \end{bmatrix} + t$


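$\begin{bmatrix}X \\ Y\\ Z \end{bmatrix} = R^{-1}\left(\begin{bmatrix}X_c \\ Y_c \\ Z_c \end{bmatrix} - t\right) = zR^{-1}\begin{bmatrix}x\\ y\\ 1 \end{bmatrix} - R^{-1}t$

For a point on the calibration plane $Z = 0$, the third row gives $z = \dfrac{(R^{-1}t)_3}{\left(R^{-1}\begin{bmatrix}x & y & 1\end{bmatrix}^T\right)_3}$, which is the `scale` computed in the code below.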
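The back-projection formula can be checked by a round trip. The sketch projects a point on the plane $Z = 0$ into the image, then recovers it with $[X\ Y\ Z]^T = zR^{-1}K^{-1}[u\ v\ 1]^T - R^{-1}t$, choosing $z$ so that $Z = 0$. `pixel_to_plane` is an illustrative name. The camera pose and intrinsics are hypothetical values.

```python
import numpy as np


def pixel_to_plane(K, R, t, uv):
    """Back-project pixel (u, v) onto the world plane Z = 0."""
    R_inv = np.linalg.inv(R)                                  # equals R.T for a rotation
    ray = R_inv @ np.linalg.inv(K) @ np.array([uv[0], uv[1], 1.0])
    offset = R_inv @ t
    z = offset[2] / ray[2]                                    # third row: z*ray_Z - offset_Z = 0
    return z * ray - offset


# Hypothetical camera: tilted 30 degrees about X, about 800 mm from the plane.
theta = np.deg2rad(30.0)
R = np.array([[1.0, 0.0, 0.0],
              [0.0, np.cos(theta), -np.sin(theta)],
              [0.0, np.sin(theta), np.cos(theta)]])
t = np.array([10.0, -20.0, 800.0])
K = np.array([[900.0, 0.0, 640.0],
              [0.0, 900.0, 360.0],
              [0.0, 0.0, 1.0]])

# Forward-project a point on Z = 0, then recover it from its pixel.
P_w = np.array([120.0, 90.0, 0.0])
P_c = R @ P_w + t
uv = (K @ P_c)[:2] / P_c[2]
print(pixel_to_plane(K, R, t, uv))
```

The recovered $z$ equals $Z_c$ of the original point. The division fails only when the viewing ray is parallel to the plane (`ray[2] == 0`).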



#include <opencv2/opencv.hpp>
#include <vector>
using namespace cv;
using namespace std;

// Back-project image points onto the world plane Z = 0
void cameraToWorld(InputArray cameraMatrix, InputArray rV, InputArray tV,
                   const vector<Point2f> &imgPoints, vector<Point3f> &worldPoints)
{
    Mat invK64, invK;
    invK64 = cameraMatrix.getMat().inv();
    invK64.convertTo(invK, CV_32F);
    Mat r, t, rMat;
    rV.getMat().convertTo(r, CV_32F);
    tV.getMat().convertTo(t, CV_32F);
    Rodrigues(r, rMat);   // rotation vector -> 3x3 rotation matrix

    // Compute invR * t
    Mat invR = rMat.inv();
    Mat transPlaneToCam;
    if (t.size() == Size(1, 3)) {          // column vector
        transPlaneToCam = invR * t;
    } else if (t.size() == Size(3, 1)) {   // row vector
        transPlaneToCam = invR * t.t();
    }

    worldPoints.clear();
    int npoints = (int)imgPoints.size();
    for (int j = 0; j < npoints; ++j) {
        Mat coords(3, 1, CV_32F);
        coords.at<float>(0, 0) = imgPoints[j].x;
        coords.at<float>(1, 0) = imgPoints[j].y;
        coords.at<float>(2, 0) = 1.0f;
        // [x, y, 1] = invK * [u, v, 1]
        Mat worldPtCam = invK * coords;
        // invR * [x, y, 1]
        Mat worldPtPlane = invR * worldPtCam;
        // Choose z so that the resulting world point has Z = 0
        float scale = transPlaneToCam.at<float>(2) / worldPtPlane.at<float>(2);
        // z * invR * [x, y, 1]
        Mat scale_worldPtPlane = scale * worldPtPlane;
        // [X, Y, Z] = z * invR * [x, y, 1] - invR * t
        Mat worldPtPlaneReproject = scale_worldPtPlane - transPlaneToCam;
        Point3f pt;
        pt.x = worldPtPlaneReproject.at<float>(0);
        pt.y = worldPtPlaneReproject.at<float>(1);
        pt.z = 0.0f;   // the point lies on the Z = 0 plane by construction
        worldPoints.push_back(pt);
    }
}

    def cameraToWorld(self, cameraMatrix, r, t, imgPoints):
        # Back-project image points onto the world plane Z = 0
        # (requires module-level imports: import numpy as np, import cv2)
        invK = np.linalg.inv(np.asarray(cameraMatrix, dtype=np.float64))
        rMat, _ = cv2.Rodrigues(np.asarray(r, dtype=np.float64))  # rotation vector -> 3x3 matrix
        # Compute invR * t
        invR = np.linalg.inv(rMat)  # 3x3
        transPlaneToCam = invR @ np.asarray(t, dtype=np.float64).reshape(3, 1)  # 3x3 @ 3x1 = 3x1
        worldpt = []
        for imgpt in imgPoints:
            # accepts (1, 2) corners from findChessboardCorners as well as plain (2,) points
            u, v = np.ravel(imgpt)[:2]
            coords = np.array([[u], [v], [1.0]])
            worldPtCam = invK @ coords  # [x, y, 1] = invK * [u, v, 1], 3x1
            worldPtPlane = invR @ worldPtCam  # invR * [x, y, 1], 3x1
            # Choose z so that the resulting world point has Z = 0
            scale = transPlaneToCam[2, 0] / worldPtPlane[2, 0]
            # [X, Y, Z] = z * invR * [x, y, 1] - invR * t
            worldPtPlaneReproject = scale * worldPtPlane - transPlaneToCam
            pt = np.zeros((3, 1), dtype=np.float64)
            pt[0, 0] = worldPtPlaneReproject[0, 0]
            pt[1, 0] = worldPtPlaneReproject[1, 0]
            pt[2, 0] = 0  # the point lies on the Z = 0 plane
            worldpt.append(pt)
        return worldpt



Vec3f eulerAngles;  // Euler angles

vector<Vec3d> translation_vectors;  /* translation vector of each image */

// eulerAnglesToRotationMatrix() is a user-defined helper, not an OpenCV function
Mat rotationMatrix = eulerAnglesToRotationMatrix(eulerAngles);

// Rotation matrix -> rotation vector (C++ API equivalent of the legacy cvRodrigues2 call)
Mat mat_tmp;
Rodrigues(rotationMatrix, mat_tmp);

cv::Mat distortion_coeffs1 = cv::Mat(1, 5, CV_32FC1, cv::Scalar::all(0)); /* the camera's 5 distortion coefficients k1, k2, p1, p2, k3 (all zero here) */

projectPoints(tempPointSet, mat_tmp, translation_vectors[i], intrinsic_matrix, distortion_coeffs1, image_points2);


[0, 0, 0] // Euler angles (degrees): the angle between the plane and the camera
Rotation vector: [0, 0, 0]


Original: [134.0870803179094, 132.7580766544178, 200.3789038923399]


        Size board_size = Size(11, 8);
        Size square_size = Size(30, 30);
        vector<Point3f> tempPointSet;
        for (int j = 0; j < board_size.height; j++) {
            for (int i = 0; i < board_size.width; i++) {
                /* Assume the calibration board lies on the z = 0 plane of the world frame */
                Point3f tempPoint;
                tempPoint.x = i * square_size.width;
                tempPoint.y = j * square_size.height;
                tempPoint.z = 0;
                tempPointSet.push_back(tempPoint);
            }
        }
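The same board corner coordinates can be generated in NumPy. This is useful when the calibration is driven from Python, as in the `cameraToWorld` method above. The sketch uses the common `np.mgrid` idiom and keeps the order of the C++ loops, with the column index `i` varying fastest. `board_object_points` is an illustrative name.

```python
import numpy as np


def board_object_points(cols, rows, square_w, square_h):
    """Chessboard corner coordinates on the world plane Z = 0, row-major (x fastest)."""
    pts = np.zeros((rows * cols, 3), dtype=np.float32)
    grid = np.mgrid[0:cols, 0:rows].T.reshape(-1, 2)   # (i, j) pairs, i varies fastest
    pts[:, 0] = grid[:, 0] * square_w                 # x = i * square width
    pts[:, 1] = grid[:, 1] * square_h                 # y = j * square height
    return pts


obj = board_object_points(11, 8, 30, 30)   # 11 x 8 inner corners, 30 mm squares
print(obj.shape)
print(obj[:3])
```

The result can be passed as one entry of `objectPoints` to `cv2.calibrateCamera` or as the object points of `cv2.projectPoints`.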


projectPoints(tempPointSet, mat_tmp, translation_vectors[i], intrinsic_matrix, distortion_coeffs1, image_points2);
cout << "Projected image points:\n" << image_points2 << endl;
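In industry, hand-eye calibration comes in two forms depending on where the camera is mounted. If the camera is fixed to the robot's end-effector, the setup is called "eye in hand". If the camera is fixed on a base outside the robot, it is called "eye to hand".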

[Figures: eye to hand (camera mounted outside the robot) and eye in hand (camera mounted on the robot end-effector)]


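1. Eye in hand: across two motions, the pose between the robot base and the calibration board stays unchanged. The unknown is the pose between the camera and the robot end-effector frame.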

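2. Eye to hand: across two motions, the pose between the robot end-effector and the calibration board stays unchanged. The unknown is the pose between the camera and the robot base frame.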

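The main difference between eye-in-hand and eye-to-hand calibration is on the robot side: one uses the end-effector pose relative to the base, the other the base pose relative to the end-effector. Pay close attention to this.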

3. Solving the AX = XB problem


  • Y. Shiu, S. Ahmad Calibration of Wrist-Mounted Robotic Sensors by Solving Homogeneous Transform Equations of the Form AX = XB. In IEEE Transactions on Robotics and Automation, 5(1):16-29, 1989.
  • R. Tsai, R. Lenz A New Technique for Fully Autonomous and Efficient 3D Robotics Hand/Eye Calibration. In IEEE Transactions on Robotics and Automation, 5(3):345-358, 1989.

For iterative solutions and related material, see the English tutorial Calibration and Registration Techniques for Robotics. It also provides MATLAB code for AX = XB.

ROS also has several packages that can be used for this.


============== Halcon official example: hand-eye calibration ==================


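The camera and robot are in eye-to-hand mode. The robot is a 6-axis Kinova arm (Canada). The robot pose is the base relative to the end-effector, given as x, y, z, rx, ry, rz, rw in meters; the orientation is a unit quaternion.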



The pattern pose is the calibration board relative to the camera, given as x, y, z, rx, ry, rz, rw in meters; the orientation is a unit quaternion.



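This MATLAB script loads the data and solves offline. The section "Registering Two Sets of 6DoF Data with 1 Unknown" of Calibration and Registration Techniques for Robotics has code to download. Save it as shiu.m and tsai.m so that the program below can call it. The Tsai version is pasted here: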

function X = tsai(A,B)
% Calculates the least squares solution of
% AX = XB
% A New Technique for Fully Autonomous 
% and Efficient 3D Robotics Hand/Eye Calibration
% Lenz Tsai
% Mili Shah
% July 2014
% (Named tsai so that the file name tsai.m matches the call tsai(M1, M2) below.)
[m,n] = size(A); n = n/4;
S = zeros(3*n,3);
v = zeros(3*n,1);
%Calculate best rotation R
for i = 1:n
    A1 = logm(A(1:3,4*i-3:4*i-1)); 
    B1 = logm(B(1:3,4*i-3:4*i-1));
    a = [A1(3,2) A1(1,3) A1(2,1)]'; a = a/norm(a);
    b = [B1(3,2) B1(1,3) B1(2,1)]'; b = b/norm(b);
    S(3*i-2:3*i,:) = skew(a+b);
    v(3*i-2:3*i,:) = a-b;
end
x = S\v;
theta = 2*atan(norm(x));
x = x/norm(x);
R = (eye(3)*cos(theta) + sin(theta)*skew(x) + (1-cos(theta))*x*x')';
%Calculate best translation t
C = zeros(3*n,3);
d = zeros(3*n,1);
I = eye(3);
for i = 1:n
    C(3*i-2:3*i,:) = I - A(1:3,4*i-3:4*i-1);
    d(3*i-2:3*i,:) = A(1:3,4*i)-R*B(1:3,4*i);
end
t = C\d;
%Put everything together to form X
X = [R t;0 0 0 1];
end

function S = skew(w)
% 3x3 skew-symmetric (cross-product) matrix of a 3-vector
S = [0 -w(3) w(2); w(3) 0 -w(1); -w(2) w(1) 0];
end
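For readers without MATLAB, the same Tsai-Lenz steps can be sketched in NumPy. This is a transcription under two assumptions. First, instead of `logm`, the rotation axis is taken in closed form from the antisymmetric part of each rotation, which is valid only for rotation angles strictly between 0 and π. Second, `np.linalg.lstsq` replaces MATLAB's backslash operator. The function and helper names are illustrative. Inputs are lists of 4x4 relative motions rather than one concatenated matrix.

```python
import numpy as np


def skew(w):
    """3x3 cross-product matrix of a 3-vector."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])


def rotation_axis(R):
    """Unit rotation axis of R (assumes 0 < angle < pi)."""
    a = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    return a / np.linalg.norm(a)


def tsai(As, Bs):
    """Least-squares X for A_i X = X B_i; As, Bs are lists of 4x4 relative motions."""
    axes = [(rotation_axis(A[:3, :3]), rotation_axis(B[:3, :3])) for A, B in zip(As, Bs)]
    S = np.vstack([skew(a + b) for a, b in axes])
    v = np.concatenate([a - b for a, b in axes])
    x = np.linalg.lstsq(S, v, rcond=None)[0]
    theta = 2.0 * np.arctan(np.linalg.norm(x))
    k = x / np.linalg.norm(x)
    # Same construction as the MATLAB line, including the final transpose.
    R = (np.eye(3) * np.cos(theta) + np.sin(theta) * skew(k)
         + (1.0 - np.cos(theta)) * np.outer(k, k)).T
    # Translation: (I - R_A) t = t_A - R t_B, stacked over all motions.
    C = np.vstack([np.eye(3) - A[:3, :3] for A in As])
    d = np.concatenate([A[:3, 3] - R @ B[:3, 3] for A, B in zip(As, Bs)])
    t = np.linalg.lstsq(C, d, rcond=None)[0]
    X = np.eye(4)
    X[:3, :3], X[:3, 3] = R, t
    return X
```

At least two motions with non-parallel rotation axes are needed. With only one, the stacked systems are rank-deficient.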

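The test program Jaco_handeye_test_10.m uses Peter Corke's Robotics Toolbox. My MATLAB version is R2013a. The program uses some of the toolbox's conversion functions (quaternion construction, Euler-angle conversion, etc.). For installation and basic usage, see: Matlab机器人工具箱_Learning by doing-CSDN博客_matlab机器人工具箱 (a CSDN blog post on the MATLAB Robotics Toolbox)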

close all;
% Robot pose: x, y, z, then quaternion in xyzw order
 JacoCartesianPose = importdata('E:\\Matlab_work\\handeye\\yake_handeye\\2017.08.29Kinova_pose_all_8_1.txt');
 [m,n] = size(JacoCartesianPose); % 8* 7
A = cell(1,m); % 1*8
for i = 1: m
    % Robotics Toolbox Quaternion takes [s vx vy vz], i.e. w first
    robotHtool_qua = Quaternion([ JacoCartesianPose(i, 7), JacoCartesianPose(i, 4), JacoCartesianPose(i, 5) , JacoCartesianPose(i, 6)]) ; 
    A{1,i}  = transl(JacoCartesianPose(i, 1), JacoCartesianPose(i, 2), JacoCartesianPose(i, 3)) *  robotHtool_qua.T;
end
% Pattern Pose(Homogeneous) stored in  cell B.
patternInCamPose = importdata('E:\\Matlab_work\\handeye\\yake_handeye\\2017.08.29Pattern_pose_all_8_1.txt');
[melem,nelem] = size(patternInCamPose); % 8*7
B = cell(1,melem);
for x=1: melem
    camHmarker_qua = Quaternion([ patternInCamPose(x, 7) , patternInCamPose(x, 4), patternInCamPose(x, 5) , patternInCamPose(x, 6)])    ;
    B{1,x} = transl(patternInCamPose(x, 1), patternInCamPose(x, 2), patternInCamPose(x, 3)) *  camHmarker_qua.T;
end
%--------------------- 8 -----------------------------------
% Relative motions between consecutive poses, concatenated as [T1 T2 ... T7] (4x28)
M1 = [];
M2 = [];
 for j=1:m-1 % 1-7.
     TA{1, j} = inv(A{1, j}) * A{1, j+1};    
     M1=[M1 TA{1, j}];
     TB{1, j} = B{1, j} * inv(B{1, j+1});
     M2=[M2 TB{1, j}];
 end
 % M1=[TA{1,1} TA{1,2} TA{1,3} TA{1,4} TA{1,5} TA{1,6} TA{1,7} ];
 % M2=[TB{1,1} TB{1,2} TB{1,3} TB{1,4} TB{1,5} TB{1,6} TB{1,7} ];
%--------------------- 8 -----------------------------------
C_Tsai=tsai(M1, M2)
T_Tsai =  (transl(C_Tsai))';
C_Tsai_rad = tr2rpy(C_Tsai);
C_Tsai_rpy_rx_ry_rz =rad2deg(C_Tsai_rad);
Q_Tsai_Qxyzw = Quaternion(t2r(C_Tsai)); % unit quaternion of the rotation part of C_Tsai
fprintf('Tsai(rad) = \n%f, %f, %f, %f, %f, %f\n',T_Tsai(1,1), T_Tsai(1,2), T_Tsai(1,3), C_Tsai_rad(1,1), C_Tsai_rad(1,2), C_Tsai_rad(1,3));
fprintf('Tsai(deg) = \n%f, %f, %f, %f, %f, %f\n\n',T_Tsai(1,1), T_Tsai(1,2), T_Tsai(1,3), C_Tsai_rpy_rx_ry_rz(1,1), C_Tsai_rpy_rx_ry_rz(1,2), C_Tsai_rpy_rx_ry_rz(1,3));
fprintf('Tsai(Qxyzw) = \n %f, %f, %f, %f\n\n',  Q_Tsai_Qxyzw.v(1),  Q_Tsai_Qxyzw.v(2), Q_Tsai_Qxyzw.v(3), Q_Tsai_Qxyzw.s);

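Briefly, the program reads the paired robot and camera poses and converts each one to a 4×4 homogeneous transformation matrix. It then forms the relative motions between consecutive poses and passes them to tsai.m to solve.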
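The data preparation step (pose row to 4x4 matrix, then relative motions) can be sketched in NumPy for readers who are not using the Robotics Toolbox. The sketch assumes each row is `x, y, z, rx, ry, rz, rw` with the quaternion in xyzw order, as described for the Kinova and pattern files. It forms `inv(A_j) * A_{j+1}` and `B_j * inv(B_{j+1})` exactly as the MATLAB script does. The function names are illustrative. The returned lists correspond to the blocks of `M1` and `M2`.

```python
import numpy as np


def quat_xyzw_to_matrix(qx, qy, qz, qw):
    """Rotation matrix of a unit quaternion given in x, y, z, w order."""
    n = np.sqrt(qx * qx + qy * qy + qz * qz + qw * qw)   # renormalize defensively
    qx, qy, qz, qw = qx / n, qy / n, qz / n, qw / n
    return np.array([
        [1 - 2 * (qy * qy + qz * qz), 2 * (qx * qy - qz * qw), 2 * (qx * qz + qy * qw)],
        [2 * (qx * qy + qz * qw), 1 - 2 * (qx * qx + qz * qz), 2 * (qy * qz - qx * qw)],
        [2 * (qx * qz - qy * qw), 2 * (qy * qz + qx * qw), 1 - 2 * (qx * qx + qy * qy)],
    ])


def pose_row_to_matrix(row):
    """One data row x, y, z, rx, ry, rz, rw -> 4x4 homogeneous transform."""
    T = np.eye(4)
    T[:3, :3] = quat_xyzw_to_matrix(*row[3:7])
    T[:3, 3] = row[:3]
    return T


def relative_motions(robot_rows, pattern_rows):
    """TA_j = inv(A_j) A_{j+1} and TB_j = B_j inv(B_{j+1}), as in the MATLAB script."""
    A = [pose_row_to_matrix(r) for r in robot_rows]
    B = [pose_row_to_matrix(r) for r in pattern_rows]
    TA = [np.linalg.inv(A[j]) @ A[j + 1] for j in range(len(A) - 1)]
    TB = [B[j] @ np.linalg.inv(B[j + 1]) for j in range(len(B) - 1)]
    return TA, TB
```

With 8 poses per file this gives 7 pairs, matching the `1-7` loop in the script.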







//Solve equation: AX = b (least-squares solution via SVD)
#include <cstdio>
#include <iostream>
#include <opencv2/opencv.hpp>
using namespace std;
using namespace cv;
int main(int argc, char** argv)
{
	printf("\nSolve equation:AX=b\n\n");
	//Mat A = (Mat_<float>(6, 3) <<
	//480.8, 639.4, 1,
	//227.1, 317.5, 1,
	//292.4, 781.6, 1,
	//597.4, 1044.1, 1,
	//557.7, 491.6, 1,
	//717.8, 263.7, 1
	//		 );// 6x3
	//Mat B = (Mat_<float>(6, 3) <<
	//197170, 185349, 1,
	//195830, 186789, 1,
	//196174, 184591, 1,
	//197787, 183176, 1,
	//197575, 186133, 1,
	//198466, 187335, 1
	//		 );
	Mat A = (Mat_<float>(4, 3) << 
	2926.36, 2607.79, 1, 
	587.093, 2616.89, 1, 
	537.031, 250.311, 1, 
	1160.53, 1265.21, 1);// 4x3
	Mat B = (Mat_<float>(4, 3) << 
	320.389, 208.197, 1,
	247.77, 209.726, 1,
	242.809, 283.182, 1,
	263.152, 253.715, 1);
	Mat X;
	cout << "A=" << endl << A << endl;
	cout << "B=" << endl << B << endl;
	// Overdetermined system (4 equations, 3 unknowns per column): DECOMP_SVD gives the least-squares solution
	solve(A, B, X, DECOMP_SVD);
	cout << "X=" << endl << X << endl;
	// Map a new image point with the fitted X
	Mat a1 = (Mat_<float>(1, 3) << 1864, 1273, 1);
	Mat b1 = a1*X;
	cout << "b1=" << endl << b1 << endl;
	cout << "Ground truth: " << "283.265, 253.049, 1" << endl;
	return 0;
}

OpenCV: Operations on arrays
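For comparison, below is a NumPy sketch of the same least-squares fit. `np.linalg.lstsq` solves `A X = B` in the least-squares sense (SVD-based), which is what `solve(..., DECOMP_SVD)` does for an overdetermined system. It uses the four point pairs from the C++ example. How close `b1` comes to the quoted ground truth depends on the data; no specific residual is claimed here.

```python
import numpy as np

# Same data as the C++ example: image points (A) and robot points (B), one per row.
A = np.array([[2926.36, 2607.79, 1.0],
              [587.093, 2616.89, 1.0],
              [537.031, 250.311, 1.0],
              [1160.53, 1265.21, 1.0]])
B = np.array([[320.389, 208.197, 1.0],
              [247.77, 209.726, 1.0],
              [242.809, 283.182, 1.0],
              [263.152, 253.715, 1.0]])

# Least-squares solution of A X = B (4 equations per column, 3 unknowns).
X, residuals, rank, sv = np.linalg.lstsq(A, B, rcond=None)

a1 = np.array([1864.0, 1273.0, 1.0])
b1 = a1 @ X        # predicted robot coordinates for a new image point
print("X =\n", X)
print("b1 =", b1)  # compare with the ground truth 283.265, 253.049, 1
```

Using more than three well-spread point pairs, as here, averages out measurement noise. That is the reason to prefer the least-squares fit over the exact three-point affine solution.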

// #include "pch.h"  // only needed in Visual Studio projects that use precompiled headers
#include <cstdio>
#include <iostream>
#include <opencv2/opencv.hpp>
using namespace cv;
int main(int argc, char** argv)
{
	printf("\nAffine transform from three point pairs\n\n");
	Point2f srcTri[3];
	srcTri[0] = Point2f(480.8f, 639.4f);
	srcTri[1] = Point2f(227.1f, 317.5f);
	srcTri[2] = Point2f(292.4f, 781.6f);
	Point2f dstTri[3];
	dstTri[0] = Point2f(197170.0f, 185349.0f);
	dstTri[1] = Point2f(195830.0f, 186789.0f);
	dstTri[2] = Point2f(196174.0f, 184591.0f);
	Mat warp_mat = getAffineTransform(srcTri, dstTri);
	// Print the image points, the robot points and the resulting 2x3 affine matrix
	// (printing the arrays directly would only print their addresses)
	std::cout << "img_corners:" << std::endl;
	for (int i = 0; i < 3; ++i) std::cout << srcTri[i] << std::endl;
	std::cout << "robot_corners:" << std::endl;
	for (int i = 0; i < 3; ++i) std::cout << dstTri[i] << std::endl;
	std::cout << warp_mat << std::endl;
	//[5.284835627018051, -0.00236967559639317, 194630.5662011061;
	//0.4056177440315953, -4.793119669651512, 188218.6997054448]
	return 0;
}
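What `getAffineTransform` computes can be reproduced with plain linear algebra. Three point pairs give six equations for the six affine coefficients. These split into two independent 3x3 systems, one per output coordinate. The sketch below (`affine_from_3_points` is an illustrative name) uses the same points as the C++ code, so its result can be compared with the matrix recorded in the comment there.

```python
import numpy as np


def affine_from_3_points(src, dst):
    """2x3 matrix M with M @ [x, y, 1]^T = [x', y']^T for exactly three point pairs."""
    src = np.asarray(src, dtype=np.float64)
    dst = np.asarray(dst, dtype=np.float64)
    P = np.hstack([src, np.ones((3, 1))])     # rows [x, y, 1]
    # Each output coordinate is an independent system P @ m_row = dst_column.
    return np.linalg.solve(P, dst).T          # 2x3


src = [(480.8, 639.4), (227.1, 317.5), (292.4, 781.6)]
dst = [(197170.0, 185349.0), (195830.0, 186789.0), (196174.0, 184591.0)]
M = affine_from_3_points(src, dst)
print(M)
```

The three source points must not be collinear, otherwise `P` is singular and `np.linalg.solve` raises `LinAlgError`.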

================= Eye in hand data and ground truth =========================

Marker in Camera: eight data sets, in meters and radians; the orientation is given as a rotation vector (RotVector)


Robot end-effector in Base: eight data sets, in meters and radians; the orientation is given as a rotation vector (RotVector)


Ground truth: Camera in end-effector

Calibration Result:

-0.005816946773 -0.997849042178 -0.065295115862 0.053457653403
0.999355366046 -0.003487653357 -0.035730779845 -0.009463725064
0.035426197714 -0.065460868457 0.997226082298 -0.045113271057
0 0 0 1

RotVector(rad): -0.023440 -0.079412 1.57466

AngVector(rad): 1.576836 <-0.014865, -0.050362, 0.998620>

AngVector(deg): 90.346026 <-0.014865, -0.050362, 0.998620>

rx_ry_rz_rw: -0.010543 -0.079412 1.574660 0.704968

rx_ry_rz_rw_norm: -0.010543 -0.035718 0.708260 0.704968
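The different orientation formats in this result are related by standard formulas. A rotation vector is the axis scaled by the angle. The rotation matrix follows from the Rodrigues formula. The unit quaternion is $(\sin(\theta/2)\,k,\ \cos(\theta/2))$. The NumPy sketch below converts the listed `RotVector(rad)`. It can be used to confirm that the 3x3 block, `AngVector(deg)` and `rx_ry_rz_rw_norm` describe the same rotation; the function names are illustrative.

```python
import numpy as np


def rotvec_to_matrix(r):
    """Rodrigues formula: rotation vector (axis * angle, radians) -> 3x3 matrix."""
    r = np.asarray(r, dtype=np.float64)
    theta = np.linalg.norm(r)
    k = r / theta
    Kx = np.array([[0.0, -k[2], k[1]],
                   [k[2], 0.0, -k[0]],
                   [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * Kx + (1.0 - np.cos(theta)) * (Kx @ Kx)


def rotvec_to_quat_xyzw(r):
    """Rotation vector -> unit quaternion (x, y, z, w)."""
    r = np.asarray(r, dtype=np.float64)
    theta = np.linalg.norm(r)
    k = r / theta
    return np.concatenate([np.sin(theta / 2) * k, [np.cos(theta / 2)]])


rotvec = [-0.023440, -0.079412, 1.57466]      # RotVector(rad) from the result above
print("angle (deg):", np.degrees(np.linalg.norm(rotvec)))
print("R =\n", rotvec_to_matrix(rotvec))
print("q (x, y, z, w):", rotvec_to_quat_xyzw(rotvec))
```

Both helpers divide by the angle, so they are not meant for a zero rotation vector.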


The meaning of the GPU Performance State in nvidia-smi

I am using an Nvidia GTX Titan X for deep learning experiments.


Performance State
The current performance state for the GPU. States range from P0 (maximum performance) to P12 (minimum performance).


My question: why is my GPU in the P0 state when idle, but switches to P2 when running heavy compute tasks? Shouldn't it be the other way round?











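To always "force" the P0 power state, you can try persistence mode and application clocks via the nvidia-smi tool. Use nvidia-smi --help or the nvidia-smi man page to learn about the options.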

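Although I believe this generally does not apply to Tesla GPUs, some NVIDIA GPUs may limit themselves to the P2 power state under compute load unless higher application clocks are explicitly set. Use the nvidia-smi -a command to view the GPU's current application clocks, default application clocks and maximum clocks. (Some GPUs, including older ones, may show N/A in some of these fields. This usually means the application clocks cannot be modified through nvidia-smi.) If the card appears to run in the P2 state during compute load, you may be able to raise it to P0 by increasing the application clocks to the maximum available clocks (i.e. the max clocks). Use nvidia-smi --help to learn how to format the command that changes the application clocks. Modifying application clocks, or enabling modifiable application clocks, may require root/admin privileges. Setting GPU persistence mode may also be desirable or necessary: it prevents the driver from "unloading" between periods of GPU activity, which could reset the application clocks when the driver reloads.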
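To watch the performance state while a job runs, nvidia-smi's CSV query mode is easier to script than the full `-a` report. The sketch below assumes the query fields `index`, `name`, `pstate`, `clocks.sm` and `clocks.max.sm`; check them against `nvidia-smi --help-query-gpu` for your driver version. Only `parse_gpu_states` is pure Python. `query_gpu_states` needs an NVIDIA driver to be installed.

```python
import csv
import io
import subprocess

QUERY_FIELDS = ["index", "name", "pstate", "clocks.sm", "clocks.max.sm"]


def query_gpu_states():
    """Run nvidia-smi once and return its raw CSV output (needs an NVIDIA driver)."""
    cmd = ["nvidia-smi",
           "--query-gpu=" + ",".join(QUERY_FIELDS),
           "--format=csv,noheader,nounits"]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def parse_gpu_states(text):
    """Parse the CSV lines into one dict per GPU, keyed by QUERY_FIELDS."""
    rows = []
    for rec in csv.reader(io.StringIO(text), skipinitialspace=True):
        if rec:
            rows.append(dict(zip(QUERY_FIELDS, (field.strip() for field in rec))))
    return rows
```

Usage: call `parse_gpu_states(query_gpu_states())` in a loop during training and compare `clocks.sm` with `clocks.max.sm`. The settings the answer refers to are persistence mode (`sudo nvidia-smi -pm 1`) and application clocks (`sudo nvidia-smi -ac <memory MHz>,<graphics MHz>`). Valid clock pairs are listed by `nvidia-smi -q -d SUPPORTED_CLOCKS`.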



Building StyleGAN3 on Ubuntu 21.10 (GeForce RTX 3060 12GB)


# Purge all other installed versions of the nvidia driver
$ sudo apt-get purge 'nvidia-*'

$ sudo reboot

# nvidia-smi
$ sudo apt install nvidia-utils-470

# Driver
$ sudo apt install nvidia-driver-470

# cuda 11.3 
$ sudo apt install nvidia-cuda-toolkit

$ sudo apt-get update

# Some driver packages may be upgraded here; run the upgrade, otherwise things may still not work correctly
$ sudo apt-get dist-upgrade

$ sudo apt-get autoremove

# Reboot, otherwise some drivers may not work correctly
$ sudo reboot

Create an isolated build environment in Anaconda, then build:

# wget https://repo.anaconda.com/archive/Anaconda3-2020.02-Linux-x86_64.sh
# Download from a mirror in China (Tsinghua TUNA)
$ wget https://mirrors.tuna.tsinghua.edu.cn/anaconda/archive/Anaconda3-2021.11-Linux-x86_64.sh

$ bash Anaconda3-*-Linux-x86_64.sh

# Update conda to the latest version
$ conda update -n base -c defaults conda

See Anaconda conda切换为国内源 (switching conda to mirrors in China) to speed up downloads.


$ sudo apt-get install git

$ git clone git@github.com:NVlabs/stylegan3.git

$ cd stylegan3

$ conda env create -f environment.yml

$ conda activate stylegan3

$ pip install psutil

# cuDNN acceleration
$ conda install cudnn

# Tested on an RTX 3060 12GB: a batch size of 2 is recommended; larger values report OOM
# When the batch size is below 4, --mbstd-group=2 must also be specified
$ python train.py --outdir=~/training-runs --cfg=stylegan3-t --data=~/datasets/metfaces-1024x1024.zip --gpus=1 --batch=2 --mbstd-group=2 --gamma=8.2 --mirror=1 --metrics=none


Constructing networks...
Setting up PyTorch plugin "bias_act_plugin"... Failed!
Traceback (most recent call last):
  File "~/source/stylegan3/train.py", line 286, in <module>
    main() # pylint: disable=no-value-for-parameter
  File "~/anaconda3/envs/stylegan3/lib/python3.9/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "~/anaconda3/envs/stylegan3/lib/python3.9/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "~/anaconda3/envs/stylegan3/lib/python3.9/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "~/anaconda3/envs/stylegan3/lib/python3.9/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "~/source/stylegan3/train.py", line 281, in main
    launch_training(c=c, desc=desc, outdir=opts.outdir, dry_run=opts.dry_run)
  File "~/source/stylegan3/train.py", line 96, in launch_training
    subprocess_fn(rank=0, c=c, temp_dir=temp_dir)
  File "~/source/stylegan3/train.py", line 47, in subprocess_fn
    training_loop.training_loop(rank=rank, **c)
  File "~/source/stylegan3/training/training_loop.py", line 168, in training_loop
    img = misc.print_module_summary(G, [z, c])
  File "~/source/stylegan3/torch_utils/misc.py", line 216, in print_module_summary
    outputs = module(*inputs)
  File "~/anaconda3/envs/stylegan3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1071, in _call_impl
    result = forward_call(*input, **kwargs)
  File "~/source/stylegan3/training/networks_stylegan3.py", line 511, in forward
    ws = self.mapping(z, c, truncation_psi=truncation_psi, truncation_cutoff=truncation_cutoff, update_emas=update_emas)
  File "~/anaconda3/envs/stylegan3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1071, in _call_impl
    result = forward_call(*input, **kwargs)
  File "~/source/stylegan3/training/networks_stylegan3.py", line 151, in forward
    x = getattr(self, f'fc{idx}')(x)
  File "~/anaconda3/envs/stylegan3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1071, in _call_impl
    result = forward_call(*input, **kwargs)
  File "~/source/stylegan3/training/networks_stylegan3.py", line 100, in forward
    x = bias_act.bias_act(x, b, act=self.activation)
  File "~/source/stylegan3/torch_utils/ops/bias_act.py", line 84, in bias_act
    if impl == 'cuda' and x.device.type == 'cuda' and _init():
  File "~/source/stylegan3/torch_utils/ops/bias_act.py", line 41, in _init
    _plugin = custom_ops.get_plugin(
  File "~/source/stylegan3/torch_utils/custom_ops.py", line 136, in get_plugin
    torch.utils.cpp_extension.load(name=module_name, build_directory=cached_build_dir,
  File "~/anaconda3/envs/stylegan3/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1080, in load
    return _jit_compile(
  File "~/anaconda3/envs/stylegan3/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1318, in _jit_compile
    return _import_module_from_library(name, build_directory, is_python_module)
  File "~/anaconda3/envs/stylegan3/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1701, in _import_module_from_library
    module = importlib.util.module_from_spec(spec)
  File "<frozen importlib._bootstrap>", line 565, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 1173, in create_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
ImportError: ~/anaconda3/envs/stylegan3/lib/python3.9/site-packages/torch/lib/../../../../libstdc++.so.6: version `GLIBCXX_3.4.29' not found (required by ~/.cache/torch_extensions/bias_act_plugin/3cb576a0039689487cfba59279dd6d46-nvidia-geforce-rtx-3060/bias_act_plugin.so)

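The cause of this error: when the plugin is compiled, it is built against a newer libstdc++.so, but at runtime the older libstdc++.so shipped inside the Anaconda environment is loaded, so the required GLIBCXX version is missing.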

Once the cause is understood, the fix is simple: upgrade the libstdc++.so dynamic library inside the Anaconda environment.
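Before upgrading, you can confirm the diagnosis by listing which `GLIBCXX_*` versions the environment's `libstdc++.so.6` actually provides. The sketch below does roughly what `strings libstdc++.so.6 | grep GLIBCXX` shows, by scanning the binary for version strings. The helper names are illustrative. The library path in the usage note is inferred from the `torch/lib/../../../../libstdc++.so.6` path in the error message.

```python
import re

GLIBCXX_RE = re.compile(rb"GLIBCXX_(\d+)\.(\d+)(?:\.(\d+))?")


def glibcxx_versions(lib_path):
    """Sorted GLIBCXX symbol versions found in a libstdc++ binary."""
    with open(lib_path, "rb") as f:
        data = f.read()
    found = {tuple(int(g) for g in m.groups() if g is not None)
             for m in GLIBCXX_RE.finditer(data)}
    return sorted(found)


def supports(lib_path, required=(3, 4, 29)):
    """True if the library provides the GLIBCXX version named in the ImportError."""
    return tuple(required) in glibcxx_versions(lib_path)
```

Usage, with the environment activated: `supports(os.path.join(os.environ["CONDA_PREFIX"], "lib", "libstdc++.so.6"))`. It should return `False` before the upgrade below and `True` afterwards.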


$ conda activate stylegan3

$ conda install cmake

$ conda install make

# The key upgrade command: update libstdc++.so inside the current environment
$ conda install -c conda-forge libstdcxx-ng

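# Delete the plugin build cache left by the previous failed attempt (the torch_extensions directory named in the error above)
$ rm -rf ~/.cache/torch_extensions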

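# Tested on an RTX 3060 12GB: a batch size of 2 is recommended; larger values report OOM
# With batch=4, OOM was reported on day 11 of training
# When the batch size is below 4, --mbstd-group=2 must also be specified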
$ python train.py --outdir=~/training-runs --cfg=stylegan3-t --data=~/datasets/metfaces-1024x1024.zip --gpus=1 --batch=2 --mbstd-group=2 --gamma=8.2 --mirror=1 --metrics=none


tick 444   kimg 1776.0   time 11d 17h 14m  sec/tick 2292.6  sec/kimg 573.16  maintenance 0.2    cpumem 5.40   gpumem 7.69   reserved 10.03  augment 0.344
Traceback (most recent call last):
  File "~/source/stylegan3/train.py", line 286, in <module>
    main() # pylint: disable=no-value-for-parameter
  File "~/anaconda3/envs/stylegan3/lib/python3.9/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "~/anaconda3/envs/stylegan3/lib/python3.9/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "~/anaconda3/envs/stylegan3/lib/python3.9/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "~/anaconda3/envs/stylegan3/lib/python3.9/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "~/source/stylegan3/train.py", line 281, in main
    launch_training(c=c, desc=desc, outdir=opts.outdir, dry_run=opts.dry_run)
  File "~/source/stylegan3/train.py", line 96, in launch_training
    subprocess_fn(rank=0, c=c, temp_dir=temp_dir)
  File "~/source/stylegan3/train.py", line 47, in subprocess_fn
    training_loop.training_loop(rank=rank, **c)
  File "~/source/stylegan3/training/training_loop.py", line 278, in training_loop
    loss.accumulate_gradients(phase=phase.name, real_img=real_img, real_c=real_c, gen_z=gen_z, gen_c=gen_c, gain=phase.interval, cur_nimg=cur_nimg)
  File "~/source/stylegan3/training/loss.py", line 81, in accumulate_gradients
  File "~/anaconda3/envs/stylegan3/lib/python3.9/site-packages/torch/_tensor.py", line 255, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "~/anaconda3/envs/stylegan3/lib/python3.9/site-packages/torch/autograd/__init__.py", line 147, in backward
  File "~/anaconda3/envs/stylegan3/lib/python3.9/site-packages/torch/autograd/function.py", line 87, in apply
    return self._forward_cls.backward(self, *args)  # type: ignore[attr-defined]
  File "~/source/stylegan3/torch_utils/ops/grid_sample_gradfix.py", line 50, in backward
    grad_input, grad_grid = _GridSample2dBackward.apply(grad_output, input, grid)
  File "~/source/stylegan3/torch_utils/ops/grid_sample_gradfix.py", line 59, in forward
    grad_input, grad_grid = op(grad_output, input, grid, 0, 0, False)
RuntimeError: CUDA out of memory. Tried to allocate 1.39 GiB (GPU 0; 11.76 GiB total capacity; 7.06 GiB already allocated; 443.88 MiB free; 10.02 GiB reserved in total by PyTorch)



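ImageNet is currently the world's largest image-recognition database. It is organized according to the WordNet hierarchy (currently nouns, i.e. objects, only) and is mainly used for image classification and object detection in computer vision. Each node of the hierarchy is depicted by hundreds to thousands of images, with more than 500 images per node on average. In total there are about 15 million images in about 22,000 categories. ImageNet was first released in 2009 by Fei-Fei Li and colleagues at Stanford University at the Vision Sciences Society (VSS) meeting. Since 2010, the annual ImageNet Large Scale Visual Recognition Challenge (ILSVRC) has continued to refine the dataset.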

ImageNet.torrent requires 860.55 GB of disk space


Using PyTorch Pix2PixGAN on Ubuntu 18.04 (GeForce GTX 760, 4GB VRAM, CUDA-10.1)

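1. Follow pytorch 1.0.1在ubuntu 18.04(GeForce GTX 760)编译(CUDA-10.1) (building PyTorch 1.0.1 on Ubuntu 18.04 with a GeForce GTX 760, CUDA-10.1) to set up a `pytorch 1.0.1` build environment and resolve build problems.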

2. Again, it is recommended to create an isolated build environment in Anaconda and build there:

$ sudo apt-get install git

# conda remove -n Pix2Pix --all

$ conda create -n Pix2Pix -y python=3.6.8 pip

$ source activate Pix2Pix

$ conda install numpy pyyaml mkl=2019.1 mkl-include=2019.1 setuptools cmake cffi typing pybind11

$ conda install ninja

# magma-cuda90, magma-cuda91 and magma-cuda92 fail to build
$ conda install -c pytorch magma-cuda101

$ git clone https://github.com/pytorch/pytorch

$ cd pytorch

# pytorch 1.0.1 still supports hardware with a "Compute Capability" as low as 3.0; pytorch 1.2.0 requires at least 3.5 to run correctly
# https://github.com/pytorch/pytorch/blob/v1.3.0/torch/utils/cpp_extension.py
$ git checkout v1.0.1 -b v1.0.1

$ git submodule sync

$ git submodule update --init --recursive

$ export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}

# If CUDA is not needed, also run: export NO_CUDA=1

$ python setup.py clean

# Uninstall any previously installed pytorch
$ conda uninstall pytorch

# Look up your hardware's "Compute Capability" on the Nvidia developer site
# e.g. "GeForce GTX 760" has compute capability "3.0"; a wrong value causes runtime errors such as
# RuntimeError: cuda runtime error (48) : no kernel image is available for execution on the device

$ python setup.py install

# For developer mode, you can use
# python setup.py build develop

# Be sure to leave the pytorch source directory; running commands inside it raises errors
$ cd ..

# Leave the environment
$ conda deactivate

For build errors, refer to the solutions in pytorch 1.0.1在ubuntu 18.04(GeForce GTX 760)编译(CUDA-10.1).

3. Build and install TorchVision

$ sudo apt-get install git

# Enter the environment
$ source activate Pix2Pix

$ git clone https://github.com/pytorch/vision.git

# Alternatively, download a copy from this site: wget https://www.mobibrw.com/wp-content/uploads/2019/11/vision.zip

$ cd vision

$ git checkout v0.2.1 -b v0.2.1

$ python setup.py install

# Leave the environment
$ conda deactivate

4. Check out the CycleGAN and pix2pix in PyTorch code and install its dependencies

# Enter the environment
$ source activate Pix2Pix

$ git clone https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix.git

# Alternatively, download it from this site: wget https://www.mobibrw.com/wp-content/uploads/2019/12/pytorch-CycleGAN-and-pix2pix.zip

$ cd pytorch-CycleGAN-and-pix2pix

# Download the facades dataset (building facades)
$ bash datasets/download_pix2pix_dataset.sh facades

# Alternatively, download it from this site and extract it into the target directory as the script does: https://www.mobibrw.com/wp-content/uploads/2019/12/facades.tar.gz

# Install dependencies
$ pip install pillow==6.2.1
$ pip install dominate==2.4.0
$ pip install visdom==

# Fix the following error in models/networks.py and models/base_model.py
# TypeError: cuda() got an unexpected keyword argument 'device_id'
$ sed -i "s/netG\.cuda(device_id=gpu_ids\[0\])/netG.cuda(gpu_ids[0])/g" models/networks.py

$ sed -i "s/netD\.cuda(device_id=gpu_ids\[0\])/netD.cuda(gpu_ids[0])/g" models/networks.py

$ sed -i "s/network\.cuda(device_id=gpu_ids\[0\])/network.cuda(gpu_ids[0])/g" models/base_model.py

# Start the web service. The first run needs to download some auxiliary packages,
# so start it before training, otherwise the training below will fail
$ python -m visdom.server & 

# Wait until “You can navigate to http://localhost:8097” appears on screen, which means the service has started

# Start training
$ bash scripts/train_pix2pix.sh


Traceback (most recent call last):
  File "train.py", line 47, in <module>
    errors = model.get_current_errors()
  File "~/pytorch-CycleGAN-and-pix2pix/models/pix2pix_model.py", line 122, in get_current_errors
    return OrderedDict([('G_GAN', self.loss_G_GAN.data[0]),
IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number

This is caused by differences between PyTorch versions (the author tested on `Pytorch 0.4.1`, we tested on `Pytorch 1.0.1`). Fix it with the following commands:

# Replace loss_G_GAN.data[0] with loss_G_GAN.item(), and likewise for the other losses

$ sed -i "s/self\.loss_G_GAN\.data\[0]/self.loss_G_GAN.item()/g" models/pix2pix_model.py

$ sed -i "s/self\.loss_G_L1\.data\[0]/self.loss_G_L1.item()/g" models/pix2pix_model.py

$ sed -i "s/self\.loss_D_real\.data\[0]/self.loss_D_real.item()/g" models/pix2pix_model.py

$ sed -i "s/self\.loss_D_fake\.data\[0]/self.loss_D_fake.item()/g" models/pix2pix_model.py
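The four `sed` commands follow one pattern, so a single regular expression can cover them. The Python sketch below rewrites every `self.loss_<name>.data[0]` as `self.loss_<name>.item()`. This is slightly broader than the four commands because it matches any `self.loss_*` attribute, so review the diff afterwards. The helper names are illustrative; `.item()` is the PyTorch 0.4+ way to read a 0-dim tensor.

```python
import re
from pathlib import Path

# Matches e.g. "self.loss_G_GAN.data[0]" and captures the attribute name.
PATTERN = re.compile(r"self\.(loss_\w+)\.data\[0\]")


def fix_scalar_access(source):
    """Rewrite `self.loss_X.data[0]` as `self.loss_X.item()`."""
    return PATTERN.sub(r"self.\1.item()", source)


def patch_file(path):
    """Apply the rewrite to a file in place and return the number of replacements."""
    p = Path(path)
    new_text, count = PATTERN.subn(r"self.\1.item()", p.read_text())
    p.write_text(new_text)
    return count
```

Usage: `patch_file("models/pix2pix_model.py")`. For that file it should report the four replacements made by the `sed` commands above.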

5. Test the training result

$ bash scripts/test_pix2pix.sh

# To view the results, open ./results/facades_pix2pix/test_latest/index.html


Training MaskTextSpotter on Ubuntu 18.04 (GeForce GTX 760, 4GB VRAM, CUDA-10.1)

Follow 在ubuntu 18.04(GeForce GTX 760 4GB显存)编译/测试MaskTextSpotter(CUDA-10.1) (building and testing MaskTextSpotter on Ubuntu 18.04) to set up a working test environment.

Because the test set is icdar2013, make sure that testing on the icdar2013 dataset already works.


1. Modify the training script. By default it trains on 8 GPUs; we only have one, so the training parameters must be adjusted

$ cd MaskTextSpotter

$ export ROOT_PATH=`pwd`

$ sed -i 's/nproc_per_node=8/nproc_per_node=1/g' train.sh

2. Download the training set. By default MaskTextSpotter trains on the SynthText dataset, which has to be downloaded first; it is about 40GB

$ mkdir datasets

$ cd datasets

$ sudo apt-get install aria2

$ aria2c -c -j16 -s16 -x16 --follow-torrent=mem -o 'hyperai.torrent' 'https://hyper.ai/tracker/download?torrent=7783'

# The torrent file can also be downloaded from this site: wget https://www.mobibrw.com/wp-content/uploads/2019/11/SynthText.zip

3. Extract the SynthText dataset into the target directory

$ mkdir synthtext

$ unzip SynthText/data/SynthText.zip -d synthtext

# Rename the directory
$ mv synthtext/SynthText synthtext/train_images

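4. Download the converted SynthText annotation index. The index extracted above consists of .mat files, which must be converted into the annotation format MaskTextSpotter expects. The author provides an already converted copy, so we simply download and use it; it is about 1.6GB.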

$ aria2c -c 'https://1drv.ms/u/s!ArsnjfK83FbXgb5vgOOVPYywgCWuQw?e=UPuNTa'

# Extract into the target directory
$ tar -xvf SynthText_GT_E2E.tar.gz -C synthtext

# Rename the directory
$ mv synthtext/SynthText_GT_E2E synthtext/train_gts
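Before generating the list file, it can help to confirm that both directories exist and contain files. A small sketch, assuming the synthtext/train_images and synthtext/train_gts layout produced by the commands above; `check_synthtext_layout` is a hypothetical helper that only counts files and does not check that images and annotations correspond:

```python
from pathlib import Path
from typing import Dict


def check_synthtext_layout(root: str = "synthtext") -> Dict[str, int]:
    """Count .jpg images and annotation files; raise if a directory is missing."""
    counts = {}
    for sub, pattern in (("train_images", "*.jpg"), ("train_gts", "*")):
        d = Path(root) / sub
        if not d.is_dir():
            raise FileNotFoundError(f"missing directory: {d}")
        # rglob descends into the numbered sub-folders; is_file skips directories
        counts[sub] = sum(1 for p in d.rglob(pattern) if p.is_file())
    return counts
```

Run from the datasets directory, `check_synthtext_layout()` returns a dict such as `{"train_images": ..., "train_gts": ...}`; a zero count usually means an extraction step or a rename was missed.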

5. Generate the training list file train_list.txt

# gen_train.py: save it in the synthtext directory; lists every .jpg under train_images
import os

path = 'train_images'
train_list = 'train_list.txt'

with open(train_list, 'w') as tf:
    for root, dirs, files in os.walk(path):
        # skip hidden files and directories
        files = [f for f in files if not f.startswith('.')]
        dirs[:] = [d for d in dirs if not d.startswith('.')]
        for file_name in files:
            # store the path relative to train_images
            fn = os.path.relpath(os.path.join(root, file_name), path)
            if os.path.splitext(fn)[1] == '.jpg':
                tf.write(fn + '\n')



$ cd synthtext

$ python gen_train.py



# Use fewer data-loading worker processes to fix “OSError: [Errno 24] Too many open files”
# Setting it to 0 means images are loaded in the main process
$ sed -i "s/NUM_WORKERS: 4/NUM_WORKERS: 0/g" configs/pretrain.yaml

# Adjust the training parameters: the defaults are too large for a single GPU and exhaust GPU memory
# Fixes “RuntimeError: CUDA out of memory.”
$ sed -i "s/IMS_PER_BATCH: 8/IMS_PER_BATCH: 1/g" configs/pretrain.yaml

# Fix “AttributeError: module 'torch' has no attribute 'bool'”
# Since PyTorch 1.2, torch.uint8 masks have been replaced by torch.bool; on PyTorch versions
# earlier than 1.2, change them back to torch.uint8
$ sed -i "s/torch.bool/torch.uint8/g" maskrcnn_benchmark/modeling/rpn/inference.py
$ sed -i "s/torch.bool/torch.uint8/g" maskrcnn_benchmark/modeling/balanced_positive_negative_sampler.py
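Whether these two edits are needed depends on the installed version (`python -c "import torch; print(torch.__version__)"`). A sketch of that decision that compares versions numerically, since comparing the strings directly would rank "1.10" below "1.2"; `mask_dtype_name` is a hypothetical helper:

```python
import re


def mask_dtype_name(torch_version: str) -> str:
    """Return the mask dtype to keep for a given torch.__version__ string."""
    # keep only the leading "major.minor", e.g. "1.0.1.post2" -> (1, 0)
    m = re.match(r"(\d+)\.(\d+)", torch_version)
    if m is None:
        raise ValueError(f"unrecognized version string: {torch_version!r}")
    major, minor = int(m.group(1)), int(m.group(2))
    return "torch.bool" if (major, minor) >= (1, 2) else "torch.uint8"
```

With the `Pytorch 1.0.1` used here, `mask_dtype_name("1.0.1")` gives "torch.uint8", so the sed edits apply; on 1.2 or later the original torch.bool should be kept.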

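# Adjust the GPU-related SOLVER parameters
# https://github.com/facebookresearch/Detectron/blob/master/configs/getting_started/tutorial_1gpu_e2e_faster_rcnn_R-50-FPN.yaml#L14
# The official reference recommends a learning rate of 0.0025 for a single GPU, but in practice this errors out; 0.0015 runs normally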
$ sed -i "s/BASE_LR: 0.01/BASE_LR: 0.0015/g" configs/pretrain.yaml

# Set to 8 for 4GB of VRAM; with 8GB of VRAM it can be set to 64/128
$ sed -i "s/MASK_BATCH_SIZE_PER_IM: 512/MASK_BATCH_SIZE_PER_IM: 8/g" configs/pretrain.yaml
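A caveat with the sed edits above: sed matches substrings, so `s/BASE_LR: 0.01/.../` would also rewrite a value such as `BASE_LR: 0.015`, and it says nothing when a pattern does not match (for example because the file was already edited). A sketch of the same edits that anchors the whole value and reports keys it did not find; it assumes simple one-line `KEY: value` entries as in configs/pretrain.yaml and does not parse YAML. `apply_edits`, `edit_file` and `PRETRAIN_EDITS` are hypothetical names:

```python
import re
from pathlib import Path
from typing import Dict, List, Tuple

# key -> (old value, new value): the same edits as the sed commands above
PRETRAIN_EDITS: Dict[str, Tuple[str, str]] = {
    "NUM_WORKERS": ("4", "0"),
    "IMS_PER_BATCH": ("8", "1"),
    "BASE_LR": ("0.01", "0.0015"),
    "MASK_BATCH_SIZE_PER_IM": ("512", "8"),
}


def apply_edits(text: str, edits: Dict[str, Tuple[str, str]]) -> Tuple[str, List[str]]:
    """Replace whole values of `KEY: OLD` lines; return (new_text, keys_not_found)."""
    missing = []
    for key, (old, new) in edits.items():
        # the value must be followed only by blanks or a comment, so 0.01 never matches 0.015
        pattern = re.compile(
            rf"^([ \t]*{re.escape(key)}:[ \t]*){re.escape(old)}([ \t]*(?:#.*)?)$",
            re.MULTILINE,
        )
        text, n = pattern.subn(rf"\g<1>{new}\g<2>", text)
        if n == 0:
            missing.append(key)
    return text, missing


def edit_file(path: str, edits: Dict[str, Tuple[str, str]] = PRETRAIN_EDITS) -> List[str]:
    """Edit a config file in place and return the keys that were not found."""
    p = Path(path)
    text, missing = apply_edits(p.read_text(), edits)
    p.write_text(text)
    return missing
```

`edit_file("configs/pretrain.yaml")` returning an empty list means every edit was applied; running it a second time leaves the file unchanged and lists all four keys as not found.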

# In tests so far on an RTX 2070 Super (8GB VRAM), with the setting
# “WEIGHT: https://dl.fbaipublicfiles.com/detectron/ImageNetPretrained/MSRA/R-50.pkl”
# BASE_LR can be set to 0.0025 and MASK_BATCH_SIZE_PER_IM to 128
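The learning-rate values above are consistent with the linear scaling heuristic (Goyal et al., 2017), which keeps the ratio of learning rate to total batch size constant: the Detectron tutorial's 0.0025 appears to correspond to its standard 0.02 for 16 images (8 GPUs × 2 images) scaled down to 2 images. This is offered only as a rough starting point, not as the method used here; `scale_lr` is a hypothetical helper, and it assumes IMS_PER_BATCH is the total batch size across all GPUs:

```python
def scale_lr(base_lr: float, base_batch: int, new_batch: int) -> float:
    """Linear scaling heuristic: keep learning_rate / total_batch_size constant."""
    if base_batch <= 0 or new_batch <= 0:
        raise ValueError("batch sizes must be positive")
    return base_lr * new_batch / base_batch
```

Scaling the default BASE_LR: 0.01 with IMS_PER_BATCH: 8 down to one image gives `scale_lr(0.01, 8, 1)`, about 0.00125, in the same range as the 0.0015 that ran here.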

# Activate the conda environment
$ source activate MaskTextSpotter

$ bash train.sh

Note that the weight file loaded in configs/pretrain.yaml is WEIGHT: "./outputs/finetune/model_finetune.pth", which was trained on SynthText. So how is this "model_finetune.pth" generated?

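The author does not explain this in detail, but the configuration file of the masktextspotter.caffe2 project shows that it starts from "WEIGHTS: https://dl.fbaipublicfiles.com/detectron/ImageNetPretrained/MSRA/R-50.pkl". This file can also be downloaded from this site: R-50.pkl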

R-50.pkl: converted copy of MSRA’s original ResNet-50 model


  TYPE: generalized_rcnn
  CONV_BODY: FPN.add_fpn_ResNet50_conv5_body
  MASK_ON: True
  NAME: shrink++
  WEIGHT_DECAY: 0.0001
  LR_POLICY: steps_with_decay
  BASE_LR: 0.005   #synth
  GAMMA: 0.1
  MAX_ITER: 200000
  STEPS: [0, 120000]
  FPN_ON: True
  RPN_ASPECT_RATIOS: (0.5, 1, 2)
  ROI_BOX_HEAD: fast_rcnn_heads.add_roi_2mlp_head
  ROI_MASK_HEAD: text_mask_rcnn_heads.mask_rcnn_fcn_head_v1up4convs
  RESOLUTION: 28  # (output mask resolution) default 14
  ROI_XFORM_RESOLUTION: 14  # default 7
  ROI_XFORM_SAMPLING_RATIO: 2  # default 0
  DILATION: 1  # default 2
  CONV_INIT: MSRAFill  # default GaussianFill
  IS_E2E: True
  WEIGHT_WH: True  ## default is false 

  RPN_PRE_NMS_TOP_N: 2000  # Per FPN level

  ##################### pre-train on synth ##########################
  WEIGHTS: https://dl.fbaipublicfiles.com/detectron/ImageNetPretrained/MSRA/R-50.pkl
  DATASETS: ('synth_train', )
  SCALES: (800,)
  MAX_SIZE: 1333
  MIX_TRAIN: False

  ######################  Fine tune   #################################
  # MIX_TRAIN: True
  # WEIGHTS: ./train/synth_train/pretrain_model/model_iter159999.pkl
  # DATASETS: ('totaltext_train', 'scut-eng-char_train', 'synth_train', 'icdar2013_train', 'icdar2015_train')
  # USE_CHARANNS: [False, True, True, True, False]
  # # the ratios of synth, icdar2013, icdar2015 is 2:1:1, defaultly
  # # MIX_RATIOS: [0.125, 0.125, 0.5, 0.125, 0.125]
  # MIX_RATIOS: [1.0 / 6, 1.0 / 6, 1.0 / 3, 1.0 / 6, 1.0 / 6]
  # SCALES: (600, 800, 1000)
  # MAX_SIZE: 1333
  # # # SCALES: (800,)
  # # # MAX_SIZE: 1333

  aug: False
  saturation_prob: 0.5
  saturation_lower: 0.5
  saturation_upper: 1.5
  hue_prob: 0.5
  hue_delta: 18
  lighting_noise_prob: 0.5
  contrast_prob: 0.5
  contrast_lower: 0.5
  contrast_upper: 1.5
  brightness_prob: 0.5
  brightness_delta: 32
  rotate_prob: 0.5
  rotate_delta: 15
  OUTPUT_POLYGON: False # only set to True for totaltext
  WEIGHTS: ./train/shrink++_finetune/model_iter79999.pkl
  DATASETS: ('icdar2015_test',)
  SCALES: (1000,)
  MAX_SIZE: 3333
  NMS: 0.5
  RPN_PRE_NMS_TOP_N: 1000  # Per FPN level
  VIS: False
    ENABLED: False
    SCORE_HEUR: UNION  # AVG NOTE: cannot use AVG for e2e model
    COORD_HEUR: UNION  # AVG NOTE: cannot use AVG for e2e model
    H_FLIP: False
    SCALES: (800,)
    MAX_SIZE: 2000
    SCALE_H_FLIP: False
    AREA_TH_LO: 2500   # 50^2
    AREA_TH_HI: 32400  # 180^2
    ENABLED: False
    H_FLIP: False
    SCALES: (1600,)
    MAX_SIZE: 3333
    SCALE_H_FLIP: False
    AREA_TH: 32400  # 180^2
    ENABLED: True
    VOTE_TH: 0.9
    ENABLED: False


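On a machine with 4GB of VRAM, memory is so limited that "RuntimeError: CUDA out of memory." is quite likely to occur partway through training. In our tests so far, simply running the command again is enough.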
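Rerunning by hand can be automated. A minimal sketch, assuming train.sh can simply be restarted after a failure; whether a rerun resumes from the last saved checkpoint or starts over depends on the training code and has not been verified here. `run_with_retries` is a hypothetical helper:

```python
import subprocess
from typing import Sequence


def run_with_retries(cmd: Sequence[str], max_attempts: int = 5) -> int:
    """Run `cmd` until it exits with status 0 or `max_attempts` runs have failed.

    Returns the exit status of the last run.
    """
    status = 1
    for attempt in range(1, max_attempts + 1):
        status = subprocess.run(list(cmd)).returncode
        if status == 0:
            break
        if attempt < max_attempts:
            print(f"attempt {attempt} exited with status {status}, retrying")
    return status
```

For example, `run_with_retries(["bash", "train.sh"])` from the MaskTextSpotter directory, with the conda environment already activated.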

Training output is stored in the outputs/pretrain directory; results are written there once training has progressed far enough.
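To find the most recent checkpoint in that directory, a small sketch; the file naming `model_<zero-padded iteration>.pth` (plus a `model_final.pth`) is an assumption based on maskrcnn_benchmark-style trainers, so adjust the pattern to the files you actually see. `latest_checkpoint` is a hypothetical helper:

```python
import re
from pathlib import Path
from typing import Optional

# Assumed naming, e.g. model_0002500.pth; model_final.pth has no number and is skipped
_CKPT = re.compile(r"model_(\d+)\.pth")


def latest_checkpoint(out_dir: str = "outputs/pretrain") -> Optional[Path]:
    """Return the checkpoint with the highest iteration number, or None if none exist."""
    best, best_iter = None, -1
    for p in Path(out_dir).glob("model_*.pth"):
        m = _CKPT.fullmatch(p.name)
        if m and int(m.group(1)) > best_iter:
            best, best_iter = p, int(m.group(1))
    return best
```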

If an error like the following appears, lower the learning rate BASE_LR somewhat:

Traceback (most recent call last):
  File "tools/train_net.py", line 173, in <module>
  File "tools/train_net.py", line 166, in main
    model = train(cfg, args.local_rank, args.distributed)
  File "tools/train_net.py", line 76, in train
  File "~/MaskTextSpotter/maskrcnn_benchmark/engine/trainer.py", line 66, in do_train
    loss_dict = model(images, targets)
  File "~/.conda/envs/MaskTextSpotter/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "~/MaskTextSpotter/maskrcnn_benchmark/modeling/detector/generalized_rcnn.py", line 50, in forward
    proposals, proposal_losses = self.rpn(images, features, targets)
  File "~/.conda/envs/MaskTextSpotter/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "~/MaskTextSpotter/maskrcnn_benchmark/modeling/rpn/rpn.py", line 94, in forward
    return self._forward_train(anchors, objectness, rpn_box_regression, targets)
  File "~/MaskTextSpotter/maskrcnn_benchmark/modeling/rpn/rpn.py", line 110, in _forward_train
    anchors, objectness, rpn_box_regression, targets
  File "~/.conda/envs/MaskTextSpotter/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "~/MaskTextSpotter/maskrcnn_benchmark/modeling/rpn/inference.py", line 138, in forward
    sampled_boxes.append(self.forward_for_single_feature_map(a, o, b))
  File "~/MaskTextSpotter/maskrcnn_benchmark/modeling/rpn/inference.py", line 113, in forward_for_single_feature_map
    boxlist = remove_small_boxes(boxlist, self.min_size)
  File "~/MaskTextSpotter/maskrcnn_benchmark/structures/boxlist_ops.py", line 46, in remove_small_boxes
    (ws >= min_size) & (hs >= min_size)
RuntimeError: copy_if failed to synchronize: cudaErrorAssert: device-side assert triggered


PyTorch reports 'ImportError: torch.utils.ffi is deprecated. Please use cpp extensions instead.' at runtime

While building FOTS for testing, the following error appeared:

(FOTS) $~/Source/FOTS.PyTorch$ bash build.sh 
Compiling crop_and_resize kernels by nvcc...
Traceback (most recent call last):
  File "build.py", line 3, in <module>
    from torch.utils.ffi import create_extension
  File "~/.conda/envs/FOTS/lib/python2.7/site-packages/torch/utils/ffi/__init__.py", line 1, in <module>
    raise ImportError("torch.utils.ffi is deprecated. Please use cpp extensions instead.")
ImportError: torch.utils.ffi is deprecated. Please use cpp extensions instead.
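The message comes from the installed PyTorch (1.0 and later), where torch.utils.ffi was dropped in favour of C++/CUDA extensions (torch.utils.cpp_extension). The usual options are to build with a PyTorch release older than 1.0, or to port the extension build. As a first step for porting, a sketch that lists every Python file importing torch.utils.ffi; `find_ffi_usage` is a hypothetical helper and only detects the two plain import forms:

```python
import re
from pathlib import Path
from typing import List, Tuple

# "from torch.utils.ffi import ..." or "import torch.utils.ffi" at the start of a line
_FFI = re.compile(
    r"^[ \t]*(?:from[ \t]+torch\.utils\.ffi[ \t]+import|import[ \t]+torch\.utils\.ffi)\b",
    re.MULTILINE,
)


def find_ffi_usage(root: str = ".") -> List[Tuple[str, int]]:
    """Return (file path, 1-based line number) for each torch.utils.ffi import."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(errors="replace")
        for m in _FFI.finditer(text):
            line = text.count("\n", 0, m.start()) + 1
            hits.append((str(path), line))
    return hits
```

Running `find_ffi_usage()` inside FOTS.PyTorch should at least report build.py line 3, the import shown in the traceback.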

