CNN Basics: Convolution and Its Matrix Acceleration
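This tutorial shares the correct way to install the RNDIS driver on a Windows 7 PC. What is the RNDIS driver? RNDIS is the Remote Network Driver Interface Specification: the device connects to the host over USB and emulates a network connection, which is used for downloading and debugging. Many Windows 7 users find that installation fails when they attach an RNDIS device, so here is the correct way to install the RNDIS driver on a Windows 7 PC.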
GPU IS a processor (graphics processing unit). Anywho, I remember seeing somewhere that in GeForce 6 series cards it's a single cycle (maybe I was just dreaming :-p) but I have that memory
radeon x800 has it anyways
EDIT:
Quote:
ORIGINALLY AT: http://gear.ibuypower.com/GVE/Store/ProductDetails.aspx?sku=VC-POWERC-147
Smartshader HD
• Support for Microsoft® DirectX® 9.0 programmable vertex and pixel shaders in hardware
• DirectX 9.0 Vertex Shaders
- Vertex programs up to 65,280 instructions with flow control
- Single cycle trigonometric operations (SIN & COS)
• Direct X 9.0 Extended Pixel Shaders
- Up to 1,536 instructions and 16 textures per rendering pass
- 32 temporary and constant registers
- Facing register for two-sided lighting
- 128-bit, 64-bit & 32-bit per pixel floating point color formats
- Multiple Render Target (MRT) support
• Complete feature set also supported in OpenGL® via extensions
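Starting with the Android Gradle Plugin versions that ship with Android Studio 2.2, Google integrated solid support for CMake, and support for the older ndkBuild approach also improved. This post looks at how the Android Gradle Plugin relates to cross-compilation, that is, the externalNativeBuild tasks, mainly by reading the relevant source of the Gradle build system.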
Child CMakeLists.txt
option(BUILD_FOR_ANDROID "Build For Android" OFF)
if(SYSTEM.Android AND NOT BUILD_FOR_ANDROID)
    set(CMAKE_LIBRARY_OUTPUT_DIRECTORY ${NATIVE_LIBRARY_OUTPUT}/${ANDROID_ABI})
endif()
Parent CMakeLists.txt
set(BUILD_FOR_ANDROID ON)
add_subdirectory(${CHILD_ROOT_DIR}/ ${CMAKE_CURRENT_SOURCE_DIR}/build)
When running the following command:
/Users/xxxx/Library/Android/sdk/cmake/3.6.4111459/bin/cmake --trace-expand \
    -H/Users/xxxx/Source/example/demo/android/app \
    -B/Users/xxxx/Source/example/demo/android/app/.externalNativeBuild/cmake/debug/arm64-v8a \
    -DANDROID_ABI=arm64-v8a \
    -DANDROID_PLATFORM=android-21 \
    -DCMAKE_LIBRARY_OUTPUT_DIRECTORY=/Users/xxxx/Source/example/demo/android/app/build/intermediates/cmake/debug/obj/arm64-v8a \
    -DCMAKE_BUILD_TYPE=Debug \
    -DANDROID_NDK=/Users/xxxx/Library/Android/android-ndk-r16b \
    -DCMAKE_TOOLCHAIN_FILE=/Users/xxxx/Library/Android/android-ndk-r16b/build/cmake/android.toolchain.cmake \
    -DCMAKE_MAKE_PROGRAM=/Users/xxxx/Library/Android/sdk/cmake/3.6.4111459/bin/ninja \
    -G"Android Gradle - Ninja" \
    -DANDROID_ARM_NEON=TRUE \
    -DANDROID_TOOLCHAIN=gcc \
    -DANDROID_PLATFORM=android-21 \
    -DANDROID_STL=gnustl_shared
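you will find that BUILD_FOR_ANDROID does not necessarily take effect in the generated configuration. A likely reason: with this CMake version (3.6), the option() call in the child creates a cache entry and clears a normal variable of the same name, so the parent's plain set() is lost. This behavior changed with policy CMP0077 in CMake 3.13.
The following configuration is needed instead:
Parent CMakeLists.txt
set(BUILD_FOR_ANDROID ON CACHE BOOL "" FORCE)
add_subdirectory(${CHILD_ROOT_DIR}/ ${CMAKE_CURRENT_SOURCE_DIR}/build)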
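One way to see which value actually reached the cache is to read it back from CMakeCache.txt in the build directory. The helper below is only a minimal sketch: the function name `cmake_cache_value` is hypothetical, and it assumes the usual `NAME:TYPE=VALUE` layout of cache entries.

```shell
#!/bin/sh
# Hypothetical helper: print the cached value of a CMake variable.
# Assumes CMakeCache.txt entries of the form NAME:TYPE=VALUE.
cmake_cache_value() {
    var="$1"
    cache_file="$2"
    # Match "VAR:TYPE=" at the start of a line and print what follows.
    sed -n "s/^${var}:[A-Z]*=//p" "$cache_file"
}
```

For the build above, it could be called as `cmake_cache_value BUILD_FOR_ANDROID <build-dir>/CMakeCache.txt`, where `<build-dir>` is the directory passed to cmake with `-B`.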
C and C++ compilers aren’t the fastest pieces of software out there, and there’s no lack of programmer jokes based on the tedium of waiting for their work to complete.
There are ways to ease the pain though, and one of them is ccache. CCache improves compilation times by caching previously built object files in a private cache and reusing them when you recompile the same objects with the same parameters. Obviously it will not help if you’re compiling the code for the first time, and it also won’t help if you often change compilation flags. Most C/C++ development, however, involves recompiling the same object files with the same parameters, and there ccache helps a lot.
For illustration, here’s the comparison of first and subsequent compilation times of a largish C++ project:
Original run with empty cache:
$ make -j9
...
real 0m56.684s
user 5m31.996s
sys 0m41.638s
Recompilation with warm cache:
$ make -j9
...
real 0m5.929s
user 0m11.896s
sys 0m8.722s
CCache is available in repositories on pretty much all distributions. On OS X use homebrew:
$ brew install ccache
and on Debian-based distros use apt:
$ apt-get install ccache
After ccache is installed, you need to tell CMake to use it as a wrapper for the compiler. Add these lines to your CMakeLists.txt:
# Configure CCache if available
find_program(CCACHE_FOUND ccache)
if(CCACHE_FOUND)
    set_property(GLOBAL PROPERTY RULE_LAUNCH_COMPILE ccache)
    set_property(GLOBAL PROPERTY RULE_LAUNCH_LINK ccache)
endif(CCACHE_FOUND)
Rerun cmake, and the next make should use ccache as the wrapper.
CCache can even be used with the Android NDK: you just need to export the NDK_CCACHE environment variable with the path to the ccache binary. The ndk-build script will automatically use it. E.g.
$ export NDK_CCACHE=/usr/local/bin/ccache
$ ndk-build -j9
(Note that on Debian/Ubuntu the path will probably be /usr/bin/ccache)
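Since the binary's location differs between systems, the path can be looked up instead of hard-coded. This is a minimal sketch; the function name `find_ccache` is hypothetical, and the two fallback paths are simply the ones mentioned above.

```shell
#!/bin/sh
# Hypothetical helper: print the path of the ccache binary, or fail if none is found.
find_ccache() {
    # Prefer whatever is on PATH.
    if command -v ccache >/dev/null 2>&1; then
        command -v ccache
        return 0
    fi
    # Fall back to the locations mentioned in the text.
    for candidate in /usr/local/bin/ccache /usr/bin/ccache; do
        if [ -x "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Only export NDK_CCACHE when a binary was actually found.
if ccache_path=$(find_ccache); then
    export NDK_CCACHE="$ccache_path"
fi
```

If no ccache is installed, NDK_CCACHE stays unset and ndk-build runs without the cache.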
To see if ccache is really working, you can use the ccache -s command, which will display ccache statistics:
cache directory                     /Users/jernej/.ccache
primary config                      /Users/jernej/.ccache/ccache.conf
secondary config      (readonly)    /usr/local/Cellar/ccache/3.2.2/etc/ccache.conf
cache hit (direct)                  77826
cache hit (preprocessed)            17603
cache miss                          46999
called for link                     18
compile failed                      45
ccache internal error               1
preprocessor error                  62
unsupported source language         204
files in cache                      48189
cache size                          1.2 GB
max cache size                      20.0 GB
On the second and all subsequent compilations the “cache hit” values should increase and thus show that ccache is working.
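The raw counters are easier to judge as a single hit percentage. The sketch below is an illustration, not part of ccache itself: the function name `ccache_hit_percent` is hypothetical, and it assumes the ccache 3.x text layout shown above, with lines beginning "cache hit" and "cache miss".

```shell
#!/bin/sh
# Hypothetical helper: compute the cache hit percentage from `ccache -s`
# output read on stdin. Assumes the ccache 3.x layout shown above.
ccache_hit_percent() {
    awk '
        /^cache hit/  { hits += $NF }      # direct and preprocessed hits
        /^cache miss/ { misses += $NF }
        END {
            total = hits + misses
            if (total == 0) { print 0; exit }
            printf "%d\n", (hits * 100) / total   # truncated to an integer
        }
    '
}
```

Run as `ccache -s | ccache_hit_percent`. For the statistics above, (77826 + 17603) hits out of 142428 lookups gives 67.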
# Install the build tools (macOS Mojave 10.14.3)
$ brew install arm-linux-gnueabihf-binutils
# bison on macOS is too old
$ brew install bison
$ export PATH="/usr/local/opt/bison/bin:$PATH"
# Install crosstool-ng to build the GCC toolchain
$ brew install crosstool-ng
$ export CT_NG_VER=$(brew list --versions crosstool-ng | tr ' ' '\n' | tail -1)
$ export CT_NG_VER_SHORT=${CT_NG_VER%_*}
# The installed crosstool-ng script lacks execute permission and cannot run, so add the permission manually
$ chmod +x "$(brew --cellar crosstool-ng)/${CT_NG_VER}/lib/crosstool-ng-${CT_NG_VER_SHORT}/scripts/crosstool-NG.sh"
# The macOS file system is case-insensitive by default, so create a case-sensitive volume manually
$ hdiutil create -volname "ClockworkOS" -type SPARSE -fs 'Case-sensitive Journaled HFS+' -size 30g ClockworkOS.dmg
$ hdiutil attach ClockworkOS.dmg.sparseimage -mountpoint /Volumes/ClockworkOS
$ cd /Volumes/ClockworkOS
$ mkdir arm-cortexa9_neon-linux
$ cd arm-cortexa9_neon-linux
$ ct-ng list-samples
# Change the directory where x-tools is stored
$ export HOME=/Volumes/ClockworkOS
$ ct-ng arm-cortexa9_neon-linux-gnueabihf
# Fix for: Build failed in step 'Installing m4 for build'
$ brew uninstall --ignore-dependencies binutils
$ brew install binutils
# Install dependencies
$ brew install automake
$ brew uninstall --ignore-dependencies gawk
$ brew install gawk
# Building gettext-0.19.8.1 currently hard-codes a dependency on automake-1.15, but the latest version is automake-1.16; work around this by building and installing automake-1.15 manually
$ wget http://ftp.gnu.org/gnu/automake/automake-1.15.tar.gz
# Also available from this site: wget https://www.mobibrw.com/wp-content/uploads/2019/03/automake-1.15.tar.gz
$ tar xvf automake-1.15.tar.gz
$ cd automake-1.15
$ bash configure
$ make && make install
$ cd ..
# Raise the open-file limit to fix the error "extra-module.mk:11: *** Too many open files."
$ ulimit -n 2048
# Fix for: 'scm_new_port_table_entry' was not declared in this scope
$ sed -i "" "s/CT_GDB_CROSS_EXTRA_CONFIG_ARRAY=.*/CT_GDB_CROSS_EXTRA_CONFIG_ARRAY=\"--with-guile=no\"/g" .config
$ export PATH="/usr/local/bin:$PATH"
$ ct-ng build -j8
Building u-boot
$ cd /Volumes/ClockworkOS
# Download the u-boot source
$ git clone https://github.com/qemu/u-boot.git
$ cd u-boot
$ git checkout v2019.01 -b v2019.01
$ export PATH="/Volumes/ClockworkOS/x-tools/arm-cortexa9_neon-linux-gnueabihf/bin:$PATH"
$ export CROSS_COMPILE=arm-cortexa9_neon-linux-gnueabihf-
$ make clean
# R16 is also known as A33; R16-J means it includes Jazelle DBX
$ make vexpress_ca9x4_defconfig
# fix Undefined symbols for architecture x86_64: "_PyArg_ParseTuple"
$ export HOSTLDFLAGS="-lpython -dynamiclib"
$ brew install gnu-sed
# fix ./tools/../lib/bch.c:66:10: fatal error: 'endian.h' file not found
$ gsed -i "s/#include <sys\/endian.h>/#include <sys\/endian.h>\n#elif defined(__APPLE__)\n#include <machine\/endian.h>\n#include <libkern\/OSByteOrder.h>/g" lib/bch.c
$ gsed -i "s/#define cpu_to_be32 htobe32/#if defined(__APPLE__)\n#define cpu_to_be32 OSSwapHostToBigInt32\n#else\n#define cpu_to_be32 htobe32\n#endif/g" lib/bch.c
$ gsed -i "s/#if \!defined(__DragonFly__) \&\& \!defined(__FreeBSD__)/#if \!defined(__DragonFly__) \&\& \!defined(__FreeBSD__) \&\& \!defined(__APPLE__)/g" lib/bch.c
# Ignore the failure reported at the end; it is enough that the u-boot file is generated
$ make ARCH=arm -j8
Building the Linux kernel
$ cd /Volumes/ClockworkOS
$ brew install aria2
$ aria2c -c https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.14.2.tar.xz
# Also available from this site: wget https://www.mobibrw.com/wp-content/uploads/2019/03/linux-4.14.2.tar.xz
$ tar xvf linux-4.14.2.tar.xz
$ cd linux-4.14.2
$ export PATH="/Volumes/ClockworkOS/x-tools/arm-cortexa9_neon-linux-gnueabihf/bin:$PATH"
# for mkimage
$ export PATH="/Volumes/ClockworkOS/u-boot/tools:$PATH"
# or: brew install u-boot-tools
# elf.h
$ brew install libelf
$ echo "
#include <libelf/libelf.h>
#define R_386_NONE 0
#define R_386_32 1
#define R_386_PC32 2
#define R_ARM_NONE 0
#define R_ARM_PC24 1
#define R_ARM_ABS32 2
#define R_MIPS_NONE 0
#define R_MIPS_16 1
#define R_MIPS_32 2
#define R_MIPS_REL32 3
#define R_MIPS_26 4
#define R_MIPS_HI16 5
#define R_MIPS_LO16 6
#define EF_ARM_EABIMASK 0xFF000000
#define EF_ARM_EABI_VERSION(flags) ((flags) & EF_ARM_EABIMASK)" > /usr/local/include/elf.h
# xargs: illegal option -- r
$ brew install findutils
$ export PATH="/usr/local/opt/findutils/libexec/gnubin:$PATH"
# stat: illegal option -- c
$ ln -s /usr/local/bin/gstat /usr/local/bin/stat
$ export PATH="/usr/local/bin:$PATH"
$ export CROSS_COMPILE=arm-cortexa9_neon-linux-gnueabihf-
$ export ARCH=arm
$ make vexpress_defconfig
$ make -j8
$ mkimage -A arm -O linux -T kernel -C none -a 0x40008000 -e 0x40008000 -n "Linux kernel" -d arch/arm/boot/zImage uImage
$ cd /Volumes/ClockworkOS
$ brew install aria2
# The official address cannot be downloaded; only the mirror address works: http://106.185.33.196/clockworkos_v0.3.img.bz2
$ aria2c -c http://clockworkpi.k15.net/clockworkos_v0.3.img.bz2
$ rm -rf clockworkos_v0.3.img
$ bzip2 -d -k -vvvv clockworkos_v0.3.img.bz2
# Replace the kernel file inside the image
$ hdiutil attach clockworkos_v0.3.img -mountpoint /Volumes/clockworkos_v0.3
$ echo y | cp -i /Volumes/ClockworkOS/linux-4.14.2/uImage /Volumes/clockworkos_v0.3/uImage
$ hdiutil detach /Volumes/clockworkos_v0.3
$ brew install qemu
$ qemu-img convert -f raw -O qcow2 clockworkos_v0.3.img clockworkos_v0.3.qcow2
Building QEMU manually
$ cd /Volumes/ClockworkOS
$ git clone https://github.com/qemu/qemu.git
$ cd qemu
# Starting with qemu v2.1.0-rc1, RAM has to be mapped at addresses starting from 0x60000000 and lower addresses are mapped as read-only flash; we need to disable this mapping, otherwise an error is reported at run time
$ sed -i "" "s/\[VE_NORFLASHALIAS\] = 0/\[VE_NORFLASHALIAS\] = -1/g" hw/arm/vexpress.c
$ bash configure
$ make -j8
$ cd ..
# list supported machines: `qemu-system-arm -machine help`
$ /Volumes/ClockworkOS/qemu/arm-softmmu/qemu-system-arm -M vexpress-a9 -m 1024M -kernel /Volumes/ClockworkOS/u-boot/u-boot -serial mon:stdio -nographic -sd clockworkos_v0.3.qcow2 -net nic,model=lan9118 -net user
Unfortunately, even after getting this far, the system still does not boot successfully.
If you're building software for the Raspberry Pi (like I sometimes do), it can be a pain to have to constantly keep Pi hardware around and spotting Pi-specific problems can be difficult until too late.
One option (and the one I most like) is to emulate a Raspberry Pi locally before ever hitting the device. Why?
Given I'm next-to-useless at Python, that last one is pretty important as it allows me to install every Python debugging and testing tool known to man on my virtual Pi while my end-product hardware stays comparatively pristine.
First, you'll need a few prerequisites:
QEMU (more specifically, qemu-system-arm)
You can find packages for your chosen platform on the QEMU website; it is installable across Linux, macOS and even Windows.
Simply download the copy of Raspbian you need from the official site. Personally, I used the 2018-11-13 version of Raspbian Lite, since I don't need an X server.
Since the standard RPi kernel can't be booted out of the box on QEMU, we'll need a custom kernel. We'll cover that in the next step.
First, you'll need to download a kernel. Personally, I (along with most people) use the dhruvvyas90/qemu-rpi-kernel repository's kernels. Either clone the repo:
$ git clone https://github.com/dhruvvyas90/qemu-rpi-kernel.git
or download a kernel directly:
$ wget https://github.com/dhruvvyas90/qemu-rpi-kernel/raw/master/kernel-qemu-4.4.34-jessie
or download a snapshot from my website directly:
$ wget https://www.mobibrw.com/wp-content/uploads/2019/03/qemu-rpi-kernel.zip
For the rest of these steps I'm going to be using the kernel-qemu-4.4.34-jessie kernel, so update the commands as needed if you're using another version.
This step is optional, but recommended
When you download the Raspbian image it will be in the raw format, a plain disk image (generally with an .img extension).
A more efficient option is to convert this to a qcow2 image first. Use the qemu-img command to do this:
$ qemu-img convert -f raw -O qcow2 2018-11-13-raspbian-stretch-lite.img raspbian-stretch-lite.qcow
Now we can also easily expand the image:
$ qemu-img resize raspbian-stretch-lite.qcow +6G
You can check on your image using the qemu-img info command.
You've got everything you need now: a kernel, a disk image, and QEMU!
Actually running the virtual Pi is done using the qemu-system-arm command, and it can be quite complicated. The full command is this (don't worry, it's explained below):
$ sudo qemu-system-arm \
    -kernel ./kernel-qemu-4.4.34-jessie \
    -append "root=/dev/sda2 panic=1 rootfstype=ext4 rw" \
    -hda raspbian-stretch-lite.qcow \
    -cpu arm1176 -m 256 \
    -M versatilepb \
    -no-reboot \
    -serial stdio \
    -net nic -net user
If you need to specify how the guest reaches the network, run the following command instead:
$ sudo qemu-system-arm \
    -kernel ./kernel-qemu-4.4.34-jessie \
    -append "root=/dev/sda2 panic=1 rootfstype=ext4 rw" \
    -hda raspbian-stretch-lite.qcow \
    -cpu arm1176 -m 256 \
    -M versatilepb \
    -no-reboot \
    -serial stdio \
    -net nic -net user \
    -net tap,ifname=vnet0,script=no,downscript=no
So, in order:
- sudo qemu-system-arm: you need to run QEMU as root
- -kernel: this is the path to the QEMU kernel we downloaded in the previous step
- -append: here we are providing the boot args directly to the kernel, telling it where to find its root filesystem and what type it is
- -hda: here we're attaching the disk image itself
- -cpu/-m: this sets the CPU type and RAM limit to match a Raspberry Pi
- -M: this sets the machine we are emulating. versatilepb is the 'ARM Versatile/PB' machine
- -no-reboot: just tells QEMU to exit rather than rebooting the machine
- -serial: redirects the machine's virtual serial port to our host's stdio
- -net: this configures the machine's network stack to attach a NIC, use the user-mode stack, connect the host's vnet0 TAP device to the new NIC and not use config scripts.
If it's all gone well, you should now have a QEMU window pop up and you should see the familiar Raspberry Pi boot screen show up.
Now, go get yourself a drink to celebrate, because it might take a little while.
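Because the invocation is long, it can help to print it and review it before running anything as root. The following is a minimal sketch, not part of QEMU: the function name `print_qemu_cmd` is hypothetical, and the kernel and image file names are the ones used in this post.

```shell
#!/bin/sh
# Hypothetical wrapper: print the qemu-system-arm invocation instead of running it.
# Pass a TAP interface name as the first argument to add the -net tap option.
print_qemu_cmd() {
    tap_if="${1:-}"
    cmd="qemu-system-arm -kernel ./kernel-qemu-4.4.34-jessie"
    cmd="$cmd -append \"root=/dev/sda2 panic=1 rootfstype=ext4 rw\""
    cmd="$cmd -hda raspbian-stretch-lite.qcow -cpu arm1176 -m 256"
    cmd="$cmd -M versatilepb -no-reboot -serial stdio -net nic -net user"
    if [ -n "$tap_if" ]; then
        # Same TAP settings as in the second command above.
        cmd="$cmd -net tap,ifname=$tap_if,script=no,downscript=no"
    fi
    printf '%s\n' "$cmd"
}
```

Calling `print_qemu_cmd` prints the user-mode-only variant, and `print_qemu_cmd vnet0` prints the variant with the TAP device attached.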
Now, that's all well and good, but without networking, we may as well be back on hardware. When the machine started, it will have attached a NIC and connected it to the host's vnet0 TAP device. If we configure that device with an IP and add it to a bridge on our host, we should be able to reliably access it like any other virtual machine.
This will vary by host, but on my Fedora machine, for example, there is a pre-configured virbr0 bridge interface with an address in the 192.168.122.0/24 space:
virbr0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.122.1  netmask 255.255.255.0  broadcast 192.168.122.255
        ether 00:00:00:1e:77:43  txqueuelen 1000  (Ethernet)
I'm going to use this bridge and just pick a static address for my Pi: 192.168.122.200
Reusing an existing (pre-configured) bridge means you won't need to sort your own routing
NOTE: I'm assuming Stretch here.
Open /etc/dhcpcd.conf in your new virtual Pi and configure the eth0 interface with a static address in your bridge's subnet. For example, for my bridge:
# in /etc/dhcpcd.conf
interface eth0
static ip_address=192.168.122.200/24
static routers=192.168.122.254
static domain_name_servers=8.8.8.8 8.8.4.4
You may need to reboot for this to take effect
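The static address only works if it really lies inside the bridge's subnet. As a quick sanity check, here is a minimal sketch in plain shell arithmetic; the helper names `ip_to_int` and `in_subnet` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers: check that an IPv4 address falls inside a subnet.
ip_to_int() {
    # Split a.b.c.d on dots and combine the octets into one integer.
    old_ifs=$IFS
    IFS=.
    set -- $1
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

in_subnet() {
    # $1 = address, $2 = network address, $3 = prefix length (e.g. 24)
    addr=$(ip_to_int "$1")
    net=$(ip_to_int "$2")
    mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
    [ $(( addr & mask )) -eq $(( net & mask )) ]
}
```

With the bridge above, `in_subnet 192.168.122.200 192.168.122.0 24` succeeds, while an address such as 192.168.123.200 would be rejected.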
Finally, add the machine's TAP interface to your chosen bridge with the brctl command:
$ sudo brctl addif virbr0 vnet0
Now, on your host, you should be able to ping 192.168.122.200 (or your Pi's address).
Now, in your machine, you can run sudo raspi-config and enable the SSH server (in the "Interfacing Options" menu at time of writing).
Make sure you change the password from default while you're there!
Finally, on your host, run ssh-copy-id pi@192.168.122.200 to copy your SSH key into the Pi's pi user and you can now SSH directly into your Pi without a password prompt.
This is the sequel to the single-precision SSE-optimized sin, cos, log and exp that I wrote some time ago, adapted to the NEON FPU of my pandaboard. Precision and range are exactly the same as in the SSE version, so I won't repeat them.
command line: gcc -O3 -mfloat-abi=softfp -mfpu=neon -march=armv7-a -mtune=cortex-a9 -Wall -W neon_mathfun_test.c -lm
exp([ -1000, -100, 100, 1000]) = [ 0, 0, 2.4061436e+38, 2.4061436e+38]
exp([ -nan, inf, -inf, nan]) = [ nan, 2.4061436e+38, 0, nan]
log([ 0, -10, 1e+30, 1.0005271e-42]) = [ -nan, -nan, 69.077553, -nan]
log([ -nan, inf, -inf, nan]) = [ 89.128304, 88.722839, -nan, 89.128304]
sin([ -nan, inf, -inf, nan]) = [ nan, nan, -nan, nan]
cos([ -nan, inf, -inf, nan]) = [ nan, nan, nan, nan]
sin([ -1e+30, -100000, 1e+30, 100000]) = [ inf, -0.035749275, -inf, 0.035749275]
cos([ -1e+30, -100000, 1e+30, 100000]) = [ nan, -0.9993608, nan, -0.9993608]
benching sinf .. -> 2.0 millions of vector evaluations/second -> 121 cycles/value on a 1000MHz computer
benching cosf .. -> 1.8 millions of vector evaluations/second -> 132 cycles/value on a 1000MHz computer
benching expf .. -> 1.1 millions of vector evaluations/second -> 221 cycles/value on a 1000MHz computer
benching logf .. -> 1.7 millions of vector evaluations/second -> 141 cycles/value on a 1000MHz computer
benching cephes_sinf .. -> 2.4 millions of vector evaluations/second -> 103 cycles/value on a 1000MHz computer
benching cephes_cosf .. -> 2.0 millions of vector evaluations/second -> 123 cycles/value on a 1000MHz computer
benching cephes_expf .. -> 1.6 millions of vector evaluations/second -> 153 cycles/value on a 1000MHz computer
benching cephes_logf .. -> 1.5 millions of vector evaluations/second -> 156 cycles/value on a 1000MHz computer
benching sin_ps .. -> 5.8 millions of vector evaluations/second -> 43 cycles/value on a 1000MHz computer
benching cos_ps .. -> 5.9 millions of vector evaluations/second -> 42 cycles/value on a 1000MHz computer
benching sincos_ps .. -> 6.0 millions of vector evaluations/second -> 41 cycles/value on a 1000MHz computer
benching exp_ps .. -> 5.6 millions of vector evaluations/second -> 44 cycles/value on a 1000MHz computer
benching log_ps .. -> 5.3 millions of vector evaluations/second -> 47 cycles/value on a 1000MHz computer
So performance is not stellar. I recommend using gcc 4.6.1 or newer, as it generates much better code than previous (gcc 4.5) versions -- almost 20% faster here. I believe rewriting these functions in assembly would improve the performance by 30%, and should not be very hard as ARM and NEON asm is quite nice and easy to write -- maybe I'll do it. Computing two SIMD vectors at once would also improve performance a lot, as there are enough registers on NEON, and it would reduce the dependencies between NEON instructions.
Note also that I have no idea of the performance on a Cortex A8 -- it may be extremely bad, I don't know.
command line: cl.exe /arch:SSE /O2 /TP /MD sse_mathfun_test.c (this is msvc 2010)
benching sinf .. -> 1.3 millions of vector evaluations/second -> 303 cycles/value on a 1600MHz computer
benching cosf .. -> 1.3 millions of vector evaluations/second -> 305 cycles/value on a 1600MHz computer
benching sincos (x87) .. -> 1.2 millions of vector evaluations/second -> 314 cycles/value on a 1600MHz computer
benching expf .. -> 1.6 millions of vector evaluations/second -> 244 cycles/value on a 1600MHz computer
benching logf .. -> 1.4 millions of vector evaluations/second -> 276 cycles/value on a 1600MHz computer
benching cephes_sinf .. -> 1.4 millions of vector evaluations/second -> 280 cycles/value on a 1600MHz computer
benching cephes_cosf .. -> 1.5 millions of vector evaluations/second -> 265 cycles/value on a 1600MHz computer
benching cephes_expf .. -> 0.7 millions of vector evaluations/second -> 548 cycles/value on a 1600MHz computer
benching cephes_logf .. -> 0.8 millions of vector evaluations/second -> 489 cycles/value on a 1600MHz computer
benching sin_ps .. -> 9.2 millions of vector evaluations/second -> 43 cycles/value on a 1600MHz computer
benching cos_ps .. -> 9.5 millions of vector evaluations/second -> 42 cycles/value on a 1600MHz computer
benching sincos_ps .. -> 8.8 millions of vector evaluations/second -> 45 cycles/value on a 1600MHz computer
benching exp_ps .. -> 9.8 millions of vector evaluations/second -> 41 cycles/value on a 1600MHz computer
benching log_ps .. -> 8.6 millions of vector evaluations/second -> 46 cycles/value on a 1600MHz computer
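The cycles/value column appears to follow from the evaluation rate if each vector evaluation processes four floats (one 128-bit SSE or NEON register). That is an inference from the numbers, not something the benchmark output states. A quick check, with a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: derive cycles per value from a benchmark line.
# $1 = clock in MHz, $2 = millions of vector evaluations per second.
# Assumes 4 float lanes per vector evaluation.
cycles_per_value() {
    awk -v mhz="$1" -v mevals="$2" 'BEGIN { printf "%.0f\n", mhz / (mevals * 4) }'
}

cycles_per_value 1000 5.8   # NEON sin_ps
cycles_per_value 1600 9.2   # SSE sin_ps
```

Both calls print 43, matching the reported sin_ps figures. Other rows can differ by a few cycles because the printed evaluation rates are rounded to one decimal place.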
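Git hooks live in the .git/hooks directory of a Git project.
$ ls -l .git/hooks
If all of our projects need a common hook, the hook file has to be placed in every one of them. Copying it into each project by hand is clearly not a workable approach.
We can use a template directory to solve this problem.
When git init or git clone is run with a template directory specified, the files in the template directory are copied into the .git/ directory.
$ git init --template "path-to-template-dir"
$ git clone --template "path-to-template-dir"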
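So is the solution to put the shared hook files in a template directory and then specify that directory on every git init / git clone?
No, that is still too much trouble.
Since the template directory lives in one fixed place, we can write it into the global configuration.
# Define the template directory and the hooks directory inside it
$ template_dir=$HOME/.git-templates
$ template_hooks_dir=$template_dir/hooks
# Copy the shared hooks directory into the template directory
$ mkdir -p $template_dir
$ cp -rf $root_dir/sample/git-template/hooks/ $template_dir/
# Make the hooks in the template directory executable
$ chmod -R a+x $template_hooks_dir
# Set the global template directory
$ git config --global init.templatedir $template_dir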
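From then on, git init or git clone automatically copies the hook files into the project's hooks directory. For an existing project, run git init again to reinitialize it; files that already exist in .git/hooks are not overwritten.
Reference code for formatting code automatically at commit time: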
#!/bin/bash
# https://stackoverflow.com/questions/12881975/git-pre-commit-hook-failing-in-github-for-mac-works-on-command-line
export PATH=$PATH:/usr/local/bin:/usr/local/sbin

STYLE=$(git config --get hooks.clangformat.style)
if [ -n "${STYLE}" ] ; then
  STYLEARG="-style=${STYLE}"
else
  # try source root dir
  STYLE=$(git rev-parse --show-toplevel)/.clang-format
  # use -f here: the path string is never empty, so -n would always be true
  if [ -f "${STYLE}" ] ; then
    STYLEARG="-style=file"
  else
    STYLEARG=""
  fi
fi

# format one file in place and re-stage it
format_file() {
  file="${1}"
  clang-format -i ${STYLEARG} "${file}"
  git add "${file}"
}

case "${1}" in
  --about )
    echo "Runs clang-format on source files"
    ;;
  * )
    for file in `git diff-index --cached --name-only HEAD` ; do
      format_file "${file}"
    done
    ;;
esac
To use it, simply copy the script into the .git/hooks directory and rename it to pre-commit. Then run:
# macOS Mojave (10.14.3)
$ brew install clang-format
$ chmod +x .git/hooks/pre-commit
From now on, code is formatted automatically on every commit.
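The hook above runs clang-format on every staged path, including files that are not C or C++ sources. A minimal sketch of a filter that could be applied before formatting follows; the function name `is_clang_format_target` and the list of extensions are assumptions, not part of the original hook.

```shell
#!/bin/sh
# Hypothetical filter: succeed only for paths that look like
# C, C++ or Objective-C sources, judged by file extension.
is_clang_format_target() {
    case "$1" in
        *.c|*.cc|*.cpp|*.cxx|*.h|*.hh|*.hpp|*.m|*.mm) return 0 ;;
        *) return 1 ;;
    esac
}
```

Inside the hook's loop, the call could then become `is_clang_format_target "${file}" && format_file "${file}"`. Adding `--diff-filter=ACMR` to the `git diff-index` call would also skip deleted files, which clang-format cannot open.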