Windows 7 系统电脑安装RNDIS驱动

本教程小编和大家分享 Windows 7 系统电脑安装RNDIS驱动的正确方法,RNDIS驱动是什么? Windows 7 系统驱动RNDIS是远端网络驱动接口协议,设备通过USB方式同主机连接,模拟网络连接以便用于下载和调试工作。但是很多 Windows 7 系统用户安装RNDIS的设备时失败,遇到无法安装的问题,所以小编给大家介绍 Windows 7 系统电脑安装RNDIS驱动的正确方法。

继续阅读Windows 7 系统电脑安装RNDIS驱动

粗略判断Shader每条代码的成本

GPU IS a processor (graphics proccessing unit). Anywho, i remember seeing somewhere that in geforce 6 series cards its a signle cycle (maybe i was just dreaming :-p) but i have that memory

radeon x800 has it anyways
EDIT:

Quote:

ORIGINALLY AT: http://gear.ibuypower.com/GVE/Store/ProductDetails.aspx?sku=VC-POWERC-147
Smartshader HD•Support for Microsoft® DirectX® 9.0 programmable vertex and pixel shaders in hardware
• DirectX 9.0 Vertex Shaders
- Vertex programs up to 65,280 instructions with flow control
- Single cycle trigonometric operations (SIN & COS)
• Direct X 9.0 Extended Pixel Shaders
- Up to 1,536 instructions and 16 textures per rendering pass
- 32 temporary and constant registers
- Facing register for two-sided lighting
- 128-bit, 64-bit & 32-bit per pixel floating point color formats
- Multiple Render Target (MRT) support
• Complete feature set also supported in OpenGL® via extensions

继续阅读粗略判断Shader每条代码的成本

Android Gradle Plugin源码解析之externalNativeBuild

在Android Studio 2.2开始的Android Gradle Plugin版本中,Google集成了对cmake的完美支持,而原先的ndkBuild的方式支持也变得更加良好。这篇文章就来说说Android Gradle Plugin与交叉编译之间的一些事,即externalNativeBuild相关的task,主要是解读一下gradle构建系统相关的源码。

继续阅读Android Gradle Plugin源码解析之externalNativeBuild

Overriding a default option(…) value in CMake from a parent CMakeLists.txt

CMakeLists.txt

option(BUILD_FOR_ANDROID "Build For Android" OFF)

if(SYSTEM.Android AND NOT BUILD_FOR_ANDROID)
    set(CMAKE_LIBRARY_OUTPUT_DIRECTORY ${NATIVE_LIBRARY_OUTPUT}/${ANDROID_ABI})
endif()

CMakeLists.txt

set(BUILD_FOR_ANDROID ON)
add_subdirectory(${CHILD_ROOT_DIR}/ ${CMAKE_CURRENT_SOURCE_DIR}/build)

执行如下命令的时候:

/Users/xxxx/Library/Android/sdk/cmake/3.6.4111459/bin/cmake --trace-expand \
-H/Users/xxxx/Source/example/demo/android/app \
-B/Users/xxxx/Source/example/demo/android/app/.externalNativeBuild/cmake/debug/arm64-v8a \
-DANDROID_ABI=arm64-v8a \
-DANDROID_PLATFORM=android-21 \
-DCMAKE_LIBRARY_OUTPUT_DIRECTORY=/Users/xxxx/Source/example/demo/android/app/build/intermediates/cmake/debug/obj/arm64-v8a \
-DCMAKE_BUILD_TYPE=Debug \
-DANDROID_NDK=/Users/xxxx/Library/Android/android-ndk-r16b \
-DCMAKE_TOOLCHAIN_FILE=/Users/xxxx/Library/Android/android-ndk-r16b/build/cmake/android.toolchain.cmake \
-DCMAKE_MAKE_PROGRAM=/Users/xxxx/Library/Android/sdk/cmake/3.6.4111459/bin/ninja \
-G"Android Gradle - Ninja" \
-DANDROID_ARM_NEON=TRUE \
-DANDROID_TOOLCHAIN=gcc \
-DANDROID_PLATFORM=android-21 \
-DANDROID_STL=gnustl_shared

会观察到生成的配置文件中 BUILD_FOR_ANDROID 不一定能生效。

需要如下配置才行:
CMakeLists.txt

set(BUILD_FOR_ANDROID ON CACHE BOOL "" FORCE)
add_subdirectory(${CHILD_ROOT_DIR}/ ${CMAKE_CURRENT_SOURCE_DIR}/build)

参考链接


Use ccache with CMake for faster compilation

C and C++ compilers aren’t the fastest pieces of software out there and there’s no lack of programmer jokes based on tedium of waiting for their work to complete.

There are ways to fix the pain though - one of them is ccache. CCache improves compilation times by caching previously built object files in private cache and reusing them when you’re recompiling same objects with same parameters. Obviously it will not help if you’re compiling the code for the first time and it also won’t help if you often change compilation flags. Most C/C++ development however involves recompiling same object files with the same parameters and ccache helps alot.

For illustration, here’s the comparison of first and subsequent compilation times of a largish C++ project:

Original run with empty cache:

$ make -j9
...
real    0m56.684s
user    5m31.996s
sys     0m41.638s

Recompilation with warm cache:

$ make -j9
...
real    0m5.929s
user    0m11.896s
sys     0m8.722s

Installation

CCache is available in repositories on pretty much all distributions. On OS X use homebrew:

$ brew install ccache

and on Debian-based distros use apt:

$ apt-get install ccache

CMake configuration

After ccache is installed, you need to tell CMake to use it as a wrapper for the compiler. Add these lines to your CMakeLists.txt:

# Configure CCache if available
find_program(CCACHE_FOUND ccache)
if(CCACHE_FOUND)
        set_property(GLOBAL PROPERTY RULE_LAUNCH_COMPILE ccache)
        set_property(GLOBAL PROPERTY RULE_LAUNCH_LINK ccache)
endif(CCACHE_FOUND)

Rerun cmake and next make should use ccache for wrapper.

Usage with Android NDK

CCache can even be used on Android NDK - you just need to export NDK_CCACHE environment variable with path to ccache binary. ndk-build script will automatically use it. E.g.

$ export NDK_CCACHE=/usr/local/bin/ccache

$ ndk-build -j9

(Note that on Debian/Ubuntu the path will probably be /usr/bin/ccache)

CCache statistics

To see if ccache is really working, you can use ccache -s command, which will display ccache statistics:

cache directory                     /Users/jernej/.ccache
primary config                      /Users/jernej/.ccache/ccache.conf
secondary config      (readonly)    /usr/local/Cellar/ccache/3.2.2/etc/ccache.conf
cache hit (direct)                 77826
cache hit (preprocessed)           17603
cache miss                         46999
called for link                       18
compile failed                        45
ccache internal error                  1
preprocessor error                    62
unsupported source language          204
files in cache                     48189
cache size                           1.2 GB
max cache size                      20.0 GB

On second and all subsequent compilations the “cache hit” values should increase and thus show that ccache is working.

参考链接


Use ccache with CMake for faster compilation

macOS Mojave(10.14.3)系统QEMU虚拟机运行Clockwork OS

# 安装编译工具macOS Mojave(10.14.3)
$ brew install arm-linux-gnueabihf-binutils

# bison on macOS is too old
$ brew install bison
$ export PATH="/usr/local/opt/bison/bin:$PATH" 

# 安装 crosstool-ng 构建GCC编译环境
$ brew install crosstool-ng

$ export CT_NG_VER=$(brew list --versions crosstool-ng | tr ' ' '\n' | tail -1)

$ export CT_NG_VER_SHORT=${CT_NG_VER%_*}

# 安装的 crosstool-ng 的脚本文件缺少执行权限,导致无法执行,我们需要手工增加执行权限
$ chmod +x "$(brew --cellar crosstool-ng)/${CT_NG_VER}/lib/crosstool-ng-${CT_NG_VER_SHORT}/scripts/crosstool-NG.sh"

# 默认情况下,macOS的文件系统不区分大小写,我们需要手工创建一个区分大小写的分区
$ hdiutil create -volname "ClockworkOS" -type SPARSE -fs 'Case-sensitive Journaled HFS+' -size 30g ClockworkOS.dmg

$ hdiutil attach ClockworkOS.dmg.sparseimage -mountpoint /Volumes/ClockworkOS

$ cd /Volumes/ClockworkOS

$ mkdir arm-cortexa9_neon-linux

$ cd arm-cortexa9_neon-linux

$ ct-ng list-samples

# 变更x-tools存储目录
$ export HOME=/Volumes/ClockworkOS

$ ct-ng arm-cortexa9_neon-linux-gnueabihf

# 修复BUG Build failed in step 'Installing m4 for build' 
$ brew uninstall --ignore-dependencies binutils
$ brew install binutils

# 安装依赖工具
$ brew install automake

$ brew uninstall --ignore-dependencies gawk
$ brew install gawk

# 目前编译gettext-0.19.8.1的时候写死依赖automake-1.15,但是最新的已经是automake-1.16,我们通过手工编译安装automake-1.15规避这个问题
$ wget http://ftp.gnu.org/gnu/automake/automake-1.15.tar.gz

# 也可从本站下载 wget http://www.mobibrw.com/wp-content/uploads/2019/03/automake-1.15.tar.gz

$ tar xvf automake-1.15.tar.gz

$ cd automake-1.15

$ bash configure

$ make && make install

$ cd ..

# 修改文件打开数量限制,修正错误 “extra-module.mk:11: *** Too many open files.”
$ ulimit -n 2048

# 'scm_new_port_table_entry' was not declared in this scope
$ sed -i "" "s/CT_GDB_CROSS_EXTRA_CONFIG_ARRAY=.*/CT_GDB_CROSS_EXTRA_CONFIG_ARRAY=\"--with-guile=no\"/g" .config

$ export PATH="/usr/local/bin:$PATH"

$ ct-ng build -j8

编译 u-boot

$ cd /Volumes/ClockworkOS

# 下载u-boot代码
$ git clone https://github.com/qemu/u-boot.git

$ cd u-boot

$ git checkout v2019.01 -b v2019.01

$ export PATH="/Volumes/ClockworkOS/x-tools/arm-cortexa9_neon-linux-gnueabihf/bin:$PATH"

$ export CROSS_COMPILE=arm-cortexa9_neon-linux-gnueabihf-

$ make clean

# R16又名A33 ,R16-J 代表包含Jazelle DBX 
$ make vexpress_ca9x4_defconfig

# fix Undefined symbols for architecture x86_64: "_PyArg_ParseTuple"
$ export HOSTLDFLAGS="-lpython -dynamclib"

$ brew install gnu-sed

# fix ./tools/../lib/bch.c:66:10: fatal error: 'endian.h' file not found
$ gsed -i "s/#include <sys\/endian.h>/#include <sys\/endian.h>\n#elif defined(__APPLE__)\n#include <machine\/endian.h>\n#include <libkern\/OSByteOrder.h>/g" lib/bch.c

$ gsed -i "s/#define cpu_to_be32 htobe32/#if defined(__APPLE__)\n#define cpu_to_be32 OSSwapHostToBigInt32\n#else\n#define cpu_to_be32 htobe32\n#endif/g" lib/bch.c

$ gsed -i "s/#if \!defined(__DragonFly__) \&\& \!defined(__FreeBSD__)/#if \!defined(__DragonFly__) \&\& \!defined(__FreeBSD__) \&\& \!defined(__APPLE__)/g" lib/bch.c

# 无视最后的失败提示,只要u-boot这个文件生成即可
$ make ARCH=arm -j8

编译 Linux 内核

$ cd /Volumes/ClockworkOS

$ brew install aria2

$ aria2c -c https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.14.2.tar.xz

# 也可本站下载 wget http://www.mobibrw.com/wp-content/uploads/2019/03/linux-4.14.2.tar.xz

$ tar xvf linux-4.14.2.tar.xz

$ cd linux-4.14.2

$ export PATH="/Volumes/ClockworkOS/x-tools/arm-cortexa9_neon-linux-gnueabihf/bin:$PATH"

# for mkimage
$ export PATH="/Volumes/ClockworkOS/u-boot/tools:$PATH"
# 或者 brew install u-boot-tools

# elf.h
$ brew install libelf

$ echo "
#include <libelf/libelf.h>
#define R_386_NONE 0
#define R_386_32 1
#define R_386_PC32 2
#define R_ARM_NONE 0
#define R_ARM_PC24 1
#define R_ARM_ABS32 2
#define R_MIPS_NONE 0
#define R_MIPS_16 1
#define R_MIPS_32 2
#define R_MIPS_REL32 3
#define R_MIPS_26 4
#define R_MIPS_HI16 5
#define R_MIPS_LO16 6
#define EF_ARM_EABIMASK 0xFF000000
#define EF_ARM_EABI_VERSION(flags) ((flags) & EF_ARM_EABIMASK)" > /usr/local/include/elf.h

# xargs: illegal option -- r
$ brew install findutils

$ export PATH="/usr/local/opt/findutils/libexec/gnubin:$PATH"

# stat: illegal option -- c
$ ln -s /usr/local/bin/gstat /usr/local/bin/stat

$ export PATH="/usr/local/bin:$PATH"

$ export CROSS_COMPILE=arm-cortexa9_neon-linux-gnueabihf-

$ export ARCH=arm

$ make vexpress_defconfig

$ make -j8

$ mkimage -A arm -O linux -T kernel -C none -a 0x40008000 -e 0x40008000 -n "Linux kernel" -d arch/arm/boot/zImage uImage
$ cd /Volumes/ClockworkOS

$ brew install aria2

# 官方给出的这个地址下不到,只能用镜像地址 http://106.185.33.196/clockworkos_v0.3.img.bz2
$ aria2c -c http://clockworkpi.k15.net/clockworkos_v0.3.img.bz2

$ rm -rf clockworkos_v0.3.img

$ bzip2 -d -k -vvvv clockworkos_v0.3.img.bz2

# 替换镜像中的内核文件
$ hdiutil attach clockworkos_v0.3.img -mountpoint /Volumes/clockworkos_v0.3

$ echo y | cp -i /Volumes/ClockworkOS/linux-4.14.2/uImage /Volumes/clockworkos_v0.3/uImage

$ hdiutil detach /Volumes/clockworkos_v0.3

$ brew install qemu

$ qemu-img convert -f raw -O qcow2 clockworkos_v0.3.img clockworkos_v0.3.qcow2

手工编译 qemu

$ cd /Volumes/ClockworkOS

$ git clone https://github.com/qemu/qemu.git

$ cd qemu

# 从 qemu v2.1.0-rc1 开始,内存需要被映射到0x60000000开始的地址,更低的地址被映射为只读闪存,我们需要取消这种映射行为,否则执行的时候会报告错误
$ sed -i "" "s/\[VE_NORFLASHALIAS\] = 0/\[VE_NORFLASHALIAS\] = -1/g" hw/arm/vexpress.c

$ bash configure

$ make -j8

$ cd ..

# list supported machine `qemu-system-arm -machine help`

$ /Volumes/ClockworkOS/qemu/arm-softmmu/qemu-system-arm -M vexpress-a9 -m 1024M -kernel /Volumes/ClockworkOS/u-boot/u-boot -serial mon:stdio -nographic -sd clockworkos_v0.3.qcow2 -net nic,model=lan9118 -net user

可惜到这一步了,还是没办法成功运行系统。

参考链接


Using QEMU to emulate a Raspberry Pi

If you're building software for the Raspberry Pi (like I sometimes do), it can be a pain to have to constantly keep Pi hardware around and spotting Pi-specific problems can be difficult until too late.

One option (and the one I most like) is to emulate a Raspberry Pi locally before ever hitting the device. Why?

  • Works anywhere you can install QEMU
  • No hardware setup needed (no more scratching around for a power supply)
  • Faster feedback cycle compared to hardware
  • I can use Pi software (like Raspbian) in a virtual context
  • I can prep my "virtual Pi" with all the tools I need regardless of my physical Pi's use case

Given I'm next-to-useless at Python, that last one is pretty important as it allows me to install every Python debugging and testing tool known to man on my virtual Pi while my end-product hardware stays comparatively pristine.

Getting started

First, you'll need a few prerequisites:

QEMU (more specifically qemu-system-arm)

You can find all the packages for your chosen platform on the QEMU website and is installable across Linux, macOS and even Windows.

Raspbian

Simply download the copy of Raspbian you need from the official site. Personally, I used the 2018-11-13 version of Raspbian Lite, since I don't need an X server.

Kernel

Since the standard RPi kernel can't be booted out of the box on QEMU, we'll need a custom kernel. We'll cover that in the next step.

Preparing

Get your kernel

First, you'll need to download a kernel. Personally, I (along with most people) use the dhruvvyas90/qemu-rpi-kernel repository's kernels. Either clone the repo:

$ git clone https://github.com/dhruvvyas90/qemu-rpi-kernel.git

or download a kernel directly:

$ wget https://github.com/dhruvvyas90/qemu-rpi-kernel/raw/master/kernel-qemu-4.4.34-jessie

or download a snapshot from my website directly:

$ wget http://www.mobibrw.com/wp-content/uploads/2019/03/qemu-rpi-kernel.zip

For the rest of these steps I'm going to be using the kernel-qemu-4.4.34-jessiekernel, so update the commands as needed if you're using another version.

Filesystem image

This step is optional, but recommended

When you download the Raspbian image it will be in the raw format, a plain disk image (generally with an .img extension).

A more efficient option is to convert this to a qcow2 image first. Use the qemu-imgcommand to do this:

$ qemu-img convert -f raw -O qcow2 2018-11-13-raspbian-stretch-lite.img raspbian-stretch-lite.qcow

Now we can also easily expand the image:

$ qemu-img resize raspbian-stretch-lite.qcow +6G

You can check on your image using the qemu-img info command

Starting

You've got everything you need now: a kernel, a disk image, and QEMU!

Actually running the virtual Pi is done using the qemu-system-arm command and it can be quite complicated. The full command is this (don't worry it's explained below):

$ sudo qemu-system-arm \
-kernel ./kernel-qemu-4.4.34-jessie \
-append "root=/dev/sda2 panic=1 rootfstype=ext4 rw" \
-hda raspbian-stretch-lite.qcow \
-cpu arm1176 -m 256 \
-M versatilepb \
-no-reboot \
-serial stdio \
-net nic -net user

如果需要指定上网方式的话,执行如下命令:

$ sudo qemu-system-arm \
-kernel ./kernel-qemu-4.4.34-jessie \
-append "root=/dev/sda2 panic=1 rootfstype=ext4 rw" \
-hda raspbian-stretch-lite.qcow \
-cpu arm1176 -m 256 \
-M versatilepb \
-no-reboot \
-serial stdio \
-net nic -net user \
-net tap,ifname=vnet0,script=no,downscript=no

So, in order:

  • sudo qemu-system-arm: you need to run QEMU as root
  • -kernel: this is the path to the QEMU kernel we downloaded in the previous step
  • -append: here we are providing the boot args direct to the kernel, telling it where to find it's root filesytem and what type it is
  • -hda: here we're attaching the disk image itself
  • -cpu/-m: this sets the CPU type and RAM limit to match a Raspberry Pi
  • -M: this sets the machine we are emulating. versatilepb is the 'ARM Versatile/PB' machine
  • -no-reboot: just tells QEMU to exit rather than rebooting the machine
  • -serial: redirects the machine's virtual serial port to our host's stdio
  • -net: this configures the machine's network stack to attach a NIC, use the user-mode stack, connect the host's vnet0 TAP device to the new NIC and don't use config scripts.

If it's all gone well, you should now have a QEMU window pop up and you should see the familiar Raspberry Pi boot screen show up.

Now, go get yourself a drink to celebrate, because it might take a little while.

Networking

Now, that's all well and good, but without networking, we may as well be back on hardware. When the machine started, it will have attached a NIC and connected it to the host's vnet0 TAP device. If we configure that device with an IP and add it to a bridge on our host, you should be able to reliably access it like any other virtual machine.

(on host) Find a bridge and address

This will vary by host, but on my Fedora machine, for example, there is a pre-configured virbr0 bridge interface with an address in the 192.168.122.0/24 space:

virbr0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.122.1  netmask 255.255.255.0  broadcast 192.168.122.255
        ether 00:00:00:1e:77:43  txqueuelen 1000  (Ethernet)

I'm going to use this bridge and just pick a static address for my Pi: 192.168.122.200

Reusing an existing (pre-configured) bridge means you won't need to sort your own routing

(in guest) Configure interface

NOTE: I'm assuming Stretch here.

Open /etc/dhcpcd.conf in your new virtual Pi and configure the eth0 interface with a static address in your bridge's subnet. For example, for my bridge:

# in /etc/dhcpcd.conf
interface eth0
static ip_address=192.168.122.200/24
static routers=192.168.122.254
static domain_name_servers=8.8.8.8 8.8.4.4

You may need to reboot for this to take effect

(in host) Add TAP to bridge

Finally, add the machine's TAP interface to your chosen bridge with the brctl command:

$ sudo brctl addif virbr0 vnet0

Now, on your host, you should be able to ping 192.168.122.200 (or your Pi's address).

Set up SSH

Now, in your machine, you can run sudo raspi-config and enable the SSH server (in the "Interfacing Options" menu at time of writing).

Make sure you change the password from default while you're there!

Finally, on your host, run ssh-copy-id pi@192.168.122.200 to copy your SSH key into the Pi's pi user and you can now SSH directly into your Pi without a password prompt.

参考链接


Using QEMU to emulate a Raspberry Pi

Simple ARM NEON optimized sin, cos, log and exp

This is the sequel of the single precision SSE optimized sin, cos, log and exp that I wrote some time ago. Adapted to the NEON fpu of my pandaboard. Precision and range are exactly the same than the SSE version, so I won't repeat them.

The code

The functions below are licensed under the zlib license, so you can do basically what you want with them.

  • neon_mathfun.h source code for sin_ps, cos_ps, sincos_ps, exp_ps, log_ps, as straight C.
  • neon_mathfun_test.c Validation+Bench program for those function. Do not forget to run it once.

Performance

Results on a pandaboard with a 1GHz dual-core ARM Cortex A9 (OMAP4), using gcc 4.6.1

command line: gcc -O3 -mfloat-abi=softfp -mfpu=neon -march=armv7-a -mtune=cortex-a9 -Wall -W neon_mathfun_test.c -lm

exp([        -1000,          -100,           100,          1000]) = [            0,             0, 2.4061436e+38, 2.4061436e+38]
exp([         -nan,           inf,          -inf,           nan]) = [          nan, 2.4061436e+38,             0,           nan]
log([            0,           -10,         1e+30, 1.0005271e-42]) = [         -nan,          -nan,     69.077553,          -nan]
log([         -nan,           inf,          -inf,           nan]) = [    89.128304,     88.722839,          -nan,     89.128304]
sin([         -nan,           inf,          -inf,           nan]) = [          nan,           nan,          -nan,           nan]
cos([         -nan,           inf,          -inf,           nan]) = [          nan,           nan,           nan,           nan]
sin([       -1e+30,       -100000,         1e+30,        100000]) = [          inf,  -0.035749275,          -inf,   0.035749275]
cos([       -1e+30,       -100000,         1e+30,        100000]) = [          nan,    -0.9993608,           nan,    -0.9993608]
benching                 sinf .. ->    2.0 millions of vector evaluations/second -> 121 cycles/value on a 1000MHz computer
benching                 cosf .. ->    1.8 millions of vector evaluations/second -> 132 cycles/value on a 1000MHz computer
benching                 expf .. ->    1.1 millions of vector evaluations/second -> 221 cycles/value on a 1000MHz computer
benching                 logf .. ->    1.7 millions of vector evaluations/second -> 141 cycles/value on a 1000MHz computer
benching          cephes_sinf .. ->    2.4 millions of vector evaluations/second -> 103 cycles/value on a 1000MHz computer
benching          cephes_cosf .. ->    2.0 millions of vector evaluations/second -> 123 cycles/value on a 1000MHz computer
benching          cephes_expf .. ->    1.6 millions of vector evaluations/second -> 153 cycles/value on a 1000MHz computer
benching          cephes_logf .. ->    1.5 millions of vector evaluations/second -> 156 cycles/value on a 1000MHz computer
benching               sin_ps .. ->    5.8 millions of vector evaluations/second ->  43 cycles/value on a 1000MHz computer
benching               cos_ps .. ->    5.9 millions of vector evaluations/second ->  42 cycles/value on a 1000MHz computer
benching            sincos_ps .. ->    6.0 millions of vector evaluations/second ->  41 cycles/value on a 1000MHz computer
benching               exp_ps .. ->    5.6 millions of vector evaluations/second ->  44 cycles/value on a 1000MHz computer
benching               log_ps .. ->    5.3 millions of vector evaluations/second ->  47 cycles/value on a 1000MHz computer

So performance is not stellar. I recommend to use gcc 4.6.1 or newer as it generates much better code than previous (gcc 4.5) versions -- almost 20% faster here. I believe rewriting these functions in assembly would improve the performance by 30%, and should not be very hard as the ARM and NEON asm is quite nice and easy to write -- maybe I'll do it. Computing two SIMD vectors at once would also help to improve a lot the performance as there are enough registers on NEON, and it would reduce the dependancies between neon instructions.

Note also that I have no idea of the performance on a Cortex A8 -- it may be extremely bad, I don't know.

Comparison with an Intel Atom

For comparison purposes, here is the performance of the SSE version on a single core Intel Atom N270 running at 1.66GHz

command line: cl.exe /arch:SSE /O2 /TP /MD sse_mathfun_test.c (this is msvc 2010)

benching                 sinf .. ->    1.3 millions of vector evaluations/second -> 303 cycles/value on a 1600MHz computer
benching                 cosf .. ->    1.3 millions of vector evaluations/second -> 305 cycles/value on a 1600MHz computer
benching         sincos (x87) .. ->    1.2 millions of vector evaluations/second -> 314 cycles/value on a 1600MHz computer
benching                 expf .. ->    1.6 millions of vector evaluations/second -> 244 cycles/value on a 1600MHz computer
benching                 logf .. ->    1.4 millions of vector evaluations/second -> 276 cycles/value on a 1600MHz computer
benching          cephes_sinf .. ->    1.4 millions of vector evaluations/second -> 280 cycles/value on a 1600MHz computer
benching          cephes_cosf .. ->    1.5 millions of vector evaluations/second -> 265 cycles/value on a 1600MHz computer
benching          cephes_expf .. ->    0.7 millions of vector evaluations/second -> 548 cycles/value on a 1600MHz computer
benching          cephes_logf .. ->    0.8 millions of vector evaluations/second -> 489 cycles/value on a 1600MHz computer
benching               sin_ps .. ->    9.2 millions of vector evaluations/second ->  43 cycles/value on a 1600MHz computer
benching               cos_ps .. ->    9.5 millions of vector evaluations/second ->  42 cycles/value on a 1600MHz computer
benching            sincos_ps .. ->    8.8 millions of vector evaluations/second ->  45 cycles/value on a 1600MHz computer
benching               exp_ps .. ->    9.8 millions of vector evaluations/second ->  41 cycles/value on a 1600MHz computer
benching               log_ps .. ->    8.6 millions of vector evaluations/second ->  46 cycles/value on a 1600MHz computer

The number of cycles is quite similar -- but the atom has a higher clock..

Last modified: 2011/05/29

参考链接


Simple ARM NEON optimized sin, cos, log and exp

使用Git Hooks(Pre-Commit)实现代码提交的时候自动格式化代码

钩子文件在项目目录下

git 的钩子放在 git 项目下的 .git/hooks 目录。

$ ls -l .git/hooks

如果我们所有项目都需要一个通用的钩子,那么我们需要在所有的项目中都放置钩子文件。挨个复制显然不是一个可行的方案。

模板目录

我们可用模板目录来解决这个问题。

在 git init 或者 git clone时,如果指定有模板目录,会使用拷贝模板目录下的文件到 .git/ 目录下。

$ git init --template "path-to-template-dir"

$ git clone --template "path-to-template-dir"

好了,那么解决方案就是:把统一的钩子文件放到模板目录,然后在 git init / git clone 时候指定模板目录?

不行,这样还是太麻烦了。

模板目录写入全局配置

模板目录固定在一个地方,我们可以把模板目录写入全局配置。

# 定义模板目录,模板目录下的钩子目录
$ template_dir=$HOME/.git-templates

$ tempalte_hooks_dir=$template_dir/hooks

# 拷贝全局钩子文件目录到模板目录下
$ mkdir -p $template_dir

$ cp -rf $root_dir/sample/git-template/hooks/ $template_dir/

# 修改模板目录下钩子目录权限
$ chmod -R a+x $tempalte_hooks_dir

# 设置全局模板目录
$ git config --global init.templatedir $template_dir

在 git init 或者 git clone 时,会自动拷贝钩子文件到项目的钩子目录。 已有项目,执行 git init 重新初始化项目即可。

代码提交时候,自动格式化的参考代码如下:

#!/bin/bash

# https://stackoverflow.com/questions/12881975/git-pre-commit-hook-failing-in-github-for-mac-works-on-command-line
export PATH=$PATH:/usr/local/bin:/usr/local/sbin

STYLE=$(git config --get hooks.clangformat.style)
if [ -n "${STYLE}" ] ; then
  STYLEARG="-style=${STYLE}"
else
  # try source root dir
  STYLE=$(git rev-parse --show-toplevel)/.clang-format
  if [ -n "${STYLE}" ] ; then
    STYLEARG="-style=file"
  else
    STYLEARG=""
  fi
fi

format_file() {
  file="${1}"
  clang-format -i ${STYLEARG} ${1}
  git add ${1}
}

case "${1}" in
  --about )
    echo "Runs clang-format on source files"
    ;;
  * )
    for file in `git diff-index --cached --name-only HEAD` ; do
      format_file "${file}"
    done
    ;;
esac

使用的时候,简单的拷贝到.git/hooks 目录下,并重新命名为pre-commit。然后执行:

# macOS Mojave (10.14.3)
$ brew install clang-format

$ chmod +x .git/hooks/pre-commit

这样,每次提交代码的时候,都会自动格式化代码了。

参考链接


Git Hooks

Git 钩子

和其它版本控制系统一样,Git 能在特定的重要动作发生时触发自定义脚本。 有两组这样的钩子:客户端的和服务器端的。 客户端钩子由诸如提交和合并这样的操作所调用,而服务器端钩子作用于诸如接收被推送的提交这样的联网操作。 你可以随心所欲地运用这些钩子。

安装一个钩子

钩子都被存储在 Git 目录下的 hooks 子目录中。 也即绝大部分项目中的 .git/hooks 。 当你用 git init 初始化一个新版本库时,Git 默认会在这个目录中放置一些示例脚本。这些脚本除了本身可以被调用外,它们还透露了被触发时所传入的参数。 所有的示例都是 shell 脚本,其中一些还混杂了 Perl 代码,不过,任何正确命名的可执行脚本都可以正常使用 —— 你可以用 Ruby 或 Python,或其它语言编写它们。 这些示例的名字都是以 .sample 结尾,如果你想启用它们,得先移除这个后缀。

把一个正确命名且可执行的文件放入 Git 目录下的 hooks 子目录中,即可激活该钩子脚本。 这样一来,它就能被 Git 调用。 接下来,我们会讲解常用的钩子脚本类型。

客户端钩子

客户端钩子分为很多种。 下面把它们分为:提交工作流钩子、电子邮件工作流钩子和其它钩子。

Note
需要注意的是,克隆某个版本库时,它的客户端钩子 并不 随同复制。 如果需要靠这些脚本来强制维持某种策略,建议你在服务器端实现这一功能。(请参照 使用强制策略的一个例子 中的例子。)
提交工作流钩子

前四个钩子涉及提交的过程。

pre-commit 钩子在键入提交信息前运行。 它用于检查即将提交的快照,例如,检查是否有所遗漏,确保测试运行,以及核查代码。 如果该钩子以非零值退出,Git 将放弃此次提交,不过你可以用 git commit --no-verify 来绕过这个环节。 你可以利用该钩子,来检查代码风格是否一致(运行类似 lint 的程序)、尾随空白字符是否存在(自带的钩子就是这么做的),或新方法的文档是否适当。

prepare-commit-msg 钩子在启动提交信息编辑器之前,默认信息被创建之后运行。 它允许你编辑提交者所看到的默认信息。 该钩子接收一些选项:存有当前提交信息的文件的路径、提交类型和修补提交的提交的 SHA-1 校验。 它对一般的提交来说并没有什么用;然而对那些会自动产生默认信息的提交,如提交信息模板、合并提交、压缩提交和修订提交等非常实用。 你可以结合提交模板来使用它,动态地插入信息。

commit-msg 钩子接收一个参数,此参数即上文提到的,存有当前提交信息的临时文件的路径。 如果该钩子脚本以非零值退出,Git 将放弃提交,因此,可以用来在提交通过前验证项目状态或提交信息。 在本章的最后一节,我们将展示如何使用该钩子来核对提交信息是否遵循指定的模板。

post-commit 钩子在整个提交过程完成后运行。 它不接收任何参数,但你可以很容易地通过运行 git log -1 HEAD 来获得最后一次的提交信息。 该钩子一般用于通知之类的事情。

电子邮件工作流钩子

你可以给电子邮件工作流设置三个客户端钩子。 它们都是由 git am 命令调用的,因此如果你没有在你的工作流中用到这个命令,可以跳到下一节。 如果你需要通过电子邮件接收由 git format-patch 产生的补丁,这些钩子也许用得上。

第一个运行的钩子是 applypatch-msg 。 它接收单个参数:包含请求合并信息的临时文件的名字。 如果脚本返回非零值,Git 将放弃该补丁。 你可以用该脚本来确保提交信息符合格式,或直接用脚本修正格式错误。

下一个在 git am 运行期间被调用的是 pre-applypatch 。 有些难以理解的是,它正好运行于应用补丁 之后,产生提交之前,所以你可以用它在提交前检查快照。 你可以用这个脚本运行测试或检查工作区。 如果有什么遗漏,或测试未能通过,脚本会以非零值退出,中断 git am 的运行,这样补丁就不会被提交。

post-applypatch 运行于提交产生之后,是在 git am 运行期间最后被调用的钩子。 你可以用它把结果通知给一个小组或所拉取的补丁的作者。 但你没办法用它停止打补丁的过程。

其它客户端钩子

pre-rebase 钩子运行于变基之前,以非零值退出可以中止变基的过程。 你可以使用这个钩子来禁止对已经推送的提交变基。 Git 自带的 pre-rebase 钩子示例就是这么做的,不过它所做的一些假设可能与你的工作流程不匹配。

post-rewrite 钩子被那些会替换提交记录的命令调用,比如 git commit --amend 和 git rebase(不过不包括 git filter-branch)。 它唯一的参数是触发重写的命令名,同时从标准输入中接受一系列重写的提交记录。 这个钩子的用途很大程度上跟 post-checkout 和 post-merge 差不多。

在 git checkout 成功运行后,post-checkout 钩子会被调用。你可以根据你的项目环境用它调整你的工作目录。 其中包括放入大的二进制文件、自动生成文档或进行其他类似这样的操作。

在 git merge 成功运行后,post-merge 钩子会被调用。 你可以用它恢复 Git 无法跟踪的工作区数据,比如权限数据。 这个钩子也可以用来验证某些在 Git 控制之外的文件是否存在,这样你就能在工作区改变时,把这些文件复制进来。

pre-push 钩子会在 git push 运行期间, 更新了远程引用但尚未传送对象时被调用。 它接受远程分支的名字和位置作为参数,同时从标准输入中读取一系列待更新的引用。 你可以在推送开始之前,用它验证对引用的更新操作(一个非零的退出码将终止推送过程)。

Git 的一些日常操作在运行时,偶尔会调用 git gc --auto 进行垃圾回收。 pre-auto-gc 钩子会在垃圾回收开始之前被调用,可以用它来提醒你现在要回收垃圾了,或者依情形判断是否要中断回收。

服务器端钩子

除了客户端钩子,作为系统管理员,你还可以使用若干服务器端的钩子对项目强制执行各种类型的策略。 这些钩子脚本在推送到服务器之前和之后运行。 推送到服务器前运行的钩子可以在任何时候以非零值退出,拒绝推送并给客户端返回错误消息,还可以依你所想设置足够复杂的推送策略。

pre-receive

处理来自客户端的推送操作时,最先被调用的脚本是 pre-receive。 它从标准输入获取一系列被推送的引用。如果它以非零值退出,所有的推送内容都不会被接受。 你可以用这个钩子阻止对引用进行非快进(non-fast-forward)的更新,或者对该推送所修改的所有引用和文件进行访问控制。

update

update 脚本和 pre-receive 脚本十分类似,不同之处在于它会为每一个准备更新的分支各运行一次。 假如推送者同时向多个分支推送内容,pre-receive 只运行一次,相比之下 update 则会为每一个被推送的分支各运行一次。 它不会从标准输入读取内容,而是接受三个参数:引用的名字(分支),推送前的引用指向的内容的 SHA-1 值,以及用户准备推送的内容的 SHA-1 值。 如果 update 脚本以非零值退出,只有相应的那一个引用会被拒绝;其余的依然会被更新。

post-receive

post-receive 挂钩在整个过程完结以后运行,可以用来更新其他系统服务或者通知用户。 它接受与 pre-receive 相同的标准输入数据。 它的用途包括给某个邮件列表发信,通知持续集成(continous integration)的服务器,或者更新问题追踪系统(ticket-tracking system) —— 甚至可以通过分析提交信息来决定某个问题(ticket)是否应该被开启,修改或者关闭。 该脚本无法终止推送进程,不过客户端在它结束运行之前将保持连接状态,所以如果你想做其他操作需谨慎使用它,因为它将耗费你很长的一段时间。

参考链接