Simple ARM NEON optimized sin, cos, log and exp

This is the sequel to the single-precision SSE optimized sin, cos, log and exp that I wrote some time ago, adapted to the NEON FPU of my pandaboard. Precision and range are exactly the same as in the SSE version, so I won't repeat them.

The code

The functions below are licensed under the zlib license, so you can do basically what you want with them.

  • neon_mathfun.h source code for sin_ps, cos_ps, sincos_ps, exp_ps, log_ps, as straight C.
  • neon_mathfun_test.c Validation and benchmark program for those functions. Do not forget to run it once.

Performance

Results on a pandaboard with a 1GHz dual-core ARM Cortex A9 (OMAP4), using gcc 4.6.1

command line: gcc -O3 -mfloat-abi=softfp -mfpu=neon -march=armv7-a -mtune=cortex-a9 -Wall -W neon_mathfun_test.c -lm

So performance is not stellar. I recommend using gcc 4.6.1 or newer, as it generates much better code than previous (gcc 4.5) versions: almost 20% faster here. I believe rewriting these functions in assembly would improve performance by 30%, and it should not be very hard, as ARM and NEON asm is quite nice and easy to write. Maybe I'll do it. Computing two SIMD vectors at once would also improve performance a lot: there are enough registers on NEON, and it would reduce the dependencies between NEON instructions.
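To illustrate the "two vectors at once" idea, here is a minimal portable sketch, not the code from neon_mathfun.h. It assumes a hypothetical `vec4` struct standing in for a NEON `float32x4_t`, and hypothetical helpers `poly_one` and `poly_two` that evaluate a polynomial with Horner's rule. In `poly_one` every step waits for the previous one; in `poly_two` the two chains are independent, so a pipelined FPU has a chance to overlap them.

```c
#include <stddef.h>

/* Hypothetical stand-in for a 4-lane SIMD register (float32x4_t on NEON). */
typedef struct { float lane[4]; } vec4;

/* Horner evaluation of c[0] + c[1]*x + ... + c[n-1]*x^(n-1) on one vector.
   Requires n >= 1. Each step depends on the result of the previous step. */
static vec4 poly_one(vec4 x, const float *c, size_t n)
{
    vec4 y;
    for (int i = 0; i < 4; ++i)
        y.lane[i] = c[n - 1];
    for (size_t k = n - 1; k-- > 0; )
        for (int i = 0; i < 4; ++i)
            y.lane[i] = y.lane[i] * x.lane[i] + c[k];
    return y;
}

/* Same polynomial on two independent vectors, interleaved step by step.
   The chain for 'a' never reads 'b' and vice versa, so the multiply-adds
   of one chain can be issued while the other chain is still in flight. */
static void poly_two(vec4 xa, vec4 xb, const float *c, size_t n,
                     vec4 *ya, vec4 *yb)
{
    vec4 a, b;
    for (int i = 0; i < 4; ++i)
        a.lane[i] = b.lane[i] = c[n - 1];
    for (size_t k = n - 1; k-- > 0; ) {
        for (int i = 0; i < 4; ++i)
            a.lane[i] = a.lane[i] * xa.lane[i] + c[k];
        for (int i = 0; i < 4; ++i)
            b.lane[i] = b.lane[i] * xb.lane[i] + c[k];
    }
    *ya = a;
    *yb = b;
}
```

Whether this actually pays off on a given core depends on the compiler and on the instruction latencies, so it would need to be measured with the benchmark program.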

Note also that I have no idea of the performance on a Cortex A8 -- it may be extremely bad, I don't know.

Comparison with an Intel Atom

For comparison purposes, here is the performance of the SSE version on a single core Intel Atom N270 running at 1.66GHz

command line: cl.exe /arch:SSE /O2 /TP /MD sse_mathfun_test.c (this is msvc 2010)

The number of cycles is quite similar, but the Atom has a higher clock.

Last modified: 2011/05/29


Calling C programs from Matlab

Sometimes you need to debug a library written in C from Matlab, and to inspect the results of running it inside Matlab.

The complete reference example is as follows:

Note in particular how the example above hides a pointer allocated in C and passes it to Matlab.
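The pointer-hiding idea can be sketched in plain C, without the MEX API. This is an illustration under assumptions, not the example referred to above: the `accumulator` type and the `acc_*` functions are hypothetical, and the handle is assumed to travel as a 64-bit integer (in a MEX file it would typically be stored in a 1x1 uint64 array). The caller only ever sees the integer and hands it back on every call.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical library state that the caller should never look inside. */
typedef struct {
    int count;
    double sum;
} accumulator;

/* Allocate the state in C and return it disguised as a 64-bit integer.
   A return value of 0 signals that the allocation failed. */
static uint64_t acc_create(void)
{
    accumulator *p = calloc(1, sizeof *p);
    return (uint64_t)(uintptr_t)p;
}

/* Recover the real pointer from the opaque handle. */
static accumulator *acc_from_handle(uint64_t handle)
{
    return (accumulator *)(uintptr_t)handle;
}

/* Add one sample; returns 0 on success, -1 for a null handle. */
static int acc_add(uint64_t handle, double value)
{
    accumulator *p = acc_from_handle(handle);
    if (p == NULL)
        return -1;
    p->count += 1;
    p->sum += value;
    return 0;
}

/* Mean of the samples added so far (0.0 if there are none). */
static double acc_mean(uint64_t handle)
{
    const accumulator *p = acc_from_handle(handle);
    if (p == NULL || p->count == 0)
        return 0.0;
    return p->sum / p->count;
}

/* Release the state; the handle must not be used afterwards. */
static void acc_destroy(uint64_t handle)
{
    free(acc_from_handle(handle));
}
```

Going through `uintptr_t` keeps the pointer-to-integer conversion well defined on platforms that provide that type; the handle is only meaningful inside the process that created it.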

An example of calling it from Matlab:



Taylor's formula

Taylor's formula is a method of approximating a function f(x) that has derivatives up to order n at x = x0 by a polynomial of degree n in (x - x0).

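If f(x) has derivatives up to order n on a closed interval [a,b] containing x0, and a derivative of order (n+1) on the open interval (a,b), then for every point x in [a,b] the following holds:

$$f(x) = \frac{f(x_0)}{0!} + \frac{f'(x_0)}{1!}(x - x_0) + \frac{f''(x_0)}{2!}(x - x_0)^2 + \cdots + \frac{f^{(n)}(x_0)}{n!}(x - x_0)^n + R_n(x)$$

Here $f^{(n)}(x)$ denotes the n-th derivative of f(x). The polynomial after the equals sign is called the Taylor expansion of f(x) at x0, and the remaining term $R_n(x)$ is the remainder of Taylor's formula, a higher-order infinitesimal of $(x - x_0)^n$.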

Note that, by convention, the factorial of 0 is defined as 0! = 1.
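As a worked instance, consider exp(x) expanded about x0 = 0: every derivative of exp equals 1 at 0, so the k-th coefficient is simply 1/k!. The sketch below is a plain scalar illustration with a hypothetical function name `exp_taylor`; it is not how exp_ps in neon_mathfun.h is implemented.

```c
/* Taylor polynomial of exp(x) about x0 = 0, truncated after the x^n term:
   sum over k = 0..n of x^k / k!. The k = 0 term is 1 because 0! = 1. */
static double exp_taylor(double x, unsigned n)
{
    double term = 1.0;  /* x^0 / 0! */
    double sum = term;
    for (unsigned k = 1; k <= n; ++k) {
        term *= x / (double)k;  /* x^k / k! built from x^(k-1) / (k-1)! */
        sum += term;
    }
    return sum;
}
```

With n = 1 this reduces to 1 + x. Raising n shrinks the remainder R_n(x) quickly for small |x|, while for large |x| many more terms are needed.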



Greek alphabet: commonly used mathematical symbols

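Greek alphabet

No. | Uppercase | Lowercase | English name | Pronunciation | Chinese reading | Meaning
1 | Α | α | alpha | a:lf | 阿尔法 | angle; coefficient
2 | Β | β | beta | bet | 贝塔 | magnetic flux coefficient; angle; coefficient
3 | Γ | γ | gamma | ga:m | 伽马 | conductivity (lowercase)
4 | Δ | δ | delta | delt | 德尔塔 | change; density; diopter
5 | Ε | ε | epsilon | epsilon | 艾普西龙 | base of logarithms
6 | Ζ | ζ | zeta | zat | 截塔 | coefficient; azimuth; impedance; relative viscosity; atomic number
7 | Η | η | eta | eit | 艾塔 | hysteresis coefficient; efficiency (lowercase)
8 | Θ | θ | theta | θit | 西塔 | temperature; phase angle
9 | Ι | ι | iota | aiot | 约塔 | tiny, a little
10 | Κ | κ | kappa | kap | 卡帕 | dielectric constant
11 | Λ | λ | lambda | lambd | 兰布达 | wavelength (lowercase); volume
12 | Μ | μ | mu | mju | | permeability; micro (one millionth); amplification factor (lowercase)
13 | Ν | ν | nu | nju | | reluctivity
14 | Ξ | ξ | xi | ksi | 克西 | random variable in mathematics
15 | Ο | ο | omicron | omikron | 奥密克戎 |
16 | Π | π | pi | pai | | ratio of a circle's circumference to its diameter = 3.14159 26535 89793
17 | Ρ | ρ | rho | rou | | resistivity (lowercase)
18 | Σ | σ | sigma | sigma | 西格马 | summation (uppercase); surface density; transconductance (lowercase)
19 | Τ | τ | tau | tau | | time constant
20 | Υ | υ | upsilon | jupsilon | 伊普西龙 | displacement
21 | Φ | φ | phi | fai | 佛爱 | magnetic flux; angle
22 | Χ | χ | chi | phai | 西 |
23 | Ψ | ψ | psi | psai | 普西 | angular velocity; dielectric flux (electrostatic lines of force); angle
24 | Ω | ω | omega | o`miga | 欧米伽 | ohm (uppercase); angular velocity (lowercase); angle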