Compatibility Between SPIR-V Image Formats And Vulkan Formats

SPIR-V Image Format | Compatible Vulkan Format
Rgba32f | VK_FORMAT_R32G32B32A32_SFLOAT
Rgba16f | VK_FORMAT_R16G16B16A16_SFLOAT
R32f | VK_FORMAT_R32_SFLOAT
Rgba8 | VK_FORMAT_R8G8B8A8_UNORM
Rgba8Snorm | VK_FORMAT_R8G8B8A8_SNORM
Rg32f | VK_FORMAT_R32G32_SFLOAT
Rg16f | VK_FORMAT_R16G16_SFLOAT
R11fG11fB10f | VK_FORMAT_B10G11R11_UFLOAT_PACK32
R16f | VK_FORMAT_R16_SFLOAT
Rgba16 | VK_FORMAT_R16G16B16A16_UNORM
Rgb10A2 | VK_FORMAT_A2B10G10R10_UNORM_PACK32
Rg16 | VK_FORMAT_R16G16_UNORM
Rg8 | VK_FORMAT_R8G8_UNORM
R16 | VK_FORMAT_R16_UNORM
R8 | VK_FORMAT_R8_UNORM
Rgba16Snorm | VK_FORMAT_R16G16B16A16_SNORM
Rg16Snorm | VK_FORMAT_R16G16_SNORM
Rg8Snorm | VK_FORMAT_R8G8_SNORM
R16Snorm | VK_FORMAT_R16_SNORM
R8Snorm | VK_FORMAT_R8_SNORM
Rgba32i | VK_FORMAT_R32G32B32A32_SINT
Rgba16i | VK_FORMAT_R16G16B16A16_SINT
Rgba8i | VK_FORMAT_R8G8B8A8_SINT
R32i | VK_FORMAT_R32_SINT
Rg32i | VK_FORMAT_R32G32_SINT
Rg16i | VK_FORMAT_R16G16_SINT
Rg8i | VK_FORMAT_R8G8_SINT
R16i | VK_FORMAT_R16_SINT
R8i | VK_FORMAT_R8_SINT
Rgba32ui | VK_FORMAT_R32G32B32A32_UINT
Rgba16ui | VK_FORMAT_R16G16B16A16_UINT
Rgba8ui | VK_FORMAT_R8G8B8A8_UINT
R32ui | VK_FORMAT_R32_UINT
Rgb10a2ui | VK_FORMAT_A2B10G10R10_UINT_PACK32
Rg32ui | VK_FORMAT_R32G32_UINT
Rg16ui | VK_FORMAT_R16G16_UINT
Rg8ui | VK_FORMAT_R8G8_UINT
R16ui | VK_FORMAT_R16_UINT
R8ui | VK_FORMAT_R8_UINT
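
In code, a mapping like this usually ends up as a small lookup. The following is only a sketch: the helper name `compatibleVulkanFormat` is hypothetical, the formats are kept as plain strings so the block compiles without the Vulkan headers, and only a handful of rows from the table are copied in.

```cpp
#include <array>
#include <optional>
#include <string_view>
#include <utility>

// A few rows copied from the table above; the remaining rows follow the same pattern.
constexpr std::array<std::pair<std::string_view, std::string_view>, 6> kFormatTable{{
    {"Rgba32f", "VK_FORMAT_R32G32B32A32_SFLOAT"},
    {"Rgba8", "VK_FORMAT_R8G8B8A8_UNORM"},
    {"R11fG11fB10f", "VK_FORMAT_B10G11R11_UFLOAT_PACK32"},
    {"Rgb10A2", "VK_FORMAT_A2B10G10R10_UNORM_PACK32"},
    {"Rgb10a2ui", "VK_FORMAT_A2B10G10R10_UINT_PACK32"},
    {"R32ui", "VK_FORMAT_R32_UINT"},
}};

// Hypothetical helper: returns the compatible Vulkan format name for a
// SPIR-V image format name, or std::nullopt if the name is not in the table.
inline std::optional<std::string_view> compatibleVulkanFormat(std::string_view spirvFormat) {
    for (const auto& [spirv, vk] : kFormatTable) {
        if (spirv == spirvFormat) {
            return vk;
        }
    }
    return std::nullopt;
}
```

Note that the packed formats do not follow the naming pattern of the others (R11fG11fB10f maps to a B10G11R11 format, Rgb10A2 to an A2B10G10R10 format), which is why a table is safer than deriving the name by string manipulation.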


Roughly Estimating the Cost of Each Line of Shader Code

The GPU IS a processor (graphics processing unit). Anywho, I remember seeing somewhere that on GeForce 6 series cards it's a single cycle (maybe I was just dreaming :-p), but I have that memory.

The Radeon X800 has it anyway.
EDIT:

Quote:

ORIGINALLY AT: http://gear.ibuypower.com/GVE/Store/ProductDetails.aspx?sku=VC-POWERC-147
Smartshader HD
• Support for Microsoft® DirectX® 9.0 programmable vertex and pixel shaders in hardware
• DirectX 9.0 Vertex Shaders
- Vertex programs up to 65,280 instructions with flow control
- Single cycle trigonometric operations (SIN & COS)
• DirectX 9.0 Extended Pixel Shaders
- Up to 1,536 instructions and 16 textures per rendering pass
- 32 temporary and constant registers
- Facing register for two-sided lighting
- 128-bit, 64-bit & 32-bit per pixel floating point color formats
- Multiple Render Target (MRT) support
• Complete feature set also supported in OpenGL® via extensions


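Building the Vulkan Example Projects on macOS Mojave (10.14.3)


The following error occurs when compiling assimp:

This is caused by a bug in the code. Modifying line 230 of ~/Vulkan/xcode/assimp/assimp-mac/code/D3MFImporter.cpp fixes it.

Other compile errors can be ignored, as long as libassimp.3.3.1.dylib gets built.

The fix is shown in the figure below: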

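Building and Running Vulkan Applications with MoltenVK on macOS Mojave (10.14.3)

MoltenVK is a software library that lets Vulkan applications run on top of Metal on Apple's macOS and iOS operating systems. It is the first software component released by the Vulkan Portability Initiative, a project to run a subset of Vulkan on platforms that have no native Vulkan driver.

Download and build the MoltenVK code:

Download and build the vuh code:

Run the tests: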



Vulkan Memory Management

Vulkan offers another key difference to OpenGL with respect to memory allocation. When it comes to managing memory allocations and assigning them to individual resources, the OpenGL driver does most of the work for the developer. This allows applications to be developed, tested and deployed very quickly. In Vulkan, however, the programmer takes responsibility, meaning that many operations that OpenGL orchestrates heuristically can be orchestrated based on absolute knowledge of the resource lifecycle.


Using CPU Memory Pointers Directly in Vulkan

Depending on the target platform, some recently published EXT extensions allow sharing memory between different physical devices.

VK_EXT_external_memory_host enables importing host allocations or host-mapped foreign device memory using a host pointer as the handle.

VK_EXT_external_memory_dma_buf enables importing dma_buf handles on Linux which can possibly come from another physical device.

The spec now also has a table where it's listed which external memory handle types require a matching physical device and which don't.

Additionally, I'd also like to draw your attention to additional features which enable execution control across multiple physical devices. At least on Linux (and possibly other POSIX based systems) semaphores and fences can be shared across physical devices if the FENCE_FD and SYNC_FD handle types are used. These are part of the KHR external semaphore/fence extensions.

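The VK_EXT_external_memory_host extension was merged into the Android master branch on April 4, 2018, so later releases may be able to use it. It lets the graphics device use memory pointers created on the CPU directly, which reduces memory copies.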
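
One practical detail when importing a host pointer: as far as I know, the extension requires both the pointer and the allocation size to be multiples of `minImportedHostPointerAlignment` (reported in `VkPhysicalDeviceExternalMemoryHostPropertiesEXT`). A minimal sketch of that check, written without the Vulkan headers; the helper names `alignUp` and `isImportable` are hypothetical, and the alignment is assumed to be a power of two:

```cpp
#include <cstdint>

// Rounds size up to the next multiple of alignment.
// Assumption: alignment is a non-zero power of two.
constexpr std::uint64_t alignUp(std::uint64_t size, std::uint64_t alignment) {
    return (size + alignment - 1) & ~(alignment - 1);
}

// Hypothetical pre-flight check before attempting a host pointer import:
// true when both the address and the size are multiples of alignment.
inline bool isImportable(const void* hostPtr, std::uint64_t size, std::uint64_t alignment) {
    const auto addr = reinterpret_cast<std::uintptr_t>(hostPtr);
    return alignment != 0 && addr % alignment == 0 && size % alignment == 0;
}
```

In practice this means allocating the CPU buffer with an aligned allocator and rounding its size up with something like `alignUp` before handing it to the driver.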



Vulkan Device Memory

This post serves as a guide on how to best use the various Memory Heaps and Memory Types exposed in Vulkan on AMD drivers, starting with some high-level tips.

  • GPU Bulk Data
    Place GPU-side allocations in DEVICE_LOCAL without HOST_VISIBLE. Make sure to allocate the highest priority resources first like Render Targets and resources which get accessed more often. Once DEVICE_LOCAL fills up and allocations fail, have the lower priority allocations fall back to CPU-side memory if required via HOST_VISIBLE with HOST_COHERENT but without HOST_CACHED. When doing in-game reallocations (say for display resolution changes), make sure to fully free all allocations involved before attempting to make any new allocations. This can minimize the possibility that an allocation can fail to fit in the GPU-side heap.
  • CPU-to-GPU Data Flow
    For relatively small total allocation size (under 256 MB) the DEVICE_LOCAL with HOST_VISIBLE is the perfect Memory Type for CPU upload to GPU cases: the CPU can directly write into GPU memory which the GPU can then access without reading across the PCIe bus. This is great for upload of constant data, etc.
  • GPU-to-CPU Data Flow
    Use HOST_VISIBLE with HOST_COHERENT and HOST_CACHED. This is the only Memory Type which supports cached reads by the CPU. Great for cases like recording screen-captures, feeding back Hierarchical Z-Buffer occlusion tests, etc.

Pooling Allocations

EDIT: Great reminder from Axel Gneiting (leading Vulkan implementation in DOOM® at id Software): make sure to pool a group of resources, like textures and buffers, into a single memory allocation. On Windows® 7 for example, Vulkan memory allocations map to WDDM Allocations (the same lists seen in GPUView), and there is a relatively high cost associated with a WDDM Allocation as command buffers flow through the WDDM based driver stack. Having 256 MB per DEVICE_LOCAL allocation can be a good target: it takes only 16 allocations to fill 4 GB.
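
To make the pooling idea concrete, here is a minimal sketch of sub-allocating offsets out of one large block. It is not taken from the original post: `Pool` and `subAllocate` are hypothetical names, the allocator is a simple bump allocator that never frees, and the real offsets would be passed to calls such as `vkBindBufferMemory()`.

```cpp
#include <cstdint>
#include <optional>

// One large device allocation (for example 256 MB) that resources are packed into.
struct Pool {
    std::uint64_t capacity;   // size of the underlying allocation in bytes
    std::uint64_t used = 0;   // bytes handed out so far, including padding
};

// Hypothetical bump allocator: returns the offset for a resource of the given
// size and alignment, or std::nullopt when the pool is full.
// Assumption: alignment is non-zero.
inline std::optional<std::uint64_t> subAllocate(Pool& pool, std::uint64_t size,
                                                std::uint64_t alignment) {
    const std::uint64_t offset = (pool.used + alignment - 1) / alignment * alignment;
    if (offset + size > pool.capacity) {
        return std::nullopt;  // caller would open a new pool or fall back to another heap
    }
    pool.used = offset + size;
    return offset;
}

// 4 GB split into 256 MB blocks gives the 16 allocations mentioned above.
static_assert((4ull << 30) / (256ull << 20) == 16, "16 blocks of 256 MB fill 4 GB");
```

A production allocator also needs freeing and defragmentation, but even this shape keeps the number of driver-level allocations small.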

Hidden Paging

When an application starts over-subscribing GPU-side memory, DEVICE_LOCAL memory allocations will fail. It is also possible that later during application execution, another application in the system increases its usage of GPU-side memory, resulting in dynamic over-subscription of GPU-side memory. This can cause an OS (for instance Windows® 7) to silently migrate or page GPU-side allocations to/from CPU-side memory as it time-slices execution of each application on the GPU, which can result in visible “hitching”. There is currently no method to directly query if the OS is migrating allocations in Vulkan. One possible workaround is for the app to detect hitching by looking at time-stamps, and then actively attempt to reduce DEVICE_LOCAL memory consumption when hitching is detected. For example, the application could manually move resources around to fully empty DEVICE_LOCAL allocations, which can then be freed.
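
The time-stamp workaround could be sketched roughly as follows. This is an illustration only, not the post's method: `detectHitches` is a hypothetical helper, and the window length and threshold factor are assumptions that would need tuning per application.

```cpp
#include <cstddef>
#include <vector>

// Returns the indices of frames whose duration exceeds `factor` times the
// mean duration of the preceding `window` frames.
// Assumption: window > 0; frame durations are in milliseconds.
inline std::vector<std::size_t> detectHitches(const std::vector<double>& frameMs,
                                              std::size_t window, double factor) {
    std::vector<std::size_t> hitches;
    for (std::size_t i = window; i < frameMs.size(); ++i) {
        double sum = 0.0;
        for (std::size_t j = i - window; j < i; ++j) {
            sum += frameMs[j];
        }
        const double mean = sum / static_cast<double>(window);
        if (frameMs[i] > factor * mean) {
            hitches.push_back(i);
        }
    }
    return hitches;
}
```

When hitches show up repeatedly, the application would start emptying and freeing DEVICE_LOCAL allocations as described above.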

EDIT: Targeting Low-Memory GPUs

When targeting a memory surplus, using DEVICE_LOCAL+HOST_VISIBLE for CPU-write cases can bypass the need to schedule an extra copy. However in memory constrained situations it is much better to use DEVICE_LOCAL+HOST_VISIBLE as an extension to the DEVICE_LOCAL heap and use it for GPU Resources like Textures and Buffers. CPU-write cases can switch to HOST_VISIBLE+COHERENT. The number one priority for performance is keeping the high bandwidth access resources in GPU-side memory.

Memory Heap and Memory Type – Technical Details

Driver Device Memory Heaps and Memory Types can be inspected using the Vulkan Hardware Database. For Windows AMD drivers, below is a breakdown of the characteristics and best usage models for all the Memory Types. Heap and Memory Type numbering is not guaranteed by the Vulkan Spec, so make sure to work from the Property Flags directly. Also note memory sizes reported in Vulkan represent the maximum amount which is shared across applications and driver.

  • Heap 0
    • VK_MEMORY_HEAP_DEVICE_LOCAL_BIT
    • Represents memory on the GPU device which can not be mapped into Host system memory
    • Using 256 MB per vkAllocateMemory() allocation is a good starting point for collections of buffers and images
    • Suggest using separate allocations for large allocations which might need to be resized (freed and reallocated) at run-time
    • Memory Type 0
      • VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT
      • Full speed read/write/atomic by GPU
      • No ability to use vkMapMemory() to map into Host system address space
      • Use for standard GPU-side data
  • Heap 1
    • VK_MEMORY_HEAP_DEVICE_LOCAL_BIT
    • Represents memory on the GPU device which can be mapped into Host system memory
    • Limited on Windows to 256 MB
      • Best to allocate at most 64 MB per vkAllocateMemory() allocation
      • Fall back to smaller allocations if necessary
    • Memory Type 1
      • VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT
      • VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT
      • VK_MEMORY_PROPERTY_HOST_COHERENT_BIT
      • Full speed read/write/atomic by GPU
      • Ability to use vkMapMemory() to map into Host system address space
      • CPU writes are write-combined and write directly into GPU memory
        • Best to write full aligned cacheline sized chunks
      • CPU reads are uncached
        • Best to use Memory Type 3 instead for GPU write and CPU read cases
      • Use for dynamic buffer data to avoid an extra Host to Device copy
      • Use for a fall-back when Heap 0 runs out of space before resorting to Heap 2
  • Heap 2
    • Represents memory on the Host system which can be accessed by the GPU
    • Suggest using similar allocation size strategy as Heap 0
    • Ability to use vkMapMemory()
    • GPU reads for textures and buffers are cached in GPU L2
      • GPU L2 misses read across the PCIe bus to Host system memory
      • Higher latency and lower throughput on an L2 miss
    • GPU reads for index buffers are cached in GPU L2 in Tonga and later GPUs like FuryX
    • Memory Type 2
      • VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT
      • VK_MEMORY_PROPERTY_HOST_COHERENT_BIT
      • CPU writes are write-combined
      • CPU reads are uncached
      • Use for staging for upload to GPU device
      • Can use as a fall-back when GPU device runs out of memory in Heap 0 and Heap 1
    • Memory Type 3
      • VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT
      • VK_MEMORY_PROPERTY_HOST_COHERENT_BIT
      • VK_MEMORY_PROPERTY_HOST_CACHED_BIT
      • CPU reads and writes go through CPU cache hierarchy
      • GPU reads snoop CPU cache
      • Use for staging for download from GPU device
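
Since heap and type numbering is not guaranteed, selection should go through the property flags. The sketch below shows one way to do that without pulling in the Vulkan headers: the constants mirror the values of `VkMemoryPropertyFlagBits`, while `MemoryType`, `findMemoryType` and `exampleTypes` are hypothetical stand-ins (in real code the types come from `vkGetPhysicalDeviceMemoryProperties()` and `typeBits` from `VkMemoryRequirements::memoryTypeBits`).

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Local mirrors of the VkMemoryPropertyFlagBits values.
constexpr std::uint32_t kDeviceLocal  = 0x1;
constexpr std::uint32_t kHostVisible  = 0x2;
constexpr std::uint32_t kHostCoherent = 0x4;
constexpr std::uint32_t kHostCached   = 0x8;

// Simplified stand-in for VkMemoryType.
struct MemoryType {
    std::uint32_t propertyFlags;
    std::uint32_t heapIndex;
};

// Returns the first memory type that is allowed by typeBits, has all
// `required` flags and none of the `forbidden` flags.
inline std::optional<std::uint32_t> findMemoryType(const std::vector<MemoryType>& types,
                                                   std::uint32_t typeBits,
                                                   std::uint32_t required,
                                                   std::uint32_t forbidden) {
    for (std::uint32_t i = 0; i < types.size(); ++i) {
        const bool allowed = (typeBits & (1u << i)) != 0;
        const std::uint32_t flags = types[i].propertyFlags;
        if (allowed && (flags & required) == required && (flags & forbidden) == 0) {
            return i;
        }
    }
    return std::nullopt;
}

// The four memory types from the breakdown above, for illustration.
inline std::vector<MemoryType> exampleTypes() {
    return {
        {kDeviceLocal, 0},
        {kDeviceLocal | kHostVisible | kHostCoherent, 1},
        {kHostVisible | kHostCoherent, 2},
        {kHostVisible | kHostCoherent | kHostCached, 2},
    };
}
```

The `forbidden` mask matters: asking only for HOST_VISIBLE with HOST_COHERENT would match the 256 MB DEVICE_LOCAL+HOST_VISIBLE type first, so a CPU-side fallback has to exclude DEVICE_LOCAL and HOST_CACHED explicitly.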

Choosing the correct Memory Heap and Memory Type is a critical task in optimization. A GPU like Radeon™ Fury X for instance has 512 GB/s of DEVICE_LOCAL bandwidth (sum of any ratio of read and write) but the PCIe bus supports at most 16 GB/s read and at most 16 GB/s write for a sum of 32 GB/s in both directions.
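
A quick back-of-the-envelope calculation shows what that gap means. The helper below is a hypothetical sketch that assumes peak bandwidth is actually reached (it rarely is) and uses 1 GB = 10^9 bytes.

```cpp
// Idealized transfer time in milliseconds for `bytes` at `gbPerSec` GB/s.
// Assumption: peak bandwidth is sustained and 1 GB = 1e9 bytes.
constexpr double transferMs(double bytes, double gbPerSec) {
    return bytes / (gbPerSec * 1e9) * 1e3;
}
```

Reading 256 MB once takes about 0.5 ms at 512 GB/s but about 16 ms at 16 GB/s, which is an entire 60 Hz frame spent on a single pass over the data.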

Timothy Lottes is a member of the Developer Technology Group at AMD. Links to third party sites are provided for convenience and unless explicitly stated, AMD is not responsible for the contents of such linked sites and no endorsement is implied.


Floating-Point Performance of Common GPUs

Game Consoles GPU

Console Name | GPU Name | Fab | Clock | GFlops
NDS | ARM946E-S (CPU) | 180/130nm | 67 MHz | 0.6
N3DS | PICA 200 | 45nm | 200 MHz | 4.8
PSP | R4000 x 2 | 90nm | 333 MHz | 2.6
PS VITA | SGX543 MP4+ | 45nm | 222 MHz | 28.4
Dreamcast | PowerVR2 CLX2 | 250nm | 100 MHz | 2.1
XBOX | XGPU (NV2A) | 150nm | 233 MHz | 20
XBOX 360 | ATI R500 Xenos | 90/65/45nm | 500 MHz | 240
XBOX ONE, XBOX ONE S | AMD Radeon GCN (12CU 768 Cores) | 28/16nm | 853 MHz, 914 MHz | 1311.5, 1405.2
XBOX ONE X | AMD Radeon GCN (40CU 2560 Cores) | 16nm | 1172 MHz | 6000
PlayStation 2 | GS | 180/150/90nm | 147 MHz | 6.2 (EE+GS)
PlayStation 3 | RSX (NVIDIA G70) | 90/65/45nm | 550 MHz | 228.8
PlayStation 4, PlayStation 4 Slim | AMD Radeon GCN (18CU 1152 Cores) | 28/16nm | 800 MHz | 1840
PlayStation 4 Pro | AMD Radeon GCN (36CU 2304 Cores) | 16nm | 911 MHz | 4200
N64 | SGI RCP | 350nm | 62.5 MHz | 0.1~0.2
GameCube | Flipper | 180nm | 162 MHz | 9.4
Wii | ATI HollyWood | 90nm | 243 MHz | 12
Wii U | ATI RV770 | 40nm | 550 MHz | 176
Switch | Tegra X1 (Undocked) | 20nm | 307.2 MHz | 157.2
Switch | Tegra X1 (Docked) | 20nm | 768 MHz | 393.2
Ouya | Tegra 3 (Geforce ULP x 12) | 40nm | 520 MHz | 12.5
SHIELD portable | Tegra 4 (Geforce ULP x 72) | 28nm | 672 MHz | 96.8
SHIELD TV | Tegra X1 (Maxwell Cores x 256 (2xSMM)) | 20nm | 1000 MHz | 512

Imagination PowerVR
GPU Name Chip Clock GFlops
SGX530 OMAP 3530 110 MHz 0.88
DM3730 200 MHz 1.6
--- 300 MHz 2.4
SGX531 MT6513
MT6573
MT6575M
281 MHz 2.25
R-Car E1 400 MHz 3.2
SGX531 Ultra MT6515
MT6575
MT6517
MT6517T
MT6577
MT6577T
MT8317
MT8317T
MT8377
522 MHz 4.2
SGX535 S5PC100
Apple A4
200 MHz 1.6
Apple A4 (iPad) 250 MHz 2.0
--- 300 MHz 2.4
SGX540 Jz4780 ??? MHz ???
Exynos 3110 200 MHz 3.2
OMAP 4430 307 MHz 4.9
OMAP 4460 384 MHz 6.1
Atom Z2420
R-Car E2
R-Car M1A、M1S
400 MHz 6.4
ATM7021
ATM7021A
ATM7029B
500 MHz 8.0
RK3168 600 MHz 9.6
SGX543 --- 200 MHz 6.4
SGX543 MP2 Apple A5 200 MHz 12.8
Apple A5 (iPad2) 250 MHz 16.0
MT5327 400 MHz 25.6
R-Car H1 520 MHz 33.28
SGX543 MP3 Apple A6 266 MHz 25.5
SGX543 MP4 Apple A5X 250 MHz 32.0
SGX544 MT6589M
MT8117
MT8121
156 MHz 5
MT6589
MT8389
286 MHz 9.2
MT8125 300 MHz 9.6
MT6589T
MT8389T
357 MHz 11.4
OMAP 4470 384 MHz 12.3
Broadcom M320
Broadcom M340
??? ???
ATM7039 450 MHz 14.4
SGX544 MP2 Atom Z2520 300 MHZ 19.2
Allwinner A31
Allwinner A31s
350 MHz 22.4
Atom Z2560 400 MHz 25.6
R-Car M2 520 MHz 33.28
Atom Z2580 533 MHz 34.1
Allwinner A83T
Allwinner H8
700 MHz 44.8
SGX544 MP3 Exynos 5410 533 MHz 51.1
SGX545 --- 300 MHz 4.8
Atom Z2460
Atom Z2760
533 MHz 8.5
SGX554 --- 300 MHz 19.2
SGX554 MP2 --- 300 MHz 38.4
SGX554 MP4 Apple A6X 266 MHz 68.1
G6020
(0.25 Clusters)
--- 300 MHz 4.8
G6050
G6060
(0.5 Clusters)
--- 300 MHz 9.6
G6100
G6110
(1 Clusters)
RK3368 600 MHz 38.4
G6200
(2 Clusters)
MT6595M
MT8135
450 MHz 57.6
MT6795M 550 MHz 70.4
MT6595
MT6595T
600 MHz 76.8
MT6793
Helio X10 (MT6795、MT6795T)
700 MHz 89.6
G6230
(2 Clusters)
Allwinner A80
Allwinner A80T
533 MHz 68.0
ATM9009 600 MHz 76.8
GX6240
(2 Clusters)
--- 650 MHz 83.2
GX6250
(2 Clusters)
MT8173
MT8176
600 MHz 76.8
MT8693 700 MHz 89.6
--- 750 MHz 96
G6400
(4 Clusters)
--- 300 MHz 76.8
Atom Z3460
Atom Z3480
533 MHz 136.4
R-Car H2 600 MHz 153.6
G6430
(4 Clusters)
--- 300 MHz 76.8
Apple A7
Apple A7 (iPad Air)
450 MHz 115.2
Atom Z3530 457 MHz 117
Atom Z3560
Atom Z3580
533 MHz 136.4
Atom Z3570
Atom Z3590
640 MHz 163.8
GX6450
(4 Clusters)
Apple A8 450 MHz 115.2
--- 600 MHz 153.6
G6630
(6 Clusters)
--- 450 MHz 172.8
--- 600 MHz 230.4
GX6650
(6 Clusters)
R-Car H3 600 MHz 230.4
GX6850
(8 Clusters)
Apple A8X 450 MHz 230.4
--- 600 MHz 307.2
GE7400
(0.5 Clusters)
--- 600 MHz 19.2
GE7800
(1 Clusters)
--- 600 MHz 38.4
GT7200
(2 Clusters)
--- 650 MHz 83.2
SC9861G-IA ??? MHz ???
GT7400
(4 Clusters)
--- 650 MHz 166.4
GT7400 Plus
(4 Clusters)
Helio X30 800 MHz 204.8
GT7600
(6 Clusters)
Apple A9 450 MHz 172.8
GT7600 Plus
(6 Clusters)
Apple A10 Fusion 650 MHz? 249.6?
GT7800
(8 Clusters)
--- 650 MHz 332.8
GT7800+
(12 Clusters)
Apple A9X 450 MHz 345.6
GT7800?
(12 Clusters)
Apple A10X Fusion 650 MHz? 499.2?
GT7900
(16 Clusters)
--- 650 MHz 665.6
--- 800 MHz 819.2
GT8525
(2 Clusters)
--- 1000 MHz 192

Qualcomm Adreno
GPU Name | Chip | Fab | Clock | GFlops
Adreno 130 | MSM7x00, MSM7x00A, MSM7x01, MSM7x01A | ??nm | 133 MHz | 1.2
Adreno 200 | Snapdragon S1 (MSM7225, MSM7625, MSM7227, MSM7627, QSD8250, QSD8650) | 65nm | 133 MHz | 2.1
Adreno 200 | Snapdragon S1 (MSM7225A, MSM7625A) | 45nm | 200 MHz | 3.2
Adreno 200 | Snapdragon S1 (MSM7227A, MSM7627A) | 45nm | 245 MHz | 3.92
Adreno 203 | Snapdragon S4 Play (MSM8225, MSM8625) | 45nm | 245 MHz | 7.84
Adreno 203 | Snapdragon 200 (MSM8225Q, MSM8625Q) | 45nm | 294 MHz | 9.4
Adreno 205 | Snapdragon S2 (MSM7230, MSM7630, MSM8255, MSM8655, APQ8055) | 45nm | 266 MHz | 8.5
Adreno 220 | Snapdragon S3 (MSM8260, MSM8660, APQ8060) | 45nm | 266 MHz | 17
Adreno 225 | Snapdragon S4 Plus (APQ8060A, MSM8260A) | 28nm | 200 MHz | 12.8
Adreno 225 | Snapdragon S4 Plus (MSM8660A) | 28nm | 300 MHz | 19.2
Adreno 225 | Snapdragon S4 Plus (MSM8960) | 28nm | 400 MHz | 25.6
Adreno 302 | Snapdragon 200 (MSM8210, MSM8610, MSM8212, MSM8612) | 28nm | 400 MHz | 19.2
Adreno 304 | Snapdragon 208, Snapdragon 210, Snapdragon 212, Snapdragon Wear 2100 | 28nm | 400 MHz | 19.2
Adreno 305 | Snapdragon S4 Plus (MSM8227, MSM8627); Snapdragon 400 (MSM8226, MSM8626, MSM8230, MSM8630, MSM8930, MSM8030AB, MSM8230AB, MSM8630AB, MSM8930AB, MSM8228, MSM8628, MSM8928, APQ8026, APQ8030) | 28nm | 400~450 MHz | 19.2~21.6
Adreno 306 | Snapdragon 410 (MSM8916), Snapdragon 412 (MSM8916v2) | 28nm | 400 MHz | 21.6
Adreno 308 | Snapdragon 425 (MSM8917), Snapdragon 427 | 28nm | 500 MHz | 27
Adreno 320 (64 ALU) | Snapdragon S4 Pro (MSM8960T, APQ8064, APQ8064 1AA); Snapdragon S4 Prime (MPQ8064) | 28nm | 400 MHz | 57.6
Adreno 320 (96 ALU) | Snapdragon 600 (APQ8064T) | 28nm | 400 MHz | 86.4
Adreno 320 (96 ALU) | Snapdragon 600 (APQ8064AB) | 28nm | 450 MHz | 97.2
Adreno 330 | Snapdragon 800 (APQ8074, MSM8974AA) | 28nm | 450 MHz | 129.8
Adreno 330 | Snapdragon 801 (MSM8274AB, MSM8974AB) | 28nm | 550 MHz | 158.4
Adreno 330 | Snapdragon 801 (MSM8974AC) | 28nm | 578 MHz | 166.5
Adreno 405 | Snapdragon 415 (MSM8929), Snapdragon 615 (MSM8939), Snapdragon 616 (MSM8939v2), Snapdragon 617 (MSM8952) | 28nm | 550 MHz | 59.4
Adreno 418 | Snapdragon 808 (MSM8992) | 20nm | 600 MHz | 172.8
Adreno 420 | Snapdragon 805 (APQ8084) | 28nm | 500~600 MHz | 144~172.8
Adreno 430 | Snapdragon 810 (APQ8094, MSM8994) | 20nm | 500~650 MHz | 324~420
Adreno 505 | Snapdragon 430 (MSM8937), Snapdragon 435 | 28nm | 450 MHz | 48.6
Adreno 506 | Snapdragon 450 | 14nm | 600 MHz | 120
Adreno 506 | Snapdragon 625, Snapdragon 626 | 14nm | 650 MHz | 130
Adreno 508 | Snapdragon 630 | 14nm | 800 MHz? | 160?
Adreno 510 | Snapdragon 650 (MSM8956), Snapdragon 652 (MSM8976), Snapdragon 653 (MSM8976PRO) | 28nm | 600 MHz | 180
Adreno 512 | Snapdragon 660 (MSM8976 Plus) | 14nm | 800 MHz? | 240?
Adreno 530 | Snapdragon 820 (MSM8996) | 14nm | 510~624 MHz | 407.4~498.5
Adreno 530 | Snapdragon 821 (MSM8996PRO) | 14nm | 650 MHz | 519.2
Adreno 540 | Snapdragon 835 (MSM8998) | 10nm | 710 MHz | 567
Adreno 608 | --- | 10nm | ??? MHz | ???
Adreno 615 | --- | 10nm | ??? MHz | ???
Adreno 630 | Snapdragon 845 | 10nm | ??? MHz | ???
More Qualcomm Adreno Information in wiki

Nvidia Tegra
GPU Name | Chip | Fab | Clock | GFlops
Geforce ULP x 8 | Tegra 2 (AP20H) | 40nm | 300 MHz | 4.8
Geforce ULP x 8 | Tegra 2 (T20) | 40nm | 333 MHz | 5.6
Geforce ULP x 8 | Tegra 2 (AP25, T25) | 40nm | 400 MHz | 6.7
Geforce ULP x 12 | Tegra 3 (T30L, AP33) | 40nm | 416 MHz | 10
Geforce ULP x 12 | Tegra 3 | 40nm | 450 MHz | 10.8
Geforce ULP x 12 | Tegra 3 (T30, T33, AP37) | 40nm | 520 MHz | 12.5
Geforce ULP x 60 | Tegra 4i | 28nm | 660 MHz | 79.2
Geforce ULP x 72 | Tegra 4 | 28nm | 672 MHz | 96.8
Kepler Cores x 192 (1xSMX) | Tegra K1, Tegra K1 (Denver) | 28nm | 850 MHz | 326.4
Maxwell Cores x 256 (2xSMM) | Tegra X1 | 20nm | 850 MHz, 1000 MHz | 435.2, 512
Pascal Cores x 256 (2xSMM) | Tegra Parker | 16nm | 1465 MHz | 750
Volta Cores x 512 | Tegra Xavier | 12nm | ???? MHz | ????

Arm Mali
GPU Name Chip Clock GFlops
Mali-400 --- 200 MHz 1.8
AML8726-M3 250 MHz 2.25
ST-E U8500 275 MHz 2.48
WM8850
WM8950
SC6815A
SC7710
SC8810
SC9820
Allwinner A10
Allwinner A10s
Allwinner A13
300 MHz 2.7
RK292X 330 MHz 2.97
SC7715
SC7727S
ST-E U8520
Telechips TCC892x-i
Rk2926
RK2928
MT6290
MT8638T
MT6572M
400 MHz 3.6
MT6570
MT6572
MT8312
MT8321
XMM6321
S5P4418
500 MHz 4.5
--- 533 MHz 4.8
Mali-400 MP2 LC1810
LC1811
300 MHz 5.4
WM8880
WM8980
SC6825
SC8825
Allwinner A20
Allwinner A23
Allwinner A33
350 MHz 6.3
SC5735A
SC7730A
SC7730S
SC7731G
SC8830
SC8830A
SC8831G
SC9830A
SC9830I
SC9836
MT6582M
AML7366-M6C
AML8726-MX
AML8726-MXS
AML8726-MXL
NS115
LC1813
LC1913
RTD1195
Exynos 3250
400 MHz 7.2
SC8831G 480 MHz 8.64
MT6580
MT6582
MT8382
RK3026
RK3036
500 MHz 9.0
RK3126
RK3128
RK3228
RK3229
Allwinner H3
Atom x3-C3130
600 MHz 10.8
Mali-400 MP4 RK3066
Exynos 4210
266 MHz 9.6
Exynos 4212
SC7735S
SC8735S
SC8835S
Hi3716
Hi3718
Hi3719
Rockchip PX2
AML7366-M6L
400 MHz 14.4
Exynos 4412 440 MHz 15.84
Exynos 3470 450 MHz 16.2
Exynos 4412 v2
RK3188
S5P6818
533 MHz 19.2
Mali-450 WM8860 300 MHz 4.5
Mali-450 MP2 AML7366-M6D 400 MHz 12
Amlogic M803
Amlogic M805
Amlogic M805T
Amlogic M806
Amlogic S805
500 MHz 15
Mali-450 MP3 Amlogic S905
Amlogic S905X
750 MHz 33.75
Mali-450 MP4 MT8685 416 MHz 24.8
Kirin 620
Mstar 6A908
Mstar 6A918
500 MHz 29.8
Kirin 910 533 MHz 32
MT6588
MT6592M
MT8127
MT6591
MT6591H
Atom x3-C3230RK
Hi3796M V100
Hi3798M V100
600 MHz 35.8
MT6592
MT8392
Kirin 910T
700 MHz 41.8
Mali-450 MP6 Amlogic M801
Amlogic M802
Amlogic S801
Amlogic S802
Amlogic S802H
Amlogic S812
Amlogic T866
Hi3796
Hi3798
600 MHz 53.8
Mali-450 MP8 --- 600 MHz 71.7
Mali-T604 --- 533 MHz 17
Mali-T604 MP2 --- 533 MHz 34
Mali-T604 MP4 Exynos 5250 533 MHz 68.2
Mali-T622 --- 533 MHz 8.5
Mali-T624 --- 533 MHz 17
Mali-T624 MP4 Kirin 920(K3V3)
Kirin 925
Kirin 928
Exynos 5260
600 MHz 76.8
Mali-T628 --- 533 MHz 17
Mali-T628 MP2 LC1860
LC1860C
LC1960
600 MHz 38.4
Mali-T628 MP3 --- 533 MHz 51.2
Mali-T628 MP4 Kirin 930
Kirin 935
680 MHz 87
Mali-T628 MP6 Exynos 5420
Exynos 5422
533 MHz 102.4
Exynos 5430 600 MHz 115.2
Mali-T720 --- 450 MHz 7.65
Exynos 7270
Exynos 7570
??? MHz ???
Mali-T720 MP2 MT6735P
MT8735P
400 MHz 13.6
MT6735M
MT8735M
500 MHz 17
MT8163V/B 520 MHz 17.68
MT6737
MT8735D
MT8735B
550 MHz 18.7
Atom x3-C3440
Exynos 3475
MT6735
MT6737T
MT8163V/A
600 MHz 20.4
Exynos 7580 668 MHz 22.7
Mali-T720 MP3 MT6753
MT6753T
MT8783
700 MHz 35.7
Mali-T720 MP6 LC1980 ??? ???
Mali-T720 MP8 --- 600 MHz 81.6
Mali-T720 MP? Hi3798C V200 ??? 103
Mali-T760 --- 600 MHz 20.4
Mali-T760 MP2 MT6732
MT6732M
MT8732
500 MHz 34
MT6752
MT6752M
MT8752
700 MHz 47.6
Mali-T760 MP4 Mstar 6A928 552 MHz 75
RK3288
RK3288-C
600 MHz 81.6
Mali-T760 MP6 Exynos 5433
(Exynos 7410)
700 MHz 142.8
Mali-T760 MP8 Exynos 7420 772 MHz 210
Mali-T820 --- 600 MHz 10.2
SC9850 ??? MHz ???
Mali-T820 MP3 Amlogic S912
Mstar 6A938
600 MHz 30.6
Mali-T830 --- 600 MHz 20.4
Mali-T830 MP2 Amlogic S966
Amlogic T966
Amlogic T968
650 MHz 44.2
Kirin 650
Kirin 655
Kirin 658
900 MHz 61.2
Exynos 7870 700 MHz 47.6
Mali-T830 MP3 Exynos 7880 950 MHz 71.4
Mali-T860 --- 700 MHz 23.8
Mali-T860 MP2 MT6738 350 MHz 23.8
MT6750
MT6738T
520 MHz 35.3
Helio P10 (MT6755M) 550 MHz 37.4
MT6750T 650 MHz 44.2
Helio P10 (MT6755)
MT8785
700 MHz 47.6
MT6739
Helio P15 (MT6755T)
800 MHz 54.4
Mali-T860 MP3 Exynos 7650 700 MHz 71.4
Mali-T860 MP4 RK3399 600 MHz 81.6
Pinecone S1 (V670) 922 MHz 125.4
Mali-T880 --- 850 MHz 28.9
Mali-T880 MP?? LG Nuclun 2 ??? MHz ???
Mali-T880 MP2 Helio P20 (MT6757) 900 MHz 61.2
Helio P25 (MT6757CD) 1000 MHz 68
Mali-T880 MP4 SC9860GV ??? MHz ???
Helio X20 (MT6797)
Helio X23 (MT6797D)
780 MHz 106
Helio X25 (MT6797T) 850 MHz 115.6
Helio X27 (MT6797X) 875 MHz 119
Kirin 950 (Boost)
Kirin 955 (Boost)
900 MHz 122.4
Mali-T880 MP10 Exynos 8890 (Lite) 650 MHz 221
Mali-T880 MP12 Exynos 8890 650 MHz 265.2
Mali-G51 --- ??? MHz ???
Mali-G71 --- 850 MHz 28.9
Mali-G71 MP2 MT6763
Helio P23 (MT6763T)
770 MHz 52.36
Helio P30 (MT6758) 950 MHz 64.6
Mali-G71 MP8 Kirin 960 1037 MHz 282
Mali-G71 MP12 Pinecone S2? (V970) 900 MHz? 367.2?
Mali-G71 MP18 Exynos 8895 (Lite) 546 MHz 334
Mali-G71 MP20 Exynos 8895 546 MHz 371.2
Mali-G72 --- 850 MHz 28.9
Mali-G72 MP3 Exynos 9610 ??? MHz ???
Mali-G72 MP12 Kirin 970 850 MHz 346.8

Vivante Graphics & Broadcom VideoCore
GPU Name | Chip | Clock | GFlops
GC200 | Jz4760 | ??? MHz | ???
GC400 | i.MX6 SoloX | ??? MHz | ???
GC500 | PXA920 | 315 MHz | 0.96
GC800 | RK2918, ATM7013, ATM7019 | 575 MHz | 4.6
GC860 | Jz4770 | ??? MHz | ???
GC880 | i.MX6S, i.MX6DL | ??? MHz | ???
GC1000 | PXA986, PXA988, PXA1088 | 600 MHz | 9.6
GC1000 Plus | ATM7029 | 630 MHz | 10.1
GC2000 | i.MX6D, i.MX6Q | 600 MHz | 19.2
GC4000 | K3V2 | 480 MHz | 30.7
GC3000 | S32V234 | 800 MHz | 32
GC5000 | PXA1928 | 800 MHz | 64
GC6000, GC6400 | --- | 800 MHz | 128
GC7000UL | PXA1908 | 800 MHz | 16
GC7000L | PXA1936 | 800 MHz | 32
GC7000 | --- | 800 MHz | 64
GC7200 | --- | 800 MHz | 128
GC7400 | --- | 800 MHz | 256
GC7600 | --- | 800 MHz | 512
GC8000 | --- | --- | ---
VideoCore1 | VC01 | --- | ---
VideoCore2 | BCM2702, BCM2705, BCM2722, BCM2724 | --- | ---
VideoCore3 | BCM2727, BCM11181 | --- | ---
VideoCore4 | BCM2763, BCM2820, BCM2835, BCM2836, BCM11182, BCM11311, BCM21533, BCM21654, BCM21663, BCM21664, BCM21664T, BCM28145, BCM28150, BCM28155, BCM23550 | 250 MHz | 24
VideoCore4 | BCM2837 | 300 MHz | 28.8

Intel HD Graphics
Name | Type | EUs | Chip | Fab | Clock (MHz) | GFlops
GMA 4500 Series | Gen 4 | 10 | G41, G43, G45... | 65nm | 533~800 | 21~32
HD Graphics | Gen 5 | 12 | Clarkdale, Arrandale | 45nm | 533~900 | 25.6~43.2
HD Graphics, HD Graphics 2000 | Gen 6 | 6 | SandyBridge GT1 | 32nm | 950~1350 | 45.6~64.8
HD Graphics 3000 | Gen 6 | 12 | SandyBridge GT2 | 32nm | 1000~1350 | 96~129.6
HD Graphics | Gen 7 | 4 | Bay Trail-T (Atom Z37xx, Atom E38xx); Bay Trail-M (Pentium N35xx, Celeron N2xxx); Bay Trail-D (Pentium J2xxx, Celeron J1xxx) | 22nm | 400~896 | 25.6~57.3
HD Graphics, HD Graphics 2500 | Gen 7 | 6 | IvyBridge GT1 | 22nm | 800~1150 | 76.8~110.4
HD Graphics 4000, HD Graphics P4000 | Gen 7 | 16 | IvyBridge GT2 | 22nm | 850~1300 | 217.6~332.8
HD Graphics | Gen 7.5 | 10 | Haswell GT1 | 22nm | 850~1150 | 136~184
HD Graphics 4400 | Gen 7.5 | 12 | Haswell GT1.5 | 22nm | 1150~1300 | 220.8~249.6
HD Graphics 4200, HD Graphics 4400 (Mobile), HD Graphics 4600, HD Graphics P4600, HD Graphics P4700 | Gen 7.5 | 20 | Haswell GT2 | 22nm | 850~1350 | 272~432
HD Graphics 5000, Iris Graphics 5100 | Gen 7.5 | 40 | Haswell GT3 | 22nm | 1000~1100 | 640~704
Iris Pro 5200 (with 128MB eDRAM) | Gen 7.5 | 40 | Haswell GT3e | 22nm | 1200~1300 | 768~832
HD Graphics, HD Graphics 400 | Gen 8 | 12 | Cherry Trail (Atom x5-Z83xx, Atom x5-Z85xx); Braswell (Celeron N30xx, Celeron N31xx, Celeron J30xx, Celeron J31xx) | 14nm | 500~700 | 96~134.4
HD Graphics, HD Graphics 405 | Gen 8 | 16 | Cherry Trail (Atom x7-Z87xx); Braswell (Pentium N37xx) | 14nm | 600~700 | 153.6~179.2
HD Graphics 405 | Gen 8 | 18 | Braswell (Pentium J3710) | 14nm | 740 | 213.12
HD Graphics (Broadwell) | Gen 8 | 12 | Broadwell-U GT1 | 14nm | 800~850 | 153.6~163.2
HD Graphics 5300 | Gen 8 | 24 | Broadwell-Y GT2 (Core M-5Yxx) | 14nm | 800~850 | 307.2~326.4
HD Graphics 5500 | Gen 8 | 23 | Broadwell-U GT2 | 14nm | 850~900 | 312.8~331.2
HD Graphics 5500 | Gen 8 | 24 | Broadwell-U GT2 | 14nm | 900~950 | 345.6~364.8
HD Graphics 5600, HD Graphics P5700 | Gen 8 | 24 | Broadwell-U GT2 | 14nm | 1000~1050 | 384~403.2
HD Graphics 6000 | Gen 8 | 48 | Broadwell-U GT3 | 14nm | 950~1000 | 729.6~768
Iris Graphics 6100 | Gen 8 | 48 | Broadwell-U GT3 | 14nm | 1050~1100 | 806.4~844.8
Iris Pro Graphics 6200, Iris Pro Graphics P6300 (with 128MB eDRAM) | Gen 8 | 48 | Broadwell GT3e | 14nm | 1000~1150 | 768~883.2
HD Graphics 500 | Gen 9 | 12 | Apollo Lake (Celeron N3350, Celeron N3450, Celeron J3355, Celeron J3455) | 14nm | 650~750 | 124.8~144
HD Graphics 505 | Gen 9 | 18 | Apollo Lake (Pentium N4200, Pentium J4205) | 14nm | 750~800 | 216~230.4
HD Graphics 510 | Gen 9 | 12 | Skylake GT1 | 14nm | 900~1000 | 172.8~192
HD Graphics 515 | Gen 9 | 24 | Skylake-Y GT2 (Core M3, Core M5, Core M7) | 14nm | 800~1000 | 307.2~384
HD Graphics 520 | Gen 9 | 24 | Skylake-U GT2 | 14nm | 1000~1050 | 384~403.2
HD Graphics 530, HD Graphics P530 | Gen 9 | 24 | Skylake GT2 | 14nm | 900~1150 | 345.6~441.6
Iris Graphics 540, Iris Graphics 550 (with 64MB eDRAM) | Gen 9 | 48 | Skylake GT3e | 14nm | 950~1100 | 729.6~844.8
Iris Pro Graphics 580, Iris Pro Graphics P580 (with 128MB eDRAM) | Gen 9 | 72 | Skylake GT4e | 14nm | 1000 | 1152
HD Graphics 610 | Gen 9+ | 12 | Kaby Lake GT1 | 14nm | 900~1050 | 172.8~201.6
HD Graphics 615 | Gen 9+ | 24 | Kaby Lake-Y GT2 (Pentium 4410Y, Core M3-7Yxx, Core i5-7Yxx, Core i7-7Yxx) | 14nm | 850~1050 | 326.4~403.2
HD Graphics 620 | Gen 9+ | 24 | Kaby Lake-U GT2 | 14nm | 1000~1150 | 384~441.6
HD Graphics 630, HD Graphics P630 | Gen 9+ | 24 | Kaby Lake GT2 | 14nm | 950~1150 | 364.8~441.6
Iris Plus Graphics 640, Iris Plus Graphics 650 (with 64MB eDRAM) | Gen 9+ | 48 | Kaby Lake GT3e | 14nm | 950~1150 | 729.6~883.2

Nvidia GeForce GTX 600 Series
GPU Name | Card | Cores | Clock (MHz) | DDR | Bus (bit) | GFlops
GK110 | GTX Titan | 2688 | 837~876 | GDDR5 | 384 | 4500
GK104 | GTX 680 | 1536 | 1006~1110 | GDDR5 | 256 | 3250
GK104 | GTX 670 | 1344 | 915~1084 | GDDR5 | 256 | 2760
GK104 | GTX 660Ti | 1344 | 915~1058 | GDDR5 | 192 | 2460
GK106 | GTX 660 | 960 | 980~1032 | GDDR5 | 192 | 1881.6
GK106 | GTX 650Ti Boost | 768 | 980~1032 | GDDR5 | 192 | 1505.2
GK106 | GTX 650Ti | 768 | 928 | GDDR5 | 128 | 1425.4
GK107 | GTX 650 | 384 | 1058 | GDDR5 | 128 | 812.5
GK107 | GT 640 | 384 | 900 | DDR3 | 128 | 691.2
More nVIDIA Geforce Information in wiki

AMD Radeon HD 7000 Series
GPU Name | Card | Cores | Clock (MHz) | DDR | Bus (bit) | GFlops
Tahiti XT2 | HD 7970 GHZ | 2048 | 1000~1050 | GDDR5 | 384 | 4096~4300
Tahiti XT | HD 7970 | 2048 | 925 | GDDR5 | 384 | 3788.8
Tahiti Pro | HD 7950 Boost | 1792 | 850~925 | GDDR5 | 384 | 3046.4~3315.2
Tahiti Pro | HD 7950 | 1792 | 800 | GDDR5 | 384 | 2867.2
Tahiti LE | HD 7870 XT | 1536 | 925~975 | GDDR5 | 256 | 2841.6~2995.2
Pitcairn XT | HD 7870 GHZ | 1280 | 1000 | GDDR5 | 256 | 2560
Pitcairn Pro | HD 7850 | 1024 | 860 | GDDR5 | 256 | 1761.28
Bonaire XT | HD 7790 | 896 | 1000 | GDDR5 | 128 | 1792
Cape Verde XT | HD 7770 GHZ ver.2 | 640 | 1100 | GDDR5 | 128 | 1408
Cape Verde XT | HD 7770 GHZ | 640 | 1000 | GDDR5 | 128 | 1280
Cape Verde Pro | HD 7750 ver.2 | 512 | 900 | GDDR5 | 128 | 921.6
Cape Verde Pro | HD 7750 | 512 | 800 | GDDR5 | 128 | 819.2
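
Many of the GFlops figures in these tables are consistent with the usual peak-throughput estimate: shader cores × FLOPs per core per clock × clock frequency, with 2 FLOPs per clock for one fused multiply-add. The sketch below is an assumption about how such figures can be reproduced, not a statement of how every row was obtained (some rows, such as the GTX 680, evidently use a different clock). `peakGflops` is a hypothetical helper.

```cpp
// Theoretical peak single-precision throughput in GFlops.
//   cores         number of shader cores (or EUs for Intel)
//   clockMHz      core clock in MHz
//   flopsPerClock FLOPs per core per clock; 2 for one FMA per core,
//                 16 reproduces the Intel Gen 7 rows (per EU)
constexpr double peakGflops(double cores, double clockMHz, double flopsPerClock = 2.0) {
    return cores * flopsPerClock * clockMHz / 1000.0;
}
```

For example, the HD 7970 works out as 2048 × 2 × 925 MHz = 3788.8 GFlops, and the HD Graphics 4000 at 850 MHz as 16 EUs × 16 × 850 MHz = 217.6 GFlops, matching the table entries.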

More AMD Radeon Information in wiki

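Snapdragon 820
Samsung 14nm FinFET process
Quad-core Kryo architecture at 2.2 GHz (in-house design)
GPU: Adreno 530 @ 510~624 MHz, GPU floating-point performance 407.4~498.5 GFlops
Memory bandwidth 28.8 GB/s (dual-channel LPDDR4)

Exynos 8890
Samsung 14nm FinFET process
Quad-core ARM Cortex-A53 at 1.6 GHz plus quad-core Exynos M1 at 2.3~2.6 GHz (in-house core architecture)
GPU: Mali-T880 MP12 (12 cores) @ 650 MHz, GPU floating-point performance 265.2 GFlops
Memory bandwidth 28.7 GB/s (dual-channel LPDDR4)

Helio X20
TSMC 20nm process
Ten cores: 2x Cortex-A72 @ 2.5 GHz plus 4x Cortex-A53 @ 2.0 GHz plus 4x Cortex-A53 @ 1.4 GHz
GPU: Mali-T880 MP4 @ 780 MHz, GPU floating-point performance 106 GFlops
Memory bandwidth 14.9 GB/s (dual-channel LPDDR3)

Kirin 950
TSMC 16nm process
Eight cores: 4x ARM Cortex-A72 @ 2.3 GHz plus 4x ARM Cortex-A53 @ 1.8 GHz
GPU: Mali-T880 MP4 @ 900 MHz, GPU floating-point performance 122.4 GFlops
Memory bandwidth 25.6 GB/s (dual-channel LPDDR4)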

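Importing and Building the Vuh Project with Android Studio 3.3.1 and NDK 19.1.5304403 on macOS Mojave (10.14.3)

An earlier post, "An example of using the vuh library on Android Studio 3.2.1", implemented an example that uses the vuh library. That example referenced a prebuilt libvuh.so directly. Below, vuh is integrated by compiling it from source instead.

I tried pulling in the vuh library with both ExternalProject_add and include, but neither worked well.

A project imported with ExternalProject_add is built only once, even when BUILD_ALWAYS 1 is specified. This is probably caused by Ninja. As a result, the library is not rebuilt when there are multiple ABIs or when the vuh source changes, which leads to all kinds of build errors.

A project pulled in with include ends up with incorrect path information, so the source files cannot be found.

In the end, add_subdirectory did the job.

The key files after the changes are as follows:

Note: the VUH_ROOT_DIR variable specifies the location of the vuh source code.

Note: vuh requires CMake 3.8, so the CMake version has to be set manually to 3.10.2.

As follows:

If the following error occurs:

then do the following:

If the following error occurs:

then deleting jniLibs/armeabi-v7a/libvuh.so from the project resolves it.

The complete example can be downloaded here: vuhAndroid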
