Memory allocation is another key difference between Vulkan and OpenGL. In OpenGL, the driver does most of the work of managing memory allocations and assigning them to individual resources, which lets applications be developed, tested and deployed very quickly. In Vulkan, however, the programmer takes on that responsibility, so many operations that OpenGL orchestrates heuristically can instead be orchestrated with absolute knowledge of the resource lifecycle.
Daily archive: March 11, 2019
Using CPU Memory Pointers Directly in Vulkan
Depending on the target platform, some recently published EXT extensions allow sharing memory between different physical devices.
VK_EXT_external_memory_host enables importing host allocations or host-mapped foreign device memory using a host pointer as the handle.
VK_EXT_external_memory_dma_buf enables importing dma_buf handles on Linux which can possibly come from another physical device.
The spec now also has a table listing which external memory handle types require a matching physical device and which don't.
I'd also like to draw your attention to further features that enable execution control across multiple physical devices. At least on Linux (and possibly other POSIX-based systems), semaphores and fences can be shared across physical devices if the FENCE_FD and SYNC_FD handle types are used. These are part of the KHR external semaphore/fence extensions.
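The VK_EXT_external_memory_host extension was merged into the Android master branch on April 4, 2018, so later releases may be able to use it. It lets the GPU directly use memory pointers allocated on the CPU, which reduces memory copies.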
Vulkan Device Memory
This post serves as a guide on how to best use the various Memory Heaps and Memory Types exposed in Vulkan on AMD drivers, starting with some high-level tips.
- GPU Bulk Data
  Place GPU-side allocations in DEVICE_LOCAL without HOST_VISIBLE. Make sure to allocate the highest priority resources first like Render Targets and resources which get accessed more often. Once DEVICE_LOCAL fills up and allocations fail, have the lower priority allocations fall back to CPU-side memory if required via HOST_VISIBLE with HOST_COHERENT but without HOST_CACHED. When doing in-game reallocations (say for display resolution changes), make sure to fully free all allocations involved before attempting to make any new allocations. This can minimize the possibility that an allocation can fail to fit in the GPU-side heap.
- CPU-to-GPU Data Flow
  For relatively small total allocation size (under 256 MB) the DEVICE_LOCAL with HOST_VISIBLE is the perfect Memory Type for CPU upload to GPU cases: the CPU can directly write into GPU memory which the GPU can then access without reading across the PCIe bus. This is great for upload of constant data, etc.
- GPU-to-CPU Data Flow
  Use HOST_VISIBLE with HOST_COHERENT and HOST_CACHED. This is the only Memory Type which supports cached reads by the CPU. Great for cases like recording screen-captures, feeding back Hierarchical Z-Buffer occlusion tests, etc.
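The three tips above come down to matching property flags rather than relying on type indices. Below is a minimal Python sketch of that selection logic. The flag values mirror `VkMemoryPropertyFlagBits` from the Vulkan headers; the helper names, the usage labels and the four-entry `EXAMPLE_TYPES` list are hypothetical and only illustrate the idea, they are not part of the Vulkan API or of any driver.

```python
# Flag values mirror VkMemoryPropertyFlagBits in the Vulkan headers.
DEVICE_LOCAL = 0x1
HOST_VISIBLE = 0x2
HOST_COHERENT = 0x4
HOST_CACHED = 0x8

# Hypothetical property flags per memory type index (an assumption for the demo).
EXAMPLE_TYPES = [
    DEVICE_LOCAL,
    DEVICE_LOCAL | HOST_VISIBLE | HOST_COHERENT,
    HOST_VISIBLE | HOST_COHERENT,
    HOST_VISIBLE | HOST_COHERENT | HOST_CACHED,
]

# Ordered (required, excluded) preferences per usage, following the tips above.
PREFERENCES = {
    "gpu_bulk": [
        (DEVICE_LOCAL, HOST_VISIBLE),
        # Fallback to CPU-side memory: coherent, not cached, not device local.
        (HOST_VISIBLE | HOST_COHERENT, HOST_CACHED | DEVICE_LOCAL),
    ],
    "cpu_to_gpu": [(DEVICE_LOCAL | HOST_VISIBLE, 0)],
    "gpu_to_cpu": [(HOST_VISIBLE | HOST_COHERENT | HOST_CACHED, 0)],
}


def find_memory_type(memory_types, required, excluded=0, skip=()):
    """Return the first index whose flags contain `required` and none of `excluded`."""
    for index, flags in enumerate(memory_types):
        if index in skip:
            continue
        if (flags & required) == required and (flags & excluded) == 0:
            return index
    return None


def pick_memory_type(memory_types, usage, exhausted=()):
    """Walk the preference list for `usage`, skipping types whose heap is full."""
    for required, excluded in PREFERENCES[usage]:
        index = find_memory_type(memory_types, required, excluded, skip=exhausted)
        if index is not None:
            return index
    return None
```

In a real application the `exhausted` set would be filled in when an allocation from a given type fails, so that lower priority resources move on to the next preference.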
Pooling Allocations
EDIT: Great reminder from Axel Gneiting (leading Vulkan implementation in DOOM® at id Software): make sure to pool a group of resources, like textures and buffers, into a single memory allocation. On Windows® 7 for example, Vulkan memory allocations map to WDDM Allocations (the same lists seen in GPUView), and there is a relatively high cost associated with each WDDM Allocation as command buffers flow through the WDDM based driver stack. Having 256 MB per DEVICE_LOCAL allocation can be a good target; it takes only 16 allocations to fill 4 GB.
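As a rough illustration of pooling, the sketch below carves aligned sub-ranges out of one large block instead of making one allocation per resource. It is a simple bump allocator written for this note, with hypothetical names (`MemoryBlock`, `suballocate`, `blocks_needed`); real engines usually add free lists and defragmentation, and the alignment would come from the resource's memory requirements.

```python
MIB = 1024 * 1024


def align_up(value, alignment):
    """Round `value` up to the next multiple of `alignment`."""
    return (value + alignment - 1) // alignment * alignment


class MemoryBlock:
    """Stands in for one large device allocation (256 MB by default)."""

    def __init__(self, size=256 * MIB):
        self.size = size
        self.offset = 0  # first free byte in the block

    def suballocate(self, size, alignment):
        """Return the offset of a new sub-range, or None if the block is full."""
        start = align_up(self.offset, alignment)
        if start + size > self.size:
            return None
        self.offset = start + size
        return start


def blocks_needed(total_bytes, block_bytes=256 * MIB):
    """Number of blocks needed to cover `total_bytes` (ceiling division)."""
    return (total_bytes + block_bytes - 1) // block_bytes
```

With these defaults, `blocks_needed(4 * 1024 * MIB)` evaluates to 16, which matches the 16 allocations for 4 GB mentioned above.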
Hidden Paging
When an application starts over-subscribing GPU-side memory, DEVICE_LOCAL memory allocations will fail. It is also possible that later during application execution, another application in the system increases its usage of GPU-side memory, resulting in dynamic over-subscription of GPU-side memory. This can cause the OS (for instance Windows® 7) to silently migrate or page GPU-side allocations to and from CPU-side memory as it time-slices execution of each application on the GPU, which can result in visible “hitching”. There is currently no method to directly query if the OS is migrating allocations in Vulkan. One possible workaround is for the app to detect hitching by looking at time-stamps, and then actively attempt to reduce DEVICE_LOCAL memory consumption when hitching is detected. For example, the application could manually move resources around to fully empty DEVICE_LOCAL allocations, which can then be freed.
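One way to approximate the time-stamp workaround is to flag frames that take much longer than the recent median frame time. The sketch below is only an assumption about how such a detector could look: the function name, the 30-frame window, the 2.5x factor and the minimum history of 5 frames are arbitrary choices for illustration, not values from the original post.

```python
from statistics import median


def detect_hitches(frame_times_ms, window=30, factor=2.5, min_history=5):
    """Return indices of frames slower than `factor` times the recent median."""
    hitches = []
    for i, frame_time in enumerate(frame_times_ms):
        history = frame_times_ms[max(0, i - window):i]
        if len(history) < min_history:
            continue  # not enough data to judge this frame yet
        if frame_time > factor * median(history):
            hitches.append(i)
    return hitches
```

An application could respond to repeated hits by moving lower priority resources out of DEVICE_LOCAL memory, as described above.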
EDIT: Targeting Low-Memory GPUs
When targeting a memory surplus, using DEVICE_LOCAL+HOST_VISIBLE for CPU-write cases can bypass the need to schedule an extra copy. However, in memory-constrained situations it is much better to use DEVICE_LOCAL+HOST_VISIBLE as an extension to the DEVICE_LOCAL heap and use it for GPU Resources like Textures and Buffers. CPU-write cases can then switch to HOST_VISIBLE+COHERENT. The number one priority for performance is keeping the high bandwidth access resources in GPU-side memory.
Memory Heap and Memory Type – Technical Details
Driver Device Memory Heaps and Memory Types can be inspected using the Vulkan Hardware Database. For Windows AMD drivers, below is a breakdown of the characteristics and best usage models for all the Memory Types. Heap and Memory Type numbering is not guaranteed by the Vulkan Spec, so make sure to work from the Property Flags directly. Also note memory sizes reported in Vulkan represent the maximum amount which is shared across applications and driver.
- Heap 0
  - VK_MEMORY_HEAP_DEVICE_LOCAL_BIT
  - Represents memory on the GPU device which can not be mapped into Host system memory
  - Using 256 MB per vkAllocateMemory() allocation is a good starting point for collections of buffers and images
  - Suggest using separate allocations for large allocations which might need to be resized (freed and reallocated) at run-time
  - Memory Type 0
    - VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT
    - Full speed read/write/atomic by GPU
    - No ability to use vkMapMemory() to map into Host system address space
    - Use for standard GPU-side data
- Heap 1
  - VK_MEMORY_HEAP_DEVICE_LOCAL_BIT
  - Represents memory on the GPU device which can be mapped into Host system memory
  - Limited on Windows to 256 MB
    - Best to allocate at most 64 MB per vkAllocateMemory() allocation
    - Fall back to smaller allocations if necessary
  - Memory Type 1
    - VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT
    - VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT
    - VK_MEMORY_PROPERTY_HOST_COHERENT_BIT
    - Full speed read/write/atomic by GPU
    - Ability to use vkMapMemory() to map into Host system address space
    - CPU writes are write-combined and write directly into GPU memory
      - Best to write full aligned cacheline sized chunks
    - CPU reads are uncached
      - Best to use Memory Type 3 instead for GPU write and CPU read cases
    - Use for dynamic buffer data to avoid an extra Host to Device copy
    - Use for a fall-back when Heap 0 runs out of space before resorting to Heap 2
- Heap 2
  - Represents memory on the Host system which can be accessed by the GPU
  - Suggest using similar allocation size strategy as Heap 0
  - Ability to use vkMapMemory()
  - GPU reads for textures and buffers are cached in GPU L2
    - GPU L2 misses read across the PCIe bus to Host system memory
    - Higher latency and lower throughput on an L2 miss
  - GPU reads for index buffers are cached in GPU L2 in Tonga and later GPUs like FuryX
  - Memory Type 2
    - VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT
    - VK_MEMORY_PROPERTY_HOST_COHERENT_BIT
    - CPU writes are write-combined
    - CPU reads are uncached
    - Use for staging for upload to GPU device
    - Can use as a fall-back when GPU device runs out of memory in Heap 0 and Heap 1
  - Memory Type 3
    - VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT
    - VK_MEMORY_PROPERTY_HOST_COHERENT_BIT
    - VK_MEMORY_PROPERTY_HOST_CACHED_BIT
    - CPU reads and writes go through CPU cache hierarchy
    - GPU reads snoop CPU cache
    - Use for staging for download from GPU device
Choosing the correct Memory Heap and Memory Type is a critical task in optimization. A GPU like Radeon™ Fury X for instance has 512 GB/s of DEVICE_LOCAL bandwidth (sum of any ratio of read and write) but the PCIe bus supports at most 16 GB/s read and at most 16 GB/s write for a sum of 32 GB/s in both directions.
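To put those bandwidth figures in perspective, the short sketch below estimates how long it takes to read a given amount of data at each rate. It assumes the peak numbers quoted above (512 GB/s and 16 GB/s per direction) and ignores latency and contention, so the results are best-case estimates only; the function name is made up for this example.

```python
def transfer_ms(size_gb, bandwidth_gb_per_s):
    """Best-case time in milliseconds to move `size_gb` at the given peak rate."""
    return size_gb / bandwidth_gb_per_s * 1000.0


DEVICE_LOCAL_GBPS = 512.0  # Fury X DEVICE_LOCAL bandwidth quoted above
PCIE_READ_GBPS = 16.0      # PCIe read limit quoted above

local_ms = transfer_ms(1.0, DEVICE_LOCAL_GBPS)  # about 1.95 ms for 1 GB
pcie_ms = transfer_ms(1.0, PCIE_READ_GBPS)      # 62.5 ms for 1 GB
slowdown = pcie_ms / local_ms                   # 32x under these assumptions
```

At 60 frames per second a frame lasts about 16.7 ms, so under these assumptions reading 1 GB per frame fits easily in DEVICE_LOCAL memory but takes several frames' worth of time over PCIe.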
Floating-Point Performance of Common GPUs
Game Consoles GPU
Console Name | GPU Name | Fab | Clock | GFlops |
NDS | ARM946E-S (CPU) | 180/130nm | 67 MHz | 0.6 |
N3DS | PICA 200 | 45nm | 200 MHz | 4.8 |
PSP | R4000 x 2 | 90nm | 333 MHz | 2.6 |
PS VITA | SGX543 MP4+ | 45nm | 222 MHz | 28.4 |
Dreamcast | PowerVR2 CLX2 | 250nm | 100 MHz | 2.1 |
XBOX | XGPU (NV2A) | 150nm | 233 MHz | 20 |
XBOX 360 | ATI R500 Xenos | 90/65/45nm | 500 MHz | 240 |
XBOX ONE / XBOX ONE S | AMD Radeon GCN (12CU 768 Cores) | 28/16nm | 853 MHz / 914 MHz | 1311.5 / 1405.2 |
XBOX ONE X | AMD Radeon GCN (40CU 2560 Cores) | 16nm | 1172 MHz | 6000 |
PlayStation 2 | GS | 180/150/90nm | 147 MHz | 6.2 (EE+GS) |
PlayStation 3 | RSX (NVIDIA G70) | 90/65/45nm | 550 MHz | 228.8 |
PlayStation 4 / PlayStation 4 Slim | AMD Radeon GCN (18CU 1152 Cores) | 28/16nm | 800 MHz | 1840 |
PlayStation 4 Pro | AMD Radeon GCN (36CU 2304 Cores) | 16nm | 911 MHz | 4200 |
N64 | SGI RCP | 350nm | 62.5 MHz | 0.1~0.2 |
GameCube | Flipper | 180nm | 162 MHz | 9.4 |
Wii | ATI HollyWood | 90nm | 243 MHz | 12 |
Wii U | ATI RV770 | 40nm | 550 MHz | 176 |
Switch | Tegra X1 (Undocked) | 20nm | 307.2 MHz | 157.2 |
Switch | Tegra X1 (Docked) | 20nm | 768 MHz | 393.2 |
Ouya | Tegra 3 (Geforce ULP x 12) | 40nm | 520 MHz | 12.5 |
SHIELD portable | Tegra 4 (Geforce ULP x 72) | 28nm | 672 MHz | 96.8 |
SHIELD TV | Tegra X1 (Maxwell Cores x 256 (2xSMM)) | 20nm | 1000 MHz | 512 |
GPU Name | Chip | Clock | GFlops |
SGX530 | OMAP 3530 | 110 MHz | 0.88 |
SGX530 | DM3730 | 200 MHz | 1.6 |
SGX530 | --- | 300 MHz | 2.4 |
SGX531 | MT6513, MT6573, MT6575M | 281 MHz | 2.25 |
SGX531 | R-Car E1 | 400 MHz | 3.2 |
SGX531 Ultra | MT6515, MT6575, MT6517, MT6517T, MT6577, MT6577T, MT8317, MT8317T, MT8377 | 522 MHz | 4.2 |
SGX535 | S5PC100, Apple A4 | 200 MHz | 1.6 |
SGX535 | Apple A4 (iPad) | 250 MHz | 2.0 |
SGX535 | --- | 300 MHz | 2.4 |
SGX540 | Jz4780 | ??? MHz | ??? |
SGX540 | Exynos 3110 | 200 MHz | 3.2 |
SGX540 | OMAP 4430 | 307 MHz | 4.9 |
SGX540 | OMAP 4460 | 384 MHz | 6.1 |
SGX540 | Atom Z2420, R-Car E2, R-Car M1A/M1S | 400 MHz | 6.4 |
SGX540 | ATM7021, ATM7021A, ATM7029B | 500 MHz | 8.0 |
SGX540 | RK3168 | 600 MHz | 9.6 |
SGX543 | --- | 200 MHz | 6.4 |
SGX543 MP2 | Apple A5 | 200 MHz | 12.8 |
SGX543 MP2 | Apple A5 (iPad2) | 250 MHz | 16.0 |
SGX543 MP2 | MT5327 | 400 MHz | 25.6 |
SGX543 MP2 | R-Car H1 | 520 MHz | 33.28 |
SGX543 MP3 | Apple A6 | 266 MHz | 25.5 |
SGX543 MP4 | Apple A5X | 250 MHz | 32.0 |
SGX544 | MT6589M, MT8117, MT8121 | 156 MHz | 5 |
SGX544 | MT6589, MT8389 | 286 MHz | 9.2 |
SGX544 | MT8125 | 300 MHz | 9.6 |
SGX544 | MT6589T, MT8389T | 357 MHz | 11.4 |
SGX544 | OMAP 4470 | 384 MHz | 12.3 |
SGX544 | Broadcom M320, Broadcom M340 | ??? | ??? |
SGX544 | ATM7039 | 450 MHz | 14.4 |
SGX544 MP2 | Atom Z2520 | 300 MHz | 19.2 |
SGX544 MP2 | Allwinner A31, Allwinner A31s | 350 MHz | 22.4 |
SGX544 MP2 | Atom Z2560 | 400 MHz | 25.6 |
SGX544 MP2 | R-Car M2 | 520 MHz | 33.28 |
SGX544 MP2 | Atom Z2580 | 533 MHz | 34.1 |
SGX544 MP2 | Allwinner A83T, Allwinner H8 | 700 MHz | 44.8 |
SGX544 MP3 | Exynos 5410 | 533 MHz | 51.1 |
SGX545 | --- | 300 MHz | 4.8 |
SGX545 | Atom Z2460, Atom Z2760 | 533 MHz | 8.5 |
SGX554 | --- | 300 MHz | 19.2 |
SGX554 MP2 | --- | 300 MHz | 38.4 |
SGX554 MP4 | Apple A6X | 266 MHz | 68.1 |
G6020 (0.25 Clusters) | --- | 300 MHz | 4.8 |
G6050, G6060 (0.5 Clusters) | --- | 300 MHz | 9.6 |
G6100, G6110 (1 Cluster) | RK3368 | 600 MHz | 38.4 |
G6200 (2 Clusters) | MT6595M, MT8135 | 450 MHz | 57.6 |
G6200 (2 Clusters) | MT6795M | 550 MHz | 70.4 |
G6200 (2 Clusters) | MT6595, MT6595T | 600 MHz | 76.8 |
G6200 (2 Clusters) | MT6793, Helio X10 (MT6795, MT6795T) | 700 MHz | 89.6 |
G6230 (2 Clusters) | Allwinner A80, Allwinner A80T | 533 MHz | 68.0 |
G6230 (2 Clusters) | ATM9009 | 600 MHz | 76.8 |
GX6240 (2 Clusters) | --- | 650 MHz | 83.2 |
GX6250 (2 Clusters) | MT8173, MT8176 | 600 MHz | 76.8 |
GX6250 (2 Clusters) | MT8693 | 700 MHz | 89.6 |
GX6250 (2 Clusters) | --- | 750 MHz | 96 |
G6400 (4 Clusters) | --- | 300 MHz | 76.8 |
G6400 (4 Clusters) | Atom Z3460, Atom Z3480 | 533 MHz | 136.4 |
G6400 (4 Clusters) | R-Car H2 | 600 MHz | 153.6 |
G6430 (4 Clusters) | --- | 300 MHz | 76.8 |
G6430 (4 Clusters) | Apple A7, Apple A7 (iPad Air) | 450 MHz | 115.2 |
G6430 (4 Clusters) | Atom Z3530 | 457 MHz | 117 |
G6430 (4 Clusters) | Atom Z3560, Atom Z3580 | 533 MHz | 136.4 |
G6430 (4 Clusters) | Atom Z3570, Atom Z3590 | 640 MHz | 163.8 |
GX6450 (4 Clusters) | Apple A8 | 450 MHz | 115.2 |
GX6450 (4 Clusters) | --- | 600 MHz | 153.6 |
G6630 (6 Clusters) | --- | 450 MHz | 172.8 |
G6630 (6 Clusters) | --- | 600 MHz | 230.4 |
GX6650 (6 Clusters) | R-Car H3 | 600 MHz | 230.4 |
GX6850 (8 Clusters) | Apple A8X | 450 MHz | 230.4 |
GX6850 (8 Clusters) | --- | 600 MHz | 307.2 |
GE7400 (0.5 Clusters) | --- | 600 MHz | 19.2 |
GE7800 (1 Cluster) | --- | 600 MHz | 38.4 |
GT7200 (2 Clusters) | --- | 650 MHz | 83.2 |
GT7200 (2 Clusters) | SC9861G-IA | ??? MHz | ??? |
GT7400 (4 Clusters) | --- | 650 MHz | 166.4 |
GT7400 Plus (4 Clusters) | Helio X30 | 800 MHz | 204.8 |
GT7600 (6 Clusters) | Apple A9 | 450 MHz | 172.8 |
GT7600 Plus (6 Clusters) | Apple A10 Fusion | 650 MHz? | 249.6? |
GT7800 (8 Clusters) | --- | 650 MHz | 332.8 |
GT7800+ (12 Clusters) | Apple A9X | 450 MHz | 345.6 |
GT7800? (12 Clusters) | Apple A10X Fusion | 650 MHz? | 499.2? |
GT7900 (16 Clusters) | --- | 650 MHz | 665.6 |
GT7900 (16 Clusters) | --- | 800 MHz | 819.2 |
GT8525 (2 Clusters) | --- | 1000 MHz | 192 |
GPU Name | Chip | Fab | Clock | GFlops |
Adreno 130 | MSM7x00, MSM7x00A, MSM7x01, MSM7x01A | ??nm | 133 MHz | 1.2 |
Adreno 200 | Snapdragon S1 | 65nm | 133 MHz | 2.1 |
Adreno 200 | Snapdragon S1 | 45nm | 200 MHz | 3.2 |
Adreno 200 | Snapdragon S1 | 45nm | 245 MHz | 3.92 |
Adreno 203 | Snapdragon S4 Play | 45nm | 245 MHz | 7.84 |
Adreno 203 | Snapdragon 200 | 45nm | 294 MHz | 9.4 |
Adreno 205 | Snapdragon S2 | 45nm | 266 MHz | 8.5 |
Adreno 220 | Snapdragon S3 | 45nm | 266 MHz | 17 |
Adreno 225 | Snapdragon S4 Plus | 28nm | 200 MHz | 12.8 |
Adreno 225 | Snapdragon S4 Plus (MSM8660A) | 28nm | 300 MHz | 19.2 |
Adreno 225 | Snapdragon S4 Plus (MSM8960) | 28nm | 400 MHz | 25.6 |
Adreno 302 | Snapdragon 200 | 28nm | 400 MHz | 19.2 |
Adreno 304 | Snapdragon 208, Snapdragon 210, Snapdragon 212, Snapdragon Wear 2100 | 28nm | 400 MHz | 19.2 |
Adreno 305 | Snapdragon S4 Plus, Snapdragon 400 | 28nm | 400~450 MHz | 19.2~21.6 |
Adreno 306 | Snapdragon 410 (MSM8916), Snapdragon 412 (MSM8916v2) | 28nm | 400 MHz | 21.6 |
Adreno 308 | Snapdragon 425 (MSM8917), Snapdragon 427 | 28nm | 500 MHz | 27 |
Adreno 320 (64 ALU) | Snapdragon S4 Pro, Snapdragon S4 Prime (MPQ8064) | 28nm | 400 MHz | 57.6 |
Adreno 320 (96 ALU) | Snapdragon 600 (APQ8064T) | 28nm | 400 MHz | 86.4 |
Adreno 320 (96 ALU) | Snapdragon 600 (APQ8064AB) | 28nm | 450 MHz | 97.2 |
Adreno 330 | Snapdragon 800 | 28nm | 450 MHz | 129.8 |
Adreno 330 | Snapdragon 801 | 28nm | 550 MHz | 158.4 |
Adreno 330 | Snapdragon 801 (MSM8974AC) | 28nm | 578 MHz | 166.5 |
Adreno 405 | Snapdragon 415 (MSM8929), Snapdragon 615 (MSM8939), Snapdragon 616 (MSM8939v2), Snapdragon 617 (MSM8952) | 28nm | 550 MHz | 59.4 |
Adreno 418 | Snapdragon 808 (MSM8992) | 20nm | 600 MHz | 172.8 |
Adreno 420 | Snapdragon 805 (APQ8084) | 28nm | 500~600 MHz | 144~172.8 |
Adreno 430 | Snapdragon 810 | 20nm | 500~650 MHz | 324~420 |
Adreno 505 | Snapdragon 430 (MSM8937), Snapdragon 435 | 28nm | 450 MHz | 48.6 |
Adreno 506 | Snapdragon 450 | 14nm | 600 MHz | 120 |
Adreno 506 | Snapdragon 625, Snapdragon 626 | 14nm | 650 MHz | 130 |
Adreno 508 | Snapdragon 630 | 14nm | 800 MHz? | 160? |
Adreno 510 | Snapdragon 650 (MSM8956), Snapdragon 652 (MSM8976), Snapdragon 653 (MSM8976PRO) | 28nm | 600 MHz | 180 |
Adreno 512 | Snapdragon 660 (MSM8976 Plus) | 14nm | 800 MHz? | 240? |
Adreno 530 | Snapdragon 820 (MSM8996) | 14nm | 510~624 MHz | 407.4~498.5 |
Adreno 530 | Snapdragon 821 (MSM8996PRO) | 14nm | 650 MHz | 519.2 |
Adreno 540 | Snapdragon 835 (MSM8998) | 10nm | 710 MHz | 567 |
Adreno 608 | --- | 10nm | ??? MHz | ??? |
Adreno 615 | --- | 10nm | ??? MHz | ??? |
Adreno 630 | Snapdragon 845 | 10nm | ??? MHz | ??? |
GPU Name | Chip | Fab | Clock | GFlops |
Geforce ULP x 8 | Tegra 2 (AP20H) | 40nm | 300 MHz | 4.8 |
Geforce ULP x 8 | Tegra 2 (T20) | 40nm | 333 MHz | 5.6 |
Geforce ULP x 8 | Tegra 2 (AP25, T25) | 40nm | 400 MHz | 6.7 |
Geforce ULP x 12 | Tegra 3 (T30L, AP33) | 40nm | 416 MHz | 10 |
Geforce ULP x 12 | Tegra 3 | 40nm | 450 MHz | 10.8 |
Geforce ULP x 12 | Tegra 3 (T30, T33, AP37) | 40nm | 520 MHz | 12.5 |
Geforce ULP x 60 | Tegra 4i | 28nm | 660 MHz | 79.2 |
Geforce ULP x 72 | Tegra 4 | 28nm | 672 MHz | 96.8 |
Kepler Cores x 192 (1xSMX) | Tegra K1, Tegra K1 (Denver) | 28nm | 850 MHz | 326.4 |
Maxwell Cores x 256 (2xSMM) | Tegra X1 | 20nm | 850 MHz / 1000 MHz | 435.2 / 512 |
Pascal Cores x 256 (2xSMM) | Tegra Parker | 16nm | 1465 MHz | 750 |
Volta Cores x 512 | Tegra Xavier | 12nm | ???? MHz | ???? |
GPU Name | Chip | Clock | GFlops |
Mali-400 | --- | 200 MHz | 1.8 |
Mali-400 | AML8726-M3 | 250 MHz | 2.25 |
Mali-400 | ST-E U8500 | 275 MHz | 2.48 |
Mali-400 | WM8850, WM8950, SC6815A, SC7710, SC8810, SC9820, Allwinner A10, Allwinner A10s, Allwinner A13 | 300 MHz | 2.7 |
Mali-400 | RK292X | 330 MHz | 2.97 |
Mali-400 | SC7715, SC7727S, ST-E U8520, Telechips TCC892x-i, Rk2926, RK2928, MT6290, MT8638T, MT6572M | 400 MHz | 3.6 |
Mali-400 | MT6570, MT6572, MT8312, MT8321, XMM6321, S5P4418 | 500 MHz | 4.5 |
Mali-400 | --- | 533 MHz | 4.8 |
Mali-400 MP2 | LC1810, LC1811 | 300 MHz | 5.4 |
Mali-400 MP2 | WM8880, WM8980, SC6825, SC8825, Allwinner A20, Allwinner A23, Allwinner A33 | 350 MHz | 6.3 |
Mali-400 MP2 | SC5735A, SC7730A, SC7730S, SC7731G, SC8830, SC8830A, SC8831G, SC9830A, SC9830I, SC9836, MT6582M, AML7366-M6C, AML8726-MX, AML8726-MXS, AML8726-MXL, NS115, LC1813, LC1913, RTD1195, Exynos 3250 | 400 MHz | 7.2 |
Mali-400 MP2 | SC8831G | 480 MHz | 8.64 |
Mali-400 MP2 | MT6580, MT6582, MT8382, RK3026, RK3036 | 500 MHz | 9.0 |
Mali-400 MP2 | RK3126, RK3128, RK3228, RK3229, Allwinner H3, Atom x3-C3130 | 600 MHz | 10.8 |
Mali-400 MP4 | RK3066, Exynos 4210 | 266 MHz | 9.6 |
Mali-400 MP4 | Exynos 4212, SC7735S, SC8735S, SC8835S, Hi3716, Hi3718, Hi3719, Rockchip PX2, AML7366-M6L | 400 MHz | 14.4 |
Mali-400 MP4 | Exynos 4412 | 440 MHz | 15.84 |
Mali-400 MP4 | Exynos 3470 | 450 MHz | 16.2 |
Mali-400 MP4 | Exynos 4412 v2, RK3188, S5P6818 | 533 MHz | 19.2 |
Mali-450 | WM8860 | 300 MHz | 4.5 |
Mali-450 MP2 | AML7366-M6D | 400 MHz | 12 |
Mali-450 MP2 | Amlogic M803, Amlogic M805, Amlogic M805T, Amlogic M806, Amlogic S805 | 500 MHz | 15 |
Mali-450 MP3 | Amlogic S905, Amlogic S905X | 750 MHz | 33.75 |
Mali-450 MP4 | MT8685 | 416 MHz | 24.8 |
Mali-450 MP4 | Kirin 620, Mstar 6A908, Mstar 6A918 | 500 MHz | 29.8 |
Mali-450 MP4 | Kirin 910 | 533 MHz | 32 |
Mali-450 MP4 | MT6588, MT6592M, MT8127, MT6591, MT6591H, Atom x3-C3230RK, Hi3796M V100, Hi3798M V100 | 600 MHz | 35.8 |
Mali-450 MP4 | MT6592, MT8392, Kirin 910T | 700 MHz | 41.8 |
Mali-450 MP6 | Amlogic M801, Amlogic M802, Amlogic S801, Amlogic S802, Amlogic S802H, Amlogic S812, Amlogic T866, Hi3796, Hi3798 | 600 MHz | 53.8 |
Mali-450 MP8 | --- | 600 MHz | 71.7 |
Mali-T604 | --- | 533 MHz | 17 |
Mali-T604 MP2 | --- | 533 MHz | 34 |
Mali-T604 MP4 | Exynos 5250 | 533 MHz | 68.2 |
Mali-T622 | --- | 533 MHz | 8.5 |
Mali-T624 | --- | 533 MHz | 17 |
Mali-T624 MP4 | Kirin 920 (K3V3), Kirin 925, Kirin 928, Exynos 5260 | 600 MHz | 76.8 |
Mali-T628 | --- | 533 MHz | 17 |
Mali-T628 MP2 | LC1860, LC1860C, LC1960 | 600 MHz | 38.4 |
Mali-T628 MP3 | --- | 533 MHz | 51.2 |
Mali-T628 MP4 | Kirin 930, Kirin 935 | 680 MHz | 87 |
Mali-T628 MP6 | Exynos 5420, Exynos 5422 | 533 MHz | 102.4 |
Mali-T628 MP6 | Exynos 5430 | 600 MHz | 115.2 |
Mali-T720 | --- | 450 MHz | 7.65 |
Mali-T720 | Exynos 7270, Exynos 7570 | ??? MHz | ??? |
Mali-T720 MP2 | MT6735P, MT8735P | 400 MHz | 13.6 |
Mali-T720 MP2 | MT6735M, MT8735M | 500 MHz | 17 |
Mali-T720 MP2 | MT8163V/B | 520 MHz | 17.68 |
Mali-T720 MP2 | MT6737, MT8735D, MT8735B | 550 MHz | 18.7 |
Mali-T720 MP2 | Atom x3-C3440, Exynos 3475, MT6735, MT6737T, MT8163V/A | 600 MHz | 20.4 |
Mali-T720 MP2 | Exynos 7580 | 668 MHz | 22.7 |
Mali-T720 MP3 | MT6753, MT6753T, MT8783 | 700 MHz | 35.7 |
Mali-T720 MP6 | LC1980 | ??? | ??? |
Mali-T720 MP8 | --- | 600 MHz | 81.6 |
Mali-T720 MP? | Hi3798C V200 | ??? | 103 |
Mali-T760 | --- | 600 MHz | 20.4 |
Mali-T760 MP2 | MT6732, MT6732M, MT8732 | 500 MHz | 34 |
Mali-T760 MP2 | MT6752, MT6752M, MT8752 | 700 MHz | 47.6 |
Mali-T760 MP4 | Mstar 6A928 | 552 MHz | 75 |
Mali-T760 MP4 | RK3288, RK3288-C | 600 MHz | 81.6 |
Mali-T760 MP6 | Exynos 5433 (Exynos 7410) | 700 MHz | 142.8 |
Mali-T760 MP8 | Exynos 7420 | 772 MHz | 210 |
Mali-T820 | --- | 600 MHz | 10.2 |
Mali-T820 | SC9850 | ??? MHz | ??? |
Mali-T820 MP3 | Amlogic S912, Mstar 6A938 | 600 MHz | 30.6 |
Mali-T830 | --- | 600 MHz | 20.4 |
Mali-T830 MP2 | Amlogic S966, Amlogic T966, Amlogic T968 | 650 MHz | 44.2 |
Mali-T830 MP2 | Kirin 650, Kirin 655, Kirin 658 | 900 MHz | 61.2 |
Mali-T830 MP2 | Exynos 7870 | 700 MHz | 47.6 |
Mali-T830 MP3 | Exynos 7880 | 950 MHz | 71.4 |
Mali-T860 | --- | 700 MHz | 23.8 |
Mali-T860 MP2 | MT6738 | 350 MHz | 23.8 |
Mali-T860 MP2 | MT6750, MT6738T | 520 MHz | 35.3 |
Mali-T860 MP2 | Helio P10 (MT6755M) | 550 MHz | 37.4 |
Mali-T860 MP2 | MT6750T | 650 MHz | 44.2 |
Mali-T860 MP2 | Helio P10 (MT6755), MT8785 | 700 MHz | 47.6 |
Mali-T860 MP2 | MT6739, Helio P15 (MT6755T) | 800 MHz | 54.4 |
Mali-T860 MP3 | Exynos 7650 | 700 MHz | 71.4 |
Mali-T860 MP4 | RK3399 | 600 MHz | 81.6 |
Mali-T860 MP4 | Pinecone S1 (V670) | 922 MHz | 125.4 |
Mali-T880 | --- | 850 MHz | 28.9 |
Mali-T880 MP?? | LG Nuclun 2 | ??? MHz | ??? |
Mali-T880 MP2 | Helio P20 (MT6757) | 900 MHz | 61.2 |
Mali-T880 MP2 | Helio P25 (MT6757CD) | 1000 MHz | 68 |
Mali-T880 MP4 | SC9860GV | ??? MHz | ??? |
Mali-T880 MP4 | Helio X20 (MT6797), Helio X23 (MT6797D) | 780 MHz | 106 |
Mali-T880 MP4 | Helio X25 (MT6797T) | 850 MHz | 115.6 |
Mali-T880 MP4 | Helio X27 (MT6797X) | 875 MHz | 119 |
Mali-T880 MP4 | Kirin 950 (Boost), Kirin 955 (Boost) | 900 MHz | 122.4 |
Mali-T880 MP10 | Exynos 8890 (Lite) | 650 MHz | 221 |
Mali-T880 MP12 | Exynos 8890 | 650 MHz | 265.2 |
Mali-G51 | --- | ??? MHz | ??? |
Mali-G71 | --- | 850 MHz | 28.9 |
Mali-G71 MP2 | MT6763, Helio P23 (MT6763T) | 770 MHz | 52.36 |
Mali-G71 MP2 | Helio P30 (MT6758) | 950 MHz | 64.6 |
Mali-G71 MP8 | Kirin 960 | 1037 MHz | 282 |
Mali-G71 MP12 | Pinecone S2? (V970) | 900 MHz? | 367.2? |
Mali-G71 MP18 | Exynos 8895 (Lite) | 546 MHz | 334 |
Mali-G71 MP20 | Exynos 8895 | 546 MHz | 371.2 |
Mali-G72 | --- | 850 MHz | 28.9 |
Mali-G72 MP3 | Exynos 9610 | ??? MHz | ??? |
Mali-G72 MP12 | Kirin 970 | 850 MHz | 346.8 |
GPU Name | Chip | Clock | GFlops |
GC200 | Jz4760 | ??? MHz | ??? |
GC400 | i.MX6 SoloX | ??? MHz | ??? |
GC500 | PXA920 | 315 MHz | 0.96 |
GC800 | RK2918, ATM7013, ATM7019 | 575 MHz | 4.6 |
GC860 | Jz4770 | ??? MHz | ??? |
GC880 | i.MX6S, i.MX6DL | ??? MHz | ??? |
GC1000 | PXA986, PXA988, PXA1088 | 600 MHz | 9.6 |
GC1000 Plus | ATM7029 | 630 MHz | 10.1 |
GC2000 | i.MX6D, i.MX6Q | 600 MHz | 19.2 |
GC4000 | K3V2 | 480 MHz | 30.7 |
GC3000 | S32V234 | 800 MHz | 32 |
GC5000 | PXA1928 | 800 MHz | 64 |
GC6000, GC6400 | --- | 800 MHz | 128 |
GC7000UL | PXA1908 | 800 MHz | 16 |
GC7000L | PXA1936 | 800 MHz | 32 |
GC7000 | --- | 800 MHz | 64 |
GC7200 | --- | 800 MHz | 128 |
GC7400 | --- | 800 MHz | 256 |
GC7600 | --- | 800 MHz | 512 |
GC8000 | --- | --- | --- |
VideoCore1 | VC01 | --- | --- |
VideoCore2 | BCM2702, BCM2705, BCM2722, BCM2724 | --- | --- |
VideoCore3 | BCM2727, BCM11181 | --- | --- |
VideoCore4 | BCM2763, BCM2820, BCM2835, BCM2836, BCM11182, BCM11311, BCM21533, BCM21654, BCM21663, BCM21664, BCM21664T, BCM28145, BCM28150, BCM28155, BCM23550 | 250 MHz | 24 |
VideoCore4 | BCM2837 | 300 MHz | 28.8 |
Name | Type | EUs | Chip | Fab | Clock (MHz) | GFlops |
GMA 4500 Series | Gen 4 | 10 | G41, G43, G45... | 65nm | 533~800 | 21~32 |
HD Graphics | Gen 5 | 12 | Clarkdale, Arrandale | 45nm | 533~900 | 25.6~43.2 |
HD Graphics, HD Graphics 2000 | Gen 6 | 6 | SandyBridge GT1 | 32nm | 950~1350 | 45.6~64.8 |
HD Graphics 3000 | Gen 6 | 12 | SandyBridge GT2 | 32nm | 1000~1350 | 96~129.6 |
HD Graphics | Gen 7 | 4 | Bay Trail-T, Bay Trail-M, Bay Trail-D | 22nm | 400~896 | 25.6~57.3 |
HD Graphics, HD Graphics 2500 | Gen 7 | 6 | IvyBridge GT1 | 22nm | 800~1150 | 76.8~110.4 |
HD Graphics 4000, HD Graphics P4000 | Gen 7 | 16 | IvyBridge GT2 | 22nm | 850~1300 | 217.6~332.8 |
HD Graphics | Gen 7.5 | 10 | Haswell GT1 | 22nm | 850~1150 | 136~184 |
HD Graphics 4400 | Gen 7.5 | 12 | Haswell GT1.5 | 22nm | 1150~1300 | 220.8~249.6 |
HD Graphics 4200, HD Graphics 4400 (Mobile), HD Graphics 4600, HD Graphics P4600, HD Graphics P4700 | Gen 7.5 | 20 | Haswell GT2 | 22nm | 850~1350 | 272~432 |
HD Graphics 5000, Iris Graphics 5100 | Gen 7.5 | 40 | Haswell GT3 | 22nm | 1000~1100 | 640~704 |
Iris Pro 5200 (with 128MB eDRAM) | Gen 7.5 | 40 | Haswell GT3e | 22nm | 1200~1300 | 768~832 |
HD Graphics, HD Graphics 400 | Gen 8 | 12 | Cherry Trail, Braswell | 14nm | 500~700 | 96~134.4 |
HD Graphics, HD Graphics 405 | Gen 8 | 16 | Cherry Trail, Braswell | 14nm | 600~700 | 153.6~179.2 |
HD Graphics 405 | Gen 8 | 18 | Braswell | 14nm | 740 | 213.12 |
HD Graphics (Broadwell) | Gen 8 | 12 | Broadwell-U GT1 | 14nm | 800~850 | 153.6~163.2 |
HD Graphics 5300 | Gen 8 | 24 | Broadwell-Y GT2 | 14nm | 800~850 | 307.2~326.4 |
HD Graphics 5500 | Gen 8 | 23 | Broadwell-U GT2 | 14nm | 850~900 | 312.8~331.2 |
HD Graphics 5500 | Gen 8 | 24 | Broadwell-U GT2 | 14nm | 900~950 | 345.6~364.8 |
HD Graphics 5600, HD Graphics P5700 | Gen 8 | 24 | Broadwell-U GT2 | 14nm | 1000~1050 | 384~403.2 |
HD Graphics 6000 | Gen 8 | 48 | Broadwell-U GT3 | 14nm | 950~1000 | 729.6~768 |
Iris Graphics 6100 | Gen 8 | 48 | Broadwell-U GT3 | 14nm | 1050~1100 | 806.4~844.8 |
Iris Pro Graphics 6200, Iris Pro Graphics P6300 (with 128MB eDRAM) | Gen 8 | 48 | Broadwell GT3e | 14nm | 1000~1150 | 768~883.2 |
HD Graphics 500 | Gen 9 | 12 | Apollo Lake | 14nm | 650~750 | 124.8~144 |
HD Graphics 505 | Gen 9 | 18 | Apollo Lake | 14nm | 750~800 | 216~230.4 |
HD Graphics 510 | Gen 9 | 12 | Skylake GT1 | 14nm | 900~1000 | 172.8~192 |
HD Graphics 515 | Gen 9 | 24 | Skylake-Y GT2 | 14nm | 800~1000 | 307.2~384 |
HD Graphics 520 | Gen 9 | 24 | Skylake-U GT2 | 14nm | 1000~1050 | 384~403.2 |
HD Graphics 530, HD Graphics P530 | Gen 9 | 24 | Skylake GT2 | 14nm | 900~1150 | 345.6~441.6 |
Iris Graphics 540, Iris Graphics 550 (with 64MB eDRAM) | Gen 9 | 48 | Skylake GT3e | 14nm | 950~1100 | 729.6~844.8 |
Iris Pro Graphics 580, Iris Pro Graphics P580 (with 128MB eDRAM) | Gen 9 | 72 | Skylake GT4e | 14nm | 1000 | 1152 |
HD Graphics 610 | Gen 9+ | 12 | Kaby Lake GT1 | 14nm | 900~1050 | 172.8~201.6 |
HD Graphics 615 | Gen 9+ | 24 | Kaby Lake-Y GT2 | 14nm | 850~1050 | 326.4~403.2 |
HD Graphics 620 | Gen 9+ | 24 | Kaby Lake-U GT2 | 14nm | 1000~1150 | 384~441.6 |
HD Graphics 630, HD Graphics P630 | Gen 9+ | 24 | Kaby Lake GT2 | 14nm | 950~1150 | 364.8~441.6 |
Iris Plus Graphics 640, Iris Plus Graphics 650 (with 64MB eDRAM) | Gen 9+ | 48 | Kaby Lake GT3e | 14nm | 950~1150 | 729.6~883.2 |
GPU Name | Card | Cores | Clock (MHz) | DDR | Bus (bit) | GFlops |
GK110 | GTX Titan | 2688 | 837~876 | GDDR5 | 384 | 4500 |
GK104 | GTX 680 | 1536 | 1006~1110 | GDDR5 | 256 | 3250 |
GK104 | GTX 670 | 1344 | 915~1084 | GDDR5 | 256 | 2760 |
GK104 | GTX 660Ti | 1344 | 915~1058 | GDDR5 | 192 | 2460 |
GK106 | GTX 660 | 960 | 980~1032 | GDDR5 | 192 | 1881.6 |
GK106 | GTX 650Ti Boost | 768 | 980~1032 | GDDR5 | 192 | 1505.2 |
GK106 | GTX 650Ti | 768 | 928 | GDDR5 | 128 | 1425.4 |
GK107 | GTX 650 | 384 | 1058 | GDDR5 | 128 | 812.5 |
GK107 | GT 640 | 384 | 900 | DDR3 | 128 | 691.2 |
GPU Name | Card | Cores | Clock (MHz) | DDR | Bus (bit) | GFlops |
Tahiti XT2 | HD 7970 GHZ | 2048 | 1000~1050 | GDDR5 | 384 | 4096~4300 |
Tahiti XT | HD 7970 | 2048 | 925 | GDDR5 | 384 | 3788.8 |
Tahiti Pro | HD 7950 Boost | 1792 | 850~925 | GDDR5 | 384 | 3046.4~3315.2 |
Tahiti Pro | HD 7950 | 1792 | 800 | GDDR5 | 384 | 2867.2 |
Tahiti LE | HD 7870 XT | 1536 | 925~975 | GDDR5 | 256 | 2841.6~2995.2 |
Pitcairn XT | HD 7870 GHZ | 1280 | 1000 | GDDR5 | 256 | 2560 |
Pitcairn Pro | HD 7850 | 1024 | 860 | GDDR5 | 256 | 1761.28 |
Bonaire XT | HD 7790 | 896 | 1000 | GDDR5 | 128 | 1792 |
Cape Verde XT | HD 7770 GHZ ver.2 | 640 | 1100 | GDDR5 | 128 | 1408 |
Cape Verde XT | HD 7770 GHZ | 640 | 1000 | GDDR5 | 128 | 1280 |
Cape Verde Pro | HD 7750 ver.2 | 512 | 900 | GDDR5 | 128 | 921.6 |
Cape Verde Pro | HD 7750 | 512 | 800 | GDDR5 | 128 | 819.2 |
More AMD Radeon Information in wiki
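The GFlops column of the AMD desktop table is consistent with a simple peak estimate: shader cores times clock times two floating-point operations per clock (one fused multiply-add). That relationship is inferred from the numbers, not stated by the source, and the factor of two does not necessarily carry over to the mobile GPU tables, whose cores are counted differently. A small sketch with a hypothetical helper name:

```python
def peak_gflops(cores, clock_mhz, flops_per_core_per_clock=2):
    """Theoretical peak in GFlops, assuming every core retires one FMA per clock."""
    return cores * clock_mhz * flops_per_core_per_clock / 1000.0


hd_7970 = peak_gflops(2048, 925)  # table lists 3788.8
hd_7850 = peak_gflops(1024, 860)  # table lists 1761.28
hd_7790 = peak_gflops(896, 1000)  # table lists 1792
```

Some console rows in the first table are rounded or use slightly different clocks, so they only match this estimate approximately.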
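Snapdragon 820
Samsung 14nm FinFET process
2.2 GHz quad-core Kryo architecture (in-house custom design)
GPU: Adreno 530 @ 510~624 MHz, GPU floating-point performance 407.4~498.5 GFlops
Memory bandwidth: 28.8 GB/s (dual-channel LPDDR4)
Exynos 8890
Samsung 14nm FinFET process
1.6 GHz quad-core ARM Cortex-A53 plus 2.3~2.6 GHz quad-core Exynos M1 architecture (in-house custom design)
GPU: Mali-T880 MP12 (12 cores) @ 650 MHz, GPU floating-point performance 265.2 GFlops
Memory bandwidth: 28.7 GB/s (dual-channel LPDDR4)
Helio X20
TSMC 20nm process
Ten cores: 2x Cortex-A72 @ 2.5 GHz plus 4x Cortex-A53 @ 2.0 GHz plus 4x Cortex-A53 @ 1.4 GHz
GPU: Mali-T880 MP4 @ 780 MHz, GPU floating-point performance 106 GFlops
Memory bandwidth: 14.9 GB/s (dual-channel LPDDR3)
Kirin 950
TSMC 16nm process
Eight cores: 4x ARM Cortex-A72 @ 2.3 GHz plus 4x ARM Cortex-A53 @ 1.8 GHz
GPU: Mali-T880 MP4 @ 900 MHz, GPU floating-point performance 122.4 GFlops
Memory bandwidth: 25.6 GB/s (dual-channel LPDDR4)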