
BUG=None R=laforge@chromium.org,binji@chromium.org,sbc@chromium.org,rockot@chromium.org Review-Url: https://codereview.chromium.org/2875303003 Cr-Commit-Position: refs/heads/master@{#475662}
{{+bindTo:partials.standard_nacl_article}}
<b><font color="#cc0000">
NOTE:
Deprecation of the technologies described here has been announced
for platforms other than ChromeOS.<br/>
Please visit our
<a href="/native-client/migration">migration guide</a>
for details.
</font></b>
<hr/><section id="pnacl-c-c-language-support">
|
||
<h1 id="pnacl-c-c-language-support">PNaCl C/C++ Language Support</h1>
|
||
<div class="contents local" id="contents" style="display: none">
|
||
<ul class="small-gap">
|
||
<li><p class="first"><a class="reference internal" href="#source-language-support" id="id3">Source language support</a></p>
|
||
<ul class="small-gap">
|
||
<li><a class="reference internal" href="#versions" id="id4">Versions</a></li>
|
||
<li><a class="reference internal" href="#preprocessor-definitions" id="id5">Preprocessor definitions</a></li>
|
||
</ul>
|
||
</li>
|
||
<li><p class="first"><a class="reference internal" href="#memory-model-and-atomics" id="id6">Memory Model and Atomics</a></p>
|
||
<ul class="small-gap">
|
||
<li><a class="reference internal" href="#memory-model-for-concurrent-operations" id="id7">Memory Model for Concurrent Operations</a></li>
|
||
<li><a class="reference internal" href="#atomic-memory-ordering-constraints" id="id8">Atomic Memory Ordering Constraints</a></li>
|
||
<li><a class="reference internal" href="#volatile-memory-accesses" id="id9">Volatile Memory Accesses</a></li>
|
||
</ul>
|
||
</li>
|
||
<li><a class="reference internal" href="#threading" id="id10">Threading</a></li>
|
||
<li><a class="reference internal" href="#setjmp-and-longjmp" id="id11"><code>setjmp</code> and <code>longjmp</code></a></li>
|
||
<li><a class="reference internal" href="#c-exception-handling" id="id12">C++ Exception Handling</a></li>
|
||
<li><a class="reference internal" href="#inline-assembly" id="id13">Inline Assembly</a></li>
|
||
<li><p class="first"><a class="reference internal" href="#portable-simd-vectors" id="id14">Portable SIMD Vectors</a></p>
|
||
<ul class="small-gap">
|
||
<li><a class="reference internal" href="#hand-coding-vector-extensions" id="id15">Hand-Coding Vector Extensions</a></li>
|
||
<li><a class="reference internal" href="#auto-vectorization" id="id16">Auto-Vectorization</a></li>
|
||
</ul>
|
||
</li>
|
||
<li><a class="reference internal" href="#undefined-behavior" id="id17">Undefined Behavior</a></li>
|
||
<li><a class="reference internal" href="#floating-point" id="id18">Floating-Point</a></li>
|
||
<li><a class="reference internal" href="#computed-goto" id="id19">Computed <code>goto</code></a></li>
|
||
<li><p class="first"><a class="reference internal" href="#future-directions" id="id20">Future Directions</a></p>
|
||
<ul class="small-gap">
|
||
<li><a class="reference internal" href="#inter-process-communication" id="id21">Inter-Process Communication</a></li>
|
||
<li><a class="reference internal" href="#posix-style-signal-handling" id="id22">POSIX-style Signal Handling</a></li>
|
||
</ul>
|
||
</li>
|
||
</ul>
|
||
|
||
</div><h2 id="source-language-support">Source language support</h2>
|
||
<p>The currently supported languages are C and C++. The PNaCl toolchain is
|
||
based on a recent version of Clang, which fully supports C++11 and most of C11. A
|
||
detailed status of the language support is available <a class="reference external" href="http://clang.llvm.org/cxx_status.html">here</a>.</p>
|
||
<p>For information on using languages other than C/C++, see the <a class="reference internal" href="/native-client/faq.html#other-languages"><em>FAQ
|
||
section on other languages</em></a>.</p>
|
||
<p>As for the standard libraries, the PNaCl toolchain is currently based on
|
||
<code>libc++</code>, and the <code>newlib</code> standard C library. <code>libstdc++</code> is also
|
||
supported but its use is discouraged; see <a class="reference internal" href="/native-client/devguide/devcycle/building.html#building-cpp-libraries"><em>C++ standard libraries</em></a>
|
||
for more details.</p>
|
||
<h3 id="versions">Versions</h3>
|
||
<p>Version information can be obtained as follows:</p>
|
||
<ul class="small-gap">
|
||
<li>Clang/LLVM: run <code>pnacl-clang -v</code>.</li>
|
||
<li><code>newlib</code>: use the <code>_NEWLIB_VERSION</code> macro.</li>
|
||
<li><code>libc++</code>: use the <code>_LIBCPP_VERSION</code> macro.</li>
|
||
<li><code>libstdc++</code>: use the <code>_GLIBCXX_VERSION</code> macro.</li>
|
||
</ul>
|
||
<h3 id="preprocessor-definitions">Preprocessor definitions</h3>
|
||
<p>When compiling C/C++ code, the PNaCl toolchain defines the <code>__pnacl__</code>
|
||
macro. In addition, <code>__native_client__</code> is defined for compatibility
|
||
with other NaCl toolchains.</p>
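<p>As a minimal sketch of how these macros might be used, the function
below selects a label at compile time. Only <code>__pnacl__</code> and
<code>__native_client__</code> come from the text above; the function name
<code>toolchain_name</code> and the returned labels are illustrative
assumptions.</p>

```cpp
#include <string>

// Returns a label for the toolchain that compiled this translation unit.
// toolchain_name and its labels are hypothetical, chosen for illustration.
std::string toolchain_name() {
#if defined(__pnacl__)
  // PNaCl defines __pnacl__ (and __native_client__ for compatibility),
  // so this branch must be tested first.
  return "pnacl";
#elif defined(__native_client__)
  // Another NaCl toolchain.
  return "nacl";
#else
  // Not a Native Client build.
  return "other";
#endif
}
```

<p>Testing <code>__pnacl__</code> before <code>__native_client__</code>
matters because a PNaCl build defines both.</p>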
|
||
<h2 id="memory-model-and-atomics"><span id="id1"></span>Memory Model and Atomics</h2>
|
||
<h3 id="memory-model-for-concurrent-operations">Memory Model for Concurrent Operations</h3>
|
||
<p>The memory model offered by PNaCl relies on the same coding guidelines
|
||
as the C11/C++11 one: concurrent accesses must always occur through
|
||
atomic primitives (offered by <a class="reference internal" href="/native-client/reference/pnacl-bitcode-abi.html#bitcode-atomicintrinsics"><em>atomic intrinsics</em></a>), and these accesses must always
|
||
occur with the same size for the same memory location. Visibility of
|
||
stores is provided on a happens-before basis that relates memory
|
||
locations to each other as the C11/C++11 standards do.</p>
|
||
<p>Non-atomic memory accesses may be reordered, separated, elided or fused
|
||
according to C and C++’s memory model before the pexe is created as well
|
||
as after its creation. Accessing an atomic memory location through
non-atomic primitives is <a class="reference internal" href="/native-client/reference/pnacl-undefined-behavior.html#undefined-behavior"><em>Undefined Behavior</em></a>.</p>
|
||
<p>As in C11/C++11 some atomic accesses may be implemented with locks on
|
||
certain platforms. The <code>ATOMIC_*_LOCK_FREE</code> macros will always be
|
||
<code>1</code>, signifying that all types are sometimes lock-free. The
|
||
<code>is_lock_free</code> methods and <code>atomic_is_lock_free</code> report whether the
current platform’s implementation is lock-free, as determined at translation time. These macros,
|
||
methods and functions are in the C11 header <code><stdatomic.h></code> and the
|
||
C++11 header <code><atomic></code>.</p>
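<p>The difference between the compile-time macro and the run-time query can
be sketched as follows. The helper names are illustrative assumptions; the
macro and the <code>is_lock_free</code> method are the standard C++11
ones.</p>

```cpp
#include <atomic>

// Compile-time hint: 0 = never, 1 = sometimes, 2 = always lock-free.
// According to the text above, a PNaCl build always reports 1 here.
int int_lock_free_macro() {
  return ATOMIC_INT_LOCK_FREE;
}

// Definite answer for one specific object on the platform that runs the code.
bool counter_is_lock_free() {
  std::atomic<int> counter(0);
  return counter.is_lock_free();
}
```

<p>Code that must avoid lock-based atomics should therefore rely on the
query rather than on the macro.</p>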
|
||
<p>The PNaCl toolchain supports concurrent memory accesses through legacy
|
||
GCC-style <code>__sync_*</code> builtins, as well as through C11/C++11 atomic
|
||
primitives and the underlying <a class="reference external" href="http://gcc.gnu.org/wiki/Atomic/GCCMM">GCCMM</a> <code>__atomic_*</code>
|
||
primitives. <code>volatile</code> memory accesses can also be used, though these
|
||
are discouraged. See <a class="reference internal" href="#volatile-memory-accesses">Volatile Memory Accesses</a>.</p>
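<p>For illustration, the three supported styles can express the same atomic
addition. This is a sketch only; the wrapper names are assumptions, while
the builtins and <code>std::atomic</code> are the ones named above.</p>

```cpp
#include <atomic>

// Legacy GCC-style builtin: returns the value held before the addition.
int add_sync(int* p, int n) {
  return __sync_fetch_and_add(p, n);
}

// GCCMM builtin with an explicit memory ordering argument.
int add_gccmm(int* p, int n) {
  return __atomic_fetch_add(p, n, __ATOMIC_SEQ_CST);
}

// C++11 atomic type; fetch_add also returns the previous value.
int add_std(std::atomic<int>& a, int n) {
  return a.fetch_add(n, std::memory_order_seq_cst);
}
```

<p>New code would normally prefer the C11/C++11 form; the builtins are
mainly useful when porting existing code.</p>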
|
||
<p>PNaCl supports concurrency and parallelism with some restrictions:</p>
|
||
<ul class="small-gap">
|
||
<li>Threading is explicitly supported and has no restrictions beyond those of
prevalent implementations. See <a class="reference internal" href="#threading">Threading</a>.</li>
|
||
<li><code>volatile</code> and atomic operations are address-free (operations on the
|
||
same memory location via two different addresses work atomically), as
|
||
intended by the C11/C++11 standards. This is critical in supporting
|
||
synchronous “external modifications” such as mapping underlying memory
|
||
at multiple locations.</li>
|
||
<li>Inter-process communication through shared memory is currently not
|
||
supported. See <a class="reference internal" href="#future-directions">Future Directions</a>.</li>
|
||
<li>Signal handling isn’t supported, so PNaCl promotes all
|
||
primitives to cross-thread (instead of single-thread). This may change
|
||
at a later date. Note that using atomic operations which aren’t
|
||
lock-free may lead to deadlocks when handling asynchronous
|
||
signals. See <a class="reference internal" href="#future-directions">Future Directions</a>.</li>
|
||
<li>Direct interaction with device memory isn’t supported, and there is no
|
||
intent to support it. The embedding sandbox’s runtime can offer APIs
|
||
to indirectly access devices.</li>
|
||
</ul>
|
||
<p>Setting up the above mechanisms requires assistance from the embedding
|
||
sandbox’s runtime (e.g. NaCl’s Pepper APIs), but using them once set up
|
||
can be done through regular C/C++ code.</p>
|
||
<h3 id="atomic-memory-ordering-constraints">Atomic Memory Ordering Constraints</h3>
|
||
<p>Atomics follow the same ordering constraints as in regular C11/C++11,
|
||
but all accesses are promoted to sequential consistency (the strongest
|
||
memory ordering) at pexe creation time. We plan to support more of the
|
||
C11/C++11 memory orderings in the future.</p>
|
||
<p>Some additional restrictions, following the C11/C++11 standards:</p>
|
||
<ul class="small-gap">
|
||
<li>Atomic accesses must at least be naturally aligned.</li>
|
||
<li>Some accesses may not actually be atomic on certain platforms,
|
||
requiring an implementation that uses global locks.</li>
|
||
<li>An atomic memory location must always be accessed with atomic
|
||
primitives, and these primitives must always be of the same bit size
|
||
for that location.</li>
|
||
<li>Not all memory orderings are valid for all atomic operations.</li>
|
||
</ul>
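<p>A small sketch of what these constraints look like in source code: the
example requests release and acquire orderings, which are valid C++11.
According to the text above, PNaCl would promote both to sequential
consistency at pexe creation time, which is stronger and therefore still
correct. The <code>Mailbox</code> type and function names are illustrative
assumptions.</p>

```cpp
#include <atomic>

// Hypothetical single-slot mailbox: `payload` is plain data that is
// published to other threads through the atomic flag `ready`.
struct Mailbox {
  int payload = 0;
  std::atomic<bool> ready{false};
};

// Writer side. A store may use relaxed, release or seq_cst ordering;
// acquire is not a valid ordering for a store.
void publish(Mailbox& box, int value) {
  box.payload = value;
  box.ready.store(true, std::memory_order_release);
}

// Reader side. A load may use relaxed, consume, acquire or seq_cst ordering;
// release is not a valid ordering for a load.
bool try_consume(Mailbox& box, int* out) {
  if (!box.ready.load(std::memory_order_acquire)) return false;
  *out = box.payload;  // Ordered after the acquire load of `ready`.
  return true;
}
```

<p>The flag is only ever accessed through atomic operations of one size,
as the list above requires.</p>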
|
||
<h3 id="volatile-memory-accesses">Volatile Memory Accesses</h3>
|
||
<p>The C11/C++11 standards mandate that <code>volatile</code> accesses execute in
|
||
program order (but are not fences, so other memory operations can
|
||
reorder around them), are not necessarily atomic, and can’t be
|
||
elided. They can be separated into smaller width accesses.</p>
|
||
<p>Before any optimizations occur, the PNaCl toolchain transforms
|
||
<code>volatile</code> loads and stores into sequentially consistent <code>volatile</code>
|
||
atomic loads and stores, and applies regular compiler optimizations
|
||
along the above guidelines. This orders <code>volatiles</code> according to the
|
||
atomic rules, and means that fences (including <code>__sync_synchronize</code>)
|
||
act in a better-defined manner. Regular memory accesses still do not
|
||
have ordering guarantees with <code>volatile</code> and atomic accesses, though
|
||
the internal representation of <code>__sync_synchronize</code> attempts to
|
||
prevent reordering of memory accesses to objects which may escape.</p>
|
||
<p>Relaxed ordering could be used instead, but for the first release it is
|
||
more conservative to apply sequential consistency. Future releases may
|
||
change what happens at compile-time, but already-released pexes will
|
||
continue using sequential consistency.</p>
|
||
<p>The PNaCl toolchain also requires that <code>volatile</code> accesses be at least
|
||
naturally aligned, and tries to guarantee this alignment.</p>
|
||
<p>The above guarantees ease the support of legacy (i.e. non-C11/C++11)
|
||
code, and combined with builtin fences these programs can do meaningful
|
||
cross-thread communication without changing code. They also better
|
||
reflect the original code’s intent and guarantee better portability.</p>
|
||
<h2 id="threading"><span id="language-support-threading"></span>Threading</h2>
|
||
<p>Threading is explicitly supported through C11/C++11’s threading
|
||
libraries as well as POSIX threads.</p>
|
||
<p>Communication between threads should use atomic primitives as described
|
||
in <a class="reference internal" href="#id1">Memory Model and Atomics</a>.</p>
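<p>As a sketch of the recommended pattern, the function below starts
several C++11 threads that communicate only through an atomic counter. The
function name and its parameters are illustrative assumptions.</p>

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Starts num_threads workers that each add 1 to a shared counter
// increments_per_thread times, then returns the final count.
int count_with_threads(int num_threads, int increments_per_thread) {
  std::atomic<int> counter(0);
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([&counter, increments_per_thread] {
      for (int i = 0; i < increments_per_thread; ++i)
        counter.fetch_add(1);  // Atomic, so no increments are lost.
    });
  }
  for (std::thread& w : workers) w.join();
  return counter.load();
}
```

<p>With a plain <code>int</code> counter the same code would contain a data
race, which is undefined behavior.</p>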
|
||
<h2 id="setjmp-and-longjmp"><code>setjmp</code> and <code>longjmp</code></h2>
|
||
<p>PNaCl and NaCl support <code>setjmp</code> and <code>longjmp</code> without any
|
||
restrictions beyond C’s.</p>
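<p>A minimal sketch of a non-local exit, written to stay within the usual
language rules: <code>setjmp</code> is used directly as a controlling
expression, and no objects with non-trivial destructors are skipped. The
globals and function names are illustrative assumptions.</p>

```cpp
#include <csetjmp>

// Hypothetical globals: the saved jump target and the last error code.
static std::jmp_buf g_escape;
static volatile int g_error;

// Records the error code and jumps back to the most recent setjmp.
static void fail_with(int code) {
  g_error = code;
  std::longjmp(g_escape, 1);
}

// Returns 0 on success, or the error code if fail_with was called.
int run_guarded(int code) {
  if (setjmp(g_escape) != 0) return g_error;  // Reached through longjmp.
  if (code != 0) fail_with(code);
  return 0;
}
```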
|
||
<h2 id="c-exception-handling"><span id="exception-handling"></span>C++ Exception Handling</h2>
|
||
<p>PNaCl currently supports C++ exception handling through <code>setjmp()</code> and
|
||
<code>longjmp()</code>, which can be enabled with the <code>--pnacl-exceptions=sjlj</code> linker
|
||
flag (set with <code>LDFLAGS</code> when using Make). Exceptions are disabled by default
|
||
so that faster and smaller code is generated, and <code>throw</code> statements are
|
||
replaced with calls to <code>abort()</code>. The usual <code>-fno-exceptions</code> flag is also
|
||
supported, though the default is <code>-fexceptions</code>. PNaCl will support full
|
||
zero-cost exception handling in the future.</p>
|
||
<aside>
|
||
When using <a class="reference external" href="https://chromium.googlesource.com/webports">webports</a> or other prebuilt static libraries, you don’t
|
||
need to recompile because the exception handling support is
|
||
implemented at link time (when all the static libraries are put
|
||
together with your application).
|
||
</aside>
|
||
<p>NaCl supports full zero-cost C++ exception handling.</p>
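<p>For illustration, the ordinary C++ code below depends on exceptions
being enabled. Following the description above, a PNaCl build linked
without <code>--pnacl-exceptions=sjlj</code> would replace the
<code>throw</code> with a call to <code>abort()</code> instead of reaching
the handler. The function names are illustrative assumptions.</p>

```cpp
#include <stdexcept>

// Converts a decimal digit character to its value, throwing on bad input.
int parse_digit(char c) {
  if (c < '0' || c > '9') throw std::invalid_argument("not a digit");
  return c - '0';
}

// Same conversion, but returns `fallback` instead of propagating the error.
int parse_digit_or(char c, int fallback) {
  try {
    return parse_digit(c);
  } catch (const std::invalid_argument&) {
    return fallback;
  }
}
```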
|
||
<h2 id="inline-assembly">Inline Assembly</h2>
|
||
<p>Inline assembly isn’t supported by PNaCl because it isn’t portable. The
|
||
one current exception is the common compiler barrier idiom
|
||
<code>asm("":::"memory")</code>, which gets transformed to a sequentially
|
||
consistent memory barrier (equivalent to <code>__sync_synchronize()</code>). In
|
||
PNaCl this barrier is only guaranteed to order <code>volatile</code> and atomic
|
||
memory accesses, though in practice the implementation attempts to also
|
||
prevent reordering of memory accesses to objects which may escape.</p>
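<p>The barrier idiom can be sketched as follows. Both variables are
declared <code>volatile</code> because, as noted above, those are the
accesses the barrier is guaranteed to order under PNaCl. All names are
illustrative assumptions.</p>

```cpp
// Hypothetical legacy-style shared state.
static volatile int g_data;
static volatile int g_ready;

// Writes the data, issues the compiler barrier idiom, then sets the flag.
void publish_legacy(int value) {
  g_data = value;
  asm("" ::: "memory");  // Treated by PNaCl like __sync_synchronize().
  g_ready = 1;
}

// Accessors so that callers do not touch the globals directly.
bool legacy_ready() { return g_ready != 0; }
int legacy_data() { return g_data; }
```

<p>New code would usually express the same intent with C11/C++11 atomics
instead.</p>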
|
||
<p>PNaCl supports <a class="reference internal" href="#portable-simd-vectors"><em>Portable SIMD Vectors</em></a>,
|
||
which are traditionally expressed through target-specific intrinsics or
|
||
inline assembly.</p>
|
||
<p>NaCl supports a fairly wide subset of inline assembly through GCC’s
|
||
inline assembly syntax, with the restriction that the sandboxing model
|
||
for the target architecture has to be respected.</p>
|
||
<h2 id="portable-simd-vectors"><span id="id2"></span>Portable SIMD Vectors</h2>
|
||
<p>SIMD vectors aren’t part of the C/C++ standards and are traditionally
|
||
very hardware-specific. Portable Native Client offers a portable version
|
||
of SIMD vector datatypes and operations which map well to modern
|
||
architectures and offer performance which matches or approaches
|
||
hardware-specific uses.</p>
|
||
<p>SIMD vector support was added to Portable Native Client for version 37 of Chrome,
and more features, including performance enhancements, have been added in
subsequent releases; see the <a class="reference internal" href="/native-client/sdk/release-notes.html#sdk-release-notes"><em>Release Notes</em></a> for more
|
||
details.</p>
|
||
<h3 id="hand-coding-vector-extensions">Hand-Coding Vector Extensions</h3>
|
||
<p>The initial vector support in Portable Native Client adds <a class="reference external" href="http://clang.llvm.org/docs/LanguageExtensions.html#vectors-and-extended-vectors">LLVM vectors</a>
|
||
and <a class="reference external" href="http://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html">GCC vectors</a> since these
|
||
are well supported by different hardware platforms and don’t require any
|
||
new compiler intrinsics.</p>
|
||
<p>Vector types can be used through the <code>vector_size</code> attribute:</p>
|
||
<pre class="prettyprint">
#define VECTOR_BYTES 16
typedef int v4s __attribute__((vector_size(VECTOR_BYTES)));
v4s a = {1,2,3,4};
v4s b = {5,6,7,8};
v4s c, d, e;
c = a + b;  /* c = {6,8,10,12} */
d = b >> a; /* d = {2,1,0,0} */
</pre>
|
||
<p>Vector comparisons are represented as a bitmask as wide as the compared
|
||
elements of all <code>0</code> or all <code>1</code>:</p>
|
||
<pre class="prettyprint">
typedef int v4s __attribute__((vector_size(16)));
v4s snip(v4s in) {
  v4s limit = {32,64,128,256};
  v4s mask = in > limit;
  v4s ret = in & mask;
  return ret;
}
</pre>
|
||
<p>Vector datatypes are currently expected to be 128 bits wide with one of the
following element types, and they’re expected to be aligned to the underlying
element’s bit width (loads and stores will otherwise be broken up into scalar
accesses to prevent faults):</p>
|
||
<table border="1" class="docutils">
<colgroup>
</colgroup>
<thead valign="bottom">
<tr class="row-odd"><th class="head">Type</th><th class="head">Num Elements</th><th class="head">Vector Bit Width</th><th class="head">Expected Bit Alignment</th></tr>
</thead>
<tbody valign="top">
<tr class="row-even"><td><code>uint8_t</code></td><td>16</td><td>128</td><td>8</td></tr>
<tr class="row-odd"><td><code>int8_t</code></td><td>16</td><td>128</td><td>8</td></tr>
<tr class="row-even"><td><code>uint16_t</code></td><td>8</td><td>128</td><td>16</td></tr>
<tr class="row-odd"><td><code>int16_t</code></td><td>8</td><td>128</td><td>16</td></tr>
<tr class="row-even"><td><code>uint32_t</code></td><td>4</td><td>128</td><td>32</td></tr>
<tr class="row-odd"><td><code>int32_t</code></td><td>4</td><td>128</td><td>32</td></tr>
<tr class="row-even"><td><code>float</code></td><td>4</td><td>128</td><td>32</td></tr>
</tbody>
</table>
|
||
<p>64-bit integers and double-precision floating point will be supported in
|
||
a future release, as will 256-bit and 512-bit vectors.</p>
|
||
<p>Vector element bit width alignment can be stated explicitly (this is assumed by
|
||
PNaCl, but not necessarily by other compilers), and smaller alignments can also
|
||
be specified:</p>
|
||
<pre class="prettyprint">
typedef int v4s_element   __attribute__((vector_size(16), aligned(4)));
typedef int v4s_unaligned __attribute__((vector_size(16), aligned(1)));
</pre>
|
||
<p>The following operators are supported on vectors:</p>
|
||
<table border="1" class="docutils">
|
||
<colgroup>
|
||
</colgroup>
|
||
<tbody valign="top">
|
||
<tr class="row-odd"><td>unary <code>+</code>, <code>-</code></td>
|
||
</tr>
|
||
<tr class="row-even"><td><code>++</code>, <code>--</code></td>
|
||
</tr>
|
||
<tr class="row-odd"><td><code>+</code>, <code>-</code>, <code>*</code>, <code>/</code>, <code>%</code></td>
|
||
</tr>
|
||
<tr class="row-even"><td><code>&</code>, <code>|</code>, <code>^</code>, <code>~</code></td>
|
||
</tr>
|
||
<tr class="row-odd"><td><code>>></code>, <code><<</code></td>
|
||
</tr>
|
||
<tr class="row-even"><td><code>!</code>, <code>&&</code>, <code>||</code></td>
|
||
</tr>
|
||
<tr class="row-odd"><td><code>==</code>, <code>!=</code>, <code>></code>, <code><</code>, <code>>=</code>, <code><=</code></td>
|
||
</tr>
|
||
<tr class="row-even"><td><code>=</code></td>
|
||
</tr>
|
||
</tbody>
|
||
</table>
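<p>As a sketch of how these operators combine, an element-wise maximum can
be built from a comparison mask and bitwise operators alone. The function
name <code>vmax</code> is an illustrative assumption.</p>

```cpp
typedef int v4s __attribute__((vector_size(16)));

// Element-wise maximum of two 4-element integer vectors.
v4s vmax(v4s a, v4s b) {
  v4s mask = a > b;                 // Lane is all ones where a > b, else 0.
  return (a & mask) | (b & ~mask);  // Take a where the mask is set, else b.
}
```

<p>The same select pattern works for any condition that can be expressed
as a vector comparison.</p>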
|
||
<p>C-style casts can be used to convert one vector type to another without
|
||
modifying the underlying bits. <code>__builtin_convertvector</code> can be used
|
||
to convert from one type to another provided both types have the same
|
||
number of elements, truncating when converting from floating-point to
|
||
integer.</p>
|
||
<pre class="prettyprint">
typedef unsigned v4u __attribute__((vector_size(16)));
typedef float v4f __attribute__((vector_size(16)));
v4u a = {0x3f19999a,0x40000000,0x40490fdb,0x66ff0c30};
v4f b = (v4f) a; /* b = {0.6,2,3.14159,6.02214e+23} */
v4u c = __builtin_convertvector(b, v4u); /* c = {0,2,3,0} */
</pre>
|
||
<p>It is also possible to use array-style indexing into vectors to extract
|
||
individual elements using <code>[]</code>.</p>
|
||
<pre class="prettyprint">
#include <cstddef>
#include <iostream>

typedef unsigned v4u __attribute__((vector_size(16)));

// Prints every element of a vector, separated by spaces.
template<typename T>
void print(const T v) {
  for (std::size_t i = 0; i != sizeof(v) / sizeof(v[0]); ++i)
    std::cout << v[i] << ' ';
  std::cout << std::endl;
}
</pre>
|
||
<p>Vector shuffle operations (often called permutations or swizzles) are
|
||
supported through <code>__builtin_shufflevector</code>. The builtin has two
|
||
vector arguments of the same element type, followed by a list of
|
||
constant integers that specify the element indices of the first two
|
||
vectors that should be extracted and returned in a new vector. These
|
||
element indices are numbered sequentially starting with the first
|
||
vector, continuing into the second vector. Thus, if <code>vec1</code> is a
|
||
4-element vector, index <code>5</code> would refer to the second element of
|
||
<code>vec2</code>. An index of <code>-1</code> can be used to indicate that the
|
||
corresponding element in the returned vector is a don’t care and can be
|
||
optimized by the backend.</p>
|
||
<p>The result of <code>__builtin_shufflevector</code> is a vector with the same
|
||
element type as <code>vec1</code> / <code>vec2</code> but that has an element count equal
|
||
to the number of indices specified.</p>
|
||
<pre class="prettyprint">
// identity operation - return 4-element vector v1.
__builtin_shufflevector(v1, v1, 0, 1, 2, 3)

// "Splat" element 0 of v1 into a 4-element result.
__builtin_shufflevector(v1, v1, 0, 0, 0, 0)

// Reverse 4-element vector v1.
__builtin_shufflevector(v1, v1, 3, 2, 1, 0)

// Concatenate every other element of 4-element vectors v1 and v2.
__builtin_shufflevector(v1, v2, 0, 2, 4, 6)

// Concatenate every other element of 8-element vectors v1 and v2.
__builtin_shufflevector(v1, v2, 0, 2, 4, 6, 8, 10, 12, 14)

// Shuffle v1 with some elements being undefined
__builtin_shufflevector(v1, v1, 3, -1, 1, -1)
</pre>
|
||
<p>One common use of <code>__builtin_shufflevector</code> is to perform
|
||
vector-scalar operations:</p>
|
||
<pre class="prettyprint">
typedef int v4s __attribute__((vector_size(16)));
v4s shift_right_by(v4s shift_me, int shift_amount) {
  v4s tmp = {shift_amount};
  /* Splat element 0 of tmp into all four lanes, then shift lane by lane. */
  return shift_me >> __builtin_shufflevector(tmp, tmp, 0, 0, 0, 0);
}
</pre>
|
||
<h3 id="auto-vectorization">Auto-Vectorization</h3>
|
||
<p>Auto-vectorization is currently not enabled for Portable Native Client,
|
||
but will be in a future release.</p>
|
||
<h2 id="undefined-behavior">Undefined Behavior</h2>
|
||
<p>The C and C++ languages expose some undefined behavior which is
|
||
discussed in <a class="reference internal" href="/native-client/reference/pnacl-undefined-behavior.html#undefined-behavior"><em>PNaCl Undefined Behavior</em></a>.</p>
|
||
<h2 id="floating-point"><span id="c-cpp-floating-point"></span>Floating-Point</h2>
|
||
<p>PNaCl exposes 32-bit and 64-bit floating point operations which are
|
||
mostly IEEE-754 compliant. There are a few caveats:</p>
|
||
<ul class="small-gap">
|
||
<li>Some <a class="reference internal" href="/native-client/reference/pnacl-undefined-behavior.html#undefined-behavior-fp"><em>floating-point behavior is currently left as undefined</em></a>.</li>
|
||
<li>The default rounding mode is round-to-nearest and other rounding modes
|
||
are currently not usable, which isn’t IEEE-754 compliant. PNaCl could
|
||
support switching modes (the 4 modes exposed by C99 <code>FLT_ROUNDS</code>
|
||
macros).</li>
|
||
<li>Signaling <code>NaN</code>s never fault.</li>
|
||
<li><p class="first">Fast-math optimizations are currently supported before <em>pexe</em> creation
|
||
time. A <em>pexe</em> loses all fast-math information when it is
|
||
created. Fast-math translation could be enabled at a later date,
|
||
potentially at a per-function granularity. This wouldn’t affect
already-existing <em>pexes</em>; it would be an opt-in feature.</p>
|
||
<ul class="small-gap">
|
||
<li>Fused multiply-add operations have higher precision and often execute faster;
|
||
PNaCl currently disallows them in the <em>pexe</em> because they aren’t
|
||
supported on all platforms and can’t realistically be
|
||
emulated. PNaCl could (but currently doesn’t) only generate them in
|
||
the backend if fast-math were specified and the hardware supports
|
||
the operation.</li>
|
||
<li>Transcendentals aren’t exposed by PNaCl’s ABI; they are part of the
|
||
math library that is included in the <em>pexe</em>. PNaCl could, but
|
||
currently doesn’t, use hardware support if fast-math were provided
|
||
in the <em>pexe</em>.</li>
|
||
</ul>
|
||
</li>
|
||
</ul>
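<p>The rounding-mode caveat can be observed from standard C++. The sketch
below queries <code>FLT_ROUNDS</code> and rounds in the current mode; the
helper names are illustrative assumptions.</p>

```cpp
#include <cfloat>
#include <cmath>

// FLT_ROUNDS evaluates to 1 when the rounding mode is round-to-nearest,
// which the text above describes as PNaCl's only usable mode.
bool rounds_to_nearest() {
  return FLT_ROUNDS == 1;
}

// Rounds to an integral value using the current rounding mode. Under
// round-to-nearest, halfway cases go to the even neighbor.
double round_current_mode(double x) {
  return std::nearbyint(x);
}
```

<p>In the default mode, <code>round_current_mode(2.5)</code> yields 2 and
<code>round_current_mode(3.5)</code> yields 4.</p>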
|
||
<h2 id="computed-goto">Computed <code>goto</code></h2>
|
||
<p>PNaCl supports computed <code>goto</code>, a non-standard GCC extension to C used
|
||
by some interpreters, by lowering it to <code>switch</code> statements. The
|
||
resulting use of <code>switch</code> might not be as fast as the original
|
||
indirect branches. If you are compiling a program that has a
|
||
compile-time option for using computed <code>goto</code>, it’s possible that the
|
||
program will run faster with the option turned off (e.g., if the program
|
||
does extra work to take advantage of computed <code>goto</code>).</p>
|
||
<p>NaCl supports computed <code>goto</code> without any transformation.</p>
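<p>The kind of compile-time option mentioned above might look like the
following sketch of a tiny bytecode interpreter. The opcode set, the
<code>USE_COMPUTED_GOTO</code> option and the choice to default it to off
under <code>__pnacl__</code> are all illustrative assumptions; whether the
<code>switch</code> path is actually faster for a given program would need
to be measured.</p>

```cpp
#include <cstddef>

// Hypothetical opcodes for a one-register machine.
enum Op : unsigned char { OP_INC, OP_DEC, OP_DOUBLE, OP_HALT };

// Hypothetical build option: prefer plain switch dispatch under PNaCl,
// where computed goto is lowered to a switch anyway.
#ifndef USE_COMPUTED_GOTO
#  ifdef __pnacl__
#    define USE_COMPUTED_GOTO 0
#  else
#    define USE_COMPUTED_GOTO 1
#  endif
#endif

// Executes `code` until OP_HALT and returns the accumulator.
int run(const unsigned char* code) {
  int acc = 0;
  std::size_t pc = 0;
#if USE_COMPUTED_GOTO
  // GCC extension: a table of label addresses indexed by opcode.
  static void* const dispatch[] = {&&do_inc, &&do_dec, &&do_double, &&do_halt};
#define NEXT() goto *dispatch[code[pc++]]
  NEXT();
do_inc:    ++acc;    NEXT();
do_dec:    --acc;    NEXT();
do_double: acc *= 2; NEXT();
do_halt:   return acc;
#undef NEXT
#else
  for (;;) {
    switch (code[pc++]) {
      case OP_INC:    ++acc;    break;
      case OP_DEC:    --acc;    break;
      case OP_DOUBLE: acc *= 2; break;
      case OP_HALT:   return acc;
    }
  }
#endif
}
```

<p>Both paths implement the same semantics, so the option can be flipped
per target without touching the rest of the program.</p>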
|
||
<h2 id="future-directions">Future Directions</h2>
|
||
<h3 id="inter-process-communication">Inter-Process Communication</h3>
|
||
<p>Inter-process communication through shared memory is currently not
|
||
supported by PNaCl/NaCl. When implemented, it may be limited to
|
||
operations which are lock-free on the current platform (<code>is_lock_free</code>
|
||
methods). It will rely on the address-free property discussed in <a class="reference internal" href="#memory-model-for-concurrent-operations">Memory
|
||
Model for Concurrent Operations</a>.</p>
|
||
<h3 id="posix-style-signal-handling">POSIX-style Signal Handling</h3>
|
||
<p>POSIX-style signal handling really consists of two different features:</p>
|
||
<ul class="small-gap">
|
||
<li><p class="first"><strong>Hardware exception handling</strong> (synchronous signals): The ability
|
||
to catch hardware exceptions (such as memory access faults and
|
||
division by zero) using a signal handler.</p>
|
||
<p>PNaCl currently doesn’t support hardware exception handling.</p>
|
||
<p>NaCl supports hardware exception handling via the
|
||
<code><nacl/nacl_exception.h></code> interface.</p>
|
||
</li>
|
||
<li><p class="first"><strong>Asynchronous interruption of threads</strong> (asynchronous signals): The
|
||
ability to asynchronously interrupt the execution of a thread,
|
||
forcing the thread to run a signal handler.</p>
|
||
<p>A similar feature is <strong>thread suspension</strong>: The ability to
|
||
asynchronously suspend and resume a thread and inspect or modify its
|
||
execution state (such as register state).</p>
|
||
<p>Neither PNaCl nor NaCl currently supports asynchronous interruption
|
||
or suspension of threads.</p>
|
||
</li>
|
||
</ul>
|
||
<p>If PNaCl were to support either of these, the interaction of
|
||
<code>volatile</code> and atomics with same-thread signal handling would need
|
||
to be carefully detailed.</p>
|
||
</section>
|
||
|
||
{{/partials.standard_nacl_article}}
|