* static PGO compatible with llvm18 and add CI job to test static PGO on coremark benchmark
* update comments and warning info, bitmaps section in llvm profdata shouldn't be used in PGO
LLVM 19 and later started to use srodata ("small read only data")
sections for RISCV. cf. https://github.com/llvm/llvm-project/pull/82214
this commit makes our aot emitter/loader deal with those sections.
an alternative would be to disable small data sections completely by
setting the "SmallDataLimit" module attribute to zero. however, i feel
this commit is more straightforward and consisitent as we are already
dealing with sdata sections.
while ideally a user should not need to care this kind of
optimization details, in reality i guess it's sometimes useful.
both of clang and GCC expose a similar option. (-fno-jump-tables)
for x86-64, llvm 17 and later sometimes uses "l" prefix
for data sections.
cf. 43249378da
because our aot file emitter/loader doesn't support such
sections, it ends up with load-time errors solving symbols like ".lrodata".
this commit fixes it by avoid placing data in the large data sections.
references:
https://groups.google.com/g/x86-64-abi/c/jnQdJeabxiU1feb00a28c
LLVM 18 and later, instcombine perfoms only one iteration.
it performs extra "verify fixpoint" operation when instcombine
is specified in certain ways, including how we do so here.
a problem is that the verification raises a fatal error when it
finds we didn't reach a fixpoint:
LLVM ERROR: Instruction Combining did not reach a fixpoint
after 1 iterations
while it should be rare, it's quite normal not to reach a fixpoint.
this commit fixes the issue by simply disabing the verification.
cf. 41895843b5
* Fix errors on the "i386-windows-msvc" platform
* Refactor symbol name handling for AOT COFF32 binary format
* Fix preprocessor directive placement for Windows compatibility in aot_reloc_x86_32.c
---------
Co-authored-by: liang.he@intel.com <liang.he@intel.com>
* Remove indirect-load for constants on Xtensa Target to improve performance
* Remove const intrinsics flags for xtensa instead of adding too much #ifdef
* Add AOT_INTRINSIC_FLAG_F32_CONST for xtensa frontend, because espressif xtensa llvm backend does not support float-point immediate for now
---------
Co-authored-by: zhanheng1 <Zhanheng.Qin@sony.com>
if using a debug building of wamrc to run spec test. there will be:
core/iwasm/compilation/aot_emit_aot_file.c:1794:13: runtime error: null pointer passed as argument 2, which is declared to never be null
By default, the project() CMake command defaults to C and C++. [1]
Therefore, CMake might perform tests for both C and C++ compilers as
part of the configuration phase.
However, this has the consequence of the configuration phase to fail if
the system does not have a C++ toolchain installed, even if C++ is not
really used by the top-level project under the default settings.
Some configurations might still require a C++ toolchain, so
enable_language is selectively called under such circumstances.
[1]: https://cmake.org/cmake/help/latest/command/project.html
- Clear some compile warnings
- Fix some typos
- Fix llvm LICENSE link error
- Remove unused aot file and binarydump bin
- Add checks when loading AOT exports
Add table64 extension(in Memory64 proposal) support in classic-interp
and AOT running modes, currently still use uint32 to represent table's
initial and maximum size to keep AOT ABI unchanged.
- Implement TINY / STANDARD frame modes - tiny mode is only able to keep track on the IP
and func idx, STANDARD mode provides more capabilities (parameters, stack pointer etc.).
- Implement FRAME_PER_FUNCTION / FRAME_PER_CALL modes - frame per function adds
code at the beginning and at the end of each function for allocating / deallocating stack frame,
whereas in per-call mode the frame is allocated before each call. The exception is call to
the imported function, where frame-per-function mode also allocates the stack before the
`call` instruction (as it can't instrument the imported function).
At the moment TINY + FRAME_PER_FUNCTION is automatically enabled in case GC and perf
profiling are disabled and `values` call stack feature is not requested. In all the other cases
STANDARD + FRAME_PER_CALL is used.
STANDARD + FRAME_PER_FUNCTION and TINY + FRAME_PER_CALL are currently not
implemented but possible, and might be enabled in the future.
ps. https://github.com/bytecodealliance/wasm-micro-runtime/issues/3758
In the AOT compiler, allow the user to control stack boundary check when the boundary
check is enabled (e.g. `wamrc --bounds-checks=1`). Now the code logic is:
1. When `--stack-bounds-checks` is not set, it will be the same value as `--bounds-checks`.
2. When `--stack-bounds-checks` is set, it will be the option value no matter what the
status of `--bounds-checks` is.
Make wamrc normalize "arm64" to "aarch64v8". Previously the only way to
make the "arm64" target was to not specify a target on 64 bit arm-based
mac builds. Now arm64 and aarch64v8 are treated as the same.
Make aot_loader accept "aarch64v8" on arm-based apple (as well as
accepting legacy "arm64" based aot targets).
This also removes __APPLE__ and __MACH__ from the block that defaults
size_level to 1 since it doesn't seem to be supported for aarch64:
`LLVM ERROR: Only small, tiny and large code models are allowed on AArch64`
Implement multi-memory for classic-interpreter. Support core spec (and bulk memory) opcodes now,
and will support atomic opcodes, and add multi-memory export APIs in the future.
PS: Multi-memory spec test patched a lot for linking test to adapt for multi-module implementation.
For JIT, we naturally use mach-o on macOS, where the section name
we currently use is not valid and ends up with the errors like:
```
LLVM ERROR: Global variable '__orc_lcl.aot_stack_sizes.0' has an invalid section specifier '.aot_stack_sizes': mach-o section specifier requires a segment and section separated by a comma.
```
Because the dedicated section is not necessary for JIT,
this commit simply stops using it.
Fixes: https://github.com/bytecodealliance/wasm-micro-runtime/issues/3730
The table index in the call_indirect/return_call_indirect opcode should be
one byte 0x00 when ref-types/GC isn't enabled, and should be treated as
leb u32 when ref-types/GC is enabled.
And make aot compiler bail out if ref-types/GC is disabled by command line
argument while ref-types instructions are used.
If the value of a float constant is an NaN, the aot compiler creates an alloca,
stores the converted i32 const into it and then loads f32 from it again, which
may introduce a relocation in the AOT file and is not allowed for XIP mode.