I'm not sure we want to use C99 %tu here.
While C99 %zu is widely used in WAMR, %tu is rare (if used at all),
and I'm not sure it's implemented on all the platforms
we support.
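One way to sidestep the question is to cast the pointer difference and print it with `%zu`, which is already used elsewhere. The sketch below is only an illustration: `hyp_format_offset` is a hypothetical helper, not a WAMR function, and it assumes the difference is never negative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: formats a pointer difference without "%tu".
   Assumes end >= start, so the difference is non-negative and the
   cast to size_t loses nothing. */
int
hyp_format_offset(char *buf, size_t buf_size, const char *start,
                  const char *end)
{
    ptrdiff_t diff;

    assert(end >= start);
    diff = end - start;
    /* "%zu" expects a size_t, hence the explicit cast. */
    return snprintf(buf, buf_size, "%zu", (size_t)diff);
}
```

The cast keeps the format string within the set of conversions that are already known to work on the supported toolchains.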
- Implement TINY / STANDARD frame modes - TINY mode can only keep track of the IP
and func idx, while STANDARD mode provides more capabilities (parameters, stack pointer, etc.).
- Implement FRAME_PER_FUNCTION / FRAME_PER_CALL modes - frame per function adds
code at the beginning and at the end of each function for allocating / deallocating stack frame,
whereas in per-call mode the frame is allocated before each call. The exception is a call to
an imported function, where frame-per-function mode also allocates the stack frame before the
`call` instruction (as it can't instrument the imported function).
At the moment TINY + FRAME_PER_FUNCTION is automatically enabled when GC and perf
profiling are disabled and the `values` call stack feature is not requested. In all other cases
STANDARD + FRAME_PER_CALL is used.
STANDARD + FRAME_PER_FUNCTION and TINY + FRAME_PER_CALL are currently not
implemented but possible, and might be enabled in the future.
ps. https://github.com/bytecodealliance/wasm-micro-runtime/issues/3758
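The automatic selection described above can be sketched as a small decision function. All type and function names here are hypothetical and chosen for illustration; the real WAMR identifiers may differ.

```c
#include <stdbool.h>

/* Hypothetical names; not the identifiers used in WAMR. */
typedef enum { HYP_FRAME_TINY, HYP_FRAME_STANDARD } hyp_frame_type;
typedef enum { HYP_FRAME_PER_FUNCTION, HYP_FRAME_PER_CALL } hyp_frame_alloc;

typedef struct {
    hyp_frame_type type;
    hyp_frame_alloc alloc;
} hyp_frame_mode;

/* TINY + FRAME_PER_FUNCTION only when nothing needs the richer frame;
   STANDARD + FRAME_PER_CALL in every other case. */
hyp_frame_mode
hyp_select_frame_mode(bool gc_enabled, bool perf_profiling_enabled,
                      bool values_requested)
{
    hyp_frame_mode mode;

    if (!gc_enabled && !perf_profiling_enabled && !values_requested) {
        mode.type = HYP_FRAME_TINY;
        mode.alloc = HYP_FRAME_PER_FUNCTION;
    }
    else {
        mode.type = HYP_FRAME_STANDARD;
        mode.alloc = HYP_FRAME_PER_CALL;
    }
    return mode;
}
```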
- Only retry on EAGAIN, ENOMEM or EINTR.
- On EINTR, don't count it against the retry budget, just keep retrying.
EINTR can happen in bursts.
- Log the errno on failure, and don't conditionalize that logging on
BH_ENABLE_TRACE_MMAP. In other parts of the code, error logging is not
conditional on that define, while turning on that tracing define makes
things overly verbose.
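The retry policy above can be sketched roughly as follows. This is a generic illustration, not the actual os_mmap code: `hyp_retry`, the callback type, and the scripted demo operation are all hypothetical.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical callback: returns 0 on success, -1 with errno set. */
typedef int (*hyp_op_fn)(void *ctx);

/* Retries on EAGAIN/ENOMEM at most max_retries times. EINTR is retried
   without consuming the budget. Any other errno fails immediately. */
int
hyp_retry(hyp_op_fn op, void *ctx, int max_retries)
{
    int retries_left = max_retries;

    for (;;) {
        int err;

        if (op(ctx) == 0)
            return 0;

        err = errno;
        if (err == EINTR)
            continue; /* can arrive in bursts; not counted */
        if ((err == EAGAIN || err == ENOMEM) && retries_left > 0) {
            retries_left--;
            continue;
        }
        /* Logged unconditionally, independent of any tracing define. */
        fprintf(stderr, "operation failed: errno=%d (%s)\n", err,
                strerror(err));
        errno = err;
        return -1;
    }
}

/* Demo operation: fails with each scripted errno, then succeeds. */
typedef struct {
    const int *errs;
    int count;
    int pos;
} hyp_script;

int
hyp_scripted_op(void *ctx)
{
    hyp_script *s = ctx;

    if (s->pos < s->count) {
        errno = s->errs[s->pos++];
        return -1;
    }
    return 0;
}
```

With a budget of one retry, a burst of four EINTR failures followed by one EAGAIN still succeeds, because only the EAGAIN consumes the budget.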
macOS on aarch64 uses the os_mmap from posix_memmap.c, which ignores
the MMAP_MAP_32BIT flag for that build, so this condition ends up asserting unless
the mapping happens to land in the first 4 GiB of the address space.
This PR changes the code to call os_mmap with the MMAP_MAP_32BIT flag only when the target
is x86-64 or riscv64 and the macro __APPLE__ isn't defined. The behavior is similar
to what the posix os_mmap does.
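The condition can be pictured with a compile-time check like the one below. This is a sketch under assumptions: compiler-predefined macros stand in for WAMR's build-target macros, and the `HYP_` flag values are placeholders rather than the real `MMAP_MAP_*` definitions.

```c
/* Placeholder flag values for illustration; WAMR defines its own. */
#define HYP_MMAP_MAP_NONE 0
#define HYP_MMAP_MAP_32BIT 1

/* Compiler-predefined macros stand in for the build-target checks:
   request a low mapping only on x86-64 or riscv64, and never on Apple. */
#if (defined(__x86_64__) || (defined(__riscv) && (__riscv_xlen == 64))) \
    && !defined(__APPLE__)
#define HYP_USE_MAP_32BIT 1
#else
#define HYP_USE_MAP_32BIT 0
#endif

/* Returns the flags a caller would pass for this build. */
int
hyp_text_mmap_flags(void)
{
    return HYP_USE_MAP_32BIT ? HYP_MMAP_MAP_32BIT : HYP_MMAP_MAP_NONE;
}
```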
In the AOT compiler, allow the user to control the stack boundary check independently of
the general boundary check (e.g. `wamrc --bounds-checks=1`). The logic is now:
1. When `--stack-bounds-checks` is not set, it will be the same value as `--bounds-checks`.
2. When `--stack-bounds-checks` is set, it will be the option value no matter what the
status of `--bounds-checks` is.
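The two rules amount to a tri-state option that falls back to `--bounds-checks` when unset. A minimal sketch, with hypothetical names that do not come from wamrc:

```c
#include <stdbool.h>

/* Hypothetical tri-state for an option that may be left unset. */
typedef enum {
    HYP_OPT_UNSET = -1,
    HYP_OPT_OFF = 0,
    HYP_OPT_ON = 1
} hyp_opt;

/* Rule 1: unset follows --bounds-checks.
   Rule 2: an explicit value wins regardless of --bounds-checks. */
bool
hyp_resolve_stack_bounds_checks(bool bounds_checks,
                                hyp_opt stack_bounds_checks)
{
    if (stack_bounds_checks == HYP_OPT_UNSET)
        return bounds_checks;
    return stack_bounds_checks == HYP_OPT_ON;
}
```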
Fix the compilation error in this CI run:
https://github.com/bytecodealliance/wasm-micro-runtime/actions/runs/10575515238
```
/__w/wasm-micro-runtime/wasm-micro-runtime/bloaty/third_party/abseil-cpp/absl/debugging/failure_signal_handler.cc:139:32: error: no matching function for call to 'max(long int, int)'
139 | size_t stack_size = (std::max(SIGSTKSZ, 65536) + page_mask) & ~page_mask;
| ~~~~~~~~^~~~~~~~~~~~~~~~~
```
Make wamrc normalize "arm64" to "aarch64v8". Previously the only way to
get the "arm64" target was to not specify a target on 64-bit Arm-based
Mac builds. Now arm64 and aarch64v8 are treated as the same.
Make aot_loader accept "aarch64v8" on arm-based apple (as well as
accepting legacy "arm64" based aot targets).
This also removes __APPLE__ and __MACH__ from the block that defaults
size_level to 1 since it doesn't seem to be supported for aarch64:
`LLVM ERROR: Only small, tiny and large code models are allowed on AArch64`
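The target-name normalization mentioned above could look roughly like this. `hyp_normalize_target` is a hypothetical helper written for illustration, not the function wamrc uses.

```c
#include <string.h>

/* Hypothetical helper: maps the "arm64" alias onto "aarch64v8" and
   returns every other target name unchanged. */
const char *
hyp_normalize_target(const char *target)
{
    if (strcmp(target, "arm64") == 0)
        return "aarch64v8";
    return target;
}
```

Normalizing once at the entry point means later comparisons only have to deal with a single spelling.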
Enable merged os_mmap for AOT data sections first, then try enabling merged
os_mmap for both the data sections and the AOT text, except on the NuttX and ESP-IDF platforms.
This fixes the issue that aarch64 AOT module fails to load on android:
https://github.com/bytecodealliance/wasm-micro-runtime/issues/2274
Also refine the os_mmap related code.
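To make the idea of a merged mapping concrete, the sketch below computes offsets for placing data and text in one page-aligned region that a single mmap call could cover. The ordering (data first, then text) and all names are assumptions made for illustration; the actual layout in the AOT loader may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Rounds v up to a multiple of align (align must be a power of two). */
size_t
hyp_align_up(size_t v, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (v + align - 1) & ~(align - 1);
}

/* Hypothetical layout of one merged mapping. */
typedef struct {
    size_t data_offset;
    size_t text_offset;
    size_t total_size; /* size to request from a single mmap call */
} hyp_layout;

/* Assumed ordering: data sections first, text on the next page
   boundary so it can get its own page protection. */
hyp_layout
hyp_merged_layout(size_t data_size, size_t text_size, size_t page_size)
{
    hyp_layout l;

    l.data_offset = 0;
    l.text_offset = hyp_align_up(data_size, page_size);
    l.total_size = l.text_offset + hyp_align_up(text_size, page_size);
    return l;
}
```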
Implement multi-memory for the classic interpreter. The core spec (and bulk memory) opcodes are
supported now; atomic opcodes and multi-memory export APIs will be added in the future.
PS: The multi-memory spec tests are heavily patched so that the linking tests work with the multi-module implementation.
When AOT isn't enabled and the input is a wasm file, wasm_runtime_load doesn't
report an error. The same happens when the interpreter isn't enabled and the input is an AOT file.
This PR makes wasm_runtime_load report the error "magic header not detected" in
such situations.
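A rough sketch of the intended behavior: detect the file kind from its first four bytes, and report the error whenever no matching loader is compiled in. The names below are hypothetical, and which build options provide each loader is left to the caller; the magic values `\0asm` and `\0aot` are assumed here as the wasm and AOT headers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef enum { HYP_PKG_UNKNOWN, HYP_PKG_WASM, HYP_PKG_AOT } hyp_pkg_type;

/* Classifies the input by its 4-byte magic header. */
hyp_pkg_type
hyp_detect_package(const uint8_t *buf, size_t size)
{
    if (buf != NULL && size >= 4) {
        if (memcmp(buf, "\0asm", 4) == 0)
            return HYP_PKG_WASM;
        if (memcmp(buf, "\0aot", 4) == 0)
            return HYP_PKG_AOT;
    }
    return HYP_PKG_UNKNOWN;
}

/* Returns NULL when a matching loader is available, otherwise the
   error message, so that no failure path stays silent. */
const char *
hyp_check_loadable(const uint8_t *buf, size_t size,
                   bool wasm_loader_enabled, bool aot_loader_enabled)
{
    hyp_pkg_type type = hyp_detect_package(buf, size);

    if (type == HYP_PKG_WASM && wasm_loader_enabled)
        return NULL;
    if (type == HYP_PKG_AOT && aot_loader_enabled)
        return NULL;
    return "magic header not detected";
}
```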
For JIT, we naturally use Mach-O on macOS, where the section name
we currently use is not valid and results in errors like:
```
LLVM ERROR: Global variable '__orc_lcl.aot_stack_sizes.0' has an invalid section specifier '.aot_stack_sizes': mach-o section specifier requires a segment and section separated by a comma.
```
Because the dedicated section is not necessary for JIT,
this commit simply stops using it.
Fixes: https://github.com/bytecodealliance/wasm-micro-runtime/issues/3730
The table index in the call_indirect/return_call_indirect opcode should be
a single 0x00 byte when ref-types/GC isn't enabled, and should be treated as
a LEB128 u32 when ref-types/GC is enabled.
Also make the AOT compiler bail out if ref-types/GC is disabled on the command
line while ref-types instructions are used.
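The two decoding rules can be sketched as below. This is an illustrative reader, not the loader's code: the function names are hypothetical and the LEB128 decoder is a generic one written for this example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Decodes an unsigned LEB128 u32 from [p, end). Returns the number of
   bytes consumed, or 0 on truncated or over-long input. */
size_t
hyp_read_leb_u32(const uint8_t *p, const uint8_t *end, uint32_t *out)
{
    uint32_t result = 0;
    unsigned shift = 0;
    size_t n = 0;

    while (p + n < end && n < 5) {
        uint8_t byte = p[n++];

        /* The 5th byte may carry only 4 value bits and must be last. */
        if (shift == 28 && (byte & 0xF0) != 0)
            return 0;
        result |= (uint32_t)(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) {
            *out = result;
            return n;
        }
        shift += 7;
    }
    return 0;
}

/* Without ref-types/GC the immediate must be exactly one 0x00 byte;
   with it, the immediate is a LEB128 u32. Returns bytes consumed,
   or 0 if the encoding is not acceptable. */
size_t
hyp_read_table_idx(const uint8_t *p, const uint8_t *end,
                   bool ref_types_enabled, uint32_t *table_idx)
{
    if (ref_types_enabled)
        return hyp_read_leb_u32(p, end, table_idx);
    if (p < end && *p == 0x00) {
        *table_idx = 0;
        return 1;
    }
    return 0;
}
```

Note that `0x80 0x00` is a valid two-byte LEB128 encoding of zero, so it is accepted only when ref-types is enabled.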
If the value of a float constant is a NaN, the AOT compiler creates an alloca,
stores the converted i32 const into it and then loads f32 from it again, which
may introduce a relocation in the AOT file and is not allowed in XIP mode.
Any use of a table index that isn't exactly a null byte (`0x00`) means that
the module makes use of the reference types proposal. This is important
to track because `aot_compiler.c` will blindly assume that all table indices
are a single byte long otherwise.
This fixes a crash in WAMR for modules that contain multi-byte encodings
of table indices in `call_indirect` but make no other use of reference types
features.
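The tracking rule can be stated in a few lines. The struct and function below are hypothetical stand-ins for whatever the loader records; they only illustrate that anything other than a lone `0x00` byte marks the module as using reference types.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-module feature record. */
typedef struct {
    bool uses_ref_types;
} hyp_module_features;

/* p points at the first byte of the table-index immediate. Any first
   byte other than 0x00 is either a non-zero index or the start of a
   multi-byte encoding, and both imply the reference types proposal. */
void
hyp_note_table_idx(hyp_module_features *features, const uint8_t *p)
{
    if (*p != 0x00)
        features->uses_ref_types = true;
}
```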
Compilation warnings were reported on mac:
```
core/shared/mem-alloc/ems/ems_gc.c:454:22: warning: passing arguments to 'wasm_runtime_gc_prepare' without a prototype is deprecated in all versions of C and is not supported in C2x [-Wdeprecated-non-prototype]
gct_vm_gc_prepare(NULL);
^
core/shared/mem-alloc/ems/ems_gc.c:466:23: warning: passing arguments to 'wasm_runtime_gc_finalize' without a prototype is deprecated in all versions of C and is not supported in C2x [-Wdeprecated-non-prototype]
gct_vm_gc_finished(NULL);
^
2 warnings generated.
```
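The warning goes away once the callee is declared with a full prototype. The sketch below uses a hypothetical `hyp_gc_prepare` with an assumed `void *` parameter; it is not the actual WAMR declaration.

```c
#include <stddef.h>

/* Full prototype. Declaring this as `void hyp_gc_prepare();` (empty
   parentheses) would leave the parameters unspecified, and a call
   that passes an argument would then trigger
   -Wdeprecated-non-prototype. */
void
hyp_gc_prepare(void *exec_env);

static int hyp_prepare_calls;

void
hyp_gc_prepare(void *exec_env)
{
    (void)exec_env; /* unused in this sketch */
    hyp_prepare_calls++;
}

/* Calls the hook the way the GC code does and returns the call count. */
int
hyp_run_gc_prepare(void)
{
    hyp_gc_prepare(NULL); /* argument is checked against the prototype */
    return hyp_prepare_calls;
}
```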
As reported in #3500, when the debug interpreter is enabled, the classic interpreter
takes a lock to read `exec_env->current_status->signal_flag` and
do further handling before fetching the next opcode, which makes the interpreter
run slower.
This PR loads `exec_env->current_status->signal_flag` atomically, without taking the
mutex, when 32-bit atomic loads are supported, and only takes the lock for further
handling when the signal_flag is WAMR_SIG_SINGSTEP, which improves
performance.
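The fast-path idea can be sketched with C11 atomics. Everything here is illustrative: the struct, the functions, and the `HYP_SIG_SINGSTEP` value are placeholders, and a counter stands in for the work that the real code does under the mutex.

```c
#include <stdatomic.h>
#include <stdint.h>

#define HYP_SIG_NONE 0u
#define HYP_SIG_SINGSTEP 1u /* placeholder value, not WAMR's */

typedef struct {
    _Atomic uint32_t signal_flag;
    unsigned slow_path_entries; /* stands in for work done under the lock */
} hyp_status;

void
hyp_status_init(hyp_status *status)
{
    atomic_init(&status->signal_flag, HYP_SIG_NONE);
    status->slow_path_entries = 0;
}

/* What a debugger thread would do to request a single step. */
void
hyp_set_signal(hyp_status *status, uint32_t sig)
{
    atomic_store_explicit(&status->signal_flag, sig, memory_order_release);
}

/* Called before fetching each opcode. */
void
hyp_check_signal(hyp_status *status)
{
    uint32_t flag = atomic_load_explicit(&status->signal_flag,
                                         memory_order_acquire);

    if (flag != HYP_SIG_SINGSTEP)
        return; /* fast path: one atomic load, no lock */

    /* Slow path: real code would take the mutex here and handle
       the signal before continuing. */
    status->slow_path_entries++;
}
```

In the common case the per-opcode cost drops to a single atomic load, which is where the speedup comes from.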
There are probably a number of other places where bh_leb_read could be used (e.g. the AOT loader),
but I'm keeping the change as small as possible. Further refactoring can be done later.
Fix:
```
wamr/core/iwasm/libraries/libc-builtin/libc_builtin_wrapper.c:20:1:
warning: type of 'wasm_runtime_module_realloc' does not match original declaration [-Wlto-type-mismatch]
wamr/core/iwasm/common/wasm_runtime_common.c:3033:1:
note: return value type mismatch
wamr/core/iwasm/common/wasm_runtime_common.c:3033:1:
note: type 'uint64' should match type 'uint32'
wamr/core/iwasm/common/wasm_runtime_common.c:3033:1:
note: 'wasm_runtime_module_realloc' was previously declared here
wamr/core/iwasm/common/wasm_runtime_common.c:3033:1:
note: code may be misoptimized unless '-fno-strict-aliasing' is used
```
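This kind of mismatch typically comes from a hand-written declaration that drifted from the definition. A minimal sketch of the remedy, using a hypothetical function with a simplified signature (not the real `wasm_runtime_module_realloc` one): keep a single declaration, normally in a shared header, whose return type matches the definition.

```c
#include <stdint.h>

/* The one shared declaration (normally in a header included by the
   definition and by every caller). A second, local declaration that
   returned uint32_t would be the mismatch LTO reports. */
uint64_t
hyp_module_realloc(uint64_t ptr, uint64_t size);

uint64_t
hyp_module_realloc(uint64_t ptr, uint64_t size)
{
    /* Dummy body for illustration: returns an offset above 4 GiB so
       that treating the result as 32-bit would visibly truncate it. */
    (void)size;
    return ptr + 0x100000000ULL;
}
```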