Propose two enhancements:
- Shared heap created from a preallocated memory buffer: The user can create a shared heap from a preallocated buffer and treat that memory region as one large chunk; there's no need to manage it dynamically (malloc/free). The user must make sure the native address and size of that memory region are valid.
- Introduce shared heap chain: The user can create a shared heap chain. From the wasm app's point of view it is still one continuous memory region, while on the native side it can consist of multiple shared heaps (each of which is a continuous memory region). For example, a 500MB shared heap 1 and a 500MB shared heap 2 form a chain; from the wasm app's point of view, it's one 1GB shared heap.
After these enhancements, data sharing between wasm apps, and between wasm apps and the host, can be more efficient and flexible. Admittedly, shared heap management can become more complex for users, but the design follows the zero-overhead principle: no overhead is imposed on users who don't use the shared heap enhancements or don't use the shared heap at all.
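The chain idea above can be illustrated with a minimal sketch. The `heap_node` type and both function names are hypothetical and are not WAMR's actual structures or API. The sketch assumes that offsets are counted from the start of the chain and that an access may not straddle two heaps, since the heaps are not necessarily adjacent in native memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node type for illustration only (not WAMR's real struct). */
typedef struct heap_node {
    uint8_t *base;          /* native start address of this heap */
    uint64_t size;          /* size of this heap in bytes */
    struct heap_node *next; /* next heap in the chain, or NULL */
} heap_node;

/* Total size of the chain, i.e. the size of the single region the app sees. */
static uint64_t
chain_total_size(const heap_node *head)
{
    uint64_t total = 0;
    for (const heap_node *n = head; n != NULL; n = n->next)
        total += n->size;
    return total;
}

/* Translate an offset inside the chained region to a native address.
 * Returns NULL if [offset, offset + len) is out of range or would cross
 * the boundary between two heaps (an assumption of this sketch). */
static uint8_t *
chain_offset_to_native(const heap_node *head, uint64_t offset, uint64_t len)
{
    for (const heap_node *n = head; n != NULL; n = n->next) {
        /* The caller is responsible for valid native addresses. */
        assert(n->base != NULL);
        if (offset < n->size) {
            if (len > n->size - offset)
                return NULL; /* access would run past this heap */
            return n->base + offset;
        }
        offset -= n->size; /* skip this heap and keep walking */
    }
    return NULL; /* offset lies beyond the end of the chain */
}
```

With two preallocated buffers of 16 and 32 bytes chained together, offset 16 resolves to the first byte of the second buffer, while a 4-byte access at offset 14 is rejected because it would span both buffers.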
* Fix vector growth check and typos in core (#9)
* Fix resource cleanup in memory and running modes tests (#10)
* Add end of file empty line in wasm_running_modes_test.cc
```
CMake Error at CMakeLists.txt:4 (cmake_minimum_required):
Compatibility with CMake < 3.5 has been removed from CMake.
Update the VERSION argument <min> value. Or, use the <min>...<max> syntax
to tell CMake that the project requires at least <min> but has been updated
to work with policies introduced by <max> or earlier.
Or, add -DCMAKE_POLICY_VERSION_MINIMUM=3.5 to try configuring anyway.
```
- Clear some compile warnings
- Fix some typos
- Fix llvm LICENSE link error
- Remove unused aot file and binarydump bin
- Add checks when loading AOT exports
There are probably a number of other places where bh_leb_read could be used (e.g. the aot loader),
but I'm keeping the change as small as possible. Further refactoring can be done later.
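For readers unfamiliar with what a LEB reader does, here is a minimal sketch of a bounds-checked unsigned LEB128 decoder. The name `read_uleb32` and its signature are hypothetical; this is not the signature or implementation of bh_leb_read, only an illustration of the kind of logic such a shared helper centralizes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode an unsigned 32-bit LEB128 value from
 * [buf, buf_end). On success, stores the value in *out and the number of
 * bytes read in *consumed. Returns false on truncation or overflow. */
static bool
read_uleb32(const uint8_t *buf, const uint8_t *buf_end, uint32_t *out,
            size_t *consumed)
{
    uint32_t result = 0;
    unsigned shift = 0;
    const uint8_t *p = buf;

    assert(buf != NULL && buf_end != NULL && out != NULL && consumed != NULL);

    while (p < buf_end) {
        uint8_t byte = *p++;

        /* The 5th byte may only carry the top 4 bits of a 32-bit value
         * and must not have its continuation bit set. */
        if (shift == 28 && (byte & 0xF0) != 0)
            return false;

        result |= (uint32_t)(byte & 0x7F) << shift;

        if ((byte & 0x80) == 0) { /* continuation bit clear: last byte */
            *out = result;
            *consumed = (size_t)(p - buf);
            return true;
        }
        shift += 7;
    }
    return false; /* ran out of input before the final byte */
}
```

Keeping the truncation and overflow checks in one function means each loader only has to handle a single success/failure result.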
Consider the following wasm module:
```wast
(module
(func (export "foo")
i32.const 0x104
i32.const 0x12345678
i32.store
)
(memory 1 1)
)
```
While the address (0x104) is perfectly aligned for i32.store,
as our aot compiler uses 1-byte alignment for load/store LLVM
IR instructions, it often produces inefficient machine code,
especially for alignment-sensitive targets.
For example, the above "foo" function is compiled into the
following xtensa machine code.
```
0000002c <aot_func_internal#0>:
2c: 004136 entry a1, 32
2f: 07a182 movi a8, 0x107
32: 828a add.n a8, a2, a8
34: 291c movi.n a9, 18
36: 004892 s8i a9, a8, 0
39: 06a182 movi a8, 0x106
3c: 828a add.n a8, a2, a8
3e: ffff91 l32r a9, 3c <aot_func_internal#0+0x10> (ff91828a <aot_func_internal#0+0xff91825e>)
3e: R_XTENSA_SLOT0_OP .literal+0x8
41: 004892 s8i a9, a8, 0
44: 05a182 movi a8, 0x105
47: 828a add.n a8, a2, a8
49: ffff91 l32r a9, 48 <aot_func_internal#0+0x1c> (ffff9182 <aot_func_internal#0+0xffff9156>)
49: R_XTENSA_SLOT0_OP .literal+0xc
4c: 41a890 srli a10, a9, 8
4f: 0048a2 s8i a10, a8, 0
52: 04a182 movi a8, 0x104
55: 828a add.n a8, a2, a8
57: 004892 s8i a9, a8, 0
5a: f01d retw.n
```
Note that each of the four bytes is stored separately using
a one-byte store instruction, s8i.
This commit tries to use larger alignments for load/store LLVM IR
instructions when possible. With this commit, the above example is
compiled into the following machine code, which seems more reasonable.
```
0000002c <aot_func_internal#0>:
2c: 004136 entry a1, 32
2f: ffff81 l32r a8, 2c <aot_func_internal#0> (81004136 <aot_func_internal#0+0x8100410a>)
2f: R_XTENSA_SLOT0_OP .literal+0x8
32: 416282 s32i a8, a2, 0x104
35: f01d retw.n
```
Note: this doesn't work well for --xip because aot_load_const_from_table()
hides the constness of the value. Maybe we need our own mechanism to
propagate the constness and the value.
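As a rough illustration of "larger alignments when possible", the sketch below computes the alignment that a constant address guarantees for a given access size. The function `known_alignment` is hypothetical and is not the code added by this commit; it assumes the address is a compile-time constant and that the linear memory base is itself aligned to at least the access size.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: largest power-of-two alignment, capped at the
 * access size, that a constant address is guaranteed to satisfy.
 * Assumes the linear memory base is at least access_size-aligned. */
static uint32_t
known_alignment(uint64_t const_addr, uint32_t access_size)
{
    /* access_size must be a non-zero power of two (1, 2, 4, 8, ...) */
    assert(access_size != 0 && (access_size & (access_size - 1)) == 0);

    uint32_t align = access_size;
    /* Halve the candidate until the address is a multiple of it. */
    while (align > 1 && (const_addr & (align - 1)) != 0)
        align >>= 1;
    return align;
}
```

For the example module, `known_alignment(0x104, 4)` yields 4, so a single 4-byte store is safe, whereas an address such as 0x106 would only justify 2-byte alignment. When the constant is hidden, as in the --xip case described above, no such conclusion can be drawn and 1-byte alignment remains the conservative choice.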
Add simple infrastructure so that more unit tests can be added in the future. At the moment,
tests are only executed on Linux, but this can be extended to other platforms if needed.
Use https://github.com/google/googletest/ as the test framework.