=========================================================================================
Steps to run a boot image in Fail* using the Bochs simulator backend:
=========================================================================================
Follow the Bochs documentation, and start your own "bochsrc" configuration file
based on the "${PREFIX}/share/doc/bochs/bochsrc-sample.txt" template (or
"/usr/share/doc/bochs/examples/bochsrc.gz" on Debian systems with Bochs installed).
  1. Add your floppy/cdrom/hdd image in the floppya/ata0-master/ata0-slave
     sections; configure the boot: section appropriately.
  2. Comment out com1 and parport1.
  3. The following Bochs configuration settings (managed in the "bochsrc" file) might
     be helpful, depending on your needs:
     - For "headless" experiments:
         config_interface: textconfig
         display_library: nogui
     - For an X11 GUI:
         config_interface: textconfig
         display_library: x
     - For a wxWidgets GUI (does not play well with Fail*'s "restore" feature):
         config_interface: wx
         display_library: wx
     - Reduce the guest system's RAM to a minimum to reduce Fail*'s memory footprint
       and save/restore overhead, e.g.:
         memory: guest=16, host=16
     - If you want to redirect FailBochs's output to a file using the shell's
       redirection operator '>', make sure "/dev/stdout" is not used as a target
       file for logging. (The Debian "bochsrc" template unfortunately does this
       in two places. It suffices to comment out these entries.)
     - To make Fail* terminate if something unexpected happens in a larger
       campaign, be sure it doesn't "ask" in these cases, e.g.:
         panic: action=fatal
         error: action=fatal
         info: action=ignore
         debug: action=ignore
         pass: action=ignore
     - If you need a quick-and-dirty way to pass data from the guest system to the
       outside world, and you don't want to write an experiment utilizing
       GuestEvents, you can use the "port e9 hack" that prints all outbs to port
       0xe9 to the console:
         port_e9_hack: enabled=1
     - Determinism: (Fail)Bochs is deterministic regarding timer interrupts,
       i.e., two experiment runs after calling simulator.restore() will count
       the same number of instructions between two interrupts. However, you
       need to be careful when running (Fail)Bochs with a GUI enabled: Typing
         fail-client -q<return>
       on the command line may lead to the GUI window receiving a "return key
       released" event, resulting in a keyboard interrupt for the guest system.
       This can be avoided by starting Bochs with "sleep 1; fail-client -q", by
       suppressing keyboard input (CONFIG_DISABLE_KEYB_INTERRUPTS setting in
       the CMake configuration), or by disabling the GUI (see "headless
       experiments" above).
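
The headless and batch-oriented settings from the list above can be collected
into one fragment. The following is a minimal sketch, not a complete
configuration: the file name "bochsrc.headless" is a made-up example, and the
image, boot: and romimage/vgaromimage entries still have to be added.

```shell
#!/bin/sh
# Sketch: write the headless/batch settings from the list above into a
# bochsrc fragment. OUT is a hypothetical file name; pick your own.
OUT="${OUT:-bochsrc.headless}"

cat > "$OUT" <<'EOF'
# no GUI, text-mode configuration interface
config_interface: textconfig
display_library: nogui
# small guest RAM keeps the memory footprint and save/restore overhead low
memory: guest=16, host=16
# never "ask" during a long campaign
panic: action=fatal
error: action=fatal
info: action=ignore
debug: action=ignore
pass: action=ignore
# guest writes to port 0xe9 are printed to the console
port_e9_hack: enabled=1
EOF

echo "wrote $OUT"
```

Merge the resulting lines into your own "bochsrc" rather than using the
fragment on its own.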

=========================================================================================
Example experiments and code snippets
=========================================================================================

Experiment "hsc-simple":
**********************************************************************
A simple standalone experiment (without a separate campaign). To compile this
experiment, the following steps are required:
  1. Add "hsc-simple" to ccmake's EXPERIMENTS_ACTIVATED.
  2. Enable CONFIG_EVENT_BREAKPOINTS, CONFIG_SR_RESTORE and CONFIG_SR_SAVE.
  3. Build Fail* and Bochs, see "how-to-build.txt" for details.
  4. Enter experiment_targets/hscsimple/ and unpack the compressed files:
       $ bunzip2 -k *.bz2
  5. Start the Bochs simulator by typing
       $ fail-client -q
     After successfully booting the eCos/hello world example, the console shows
     "[HSC] breakpoint reached, saving", and a hello.state/ subdirectory appears.
     You probably need to adjust the bochsrc's paths to romimage/vgaromimage.
     These by default point to the locations installed by the Debian packages
     "bochsbios" and "vgabios"; alternatively, you can use, for example, the
     BIOSes supplied in "${FAIL_DIR}/simulators/bochs/bios/".
  6. Compile the experiment's second step: edit
     fail/src/experiments/hsc-simple/experiment.cc, and change the first "#if 1"
     into "#if 0". Make an incremental build, e.g., by running
     "${FAIL_DIR}/scripts/rebuild-bochs.sh -" from your ${BUILD_DIR}.
  7. Back in ../experiment_targets/hscsimple/ (assuming you are in ${FAIL_DIR}),
     again run
       $ fail-client -q
     After restoring the state, the hello world program's calculation should
     yield a different result.
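
Step 6 changes only the first "#if 1" in the file. A hedged sketch of doing
that edit non-interactively with awk follows. It works on a small made-up
sample file instead of the real experiment.cc, whose contents are not
reproduced here, and the function name flip_first_if is an assumption of this
example, not part of Fail*.

```shell
#!/bin/sh
# Sketch: turn only the FIRST line starting with "#if 1" into "#if 0"
# and print the result to stdout. flip_first_if is a hypothetical helper.
flip_first_if() {
    awk '!done && /^#if 1/ { sub(/^#if 1/, "#if 0"); done = 1 } { print }' "$1"
}

# Made-up stand-in for experiment.cc, only to demonstrate the edit.
sample=$(mktemp)
printf '%s\n' '#if 1' '// step 1' '#else' '// step 2' '#endif' \
              '#if 1' '// unrelated block' '#endif' > "$sample"

flipped=$(flip_first_if "$sample")
printf '%s\n' "$flipped"
```

For the real file, redirect the output to a temporary file and move it over
experiment.cc once you have checked the result.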


Experiment "coolchecksum":
**********************************************************************
An example of separate campaign/experiment implementations. To compile this
experiment, the following steps are required:
  1. Run step #1 (and if you're curious how COOL_ECC_NUMINSTR in
     experimentInfo.hpp was figured out, then step #2) of the experiment
     (analogous to what needed to be done in case of the "hsc-simple" experiment,
     see above). The experiment's target guest system can be found under
     ../experiment_targets/coolchecksum/.
     (If you want to enable COOL_FAULTSPACE_PRUNING, step #2 is mandatory because
     it generates the instruction/memory access trace needed for pruning.)
  2. Build the campaign server (if it wasn't already built automatically):
       $ make coolchecksum-server
  3. Run the campaign server: bin/coolchecksum-server
  4. In another terminal, run step #3 of the experiment ("fail-client -q").

Step #3 of the experiment currently runs 2000 experiment iterations and then
terminates, because Bochs has some memory leak issues. You need to re-run
fail-client for each further batch of 2000 iterations.
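
Re-running the client by hand can be replaced by a small loop. The sketch
below assumes that fail-client exits with status 0 after a completed batch,
which this document does not confirm; run_batches is a hypothetical helper
name.

```shell
#!/bin/sh
# Sketch: run a command a fixed number of times, one batch after another.
# Assumption: the command returns 0 after a completed batch; the loop
# stops at the first non-zero exit status.
run_batches() {
    batches=$1
    shift
    i=0
    while [ "$i" -lt "$batches" ]; do
        "$@" || return 1
        i=$((i + 1))
        echo "batch $i of $batches finished"
    done
}
```

With the campaign server running, "run_batches 10 fail-client -q" would then
start ten client runs in sequence.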

The experiments can be significantly sped up by
  a) parallelization (run more FailBochs clients), and
  b) a headless (and more optimized) Fail* configuration (see above).


Experiment "MHTestCampaign":
**********************************************************************
An example of separate campaign/experiment implementations.
  1. Execute the campaign (job server): ${BUILD_DIR}/bin/MHTestCampaign-server
  2. Run the FailBochs instance in a properly set-up environment:
       $ fail-client -q

Experiment "dciao-kernelstructs":
**********************************************************************
This is an example of a database-driven fault-injection (FI) campaign, with a
campaign server instantiating the generic DatabaseCampaign. The general workflow
is as follows:
  - Create a new database for this campaign, e.g., "dciao". (See
    how-to-build.txt, "Database backend setup".)
  - Record a trace using the generic tracing tool: Build a fail-client with the
    "generic-tracing" experiment, and parametrize it with -Wf,[option] (e.g.,
    "fail-client -Wf,--help"). Be aware that the generic tracing "experiment"
    needs a complete simulator environment along with the ELF binary that
    supplies symbol addresses; for FailBochs this means the usual bochsrc and
    boot image files need to be present.
    FIXME: dciao-kernelstructs has not been committed to
    danceos/devel/experiment_targets/, use a "complete" example here
  - Import the trace to the database using the import-trace tool (enable
    BUILD_IMPORT_TRACE in the CMake configuration to get this built). The
    --variant/--benchmark parameters are only significant if you intend to
    evaluate multiple protection variants and/or benchmarks. Currently the
    following importers (--importer X) are implemented:
      MemoryImporter: Imports fault locations for single-location single-event
        RAM faults (e.g., single-bit flips, or burst faults within the same
        byte). Directly maps memory access events from the trace to the DB.
      AdvancedMemoryImporter: A MemoryImporter that additionally imports
        Relyzer-style conditional branch history, instruction opcodes, and a
        virtual duration = time2 - time1 + 1 column.
      RegisterImporter: Imports fault locations for single-location single-event
        register-file faults (e.g., single-bit flips in general-purpose
        registers). Considers only instruction addresses from the trace,
        disassembles corresponding instructions in the supplied ELF binary (using
        LLVM's disassembler library), and extracts used/defined registers.
        Registers are mapped to/from "addresses" with fail::LLVMtoFailTranslator.
      InstructionImporter: Interprets every instruction fetch as a memory read,
        and handles it the same way the MemoryImporter handles "normal" memory
        accesses. Implements a fault model with faults in CPU instructions.
      RandomJumpImporter: Implements multi-bit faults in the instruction pointer.
        As the IP is read before every instruction, the fault space explodes
        rapidly. Should therefore be limited to small memory areas to jump
        from/to.
    Note that specifying an importer with --importer adds more parameters to the
    --help output in some cases.
    DB detail: This tool creates and fills the variant and trace tables.
  - Prune the fault space with the prune-trace tool (enable BUILD_PRUNE_TRACE in
    the CMake configuration). This prepares all information necessary for
    running the FI campaign. Currently only the "basic" pruning method is
    available, applying usual def/use pruning.
    DB detail: This tool creates and fills the fsppilot, fspgroup and fspmethod
    tables.
  - Run the campaign server as usual. In this case it does not do much more than
    to instantiate the generic DatabaseCampaign, and to parametrize it with the
    experiment-specific protobuf message type (DCIAOKernelProtoMsg) which must
    adhere to some structural constraints (a required DatabaseCampaignMessage
    fsppilot member, and a repeated group Result that contains no further
    subgroups or repeated fields). The DatabaseCampaign takes care of
    distributing unfinished jobs, and collecting the results in an automatically
    created result table with columns corresponding to the fields in the Result
    group of the protobuf message.

=========================================================================================
Parallelization
=========================================================================================
Fail* is designed to run experiments in parallel, which reduces the time needed to
process a (larger) set of experiment data, i.e., the input parameters for experiment
execution (e.g., instruction pointer, registers, bit numbers, ...). We call this
experiment data the "parameter sets". The so-called "campaign" manages the parameter
sets (i.e., the data to be used by the experiment flows) and hands them out to the
clients on request. Consequently, the campaign runs on the server side, and the
experiment flows run on the (distributed) clients.
First, the Fail* instances (and other required files, e.g., saved state) are
distributed to the clients. Second, the campaign (server) is started and prepares
its parameter sets so that it can answer requests from the clients. (As soon as
parameter sets are available, the clients can request them.) Finally, the
distributed Fail* clients are started. Once this setup is complete, the clients
iteratively request new parameter sets, execute their experiment code, and return
their results to the server (i.e., the campaign) until all parameter sets have been
processed successfully. Once all (new) parameter sets have been handed out, the
campaign starts to re-send unfinished parameter sets to requesting clients in order
to speed up the overall campaign execution. This also ensures that every parameter
set eventually produces a corresponding result set. (If, for example, a client
terminates abnormally, no result is sent back; the re-sending mechanism covers this
case, too.)
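
On a single machine, the last step of this setup (starting several clients
that all talk to one campaign server) can be sketched as follows.
start_clients is a hypothetical helper name, and whether several clients can
share one working directory is not covered here, so treat this as an
illustration of the idea rather than a tested recipe.

```shell
#!/bin/sh
# Sketch: start N copies of a client command in the background and wait
# for all of them. start_clients is a hypothetical helper name.
start_clients() {
    count=$1
    shift
    i=0
    while [ "$i" -lt "$count" ]; do
        "$@" &
        i=$((i + 1))
    done
    wait   # returns once every background client has exited
}
```

With a campaign server already running, "start_clients 4 fail-client -q"
would launch four clients that each request parameter sets independently.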


Shell scripts supporting experiment distribution:
**********************************************************************
These can be found in ${FAIL_DIR}/scripts/ (for now, have a look at the script files
themselves; they contain some documentation):
  - fail-env.sh: Environment variables for distribution/parallelization host
    lists etc.; don't modify in-place but edit your own copy!
  - distribute-experiment.sh: Distributes the necessary FailBochs ingredients to
    the experiment hosts.
  - runcampaign.sh: Locally runs a campaign server, and a large number of
    clients on the experiment hosts.
  - multiple-clients.sh: Is run on an experiment host by runcampaign.sh;
    starts several instances of client.sh in a tmux session.
  - client.sh: (Repeatedly) runs a single fail-client instance.


Some useful things to note:
**********************************************************************
  - Using the distribute-experiment.sh script causes the local fail-client binary to
    be copied to the hosts. If the binary is not present in the current directory,
    the default fail-client binary (-> $ which fail-client) will be used. If you
    have modified some of your experiment code (i.e., your fail-client binary will
    change), don't forget to delete the local fail-client binary in order to
    distribute the *new* binary.
  - The runcampaign.sh script prints some status information about the clients
    recently started. In addition, there will be a few error messages concerning
    ssh, tmux and so on. They can be ignored for now.
  - The runcampaign.sh script starts the coolchecksum-server. Note that the server
    instance will terminate immediately (without notice) if a coolcampaign.csv
    file still exists.

=========================================================================================
Steps to run an experiment with gem5:
=========================================================================================
  1. Create a directory which will be used as the gem5 system directory (it
     will contain the guest system and boot image), referred to as $SYSTEM below.
  2. Create two directories, $SYSTEM/binaries and $SYSTEM/disks.
  3. Put the guest system kernel into $SYSTEM/binaries and the boot image into
     $SYSTEM/disks. For ARM targets, you can use the "linux-arm-ael.img" image
     contained in
       http://www.gem5.org/dist/current/arm/arm-system-2011-08.tar.bz2
     As an example, the resulting directory structure might look like this:
       boecke@kos:~/$FAIL_DIR/build/gem5sys$ find
       ./binaries/abo-simple-arm.elf   # your experiment binary (!= gem5)
       ./disks/linux-arm-ael.img       # the ARM image (FIXME: what's this exactly?)
       ./disks/boot.arm                # the ARM bootloader (FIXME: ditto)
  4. Run gem5 in $FAIL_DIR/simulators/gem5/ with:
       $ M5_PATH=$SYSTEM build/ARM/gem5.debug configs/example/fs.py --bare-metal --kernel kernelname
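
Steps 1 to 3 can be sketched in a few lines of shell. The variables KERNEL
and IMAGE are assumptions of this example (paths to your own kernel binary
and boot image); SYSTEM defaults to a throwaway temporary directory so the
snippet can be tried without touching a real setup.

```shell
#!/bin/sh
# Sketch of steps 1-3: create the gem5 system directory layout.
# SYSTEM defaults to a temporary directory; set it to a real location
# for actual use.
SYSTEM="${SYSTEM:-$(mktemp -d)}"

mkdir -p "$SYSTEM/binaries" "$SYSTEM/disks"

# Copy kernel and boot image only if the (hypothetical) variables
# KERNEL and IMAGE name existing files.
if [ -f "${KERNEL:-}" ]; then
    cp "$KERNEL" "$SYSTEM/binaries/"
fi
if [ -f "${IMAGE:-}" ]; then
    cp "$IMAGE" "$SYSTEM/disks/"
fi

echo "gem5 system directory: $SYSTEM"
```

The resulting $SYSTEM is what step 4 passes to gem5 via M5_PATH.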

=========================================================================================
Steps to run an experiment with the pandaboard/openocd backend:
=========================================================================================
  1. Prepare an SD card for pandaboard usage, for example by installing an
     Ubuntu SD image for the pandaboard.
  2. In u-boot, set the option bootdelay to 0 so that the pandaboard boots the
     installed kernel directly, without any delay. (In the first partition, add the
     file "preEnv.txt" with the content "bootdelay=0".)
  3. Use "lra-panda-dijkstra" from the "experiment_targets" directory of the project
     svn as a template for your application development.
     This gives you the needed fault handlers and startup code for bare-metal
     execution of code. It also provides minimal functionality for serial output.
     For further reading, see lra-panda-dijkstra/README.
  4. Copy the generated files to the prepared SD card.
  5. Connect the flyswatter2 to the host computer and to the pandaboard.
  6. As information from the executable (ELF format) and from the trace file is
     needed, these two must be specified with the environment variables
     FAIL_TRACE_PATH and FAIL_ELF_PATH.
  7. Execute the experiment/campaign as usual. If errors occur, "oocd.log" might
     give you a hint on how to solve the problem.
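
Steps 2 and 6 involve only a file and two environment variables, so they can
be sketched in shell. BOOT_PART (the mount point of the SD card's first
partition) and the default file names trace.tc and app.elf are placeholders
invented for this example; substitute your own paths.

```shell
#!/bin/sh
# Sketch of steps 2 and 6. BOOT_PART stands for the mount point of the
# SD card's first partition; it defaults to a temporary directory so the
# snippet can be tried without a card.
BOOT_PART="${BOOT_PART:-$(mktemp -d)}"

# Step 2: make u-boot start the kernel without delay.
printf 'bootdelay=0\n' > "$BOOT_PART/preEnv.txt"

# Step 6: tell the experiment where the trace and the ELF binary live.
# Both default file names are placeholders.
export FAIL_TRACE_PATH="${FAIL_TRACE_PATH:-$PWD/trace.tc}"
export FAIL_ELF_PATH="${FAIL_ELF_PATH:-$PWD/app.elf}"

echo "trace: $FAIL_TRACE_PATH"
echo "elf:   $FAIL_ELF_PATH"
```

Run the experiment or campaign from the same shell so that it inherits both
variables.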

=========================================================================================
Example experiments and code snippets
=========================================================================================

Experiment "lra-simple-panda":
**********************************************************************
A campaign experiment to use with the lra-panda-dijkstra experiment target.
  - The CMake option CONFIG_INJECTIONPOINT_HOPS should be enabled for fast trace
    navigation.
  - The campaign uses the DatabaseCampaign module.
  - It will create a result table for the possible experiment outcomes
    'OK', 'ERR_WRONG_RESULT', 'ERR_TRAP', 'ERR_TIMEOUT' and 'ERR_OUTSIDE_TEXT'.
    'ERR_OUTSIDE_TEXT' describes any memory access that went outside the
    application memory.