=========================================================================================
Steps to run a boot image in Fail* using the Bochs simulator backend:
=========================================================================================
Follow the Bochs documentation, and start your own "bochsrc" configuration file
based on the "${PREFIX}/share/doc/bochs/bochsrc-sample.txt" template (or
"/usr/share/doc/bochs/examples/bochsrc.gz" on Debian systems with Bochs installed).
  1. Add your floppy/cdrom/hdd image in the floppya/ata0-master/ata0-slave
     sections; configure the boot: section appropriately.
  2. Comment out com1 and parport1.
  3. The following Bochs configuration settings (managed in the "bochsrc" file)
     might be helpful, depending on your needs:
     - For "headless" experiments:
         config_interface: textconfig
         display_library: nogui
     - For an X11 GUI:
         config_interface: textconfig
         display_library: x
     - For a wxWidgets GUI (does not play well with Fail*'s "restore" feature):
         config_interface: wx
         display_library: wx
     - Reduce the guest system's RAM to a minimum to reduce Fail*'s memory
       footprint and save/restore overhead, e.g.:
         memory: guest=16, host=16
     - If you want to redirect FailBochs's output to a file using the shell's
       redirection operator '>', make sure "/dev/stdout" is not used as a target
       file for logging.  (The Debian "bochsrc" template unfortunately does this
       in two places.  It suffices to comment out these entries.)
     - To make Fail* terminate if something unexpected happens in a larger
       campaign, be sure it doesn't "ask" in these cases, e.g.:
         panic: action=fatal
         error: action=fatal
         info: action=ignore
         debug: action=ignore
         pass: action=ignore
     - If you need a quick-and-dirty way to pass data from the guest system to
       the outside world, and you don't want to write an experiment utilizing
       GuestEvents, you can use the "port e9 hack", which prints every byte the
       guest writes (outb) to port 0xe9 on the console:
         port_e9_hack: enabled=1
     - Determinism: (Fail)Bochs is deterministic regarding timer interrupts,
       i.e., two experiment runs after calling simulator.restore() will count
       the same number of instructions between two interrupts.  However, be
       careful when running (Fail)Bochs with a GUI enabled: Typing
       "fail-client -q" on the command line may lead to the GUI window receiving
       a "return key released" event, resulting in a keyboard interrupt for the
       guest system.  This can be avoided by starting Bochs with
       "sleep 1; fail-client -q", by suppressing keyboard input
       (CONFIG_DISABLE_KEYB_INTERRUPTS setting in the CMake configuration), or
       by disabling the GUI (see "headless experiments" above).

=========================================================================================
Example experiments and code snippets
=========================================================================================

Experiment "hsc-simple":
**********************************************************************
A simple standalone experiment (without a separate campaign).  To compile this
experiment, the following steps are required:
  1. Add "hsc-simple" to ccmake's EXPERIMENTS_ACTIVATED.
  2. Enable CONFIG_EVENT_BREAKPOINTS, CONFIG_SR_RESTORE and CONFIG_SR_SAVE.
  3. Build Fail* and Bochs, see "how-to-build.txt" for details.
  4. Enter experiment_targets/hscsimple/ and run "bunzip2 -k *.bz2".
  5. Start the Bochs simulator by typing
       $ fail-client -q
     After successfully booting the eCos/hello world example, the console shows
     "[HSC] breakpoint reached, saving", and a hello.state/ subdirectory appears.
     You probably need to adjust the bochsrc's paths to romimage/vgaromimage.
     By default these point to the locations installed by the Debian packages
     "bochsbios" and "vgabios"; alternatively, you may use the BIOSes supplied
     in "${FAIL_DIR}/simulators/bochs/bios/".
  6. Compile the experiment's second step: edit
     fail/src/experiments/hsc-simple/experiment.cc, and change the first
     "#if 1" into "#if 0".  Make an incremental build, e.g., by running
     "${FAIL_DIR}/scripts/rebuild-bochs.sh -" from your ${BUILD_DIR}.
  7. Go back to ../experiment_targets/hscsimple/ (assuming you are in
     ${FAIL_DIR}), and again run
       $ fail-client -q
     After restoring the state, the hello world program's calculation should
     yield a different result.

Experiment "coolchecksum":
**********************************************************************
An example for separate campaign/experiment implementations.  To compile this
experiment, the following steps are required:
  1. Run step #1 (and if you're curious how COOL_ECC_NUMINSTR in
     experimentInfo.hpp was figured out, then step #2) of the experiment
     (analogous to what needed to be done in case of the "hsc-simple"
     experiment, see above).  The experiment's target guest system can be found
     under ../experiment_targets/coolchecksum/.
     (If you want to enable COOL_FAULTSPACE_PRUNING, step #2 is mandatory
     because it generates the instruction/memory access trace needed for
     pruning.)
  2. Build the campaign server (if it wasn't already built automatically):
       $ make coolchecksum-server
  3. Run the campaign server: bin/coolchecksum-server
  4. In another terminal, run step #3 of the experiment ("fail-client -q").
     Step #3 of the experiment currently runs 2000 experiment iterations and
     then terminates, because Bochs has some memory leak issues.
     You need to re-run fail-client for the next 2000 experiments.

The experiments can be significantly sped up by
  a) parallelization (run more FailBochs clients) and
  b) a headless (and more optimized) Fail* configuration (see above).

Experiment "MHTestCampaign":
**********************************************************************
An example for separate campaign/experiment implementations.
  1. Execute the campaign (job server): ${BUILD_DIR}/bin/MHTestCampaign-server
  2. Run the FailBochs instance in a properly defined environment:
       $ fail-client -q

Experiment "dciao-kernelstructs":
**********************************************************************
This is an example for a database-driven FI campaign, with a campaign server
instantiating the generic DatabaseCampaign.  The general workflow is as follows:
 - Create a new database for this campaign, e.g., "dciao".  (See
   how-to-build.txt, "Database backend setup")
 - Record a trace using the generic tracing tool: Build a fail-client with the
   "generic-tracing" experiment, and parametrize it with -Wf,[option] (e.g.,
   "fail-client -Wf,--help").  Be aware that the generic tracing "experiment"
   needs a complete simulator environment along with the ELF binary that
   supplies symbol addresses; for FailBochs this means the usual bochsrc and
   boot image files need to be present.
   FIXME: dciao-kernelstructs has not been committed to
   danceos/devel/experiment_targets/, use a "complete" example here
 - Import the trace to the database using the import-trace tool (enable
   BUILD_IMPORT_TRACE in the CMake configuration to get this built).  The
   --variant/--benchmark parameters are only significant if you intend to
   evaluate multiple protection variants and/or benchmarks.  Currently the
   following importers (--importer X) are implemented:
     MemoryImporter:
       Imports fault locations for single-location single-event RAM faults
       (e.g., single-bit flips, or burst faults within the same byte).
       Directly maps memory access events from the trace to the DB.
     AdvancedMemoryImporter:
       A MemoryImporter that additionally imports Relyzer-style conditional
       branch history, instruction opcodes, and a virtual
       duration = time2 - time1 + 1 column.
     RegisterImporter:
       Imports fault locations for single-location single-event register-file
       faults (e.g., single-bit flips in general-purpose registers).  Considers
       only instruction addresses from the trace, disassembles corresponding
       instructions in the supplied ELF binary (using LLVM's disassembler
       library), and extracts used/defined registers.  Registers are mapped
       to/from "addresses" with fail::LLVMtoFailTranslator.
     InstructionImporter:
       Interprets every instruction fetch as a memory read, and handles it the
       same way the MemoryImporter handles "normal" memory accesses.
       Implements a fault model with faults in CPU instructions.
     RandomJumpImporter:
       Implements multi-bit faults in the instruction pointer.  As the IP is
       read before every instruction, the fault space explodes rapidly.
       Should therefore be limited to small memory areas to jump from/to.
   Note that specifying an importer with --importer adds more parameters to the
   --help output in some cases.
   DB detail: This tool creates and fills the variant and trace tables.
 - Prune the fault space with the prune-trace tool (enable BUILD_PRUNE_TRACE in
   the CMake configuration).  This prepares all information necessary for
   running the FI campaign.  Currently only the "basic" pruning method is
   available, applying usual def/use pruning.
   DB detail: This tool creates and fills the fsppilot, fspgroup and fspmethod
   tables.
 - Run the campaign server as usual.  In this case it does not do much more
   than to instantiate the generic DatabaseCampaign, and to parametrize it with
   the experiment-specific protobuf message type (DCIAOKernelProtoMsg) which
   must adhere to some structural constraints (a required
   DatabaseCampaignMessage fsppilot member, and a repeated group Result that
   contains no further subgroups or repeated fields).
   The DatabaseCampaign takes care of distributing unfinished jobs, and
   collecting the results in an automatically created result table with columns
   corresponding to the fields in the Result group of the protobuf message.

=========================================================================================
Parallelization
=========================================================================================
Fail* is designed to parallelize experiment execution, which reduces the time
needed to run the experiments on a (larger) set of experiment data (i.e., input
parameters for the experiment execution, e.g., instruction pointer, registers,
bit numbers, ...).  We call this experiment data the "parameter sets".  The
so-called "campaign" manages the parameter sets (i.e., the data to be used by
the experiment flows) that the clients request.  Consequently, the campaign
runs on the server side and the experiment flows run on the (distributed)
clients.

First of all, the Fail* instances (and other required files, e.g., saved state)
are distributed to the clients.  In the second step the campaign (server) is
started, preparing its parameter sets in order to be able to answer the
requests from the clients.  (Once parameter sets are available, the clients can
request them.)  In the final step, the distributed Fail* clients have to be
started.  As soon as this setup is finished, the clients iteratively request
new parameter sets, execute their experiment code and return their results to
the server (aka campaign), until all parameter sets have been processed
successfully.  Once all (new) parameter sets have been distributed, the
campaign starts to re-send unfinished parameter sets to requesting clients in
order to speed up the overall campaign execution.  Additionally, this ensures
that all parameter sets will produce a corresponding result set.
(If, for example, a client terminates abnormally, no result is sent back.  This
scenario is dealt with by this mechanism, too.)

Shell scripts supporting experiment distribution:
**********************************************************************
These can be found in ${FAIL_DIR}/scripts/ (for now have a look at the script
files themselves, they contain some documentation):
 - fail-env.sh: Environment variables for distribution/parallelization host
   lists etc.; don't modify in-place but edit your own copy!
 - distribute-experiment.sh: Distribute necessary FailBochs ingredients to
   experiment hosts.
 - runcampaign.sh: Locally run a campaign server, and a large number of clients
   on the experiment hosts.
 - multiple-clients.sh: Is run on an experiment host by runcampaign.sh, starts
   several instances of client.sh in a tmux session.
 - client.sh: (Repeatedly) Runs a single fail-client instance.

Some useful things to note:
**********************************************************************
 - Using the distribute-experiment.sh script causes the local fail-client
   binary to be copied to the hosts.  If the binary is not present in the
   current directory, the default fail-client binary (-> $ which fail-client)
   will be used.  If you have modified some of your experiment code (i.e., your
   fail-client binary will change), don't forget to delete the local
   fail-client binary in order to distribute the *new* binary.
 - The runcampaign.sh script prints some status information about the clients
   recently started.  In addition, there will be a few error messages
   concerning ssh, tmux and so on.  They can be ignored for now.
 - The runcampaign.sh script starts the coolchecksum-server.  Note that the
   server instance will terminate immediately (without notice) if there is
   still an existing coolcampaign.csv file.
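
The distribution scheme described under "Parallelization" (hand out new
parameter sets first, then re-send unfinished ones until every set has a
result) can be illustrated with a small Python model.  This is only a sketch
under simplifying assumptions: the CampaignModel class and its method names are
hypothetical and are not Fail*'s campaign server API, and parameter sets are
reduced to plain integers.

```python
from collections import deque


class CampaignModel:
    """Toy model of the job distribution (hypothetical, not Fail*'s real API)."""

    def __init__(self, parameter_sets):
        self.fresh = deque(parameter_sets)  # never handed out so far
        self.in_flight = deque()            # handed out, result possibly missing
        self.results = {}                   # parameter set -> result

    def request(self):
        """A client asks for work; returns a parameter set, or None when done."""
        if self.fresh:
            job = self.fresh.popleft()
            self.in_flight.append(job)
            return job
        # All new parameter sets are distributed: re-send unfinished ones.
        while self.in_flight:
            job = self.in_flight.popleft()
            if job not in self.results:
                self.in_flight.append(job)  # keep it until a result arrives
                return job
        return None

    def submit(self, job, result):
        """A client returns a result; duplicates of re-sent jobs are ignored."""
        self.results.setdefault(job, result)

    def done(self):
        return not self.fresh and all(j in self.results for j in self.in_flight)


campaign = CampaignModel(range(5))
lost = campaign.request()            # a client takes a job and terminates abnormally
while (job := campaign.request()) is not None:
    campaign.submit(job, job * job)  # a healthy client processes what it gets
print(sorted(campaign.results))
```

In this model the job taken by the crashed client is handed out a second time
once no new parameter sets are left, so every parameter set still ends up with
a result.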

=========================================================================================
Steps to run an experiment with gem5:
=========================================================================================
  1. Create a directory which will be used as the gem5 system directory (which
     will contain the guest system and boot image).  Further called $SYSTEM.
  2. Create two directories $SYSTEM/binaries and $SYSTEM/disks.
  3. Put the guest system kernel into $SYSTEM/binaries and the boot image into
     $SYSTEM/disks.  For ARM targets, you can use the "linux-arm-ael.img" image
     contained in
       http://www.gem5.org/dist/current/arm/arm-system-2011-08.tar.bz2
     As an example, the resulting directory structure might look like this:
       boecke@kos:~/$FAIL_DIR/build/gem5sys$ find
       ./binaries/abo-simple-arm.elf  # your experiment binary (!= gem5)
       ./disks/linux-arm-ael.img      # the ARM image (FIXME: what's this exactly?)
       ./disks/boot.arm               # the ARM bootloader (FIXME: ditto)
  4. Run gem5 in $FAIL_DIR/simulators/gem5/ with:
       $ M5_PATH=$SYSTEM build/ARM/gem5.debug configs/example/fs.py --bare-metal --kernel kernelname

=========================================================================================
Steps to run an experiment with the pandaboard/openocd backend:
=========================================================================================
  1. Prepare an SD card for pandaboard usage, for example by installing an SD
     image of Ubuntu for the pandaboard.
  2. In U-Boot, set the option bootdelay to 0 to enable the pandaboard to
     directly boot the installed kernel without any delay.  (In the first
     partition, add the file "preEnv.txt" and edit its content to
     "bootdelay=0".)
  3. Use "lra-panda-dijkstra" from the "experiment_targets" directory of the
     project svn as a template for your application development.  This gives
     you the needed fault handlers and startup code for bare-metal execution of
     code.  It also provides minimal functionality for serial output.  For
     further reading, see lra-panda-dijkstra/README.
  4. Copy the generated files to the prepared SD card.
  5. Connect the Flyswatter2 to the host computer and to the pandaboard.
  6. As information from the executable (ELF format) and from the trace file
     is needed, these two must be specified with the environment variables
     FAIL_TRACE_PATH and FAIL_ELF_PATH.
  7. Execute the experiment/campaign as usual.  If errors occur, "oocd.log"
     might give you a hint for solving the problem.

=========================================================================================
Example experiments and code snippets
=========================================================================================

Experiment "lra-simple-panda":
**********************************************************************
A campaign experiment to use with the lra-panda-dijkstra experiment target.
 - The CMake option CONFIG_INJECTIONPOINT_HOPS should be enabled for fast trace
   navigation.
 - The campaign uses the DatabaseCampaign module.
 - It will create a result table for the possible experiment outcomes 'OK',
   'ERR_WRONG_RESULT', 'ERR_TRAP', 'ERR_TIMEOUT' and 'ERR_OUTSIDE_TEXT'.
   'ERR_OUTSIDE_TEXT' describes any memory accesses which went outside the
   application memory.
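
Step 6 of the pandaboard/openocd instructions above requires the environment
variables FAIL_TRACE_PATH and FAIL_ELF_PATH.  A quick pre-flight check before
starting the experiment might look like the following Python sketch.  The
check_environment() helper is a hypothetical convenience and not part of Fail*;
it only assumes that both variables should name existing files.

```python
import os

# Variable names as given in step 6 of the pandaboard/openocd instructions.
REQUIRED_VARS = ("FAIL_TRACE_PATH", "FAIL_ELF_PATH")


def check_environment(env=os.environ):
    """Return a list of problems; an empty list means both files were found."""
    problems = []
    for name in REQUIRED_VARS:
        path = env.get(name)
        if not path:
            problems.append(f"{name} is not set")
        elif not os.path.isfile(path):
            problems.append(f"{name} points to a missing file: {path}")
    return problems


# Report anything that would prevent the backend from finding its inputs.
for problem in check_environment():
    print(problem)
```

Passing a plain dictionary instead of os.environ makes it possible to try the
check without touching the real environment.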