diff --git a/doc/how-to-build.txt b/doc/how-to-build.txt
index 14c36dde..ac81eaf0 100644
--- a/doc/how-to-build.txt
+++ b/doc/how-to-build.txt
@@ -2,13 +2,11 @@ Additional libraries/packages/tools needed for Fail*:
 =========================================================================================
-Required anyway:
+Required for Fail*:
 **********************************************************************
  - libprotobuf-dev
- - libpthread
  - libpcl1-dev
- - libboost-dev
- - libboost-all-dev (or at least libboost-thread-dev)
+ - libboost-thread-dev
  - protobuf-compiler
  - cmake
  - cmake-curses-gui
@@ -16,6 +14,11 @@ Required anyway:
  obtained from http://www.aspectc.org; nightlies can be downloaded from
  http://akut.aspectc.org
+Required for the Bochs simulator backend:
+**********************************************************************
+ - libpthread
+ - Probably more, depending on, e.g., the GUI you configure (X11 ->
+ libxrandr-dev)
 For distribution/parallelization:
 **********************************************************************
@@ -25,9 +28,14 @@ For distribution/parallelization:
 32-bit FailBochs on x86_64 Linux machines:
 **********************************************************************
- - libc6-i386 + all libraries listed by
- $ ldd bochs|awk '{print $3}'
- in ~/bochslibs (client.sh will add these to LD_LIBRARY_PATH)
+ - Create a "bochslibs" directory and fill it with all necessary libraries from
+ your build machine:
+ $ mkdir bochslibs
+ $ cp -v $(ldd fail-client|awk '{print $3}'|egrep -v '\(|lib(pthread|selinux|c.so.)|^$') bochslibs/
+ - Copy this directory to ~/bochslibs on all machines lacking these libraries
+ (this may also be the case for i386 machines you cannot install library
+ packages on yourself). client.sh will add ~/bochslibs to LD_LIBRARY_PATH if
+ it exists.
 =========================================================================================
 Compiling, building and modifying: Simulators and Fail*
@@ -44,18 +52,23 @@ For the first time:
  $ find -name CMakeCache.txt | xargs rm
  3. Create out of source build directory (${BUILD_DIR}, see also "fail-structure.txt"):
  $ mkdir build
+ Note that currently this build directory must be located somewhere below
+ the fail/ directory, as generated .ah files in there will not be included
+ in the compile process otherwise.
  4. Enter out-of-source build directory. All generated files end up there.
  $ cd build
  5. Generate CMake environment.
  $ cmake ..
  6. Setup build configuration by opening the CMake configuration tool
  $ ccmake .
- Select "BUILD_BOCHS" or "BUILD_OVP". Select an experiment to enable by naming its
- "experiments/" subdirectory under "EXPERIMENTS_ACTIVATED". Configure Fail* features
- you need for this experiment by enabling "CONFIG_*" options. Press 'c', 'g' to
- regenerate the build system. (Alternatively use
+ Select "BUILD_BOCHS" or "BUILD_OVP". Select an experiment to enable by
+ naming its "experiments/" subdirectory under "EXPERIMENTS_ACTIVATED".
+ Configure Fail* features you need for this experiment by enabling
+ "CONFIG_*" options. Press 'c', 'g' to regenerate the build system.
+ (Alternatively use
  $ cmake-gui .
- for a Qt GUI.) To enable a Debug build, choose "Debug" as the build type.
+ for a Qt GUI.) To enable a Debug build, choose "Debug" as the build type,
+ otherwise choose "Release".
  7. Additionally make sure Bochs is at least configured (see below).
@@ -64,13 +77,11 @@ After changes to Fail* code:
 Prerequisite, if you're building with Bochs: configure Bochs (see below).
 Compile (in ${BUILD_DIR}, optionally "add -jN" for parallel building):
  $ make
-CMake will build all Fail* libraries, merge them into a libfail.a and put it into
-"${FAIL_DIR}/src". (As the current Bochs Makefile expects it there.)
The static
-library contains all core components and activated experiments/plugings.
-You may use the shell script
+CMake will build all Fail* libraries and link them with the simulator backend
+library to a binary called "fail-client". You may use the shell script
  $ ${FAIL_DIR}/scripts/rebuild-bochs.sh [-]
-to speed up repetitive tasks regarding Fail/Bochs builds. This script contains a
-concise documentation on itself.
+to speed up repetitive tasks regarding Fail/Bochs builds. This script contains
+a concise documentation on itself.
 Add new Fail* sources to build chain:
@@ -99,7 +110,7 @@ to be compiled previously:
  $ cd src/core/doc/latex; make
-Building Bochs:
+Building FailBochs:
 **********************************************************************
 For the first time:
@@ -122,20 +133,16 @@ For the first time:
  FIXME: Remove more redundant flags/libraries
-After changes to Bochs code or Bochs-affecting aspects:
+After changes to Bochs code:
 ------------------------------------------------------------
- - Compiling: The make call from the make-ag++.sh is now invokable by calling
- (still in ${BUILD_DIR}, optionally adding -jN for parallel building):
- $ cd ../build %% FIXME: involviert make bochs (im build-Ver.) wirklich make-ag++.sh?
- $ make bochs
- (Of course, this requires a configured Bochs/Fail*.)
- - Cleaning up: The former make all-clean is now invokable by
+ - Just re-run "make" in ${BUILD_DIR}, or call "scripts/rebuild-bochs.sh -".
+ The latter automatically runs "make install" after rebuilding fail-client
+ (and probably the experiment's campaign server).
+ - Cleaning up (forcing a complete rebuild of libfailbochs.a next time):
  $ make bochsallclean
- - Installing: For installing the bochs executable (former "make install")
- $ make bochsinstall
- (See "make help" for a target listing.)
- - Note: You may use scripts/rebuild-bochs.sh to speed up repetitive tasks regarding
- Fail/Bochs builds. This script contains a concise documentation on itself.
+ This is especially necessary if you changed a Bochs-affecting aspect header
+ (.ah), as the build system does not know about Bochs sources depending on
+ certain aspects.
 Debug build:
@@ -143,6 +150,8 @@ Debug build:
 Configure Bochs to use debugging-related compiler flags (expects to be in ${BUILD_DIR}):
  $ cd ../simulator/bochs
  $ CFLAGS="-g -O0" CXXFLAGS="-g -O0" ./configure --prefix=... ... (see above)
+You might additionally want to configure the rest of Fail* into debug mode by
+setting CMAKE_BUILD_TYPE to "Debug" (ccmake, see above).
 Profiling-based optimization build:
diff --git a/doc/how-to-use.txt b/doc/how-to-use.txt
index 1adb8fa8..7d1045bf 100644
--- a/doc/how-to-use.txt
+++ b/doc/how-to-use.txt
@@ -38,13 +38,16 @@ based on the "${PREFIX}/share/doc/bochs/bochsrc-sample.txt" template (or
  0xe9 to the console:
  port_e9_hack: enabled=1
  - Determinism: (Fail)Bochs is deterministic regarding timer interrupts,
- i.e., two experiment runs after calling simulator.restore() will count the
- same number of instructions between two interrupts. Though, you need to be
- careful when running (Fail)Bochs with a GUI enabled: Typing "bochs -q"
+ i.e., two experiment runs after calling simulator.restore() will count
+ the same number of instructions between two interrupts. Though, you
+ need to be careful when running (Fail)Bochs with a GUI enabled: Typing
+ fail-client -q
  on the command line may lead to the GUI window receiving a "return key
  released" event, resulting in a keyboard interrupt for the guest system.
- This can be avoided by starting Bochs with "sleep 1; bochs -q", or
- disabling the GUI (see "headless experiments" above).
+ This can be avoided by starting Bochs with "sleep 1; fail-client -q", by
+ suppressing keyboard input (CONFIG_DISABLE_KEYB_INTERRUPTS setting in
+ the CMake configuration), or disabling the GUI (see "headless
+ experiments" above).
 =========================================================================================
 Example experiments and code snippets
@@ -56,10 +59,10 @@ A simple standalone experiment (without a separate campaign). To compile this
 experiment, the following steps are required:
  1. Add "hsc-simple" to ccmake's EXPERIMENTS_ACTIVATED.
  2. Enable CONFIG_EVENT_BREAKPOINTS, CONFIG_SR_RESTORE and CONFIG_SR_SAVE.
- 3. Build Fail* and Bochs, see "how-to-build.txt" for details-
+ 3. Build Fail* and Bochs, see "how-to-build.txt" for details.
  4. Enter experiment_targets/hscsimple/, bunzip2 -k *.bz2
  5. Start the Bochs simulator by typing
- $ bochs -q
+ $ fail-client -q
  After successfully booting the eCos/hello world example, the console shows
  "[HSC] breakpoint reached, saving", and a hello.state/ subdirectory appears.
  You probably need to adjust the bochsrc's paths to romimage/vgaromimage.
@@ -71,8 +74,8 @@ experiment, the following steps are required:
  into "#if 0". Make an incremental build, e.g., by running
  "${FAIL_DIR}/scripts/rebuild-bochs.sh -" from your ${BUILD_DIR}.
  7. Back to ../experiment_targets/hscsimple/ (assuming, your are in ${FAIL_DIR}),
- run
- $ bochs -q
+ again run
+ $ fail-client -q
  After restoring the state, the hello world program's calculation should yield
  a different result.
@@ -88,13 +91,14 @@ experiment, the following steps are required:
  ../experiment_targets/coolchecksum/. (If you want to enable COOL_FAULTSPACE_PRUNING,
  step #2 is mandatory because it generates the instruction/memory access trace needed
  for pruning.)
- 2. Build the campaign server: make coolchecksum-server
+ 2. Build the campaign server (if it wasn't already built automatically):
+ $ make coolchecksum-server
  3. Run the campaign server: bin/coolchecksum-server
- 4. In another terminal, run step #3 of the experiment ("bochs -q").
+ 4. In another terminal, run step #3 of the experiment ("fail-client -q").
 Step #3 of the experiment currently runs 2000 experiment iterations and then
 terminates, because Bochs has some memory leak issues. You need to re-run
-Bochs for the next 2k experiments.
+fail-client for the next 2k experiments.
 The experiments can be significantly sped up by
  a) parallelization (run more FailBochs clients and
@@ -104,9 +108,9 @@ The experiments can be significantly sped up by
 Experiment "MHTestCampaign":
 **********************************************************************
 An example for separate campaign/experiment implementations.
- 1. Execute Campaign (job server): ${BUILD_DIR}/bin/MHTestCampaign-server
+ 1. Execute campaign (job server): ${BUILD_DIR}/bin/MHTestCampaign-server
  2. Run the FailBochs instance, in properly defined environment:
- $ bochs -q
+ $ fail-client -q
 =========================================================================================
 Parallelization
@@ -120,17 +124,16 @@ flows), inquired by the clients. As a consequence, the campaign is running on th
 side and the experiment flow are running on the (distributed) clients. First of all,
 the Fail* instances (and other required files, e.g. saved state) are distributed to the
 clients. In the second step the campaign(-server) is started, preparing
-it's parameter-sets in order to be able to answer the requests from the clients. (Once
-there are available parameter-sets, the clients can request them.) In the final step,
+its parameter sets in order to be able to answer the requests from the clients. (Once
+there are available parameter sets, the clients can request them.) In the final step,
 the distributed Fail* clients have to be started. As soon as this setup is finished,
-the clients request new parameter-sets, execute their experiment code and return their
-results to the server (aka campaign) in an iterative way, until all paremeter-sets have
-been processed successfully.
If all (new) parameter-sets have been distributed, the
-campaign starts to resend unfinished parameter-sets to requesting clients in order to
+the clients request new parameter sets, execute their experiment code and return their
+results to the server (aka campaign) in an iterative way, until all parameter sets have
+been processed successfully. If all (new) parameter sets have been distributed, the
+campaign starts to re-send unfinished parameter sets to requesting clients in order to
 speed up the overall campaign execution. Additionally, this ensures that all parameter
 sets will produce a corresponding result set. (If, for example, a client terminates
-abnormally, no result is send back. This scenario is managed by this "resend-mechanism"
-of the campain, too.)
+abnormally, no result is sent back. This scenario is dealt with by this mechanism, too.)
 Shell scripts supporting experiment distribution:
@@ -145,27 +148,30 @@ themselves, they contain some documentation):
  clients on the experiment hosts.
  - multiple-clients.sh: Is run on an experiment host by runcampaign.sh, starts
  several instances of client.sh in a tmux session.
- - client.sh: (Repeatedly) Runs a single FailBochs instance.
+ - client.sh: (Repeatedly) Runs a single fail-client instance.
 Some useful things to note:
 **********************************************************************
  - Using the distribute-experiment.sh script causes the local fail-client binary to
  be copied to the hosts. If the binary is not present in the current directory
- the default bochs binary (-> $ which bochs) will be used. If you have modified
- some of your experiment code (i.e., your bochs binary will change), don't
- forget to delete the local bochs binary in order to distribute the *new* binary.
+ the default fail-client binary (-> $ which fail-client) will be used.
If you
+ have modified some of your experiment code (i.e., your fail-client binary will
+ change), don't forget to delete the local fail-client binary in order to
+ distribute the *new* binary.
  - The runcampaign.sh script prints some status information about the clients
  recently started. In addition, there will be a few error messages concerning
  ssh, tmux and so on. They can be ignored for now.
  - The runcampaign.sh script starts the coolchecksum-server. Note that the server
- instance will terminate immediatly (without notice), if there is still an
+ instance will terminate immediately (without notice), if there is still an
  existing coolcampaign.csv file.
  - In order to make the performance gains (mentioned above) take effect, a "workload
  balancing" between the server and the clients is mandatory. This means that
- the communication overhead (client <-> server) and the time, needed to execute
+ the communication overhead (client <-> server) and the time needed to execute
  the experiment code on the client-side should be in due proportion. More
  specifically, for each experiment there will be exactly 2 TCP connections
- (send parameter-set to client, send result to server) established. Therefore
- you should ensure that the execution time of the experiment is "long enough"
- (heuristic). (See existing experiments for examples.)
+ (send parameter set to client, send result to server) established. Therefore
+ you should ensure that the jobs you distribute take enough time not to
+ overflow the server with requests. You may need to bundle parameters for
+ more than one experiment if a single experiment only takes a few hundred
+ milliseconds. (See existing experiments for examples.)
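Illustration (not part of the patch): the bundling advice in the last bullet can be turned into a rough sizing rule. The sketch below assumes you have measured the runtime of a single experiment and picked a minimum runtime per distributed job. Both values, and the helper name bundle_size, are hypothetical and do not come from Fail* itself. It only computes how many experiments to put into one parameter set.

```shell
#!/bin/sh
# Hypothetical helper (not part of Fail*): number of experiments to bundle
# into one job so that the job runs for at least target_ms milliseconds.
#   $1: measured runtime of a single experiment in ms (must be > 0)
#   $2: desired minimum runtime of one job in ms (an assumed tuning value)
bundle_size() {
    per_experiment_ms=$1
    target_ms=$2
    # Integer ceiling division: ceil(target_ms / per_experiment_ms)
    n=$(( (target_ms + per_experiment_ms - 1) / per_experiment_ms ))
    # Never bundle fewer than one experiment per job.
    [ "$n" -lt 1 ] && n=1
    echo "$n"
}

# Example: experiments take about 300 ms, jobs should run at least 5 s.
bundle_size 300 5000
```

With these example numbers the call prints 17, i.e. one job would carry 17 experiments and the two TCP connections per job are spread over roughly five seconds of work. What counts as a suitable minimum job runtime depends on the number of clients and the server, so treat the 5 s value as a placeholder.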