christoph/fail - fail - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Florian Lukas	010d4a892d	DatabaseCampaign: fix finished experiments SQL The database queries to fetch all unfinished experiments were broken. The server tried to insert all finished pilot_ids into the temporary result_ids table and then discard all experiments which have the correct (finished) count of IDs in this table. This cannot work as the pilot_id is the only column of result_ids and must be a unique primary key. As a fix, the count of results is stored as a second field in result_ids and the result table is now joined against result_ids to check this field. Change-Id: I6a9fb774825f0cc4ce104c6e51d7b2fe16957aec	2014-03-18 11:18:27 +01:00
Michael Lenz	f775c92d72	dump-trace: keep track of read/written bytes This change adds access-size tracking to dump-trace and output thereof in mode "-s". Change-Id: I5647d7b16c89499b7813faaf8c3844f275bc552a	2014-03-05 11:55:11 +01:00
Florian Lukas	9df6d983bf	util/llvmdisassembler: compile with -fno-rtti For some reason, this is required even when LLVM is not built using -fno-rtti. Change-Id: I992799c8b54135a0a87b2de7c4a3d57f2d3670d9	2014-02-26 14:46:23 +01:00
Horst Schirmeier	6e03753b6e	ecos: modifications for mibench benchmarks Change-Id: Ifecbf24912dbffa814b189aed9336a5420ec6392	2014-02-25 13:32:56 +01:00
Horst Schirmeier	cbf9daea97	ecos: rewrite for DatabaseCampaign + modified resulttypes Change-Id: I463759e66e7497c80eeee9a065fc95e058ec3dc1	2014-02-25 13:32:56 +01:00
Horst Schirmeier	0009a95e62	fail-env updated Change-Id: Idf605dddc4026c9b796d0ef7174e430d9aed1236	2014-02-25 13:32:56 +01:00
Horst Schirmeier	5ccc6e3525	comm: ExperimentData needs a virtual destructor Classes deriving from ExperimentData usually contain the experiment-specific Protobuf message, which needs to be properly destroyed. This is particularly a problem in the generic DatabaseCampaign, as it never downcasts ExperimentData objects retrieved from JobServer::getDone(). As the embedded DatabaseCampaignMessage (usually named "fsppilot") is allocated on the heap (this happens in the campaign's cb_send_pilot() function, asking for a mutable_fsppilot()), the lack of a virtual destructor in ExperimentData led to a memory leak, rendering the campaign server inoperable after handling ~1E7 messages (with a 4GiB / 32-bit process memory limit). Change-Id: I4cb8a26d5a702e03189c4aae340051ce62a9c9ce	2014-02-25 13:32:56 +01:00
Horst Schirmeier	5ee96032c9	jobserver: gracefully handle thread creation failures Due to the previous DatabaseCampaign fix, this may not be necessary anymore, but it's nevertheless a good idea to handle thread creation failures properly. Change-Id: I8317a77dd5338509727e737040944320e7755ae3	2014-02-25 13:32:56 +01:00
Horst Schirmeier	25a390970a	DatabaseCampaign: avoid table locking It is necessary to copy pilot IDs of existing results to a temporary table before fetching undone jobs from the DB: Otherwise, due to MyISAM's table-level locking, collect_result_thread() will block in INSERT (SHOW PROCESSLIST state "Waiting for table level lock") until the (streamed) pilot query finishes. As one pilot query follows after the other, collect_result_thread() may even starve until the memory for the JobServer's "done" queue runs out, resulting in a crash and the loss of all queued results. Change-Id: Ib0ec5fa84db466844b1e9aa0e94142b4d336b022	2014-02-25 13:32:55 +01:00
Horst Schirmeier	b094753fde	doc: missing libraries Change-Id: Ife89a0b3cc74433e4fb711580c5eb3cd82467081	2014-02-25 13:32:55 +01:00
Horst Schirmeier	bc2103c527	sal/bochs: don't show errors in non-verbose mode The patched eCos variant we analyze intentionally overflows the 16550 UART FIFOs, flooding the terminal with Bochs error messages. Enabling CONFIG_BOCHS_NON_VERBOSE now also enforces ignoring error messages, regardless of log verbosity settings in the bochsrc. Change-Id: If14e2532234e61bf60720a45150ef4973e8d508b	2014-02-25 13:32:55 +01:00
Horst Schirmeier	953fbe2156	serialoutput: cleanup Change-Id: I255a3dd44fcf075a441461a883f564ee2d626ee1	2014-02-25 13:32:55 +01:00
Horst Schirmeier	455c088cd9	serialoutput: optional character limit This prevents unlimited memory consumption in case the guest system enters an endless loop. Change-Id: Ia1bb178f7d8cb8ad8bf958210d90f6d7c2e11359	2014-02-25 13:32:55 +01:00
Horst Schirmeier	c319f3458c	serialoutput: consistent plugin class naming Change-Id: I8abe0cfdebecb0adc7229e29bd241da65b27105a	2014-02-25 13:32:55 +01:00
Horst Schirmeier	36ae6fd6c3	prune-trace: use none/none only without any parameters Before this change, running prune-trace with, e.g. "prune-trace -d fsp_mibench -v bitmap% --benchmark-exclude clockcnv" resulted in an implied "--benchmark none", rendering --benchmark-exclude ineffective and resulting in nothing being pruned. Now, the "none" default only applies when neither --benchmark nor --benchmark-exclude (analogously for --variant / --variant-exclude) is provided. Change-Id: Ic7c88919d7cfde1261749a745dc6a679472ff348	2014-02-25 13:32:55 +01:00
Horst Schirmeier	1df43e9726	import-trace: major speedup Using Database::insert_multiple() instead of prepared statements speeds up trace import by a factor of 3-4. While being there, we now properly deal with nonexistent extended trace values (i.e., put NULLs into the DB). Side note: The ElfImporter should switch to insert_multiple(), too. Change-Id: I96785e9775e3ef4f242fd50720d5c34adb4e88a1	2014-02-25 13:32:55 +01:00
Martin Hoffmann	76bda55c5e	Merge "plugins/randomgenerator: add deterministic PRNG plugin"	2014-02-20 11:26:35 +01:00
Horst Schirmeier	69ba9e0f94	dump-trace: properly deal with empty extended trace entries As dereferencing register contents is not always possible, extended trace entries may be empty. Change-Id: I603fcef2eb2b0429a9d6ed0469441bc314e365fd	2014-02-19 19:08:46 +01:00
Horst Schirmeier	b6fc98abae	generic-tracing: remove --save-symbol At least for the Bochs backend there might be side effects when saving the simulator state while tracing, which therefore should be avoided. As there is no known use-case for using a --save-symbol different to --start-symbol, this change disables the semantics behind --save-symbol completely and only keeps the command-line switch for backward compatibility reasons (existing automatic test scripts etc.). The generic-tracing experiment now complains and aborts if a --save-symbol different to --start-symbol is given. Change-Id: I6072d846be96e016534cc83db375a400cfc25303	2014-02-19 19:08:46 +01:00
Horst Schirmeier	836325e74b	generic-tracing: cleanups Change-Id: I5c3d1131248910228cb4fee44cf107c750c01e21	2014-02-19 19:08:46 +01:00
Horst Schirmeier	85152238da	tracing: fix endless loop when only tracing mem accesses With m_tracetype=TRACE_MEM, bool first was never reset to false in the tracing plugin's main loop. This bug was most probably never triggered, though, as nobody only traces memory accesses. This change also slightly simplifies the internal logic in the tracing plugin. Change-Id: I65d7df6a3781ec552cfb892bbf3394b421e227f1	2014-02-19 19:08:46 +01:00
Florian Lukas	b82e547b53	plugins/randomgenerator: add deterministic PRNG plugin A simple plugin which deterministically returns a new random value each time the specified symbol is read. Change-Id: I6ccac421fc064f02a88e8b126f8a26044d1f51c6	2014-02-18 16:40:34 +01:00
Horst Schirmeier	01c1321b48	tracing: bugfix for mem dereferences at mapping boundary As we copy a 32-bit word from the dereferenced address, we also need to check whether address+3 is also mapped. (Yes, I've seen this in the wild.) Change-Id: I43f891c56e077333670c9cb48c0ee8e9342fa41d	2014-02-17 23:24:16 +01:00
Horst Schirmeier	58fa4c59cc	sal/bochs: fix handling of unmapped memory Up to now, BochsMemory::isMapped() always returned true in 32-bit protected mode with a 4GB linear address space (as used by, e.g., eCos), even for addresses greater than the configured memory size. This led to lots of bogus memory dereferences in the (extended) tracing plugin. This change (a follow-up to commit `5171645`) additionally checks the return value of getHostMemAddr(), and announces BX_RW (read/write access) instead of BX_READ as the intended type of memory access. In the aforementioned scenario, memory addresses greater than the memory size are now correctly detected as "not mapped". Change-Id: Ic2fa7554c869cb90191164535a601bae4dbb49b6	2014-02-17 23:24:16 +01:00
Martin Hoffmann	e4bf980b97	Fail* result browser for pruning experiments. Based on the database layout given by the pruner. Run ./run.py -c <path to mysql.cnf> (Default config ~/.my.cnf) - Checks if objdump table exists - Added view for results per instruction - Added config file support for table details - Overview data loaded at server startup - Result type mapping configurable via config file Based on Flask and MySQLdb Change-Id: Ib49eac8f5c1e0ab23921aedb5bc53c34d0cde14d	2014-02-17 10:17:25 +01:00
Horst Schirmeier	b4f144745a	Revert "import-trace: emit warning for malformed traces" Memory accesses that don't belong to the preceding IP event in the trace do have a use case: a hardware interrupt causes the CPU to push its state onto the (kernel) stack. At the moment we cannot distinguish this case from a malformed trace (as we don't record the occurrence of interrupts), hence this warning needs to be disabled for now. This reverts commit `84edd02b6f`.	2014-02-11 14:57:29 +01:00
Horst Schirmeier	8942740d9a	Merge branch 'memorymap'	2014-02-09 14:20:53 +01:00
Christian Dietrich	26308dc9ea	Merge "Removes serial.out from bochs config for massive performance boost on cluster runs."	2014-01-31 13:52:45 +01:00
Philip Taffner	4ba028e740	Removes serial.out from bochs config for massive performance boost on cluster runs. If this output file is enabled, all running processes try to write to the same file on the shared filesystem. They block each other, which leads to massive I/O wait time and CPU idle time. This change reduces the runtime, e.g., from several hours (12+) to a few minutes (20). Change-Id: I028628af31c845fc517e5daca5b4f981eade3cf4	2014-01-31 12:51:58 +01:00
Horst Schirmeier	4b921a5fe3	util: MemoryMap test Change-Id: I54680685326a85cfd723a47e8aef8d71662c9aeb	2014-01-30 15:26:20 +01:00
Horst Schirmeier	4bcce14659	util: space-efficient MemoryMap We now use boost::icl::interval_set internally, consuming far less memory. boost::icl was introduced with Boost 1.46; Debian 7.0 comes with 1.49, so this dependency should be no problem anymore. Both the class interface and the memory-map file format stay the same. Change-Id: I38e8148384c90aa493984d0f6280817df00f1702	2014-01-30 15:26:12 +01:00
Richard Hellwig	119ae40be9	util/Database: added a wrapper function for mysql_real_escape_string() Change-Id: I999aad3c35c5f389fa3acfe8d7a11c417c478787	2014-01-28 11:07:34 +01:00
Richard Hellwig	13175c259b	import-trace: import debug info If the --debug option is set, the line number table of the elf binary will be imported into the database. The information will be stored in the "dbg_mapping" table. If the --sources option is set, the source files will be imported into the database. Only the files that were actually used in the elf binary will be imported. Change-Id: I0e9de6b456bc42b329c1700c25e5839d9552cdbb	2014-01-28 11:07:34 +01:00
Christian Dietrich	d307dd2ecb	dciao-kernelstructs: reuse sobres experiment for ISORC2014 Differences: - the task activation order is determined in the faulty experiment as well as in the golden run (which is now done by fail-generic-tracing) by observing a variable fail_virtual_port. - There is a panic value read from the fail_virtual_port - The golden run task activation is determined by giving an extended trace to task_activation.py. The script collects all writes to fail_virtual_port, and determines the activation from this. Change-Id: Id401b78933b45a4b2cf031fc0a8b5ac90151ec24	2014-01-27 10:32:09 +01:00
Horst Schirmeier	c48c7296fb	util/WallclockTimer: bugfix: include ostream This only compiled everywhere because all users included (i)ostream. Change-Id: I29b0fb13a01606fdffd8ebdb9701eff652065916	2014-01-24 20:33:32 +01:00
Horst Schirmeier	85e3911202	Merge branch 'ubuntu-saucy-fixes'	2014-01-24 17:02:44 +01:00
Horst Schirmeier	17e76c140b	cpn: needs comm and MySQL at link time The dependency on fail-comm exists not only at compile time (the latter is due to protobuf header generation). Change-Id: I2bae51e763d9a385bda94e77df3e88619fa28a30	2014-01-23 14:31:24 +01:00
Horst Schirmeier	4cb97a7fa5	formatting, typos, comments, details Change-Id: Iae5f1acb653a694622e9ac2bad93efcfca588f3a	2014-01-22 13:08:13 +01:00
Horst Schirmeier	7591c9edc5	Merge branch 'jobclientserver-fixes'	2014-01-22 13:07:59 +01:00
Michael Lenz	e37f2db4be	Merge "prune-trace: use the first write pilot instead of any"	2014-01-22 09:58:44 +01:00
Michael Lenz	4ccddeb137	prune-trace: use the first write pilot instead of any In some cases the write-pilot is located at the upper boundary of the experiment and thus is in a race situation with the experiment's end. If the experiment's end occurs first, the campaign ends and complains about missing data, otherwise everything is fine. This patch circumvents this via using "the first" writing pilot; iff the only write is located at the experiment's end, the race will still occur, but cleverly written experiment code can, according to hsc, circumvent it. Change-Id: I6a27a8c4770c04ea8dcaef8aa7bd85d18f43f0b5	2014-01-22 09:06:27 +01:00
Richard Hellwig	5fbf13d07d	gem5: TrapListener implemented The TrapListener works like in Bochs. For GEM5, the offset is returned instead of a trap number. See: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0211h/Babfeega.html Conflicts: simulators/gem5/src/cpu/simple/atomic.cc Change-Id: Ia8b2083e3c16315d9c577150f14f16995494b2e6	2014-01-21 16:09:08 +01:00
Richard Hellwig	fa1690bd1f	Merge "core/sal: Added features that indicate whether FAIL* is initialized"	2014-01-21 15:35:22 +01:00
Horst Schirmeier	813414984c	util: boost::thread 1.53 depends on boost::system Unfortunately this implicit dependency is currently not resolved anywhere else (e.g., FindBoost.cmake), although the 'net heavily discusses this issue. Change-Id: I8a7c8518394cdba27e591fed250623011d988067	2014-01-21 00:29:34 +01:00
Lars Rademacher	4e21b42374	cpn: use strtoul for conversion of unsigned ints As 32-bit libc6 atoi() caps the value of unsigned ints bigger than 2^31-1 (instead of just letting it overflow to the corresponding negative value, as on x86_64), it must not be used especially for the conversion of 32-bit pointers. Change-Id: Ie0821a6f4cd04aebd37ea3d4028b63a05373810f	2014-01-21 00:10:56 +01:00
Horst Schirmeier	122eb8c9dc	use uint32 for addresses in protobuf msgs This prevents integer overflows when using addresses > 2GiB, which are common for x86 operating systems with paging (Linux, Fiasco.OC) or some test cases on the PandaBoard. Note that this results in slightly different result table definitions when automatically translating an experiment's protobuf message in the DatabaseCampaign. This change affects all existing protobuf messages to prevent copy/paste propagation of this issue. Change-Id: I09ec4b9d45eddd67a7a24c8b101e8b2b258df5e2	2014-01-21 00:08:41 +01:00
Horst Schirmeier	de39bf6120	jobclient: use initializer list Change-Id: I7eb42f947bbabd61e1aad9224cedd7ffceec4f10	2014-01-20 22:48:08 +01:00
Horst Schirmeier	5ffcb82138	jobclient: initial number of jobs configurable The new CLIENT_JOB_INITIAL configuration option allows the client to be configured to request more than one job in the first request round. If a reasonable initial value is chosen, this removes the job ramp-up after each fail-client restart, and slightly improves overall throughput. Change-Id: Idac2721264ec264c520d341fac64a8311a974708	2014-01-20 22:48:08 +01:00
Horst Schirmeier	2c31bf79b0	jobclient: expect communication failures This change makes the JobClient act properly on communication aborts. Change-Id: I0a76489f117e9721546215e3b627002605e25452	2014-01-20 22:48:08 +01:00
Horst Schirmeier	882d4f381b	jobclient: bugfix: faster shutdown at campaign end The JobClient currently waits a LONG time until it really shuts down after not having reached the server in sendResultsToServer() (which is unfortunately the by far most probable point in the code to determine this): - A different bug (fixed in the previous commit) provoked the situation that a (way) too large amount of jobs was fetched before. - sendResult() (called after each experiment iteration) realized that CLIENT_JOB_REQUEST_SEC seconds are over, and tried to prematurely call home to send first results (without planning to get new jobs yet). - If the server was gone (done, or aborted), connect in sendResultsToServer() failed after several retries and timeouts. - All subsequent calls to sendResult() retried connecting to the server (again, with retries and timeouts), once for each remaining job. - When all jobs were done, getParam() tries to connect a last time, finally telling the experiment that nobody's home. This resulted in client shutdown times of up to four hours (for the default CLIENT_JOB_LIMIT of 1000) after the campaign server terminated. This change solves the issue by not handing out new (cached) jobs after the connect failed once, making the experiment terminate quickly. Change-Id: I0d8cb2e084d783aca74c51a503fa72eb2b2eb0b7	2014-01-20 22:48:08 +01:00

1011 Commits