christoph/fail - fail - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Florian Lukas	ba774a258c	util/llvmdisassembler: fix section end symbols Somehow, while iterating symbols in a section, it can happen that the last symbol start address is equal to the section size, which means it is beyond the section end. In this case the LLVM getInstruction() method does not return a failure, but a zero-size instruction, resulting in an infinite loop. Now, if beyond section limits, the iteration is aborted. Additionally, an assertion checks for disassembled zero-size instructions. Change-Id: Id8a355475161150d3ee919cd6cf603d4ff26b228	2014-04-03 15:02:57 +02:00
Horst Schirmeier	442069dd45	cmake: added missing link-time dependency cpn->util (for synchronized queue implementations) Change-Id: I7b32273b8e76a7b7921af117fdf3ca5af2f42553	2014-04-02 10:43:48 +02:00
Horst Schirmeier	7694195f1b	cmake: added missing link-time dependency efw->util Change-Id: Iea812ed881f67e452827a718a52517c7328b0729	2014-04-01 16:15:58 +02:00
Horst Schirmeier	5378620b63	core/sal: fix warning for 32-bit Bochs (warning: unused variable ‘regdata’) Change-Id: I2b5b0ab7d920dc060ec814494e9817f1f49496a9	2014-03-26 17:47:42 +01:00
Horst Schirmeier	ba60ecc4e6	core/sal: comment typos Change-Id: I9ac66d7d1afa22dc6645433736e1c1d38a4e23fa	2014-03-26 17:47:37 +01:00
Horst Schirmeier	136d397a52	core/sal: identical register IDs for 32 and 64 bit With this change, x86 special-purpose registers (e.g., EFLAGS) get the same register ID in both 32 bit and 64 bit configurations. Change-Id: I69db9397481414f99ca05ecb0ea9dc8ab7d989c9	2014-03-26 17:47:25 +01:00
Horst Schirmeier	1ad2eb5110	core/sal: fix x86 register IDs for x86_64 This bug mapped EFLAGS to the same Fail* register ID as R9 on x86_64. This probably has not had consequences yet, as most FailBochs users use 32-bit code. Change-Id: I00a680675bb9e73c2781276f3ef651162c8e4445	2014-03-26 17:27:18 +01:00
Florian Lukas	0799e52fde	util/llvmdisassembler: map registers by names Internal LLVM register IDs can and did change between LLVM versions. These magic integers are replaced by iterating over all LLVM registers and mapping them to FAIL* registers by name. As this iteration requires an LLVM object created from a binary, a static convenience function is added to LLVMtoFailTranslator which creates a translator given the binary filename. Building this functionality inside libfail-llvmdisassembler prevents experiments from needing to add LLVM includes and library definitions. Change-Id: I27927f40d5cb6d9a22bb2caf21ca2450f6bcb0b8	2014-03-24 15:01:09 +01:00
Florian Lukas	21f5f681e0	core/sal: add x86 control and segment registers New register type RT_CONTROL: CR0, CR2, CR3, CR4. New register type RT_SEGMENT: CS, DS, ES, FS, GS, SS. Reading/writing is mostly untested except for CR3... Change-Id: I0d0fba4a1669153ab2577e82ab64a04cf2bbfb94	2014-03-24 14:32:54 +01:00
Florian Lukas	396e00ce59	cmake: static library dependencies CMake does not support linker groups, which were used to "automatically" fix circular dependencies between different static FAIL* libraries, and the ordering of dynamic external libraries broke linking. However, CMake can correctly invoke the linker if dependencies are described correctly (even if circular). This required changing all add_dependencies calls between libraries to target_link_libraries (which creates a link-time dependency) and linking all experiments to fail-sal. Change-Id: I3a0d5dddb9b3d963ef538814e20d6b3de85d4ec5	2014-03-24 11:47:46 +01:00
Florian Lukas	5567c595fb	DatabaseCampaign: experiment completion checks If the queue for outbound jobs is not unlimited, experiment rows are fetched from the DB server continuously as experiments finish. When this takes too long, the connection to the DB server can be lost. The code did not check for a mysql_error and assumed the result set was fetched completely, thus skipping a potentially large amount of experiments (in our case only ~20000 of 400000+ experiments were run). This change adds checks to determine whether the result fetch loop finished due to an error, and compares the sent pilot count with the unfinished experiment count. Additionally, the mysql result object is correctly freed. The underlying problem of MySQL connection loss can hopefully be prevented by increasing timeouts in the MySQL config as described in doc/how-to-build.txt. To prevent the problem from occurring when this is forgotten, this change reverts the default job queue length to be unlimited (SERVER_OUT_QUEUE_SIZE=0), at the cost of increased memory usage. Change-Id: I09d9faddd8190c6dd5fbe733a0679a733d5837ec	2014-03-21 11:36:38 +01:00
Florian Lukas	010d4a892d	DatabaseCampaign: fix finished experiments SQL The database queries to fetch all unfinished experiments were broken. The server tried to insert all finished pilot_ids into the temporary result_ids table and then discard all experiments which have the correct (finished) count of IDs in this table. This cannot work as the pilot_id is the only column of result_ids and must be a unique primary key. As a fix, the count of results is stored as a second field in result_ids and the result table is now joined against result_ids to check this field. Change-Id: I6a9fb774825f0cc4ce104c6e51d7b2fe16957aec	2014-03-18 11:18:27 +01:00
Florian Lukas	9df6d983bf	util/llvmdisassembler: compile with -fno-rtti For some reason, this is required even when LLVM is not built using -fno-rtti. Change-Id: I992799c8b54135a0a87b2de7c4a3d57f2d3670d9	2014-02-26 14:46:23 +01:00
Horst Schirmeier	5ccc6e3525	comm: ExperimentData needs a virtual destructor Classes deriving from ExperimentData usually contain the experiment-specific Protobuf message, which needs to be properly destroyed. This is particularly a problem in the generic DatabaseCampaign, as it never downcasts ExperimentData objects retrieved from JobServer::getDone(). As the embedded DatabaseCampaignMessage (usually named "fsppilot") is allocated on the heap (this happens in the campaign's cb_send_pilot() function, asking for a mutable_fsppilot()), the lack of a virtual destructor in ExperimentData led to a memory leak, rendering the campaign server inoperable after handling ~1E7 messages (with a 4GiB / 32-bit process memory limit). Change-Id: I4cb8a26d5a702e03189c4aae340051ce62a9c9ce	2014-02-25 13:32:56 +01:00
Horst Schirmeier	5ee96032c9	jobserver: gracefully handle thread creation failures Due to the previous DatabaseCampaign fix, this may not be necessary anymore, but it's nevertheless a good idea to handle thread creation failures properly. Change-Id: I8317a77dd5338509727e737040944320e7755ae3	2014-02-25 13:32:56 +01:00
Horst Schirmeier	25a390970a	DatabaseCampaign: avoid table locking It is necessary to copy pilot IDs of existing results to a temporary table before fetching undone jobs from the DB: Otherwise, due to MyISAM's table-level locking, collect_result_thread() will block in INSERT (SHOW PROCESSLIST state "Waiting for table level lock") until the (streamed) pilot query finishes. As one pilot query follows after the other, collect_result_thread() may even starve until the memory for the JobServer's "done" queue runs out, resulting in a crash and the loss of all queued results. Change-Id: Ib0ec5fa84db466844b1e9aa0e94142b4d336b022	2014-02-25 13:32:55 +01:00
Horst Schirmeier	bc2103c527	sal/bochs: don't show errors in non-verbose mode The patched eCos variant we analyze intentionally overflows the 16550 UART FIFOs, flooding the terminal with Bochs error messages. Enabling CONFIG_BOCHS_NON_VERBOSE now also enforces ignoring error messages, regardless of log verbosity settings in the bochsrc. Change-Id: If14e2532234e61bf60720a45150ef4973e8d508b	2014-02-25 13:32:55 +01:00
Horst Schirmeier	1df43e9726	import-trace: major speedup Using Database::insert_multiple() instead of prepared statements speeds up trace import by a factor of 3-4. While being there, we now properly deal with nonexistent extended trace values (i.e., put NULLs into the DB). Side note: The ElfImporter should switch to insert_multiple(), too. Change-Id: I96785e9775e3ef4f242fd50720d5c34adb4e88a1	2014-02-25 13:32:55 +01:00
Horst Schirmeier	58fa4c59cc	sal/bochs: fix handling of unmapped memory Up to now, BochsMemory::isMapped() always returned true in 32-bit protected mode with a 4GB linear address space (as used by, e.g., eCos), even for addresses greater than the configured memory size. This led to lots of bogus memory dereferences in the (extended) tracing plugin. This change (a follow-up to commit `5171645`) additionally checks the return value of getHostMemAddr(), and announces BX_RW (read/write access) instead of BX_READ as the intended type of memory access. In the aforementioned scenario, memory addresses greater than the memory size are now correctly detected as "not mapped". Change-Id: Ic2fa7554c869cb90191164535a601bae4dbb49b6	2014-02-17 23:24:16 +01:00
Horst Schirmeier	4b921a5fe3	util: MemoryMap test Change-Id: I54680685326a85cfd723a47e8aef8d71662c9aeb	2014-01-30 15:26:20 +01:00
Horst Schirmeier	4bcce14659	util: space-efficient MemoryMap We now use boost::icl::interval_set internally, consuming drastically less memory. boost::icl was introduced with Boost 1.46; Debian 7.0 comes with 1.49, so this dependency should be no problem anymore. Both the class interface and the memory-map file format stay the same. Change-Id: I38e8148384c90aa493984d0f6280817df00f1702	2014-01-30 15:26:12 +01:00
Richard Hellwig	119ae40be9	util/Database: added a wrapper function for mysql_real_escape_string() Change-Id: I999aad3c35c5f389fa3acfe8d7a11c417c478787	2014-01-28 11:07:34 +01:00
Richard Hellwig	13175c259b	import-trace: import debug info If the --debug option is set, the line number table of the elf binary will be imported into the database. The information will be stored in the "dbg_mapping" table. If the --sources option is set, the source files will be imported into the database. Only the files that were actually used in the elf binary will be imported. Change-Id: I0e9de6b456bc42b329c1700c25e5839d9552cdbb	2014-01-28 11:07:34 +01:00
Horst Schirmeier	c48c7296fb	util/WallclockTimer: bugfix: include ostream This only compiled everywhere because all users included (i)ostream. Change-Id: I29b0fb13a01606fdffd8ebdb9701eff652065916	2014-01-24 20:33:32 +01:00
Horst Schirmeier	85e3911202	Merge branch 'ubuntu-saucy-fixes'	2014-01-24 17:02:44 +01:00
Horst Schirmeier	17e76c140b	cpn: needs comm and MySQL at link time The dependency on fail-comm exists not only at compile time (due to protobuf header generation) but also at link time. Change-Id: I2bae51e763d9a385bda94e77df3e88619fa28a30	2014-01-23 14:31:24 +01:00
Horst Schirmeier	4cb97a7fa5	formatting, typos, comments, details Change-Id: Iae5f1acb653a694622e9ac2bad93efcfca588f3a	2014-01-22 13:08:13 +01:00
Horst Schirmeier	7591c9edc5	Merge branch 'jobclientserver-fixes'	2014-01-22 13:07:59 +01:00
Richard Hellwig	fa1690bd1f	Merge "core/sal: Added features that indicate whether FAIL* is initialized"	2014-01-21 15:35:22 +01:00
Horst Schirmeier	813414984c	util: boost::thread 1.53 depends on boost::system Unfortunately this implicit dependency is currently not resolved anywhere else (e.g., FindBoost.cmake), although the 'net heavily discusses this issue. Change-Id: I8a7c8518394cdba27e591fed250623011d988067	2014-01-21 00:29:34 +01:00
Lars Rademacher	4e21b42374	cpn: use strtoul for conversion of unsigned ints As 32-bit libc6 atoi() caps values greater than 2^31-1 (instead of just letting them overflow to the corresponding negative value, as on x86_64), it must not be used for unsigned ints, especially for the conversion of 32-bit pointers. Change-Id: Ie0821a6f4cd04aebd37ea3d4028b63a05373810f	2014-01-21 00:10:56 +01:00
Horst Schirmeier	122eb8c9dc	use uint32 for addresses in protobuf msgs This prevents integer overflows when using addresses > 2GiB, which are common for x86 operating systems with paging (Linux, Fiasco.OC) or some test cases on the PandaBoard. Note that this results in slightly different result table definitions when automatically translating an experiment's protobuf message in the DatabaseCampaign. This change affects all existing protobuf messages to prevent copy/paste propagation of this issue. Change-Id: I09ec4b9d45eddd67a7a24c8b101e8b2b258df5e2	2014-01-21 00:08:41 +01:00
Horst Schirmeier	de39bf6120	jobclient: use initializer list Change-Id: I7eb42f947bbabd61e1aad9224cedd7ffceec4f10	2014-01-20 22:48:08 +01:00
Horst Schirmeier	5ffcb82138	jobclient: initial number of jobs configurable The new CLIENT_JOB_INITIAL configuration option allows configuring the client to request more than one job in the first request round. If a reasonable initial value is chosen, this removes the job ramp-up after each fail-client restart, and slightly improves overall throughput. Change-Id: Idac2721264ec264c520d341fac64a8311a974708	2014-01-20 22:48:08 +01:00
Horst Schirmeier	2c31bf79b0	jobclient: expect communication failures This change makes the JobClient act properly on communication aborts. Change-Id: I0a76489f117e9721546215e3b627002605e25452	2014-01-20 22:48:08 +01:00
Horst Schirmeier	882d4f381b	jobclient: bugfix: faster shutdown at campaign end The JobClient currently waits a LONG time until it really shuts down after not having reached the server in sendResultsToServer() (which is unfortunately by far the most probable point in the code to determine this): - A different bug (fixed in the previous commit) caused (way) too many jobs to be fetched beforehand. - sendResult() (called after each experiment iteration) realized that CLIENT_JOB_REQUEST_SEC seconds were over, and tried to prematurely call home to send first results (without planning to get new jobs yet). - If the server was gone (done, or aborted), connect in sendResultsToServer() failed after several retries and timeouts. - All subsequent calls to sendResult() retried connecting to the server (again, with retries and timeouts), once for each remaining job. - When all jobs were done, getParam() tried to connect a last time, finally telling the experiment that nobody's home. This resulted in client shutdown times of up to four hours (for the default CLIENT_JOB_LIMIT of 1000) after the campaign server terminated. This change solves the issue by not handing out new (cached) jobs after the connect failed once, making the experiment terminate quickly. Change-Id: I0d8cb2e084d783aca74c51a503fa72eb2b2eb0b7	2014-01-20 22:48:08 +01:00
Horst Schirmeier	ee7bc23d85	jobclient: bugfix: initialize timing statistics If we don't properly initialize the job timing statistics, the number of jobs to be requested in the second request to the server is based on the wrong timings. In our test case, CLIENT_JOB_LIMIT jobs were requested at once. Change-Id: I7e9d8ab6fe14e4488b3a74baf061d9a07f3a77c4	2014-01-20 22:48:08 +01:00
Horst Schirmeier	1f6e275e5e	jobserver: bugfix: potential race Delay insertion of to-be-sent jobs into m_runningJobs until they are really sent, as getMessage() won't work anymore (as in: segfault) if this job is concurrently re-sent (due to campaign end), its result is received, and deleted in the campaign. This becomes non-hypothetical with larger values for CLIENT_JOB_LIMIT and CLIENT_JOB_REQUEST_SEC. Additionally, reinsert the remaining jobs into the input queue if communication fails, instead of inefficiently delaying redistribution until the campaign end. Change-Id: If85e3c8261deda86beb8d4d93343429223753f22	2014-01-20 22:48:08 +01:00
Horst Schirmeier	128b54b045	jobserver: outgoing jobqueue bounded by default Bounding the outgoing queue is always a good idea: If the campaign has separate threads for outgoing and incoming jobs (true for the DatabaseCampaign), this keeps memory requirements reasonable. If the campaign works in a single thread, this is not disadvantageous either. Change-Id: Ic75272daa8266f051adf7b23e2ffe87f5c965b86	2014-01-20 22:48:08 +01:00
Horst Schirmeier	73adc71437	jobserver: use non-blocking accept To allow the JobServer to shut down properly, the accept() loop in JobServer::run() needs to regularly check whether we're done. This change introduces a timed, non-blocking variant of accept() into SocketComm to achieve this. Change-Id: Id411096be816c4ed6c7b0b37674410e22152eb22	2014-01-20 22:48:08 +01:00
Horst Schirmeier	8671669053	jobserver: join remaining threads on shutdown To avoid accessing destroyed resources in CommThreads talking to clients, we need to properly join them on shutdown. The m_CommMutex becomes a JobServer member to make sure it isn't destroyed before the JobServer itself. Change-Id: I35b9fb93ace08a7a9476650f8f5e93597a3a8aa0	2014-01-20 22:48:08 +01:00
Horst Schirmeier	8505ddbb04	jobserver: synchronization cleanup This change cleans up in/out queue synchronization in the job server. End-of-jobs conditions are now properly signaled through the SynchronizedQueue, allowing blocked readers to be resumed and aborted when no more input is expected. Change-Id: I3eaf37115ccf8c5b5afe3d971c7109cd62b68906	2014-01-20 22:48:08 +01:00
Horst Schirmeier	5ac108ea4b	Merge branch 'mysql-concurrency-fixes'	2014-01-20 18:35:35 +01:00
Horst Schirmeier	8f9ee3fddd	DatabaseCampaign: run statistics update when finished Change-Id: Ib68e54ba82e988db0d2d74ffafa6dc9bd54cd272	2014-01-20 18:34:51 +01:00
Horst Schirmeier	33b63651ae	DatabaseCampaign: MySQL / concurrency fixes According to <http://dev.mysql.com/doc/refman/5.5/en/c-api-threaded-clients.html>, a MySQL connection handle must not be used concurrently with an open result set and mysql_use_result() in one thread (DatabaseCampaign::run()), and mysql_query() in another (DatabaseCampaign::collect_result_thread()). This indeed leads to crashes when bounding the outgoing job queue (SERVER_OUT_QUEUE_SIZE), and maybe even more insidious effects in other cases. The solution is to create separate connections for both threads. Additionally, call mysql_library_init() before spawning any threads. Change-Id: I2981f2fdc67c9a2cbe8781f1a21654418f621aeb	2014-01-20 18:34:51 +01:00
Michael Lenz	0534b503a6	Merge branch 'use_size_prefix-REMOVED'	2014-01-15 13:54:25 +01:00
Michael Lenz	9c984b9704	fail/cpn: (Database)Campaign no longer loses jobs Up until now the JobServer was silently losing jobs and only claiming to be finished - a workaround for this was to restart the campaign until all jobs were finished according to the database and the campaign's output. This change fixes the underlying problem, so a single campaign run suffices and no longer loses any jobs. Debugging this was awful and took us quite some time... Change-Id: Ie6c982cc3b2ce11128941f1f13be563bae22565c	2014-01-15 12:59:13 +01:00
Michael Lenz	abd9decf0b	fail/cpn: removed USE_SIZE_PREFIX from SocketComm This removes the ability to directly parse protobufs from the socket, because google::protobuf::Message::ParseFromFileDescriptor() needs an EOF after each message, which prevents us from sending multiple Message objects over a single socket. Change-Id: I67c0f631071470d6e0ae597e42848036a6db3656	2014-01-15 12:56:38 +01:00
Richard Hellwig	3c7861ff06	core/sal: Added features that indicate whether FAIL* is initialized GEM5 throws a reset trap during initialization. This happens before the startup function is called. This leads to problems because the startup function fills the m_CPUs list, which is needed for the TrapListener. Therefore, we only react to traps after initialization. This is needed in the following commit (see gem5/src/arch/arm/faults.cc). Change-Id: I9ec6fd453705feb54b4f8a87d024181323a2d7ef	2014-01-14 13:07:21 +01:00
Richard Hellwig	f359364888	sal/gem5: getTimerTicks(), getTimerTicksPerSecond() implemented Change-Id: I01fdb5e4bdd61fc761e93ef77904c830131c9ed6	2014-01-14 12:13:55 +01:00


363 Commits
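Commit ba774a258c describes a disassembly loop that never terminated because the decoder returned a zero-size instruction when a symbol started exactly at the section end. The following C++ sketch illustrates the two guards the commit adds (stop at the section boundary, assert on zero-size results). It is not the FAIL* code: decodeAt() and Decoded are hypothetical stand-ins for LLVM's getInstruction(), which in the real code reports the instruction size through an out-parameter.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical decoder result (stand-in for the size reported by LLVM).
struct Decoded { uint64_t size; };

// Toy decoder for illustration only: every instruction is 2 bytes, but at
// (or beyond) the end of the buffer it "succeeds" with size 0, mimicking
// the behavior described in the commit message.
Decoded decodeAt(const std::vector<uint8_t>& bytes, uint64_t offset) {
    if (offset >= bytes.size()) return {0};
    return {2};
}

// Walk a section starting at a symbol's offset. The loop condition stops at
// the section end instead of trusting the decoder, and the assertion rejects
// zero-size results, which would otherwise keep 'offset' from advancing.
std::vector<uint64_t> instructionOffsets(const std::vector<uint8_t>& section,
                                         uint64_t symbolStart) {
    std::vector<uint64_t> offsets;
    for (uint64_t offset = symbolStart; offset < section.size(); ) {
        Decoded insn = decodeAt(section, offset);
        assert(insn.size > 0 && "disassembler returned a zero-size instruction");
        offsets.push_back(offset);
        offset += insn.size;
    }
    return offsets;
}
```

A symbol whose start equals the section size now yields an empty result instead of hanging.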
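Commit 5ccc6e3525 traces a campaign-server memory leak to ExperimentData lacking a virtual destructor: the DatabaseCampaign deletes results through the base class without downcasting. The following minimal C++ sketch shows that pattern with the virtual destructor in place. PilotMessage, MyExperimentData, and consumeDone() are hypothetical names, and PilotMessage merely stands in for the heap-allocated protobuf message.

```cpp
#include <memory>

// Hypothetical stand-in for the heap-allocated protobuf message
// (e.g. the embedded "fsppilot"); counts how many instances are alive.
struct PilotMessage {
    static int alive;
    PilotMessage() { ++alive; }
    ~PilotMessage() { --alive; }
};
int PilotMessage::alive = 0;

// Base class handed around generically. Without 'virtual', deleting a
// derived object through an ExperimentData* is undefined behavior and in
// practice skips the derived destructor, leaking the owned message.
class ExperimentData {
public:
    virtual ~ExperimentData() = default;
};

// Hypothetical experiment-specific subclass that owns a heap message.
class MyExperimentData : public ExperimentData {
public:
    MyExperimentData() : msg(std::make_unique<PilotMessage>()) {}
private:
    std::unique_ptr<PilotMessage> msg;
};

// Generic consumer that never downcasts, like the campaign's result loop.
void consumeDone(ExperimentData* data) {
    delete data;  // dispatches to ~MyExperimentData() thanks to 'virtual'
}
```

With the virtual destructor, every message is released no matter how many results pass through the generic path.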
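Commit 4bcce14659 replaces MemoryMap's internal storage with boost::icl::interval_set. The reason interval storage saves memory is that contiguous address ranges collapse into a single entry. The following C++ sketch illustrates this with std::map as a swapped-in substitute for boost::icl. It is not the MemoryMap implementation, and the class name IntervalAddressSet is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <map>

// Address set stored as disjoint half-open intervals [start, end).
// Memory use grows with the number of intervals, not the number of addresses.
class IntervalAddressSet {
public:
    // Insert [start, end), merging with overlapping or adjacent intervals.
    void add(uint64_t start, uint64_t end) {
        if (start >= end) return;
        auto it = m_intervals.upper_bound(start);  // first interval starting after 'start'
        if (it != m_intervals.begin()) {
            auto prev = std::prev(it);
            if (prev->second >= start) {  // previous interval overlaps or touches
                start = prev->first;
                if (prev->second > end) end = prev->second;
                it = m_intervals.erase(prev);
            }
        }
        // Absorb all following intervals that begin inside or right at 'end'.
        while (it != m_intervals.end() && it->first <= end) {
            if (it->second > end) end = it->second;
            it = m_intervals.erase(it);
        }
        m_intervals[start] = end;
    }

    bool contains(uint64_t addr) const {
        auto it = m_intervals.upper_bound(addr);
        if (it == m_intervals.begin()) return false;
        return addr < std::prev(it)->second;
    }

    std::size_t intervalCount() const { return m_intervals.size(); }

private:
    std::map<uint64_t, uint64_t> m_intervals;  // start -> end (exclusive)
};
```

Adding 4096 consecutive single addresses leaves exactly one map entry, whereas a per-address set would hold 4096.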
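Commit 4e21b42374 replaces atoi() with strtoul() because 32-bit glibc's atoi() caps values above 2^31-1. (glibc implements atoi() via strtol(), which saturates at LONG_MAX when long is 32 bits wide.) The following C++ sketch shows a range-checked conversion of a decimal 32-bit address. The function name parseAddress() and the choice to throw exceptions are assumptions for illustration, not the code in cpn.

```cpp
#include <cerrno>
#include <cstdint>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Parse a decimal 32-bit address such as "3221225472" (0xC0000000).
// strtoul() reports where parsing stopped and sets ERANGE on overflow,
// so trailing garbage and out-of-range values can be rejected explicitly.
uint32_t parseAddress(const std::string& text) {
    // strtoul() silently negates a leading '-', so reject signs up front.
    if (text.find('-') != std::string::npos)
        throw std::invalid_argument("negative address: " + text);
    errno = 0;
    char* end = nullptr;
    unsigned long value = std::strtoul(text.c_str(), &end, 10);
    if (end == text.c_str() || *end != '\0')
        throw std::invalid_argument("not a number: " + text);
    if (errno == ERANGE || value > UINT32_MAX)
        throw std::out_of_range("address exceeds 32 bits: " + text);
    return static_cast<uint32_t>(value);
}
```

The explicit UINT32_MAX check matters on 64-bit hosts, where unsigned long can hold values that do not fit into a 32-bit address.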
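Commits 128b54b045 and 8505ddbb04 describe a bounded outgoing job queue and end-of-jobs signaling through the SynchronizedQueue, so that blocked readers wake up when no more input is expected. The following C++ sketch shows one way such a closable, optionally bounded queue can work with a mutex and two condition variables. The class name ClosableQueue and its push/pop/close interface are hypothetical and are not the actual SynchronizedQueue API.

```cpp
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <utility>

template <typename T>
class ClosableQueue {
public:
    explicit ClosableQueue(std::size_t capacity = 0) : m_capacity(capacity) {}  // 0 = unbounded

    // Blocks while the queue is full; returns false if the queue was closed.
    bool push(T item) {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_notFull.wait(lock, [&] {
            return m_closed || m_capacity == 0 || m_items.size() < m_capacity;
        });
        if (m_closed) return false;
        m_items.push_back(std::move(item));
        m_notEmpty.notify_one();
        return true;
    }

    // Blocks while empty; returns false only once closed AND fully drained.
    bool pop(T& out) {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_notEmpty.wait(lock, [&] { return m_closed || !m_items.empty(); });
        if (m_items.empty()) return false;
        out = std::move(m_items.front());
        m_items.pop_front();
        m_notFull.notify_one();
        return true;
    }

    // Signals "no more input": wakes all blocked readers and writers.
    void close() {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_closed = true;
        m_notEmpty.notify_all();
        m_notFull.notify_all();
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_notEmpty, m_notFull;
    std::deque<T> m_items;
    std::size_t m_capacity;
    bool m_closed = false;
};
```

A consumer can loop on `while (q.pop(job))` and exits cleanly after the producer calls close(). A bounded capacity keeps a fast producer from buffering unlimited jobs in memory.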