christoph/fail - fail

Author	SHA1	Message	Date
Lars Rademacher	146984f2fc	openocd: Added support for MMU memory watch AccessListener with long width are implemented as MMU page-faults Change-Id: I85208463b1f7eb3dbab187287caa387394a4af90	2014-01-22 17:54:09 +01:00
Lars Rademacher	809af0ae55	openocd: add cycle counter for trace timestamp Added performance monitor hw-function cycle count. Also fix for single-stepping exit, some additional register exits and prevention of reboot failures. Change-Id: I74196905dc39ecc14ae78366e7e1cb70ec7092f1	2014-01-22 17:54:03 +01:00
Lars Rademacher	1feab4fd54	sal: wrong include fixed Include of ArmArchitecture was misspelled Change-Id: Iba3e0a9f1b687cfcd640c74ad9d185f0ffabe510	2014-01-22 17:47:17 +01:00
Lars Rademacher	d4776cf628	panda: fix in breakpoints aspect Halt condition type not properly set Change-Id: I17c780216606b89a7c8a0ace03ac3788582d95ac	2014-01-22 17:47:17 +01:00
Lars Rademacher	98a478badd	openocd: arm register mapping Mapping register id (ArmArchitecture) to openocd register id. Change-Id: Id951ce1606e1720e7bc2fd7d6686cff8c1d5c9b4	2014-01-22 17:47:10 +01:00
Lars Rademacher	582459c5bb	panda: non-returning openocd-loop at terminate Previously for correct termination, the PandaController called the finish-function of the openocd wrapper, invoked a coroutine switch and waited for the openocd wrapper to finish up and switch coroutine again, so the PandaController could exit with correct exitStatus. Now the openocd-wrapper directly exits with chosen exit status. Change-Id: I8d318a4143c53340896ccee4d059a0d79fdcfe89	2014-01-22 17:43:31 +01:00
Lars Rademacher	0d2a5175cf	panda: comment fix & remove unimplemented functions Change-Id: Ibe533a41871bbf186272d6df43966dabb692dede	2014-01-22 17:43:31 +01:00
Lars Rademacher	db0b82daca	fail: modifications for pandaboard support Change-Id: I52d3c9b9862b206a000394c45126f0afdfee081f	2014-01-22 17:43:31 +01:00
Lars Rademacher	749631e21c	fail: add support for pandaboard Change-Id: I1525c9b36d58bf53ad238a553d914f183f983bba	2014-01-22 17:43:23 +01:00
Horst Schirmeier	4cb97a7fa5	formatting, typos, comments, details Change-Id: Iae5f1acb653a694622e9ac2bad93efcfca588f3a	2014-01-22 13:08:13 +01:00
Horst Schirmeier	7591c9edc5	Merge branch 'jobclientserver-fixes'	2014-01-22 13:07:59 +01:00
Richard Hellwig	fa1690bd1f	Merge "core/sal: Added features that indicate whether FAIL* is initialized"	2014-01-21 15:35:22 +01:00
Horst Schirmeier	813414984c	util: boost::thread 1.53 depends on boost::system Unfortunately this implicit dependency is currently not resolved anywhere else (e.g., FindBoost.cmake), although the 'net heavily discusses this issue. Change-Id: I8a7c8518394cdba27e591fed250623011d988067	2014-01-21 00:29:34 +01:00
Lars Rademacher	4e21b42374	cpn: use strtoul for conversion of unsigned ints As 32-bit libc6 atoi() caps the value of unsigned ints bigger than 2^31-1 (instead of just letting it overflow to the corresponding negative value, as on x86_64), it must not be used especially for the conversion of 32-bit pointers. Change-Id: Ie0821a6f4cd04aebd37ea3d4028b63a05373810f	2014-01-21 00:10:56 +01:00
Horst Schirmeier	122eb8c9dc	use uint32 for addresses in protobuf msgs This prevents integer overflows when using addresses > 2GiB, which are common for x86 operating systems with paging (Linux, Fiasco.OC) or some test cases on the PandaBoard. Note that this results in slightly different result table definitions when automatically translating an experiment's protobuf message in the DatabaseCampaign. This change affects all existing protobuf messages to prevent copy/paste propagation of this issue. Change-Id: I09ec4b9d45eddd67a7a24c8b101e8b2b258df5e2	2014-01-21 00:08:41 +01:00
Horst Schirmeier	de39bf6120	jobclient: use initializer list Change-Id: I7eb42f947bbabd61e1aad9224cedd7ffceec4f10	2014-01-20 22:48:08 +01:00
Horst Schirmeier	5ffcb82138	jobclient: initial number of jobs configurable The new CLIENT_JOB_INITIAL configuration option allows the client to request more than one job in the first request round. If a reasonable initial value is chosen, this removes the job ramp-up after each fail-client restart, and slightly improves overall throughput. Change-Id: Idac2721264ec264c520d341fac64a8311a974708	2014-01-20 22:48:08 +01:00
Horst Schirmeier	2c31bf79b0	jobclient: expect communication failures This change makes the JobClient act properly on communication aborts. Change-Id: I0a76489f117e9721546215e3b627002605e25452	2014-01-20 22:48:08 +01:00
Horst Schirmeier	882d4f381b	jobclient: bugfix: faster shutdown at campaign end The JobClient currently waits a LONG time until it really shuts down after not having reached the server in sendResultsToServer() (which is unfortunately the by far most probable point in the code to determine this): - A different bug (fixed in the previous commit) provoked the situation that a (way) too large amount of jobs was fetched before. - sendResult() (called after each experiment iteration) realized that CLIENT_JOB_REQUEST_SEC seconds are over, and tried to prematurely call home to send first results (without planning to get new jobs yet). - If the server was gone (done, or aborted), connect in sendResultsToServer() failed after several retries and timeouts. - All subsequent calls to sendResult() retried connecting to the server (again, with retries and timeouts), once for each remaining job. - When all jobs were done, getParam() tries to connect a last time, finally telling the experiment that nobody's home. This resulted in client shutdown times of up to four hours (for the default CLIENT_JOB_LIMIT of 1000) after the campaign server terminated. This change solves the issue by not handing out new (cached) jobs after the connect failed once, making the experiment terminate quickly. Change-Id: I0d8cb2e084d783aca74c51a503fa72eb2b2eb0b7	2014-01-20 22:48:08 +01:00
Horst Schirmeier	ee7bc23d85	jobclient: bugfix: initialize timing statistics If we don't properly initialize the job timing statistics, the number of jobs to be requested in the second request to the server is based on the wrong timings. In our test case, CLIENT_JOB_LIMIT jobs were requested at once. Change-Id: I7e9d8ab6fe14e4488b3a74baf061d9a07f3a77c4	2014-01-20 22:48:08 +01:00
Horst Schirmeier	1f6e275e5e	jobserver: bugfix: potential race Delay insertion of to-be-sent jobs into m_runningJobs until they are really sent, as getMessage() won't work anymore (as in: segfault) if this job is concurrently re-sent (due to campaign end), its result is received, and deleted in the campaign. This becomes non-hypothetical with larger values for CLIENT_JOB_LIMIT and CLIENT_JOB_REQUEST_SEC. Additionally, reinsert the remaining jobs into the input queue if communication fails, instead of inefficiently delaying redistribution until the campaign end. Change-Id: If85e3c8261deda86beb8d4d93343429223753f22	2014-01-20 22:48:08 +01:00
Horst Schirmeier	128b54b045	jobserver: outgoing jobqueue bounded by default Bounding the outgoing queue is always a good idea: If the campaign has separate threads for outgoing and incoming jobs (true for the DatabaseCampaign), this keeps memory requirements reasonable. If the campaign works in a single thread, this is not disadvantageous either. Change-Id: Ic75272daa8266f051adf7b23e2ffe87f5c965b86	2014-01-20 22:48:08 +01:00
Horst Schirmeier	73adc71437	jobserver: use non-blocking accept To allow the JobServer to shutdown properly, the accept() loop in JobServer::run() needs to regularly check whether we're done. This change introduces a timed, non-blocking variant of accept() into SocketComm to achieve this. Change-Id: Id411096be816c4ed6c7b0b37674410e22152eb22	2014-01-20 22:48:08 +01:00
Horst Schirmeier	8671669053	jobserver: join remaining threads on shutdown To avoid accessing destroyed resources in CommThreads talking to clients, we need to properly join them on shutdown. The m_CommMutex becomes a JobServer member to make sure it isn't destroyed before the JobServer itself. Change-Id: I35b9fb93ace08a7a9476650f8f5e93597a3a8aa0	2014-01-20 22:48:08 +01:00
Horst Schirmeier	8505ddbb04	jobserver: synchronization cleanup This change cleans up in/out queue synchronization in the job server. End-of-jobs conditions are now properly signaled through the SynchronizedQueue, allowing to resume and abort blocked readers when no more input is expected. Change-Id: I3eaf37115ccf8c5b5afe3d971c7109cd62b68906	2014-01-20 22:48:08 +01:00
Horst Schirmeier	5ac108ea4b	Merge branch 'mysql-concurrency-fixes'	2014-01-20 18:35:35 +01:00
Horst Schirmeier	8f9ee3fddd	DatabaseCampaign: run statistics update when finished Change-Id: Ib68e54ba82e988db0d2d74ffafa6dc9bd54cd272	2014-01-20 18:34:51 +01:00
Horst Schirmeier	33b63651ae	DatabaseCampaign: MySQL / concurrency fixes According to <http://dev.mysql.com/doc/refman/5.5/en/c-api-threaded-clients.html>, a MySQL connection handle must not be used concurrently with an open result set and mysql_use_result() in one thread (DatabaseCampaign::run()), and mysql_query() in another (DatabaseCampaign::collect_result_thread()). This indeed leads to crashes when bounding the outgoing job queue (SERVER_OUT_QUEUE_SIZE), and maybe even more insidious effects in other cases. The solution is to create separate connections for both threads. Additionally, call mysql_library_init() before spawning any threads. Change-Id: I2981f2fdc67c9a2cbe8781f1a21654418f621aeb	2014-01-20 18:34:51 +01:00
Michael Lenz	0534b503a6	Merge branch 'use_size_prefix-REMOVED'	2014-01-15 13:54:25 +01:00
Michael Lenz	9c984b9704	fail/cpn: (Database)Campaign no longer loses jobs Up until now the JobServer was silently losing jobs and only claiming to be finished - a workaround for this was to restart the campaign until all jobs were finished according to the database and the campaign's output. This change fixes the underlying problem, so a single campaign run suffices and no longer loses any jobs. Debugging this was awful and took us quite some time... Change-Id: Ie6c982cc3b2ce11128941f1f13be563bae22565c	2014-01-15 12:59:13 +01:00
Michael Lenz	abd9decf0b	fail/cpn: removed USE_SIZE_PREFIX from SocketComm This removes the ability to directly parse protobufs from the socket, because google::protobuf::Message::ParseFromFileDescriptor() needs an EOF after each message, thus preventing us from sending multiple Message objects over a single socket. Change-Id: I67c0f631071470d6e0ae597e42848036a6db3656	2014-01-15 12:56:38 +01:00
Richard Hellwig	3c7861ff06	core/sal: Added features that indicate whether FAIL* is initialized GEM5 throws a reset trap during initialization. This happens before the startup function is called. This leads to problems because the startup function fills the m_CPUs list. m_CPUs is needed for the TrapListener. Therefore, we only react on traps after initialization. This is needed in the following commit (see gem5/src/arch/arm/faults.cc). Change-Id: I9ec6fd453705feb54b4f8a87d024181323a2d7ef	2014-01-14 13:07:21 +01:00
Richard Hellwig	f359364888	sal/gem5: getTimerTicks(), getTimerTicksPerSecond() implemented Change-Id: I01fdb5e4bdd61fc761e93ef77904c830131c9ed6	2014-01-14 12:13:55 +01:00
Horst Schirmeier	ab9c0edf10	DatabaseCampaign: run jobs for known-outcome exps, too Although we know that a known_outcome=1 pilot does not exhibit behavior different from the golden run, the database schema does not yet know what this behavior looks like (in terms of result-table column values). In order to be able to JOIN valid results for all memory writes in the trace table (fspgroup maps them all onto one pilot per variant), we need to run these experiments, too. Additionally, don't join the fspgroup table; we only need this one for result calculations afterwards. Change-Id: Idcd2991274fede84526b1eee68a231774625d11a	2013-12-05 19:27:44 +01:00
Richard Hellwig	bd91549367	Merge "gem5: restore works now"	2013-11-13 17:20:53 +01:00
Richard Hellwig	45e0b41022	gem5: restore works now The function restore(PATH) can now be used to restore a checkpoint. Change-Id: I25faf9f6335261d2b3ade4185eae93983ece9f97	2013-11-13 17:15:19 +01:00
Richard Hellwig	f31548c026	Merge "core/sal: register issue fixed"	2013-11-13 16:08:40 +01:00
Richard Hellwig	3bf64351a4	core/sal: register issue fixed Before, it was not possible to add registers in arbitrary order. Change-Id: I952c03ea4339da2cdaf34bd4546c76c33cecd4cd	2013-11-01 17:26:26 +01:00
Christian Dietrich	5171645d9a	plugin/tracing: fix extended trace on unmapped memory areas When a register in the extended trace was dereferenced and the value was smaller than the memory pool size, but the address was not mapped, an assertion failed and the tracing plugin terminated the simulator. Now the dereferenced memory address is checked for being mapped and not being smaller than the memory pool. Change-Id: I9ac954988ef860969679f9f360814c5e4b66f473	2013-10-28 15:09:35 +01:00
Horst Schirmeier	ec969603d5	Merge commit 'dcd2c021a5ac91d38187d397914e5f51e2fc8819' Conflicts: tools/import-trace/RegisterImporter.cc Change-Id: I4f49c976bd60badba73c15746aa03c420cb9f77b	2013-09-11 14:38:55 +02:00
Christian Dietrich	d26fc28fa4	cpn/database: include data_width in the fsppilot during prune step During the prune step the data_width of the injected location was not propagated before. It is now stored in fsppilot (database layout change!) and sent in the fsppilot protobuf message. Change-Id: I0562f6fc8957adea0f8a9fb63469ca5e3f4b7b2d	2013-09-11 10:27:04 +02:00
Horst Schirmeier	dcd2c021a5	util: global lock for certain MySQL operations Even the reentrant libmysqlclient_r has some non-threadsafe operations, which need to be protected by a global mutex. <http://dev.mysql.com/doc/refman/5.5/en/c-api-threaded-clients.html> Change-Id: I444e42f82cf982a6c8f8f2596e8991d0a5009b28	2013-09-10 18:35:44 +02:00
Horst Schirmeier	ba7c663551	import-trace: introduce AdvancedMemoryImporter A MemoryImporter that additionally imports Relyzer-style conditional branch history, instruction opcodes, and a virtual duration=time2-time1+1 column (MariaDB 5.2+ only) for fault-space pruning purposes. Change-Id: I6764a26fa8aae21655be44134b88fdee85e67ff6	2013-09-10 17:37:26 +02:00
Horst Schirmeier	12b539ff75	misc cleanups This change touches several subsystems, tools and experiments (sal, util, cmake, import-trace, generic-tracing, nanojpeg), and changes details not worth separate commits. Change-Id: Icd1d664d1be5cfc2212dbf77801c271183214d08	2013-09-10 17:37:25 +02:00
Horst Schirmeier	25d88bf93a	import-trace: import extended traces This tool can now import extended trace information with the --extended-trace command-line parameter. The existing importers cease using artificial access_info_t objects in favor of passing through the original Trace_Event wherever possible. This allows us to import extended trace information for all importers. Change-Id: I3613e9d05d5e69ad49e96f4dc5ba0b1c4ef95a11	2013-09-10 17:37:25 +02:00
Horst Schirmeier	96f2f56d5e	Merge branch 'register-mapping-fixes'	2013-09-10 11:46:58 +02:00
Horst Schirmeier	11513ef78d	util: handle missing register mapping gracefully It's OK if we cannot map every register LLVM knows to a Fail register ID, but we need to explicitly skip these cases in the RegisterImporter. Change-Id: I2152f819fb94aa4de5720c5798b229b66988d382	2013-09-09 16:14:35 +02:00
Horst Schirmeier	e4a5a7a592	util: gzstream needs zlib This change is needed to build on Ubuntu 13.04. Change-Id: I683ed4427044264f58bc8f7c94cb5fbbff89cd95	2013-09-08 22:15:14 +02:00
Horst Schirmeier	6d4dfeb913	shutdown cleanups revisited This change became necessary as we observed weird fail-client SIGSEGV crashes with both Bochs and Gem5 backends and different experiments. Some Fail* components are instantiated statically: the SimulatorController instance "simulator", containing the ListenerManager and the CoroutineManager, and the active ExperimentFlow subclass(es) (experiments/instantiate-experiment*.ah.in). The experiment(s) is registered as an active flow in the CoroutineManager at startup. As plugins (which are ExperimentFlows themselves) are often created on an experiment's stack, ExperimentFlows deregister themselves on destruction (e.g., when leaving the plugin variable's scope). The core problem is, that the creation and destruction order of statically instantiated objects depends on the link order; if the experiment is destroyed after the CoroutineManager, its automatic self-deregistering feature talks to the smoking ruins of the latter. This change removes all static instantiations of ExperimentFlow and replaces them with constructions on the heap. Additionally it makes sure that the CoroutineManager recognizes that a shutdown is in progress, and refrains from touching potentially already destroyed data structures when a (mistakenly globally instantiated) ExperimentFlow deregisters in this case. Change-Id: I8a7d42fb141222cd2cce6040ab1a01f9de61be24	2013-09-04 10:13:48 +02:00
Horst Schirmeier	203ec6c5cc	remove #ifndef __puma from code using LLVM Contemporary AspectC++ versions can deal with the LLVM headers very well, and #ifdef __puma stuff in Fail* headers results in unmaintainable #ifdef __puma blocks in other parts of Fail* (e.g., the trace importer). Make sure you're using a 64-bit ac++ when living in a 64-bit userland (the 32-bit version doesn't know about __int128), and be aware that AspectC++ r325 introduced a regression that has not been fixed yet. Change-Id: I5bb759b08995a74b020d44a2b40e9d7a6e18111c	2013-09-04 10:13:48 +02:00
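
Note on 4e21b42374 (cpn: use strtoul for conversion of unsigned ints): the commit message explains that atoi() must not be used for 32-bit addresses because values above 2^31-1 do not survive the trip through a signed int. The following is a minimal sketch of the strtoul-based alternative, not the code from the commit. The helper name parse_address and its error handling are assumptions made for illustration.

```cpp
#include <cerrno>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical helper (not taken from the FAIL* sources): parse a 32-bit
// address from a string without ever passing through a signed int.
bool parse_address(const char *text, uint32_t &out)
{
    // strtoul() silently accepts a leading minus sign and wraps the value,
    // so reject negative input up front.
    if (std::strchr(text, '-') != nullptr) {
        return false;
    }

    errno = 0;
    char *end = nullptr;
    // Base 0 accepts decimal, hexadecimal ("0x...") and octal ("0...") input.
    unsigned long value = std::strtoul(text, &end, 0);

    if (end == text || *end != '\0') {
        return false; // no digits at all, or trailing garbage
    }
    if (errno == ERANGE || value > UINT32_MAX) {
        return false; // does not fit into 32 bits
    }

    out = static_cast<uint32_t>(value);
    return true;
}
```

With this sketch, an address such as 0xC0000000 (above 2 GiB) is returned unchanged, while input that does not fit into 32 bits is reported as an error instead of being capped or wrapped.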

546 Commits
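
Note on 8505ddbb04 (jobserver: synchronization cleanup) and 128b54b045 (jobserver: outgoing jobqueue bounded by default): these commits describe a queue that signals end-of-jobs to blocked readers and that can be bounded to keep memory requirements reasonable. The sketch below illustrates that general pattern with standard C++11 primitives. It is not the FAIL* SynchronizedQueue; the class name ClosableQueue and all member names are hypothetical.

```cpp
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>

// Hypothetical sketch, not the FAIL* SynchronizedQueue implementation.
// capacity == 0 means "unbounded".
template <typename T>
class ClosableQueue {
public:
    explicit ClosableQueue(std::size_t capacity = 0)
        : m_capacity(capacity), m_closed(false) {}

    // Blocks while the queue is full. Returns false if the queue is closed.
    bool enqueue(const T &item)
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_notFull.wait(lock, [this] {
            return m_closed || m_capacity == 0 || m_items.size() < m_capacity;
        });
        if (m_closed) {
            return false;
        }
        m_items.push_back(item);
        m_notEmpty.notify_one();
        return true;
    }

    // Blocks while the queue is empty and still open. Returns false once
    // the queue is closed and drained, so readers can terminate.
    bool dequeue(T &out)
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_notEmpty.wait(lock, [this] { return m_closed || !m_items.empty(); });
        if (m_items.empty()) {
            return false;
        }
        out = m_items.front();
        m_items.pop_front();
        m_notFull.notify_one();
        return true;
    }

    // Signals that no more input is expected and wakes all blocked threads.
    void close()
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_closed = true;
        m_notEmpty.notify_all();
        m_notFull.notify_all();
    }

private:
    std::size_t m_capacity;
    bool m_closed;
    std::deque<T> m_items;
    std::mutex m_mutex;
    std::condition_variable m_notEmpty;
    std::condition_variable m_notFull;
};
```

In this sketch a reader loops on dequeue() until it returns false. Items that were enqueued before close() are still delivered, and only then does the reader see the end-of-input condition.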