Classes deriving from ExperimentData usually contain the
experiment-specific Protobuf message, which needs to be properly
destroyed. This is particularly a problem in the generic
DatabaseCampaign, as it never downcasts ExperimentData objects
retrieved from JobServer::getDone(). As the embedded
DatabaseCampaignMessage (usually named "fsppilot") is allocated on the
heap (in the campaign's cb_send_pilot() function, via
mutable_fsppilot()), the missing virtual destructor in ExperimentData
led to a memory leak, rendering the campaign server inoperable after
handling ~1E7 messages (with a 4GiB / 32-bit process memory limit).
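A minimal sketch of the fix; the class bodies here are illustrative,
only the names come from this commit:

    // Stand-in for the generated protobuf type with its heap-allocated
    // submessages (e.g., the "fsppilot" pilot message).
    struct DatabaseCampaignMessage {
        ~DatabaseCampaignMessage() { /* frees heap-allocated submessages */ }
    };

    class ExperimentData {
    public:
        virtual ~ExperimentData() {} // the fix: without "virtual", deleting
                                     // through an ExperimentData* never runs
                                     // the derived destructor
    };

    class MyExperimentData : public ExperimentData {
        DatabaseCampaignMessage m_pilot; // only destroyed if the base
                                         // destructor is virtual
    };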
Change-Id: I4cb8a26d5a702e03189c4aae340051ce62a9c9ce
Due to the previous DatabaseCampaign fix, this may not be necessary
anymore, but it's nevertheless a good idea to handle thread creation
failures properly.
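A minimal sketch of what proper handling could look like, assuming
pthreads (the actual code may use a different threading API):

    #include <pthread.h>
    #include <cstdio>
    #include <cstdlib>

    // pthread_create() returns an errno value on failure (it does not
    // set errno).
    static void spawn_or_die(void *(*fn)(void *), void *arg)
    {
        pthread_t tid;
        int err = pthread_create(&tid, NULL, fn, arg);
        if (err != 0) {
            fprintf(stderr, "thread creation failed: error %d\n", err);
            exit(EXIT_FAILURE); // fail fast instead of limping on without
                                // the thread
        }
    }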
Change-Id: I8317a77dd5338509727e737040944320e7755ae3
It is necessary to copy the pilot IDs of existing results to a temporary
table before fetching undone jobs from the DB: otherwise, due to MyISAM's
table-level locking, collect_result_thread() will block in INSERT (SHOW
PROCESSLIST state "Waiting for table level lock") until the (streamed)
pilot query finishes. As one pilot query follows the other,
collect_result_thread() may even starve until the memory for the
JobServer's "done" queue runs out, resulting in a crash and the loss of all
queued results.
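A hedged sketch of the resulting query order (MySQL C API; table and
column names are assumptions, not the actual schema):

    #include <mysql/mysql.h>

    void fetch_undone_jobs(MYSQL *conn)
    {
        // 1) Snapshot the pilot IDs of existing results; this query
        //    finishes quickly and releases the table-level lock on the
        //    result table.
        mysql_query(conn, "CREATE TEMPORARY TABLE done_pilot AS "
                          "SELECT pilot_id FROM result");
        // 2) Stream the undone jobs by anti-joining against the snapshot;
        //    the long-running streamed query no longer touches the result
        //    table, so collect_result_thread() can INSERT concurrently.
        mysql_query(conn, "SELECT p.* FROM fsppilot p "
                          "LEFT JOIN done_pilot d ON d.pilot_id = p.id "
                          "WHERE d.pilot_id IS NULL");
    }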
Change-Id: Ib0ec5fa84db466844b1e9aa0e94142b4d336b022
The patched eCos variant we analyze intentionally overflows the 16550
UART FIFOs, flooding the terminal with Bochs error messages. Enabling
CONFIG_BOCHS_NON_VERBOSE now also enforces ignoring error messages,
regardless of log verbosity settings in the bochsrc.
Change-Id: If14e2532234e61bf60720a45150ef4973e8d508b
Before this change, running prune-trace with, e.g.
"prune-trace -d fsp_mibench -v bitmap% --benchmark-exclude clockcnv"
resulted in an implied "--benchmark none", rendering --benchmark-exclude
ineffective and resulting in nothing being pruned. Now, the "none" default
only applies when neither --benchmark nor --benchmark-exclude (analogously
for --variant / --variant-exclude) is provided.
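The corrected default logic as a sketch (option parsing simplified; only
the semantics come from this commit):

    #include <string>
    #include <vector>

    // Apply the "none" default only if the user gave neither --benchmark
    // nor --benchmark-exclude; --variant/--variant-exclude work analogously.
    std::vector<std::string> effective_filter(
            std::vector<std::string> include,
            const std::vector<std::string> &exclude)
    {
        if (include.empty() && exclude.empty())
            include.push_back("none"); // the old default, now a true fallback
        return include;                // excludes are applied separately
    }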
Change-Id: Ic7c88919d7cfde1261749a745dc6a679472ff348
Using Database::insert_multiple() instead of prepared statements
speeds up trace import by a factor of 3-4. While at it, we now
properly deal with nonexistent extended trace values (i.e., we put
NULLs into the DB).
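A hedged sketch of the row building (Database::insert_multiple() exists
per this commit, but the signature and the column set shown here are
assumptions):

    #include <sstream>
    #include <string>

    // One VALUES tuple per trace event; a missing extended trace value
    // becomes SQL NULL instead of a bogus default.
    static std::string make_row(unsigned long instr, bool has_ext,
                                unsigned long ext)
    {
        std::ostringstream row;
        row << "(" << instr << ",";
        if (has_ext)
            row << ext;
        else
            row << "NULL";
        row << ")";
        return row.str();
    }

The batched tuples then go out in a single call along the lines of
db.insert_multiple("INSERT INTO trace ... VALUES", rows) (hypothetical
signature), replacing one prepared-statement round trip per row with one
multi-row INSERT per batch.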
Side note: The ElfImporter should switch to insert_multiple(), too.
Change-Id: I96785e9775e3ef4f242fd50720d5c34adb4e88a1
At least for the Bochs backend there might be side effects when saving
the simulator state while tracing, which therefore should be avoided.
As there is no known use case for a --save-symbol different from
--start-symbol, this change disables the semantics behind
--save-symbol completely and only keeps the command-line switch for
backward-compatibility reasons (existing automatic test scripts etc.).
The generic-tracing experiment now complains and aborts if a
--save-symbol different from --start-symbol is given.
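The new guard, roughly (function and variable names assumed):

    #include <iostream>
    #include <string>

    // generic-tracing: refuse configurations that would save the simulator
    // state anywhere but the tracing start point.
    static bool check_save_symbol(const std::string &start_symbol,
                                  const std::string &save_symbol)
    {
        if (save_symbol != start_symbol) {
            std::cerr << "generic-tracing: --save-symbol differing from "
                         "--start-symbol is not supported anymore\n";
            return false; // caller aborts the experiment
        }
        return true;
    }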
Change-Id: I6072d846be96e016534cc83db375a400cfc25303
With m_tracetype=TRACE_MEM, bool first was never reset to false in the
tracing plugin's main loop. This bug was most probably never
triggered, though, as nobody traces only memory accesses.
This change also slightly simplifies the internal logic in the tracing
plugin.
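A pared-down model of the loop to illustrate the bug (the real plugin
code is more involved):

    enum TraceType { TRACE_ALL, TRACE_MEM };
    struct Event { bool is_ip; };

    void trace_loop(TraceType m_tracetype, const Event *events, int n)
    {
        bool first = true;
        for (int i = 0; i < n; ++i) {
            if (events[i].is_ip && m_tracetype != TRACE_MEM) {
                // emit IP event
                first = false;  // previously the only place resetting "first"
            } else {
                // emit memory event
                first = false;  // the fix: reset on the memory path, too
            }
        }
    }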
Change-Id: I65d7df6a3781ec552cfb892bbf3394b421e227f1
A simple plugin which deterministically returns a new random value
each time the specified symbol is read.
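The idea in a nutshell (a hedged sketch; the plugin API and the LCG
constants are illustrative, not the actual implementation):

    #include <stdint.h>

    // Deterministic: the same seed yields the same sequence in every
    // fault-injection run, yet each read of the symbol sees a new value.
    class RandomGenerator {
        uint32_t m_state;
    public:
        explicit RandomGenerator(uint32_t seed) : m_state(seed) {}
        uint32_t next() // called whenever the watched symbol is read
        {
            m_state = m_state * 1664525u + 1013904223u; // NR-style LCG
            return m_state;
        }
    };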
Change-Id: I6ccac421fc064f02a88e8b126f8a26044d1f51c6
As we copy a 32-bit word from the dereferenced address, we also need to
check whether address+3 is mapped. (Yes, I've seen this in the
wild.)
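Schematically (the memory interface is a stand-in, not the actual API):

    #include <stdint.h>

    struct Memory { // stand-in for the simulator's memory manager
        bool isMapped(uint32_t addr) const;
        uint8_t getByte(uint32_t addr) const;
    };

    // A 32-bit read touches [addr, addr+3]; checking only addr misses
    // words that straddle the end of a mapped region.
    bool read_word(const Memory &mem, uint32_t addr, uint32_t &out)
    {
        if (!mem.isMapped(addr) || !mem.isMapped(addr + 3))
            return false;
        out = 0;
        for (int i = 0; i < 4; ++i)
            out |= uint32_t(mem.getByte(addr + i)) << (8 * i); // little-endian
        return true;
    }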
Change-Id: I43f891c56e077333670c9cb48c0ee8e9342fa41d
Up to now, BochsMemory::isMapped() always returned true in 32-bit protected
mode with a 4GB linear address space (as used by, e.g., eCos), even for
addresses greater than the configured memory size. This led to lots of
bogus memory dereferences in the (extended) tracing plugin.
This change (a follow-up to commit 5171645) additionally checks the return
value of getHostMemAddr(), and announces BX_RW (read/write access) instead
of BX_READ as the intended type of memory access. In the aforementioned
scenario, memory addresses greater than the memory size are now correctly
detected as "not mapped".
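Schematically (Bochs macro and signature details vary between versions,
so treat the names here as approximate):

    // isMapped() previously trusted the 4GB linear mapping; now the host
    // mapping is consulted as well.
    bool is_mapped(bx_phy_address_t phys)
    {
        // BX_RW instead of BX_READ: the address is dereferenced for
        // reading and writing.
        Bit8u *host =
            (Bit8u *) BX_MEM(0)->getHostMemAddr(BX_CPU(0), phys, BX_RW);
        return host != NULL; // NULL: beyond the configured memory size
    }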
Change-Id: Ic2fa7554c869cb90191164535a601bae4dbb49b6
Based on the database layout given by the pruner.
Run ./run.py -c <path to mysql.cnf>
(default config: ~/.my.cnf)
- Checks whether the objdump table exists
- Adds a view for results per instruction
- Adds config file support for table details
- Loads overview data at server startup
- Makes the result type mapping configurable via the config file
Based on Flask and MySQLdb
Change-Id: Ib49eac8f5c1e0ab23921aedb5bc53c34d0cde14d
Memory accesses that don't belong to the preceding IP event in the
trace *do* have a use case: a hardware interrupt causes the CPU to
push its state onto the (kernel) stack. At the moment we cannot
distinguish this case from a malformed trace (as we don't record the
occurrence of interrupts), hence this warning needs to be disabled for
now.
This reverts commit 84edd02b6f.
cluster runs.
If this output file is enabled, all running processes try to write to the
same file on the shared filesystem. They block each other which leads to
massive I/O wait time and CPU idle time.
This change reduces the runtime, e.g., from several hours (12+) to a few
minutes (~20).
Change-Id: I028628af31c845fc517e5daca5b4f981eade3cf4
We now use boost::icl::interval_set internally, consuming far less
memory. boost::icl was introduced with Boost 1.46; Debian 7.0 comes
with 1.49, so this dependency should not be a problem anymore.
Both the class interface and the memory-map file format stay the same.
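A sketch of the new internals (member names are assumptions; the public
interface is unchanged per this commit):

    #include <boost/icl/interval_set.hpp>
    #include <stdint.h>

    class MemoryMapSketch {
        boost::icl::interval_set<uint64_t> m_areas;
    public:
        void add(uint64_t address, uint64_t size)
        {
            // Adjacent and overlapping ranges coalesce automatically, which
            // is where the savings over per-address storage come from.
            m_areas += boost::icl::interval<uint64_t>::right_open(
                address, address + size);
        }
        bool isMatching(uint64_t address) const
        {
            return boost::icl::contains(m_areas, address);
        }
    };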
Change-Id: I38e8148384c90aa493984d0f6280817df00f1702
If the --debug option is set, the line number table of the ELF binary
will be imported into the database. The information will be stored in
the "dbg_mapping" table.
If the --sources option is set, the source files will be imported
into the database. Only the files that were actually used in the
ELF binary will be imported.
Change-Id: I0e9de6b456bc42b329c1700c25e5839d9552cdbb
Differences:
- The task activation order is determined in the faulty experiment as
well as in the golden run (the latter now done by
fail-generic-tracing) by observing a fail_virtual_port variable.
- A panic value is read from fail_virtual_port.
- The golden-run task activation is determined by feeding an extended
trace to task_activation.py. The script collects all writes to
fail_virtual_port and determines the activation order from them.
Change-Id: Id401b78933b45a4b2cf031fc0a8b5ac90151ec24
The dependency on fail-comm does not only exist at compile time (where
it stems from protobuf header generation).
Change-Id: I2bae51e763d9a385bda94e77df3e88619fa28a30
In some cases the write-pilot is located at the upper boundary of the
experiment and thus is in a race situation with the experiment's end.
If the experiment's end occurs first, the campaign ends and complains
about missing data, otherwise everything is fine.
This patch circumvents this by using "the first" writing pilot; if the
only write is located at the experiment's end, the race will still occur,
but cleverly written experiment code can, according to hsc, circumvent it.
Change-Id: I6a27a8c4770c04ea8dcaef8aa7bd85d18f43f0b5
Unfortunately this implicit dependency is currently not resolved anywhere
else (e.g., FindBoost.cmake), although the issue is widely discussed on
the net.
Change-Id: I8a7c8518394cdba27e591fed250623011d988067
As 32-bit libc6 atoi() caps unsigned values bigger than 2^31-1
(instead of just letting them overflow to the corresponding negative
value, as on x86_64), it must not be used, in particular not for the
conversion of 32-bit pointers.
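A sketch of the portable alternative (whether the actual fix uses
strtoul() or another conversion is an assumption):

    #include <stdint.h>
    #include <stdlib.h>

    // strtoul() covers the full 32-bit range on 32- and 64-bit libc alike;
    // 32-bit glibc atoi() saturates at 2^31-1 instead.
    static uint32_t parse_address(const char *s)
    {
        return (uint32_t) strtoul(s, NULL, 0); // base 0 also accepts "0x..."
    }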
Change-Id: Ie0821a6f4cd04aebd37ea3d4028b63a05373810f
This prevents integer overflows when using addresses > 2GiB, which are
common for x86 operating systems with paging (Linux, Fiasco.OC) or
some test cases on the PandaBoard.
Note that this results in slightly different result table definitions
when automatically translating an experiment's protobuf message in the
DatabaseCampaign.
This change affects all existing protobuf messages to prevent
copy/paste propagation of this issue.
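The commit does not quote the field change itself, but the overflow it
prevents is easy to demonstrate:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint32_t addr = 0xC0000000u;    // a typical x86 kernel address (3 GiB)
        printf("%d\n", (int32_t) addr); // -1073741824: the sign bit flips
        printf("%u\n", addr);           //  3221225472: as intended
        return 0;
    }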
Change-Id: I09ec4b9d45eddd67a7a24c8b101e8b2b258df5e2
The new CLIENT_JOB_INITIAL configuration option allows the client to
request more than one job in the first request round.
If a reasonable initial value is chosen, this removes the job ramp-up
after each fail-client restart, and slightly improves overall
throughput.
Change-Id: Idac2721264ec264c520d341fac64a8311a974708
The JobClient currently waits a LONG time until it really shuts down
after failing to reach the server in sendResultsToServer() (which is,
unfortunately, by far the most probable point in the code to notice
this):
- A different bug (fixed in the previous commit) provoked the
situation that a (way) too large number of jobs was fetched
beforehand.
- sendResult() (called after each experiment iteration) realized
that CLIENT_JOB_REQUEST_SEC seconds were over, and tried to
prematurely call home to send first results (without planning to
get new jobs yet).
- If the server was gone (done, or aborted), connect in
sendResultsToServer() failed after several retries and timeouts.
- All subsequent calls to sendResult() retried connecting to the
server (again, with retries and timeouts), once for each remaining
job.
- When all jobs were done, getParam() tried to connect one last time,
finally telling the experiment that nobody's home.
This resulted in client shutdown times of up to four hours (for the
default CLIENT_JOB_LIMIT of 1000) after the campaign server
terminated. This change solves the issue by not handing out new
(cached) jobs after the connect failed once, making the experiment
terminate quickly.
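A hedged sketch of the new behavior (helper and member names are
assumptions; only the strategy comes from this commit):

    #include <deque>

    struct Job { int id; };
    struct JobClientState {
        bool server_gone;        // set on the first failed connect, e.g.
                                 // in sendResultsToServer()
        std::deque<Job> cache;   // locally cached, not-yet-run jobs
    };
    bool fetch_jobs(JobClientState &st); // connect + retries/timeouts

    // Once the server is known to be gone, cached jobs are no longer
    // handed out, so each subsequent call returns "no more work" at once.
    bool getParam(JobClientState &st, Job &job)
    {
        if (st.server_gone)
            return false;
        if (st.cache.empty() && !fetch_jobs(st)) {
            st.server_gone = true; // remember: nobody's home
            return false;
        }
        if (st.cache.empty())
            return false;
        job = st.cache.front();
        st.cache.pop_front();
        return true;
    }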
Change-Id: I0d8cb2e084d783aca74c51a503fa72eb2b2eb0b7
If we don't properly initialize the job timing statistics, the number
of jobs to be requested in the second request to the server is based
on the wrong timings. In our test case, CLIENT_JOB_LIMIT jobs were
requested at once.
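Schematically (names and the request-size formula are assumed), the fix
amounts to zero-initializing the statistics the request size is derived
from:

    struct JobTimingStats {
        double runtime_total; // seconds spent in finished jobs
        unsigned jobs_done;   // finished-job counter
        JobTimingStats() : runtime_total(0.0), jobs_done(0) {} // the fix
    };

    // Request as many jobs as fit into one CLIENT_JOB_REQUEST_SEC round.
    unsigned jobs_to_request(const JobTimingStats &st,
                             double request_sec, unsigned job_limit)
    {
        if (st.jobs_done == 0)
            return 1; // no timing data yet: start small
        double per_job = st.runtime_total / st.jobs_done;
        if (per_job <= 0.0)
            return 1;
        unsigned n = (unsigned) (request_sec / per_job);
        if (n < 1) n = 1;
        if (n > job_limit) n = job_limit; // never exceed CLIENT_JOB_LIMIT
        return n;
    }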
Change-Id: I7e9d8ab6fe14e4488b3a74baf061d9a07f3a77c4
Delay insertion of to-be-sent jobs into m_runningJobs until they are
really sent, as getMessage() won't work anymore (as in: segfault) if
this job is concurrently re-sent (due to campaign end), its result
received, and the job deleted in the campaign. This becomes
non-hypothetical with larger values for CLIENT_JOB_LIMIT and
CLIENT_JOB_REQUEST_SEC.
Additionally, reinsert the remaining jobs into the input queue if
communication fails, instead of inefficiently delaying redistribution
until the campaign end.
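A hedged sketch of the reordering (types are stand-ins; only
m_runningJobs and getMessage() appear in this commit):

    #include <list>
    #include <string>

    struct ExperimentJob { std::string getMessage() const; };
    struct Socket { bool send(const std::string &msg); };

    // Publish the job in m_runningJobs only after it actually went out on
    // the wire; on failure the caller reinserts it into the input queue
    // instead of waiting for campaign-end redistribution.
    bool send_job(std::list<ExperimentJob *> &m_runningJobs,
                  ExperimentJob *job, Socket &sock)
    {
        std::string msg = job->getMessage(); // safe: job not yet visible
                                             // to the result/deletion path
        if (!sock.send(msg))
            return false;
        m_runningJobs.push_back(job);
        return true;
    }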
Change-Id: If85e3c8261deda86beb8d4d93343429223753f22