\chapter{Implementation}
\label{ch:implementation}

This chapter details how and to what extent APIC support was implemented. The steps performed in
this implementation are described in a generally applicable way; concrete implementation details,
using hhuOS as an example, can be found in \autoref{ch:listings}.
\autoref{sec:hhuosintegration} deals with the modifications made to hhuOS to integrate the APIC
implementation.

\clearpage

\section{Design Decisions and Scope}
\label{sec:design}

The APIC interrupt architecture is split into multiple hardware components and tasks: The
(potentially multiple) local APICs, the (usually single) I/O APIC and the APIC timer (part of each
local APIC). Furthermore, the APIC system needs to access its memory-mapped registers and to query
the hhuOS ACPI subsystem for information about the CPU topology and IRQ overrides. Also, the OS
should be able to interact with the APIC system in a simple manner, without needing to know all of
its individual parts.

To keep the whole system structured and simple, the implementation is split into the following main
components (see \autoref{fig:implarch}):

\begin{itemize}
\item \code{LocalApic}: Provides functionality to interact with the local APIC
(masking and unmasking, register access, etc.).
\item \code{IoApic}: Provides functionality to interact with the I/O APIC (masking and
unmasking, register access, etc.).
\item \code{ApicTimer}: Provides functionality to calibrate the APIC timer and handle its interrupts.
\item \code{Apic}: Combines all the functionality above and exposes it to other parts
of the OS\@.
\end{itemize}

\begin{figure}[h]
\centering
\begin{subfigure}[b]{0.5\textwidth}
\includesvg[width=1.0\linewidth]{img/Architecture.svg}
\end{subfigure}
\caption{Caller hierarchy of the main components.}
\label{fig:implarch}
\end{figure}

This implementation targets systems with a single I/O APIC\footnote{Operation with more than one
I/O APIC is described further in the MultiProcessor specification~\cite[sec.~3.6.8]{mpspec}.},
because consumer hardware typically contains only one, as does the hardware emulated by QEMU\@.
General information on implementing support for multiple I/O APICs can be found in
\autoref{subsec:multiioapic}.

Since ACPI's GSIs are introduced to the OS, new types are added to clearly distinguish the
different representations of interrupts and to prevent unintentional conversions:

\begin{itemize}
\item \code{GlobalSystemInterrupt}: ACPI's interrupt abstraction, detached from the hardware
interrupt lines.
\item \code{InterruptRequest}: Represents an IRQ, allowing the OS to address interrupts by
the name of the device that triggers them. When the APIC is used, IRQ overrides map IRQs to GSIs.
\item \code{InterruptVector}: Represents an interrupt's vector number, as used by the
\code{InterruptDispatcher}. The dispatcher maps interrupt vectors to interrupt handlers.
\end{itemize}

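
As a sketch of how such types can prevent unintentional conversions, they could be declared as C++
scoped enumerations, which neither convert implicitly into each other nor into integers. Only the
type names are taken from the list above; the underlying types, enumerators, vector numbers and the
helper \code{identityGsi} are hypothetical and not taken from hhuOS.

\begin{lstlisting}[language=C++]
#include <cstdint>
#include <type_traits>

// Hypothetical sketch: scoped enums keep the three interrupt representations apart
enum class GlobalSystemInterrupt : std::uint32_t {};  // every 32-bit GSI number is valid

enum class InterruptRequest : std::uint8_t {
    PIT = 0,        // legacy PC/AT IRQ numbers
    KEYBOARD = 1,
    COM1 = 4,
};

enum class InterruptVector : std::uint8_t {
    PIT = 0x20,     // vectors 0x00-0x1F are reserved for CPU exceptions
    KEYBOARD = 0x21,
    COM1 = 0x24,
};

// Without an interrupt source override, an IRQ maps to the GSI with the same
// number, but the conversion has to be spelled out explicitly.
constexpr GlobalSystemInterrupt identityGsi(InterruptRequest irq) {
    return static_cast<GlobalSystemInterrupt>(static_cast<std::uint32_t>(irq));
}

static_assert(!std::is_convertible_v<InterruptRequest, GlobalSystemInterrupt>,
              "IRQs have to be mapped to GSIs explicitly");
\end{lstlisting}
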

Both BIOS and UEFI are supported by hhuOS, but the hhuOS ACPI subsystem is currently only available
with BIOS\footnote{Status as of 11/02/23.}, so APIC support cannot be enabled when hhuOS is booted
using UEFI\@. Furthermore, the APIC can handle MSIs, but they are not included in this
implementation, as hhuOS currently does not utilize them.

SMP systems are partially supported: The APs are initialized, but only to a busy-looping state, as
hhuOS is currently a single-core OS and lacks some required infrastructure. All interrupts are
handled by the BSP\@.

Summary of features that are outside the scope of this thesis:

\begin{itemize}
\item Operation with a discrete APIC or x2Apic.
\item Interrupts with logical destinations or custom priorities.
\item Returning from APIC operation to PIC mode\footnote{This would theoretically be possible on
single-core hardware, but is of little practical use.}.
\item Relocation of a local APIC's MMIO region\footnote{Relocation is possible by writing a new
physical APIC base address to the IA32\textunderscore{}APIC\textunderscore{}BASE MSR.}.
\item Distributing external interrupts to different APs in SMP-enabled systems.
\item Usage of the performance monitoring or thermal monitoring interrupts.
\item Meaningful APIC error handling.
\item Handling of MSIs.
\end{itemize}

To allow extending an APIC implementation for single-core systems to SMP systems easily, some
points have to be taken into account:

\begin{itemize}
\item SMP systems need to manage multiple \code{LocalApic} and \code{ApicTimer} instances. This is
handled by the \code{Apic} class.
\item Initialization of the different components can no longer happen in a single place: The local
APICs and APIC timers of additional APs have to be initialized by the APs themselves, because the
BSP cannot access an AP's local APIC registers.
\item APs are only allowed to access instances of APIC classes that belong to them.
\item Interrupt handlers that get called on multiple APs may need to take the current processor into
account (for example the APIC timer interrupt handler).
\item Register accesses have to be synchronized if they consist of multiple operations on the same
registers (e.g.\ the indirect I/O APIC register access, see \autoref{sec:ioapicinit}).
\end{itemize}

\section{Code Style}
\label{sec:codestyle}

The individual state of each local and I/O APIC is managed through instances of the respective
classes. Because each CPU core can only access its own local APIC, this can create a
misconception: It is not possible to, e.g., allow an interrupt in a specific local APIC by calling
a function on the corresponding \code{LocalApic} instance, because such a call always affects the
local APIC of the core that executes it. This is communicated through the code by declaring a
range of functions as \code{static}. This is in direct contrast to the \code{IoApic} class: I/O
APICs can be addressed through instances, because they are not part of a CPU core, so each core can
always access all I/O APICs (if there are multiple).

Error checking is limited in this implementation: Publicly exposed functions (from the \code{Apic}
class) check for invalid invocations, but the internally used classes do not protect their
invariants, because they are not used directly by other parts of the OS\@. For the same reason,
these classes only selectively expose their interfaces (by using \code{friend} declarations).

\clearpage

\section{Local APIC}
\label{sec:lapicinit}

The local APIC block diagram (see \autoref{fig:localapicblock}) shows a number of registers that
are required for initialization:

\begin{itemize}
\item \textbf{\gls{lvt}}: Used to configure how local interrupts are handled.
\item \textbf{\gls{svr}}: Contains the software-enable flag and the spurious interrupt vector number.
\item \textbf{\gls{tpr}}: Defines a priority threshold: Only interrupts with a higher priority class
are handled, which allows ignoring low priority interrupts.
\item \textbf{\gls{icr}}: Used to send IPIs, e.g.\ for starting up APs in SMP-enabled systems.
\item \textbf{\gls{apr}}: Controls the priority in the APIC arbitration mechanism.
\end{itemize}

The LVT consists of multiple registers, one for each local interrupt the local APIC can handle.
The local interrupts relevant for this implementation are listed below:

\begin{itemize}
\item LINT1: The local APIC's NMI source.
\item Timer: Periodic interrupt triggered by the APIC timer.
\item Error: Triggered by errors in the APIC system (e.g.\ invalid vector numbers or corrupted messages
in internal APIC communication).
\end{itemize}

The LINT0 interrupt is not listed, because it is mainly relevant for virtual wire mode (it can be
triggered by external interrupts from the PIC). The performance and thermal monitoring interrupts
also remain unused in this implementation.

Besides the local APIC's own registers, the IMCR and the \textbf{\gls{ia32 apic base msr}} also
require initialization (described in \autoref{subsec:lapicenable}).

After system power-up, the local APIC is in the following state~\cite[sec.~3.11.4.7]{ia32}:

\begin{itemize}
\item IRR, ISR and TPR are reset to \code{0x00000000}.
\item The LVT is reset to \code{0x00000000}, except for the masking bits (all local interrupts are masked
on power-up).
\item The SVR is reset to \code{0x000000FF}.
\item The APIC is in xApic mode, even if x2Apic support is present.
\item Only the BSP is active; the APs have to be started up using the BSP's local APIC\@.
\end{itemize}

The initialization sequence consists of these steps:

\begin{enumerate}
\item Enable symmetric I/O mode and set the APIC operating mode.
\item Initialize the LVT and NMI\@.
\item Initialize the SVR\@.
\item Clear outstanding signals.
\item Initialize the TPR\@.
\item Initialize the APIC timer and error handler.
\item Start up the APs and initialize their respective local APICs.
\item Synchronize the APRs.
\end{enumerate}

\subsection{Accessing Local APIC Registers in xApic Mode}
\label{subsec:xapicregacc}

Accessing registers in xApic mode is done via MMIO and requires a single page (4\,KiB) of memory,
located at the ``APIC Base'' address, which can be obtained by reading the
IA32\textunderscore{}APIC\textunderscore{}BASE MSR (see
\autoref{fig:ia32apicbasemsr}/\autoref{tab:lapicregsmsr}) or by parsing the MADT (see
\autoref{tab:madt}). The IA-32 manual specifies a caching strategy of ``strong
uncacheable''\footnote{See the IA-32 manual for information on this caching
strategy~\cite[sec.~3.12.3]{ia32}.}~\cite[sec.~3.11.4.1]{ia32} for this region (see
\autoref{sec:apxxapicregacc} for an example implementation).

The address offsets (relative to the base address) of the local APIC registers are listed in the
IA-32 manual~\cite[sec.~3.11.4.1]{ia32} and in \autoref{tab:lapicregs}.

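
The following sketch illustrates MMIO access to the local APIC registers. It assumes that the
4\,KiB region has already been mapped as strong uncacheable and that its virtual address is passed
in as \code{base}; the class name \code{LocalApicRegisters} is hypothetical, while the offsets are
those listed in the IA-32 manual. Because the accesses go through a \code{volatile} pointer, the
compiler does not optimize them away or combine them.

\begin{lstlisting}[language=C++]
#include <cstdint>

// Hypothetical helper for xApic MMIO register access. All registers are
// 32 bits wide and aligned on 16-byte boundaries.
class LocalApicRegisters {
public:
    // Offsets from the APIC base address (IA-32 manual, xApic register map)
    enum Offset : std::uint32_t {
        ID = 0x020,
        VERSION = 0x030,
        TPR = 0x080,
        EOI = 0x0B0,
        SVR = 0x0F0,
        ESR = 0x280,
        ICR_LOW = 0x300,
        ICR_HIGH = 0x310,
        LVT_TIMER = 0x320,
        LVT_LINT0 = 0x350,
        LVT_LINT1 = 0x360,
        LVT_ERROR = 0x370,
        TIMER_INITIAL = 0x380,
        TIMER_CURRENT = 0x390,
        TIMER_DIVIDE = 0x3E0,
    };

    explicit LocalApicRegisters(std::uintptr_t base) : base(base) {}

    std::uint32_t read(Offset offset) const {
        return *reinterpret_cast<volatile std::uint32_t *>(base + offset);
    }

    void write(Offset offset, std::uint32_t value) const {
        *reinterpret_cast<volatile std::uint32_t *>(base + offset) = value;
    }

    // Signal completion of the current interrupt by writing 0 to the EOI register
    void sendEoi() const { write(EOI, 0); }

private:
    std::uintptr_t base;
};
\end{lstlisting}
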
\subsection{Enabling the Local APIC}
\label{subsec:lapicenable}

The following steps have to be executed before the OS enables any interrupt handling.

Because the system boots in PIC mode, \code{0x01} should be written to the
IMCR~\cite[sec.~3.6.2.1]{mpspec} to disconnect the PIC from the local APIC's LINT0 pin (see
\autoref{fig:integratedapic}; for an example implementation, see \autoref{sec:apxdisablepic}).

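
According to the MultiProcessor specification, the IMCR is reached through two I/O ports: The
register address \code{0x70} is written to port \code{0x22}, then the value is written to port
\code{0x23}. In the sketch below, the port write function is passed in as a parameter (an assumed
interface that would wrap the \code{outb} instruction on real hardware), so the sequence can be
checked without hardware; the function name \code{disconnectPic} is hypothetical.

\begin{lstlisting}[language=C++]
#include <cstdint>
#include <functional>

// Assumed signature of a port write helper (e.g. wrapping the outb instruction)
using PortWriter = std::function<void(std::uint16_t port, std::uint8_t value)>;

// Route the PIC's INTR and the NMI signal through the APIC (symmetric I/O mode)
// by setting bit 0 of the IMCR (MultiProcessor specification, sec. 3.6.2.1).
void disconnectPic(const PortWriter &outb) {
    constexpr std::uint16_t IMCR_ADDRESS_PORT = 0x22;
    constexpr std::uint16_t IMCR_DATA_PORT = 0x23;
    constexpr std::uint8_t IMCR_SELECT = 0x70;
    constexpr std::uint8_t IMCR_APIC_MODE = 0x01;

    outb(IMCR_ADDRESS_PORT, IMCR_SELECT);  // select the IMCR
    outb(IMCR_DATA_PORT, IMCR_APIC_MODE);  // bit 0 set: APIC mode
}
\end{lstlisting}
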

To set the operating mode, the supported modes are determined first by executing the
\code{cpuid} instruction with \code{eax=1}: If bit 9 of the \code{edx} register is set, xApic mode
is supported; if bit 21 of the \code{ecx} register is set, x2Apic mode is
supported~\cite[sec.~5.1.2]{cpuid}.

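
Evaluating these feature bits could look like the following sketch. It only decodes register values
that are assumed to have been obtained by executing \code{cpuid} with \code{eax=1} beforehand
(e.g.\ through inline assembly); the type \code{ApicSupport} and the function name are hypothetical.

\begin{lstlisting}[language=C++]
#include <cstdint>

struct ApicSupport {
    bool xApic;   // CPUID.01H:EDX bit 9 (on-chip APIC)
    bool x2Apic;  // CPUID.01H:ECX bit 21
};

// Decode the APIC feature flags from the edx/ecx values returned by cpuid (eax=1)
constexpr ApicSupport decodeApicSupport(std::uint32_t edx, std::uint32_t ecx) {
    return ApicSupport{(edx & (1u << 9)) != 0, (ecx & (1u << 21)) != 0};
}
\end{lstlisting}
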

If xApic mode is supported (i.e.\ if the local APIC is an integrated APIC), the local APIC is
already in that mode. The ``global enable/disable'' (``EN'') bit in the
IA32\textunderscore{}APIC\textunderscore{}BASE MSR (see
\autoref{fig:ia32apicbasemsr}/\autoref{tab:lapicregsmsr}) allows disabling xApic mode, and thus the
entire local APIC, globally\footnote{If the system uses the discrete APIC bus, xApic mode cannot be
re-enabled without a system reset~\cite[sec.~3.11.4.3]{ia32}.}.

x2Apic mode is enabled by setting the ``EXTD'' bit (see
\autoref{fig:ia32apicbasemsr}/\autoref{tab:lapicregsmsr}) of the
IA32\textunderscore{}APIC\textunderscore{}BASE MSR, while the ``EN'' bit remains
set~\cite[sec.~3.11.4.3]{ia32}.

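
The layout of the IA32\textunderscore{}APIC\textunderscore{}BASE MSR (MSR address \code{0x1B}) can
be captured in a few helper functions. This sketch assumes that the MSR value has already been read
with \code{rdmsr} and will be written back with \code{wrmsr}; the constant and function names are
hypothetical.

\begin{lstlisting}[language=C++]
#include <cstdint>

// Bits of the IA32_APIC_BASE MSR (address 0x1B)
constexpr std::uint64_t APIC_BASE_BSP = 1ull << 8;    // processor is the BSP
constexpr std::uint64_t APIC_BASE_EXTD = 1ull << 10;  // x2Apic mode enable
constexpr std::uint64_t APIC_BASE_EN = 1ull << 11;    // xApic global enable
constexpr std::uint64_t APIC_BASE_ADDRESS_MASK = 0x000FFFFFFFFFF000ull;  // bits 12 to 51

constexpr std::uint64_t apicBaseAddress(std::uint64_t msr) { return msr & APIC_BASE_ADDRESS_MASK; }
constexpr bool isBsp(std::uint64_t msr) { return (msr & APIC_BASE_BSP) != 0; }
constexpr bool isXApicEnabled(std::uint64_t msr) { return (msr & APIC_BASE_EN) != 0; }

// x2Apic mode requires both EN and EXTD to be set
constexpr std::uint64_t withX2ApicEnabled(std::uint64_t msr) {
    return msr | APIC_BASE_EN | APIC_BASE_EXTD;
}
\end{lstlisting}
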
\begin{figure}[h]
\centering
\begin{subfigure}[b]{0.7\textwidth}
\includesvg[width=1.0\linewidth]{img/ia32_apic_base_msr.svg}
\end{subfigure}
\caption{The IA32\textunderscore{}APIC\textunderscore{}BASE MSR~\cite[sec.~3.11.4.4]{ia32}.}
\label{fig:ia32apicbasemsr}
\end{figure}

Because QEMU does not support x2Apic mode with its TCG\footnote{QEMU's Tiny Code Generator translates
guest instructions into instructions for the host CPU.}, this implementation only uses xApic mode.

Besides the ``global enable/disable'' (``EN'') flag, the APIC can also be enabled or disabled by
using the ``software enable/disable'' flag in the SVR, see \autoref{subsec:lapicsoftenable}.

\subsection{Handling Local Interrupts}
\label{subsec:lapiclvtinit}

To configure how the local APIC handles the different local interrupts, the LVT registers are
written (see \autoref{fig:localapiclvt}).

Local interrupts can be configured to use different delivery modes
(excerpt)~\cite[sec.~3.11.5.1]{ia32}:

\begin{itemize}
\item Fixed: Delivers the interrupt vector stored in the LVT register.
\item NMI: Configures this interrupt as non-maskable; the stored vector number is ignored.
\item ExtINT: The interrupt is treated as an external interrupt (instead of a local interrupt), used
e.g.\ in virtual wire mode for LINT0.
\end{itemize}

Initially, all local interrupts are initialized to PC/AT defaults: Masked, edge-triggered,
active-high and with ``fixed'' delivery mode. Most importantly, the vector fields (bits 0 to 7 of an
LVT register) are set to their corresponding interrupt vectors (see \autoref{sec:apxlvtinit} for an
example implementation).

After default-initializing the local interrupts, LINT1 has to be configured separately (see
\autoref{tab:lapicregslvtlint}), because it is the local APIC's NMI source\footnote{In older
specifications~\cite{mpspec}, LINT1 is defined as NMI source (see \autoref{fig:integratedapic}). It
is possible that this has changed with newer architectures, so for increased compatibility, this
implementation configures the local APIC NMI source as reported by ACPI\@. It is also possible that
ACPI reports information on the NMI source just for future-proofing.}. The delivery mode is set to
``NMI'' and the interrupt vector to \code{0x00}. This information is also provided by ACPI in the
MADT (see \autoref{tab:madtlnmi}). The other local interrupts (APIC timer and error interrupt) are
configured later (see \autoref{subsec:lapictimer} and \autoref{subsec:lapicerror}).

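
The LVT fields mentioned above can be composed with a small helper. The sketch below uses
hypothetical names and the field positions given in the IA-32 manual; it builds the PC/AT default
entry and the NMI entry for LINT1. For NMI delivery, the trigger mode is always edge-sensitive, so
only the polarity (as reported by ACPI) is configurable. A real implementation would write the
results to the LVT registers via MMIO.

\begin{lstlisting}[language=C++]
#include <cstdint>

// LVT register fields (IA-32 manual, local vector table)
enum class DeliveryMode : std::uint32_t { FIXED = 0b000, SMI = 0b010, NMI = 0b100, INIT = 0b101, EXTINT = 0b111 };
enum class Polarity : std::uint32_t { ACTIVE_HIGH = 0, ACTIVE_LOW = 1 };
enum class TriggerMode : std::uint32_t { EDGE = 0, LEVEL = 1 };

constexpr std::uint32_t makeLvtEntry(std::uint8_t vector, DeliveryMode mode, Polarity polarity,
                                     TriggerMode trigger, bool masked) {
    return static_cast<std::uint32_t>(vector)              // bits 0-7: vector
         | (static_cast<std::uint32_t>(mode) << 8)         // bits 8-10: delivery mode
         | (static_cast<std::uint32_t>(polarity) << 13)    // bit 13: pin polarity
         | (static_cast<std::uint32_t>(trigger) << 15)     // bit 15: trigger mode
         | (static_cast<std::uint32_t>(masked) << 16);     // bit 16: mask
}

// PC/AT default: masked, edge-triggered, active-high, fixed delivery
constexpr std::uint32_t defaultLvtEntry(std::uint8_t vector) {
    return makeLvtEntry(vector, DeliveryMode::FIXED, Polarity::ACTIVE_HIGH, TriggerMode::EDGE, true);
}

// NMI source (e.g. LINT1): unmasked, vector 0 (ignored), always edge-triggered
constexpr std::uint32_t nmiLvtEntry(Polarity polarity) {
    return makeLvtEntry(0x00, DeliveryMode::NMI, polarity, TriggerMode::EDGE, false);
}
\end{lstlisting}
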
\subsection{Allowing Interrupt Processing}
\label{subsec:lapicsoftenable}

To complete a minimal local APIC initialization, the ``software enable/disable'' flag and the
spurious interrupt vector (both contained in the SVR, see
\autoref{fig:ia32apicsvr}/\autoref{tab:lapicregssvr}) are set. It makes sense to choose
\code{0xFF} as the spurious interrupt vector, because on P6 and Pentium processors, the lowest
4 bits of this vector are hardwired to 1 (see \autoref{fig:ia32apicsvr}).

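
Composing the SVR value amounts to combining the spurious vector (bits 0 to 7) with the
software-enable flag (bit 8); with vector \code{0xFF}, the resulting value is \code{0x1FF}. The
following sketch uses hypothetical names and additionally checks whether a vector behaves
identically on P6 and Pentium processors.

\begin{lstlisting}[language=C++]
#include <cstdint>

constexpr std::uint32_t SVR_APIC_SOFTWARE_ENABLE = 1u << 8;

// Build the SVR value: bits 0-7 spurious vector, bit 8 software enable flag
constexpr std::uint32_t makeSvr(std::uint8_t spuriousVector, bool enable) {
    return spuriousVector | (enable ? SVR_APIC_SOFTWARE_ENABLE : 0u);
}

// P6 and Pentium processors hardwire bits 0-3 of the spurious vector to 1,
// so only vectors ending in 0xF behave identically on all processors
constexpr bool isPortableSpuriousVector(std::uint8_t vector) {
    return (vector & 0x0F) == 0x0F;
}

static_assert(isPortableSpuriousVector(0xFF), "0xFF is usable on all processors");
\end{lstlisting}
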
\begin{figure}[h]
\centering
\begin{subfigure}[b]{0.7\textwidth}
\includesvg[width=1.0\linewidth]{img/ia32_apic_svr.svg}
\end{subfigure}
\caption{The local APIC's SVR~\cite[sec.~3.11.9]{ia32}.}
\label{fig:ia32apicsvr}
\end{figure}

Because the APIC's spurious interrupt has a dedicated interrupt vector (unlike the PIC's spurious
interrupt), it can be ignored easily by registering a stub interrupt handler for that vector, which
returns without signaling an EOI~\cite[sec.~3.11.9]{ia32} (see \autoref{sec:apxsvr} for an
implementation example).

The final step to initialize the BSP's local APIC is to allow the local APIC to receive interrupts
of all priorities. This is done by writing \code{0x00} to the TPR~\cite[sec.~3.11.8.3]{ia32} (see
\autoref{tab:lapicregstpr}). By configuring the TPRs of different local APICs to different
priorities or priority classes, the distribution of external interrupts to CPUs can be controlled
(in combination with the ``lowest priority'' delivery mode), but this is not used in this thesis.

\subsection{Local Interrupt EOI}
\label{subsec:lapiceoi}

To notify the local APIC that a local interrupt has been handled, its EOI register (see
\autoref{tab:lapicregseoi}) has to be written. Not all local interrupts require EOIs: NMI, SMI,
INIT, ExtINT, STARTUP, and INIT-Deassert interrupts are excluded~\cite[sec.~3.11.8.5]{ia32}.

EOIs for external interrupts are also handled by the local APIC; this is described in
\autoref{subsec:ioapiceoi}.

\subsection{APIC Timer}
\label{subsec:lapictimer}

The APIC timer is integrated into the local APIC, so it requires the latter to be initialized. Like
the PIT, the APIC timer can generate periodic interrupts at a specified interval by using a
counter that is initialized with a starting value depending on the desired interval. Because the
APIC timer does not tick at a fixed frequency, but at the bus frequency, the initial counter value
has to be determined at runtime by using an external time source. In addition to the counter
register, the APIC timer interval is influenced by a divider: Instead of decrementing the counter
at every bus clock, it is decremented every \(n\)-th bus clock, where \(n\) is the divider. This is
useful to allow for long intervals (with decreased precision) that would otherwise require a larger
counter register.

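
Expressed as a formula, with the initial count \(c\), the divider \(n\) and the bus frequency
\(f_{\mathrm{bus}}\), the resulting interval \(T\) and, conversely, the initial count required for
a desired interval are:
\[
    T = \frac{n \cdot c}{f_{\mathrm{bus}}}, \qquad c = \frac{T \cdot f_{\mathrm{bus}}}{n} \le 2^{32} - 1.
\]
The upper bound on \(c\) stems from the 32-bit initial count register, which is why long intervals
require a larger divider. For example, assuming \(f_{\mathrm{bus}} = 100\,\mathrm{MHz}\) and
\(n = 16\), an interval of \(T = 10\,\mathrm{ms}\) corresponds to \(c = 62\,500\).
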

The APIC timer supports three different timer modes that can be set in the timer's LVT register:

\begin{enumerate}
\item Oneshot: Trigger exactly one interrupt when the counter reaches zero.
\item Periodic: Trigger an interrupt each time the counter reaches zero; on reaching zero, the
counter reloads its initial value.
\item TSC-Deadline: Trigger exactly one interrupt when the processor's time-stamp counter reaches an
absolute deadline value.
\end{enumerate}

This implementation uses the APIC timer in periodic mode to trigger scheduler preemption.
Initialization requires the following steps (order recommended by OSDev~\cite{osdev}):

\begin{enumerate}
\item Measure the timer frequency with an external time source.
\item Configure the timer's divider register (see \autoref{tab:lapicregstimerdiv}).
\item Set the timer mode to periodic (see \autoref{tab:lapicregslvtt}).
\item Initialize the counter register (see \autoref{tab:lapicregstimerinit}), depending on the measured
timer frequency and the desired interval.
\end{enumerate}

In this implementation, the APIC timer is calibrated by counting the number of ticks in one
millisecond using oneshot mode (see \autoref{sec:apxapictimer} for an example implementation). The
measured number of timer ticks can then be used to calculate the required counter value for an
arbitrary interval in milliseconds, although very large intervals could require a larger divider,
while very small intervals (on a micro- or nanosecond scale) could require a smaller one to provide
the necessary precision. For this approach, it is important that the timer is initialized with the
same divider that was used during calibration.

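
The arithmetic of this calibration approach is sketched below. It assumes that the timer was started
in oneshot mode with the maximum initial count \code{0xFFFFFFFF}, that an external time source
(e.g.\ the PIT) was used to wait for one millisecond, and that the current count register was read
afterwards; the hardware access itself is left out and all names are hypothetical. The encoding of
the divide configuration register follows the IA-32 manual, which uses bits 0, 1 and 3.

\begin{lstlisting}[language=C++]
#include <cstdint>
#include <optional>

// Encode a divider for the timer's divide configuration register
// (bits 0, 1 and 3 are used, bit 2 is reserved)
constexpr std::optional<std::uint32_t> encodeDivider(std::uint32_t divider) {
    switch (divider) {
        case 1:   return 0b1011;
        case 2:   return 0b0000;
        case 4:   return 0b0001;
        case 8:   return 0b0010;
        case 16:  return 0b0011;
        case 32:  return 0b1000;
        case 64:  return 0b1001;
        case 128: return 0b1010;
        default:  return std::nullopt;  // not supported by the hardware
    }
}

// Ticks counted in one millisecond, given the current count read after the wait
constexpr std::uint32_t ticksPerMillisecond(std::uint32_t currentCount) {
    return 0xFFFFFFFFu - currentCount;
}

// Initial count for a periodic interval; empty if the 32-bit counter would overflow,
// in which case a larger divider has to be used (and calibrated)
constexpr std::optional<std::uint32_t> initialCount(std::uint32_t ticksPerMs, std::uint32_t intervalMs) {
    const std::uint64_t count = static_cast<std::uint64_t>(ticksPerMs) * intervalMs;
    if (count == 0 || count > 0xFFFFFFFFull) {
        return std::nullopt;
    }
    return static_cast<std::uint32_t>(count);
}
\end{lstlisting}
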

To use the timer, an interrupt handler has to be registered for its interrupt vector (see
\autoref{sec:apxapictimer} for an example implementation).

\subsection{APIC Error Interrupt}
\label{subsec:lapicerror}

Errors can occur, for example, when the local APIC receives an invalid vector number or when an
APIC message gets corrupted on the system bus. To handle these cases, the local APIC provides the
local error interrupt, whose interrupt handler can read the error status from the local APIC's
\textbf{\gls{esr}} (see \autoref{fig:ia32esr}/\autoref{tab:lapicregsesr}) and take appropriate
action.

\begin{figure}[h]
\centering
\begin{subfigure}[b]{0.7\textwidth}
\includesvg[width=1.0\linewidth]{img/ia32_error_status_register.svg}
\end{subfigure}
\caption{Error Status Register~\cite[sec.~3.11.5.3]{ia32}.}
\label{fig:ia32esr}
\end{figure}

The ESR is a ``write/read'' register: Before reading a value from the ESR, it has to be written,
which updates the ESR's contents to the error status since the last write. Writing the ESR also
re-arms the local error interrupt~\cite[sec.~3.11.5.3]{ia32}.

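
The write-before-read protocol and the error bits can be illustrated as follows. This sketch assumes
that the ESR is reached through an already mapped pointer (see \autoref{subsec:xapicregacc}); the
bit names are shortened from the IA-32 manual and the function names are hypothetical.

\begin{lstlisting}[language=C++]
#include <cstdint>

// Bits of the local APIC's Error Status Register
enum EsrBit : std::uint32_t {
    SEND_CHECKSUM = 1u << 0,
    RECEIVE_CHECKSUM = 1u << 1,
    SEND_ACCEPT = 1u << 2,
    RECEIVE_ACCEPT = 1u << 3,
    REDIRECTABLE_IPI = 1u << 4,
    SEND_ILLEGAL_VECTOR = 1u << 5,
    RECEIVE_ILLEGAL_VECTOR = 1u << 6,
    ILLEGAL_REGISTER_ADDRESS = 1u << 7,
};

// The ESR has to be written before it is read: the write updates the ESR with
// the errors detected since the previous write and re-arms the error interrupt.
std::uint32_t readErrorStatus(volatile std::uint32_t *esr) {
    *esr = 0;
    return *esr;
}

constexpr bool hasIllegalVectorError(std::uint32_t status) {
    return (status & (SEND_ILLEGAL_VECTOR | RECEIVE_ILLEGAL_VECTOR)) != 0;
}
\end{lstlisting}
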

Enabling the local error interrupt is now as simple as unmasking it in the local APIC's LVT and
registering an interrupt handler for the corresponding vector (see \autoref{sec:apxhandlingerror}
for an example implementation).

\clearpage

\section{I/O APIC}
\label{sec:ioapicinit}

% TODO: Continue moving code to the appendix from here on

To fully replace the PIC and handle external interrupts using the APIC, the I/O APIC, located in
the system chipset, has to be initialized by setting its \textbf{\gls{redtbl}}
registers~\cite[sec.~9.5.8]{ich5} (see \autoref{tab:ioapicregsredtbl}). Like the local APIC's LVT,
the REDTBL allows configuring interrupt vectors, masking bits, interrupt delivery modes, pin
polarities and trigger modes (see \autoref{subsec:lapiclvtinit}).

Additionally, a destination and a destination mode can be specified for external interrupts. This
is required because the I/O APIC is able to forward external interrupts to different local APICs
over the system bus (see \autoref{fig:integratedapic}). SMP systems use this mechanism to
distribute external interrupts to different CPU cores for performance benefits. Because this
implementation's focus is not on SMP, all external interrupts are default-initialized to
``physical'' destination mode\footnote{The alternative is ``logical'' destination mode, which allows
addressing individual local APICs or clusters of local APICs in systems with a larger number of
processors~\cite[sec.~3.11.6.2.2]{ia32}.}~\cite[sec.~3.11.6.2.1]{ia32} and are sent to the BSP for
servicing, by using the BSP's local APIC ID as the destination. The other fields are set to
\textbf{\gls{isa}} bus defaults\footnote{Edge-triggered, active-high.}, with ``fixed'' delivery
mode, masked, and the corresponding interrupt vector, as defined by the \code{InterruptVector}
enum.

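
A REDTBL entry is 64 bits wide: The lower half mirrors the LVT fields and adds the destination
mode, while the destination field occupies bits 56 to 63. The following hypothetical helper builds
the default entry described above (fixed delivery, physical destination mode, the BSP as
destination, edge-triggered, active-high and masked).

\begin{lstlisting}[language=C++]
#include <cstdint>

// Build a 64-bit I/O APIC redirection table entry (ICH5 datasheet, sec. 9.5.8)
constexpr std::uint64_t makeRedirectionEntry(std::uint8_t vector, std::uint8_t destinationApicId,
                                             bool levelTriggered, bool activeLow, bool masked) {
    return static_cast<std::uint64_t>(vector)                     // bits 0-7: vector
         | (0b000ull << 8)                                        // bits 8-10: fixed delivery mode
         | (0ull << 11)                                           // bit 11: physical destination mode
         | (static_cast<std::uint64_t>(activeLow) << 13)          // bit 13: polarity
         | (static_cast<std::uint64_t>(levelTriggered) << 15)     // bit 15: trigger mode
         | (static_cast<std::uint64_t>(masked) << 16)             // bit 16: mask
         | (static_cast<std::uint64_t>(destinationApicId) << 56); // bits 56-63: destination
}

// ISA defaults: edge-triggered, active-high, masked, sent to the BSP
constexpr std::uint64_t defaultRedirectionEntry(std::uint8_t vector, std::uint8_t bspApicId) {
    return makeRedirectionEntry(vector, bspApicId, false, false, true);
}
\end{lstlisting}
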

The I/O APIC does not have to be enabled explicitly: If the local APIC is enabled and the REDTBL is
initialized correctly, external interrupts are redirected to the local APIC and handled by the
CPU\@.

Unlike the local APIC's registers, the REDTBL registers are accessed indirectly: Two registers, the
``Index'' and ``Data'' register~\cite[sec.~9.5.1]{ich5}, are mapped into the physical address space
and can be used analogously to the local APIC's registers. The MMIO base address can be parsed
from the MADT (see \autoref{tab:madtioapic}). Writing an offset to the index register exposes an
indirectly accessible I/O APIC register through the data register (see \autoref{sec:iolistings}
for an example implementation). This indirect addressing scheme is useful, because the number of
external interrupts an I/O APIC supports, and in turn the number of REDTBL registers, can
vary\footnote{Intel's consumer \textbf{\glspl{ich}} always support a fixed number of 24 external
interrupts, though~\cite[sec.~9.5.7]{ich5}.}.

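
The indirect access can be sketched as below. The offsets of the index register (\code{0x00}) and
the data register (\code{0x10}) relative to the MMIO base, as well as the register indices
(\code{0x10 + 2n} for the lower and \code{0x11 + 2n} for the upper half of REDTBL entry \(n\)),
follow the ICH5 datasheet; the class name is hypothetical and the MMIO region is assumed to be
mapped already. Because selecting and accessing a register are two separate operations, this
sequence would have to be synchronized on SMP systems.

\begin{lstlisting}[language=C++]
#include <cstdint>

class IoApicRegisters {
public:
    explicit IoApicRegisters(std::uintptr_t base)
        : index(reinterpret_cast<volatile std::uint32_t *>(base)),
          data(reinterpret_cast<volatile std::uint32_t *>(base + 0x10)) {}

    static constexpr std::uint8_t ID = 0x00;
    static constexpr std::uint8_t VERSION = 0x01;

    static constexpr std::uint8_t redirectionLow(std::uint8_t entry) { return 0x10 + 2 * entry; }
    static constexpr std::uint8_t redirectionHigh(std::uint8_t entry) { return 0x11 + 2 * entry; }

    std::uint32_t read(std::uint8_t reg) const {
        *index = reg;  // select the register ...
        return *data;  // ... and access it through the data register
    }

    void write(std::uint8_t reg, std::uint32_t value) const {
        *index = reg;
        *data = value;
    }

    // Write the destination (upper half) first, so the entry is complete once the
    // lower half, which contains the mask bit, is written
    void writeRedirection(std::uint8_t entry, std::uint64_t value) const {
        write(redirectionHigh(entry), static_cast<std::uint32_t>(value >> 32));
        write(redirectionLow(entry), static_cast<std::uint32_t>(value));
    }

private:
    volatile std::uint32_t *index;
    volatile std::uint32_t *data;
};
\end{lstlisting}
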

It is possible that one or more of the I/O APIC's interrupt inputs act as an NMI source. Whether
this is the case is reported in the MADT (see \autoref{tab:madtionmi}), so when necessary, the
corresponding REDTBL entries are initialized like the local APIC's NMI source (see
\autoref{subsec:lapiclvtinit}), and using these interrupt inputs for external interrupts is
forbidden.

\subsection{Interrupt Overrides}
\label{subsec:ioapicpcat}

In every PC/AT compatible system, external devices are hardwired to the PIC in the same order.
Because this is not the case for the I/O APIC, the interrupt input used by each PC/AT compatible
interrupt has to be determined by the OS at runtime by using ACPI\@. ACPI provides ``Interrupt
Source Override'' structures~\cite[sec.~5.2.8.3.1]{acpi1} inside the MADT (see
\autoref{tab:madtirqoverride}) for each PC/AT compatible interrupt that is mapped differently to
the I/O APIC than to the PIC\@.

In addition to the interrupt input mapping, these structures also allow customizing the pin
polarity and trigger mode of PC/AT compatible interrupts.

This information does not only apply to the REDTBL initialization, but has to be taken into
account every time an action is performed on a PC/AT compatible interrupt, like masking or
unmasking: If \code{IRQ0} (PIT) should be unmasked, it has to be determined which GSI (or in other
words, I/O APIC interrupt input) it belongs to. In many systems, \code{IRQ0} is mapped to
\code{GSI2}, because the PC/AT compatible PICs are connected to \code{GSI0}. Thus, to allow the PIT
interrupt in those systems, the REDTBL entry belonging to \code{GSI2} instead of \code{GSI0} has to
be written (see \autoref{sec:apxirqoverrides} for an example implementation).

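
The lookup could be implemented as in the following sketch, assuming that the override structures
have already been parsed from the MADT into a simple array. The structure layouts and names are
hypothetical; the flag encoding (bits 0 to 1 polarity, bits 2 to 3 trigger mode, \code{0b11}
meaning active-low or level-triggered, \code{0b00} meaning ``conforms to the bus'') follows the MPS
INTI flags used by ACPI. IRQs without an override are identity-mapped and use the ISA defaults.

\begin{lstlisting}[language=C++]
#include <cstddef>
#include <cstdint>

// Parsed "Interrupt Source Override" (hypothetical layout)
struct IrqOverride {
    std::uint8_t irq;     // source: ISA IRQ
    std::uint32_t gsi;    // target: global system interrupt
    std::uint16_t flags;  // MPS INTI flags
};

struct IrqRouting {
    std::uint32_t gsi;
    bool activeLow;
    bool levelTriggered;
};

// Resolve an IRQ to its GSI, polarity and trigger mode. ISA interrupts default to
// an identity mapping, active-high and edge-triggered ("conforms to the bus").
constexpr IrqRouting resolveIrq(std::uint8_t irq, const IrqOverride *overrides, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        if (overrides[i].irq == irq) {
            const std::uint16_t polarity = overrides[i].flags & 0b11;
            const std::uint16_t trigger = (overrides[i].flags >> 2) & 0b11;
            return IrqRouting{overrides[i].gsi, polarity == 0b11, trigger == 0b11};
        }
    }
    return IrqRouting{irq, false, false};
}
\end{lstlisting}
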
\subsection{External Interrupt EOI}
\label{subsec:ioapiceoi}

Notifying the I/O APIC that an external interrupt has been handled differs depending on the
interrupt trigger mode: Edge-triggered external interrupts are completed by writing the local
APIC's EOI register (see \autoref{subsec:lapiceoi})\footnote{Because external interrupts are
forwarded to the local APIC, the local APIC is responsible for tracking them in its IRR and ISR.}.
Level-triggered interrupts are treated separately: Upon delivering a level-triggered external
interrupt, the I/O APIC sets an internal ``Remote IRR'' bit in the corresponding REDTBL
entry~\cite[sec.~9.5.8]{ich5} (see \autoref{tab:ioapicregsredtbl}).

There are three possible ways to signal completion of a level-triggered external interrupt and to
clear the Remote IRR bit:

\begin{enumerate}
\item Using the local APIC's EOI broadcasting feature: If EOI broadcasting is enabled, writing the
local APIC's EOI register also sends an EOI to each I/O APIC (for the appropriate interrupt vector),
which clears the Remote IRR bit.
\item Sending a directed EOI to an I/O APIC: I/O APICs with version \code{0x20} or newer include an
I/O EOI register. Writing the vector number of the handled interrupt to this register clears the
Remote IRR bit.
\item Simulating a directed EOI for I/O APICs with versions older than \code{0x20}: Temporarily
masking a completed interrupt and setting it to edge-triggered clears the Remote IRR
bit~\cite[io\textunderscore{}apic.c]{linux}.
\end{enumerate}

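
The third option only manipulates the lower half of a REDTBL entry. The following sketch computes
the two values that would be written one after the other (first masked and edge-triggered, then the
original entry), following the approach used by Linux~\cite[io\textunderscore{}apic.c]{linux}
rather than the one chosen in this implementation; all names are hypothetical.

\begin{lstlisting}[language=C++]
#include <cstdint>

constexpr std::uint32_t REDTBL_TRIGGER_LEVEL = 1u << 15;
constexpr std::uint32_t REDTBL_MASKED = 1u << 16;

struct SimulatedEoi {
    std::uint32_t maskedEdge;  // first write: masked and edge-triggered, clears Remote IRR
    std::uint32_t restored;    // second write: original (level-triggered) configuration
};

// Values for the lower half of a REDTBL entry to simulate a directed EOI
constexpr SimulatedEoi simulateDirectedEoi(std::uint32_t entryLow) {
    return SimulatedEoi{(entryLow | REDTBL_MASKED) & ~REDTBL_TRIGGER_LEVEL, entryLow};
}
\end{lstlisting}
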

Because the first option is the only one supported by all APIC versions, it is used in this
implementation\footnote{Disabling EOI broadcasting is not supported by all local
APICs~\cite[sec.~3.11.8.5]{ia32}.}.

At this point, after initializing the local and I/O APIC for the BSP, the APIC system is fully
usable: External interrupts are enabled or disabled by writing the ``masked'' bit in their REDTBL
entries, interrupt handler completion is signaled by writing the local APIC's EOI register, and
spurious interrupts are detected by using the local APIC's spurious interrupt vector.

\subsection{Multiple I/O APICs}
\label{subsec:multiioapic}

Most consumer hardware, for example IA systems~\cite{ia32} built around Intel's ICHs~\cite{ich5},
only provides a single I/O APIC, although multiple I/O APICs are technically supported by the
MultiProcessor specification~\cite[sec.~3.6.8]{mpspec}.

If ACPI reports multiple I/O APICs (by supplying multiple MADT I/O APIC structures, see
\autoref{tab:madtioapic}), the previously described initialization has to be performed for each I/O
APIC individually. Additionally, each I/O APIC's ID, also reported by ACPI, has to be written to the
corresponding I/O APIC's ID register (see \autoref{tab:ioapicregsid}), because this register is
always initialized to zero~\cite[sec.~9.5.6]{ich5}.

Using a variable number of I/O APICs requires determining the target I/O APIC for each operation
that concerns a GSI, like masking or unmasking. For this reason, ACPI provides the ``GSI
Base''~\cite[sec.~5.2.8.2]{acpi1} for each available I/O APIC, and the number of GSIs a single I/O
APIC can handle can be determined by reading the I/O APIC's version register~\cite[sec.~9.5.7]{ich5}
(see \autoref{tab:ioapicregsver})\footnote{This approach was previously used in this
implementation, but removed for simplicity.}.

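
With the GSI base from the MADT and the number of entries from the version register (bits 16 to 23
contain the index of the highest redirection entry, so the number of entries is that value plus
one), the I/O APIC responsible for a GSI can be determined as sketched below. The structure and
function names are hypothetical.

\begin{lstlisting}[language=C++]
#include <cstddef>
#include <cstdint>

struct IoApicInfo {
    std::uint8_t id;
    std::uint32_t gsiBase;     // from the MADT I/O APIC structure
    std::uint32_t versionReg;  // value of the I/O APIC version register (index 0x01)
};

// Bits 16-23 hold the index of the last redirection entry
constexpr std::uint32_t redirectionEntryCount(std::uint32_t versionReg) {
    return ((versionReg >> 16) & 0xFF) + 1;
}

// Return the index of the I/O APIC responsible for a GSI, or count if none handles it
constexpr std::size_t findIoApic(std::uint32_t gsi, const IoApicInfo *ioApics, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        const std::uint32_t base = ioApics[i].gsiBase;
        if (gsi >= base && gsi < base + redirectionEntryCount(ioApics[i].versionReg)) {
            return i;
        }
    }
    return count;
}
\end{lstlisting}
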

\clearpage

\section{Symmetric Multiprocessing}
\label{sec:smpinit}

Like single-core systems, SMP systems boot using only a single core, the BSP\@. By using the
APIC's capability to send IPIs between cores, additional APs can be put into the startup state and
booted for system use.

To determine the number of usable processors, the MADT is parsed (see \autoref{tab:madtlapic}).
Note that some processors may be reported as disabled; those must not be used by the OS (see
\autoref{tab:madtlapicflags}).

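
Determining the usable processors from the MADT could be sketched as follows; the structure is a
hypothetical, already parsed form of the ``Processor Local APIC'' structure, whose flags field
contains the ``Enabled'' flag in bit 0.

\begin{lstlisting}[language=C++]
#include <cstddef>
#include <cstdint>

struct MadtLocalApic {
    std::uint8_t processorId;  // ACPI processor ID
    std::uint8_t apicId;       // local APIC ID, used as IPI destination
    std::uint32_t flags;       // bit 0: processor enabled
};

constexpr bool isUsable(const MadtLocalApic &entry) { return (entry.flags & 1u) != 0; }

// Count the processors that may be used by the OS (BSP included)
constexpr std::size_t countUsableProcessors(const MadtLocalApic *entries, std::size_t count) {
    std::size_t usable = 0;
    for (std::size_t i = 0; i < count; ++i) {
        if (isUsable(entries[i])) {
            ++usable;
        }
    }
    return usable;
}
\end{lstlisting}
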
\subsection{Inter-Processor Interrupts}
|
||||
\label{subsec:ipis}
|
||||
|
||||
Issuing IPIs works by writing the local APIC's ICR (see
|
||||
\autoref{fig:ia32icr}/\autoref{tab:lapicregsicr}). It allows specifying IPI type, destination
|
||||
(analogous to REDTBL destinations, see \autoref{sec:ioapicinit}) and vector (see
|
||||
\autoref{sec:apxipis} for an example implementation).
|
||||
|
||||
Depending on the APIC architecture, two different IPIs are required: The INIT IPI for systems using
a discrete APIC, and the \textbf{\gls{sipi}} for systems using the xApic or x2Apic architectures:

\begin{itemize}
    \item The INIT IPI causes an AP to reset its state and start executing at the address specified by
    its system reset vector. If paired with a system warm-reset, the AP can be instructed to start
    executing the AP boot sequence by writing the appropriate address to the warm-reset
    vector~\cite[sec.~B.4.1]{mpspec}.
    \item Since the xApic architecture, the SIPI is used for AP startup: It causes the AP to start
    executing code in real mode at the beginning of the 4\,KiB page whose number is given in the IPI's
    vector field~\cite[sec.~B.4.2]{mpspec}. By copying the AP boot routine to a page-aligned address
    in lower physical memory (the 8-bit vector only reaches pages below 1\,MiB) and sending the SIPI
    with the corresponding page number, an AP can be booted.
\end{itemize}

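The relation between the SIPI vector and the start address fits into a few lines. The following
sketch (function names hypothetical) derives the vector for a given boot routine address and rejects
addresses that a SIPI cannot reach, i.e.\ addresses that are not page aligned or lie at or above
1\,MiB.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

constexpr uint32_t pageSize = 0x1000;

// Hypothetical helper: SIPI vector for a boot routine copied to
// `physicalAddress`, or std::nullopt if a SIPI cannot start an AP there.
std::optional<uint8_t> sipiVectorFor(uint32_t physicalAddress) {
    if (physicalAddress % pageSize != 0) {
        return std::nullopt; // execution always starts at a page boundary
    }
    const uint32_t page = physicalAddress / pageSize;
    if (page > 0xFF) {
        return std::nullopt; // the 8-bit vector only reaches the first megabyte
    }
    return static_cast<uint8_t>(page);
}

// Physical address at which an AP starts executing after a SIPI with `vector`.
uint32_t sipiStartAddress(uint8_t vector) {
    return static_cast<uint32_t>(vector) * pageSize;
}
```

With the relocation target used by this implementation (\code{0x8000}), the resulting vector is
\code{0x08}.
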
To wait until the IPI has been sent, the ICR's delivery status bit (bit 12) can be polled until it
reads 0 (idle).

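A minimal polling loop could look as follows. It assumes xApic mode, where \code{icrLow} points to
the memory-mapped low doubleword of the ICR (in x2Apic mode the delivery status bit is not
available). The bounded number of iterations is a hypothetical safeguard and not part of the
procedure described above.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t icrDeliveryStatus = 1u << 12; // 0 = idle, 1 = send pending

// Busy-waits until the local APIC has sent the last IPI. `icrLow` is assumed to
// point to the memory-mapped low ICR doubleword (xApic mode). The iteration
// limit is a hypothetical safeguard; returns false if the bit never cleared.
bool waitForIpiDelivery(const volatile uint32_t *icrLow, uint32_t maxIterations) {
    assert(icrLow != nullptr);
    for (uint32_t i = 0; i < maxIterations; ++i) {
        if ((*icrLow & icrDeliveryStatus) == 0) {
            return true;
        }
    }
    return false;
}
```
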
\begin{figure}[h]
    \centering
    \begin{subfigure}[b]{0.7\textwidth}
        \includesvg[width=1.0\linewidth]{img/ia32_interrupt_command_register.svg}
    \end{subfigure}
    \caption{Interrupt Command Register~\cite[sec.~3.11.6.1]{ia32}.}
    \label{fig:ia32icr}
\end{figure}

\subsection{Universal Startup Algorithm}
\label{subsec:apstartup}

SMP initialization is performed differently on various processors. Intel's MultiProcessor
Specification defines a ``universal startup algorithm'' for multiprocessor
systems~\cite[sec.~B.4]{mpspec}, which can be used to boot SMP systems with either a discrete APIC,
xApic or x2Apic, as it issues both the INIT IPI and the SIPI\footnote{Technically, it always issues
the INIT IPI, and the SIPI only for xApic or x2Apic, but since the SIPI is ignored by discrete APICs,
it can be sent either way. This ``INIT-SIPI-SIPI'' sequence is also stated in the IA-32
manual~\cite[sec.~3.9.4]{ia32}.}.

This algorithm has some prerequisites: The AP boot routine (detailed in \autoref{subsec:apboot})
has to be copied to lower memory, where the APs will start their execution. Also, the APs need
allocated stack memory to call the entry function. In case of a discrete APIC that uses the INIT
IPI, the system additionally needs to be configured for a warm-reset (by writing \code{0x0A} to the
CMOS shutdown status byte, located at CMOS offset \code{0xF}~\cite[sec.~B.4]{mpspec}), because,
unlike the SIPI, the INIT IPI does not support supplying the address where AP execution should
begin. Instead, the warm-reset vector (a 32-bit field, located at \code{40:67} in real-mode
segment:offset notation, i.e.\ at physical address \code{0x467}~\cite[sec.~B.4]{mpspec}) needs to
point to the location the AP startup routine was copied to, encoded as a real-mode segment:offset
pair. Additionally, the entire AP startup procedure has to be performed with all sources of
interrupts disabled, which poses a small challenge, since the algorithm requires some fixed delays
that cannot rely on interrupt-based timers\footnote{This implementation uses the PIT's mode 0 on
channel 0 for timekeeping.}.

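The warm-reset preparation can be sketched as follows. The CMOS is accessed through its index port
\code{0x70} and data port \code{0x71}; \code{outb} and \code{lowMemory} are hypothetical stand-ins
for the kernel's port I/O routine and for an identity-mapped view of the first megabyte of physical
memory. The warm-reset vector is written as a real-mode far pointer with the offset in the low word
and the segment in the high word, so a routine at \code{0x8000} is encoded as segment \code{0x0800},
offset \code{0x0000}.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint16_t cmosIndexPort = 0x70;
constexpr uint16_t cmosDataPort = 0x71;
constexpr uint8_t cmosShutdownStatus = 0x0F; // CMOS offset of the shutdown status byte
constexpr uint8_t shutdownWarmReset = 0x0A;  // warm reset: continue via the vector at 40:67
constexpr uint32_t warmResetVectorAddress = (0x40 << 4) + 0x67; // 40:67 = physical 0x467

// Encodes a paragraph-aligned physical address below 1 MiB as a real-mode far
// pointer: offset in the low word (0 here), segment in the high word.
uint32_t warmResetVectorFor(uint32_t physicalAddress) {
    assert(physicalAddress % 16 == 0 && physicalAddress < 0x100000);
    const uint32_t segment = physicalAddress >> 4;
    const uint32_t offset = 0;
    return segment << 16 | offset;
}

// `outb` and `lowMemory` are hypothetical stand-ins for the kernel's port I/O
// routine and an identity-mapped view of the first megabyte of memory.
void prepareWarmReset(void (*outb)(uint16_t port, uint8_t value), uint8_t *lowMemory,
                      uint32_t bootRoutineAddress) {
    outb(cmosIndexPort, cmosShutdownStatus); // select the shutdown status byte
    outb(cmosDataPort, shutdownWarmReset);   // request a warm reset
    const uint32_t vector = warmResetVectorFor(bootRoutineAddress);
    for (unsigned i = 0; i < 4; ++i) { // store little-endian, as the CPU will read it
        lowMemory[warmResetVectorAddress + i] = static_cast<uint8_t>(vector >> (8 * i));
    }
}
```
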
The delays prescribed by the algorithm are quite specific, but the specification provides no further
information on why these timings matter or how precise they have to be. The algorithm allowed for
successful startup of additional APs when tested in QEMU (with and without KVM) and on certain real
hardware, although different processors or emulators (like Bochs) might require different
timings~\cite[lapic.c]{xv6}.

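Since the PIT counts down at roughly 1.193182\,MHz using a 16-bit counter, the delays of the
algorithm translate into counter values as sketched below. Rounding up guarantees that a delay is
never shorter than requested; delays longer than about 54.9\,ms do not fit into a single counter run
and would have to be split. The function name is hypothetical, and programming the PIT itself (mode
0 on channel 0) is omitted.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

constexpr uint64_t pitFrequencyHz = 1193182; // PIT input clock, approx. 1.193182 MHz

// Hypothetical helper: PIT counter value for a busy-wait of at least
// `microseconds`. Rounds up, so the delay is never shorter than requested.
// Returns std::nullopt for a zero-length delay or if the value does not fit
// into the 16-bit counter.
std::optional<uint16_t> pitTicksFor(uint32_t microseconds) {
    const uint64_t ticks = (pitFrequencyHz * microseconds + 999999) / 1000000;
    if (ticks == 0 || ticks > 0xFFFF) {
        return std::nullopt;
    }
    return static_cast<uint16_t>(ticks);
}
```

This yields 11932 ticks for the 10 millisecond delay and 239 ticks for each 200 microsecond delay.
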
After this preparation, the universal startup algorithm is performed for each AP sequentially, as
follows (see \autoref{sec:apxmpusa} for an example implementation):

\begin{enumerate}
    \item Assert and de-assert the level-triggered INIT IPI\@.
    \item Delay for 10 milliseconds.
    \item Send the SIPI\@.
    \item Delay for 200 microseconds.
    \item Send the SIPI again.
    \item Delay for 200 microseconds again.
    \item Wait until the AP has signaled boot completion, then continue with the next AP\@.
\end{enumerate}

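Expressed in C++, the per-AP sequence could look like the following sketch. It is a hedged
illustration, not the implementation from \autoref{sec:apxmpusa}: the \code{StartupOps} hooks are
hypothetical placeholders for the APIC and PIT code, and the bounded wait in the last step is an
addition, since the algorithm itself waits without a timeout.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical hooks into the APIC and timer code (names are illustrative).
struct StartupOps {
    std::function<void(uint8_t apicId, bool assertLevel)> sendInit; // level-triggered INIT IPI
    std::function<void(uint8_t apicId, uint8_t vector)> sendSipi;   // startup IPI
    std::function<void(uint32_t microseconds)> delay;               // e.g. PIT busy-wait
    std::function<bool(uint8_t apicId)> hasBooted;                  // boot completion flag
};

// Starts the given APs one after another using the universal startup
// algorithm. Returns how many of them signaled boot completion.
unsigned startApsSequentially(const StartupOps &ops, const uint8_t *apicIds, unsigned count,
                              uint8_t sipiVector, unsigned maxPolls) {
    assert(apicIds != nullptr || count == 0);
    unsigned started = 0;
    for (unsigned i = 0; i < count; ++i) {
        const uint8_t id = apicIds[i];
        ops.sendInit(id, true);       // 1. assert INIT ...
        ops.sendInit(id, false);      //    ... and de-assert it
        ops.delay(10000);             // 2. 10 ms
        ops.sendSipi(id, sipiVector); // 3. first SIPI
        ops.delay(200);               // 4. 200 us
        ops.sendSipi(id, sipiVector); // 5. second SIPI
        ops.delay(200);               // 6. 200 us
        // 7. wait for boot completion (bounded here, unlike the original algorithm)
        for (unsigned poll = 0; poll < maxPolls; ++poll) {
            if (ops.hasBooted(id)) {
                ++started;
                break;
            }
            ops.delay(1000);
        }
    }
    return started;
}
```

Keeping the hardware accesses behind such hooks also makes the exact order of IPIs and delays easy to
check outside the kernel.
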
If the system uses a discrete APIC, the APs reach the boot routine by starting execution at the
location specified in the warm-reset vector. If the system uses the xApic or x2Apic architecture,
the APs reach the boot routine because its location was specified in the SIPI\@.

Signaling boot completion from the AP's entry function can be done by using a global bitmap
variable, where the \(n\)-th bit indicates the running state of the \(n\)-th processor. This
variable does not have to be synchronized across APs, because the startup is performed
sequentially.

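A possible shape of this bitmap is sketched below. It assumes at most 32 processors (one 32-bit
word) and that the BSP is processor 0; the names are hypothetical. \code{std::atomic} is used so that
the BSP's polling loop is guaranteed to observe the AP's store, not to coordinate APs with each
other, which is unnecessary for the reason given above.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Bit n is set once processor n is running. Bit 0 is preset under the
// assumption that the BSP is processor 0.
std::atomic<uint32_t> runningProcessors{1};

// Called by an AP at the beginning of its entry function.
void signalBootCompletion(unsigned cpuIndex) {
    assert(cpuIndex < 32);
    runningProcessors.fetch_or(1u << cpuIndex, std::memory_order_release);
}

// Polled by the BSP in the last step of the startup algorithm.
bool hasBooted(unsigned cpuIndex) {
    assert(cpuIndex < 32);
    return (runningProcessors.load(std::memory_order_acquire) >> cpuIndex) & 1u;
}
```
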
\subsection{Application Processor Boot Routine}
\label{subsec:apboot}

After executing the ``INIT-SIPI-SIPI'' sequence, the targeted AP will start executing its boot
routine in real mode. The general steps required are similar to those required when booting a
single-core system, but since the BSP in SMP systems is already fully operational at this point,
much of its configuration can be reused. The AP boot routine this implementation uses can be
roughly described as follows (see \autoref{sec:apxapboot} for an example implementation):

\begin{enumerate}
    \item Load a temporary \textbf{\gls{gdt}}, used for switching to protected mode.
    \item Enable protected mode by writing \code{cr0}.
    \item Perform a far jump to reload the code-segment register with a protected-mode selector, then
    set up the other segment registers manually.
    \item Load the \code{cr3}, \code{cr0} and \code{cr4} values used by the BSP to enable paging (in
    that order).
    \item Load the IDT used by the BSP\@.
    \item Determine the AP's APIC ID by using CPUID\@.
    \item Load the GDT and \textbf{\gls{tss}} prepared for this AP\@.
    \item Load the stack prepared for this AP\@.
    \item Call the (C++) AP entry function.
\end{enumerate}

The APIC ID is used to determine which GDT and stack were prepared for a certain AP\@. It is
necessary for each AP to have its own GDT, because each processor needs its own TSS for context
switching, for example when interrupt-based system calls are used on all CPUs.

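In the boot routine, this lookup happens in assembly right after executing CPUID with leaf 1, which
reports the initial APIC ID in bits 31 to 24 of EBX. The following C++ sketch shows the same logic
for readability; the per-AP table and its layout are hypothetical, and systems with APIC IDs wider
than 8 bits would have to use CPUID leaf \code{0xB} instead.

```cpp
#include <cassert>
#include <cstdint>

// CPUID leaf 1 reports the initial (8-bit) APIC ID in EBX bits 31-24.
uint8_t initialApicIdFromCpuid(uint32_t cpuidLeaf1Ebx) {
    return static_cast<uint8_t>(cpuidLeaf1Ebx >> 24);
}

// Hypothetical per-AP resources that the BSP prepares before startup.
struct ApResources {
    uint32_t gdtDescriptorAddress; // physical address of this AP's GDTR operand
    uint32_t stackTop;             // initial stack pointer
};

// One slot per possible 8-bit APIC ID, filled by the BSP (hypothetical layout).
ApResources apResources[256] = {};

const ApResources &resourcesForThisAp(uint32_t cpuidLeaf1Ebx) {
    return apResources[initialApicIdFromCpuid(cpuidLeaf1Ebx)];
}
```
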
Because it is relocated into lower physical memory (in this implementation to \code{0x8000}), this
code has to be position independent: It does not run at the address it was linked for. For this
reason, absolute physical addresses of the relocated copy have to be used when jumping, loading the
IDTR and GDTR, or referencing variables. Also, any variables required during boot have to be
available after relocation. This can be achieved by locating them inside the ``TEXT'' section of the
routine, so they stay with the rest of the instructions when copying. These variables have to be
initialized during runtime, before the routine is copied (see \autoref{sec:apxpreparesmp} for an
example implementation).

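The address arithmetic behind this can be sketched as follows: A symbol inside the routine keeps its
distance to the routine's start, so its physical address after copying is the relocation target plus
that offset. The sketch also shows a variable being initialized inside the routine before the whole
routine is copied. Buffer names and offsets are hypothetical; in a kernel, the routine's start and
its variables would be assembler labels.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr uint32_t apBootAddress = 0x8000; // relocation target used in this implementation

// Physical address a symbol will have once the routine starting at
// `routineStart` has been copied to apBootAddress.
uint32_t relocatedAddress(const uint8_t *routineStart, const uint8_t *symbol) {
    assert(symbol >= routineStart);
    return apBootAddress + static_cast<uint32_t>(symbol - routineStart);
}

// Initializes a 32-bit variable stored inside the routine, then copies the
// whole routine (instructions and variables) to its destination.
void prepareAndCopy(uint8_t *routine, std::size_t routineSize, std::size_t variableOffset,
                    uint32_t value, uint8_t *destination) {
    assert(variableOffset + sizeof(value) <= routineSize);
    std::memcpy(routine + variableOffset, &value, sizeof(value)); // runtime initialization
    std::memcpy(destination, routine, routineSize);                // relocation
}
```
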
\subsection{Application Processor Post-Boot Routine}
\label{subsec:apsystementry}

In the entry function, called at the end of the boot routine, the AP signals boot completion as
described in \autoref{subsec:apstartup} and initializes its local APIC by repeating the necessary
steps from \autoref{subsec:lapiclvtinit}, \autoref{subsec:lapicsoftenable},
\autoref{subsec:lapictimer} and \autoref{subsec:lapicerror}\footnote{MMIO memory does not have to
be allocated again, as all local APICs use the same memory region in this implementation. Also, the
initial value for the APIC timer's counter can be reused, if already calibrated.}.

Because multiple local APICs are now present and active in the system, a local APIC may receive
messages from several other local APICs at roughly the same time. To decide the order in which these
messages are handled, an arbitration mechanism based on the local APIC's ID is
used~\cite[sec.~3.11.7]{ia32}. To make sure the arbitration priority matches the local APIC's ID,
the ARPs can be synchronized by issuing an INIT-level-deassert IPI\footnote{This is not supported on
Pentium 4 and Xeon processors.} (see \autoref{sec:apxappostboot} for an example implementation).

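The IA-32 manual specifies this IPI as an INIT delivery with the level flag cleared and the trigger
mode set to level. Although it reaches all processors regardless of the destination, software should
specify the ``all including self'' shorthand. The following sketch only derives the resulting low
ICR doubleword (constant names hypothetical); sending it works like any other IPI.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t icrDeliveryModeInit = 0b101u << 8;          // bits 8-10: INIT
constexpr uint32_t icrLevelAssert = 1u << 14;                  // bit 14: 1 = assert, 0 = de-assert
constexpr uint32_t icrTriggerModeLevel = 1u << 15;             // bit 15: 1 = level-triggered
constexpr uint32_t icrShorthandAllIncludingSelf = 0b10u << 18; // bits 18-19

// Low ICR doubleword of the INIT level de-assert IPI: INIT delivery mode,
// level flag cleared, level trigger mode, "all including self" shorthand.
// The destination field is ignored for this IPI, so the high doubleword can stay 0.
constexpr uint32_t initLevelDeassertIcrLow() {
    return icrDeliveryModeInit | icrTriggerModeLevel | icrShorthandAllIncludingSelf;
}

static_assert((initLevelDeassertIcrLow() & icrLevelAssert) == 0,
              "the level flag must be cleared for INIT level de-assert");
```
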
\clearpage