interrupt-handling-using-th…/chap/implementation.tex

\chapter{Implementation}
\label{ch:implementation}

This section will detail how and to what extent the APIC support was implemented.
The steps performed in this implementation are described in a generally applicable way, details on the implementation source code are found in \autoref{ch:listings}.
\autoref{subsec:hhuosintegration} deals with the modifications done to the hhuOS system to integrate the APIC implementation.

\clearpage

\section{Design Decisions and Scope}
\label{sec:design}

The APIC interrupt architecture is split into multiple hardware components and tasks: The (potentially multiple) local APICs, the (usually single) I/O APIC and the APIC timer (part of each local APIC).
Furthermore, the APIC system needs to interact with its memory mapped registers and the hhuOS ACPI subsystem, to gather information about the CPU topology and interrupt overrides.
Also, the OS should be able to interact with the APIC system in a simple and easy manner, without needing to know all of its individual parts.

To keep the whole system structured and simple, the implementation is split into the following main components (see \autoref{fig:implarch}):

\begin{itemize}
  \item \code{LocalApic}: Provides functionality to interact with the local APIC (masking and unmasking, register access, etc.).
  \item \code{IoApic}: Provides functionality to interact with the I/O APIC (masking and unmasking, register access, etc.)
  \item \code{ApicTimer}: Provides functionality to calibrate the APIC timer and handle its interrupts.
  \item \code{Apic}: Condenses all the functionality above and exposes it to other parts of the OS\@.
\end{itemize}

\begin{figure}[h]
  \centering
  \begin{subfigure}[b]{0.5\textwidth}
    \includesvg[width=1.0\linewidth]{img/Architecture.svg}
  \end{subfigure}
  \caption{Caller Hierarchy of the Main Components.}
  \label{fig:implarch}
\end{figure}

This implementation is targeted to support systems with a single I/O APIC\footnote{
  Operation with more than one I/O APIC is described further in the \textquote{MultiProcessor Specification}~\autocite[sec.~3.6.8]{mpspec}.},
\ because consumer hardware typically only uses a single one, and so does QEMU emulation.
General information on implementing multiple I/O APIC support can be found in \autoref{subsec:multiioapic}.

With the introduction of ACPI's GSIs to the OS, new types are introduced to clearly differentiate different representations of interrupts and prevent unintentional conversions:

\begin{itemize}
  \item \code{GlobalSystemInterrupt}: ACPI's interrupt abstraction, detached from the hardware interrupt lines.
  \item \code{InterruptRequest}: Represents an \textbf{\gls{isa}} IRQ, allowing the OS to address interrupts by the name of the device that triggers it.
        When the APIC is used, interrupt overrides map IRQs to GSIs.
  \item \code{InterruptVector}: Represents an interrupt's vector number, as used by the \code{InterruptDispatcher}.
        The dispatcher maps interrupt vectors to interrupt handlers.
\end{itemize}

Both BIOS and UEFI are supported by hhuOS, but the hhuOS ACPI subsystem is currently only available with BIOS\footnote{
  State from 11/02/2023.},
\ so when hhuOS is booted using UEFI, APIC support cannot be enabled.
Also, the APIC can handle MSIs, but they are not included in this implementation, as hhuOS currently does not utilize them.

SMP systems are partially supported: The APs are initialized, but only to a busy-looping state, as hhuOS currently is a single-core OS and lacks some required infrastructure.
All interrupts are handled using the BSP\@.

A summary of features that are outside the scope of this thesis:

\begin{itemize}
  \item Operation with a discrete APIC or x2Apic.
  \item Interrupts with logical destinations or custom priorities.
  \item Returning from APIC operation to PIC mode\footnote{
        This would be theoretically possible with single-core hardware, but probably useless.}.
  \item Relocation of a local APIC's MMIO memory region\footnote{
        Relocation is possible by writing a new physical APIC base address to the IA32\textunderscore{}APIC\textunderscore{}BASE MSR~\autocite[sec.~3.11.4.5]{ia32}.}.
  \item Distributing external interrupts to different APs in SMP enabled systems.
  \item Usage of the system's performance counter or monitoring interrupts.
  \item Meaningful treatment of APIC errors.
  \item Handling of MSIs.
\end{itemize}

To be able to easily extend an APIC implementation for single-core systems to SMP systems, some things have to be taken into account:

\begin{itemize}
  \item SMP systems need to manage multiple \code{LocalApic} and \code{ApicTimer} instances.
        This is handled by the \code{Apic} class.
  \item Initialization of the different components can no longer happen at the same \textquote{location}: The local APICs and APIC timers of additional APs have to be initialized by the APs themselves, because the BSP can not access an AP's registers.
  \item APs are only allowed to access instances of APIC classes that belong to them.
  \item Interrupt handlers that get called on multiple APs may need to take the current processor into account (for example the APIC timer interrupt handler).
  \item Register access has to be synchronized, if it is performed in multiple operations on the same address space.
\end{itemize}

% \section{Code Style}
% \label{sec:codestyle}
%
% Individual state of local and I/O APICs is managed through instances of their respective classes.
% Because each CPU core can only access the local APIC contained in itself, this can create a
% misconception: It is not possible to (e.g.) allow an interrupt in a certain local APIC by calling a
% function on a certain \code{LocalApic} instance. This is communicated through code by declaring a
% range of functions as \code{static}. It is also in direct contrast to the \code{IoApic} class: I/O
% APICs can be addressed by instances, because they are not part of the CPU core: Each core can
% always access all I/O APICs (if there are multiple).
%
% Error checking is done to a small extent in this implementation: Publicly exposed functions (from
% the \code{Apic} class) do check for invalid invocations, but the internally used classes do not
% protect their invariants, because they are not used directly by other parts of the OS. These
% classes only selectively expose their interfaces (by using the \code{friend} declaration) for the
% same reason.

\clearpage

\section{Local APIC}
\label{sec:lapicinit}

The local APIC block diagram (see \autoref{fig:localapicblock}) shows a number of registers that are required for initialization\footnote{
  Registers like the \textquote{Trigger Mode Register}, \textquote{Logical Destination Register} or \textquote{Destination Format Register} remain unused in this thesis.}:

\begin{itemize}
  \item \textbf{\gls{lvt}}: Used to configure how local interrupts are handled.
  \item \textbf{\gls{svr}}: Contains the software-enable flag and the spurious interrupt vector number.
  \item \textbf{\gls{tpr}}: Decides the order in which interrupts are handled and a possible interrupt priority threshold, to ignore low priority interrupts.
  \item \textbf{\gls{icr}}: Controls the sending of IPIs for starting up APs in SMP enabled systems.
  \item \textbf{\gls{apr}}: Controls the priority in the APIC arbitration mechanism.
\end{itemize}

\begin{figure}[h]
  \centering
  \begin{subfigure}[b]{0.9\textwidth}
    \includesvg[width=1.0\linewidth]{img/ia32_manual_local_apic_blockdiagram.svg}
  \end{subfigure}
  \caption{The Local APIC Block Diagram~\autocite[sec.~3.11.4.1]{ia32}.}
  \label{fig:localapicblock}
\end{figure}

There are multiple registers associated with the LVT, those belong to the different local interrupts the local APIC can handle.
Local interrupts this implementation is concerned about are listed below:

\begin{itemize}
  \item \textit{LINT1}: The local APIC's NMI source.
  \item \textit{Timer}: Periodic interrupt triggered by the APIC timer.
  \item \textit{Error}: Triggered by errors in the APIC system (e.g.\ invalid vector numbers or corrupted messages in internal APIC communication).
\end{itemize}

The LINT0 interrupt is unlisted, because it is mainly important for virtual wire mode (it can be triggered by external interrupts from the PIC).
The performance and thermal monitoring interrupts also remain unused in this implementation.

Besides the local APIC's own registers, the IMCR and IA32\textunderscore{}APIC\textunderscore{}BASE MSR also requires initialization (described in \autoref{subsec:lapicenable}).

After system power-up, the local APIC is in the following state~\autocite[sec.~3.11.4.7]{ia32}:

\begin{itemize}
  \item IRR, ISR and TPR are reset to \code{0x00000000}.
  \item The LVT is reset to \code{0x00000000}, except for the masking bits (all local interrupts are masked on power-up).
  \item The SVR is reset to \code{0x000000FF}.
  \item The APIC is in xApic mode, even if x2Apic support is present.
  \item Only the BSP is enabled, other APs have to be enabled by the BSP's local APIC\@.
\end{itemize}

The initialization sequence consists of these steps, all performed before interrupt handling is enabled (see \autoref{fig:apicenable}):

\begin{enumerate}
  \item Enable symmetric I/O mode and set the APIC operating mode.
  \item Initialize the LVT and NMI\@.
  \item Initialize the SVR\@.
  \item Initialize the TPR\@.
  \item Initialize the APIC timer and error handler.
  \item Startup the APs and initialize their respective local APICs.
  \item Synchronize the APRs.
  \item Enable interrupts for the BSP and APs.
\end{enumerate}

\subsection{Accessing Local APIC Registers in xApic Mode}
\label{subsec:xapicregacc}

Accessing registers in xApic mode is done via MMIO and requires a single page (\SI{4}{\kilo\byte}) of memory, mapped to the \textquote{APIC Base} address, which can be obtained by reading the IA32\textunderscore{}APIC\textunderscore{}BASE MSR (see \autoref{fig:ia32apicbasemsr}/\autoref{tab:lapicregsmsr}) or parsing the MADT (see \autoref{tab:madt}).
The IA-32 manual specifies a caching strategy of \textquote{strong uncachable}\footnote{
  See IA-32 manual for information on this caching strategy~\autocite[sec.~3.12.3]{ia32}.}
~\autocite[sec.~3.11.4.1]{ia32} for this region (see \autoref{subsec:apxxapicregacc} for the example implementation).

The address offsets (from the base address) for the local APIC registers are listed in the IA-32 manual~\autocite[sec.~3.11.4.1]{ia32} and in \autoref{tab:lapicregs}.

\subsection{Enabling the Local APIC}
\label{subsec:lapicenable}

Because the system boots in PIC mode, \code{0x01} should be written to the IMCR~\autocite[sec.~3.6.2.1]{mpspec} to disconnect the PIC from the local APIC's LINT0 INTI (see \autoref{fig:integratedapic}, for the example implementation see \autoref{subsec:apxdisablepic}).

To set the operating mode, it is first determined which modes are supported by executing the \code{cpuid} instruction with \code{eax=1}: If bit 9 of the \code{edx} register is set, xApic mode is supported, if bit 21 of the \code{ecx} register is set, x2Apic mode is supported~\autocite[sec.~5.1.2]{cpuid}.
If xApic mode is supported (if the local APIC is an integrated APIC), it will be in that mode already.
The \textquote{global enable/disable} (\textquote{EN}) bit in the IA32\textunderscore{}APIC\textunderscore{}BASE MSR (see \autoref{fig:ia32apicbasemsr}/\autoref{tab:lapicregsmsr}) allows disabling xApic mode, and thus the entire local APIC, globally\footnote{
  If the system uses the discrete APIC bus, xApic mode cannot be re-enabled without a system reset~\autocite[sec.~3.11.4.3]{ia32}.}.
Enabling x2Apic mode is done by setting the \textquote{EXTD} bit (see \autoref{fig:ia32apicbasemsr}/\autoref{tab:lapicregsmsr}) of the IA32\textunderscore{}APIC\textunderscore{}BASE MSR~\autocite[sec.~3.11.4.3]{ia32}.

\begin{figure}[h]
  \centering
  \begin{subfigure}[b]{0.7\textwidth}
    \includesvg[width=1.0\linewidth]{img/ia32_apic_base_msr.svg}
  \end{subfigure}
  \caption{The IA32\textunderscore{}APIC\textunderscore{}BASE MSR~\autocite[sec.~3.11.4.4]{ia32}.}
  \label{fig:ia32apicbasemsr}
\end{figure}

Because QEMU does not support x2Apic mode via its TCG\footnote{
  QEMU Tiny Code Generator transforms target instructions to instructions for the host CPU.},
\ this implementation only uses xApic mode.

Besides the \textquote{global enable/disable} flag, the APIC can also be enabled/disabled by using the \textquote{software enable/disable} flag in the SVR, see \autoref{subsec:lapicsoftenable}.

\subsection{Handling Local Interrupts}
\label{subsec:lapiclvtinit}

To configure how the local APIC handles the different local interrupts, the LVT registers are written (see \autoref{fig:localapiclvt}).

\begin{figure}[h]
  \centering
  \begin{subfigure}[b]{0.7\textwidth}
    \includesvg[width=1.0\linewidth]{img/ia32_lvt.svg}
  \end{subfigure}
  \caption{The Local Vector Table~\autocite[sec.~3.11.5.1]{ia32}.}
  \label{fig:localapiclvt}
\end{figure}

Local interrupts can be configured to use different delivery modes (excerpt)~\autocite[sec.~3.11.5.1]{ia32}:

\begin{itemize}
  \item \textit{Fixed}: Simply delivers the interrupt vector stored in the LVT register.
  \item \textit{NMI}: Configures this interrupt as non-maskable, this ignores the stored vector number.
  \item \textit{ExtINT}: The interrupt is treated as an external interrupt (instead of local interrupt), used e.g.\ in virtual wire mode for LINT0\@.
\end{itemize}

Initially, all local interrupts are initialized to PC/AT defaults: Masked, edge-triggered, active-high and with \textit{fixed} delivery mode.
Most importantly, the vector fields (bits 0:7 of each LVT register) are set to their corresponding interrupt vector (see \autoref{subsec:apxlvtinit} for an example implementation).

After default-initializing the local interrupts, LINT1 has to be configured separately (see \autoref{tab:lapicregslvtlint}), because it is the local APIC's NMI source\footnote{
  In older specifications~\autocite{mpspec}, LINT1 is defined as NMI source (see \autoref{fig:integratedapic}).
  It is possible that this changed with newer architectures, so for increased compatibility this implementation configures the local APIC NMI source as reported by ACPI\@.
  It is also possible that ACPI reports information on the NMI source just for future-proofing.}.
The delivery mode is set to \textquote{NMI} and the interrupt vector to \code{0x00}.
This information is also provided by ACPI in the MADT (see \autoref{tab:madtlnmi}).
Other local interrupts (APIC timer and the error interrupt) will be configured later (see \autoref{subsec:lapictimer} and \autoref{subsec:lapicerror}).

\subsection{Allowing Interrupt Processing}
\label{subsec:lapicsoftenable}

To complete a minimal local APIC initialization, the \textquote{software enable/disable} flag and the spurious interrupt vector (see \autoref{fig:ia32apicsvr}/\autoref{tab:lapicregssvr}), are set.
It makes sense to choose \code{0xFF} for the spurious interrupt vector, as on P6 and Pentium processors, the lowest four bits must be set (see \autoref{fig:ia32apicsvr}).

\begin{figure}[h]
  \centering
  \begin{subfigure}[b]{0.7\textwidth}
    \includesvg[width=1.0\linewidth]{img/ia32_apic_svr.svg}
  \end{subfigure}
  \caption{The Local APIC SVR Register~\autocite[sec.~3.11.9]{ia32}.}
  \label{fig:ia32apicsvr}
\end{figure}

Because the APIC's spurious interrupt has a dedicated interrupt vector (unlike the PIC's spurious interrupt), it can be ignored easily by registering a stub interrupt handler for the appropriate vector (see \autoref{subsec:apxsvr} for an implementation example).

The final step to initialize the BSP's local APIC is to allow the local APIC to receive interrupts of all priorities.
This is done by writing \code{0x00} to the TPR~\autocite[sec.~3.11.8.3]{ia32} (see \autoref{tab:lapicregstpr}).
By configuring the TPRs of different local APICs to different priorities or priority classes, distribution of external interrupts to CPUs can be controlled, but this is not used in this thesis.

\subsection{Local Interrupt EOI}
\label{subsec:lapiceoi}

To notify the local APIC that a local interrupt is being handled, its EOI register (see \autoref{tab:lapicregseoi}) has to be written.
Not all local interrupts require EOIs: NMI, SMI, INIT, ExtINT, SIPI, or INIT-Deassert interrupts are excluded~\autocite[sec.~3.11.8.5]{ia32}.

EOIs for external interrupts are also handled by the local APIC, further described in \autoref{subsec:ioapiceoi}.

\subsection{APIC Timer}
\label{subsec:lapictimer}

The APIC timer is integrated into the local APIC, so it requires initialization of the latter.
Like the PIT, the APIC timer can generate periodic interrupts in a specified interval by using a counter, that is initialized with a starting value depending on the desired interval.
Because the APIC timer does not tick with a fixed frequency, but at bus frequency, the initial counter has to be determined at runtime by using an external time source.
In addition to the counter register, the APIC timer interval is influenced by a divider: Instead of decrementing the counter at every bus clock, it will be decremented every \(n\)-th bus clock, where \(n\) is the divider.
This is useful to allow for long intervals (with decreased precision), that would require a larger counter register otherwise.

The APIC timer supports three different timer modes, that can be set in the timer's LVT register:

\begin{enumerate}
  \item \textit{Oneshot}: Trigger exactly one interrupt when the counter reaches zero.
  \item \textit{Periodic}: Trigger an interrupt each time the counter reaches zero, on zero the counter reloads its initial value.
  \item \textit{TSC-Deadline}: Trigger exactly one interrupt at an absolute time.
\end{enumerate}

This implementation uses the APIC timer in periodic mode, to trigger the scheduler preemption.
Initialization requires the following steps (order recommended by OSDev~\autocite{osdev}):

\begin{enumerate}
  \item Measure the timer frequency with an external time source.
  \item Configuration of the timer's divider register (see \autoref{tab:lapicregstimerdiv}).
  \item Setting the timer mode to periodic (see \autoref{tab:lapicregslvtt}).
  \item Initializing the counter register (see \autoref{tab:lapicregstimerinit}), depending on the measured timer frequency and the desired interval.
\end{enumerate}

In this implementation, the APIC timer is calibrated by counting the amount of ticks in one millisecond using oneshot mode (see \autoref{subsec:apxapictimer} for an example implementation).
The measured amount of timer ticks can then be used to calculate the required counter for an arbitrary millisecond interval, although very large intervals could require the use of a larger divider, while very small intervals (in micro- or nanosecond scale) could require the opposite, to provide the necessary precision.
For this approach it is important that the timer is initialized with the same divider that was used during calibration.

To use the timer, an interrupt handler has to be registered to its interrupt vector (see \autoref{subsec:apxapictimer} for an example implementation).

\subsection{APIC Error Interrupt}
\label{subsec:lapicerror}

Errors can occur e.g.\ when the local APIC receives an invalid vector number, or an APIC message gets corrupted on the system bus.
To handle these cases, the local APIC provides the local error interrupt, whose interrupt handler can read the error status from the local APIC's \textbf{\gls{esr}} (see \autoref{fig:ia32esr}/\autoref{tab:lapicregsesr}) and take appropriate action.

\begin{figure}[h]
  \centering
  \begin{subfigure}[b]{0.7\textwidth}
    \includesvg[width=1.0\linewidth]{img/ia32_error_status_register.svg}
  \end{subfigure}
  \caption{Error Status Register~\autocite[sec.~3.11.5.3]{ia32}.}
  \label{fig:ia32esr}
\end{figure}

The ESR is a \textquote{write/read} register: Before reading a value from the ESR, it has to be written, which updates the ESR's contents to the error status since the last write.
Writing the ESR also arms the local error interrupt, which doesn't happen automatically~\autocite[sec.~3.11.5.3]{ia32}.

Enabling the local error interrupt is now as simple as enabling it in the local APIC's LVT and registering an interrupt handler for the appropriate vector (see \autoref{subsec:apxhandlingerror} for an example implementation).

\clearpage

\section{I/O APIC}
\label{sec:ioapicinit}

To fully replace the PIC and handle external interrupts using the APIC, the I/O APIC, located in the system chipset, has to be initialized by setting its \textbf{\gls{redtbl}} registers~\autocite[sec.~9.5.8]{ich5} (see \autoref{tab:ioapicregsredtbl}).
Like the local APIC's LVT, the REDTBL allows configuration of interrupt vectors, masking bits, interrupt delivery modes, pin polarities and trigger modes (see \autoref{subsec:lapiclvtinit}).

Additionally, for external interrupts a destination and destination mode can be specified.
This is required because the I/O APIC is able to forward external interrupts to different local APICs over the system bus (see \autoref{fig:integratedapic}).
SMP systems use this mechanism to distribute external interrupts to different CPU cores for performance benefits.
Because this implementation's focus is not on SMP, all external interrupts are default initialized to \textit{physical} destination mode\footnote{
  The alternative is \textit{logical} destination mode, which allows addressing individual or clusters of local APIC's in a larger volume of processors~\autocite[sec.~3.11.6.2.2]{ia32}.}
~\autocite[sec.~3.11.6.2.1]{ia32} and are sent to the BSP for servicing, by using the BSP's local APIC ID as the destination.
The other fields are set to ISA bus defaults\footnote{
  Edge-triggered, active-high.},
\ with \textit{fixed} delivery mode, masked, and the corresponding interrupt vector, as defined by the \code{InterruptVector} enum.

The I/O APIC does not have to be enabled explicitly, if the local APIC is enabled and the REDTBL is initialized correctly, external interrupts will be redirected to the local APIC and handled by the CPU\@.

Unlike the local APIC's registers, the REDTBL registers are accessed indirectly: Two registers, the \textquote{Index} and \textquote{Data} register~\autocite[sec.~9.5.1]{ich5}, are mapped to the main memory and can be used analogous to the local APIC's registers.
The MMIO base address can be parsed from the MADT (see \autoref{tab:madtioapic}).
Writing an offset to the index register exposes an indirectly accessible I/O APIC register through the data register (see \autoref{subsec:iolistings} for an example implementation).
This indirect addressing scheme is useful, because the number of external interrupts an I/O APIC supports, and in turn the number of REDTBL registers, can vary\footnote{
  Intel's consumer \textbf{\glspl{ich}} always support a fixed amount of 24 external interrupts though~\autocite[sec.~9.5.7]{ich5}.}.

It is possible that one or multiple of the I/O APIC's interrupt inputs act as an NMI source.
If this is the case is reported in the MADT (see \autoref{tab:madtionmi}), so when necessary, the corresponding REDTBL entries are initialized like the local APIC's NMI source (see \autoref{subsec:lapiclvtinit}), and using these interrupt inputs for external interrupts is forbidden.

\subsection{Interrupt Overrides}
\label{subsec:ioapicpcat}

In every PC/AT compatible system, external devices are hardwired to the PIC in the same order.
Because this is not the case for the I/O APIC, the INTIs used by each PC/AT compatible interrupt has to be determined by the OS at runtime, by using ACPI\@.
ACPI provides \textquote{Interrupt Source Override} structures~\autocite[sec.~5.2.8.3.1]{acpi1} inside the MADT (see \autoref{tab:madtirqoverride}) for each PC/AT compatible interrupt that is mapped differently to the I/O APIC than to the PIC\@.
In addition to the interrupt input mapping, these structures also allow customizing the pin polarity and trigger mode of PC/AT compatible interrupts.

This information does not only apply to the REDTBL initialization, but it has to be taken into account every time an action is performed on a PC/AT compatible interrupt, like masking or unmasking: If \code{IRQ0} (PIT) should be unmasked, it has to be determined what GSI (or in other words, I/O APIC interrupt input) it belongs to.
In many systems \code{IRQ0} is mapped to \code{GSI2}, because the PC/AT compatible PICs are connected to \code{GSI0} (see \autoref{fig:integratedapic}).
Thus, to allow the PIT interrupt in those systems, the REDTBL entry belonging to \code{GSI2} instead of \code{GSI0} has to be written (see \autoref{subsec:apxirqoverrides} for an example implementation).

\subsection{External Interrupt EOI}
\label{subsec:ioapiceoi}

Notifying the I/O APIC that an external interrupt has been handled differs depending on the interrupt trigger mode: Edge-triggered external interrupts are completed by writing the local APIC's EOI register (see \autoref{subsec:lapiceoi})\footnote{
  Because external interrupts are forwarded to the local APIC, the local APIC is responsible for tracking them in its IRR and ISR.}.
Level-triggered interrupts are treated separately: Upon registering a level-triggered external interrupt, the I/O APIC sets an internal \textquote{Remote IRR} bit in the corresponding REDTBL entry~\autocite[sec.~9.5.8]{ich5} (see \autoref{tab:ioapicregsredtbl}).

There are three possible ways to signal completion of a level-triggered external interrupt to clear the remote IRR bit:

\begin{enumerate}
  \item Using the local APIC's EOI broadcasting feature: If EOI broadcasting is enabled, writing the local APIC's EOI register also triggers EOIs for each I/O APIC (for the appropriate interrupt), which clears the remote IRR bit.
  \item Sending a directed EOI to an I/O APIC\@: I/O APICs with versions greater than \code{0x20} include an I/O EOI register~\autocite[sec.~9.5.5]{ich5}.
        Writing the vector number of the handled interrupt to this register clears the remote IRR bit.
  \item Simulating a directed EOI for I/O APICs with versions smaller than \code{0x20}: Temporarily masking and setting a completed interrupt as edge-triggered clears the remote IRR bit~\autocite[io\textunderscore{}apic.c]{linux}.
\end{enumerate}

Because the first option is the only one supported by all APIC versions, it is used in this implementation\footnote{
  Disabling EOI broadcasting is not supported by all local APICs~\autocite[sec.~3.11.8.5]{ia32}.}.

At this point, after initializing the local and I/O APIC for the BSP, the APIC system is fully usable.
External interrupts now have to be enabled/disabled by writing the \textquote{masked} bit in these interrupts' REDTBL entries, interrupt handler completion is signaled by writing the local APIC's EOI register, and spurious interrupts are detected by using the local APIC's spurious interrupt vector.

\subsection{Multiple I/O APICs}
\label{subsec:multiioapic}

Most consumer hardware, for example all IA processors~\autocite{ia32} and ICH hubs~\autocite{ich5}, only provide a single I/O APIC, although technically multiple I/O APICs are supported by the \textquote{MultiProcessor Specification}~\autocite[sec.~3.6.8]{mpspec}.
If ACPI reports multiple I/O APICs (by supplying multiple MADT I/O APIC structures, see \autoref{tab:madtioapic}), the previously described initialization has to be performed for each I/O APIC individually.
Additionally, the I/O APIC's ID, also reported by ACPI, has to be written to the corresponding I/O APIC's ID register (see \autoref{tab:ioapicregsid}), because this register is always initialized to zero~\autocite[sec.~9.5.6]{ich5}.

Using a variable number of I/O APICs requires determining the target I/O APIC for each operation that concerns a GSI, like masking or unmasking.
For this reason, ACPI provides the \textquote{GSI Base}~\autocite[sec.~5.2.8.2]{acpi1} for each available I/O APIC, the number of GSIs a single I/O APIC can handle can be determined by reading the I/O APIC's version register~\autocite[sec.~9.5.7]{ich5} (see \autoref{tab:ioapicregsver})\footnote{
  This approach was previously used in this implementation, but removed for simplicity.}.

\clearpage

\section{Symmetric Multiprocessing}
\label{sec:smpinit}

Like single-core systems, SMP systems boot using only a single core, the BSP\@.
By using the APIC's capabilities to send IPIs between cores, additional APs can be put into startup state and booted for system use.

To determine the number of usable processors, the MADT is parsed (see \autoref{tab:madtlapic}).
Note, that some processors may be reported as disabled, those may not be used by the OS (see \autoref{tab:madtlapicflags}).

\subsection{Inter-Processor Interrupts}
\label{subsec:ipis}

Issuing IPIs works by writing the local APIC's ICR (see \autoref{fig:ia32icr}/\autoref{tab:lapicregsicr}).
It allows specifying IPI type, destination (analogous to REDTBL destinations, see \autoref{sec:ioapicinit}) and vector (see \autoref{subsec:apxipis} for an example implementation).

Depending on the APIC architecture, two different IPIs are required: The \textit{INIT IPI} for systems using a discrete APIC, and the \textbf{\gls{sipi}} for systems using the xApic or x2Apic architectures:

\begin{itemize}
  \item The INIT IPI causes an AP to reset its state and start executing at the address specified at its system reset vector.
        If paired with a system warm-reset, the AP can be instructed to start executing the AP boot sequence by writing the appropriate address to the warm-reset vector~\autocite[sec.~B.4.1]{mpspec}.
  \item Since the xApic architecture, the SIPI is used for AP startup: It causes the AP to start executing code in real mode, at a page specified in the IPIs interrupt vector~\autocite[sec.~B.4.2]{mpspec}.
        By copying the AP boot routine to a page in lower physical memory, and sending the SIPI with the correct page number, an AP can be booted.
\end{itemize}

To wait until the IPI is sent, the ICR's delivery status bit can be polled.

\begin{figure}[h]
  \centering
  \begin{subfigure}[b]{0.7\textwidth}
    \includesvg[width=1.0\linewidth]{img/ia32_interrupt_command_register.svg}
  \end{subfigure}
  \caption{Interrupt Command Register~\autocite[sec.~3.11.6.1]{ia32}.}
  \label{fig:ia32icr}
\end{figure}

\subsection{Universal Startup Algorithm}
\label{subsec:apstartup}

SMP initialization is performed differently on various processors.
Intel's \textquote{MultiProcessor Specification} defines a \textquote{Universal Startup Algorithm} for multiprocessor systems~\autocite[sec.~B.4]{mpspec}, which can be used to boot SMP systems with either discrete APIC, xApic or x2Apic, as it issues both, INIT IPI and SIPI\@.

This algorithm has some prerequisites: It is required to copy the AP boot routine (detailed in \autoref{subsec:apboot}) to lower memory, where the APs will start their execution.
Also, the APs need allocated stack memory to call the entry function, and in case of a discrete APIC that uses the INIT IPI, the system needs to be configured for a warm-reset (by writing \code{0xAH} to the CMOS shutdown status byte, located at \code{0xF}~\autocite[sec.~B.4]{mpspec}), because the INIT IPI does not support supplying the address where AP execution should begin, unlike the SIPI\@.
The warm-reset vector (a \SI{32}{\bit} field, located at physical address \code{40:67}~\autocite[sec.~B.4]{mpspec}) needs to be set to the physical address the AP startup routine was copied to.
Additionally, the entire AP startup procedure has to be performed with all sources of interrupts disabled, which offers a small challenge, since some timings need to be taken into account\footnote{
  This implementation uses the PIT's mode 0 on channel 0 for timekeeping, see \autoref{subsec:apxapictimer}.}.

The usage of delays in the algorithm is quite specific, but the specification provides no further information on the importance of these timings or required precision.
The algorithm allowed for successful startup of additional APs when tested in QEMU (with and without KVM) and on certain real hardware, although for different processors or emulators (like Bochs), different timings might be required~\autocite[lapic.c]{xv6}.

After preparation, the universal startup algorithm is now performed as follows, for each AP sequentially (see \autoref{subsec:apxmpusa} for an example implementation and \autoref{fig:smpenable}):

\begin{enumerate}
  \item Assert and de-assert the level-triggered INIT IPI\@.
  \item Delay for \SI{10}{\milli\second}.
  \item Send the SIPI\@.
  \item Delay for \SI{200}{\micro\second}.
  \item Send the SIPI again.
  \item Delay for \SI{200}{\micro\second} again.
  \item Wait until the AP has signaled boot completion, then continue to the next.
\end{enumerate}

If the system uses a discrete APIC, the APs will reach the boot routine by starting execution at the location specified in the warm-reset vector, if the system uses the xApic or x2Apic architecture, the APs will reach the boot routine because its location was specified in the SIPI\@.

Signaling boot completion from the APs entry function can be done by using a global bitmap variable, where the \(n\)-th bit indicates the running state of the \(n\)-th processor.
This variable does not have to be synchronized across APs, because the startup is performed sequentially.

\subsection{Application Processor Boot Routine}
\label{subsec:apboot}

After executing the \textquote{INIT-SIPI-SIPI} sequence, the targeted AP will start executing its boot routine in real mode.
The general steps required are similar to those required when booting a single-core system, but since the BSP in SMP systems is already fully operational at this point, much can be recycled.
The AP boot routine this implementation uses can be roughly described as follows (see \autoref{subsec:apxapboot} for an example implementation):

\begin{enumerate}
  \item Load a temporary \textbf{\gls{gdt}}, used for switching to protected mode.
  \item Enable protected mode by writing \code{cr0}.
  \item Far jump to switch to protected mode and reload the code-segment register, set up the other segments manually.
  \item Load the \code{cr3}, \code{cr0} and \code{cr4} values used by the BSP to enable paging (in that order).
  \item Load the IDT used by the BSP\@.
  \item Determine the AP's APIC ID by using CPUID\@.
  \item Load the GDT and \textbf{\gls{tss}} prepared for this AP\@.
  \item Load the stack prepared for this AP\@.
  \item Call the (C++) AP entry function.
\end{enumerate}

The APIC ID is used to determine which GDT and stack were prepared for a certain AP\@.
It is necessary for each AP to have its own GDT, because each processor needs its own TSS for hardware context switching, e.g.\ when interrupt-based system calls are used on all CPUs.

Because it is relocated into lower physical memory, this code has to be position independent.
For this reason, absolute physical addresses have to be used when jumping, loading the IDTR and GDTR, or referencing variables.
Also, any variables required during boot have to be available after relocation, this can be achieved by locating them inside the \textquote{TEXT} section of the routine, so they stay with the rest of the instructions when copying.
These variables have to be initialized during runtime, before the routine is copied (see \autoref{subsec:apxpreparesmp} for an example implementation).

\subsection{Application Processor Post-Boot Routine}
\label{subsec:apsystementry}

In the entry function, called at the end of the boot routine, the AP signals boot completion as described in \autoref{subsec:apstartup} and initializes its local APIC by repeating the necessary steps from \autoref{subsec:lapiclvtinit}, \autoref{subsec:lapicsoftenable}, \autoref{subsec:lapictimer} and \autoref{subsec:lapicerror}\footnote{
  MMIO memory does not have to be allocated again, as all local APICs use the same memory region in this implementation.
  Also, the initial value for the APIC timer's counter can be reused, if already calibrated.}.

Because multiple local APICs are present and active in the system now, the possibility arises that a certain local APIC receives multiple messages from different local APICs at a similar time.
To decide the order of handling these messages, an arbitration mechanism based on the local APIC's ID is used~\autocite[sec.~3.11.7]{ia32}.
To make sure the arbitration priority matches the local APIC's ID, the ARPs can be synchronized by issuing an INIT-level-deassert IPI\footnote{
  This is not supported on Pentium 4 and Xeon processors.}
\ (see \autoref{subsec:apxappostboot} for an example implementation).

\cleardoublepage