307 lines
19 KiB
TeX
307 lines
19 KiB
TeX
\chapter{Blend Shape Models}
|
|
\label{chap:blendshapemodels}
|
|
|
|
Facial animation in film is typically done using blend shape models,
|
|
as they provide an easy-to-use framework that disallows unnatural facial deformations (unlike e.g.\ freeform deformations),
|
|
while also exposing parameters that are intuitively interpretable by animators (unlike e.g.\ generative PCA models).
|
|
A blend shape model is a linear, \textbf{semantic parameterization} of a face's range of expressions,
|
|
where the individual components (the \textbf{blend shape targets}) are \textquote{core} expressions that are linearly combined to reach a greater range of \textquote{mixed} expressions (the blend shape targets are the \textquote{basis} of the spanned \textquote{expression-space}).
|
|
This choice of linear components introduces a semantic interpretability to the model parameters,
|
|
as each parameter controls the influence of an intuitive human expression on the resulting model.
|
|
The total \textquote{range} of the model depends on the number of used blend shapes,
|
|
but in general a blend shape model remains a lossy representation of the human expression-space.
|
|
|
|
Custom blend shape targets can be chosen depending on the model requirements,
|
|
with the \textquote{Facial Action Coding System} (FACS) there also exists a standardized definition of \textquote{components} of facial expressions based on human anatomy.
|
|
FACS defines 46 \textquote{action units} that correspond to facial muscle movements (not counting units for general head and eye movement),
|
|
which can be realized as delta blend shape targets.
|
|
|
|
In contrast to skeletal forward and inverse kinematics,
|
|
blend shape kinematics does not deal with models made of bone segments and joints,
|
|
although the general problem formulations \(x=f(\theta)\) (forward) and \(\theta=f^{-1}(x)\) (inverse) stay the same.
|
|
|
|
\section{Delta Blend Shape Forward Kinematics}
|
|
\label{sec:blendshapeforwardkinematics}
|
|
|
|
The forward kinematics approach to facial animation using delta blend shape models consists of an animator tuning individual parameters of the model to reach a target expression.
|
|
The blend shape model's components are the \textquote{neutral face} \(b_0\) and \(n\) blend shape targets (custom created facial expressions) \(b_1,\dots,b_n\) (\(b_i\) always denotes a regular blend shape target, delta blend shape targets are denoted as \(b_i-b_0\)).
|
|
Each \(b_i\) has \(m\) vertices and identical triangulation to allow representation of a target expression as a linear combination of \(b_0,\dots,b_n\).
|
|
Every blend shape \(b_i\) \textquote{is a vector of \(m\) stacked vertex positions}:
|
|
\[b_i=\begin{pmatrix}
|
|
x_i^{(i)}\\
|
|
\vdots\\
|
|
x_m^{(i)}\\
|
|
\end{pmatrix}\in\mathbb{R}^{3m}.\]
|
|
Because \(x_i^{(i)}\in\mathbb{R}^3\),
|
|
each vertex's three coordinates are contained in \(b_i\) either in packed or interleaved fashion:
|
|
\(b_i=(x_1,\dots, x_m, y_1,\dots,y_m, z_1,\dots, z_m)^T\) or \(b_i=(x_1,y_1,z_1,\dots,x_m,y_m,z_m)^T\).
|
|
The ordering does not matter, as long as it is identical for every \(b_i\) and the neutral face \(b_0\).
|
|
|
|
A target expression \(F(w)\) is then generated by setting the \(n\)-dimensional parameter-vector \(w\) for an affine linear combination:
|
|
\[F(w)=b_0+\sum\limits_{i=1}^n w_i(b_i-b_0).\]
|
|
By combining divergences from the neutral face (\textquote{delta-blend shapes}),
|
|
each parameter \(w_i\) can be chosen arbitrarily\footnote{
|
|
Parameters should be chosen from the interval \([0, 1]\).
|
|
For the inverse kinematics problem,
|
|
this range is usually not enforced,
|
|
since the target expression could leave the spanned expression space (for example by opening the mouth too much).
|
|
For slight excesses this shouldn't be a problem,
|
|
but in general the delta blend shape target \(b_i\) is reached fully with weights \(w_j=0\ \forall j\neq i\) and \(w_i=1\),
|
|
so a weight exceeding \(1\) is of undefined quality.
|
|
},
|
|
while the effective weights \(\alpha_0,\dots,\alpha_n\) for blend shapes \(b_0,\dots,b_n\) still satisfy the affine property \(\sum_{i=0}^n\alpha_i=1\) (which is required,
|
|
otherwise the target expression could \textquote{leave} the expression space spanned by the blend shapes,
|
|
for example by unwanted scale deformations):
|
|
\begin{align*}
|
|
F(w)&=b_0+\sum\limits_{i=1}^{n}w_i(b_i-b_0)\\
|
|
&=b_0+\sum\limits_{i=1}^{n}w_i b_i-\sum\limits_{i=1}^{n}w_i b_0\\
|
|
&=b_0\underbrace{\left(1-\sum\limits_{i=1}^{n}w_i\right)}_{\alpha_0}+\sum\limits_{i=1}^{n}\underbrace{w_i}_{\alpha_i} b_i\\
|
|
\Rightarrow \sum\limits_{i=0}^{n}\alpha_i &= 1-\left(\sum\limits_{i=1}^{n}w_i+\sum\limits_{i=1}^{n}w_i\right)=1
|
|
\end{align*}
|
|
By combining the delta blend shape vectors \(b_i-b_0\) into a matrix
|
|
\[B=[b_1-b_0|\dots|b_n-b_0]\]
|
|
with \(B\in\mathbb{R}^{3m\times n}\), the model can be formulated more compactly:
|
|
\[F(w)=b_0+Bw.\]
|
|
|
|
The semantic nature of blend shape models makes animation using this forward approach generally possible for smaller models,
|
|
but it becomes inefficient or even impossible for high-quality models with hundreds of parameters\footnote{
|
|
The facial model created for \textquote{Gollum} from \textquote{The Lord of the Rings} uses over \(900\) blend shape targets.~\autocite{computeranimation}
|
|
}.
|
|
|
|
\section{Delta Blend Shape Inverse Kinematics}
|
|
\label{sec:blendshapeinversekinematics}
|
|
|
|
For this reason, animation using the inverse kinematics approach is desirable:
|
|
Instead of interacting with individual parameters,
|
|
\textquote{markers} or \textquote{manipulators} are placed on the model,
|
|
which allow to define the target expression more directly.
|
|
Markers positions can be obtained either manually through a user interface which allows direct interaction with the model (like in \autoref{fig:effectorface}) or through facial tracking (see \autoref{chap:performancedrivenfacialanimation}).
|
|
The blend shape parameters are then determined to match this target expression as closely as possible.
|
|
|
|
\begin{figure}[h]
|
|
\centering
|
|
\begin{subfigure}[b]{0.53\textwidth}
|
|
\includegraphics[scale=0.3]{img/effector_face.png}
|
|
\end{subfigure}
|
|
\caption{A marker placed on a facial model.~\autocite{directmanipulationblendshapes}}
|
|
\label{fig:effectorface}
|
|
\end{figure}
|
|
|
|
Placing a marker on the model effectively means choosing a vertex,
|
|
whose position should act as a constraint for the face deformation.
|
|
Marker \(i\)'s current position is the current vertex position \(x_i\),
|
|
which depends on the current model parameters \(w\).
|
|
The vertex's target position \(t_i\) can then be defined by moving the marker around.
|
|
|
|
Determining the model parameters \(w\) to satisfy the marker constraints can be formulated as the following minimization problem:
|
|
\[w=\arg\min\limits_w||\overline{B}w-(\overline{t}-\overline{b}_0)||^2=\arg\min\limits_w||\overline{B}w-m||^2,\]
|
|
where \(t\) is the vector of all marker's target positions,
|
|
\(m\) are the offsets of the target positions from the neutral face (offsets are used because of delta blend shapes) and \(\overline{B}\),
|
|
\(\overline{t}\) and \(\overline{b}_0\) correspond to \(B\),
|
|
\(t\) and \(b_0\) but only contain the rows belonging to the vertices that are constrained by markers.~\autocite{directmanipulationblendshapes}
|
|
This means, \(\overline{B}\) contains \(3\) rows for each placed marker.
|
|
Since the direct manipulation allows targeting expressions outside the model's expression space,
|
|
an exact solution is generally not possible.
|
|
Also, the above minimization problem will be under-constrained in most cases (unless the animator places possibly hundreds of markers),
|
|
so additional constraints or regularization terms need to be introduced.
|
|
|
|
To improve temporal continuity of the animation, weight differences between updates can be minimized by introducing \(\alpha||w-w_0||^2\) to the problem,
|
|
where \(w_0\) are the previous weights:~\autocite{directmanipulationblendshapes}
|
|
\[w=\arg\min\limits_w||\overline{B}w-m||^2+\alpha||w-w_0||^2,\]
|
|
Lewis and Anjyo add \(\mu||w||^2\) as another regularization term,
|
|
with the intention to oppose \textquote{extreme} solutions,
|
|
for example caused by weight growth due to numerical errors in \textquote{oscillating} animations:~\autocite{directmanipulationblendshapes}
|
|
\[w=\arg\min\limits_w||\overline{B}w-m||^2+\alpha||w-w_0||^2+\mu||w||^2.\]
|
|
This term is also important since the weights \(w\) are not constrained to \([0, 1]\) for the inverse problem (at least not in this solution).
|
|
|
|
The parameters \(\alpha\) and \(\mu\) are set to small values to not significantly alter the main objective,
|
|
Lewis and Anjyo use \(\alpha=0.1\) and \(\mu=0.001\).~\autocite{directmanipulationblendshapes}
|
|
|
|
\section{Solving the Inverse Kinematics Minimization Problem}
|
|
\label{sec:solvingblendshapeinversekinematics}
|
|
|
|
The goal is the minimization of
|
|
\[||\overline{B}w-m||^2+\alpha||w-w_0||^2+\mu||w||^2,\]
|
|
which is quadratic in \(w\).
|
|
Since we are using the euclidian norm,
|
|
it follows that
|
|
\[||x||^2=\sqrt{x_1^2+\dots+x_n^2}^2=x_1^2+\dots+x_n^2=x^T x\]
|
|
for \(x\in\mathbb{R}^n\), which allows us to rewrite the term as
|
|
\begin{align*}
|
|
&(\overline{B}w-m)^T(\overline{B}w-m)+\alpha(w-w_0)^T(w-w_0)+\mu w^T w\\
|
|
=\ &(w^T \overline{B}\,^T \overline{B}w-2w^T \overline{B}\,^T m+m^T m)\\
|
|
&+\alpha(w^T w-2w^T w_0+w_0^T w_0)\\
|
|
&+\mu(w^T w).
|
|
\end{align*}
|
|
Deriving this (using some slightly sketchy matrix differential notation\footnote{
|
|
Sketchy example:\\
|
|
\(d\phi(w)=d(w^T\overline{B}\,^T\overline{B}w)=(dw)^T\overline{B}\,^T\overline{B}w+w^T\overline{B}\,^T\overline{B}(dw)=2w^T\overline{B}\,^T\overline{B}(dw)\Leftrightarrow\frac{d\phi}{dw}=2\overline{B}\,^T\overline{B}w\)
|
|
}) leads to the following derivative:
|
|
\[2\overline{B}\,^T\overline{B}w-2\overline{B}\,^T m+2\alpha w-2\alpha w_0+2\mu w.\]
|
|
Solving for \(0\) to find the local extremum leads to
|
|
\begin{align*}
|
|
2\overline{B}\,^T\overline{B}w+2\alpha w+2\mu w&=2\overline{B}\,^T m+2\alpha w_0\\
|
|
\Leftrightarrow\left(\overline{B}\,^T\overline{B}+(\alpha+\mu)I\right)w&=\overline{B}\,^T m+\alpha w_0,
|
|
\end{align*}
|
|
which is a \(n\times n\) linear system (\(n\) is the number of blend shapes/model parameters),
|
|
where \(I\in\mathbb{R}^{n\times n}\) is the identity matrix.
|
|
The above condition is sufficient, since the problem is convex.
|
|
|
|
Because \(\overline{B}\,^T\overline{B}\) and \((\alpha+\mu)I\) are both (usually) positive definite\footnote{
|
|
For a matrix \(A\), \(A^T A\) is positive definite if \(Ax\neq 0\) for any non-zero vector \(x\),
|
|
since \(x^T A^T Ax=(Ax)^T(Ax)=||Ax||^2\).
|
|
This is probably true for \(\overline{B}\),
|
|
since \(\overline{B}\)'s columns (the partial delta blend shape targets) should be linearly independent.
|
|
}\ \footnote{
|
|
The identity matrix and its scalar multiples are positive definite.
|
|
} and equal to their (conjugate\footnote{
|
|
We only deal with real numbers (coordinates from \(\mathbb{R}^3\)).
|
|
}) transposes\footnote{
|
|
\((A^T A)^T=A^T(A^T)^T=A^T A\).
|
|
}, an efficient Cholesky solver can be applied to obtain \(w\).~\autocite{computeranimation}
|
|
|
|
\subsection{Inverse Kinematics Instability}
|
|
\label{subsec:pseudoinverseinstability}
|
|
|
|
Disregarding implementation-specific numerical instabilities and assuming that \(\overline{B}\) is invertible because all delta blend shape targets are linearly independent\footnote{
|
|
If \(\exists\ i, w: b_i = \sum\limits_{j\neq i} w_j b_j\), \(b_i\) can be removed, as it does not add information to the model.
|
|
} (so \(\overline{B}\) has full rank),
|
|
we have the following equivalence:
|
|
|
|
\[
|
|
\left(\overline{B}\,^T\overline{B}+(\alpha+\mu)I\right)w=\overline{B}\,^T m+\alpha w_0
|
|
\Leftrightarrow w=\left(\overline{B}\,^T\overline{B}+(\alpha+\mu)I\right)^{-1}\left(\overline{B}\,^T m+\alpha w_0\right)
|
|
\]
|
|
|
|
This is a (regularized) Moore-Penrose pseudo-inverse\footnote{
|
|
This is only true if \(\overline{B}\)'s columns are linearly independent, which I will assume from now on.
|
|
Under these circumstances, it follows for a full-rank matrix \(A\):
|
|
\begin{align*}
|
|
A&=U\Sigma V^T=U\left((\Sigma^+)^T\Sigma\right)\Sigma V^T=U(\Sigma^+)^T (V^T V)\Sigma (U^T U)\Sigma V^T\\
|
|
&=(V\Sigma^+ U^T)^T(U\Sigma V^T)^T(U\Sigma V^T)=\left((U\Sigma V^T)^+\right)^T(U\Sigma V^T)^T(U\Sigma V^T)\\
|
|
&=(A^+)^T A^T A\Leftrightarrow A^T=(A^T A) A^+\Leftrightarrow A^+=(A^T A)^{-1}A^T.
|
|
\end{align*}
|
|
Full rank is required for the property \((\Sigma^+)^T\Sigma=I\).
|
|
} or \textquote{damped-least-squares} method:
|
|
If we set \(\alpha=0\) and \(\mu=0\) we obtain
|
|
|
|
\[w=\left(\overline{B}\,^T\overline{B}\right)^{-1}\overline{B}\,^T m=\overline{B}\,^+ m.\]
|
|
|
|
This can lead to instabilities when \textquote{dragging} or animating a placed marker/constraint:~\autocite{transpositiondirectmanipulationblendshapes}
|
|
If we expand our forward kinematics problem \(F(w)=b_0+Bw\) using a singular value decomposition, we end up with
|
|
|
|
\[F(w)=b_0+U\Sigma V^T w,\]
|
|
|
|
where the largest singular values in \(\Sigma\) will have the most influence on the generated facial expression.
|
|
These are the values that should mainly be used to solve the inverse kinematics problem.
|
|
|
|
Now, looking at the singular value decomposition of the unregularized inverse kinematics problem \(w=\overline{B}\,^+ m\), we obtain
|
|
|
|
\[w=(U\Sigma V^T)^+ m=(V^T)^+\Sigma^+U^+ m=V\Sigma^+U^T m,\]
|
|
|
|
because \(V\) and \(U\) are orthogonal matrices\footnote{
|
|
\(U^+=U^{-1}=U^T\).
|
|
}.
|
|
|
|
Since we take the pseudo-inverse \(\Sigma^+\),
|
|
the non-zero diagonal entries of \(\Sigma\) are inverted,
|
|
which means this inverse kinematics solution is influenced strongly by the smaller singular values from the forward kinematics problem.
|
|
In consequence, the facial deformations produced to satisfy a marker constraint might also occur outside this marker's local environment\footnote{
|
|
In some cases this might be even desirable,
|
|
for example when a smile produces deformations in the eye-region,
|
|
but this makes the model slightly unpredictable.
|
|
}, which could lead to inconsistencies during animation.
|
|
|
|
A possible solution is given by the transposition-based solution of the inverse kinematics problem.~\autocite{transpositiondirectmanipulationblendshapes}~\autocite{jacobiantranspose}
|
|
|
|
\section{Corrective and Intermediate Blend Shapes}
|
|
\label{sec:correctiveblendshapes}
|
|
|
|
Although the \textquote{basis vectors} of the spanned expression space (the blend shape targets) are valid facial expressions,
|
|
arbitrary linear combinations can still produce unnatural anomalies.
|
|
|
|
\begin{figure}[h]
|
|
\centering
|
|
\begin{subfigure}[b]{0.25\textwidth}
|
|
\includegraphics[scale=0.2]{img/unconstrained_weights.png}
|
|
\end{subfigure}
|
|
\caption{Consequences of unconstrained blend shape weights.~\autocite{directmanipulationblendshapes}}
|
|
\label{fig:unconstrainedweights}
|
|
\end{figure}
|
|
|
|
For this reason, \textquote{corrective} blend shapes are used:
|
|
By adding blend shape targets with additional weights (not part of the manual model parameters) that depend on other weights,
|
|
anomalies caused by special weight combinations can be fixed.
|
|
|
|
For example, if an anomaly occurs when activating weights \(w_j\) and \(w_k\) for blend shape targets \(b_j\) and \(b_k\),
|
|
a corrective blend shape \(b_{(j,k)}\) (\textit{not} in delta blend shape form!) can be modeled for the expression produced by the weights \(w_j=w_k=1\) and \(w_i=0\ \forall i\notin\{j, k\}\).
|
|
By setting the weight to \(w_{(j,k)}=w_j\cdot w_k\), the blend shape automatically activates when both \(b_j\) and \(b_k\) become active.
|
|
|
|
The resulting blend shape model is quadratic with respect to its parameters \(w\):
|
|
\begin{align*}
|
|
F_C(w)&=F(w)+\sum\limits_{(i,j)\in C}w_{(i,j)} b_{(i,j)}\\
|
|
&=b_0+\sum\limits_{i=1}^{n}w_i(b_i-b_0)+\sum\limits_{(i,j)\in C}w_i w_j b_{(i,j)},
|
|
\end{align*}
|
|
where \(C=\{1,\dots,n\}\times\{1,\dots,n\}\) and \(b_{(i,j)}\) denotes the corrective blend shape for blend shape targets \(b_i\) and \(b_j\).
|
|
Because of the quadratic nature of this model,
|
|
the inverse kinematics problem can no longer be solved by a simple convex optimization.
|
|
A possible solution for the quadratic case is given in~\autocite{nonconvexblendshapes},
|
|
although corrective blend shapes could also be applied in a third-order fashion or higher (correcting anomalies caused by three or more overlapping blend shape targets).
|
|
|
|
Another problem arises from the linear nature of the blend shape model:
|
|
Certain rotational movements like closing eyelids or eye rotations can only be represented in a linear fashion.
|
|
When animating an eyeball movement (like a gaze from center to right),
|
|
the eyeball will loose a part of its volume in the first half of the animation and regain it in the second half,
|
|
since each individual vertex can only move in a straight line.
|
|
To solve this problem, \textquote{intermediate} blend shapes are used:
|
|
An additional blend shape target is modeled for a single (or multiple) intermediate animation state(s),
|
|
leading to a piecewise linear interpolation/approximation (see \autoref{fig:combinationvsintermediate}).
|
|
|
|
\begin{figure}[h]
|
|
\centering
|
|
\begin{subfigure}[b]{0.45\textwidth}
|
|
\includegraphics[scale=0.3]{img/combination_vs_intermediate.png}
|
|
\end{subfigure}
|
|
\caption{Corrective blend shapes lead to smooth interpolation that becomes significant if all weights approach \(1\) (left),
|
|
intermediate blend shapes lead to non-smooth interpolation (right).~\autocite{practicetheoryblendshapes}
|
|
}
|
|
\label{fig:combinationvsintermediate}
|
|
\end{figure}
|
|
|
|
This approach can be handled by the in \autoref{sec:blendshapeinversekinematics} described inverse kinematics solution,
|
|
as intermediate blend shapes are just appended to the existing linear model.
|
|
|
|
\section{Combining Skeletal and Blend Shape Animation}
|
|
\label{sec:combiningskeletalandblendshapes}
|
|
|
|
Instead of using intermediate blend shapes to better approximate rotational deformations,
|
|
certain facial movements like eye-, eyelid- or jaw rotations can be modeled using skeletal animation and skinning\footnote{
|
|
\textquote{Skinning} is the translation of abstract bone movements into actual skin/vertex movements.
|
|
}.
|
|
This does not require modeling all facial animation using bones,
|
|
as blend shape animation and skeletal animation can be combined.
|
|
|
|
To achieve this, the blend shape inverse kinematics problem is solved first,
|
|
as the deformations caused by skinning do not match the blend shape targets.
|
|
The skeletal inverse kinematics problem can be solved later,
|
|
because vertex deformations caused by the blend shape model do not modify bone positions.
|
|
|
|
Although this approach might achieve better performance with rotational movement,
|
|
the resulting expressions loose some control,
|
|
as the interactions between blend shape and skinning deformations are unclear\footnote{
|
|
It is unclear to me if this model is viable.
|
|
}.
|
|
|
|
A different combinational model is possible by applying corrective blend shapes to a primarily skeleton-based model:
|
|
In cases where memory availability is low,
|
|
skeletal models may be preferred over blend shape models\footnote{
|
|
Skeletal models are more taxing on CPU/GPU computations,
|
|
since skinning is more computationally expensive than linear interpolation.
|
|
Blend shape models are more taxing on CPU/GPU memory (especially for detailed models with many blend shapes or situtations with many different characters with their own blend shapes),
|
|
since the face is stored many times with different deformations (although data blend shapes help here, as the localized deformations lead to sparse delta blend shape targets).
|
|
}.
|
|
To still achieve realistic skin movement (which might be hard only through skinning),
|
|
corrective blend shapes can be applied after the skinning,
|
|
with weights depending on certain joint angles.
|
|
In that case, only skeletal inverse kinematics is involved.
|