Antibody Fingerprinting
In this Science paper, Georgiev et al. (2013) test a panel of monoclonal antibodies, grouped into clusters, against a panel of virus strains. This builds a reference set of antibody neutralization fingerprints, which they then use to characterize polyclonal sera. The notation, and some clues about recoding their work in R, are given below.
Antibody Clustering
Starting with the antibody clustering analysis (page 4 of the supplementary material), let:
\begin{equation} N_j = \{n_{1j}, n_{2j}, \cdots, n_{ij}, \cdots, n_{vj}\} , \end{equation}
where $N_j$ is the neutralization fingerprint of antibody $j$, $n_{ij}$ is the potency with which antibody $j$ neutralizes virus strain $i$, and $v$ is the total number of virus strains. They then transform $N_j$ into a rank vector $N_j^R$ by sorting the $n_{ij}$ values by potency and replacing each strain's potency with its rank (highest potency $=1$). For any two antibodies $j_1$ and $j_2$, the Spearman correlation is then calculated between the rank vectors $N_{j_1}^R$ and $N_{j_2}^R$. To reproduce their example in R:
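The rank transform and the Spearman correlation can be sketched in base R with `rank()` and `cor(..., method = "spearman")`. The potency values below are made up for illustration (they are not the data from the paper). The sketch assumes larger numbers mean higher potency, so the most potent strain gets rank 1; for IC50-style values, where smaller means more potent, drop the minus sign.

```r
# Hypothetical neutralization potencies of two antibodies (j1, j2)
# against five virus strains; larger values = more potent (assumption).
n_j1 <- c(strainA = 9.1, strainB = 0.4, strainC = 3.3, strainD = 7.8, strainE = 1.2)
n_j2 <- c(strainA = 8.0, strainB = 1.1, strainC = 0.9, strainD = 6.5, strainE = 2.7)

# Rank vectors N_j^R: the highest potency gets rank 1
r_j1 <- rank(-n_j1)
r_j2 <- rank(-n_j2)

# Spearman correlation = Pearson correlation of the two rank vectors
rho_manual <- cor(r_j1, r_j2)

# The built-in Spearman option ranks internally and gives the same value
rho_builtin <- cor(n_j1, n_j2, method = "spearman")
print(c(manual = rho_manual, builtin = rho_builtin))
```

Reversing the ranking direction for both antibodies does not change the Spearman correlation, which is why the built-in call agrees with the "highest potency = 1" convention.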
They then use hierarchical clustering in Mathematica. In R, you can cluster using Manhattan distance as follows:
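A minimal sketch with `dist(method = "manhattan")` and `hclust()`, run on random, hypothetical potencies. Two things here are my assumptions rather than details from the supplement: each antibody is represented by its row of the Spearman correlation matrix, and average linkage is used. The Mathematica settings in the paper may differ.

```r
set.seed(1)
# Hypothetical potencies: 8 virus strains (rows) x 6 antibodies (columns)
potency <- matrix(runif(48), nrow = 8,
                  dimnames = list(paste0("strain", 1:8), paste0("mAb", 1:6)))

# Antibody-by-antibody Spearman correlation matrix (6 x 6)
rho <- cor(potency, method = "spearman")

# Manhattan distance between the antibodies' correlation profiles,
# followed by agglomerative clustering (average linkage is an assumption)
d <- dist(rho, method = "manhattan")
hc <- hclust(d, method = "average")

plot(hc, main = "Antibody clustering (hypothetical data)")  # dendrogram
clusters <- cutree(hc, k = 3)  # cut the tree into 3 clusters
print(clusters)
```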
Serum Specificity Delineation
Let $S$ be the set of all virus strains, and let $K$ be the set of antibody clusters:
\begin{equation} S = \{\mathsf{6101.10}, ~~\mathsf{Bal.01}, ~~\cdots, ~~\mathsf{ZM55.28a}\} \end{equation}
\begin{equation} K = \{\mathsf{VRC01-like}, ~~\mathsf{b12-like}, ~~\cdots, ~~\mathsf{10E8-like}\} \end{equation}
For a given cluster $k \in K$, the neutralization fingerprint $R_k$ is defined as:
\begin{equation} R_k = \{\mu_{ik} \left| i \in S \right . \}, \end{equation}
where $\mu_{ik}$ is the median, over all antibodies $j$ in antibody cluster $k$, of the rank of strain $i$. Let $\boldsymbol{R}$ be the matrix of representative neutralization fingerprints for all antibody clusters, $\boldsymbol{R} = \{R_k \left| k \in K \right. \}$, where
\begin{equation} \left[R_{k=1}\right] = \left[\mu_{i=1, k=1} ~~ \mu_{i=2, k=1} ~~ \cdots ~~ \mu_{i=v, k=1}\right], \end{equation}
and where $\boldsymbol{R}$ is the reference set of epitope-specific neutralization fingerprints. These are the data in the file neut-rank.csv, saved as the object abMatrixRanks; the first column of the .csv gives the strain names $S$, and the first row gives the antibody clusters $K$. Let $v$ be the total number of strains and $K_T$ the total number of clusters.
Equivalently, in matrix form:
\begin{equation} \boldsymbol{R} = \left( \begin{array}{cccc} \left[R_{k=1}\right]^T & \left[R_{k=2}\right]^T & \cdots & \left[R_{k=K_T}\right]^T \end{array} \right). \end{equation}
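The median step that turns per-antibody rank vectors into the cluster fingerprints $R_k$ can be sketched as follows. The rank matrix and the cluster memberships are hypothetical, chosen only to make the arithmetic easy to check.

```r
# Hypothetical rank vectors N_j^R: 5 strains (rows) x 4 antibodies (columns)
ab_ranks <- matrix(c(1, 2, 3, 4, 5,
                     2, 1, 3, 5, 4,
                     5, 4, 1, 2, 3,
                     4, 5, 2, 1, 3),
                   nrow = 5,
                   dimnames = list(paste0("strain", 1:5),
                                   c("mAbA", "mAbB", "mAbC", "mAbD")))

# Hypothetical cluster membership of each antibody
cluster_of <- c(mAbA = "cluster1", mAbB = "cluster1",
                mAbC = "cluster2", mAbD = "cluster2")

# mu_ik: median rank of strain i over the antibodies j in cluster k
R <- sapply(split(names(cluster_of), cluster_of), function(members) {
  apply(ab_ranks[, members, drop = FALSE], 1, median)
})
print(R)  # v x K_T matrix: one column R_k per cluster
```

If neut-rank.csv has the layout described above, `as.matrix(read.csv("neut-rank.csv", row.names = 1, check.names = FALSE))` should load it directly in the same $v \times K_T$ shape; `check.names = FALSE` keeps cluster names such as `VRC01-like` unchanged.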
Analogous to the monoclonal antibody fingerprint $N_j$, let $N_m=\{n_{im} \left| i \in S \right.\}$ be the neutralization pattern of serum $m \in M$, where $M$ is the set of all sera being tested. The serum neutralization pattern $N_m$ is then transformed into a rank vector $\boldsymbol{D_m}$ based on neutralization (or binding) potency. The goal at this point is to find antibody cluster coefficients $\boldsymbol{C_m} = \{c_k^m \left| k \in K \right. \}$ that minimize the residual error between $\boldsymbol{D_m}$ and the weighted combination of reference fingerprints $\boldsymbol{R} \cdot \boldsymbol{C_m}$.
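Turning a serum's pattern into $\boldsymbol{D_m}$ is the same rank step used for the antibodies. A minimal sketch with made-up titers, assuming larger values (for example, reciprocal dilution titers such as ID50) mean more potent neutralization. Tied strains receive the average of their ranks.

```r
# Hypothetical serum neutralization titers N_m for five strains
n_m <- c(strain1 = 250, strain2 = 40, strain3 = 1200, strain4 = 90, strain5 = 40)

# Rank vector D_m: the most potent strain gets rank 1; ties share the average rank
D_m <- rank(-n_m, ties.method = "average")
print(D_m)
```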
We want to minimize:
\begin{equation} \parallel \boldsymbol{D_m}-\boldsymbol{R} \cdot \boldsymbol{C_m} \parallel \end{equation}
where we constrain $\sum_{k \in K} c_k^m = 1$ and $0 \leq c_k^m \leq 1$. Here $\boldsymbol{R}$ is a $v \times K_T$ matrix, $\boldsymbol{C_m}$ is a $K_T \times 1$ column vector, and $\boldsymbol{D_m}$ is treated as a $v \times 1$ column vector, so that $\boldsymbol{D_m}$ and $\boldsymbol{R} \cdot \boldsymbol{C_m}$ have the same shape. $\parallel x\parallel$ is the Euclidean ($\ell^2$) norm of $x$: $\parallel x\parallel = \sqrt{x_1^2 + \cdots + x_n^2}$.
We can think of this as a constrained version of the minimization that we perform for ordinary linear regression, without an intercept:
\begin{equation} \parallel \boldsymbol{y}-\boldsymbol{X} \cdot \boldsymbol{\beta} \parallel \end{equation}
where $\boldsymbol{y}= \boldsymbol{D_m}$, $\boldsymbol{X}=\boldsymbol{R}$, and $\boldsymbol{\beta}=\boldsymbol{C_m}$.
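For comparison, the unconstrained version is ordinary least squares without an intercept, which `lm()` fits when the formula includes `- 1`. This sketch uses random, hypothetical $\boldsymbol{R}$ and $\boldsymbol{D_m}$. Nothing in it keeps the coefficients non-negative or makes them sum to 1, which is why a constrained method is needed.

```r
set.seed(42)
v <- 20
K_T <- 4
# Hypothetical reference fingerprints R (v x K_T) and serum vector D_m (length v)
R <- matrix(runif(v * K_T), nrow = v)
D_m <- runif(v)

# OLS without an intercept: minimizes || D_m - R %*% beta ||
fit <- lm(D_m ~ R - 1)
beta <- coef(fit)

# The same least-squares solution via a QR decomposition
beta_qr <- qr.solve(R, D_m)
print(cbind(lm = beta, qr = beta_qr))
```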
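The key search terms here are constrained optimization and regression. We can use nnls (non-negative least squares, based on the Lawson-Hanson algorithm) [1], which enforces $c_k^m \geq 0$. All that is left to do is to scale the coefficients so that they sum to 1 (divide each by the sum of the coefficients); $c_k^m \leq 1$ then holds automatically. This rescaling satisfies both constraints, but it is a shortcut: in general it is not the exact minimizer of the norm under the sum-to-one constraint.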
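A minimal sketch of the two steps with the nnls package from CRAN (`nnls(A, b)` returns the solution in its `x` component). The reference matrix and serum vector are toy values. $\boldsymbol{D_m}$ is built as an exact 70/30 mix of the first two columns so that the answer is known in advance; with real data, $\boldsymbol{D_m}$ would be a rank vector.

```r
library(nnls)  # CRAN package providing a Lawson-Hanson NNLS solver

# Hypothetical reference fingerprints R (v strains x K_T clusters)
R <- cbind(clusterA = c(1, 2, 3, 4, 5, 6),
           clusterB = c(6, 5, 4, 3, 2, 1),
           clusterC = c(3, 1, 5, 2, 6, 4))

# Hypothetical serum vector D_m: an exact 70/30 mix of clusters A and B
D_m <- as.vector(R %*% c(0.7, 0.3, 0))

# Step 1: non-negative least squares, min || D_m - R %*% c || subject to c >= 0
fit <- nnls(R, D_m)
c_raw <- fit$x

# Step 2: rescale so that the coefficients sum to 1
C_m <- setNames(c_raw / sum(c_raw), colnames(R))
print(round(C_m, 3))
```

Because this toy $\boldsymbol{D_m}$ is an exact non-negative combination of full-rank columns, the rescaled coefficients come back as 0.7, 0.3 and 0, up to floating-point error.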
For instance, performing the calculation on the first row (the first serum) gives:
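If you want the sum-to-one constraint enforced during the fit rather than by rescaling afterwards, one option is `constrOptim()` from the stats package. The sketch below is built on my own assumptions: the reparametrization (the last coefficient equals 1 minus the others, so the equality always holds) is my choice, and the data are hypothetical. `constrOptim` only accepts inequalities of the form `ui %*% theta - ci >= 0` and must start strictly inside them.

```r
# Hypothetical reference fingerprints R (v strains x K_T clusters)
R <- cbind(clusterA = c(1, 2, 3, 4, 5, 6),
           clusterB = c(6, 5, 4, 3, 2, 1),
           clusterC = c(3, 1, 5, 2, 6, 4))
K_T <- ncol(R)
# Hypothetical serum vector: an exact 50/30/20 mix of the three clusters
D_m <- as.vector(R %*% c(0.5, 0.3, 0.2))

# Free parameters are the first K_T - 1 coefficients; the last one is
# 1 - sum(theta), so the coefficients always sum to exactly 1.
full_coef <- function(theta) c(theta, 1 - sum(theta))

sse <- function(theta) {
  resid <- D_m - R %*% full_coef(theta)
  sum(resid^2)
}

sse_grad <- function(theta) {
  resid <- D_m - R %*% full_coef(theta)
  g <- as.vector(-2 * t(R) %*% resid)  # gradient w.r.t. all K_T coefficients
  g[-K_T] - g[K_T]                     # chain rule through the last coefficient
}

# Inequalities ui %*% theta - ci >= 0:
#   theta_k >= 0 for each free coefficient, and 1 - sum(theta) >= 0
ui <- rbind(diag(K_T - 1), rep(-1, K_T - 1))
ci <- c(rep(0, K_T - 1), -1)

start <- rep(1 / K_T, K_T - 1)  # strictly inside the feasible region
opt <- constrOptim(start, sse, sse_grad, ui = ui, ci = ci)

C_m <- setNames(full_coef(opt$par), colnames(R))
print(round(C_m, 4))
```

`constrOptim` uses a logarithmic barrier, so it suits solutions inside the feasible region best. A solver with explicit equality constraints may be preferable when many coefficients are expected to be exactly zero.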
Here is the final script, ab-fp-script.R:
[Figure: result from the original Mathematica analysis vs. result from my R script]
[Figure: alternative visualization using stacked bars]
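A stacked-bar view of the coefficients can be sketched with base R's `barplot()`, which stacks the rows of a matrix within each column by default. The coefficient values below are invented for illustration and are not results from the paper or from the script.

```r
# Hypothetical normalized coefficients C_m for three sera (columns)
# across three antibody clusters (rows); each column sums to 1
coef_matrix <- cbind(serum1 = c(0.60, 0.25, 0.15),
                     serum2 = c(0.10, 0.20, 0.70),
                     serum3 = c(0.35, 0.35, 0.30))
rownames(coef_matrix) <- c("VRC01-like", "b12-like", "10E8-like")

# One bar per serum, with each cluster's coefficient stacked inside it
bar_mid <- barplot(coef_matrix,
                   col = c("steelblue", "orange", "forestgreen"),
                   ylim = c(0, 1), ylab = "Cluster coefficient c_k^m",
                   legend.text = rownames(coef_matrix),
                   args.legend = list(x = "topright", bty = "n"))
```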
[1] Alternative options may exist with the optim or constrOptim functions in the R package stats.