Antibody i-Patch: Paratope prediction

Paratope:

Antibody i-Patch: Antibody contact residue prediction

Binding likelihood score of each residue $a$:

$p_a = \frac{f^{con}_a / f^{non}_a}{\sum_b f^{con}_b / \sum_b f^{non}_b}$

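where $p_a$ is the propensity of residue type $a$ to be in contact, and $f^{con}_a$ and $f^{non}_a$ are the frequencies with which surface residues of type $a$ are in contact ($con$) and not in contact ($non$). The denominator sums these frequencies over all residue types $b$.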
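A minimal Python sketch of this propensity, assuming the $f$ values are raw counts of surface residues from a training set (with counts, the denominator is simply the overall contact-to-non-contact ratio). The function name and the residue counts are hypothetical, chosen only to illustrate the arithmetic.

```python
def contact_propensity(con_counts, non_counts):
    """p_a = (f_con[a] / f_non[a]) / (sum_b f_con[b] / sum_b f_non[b]).

    con_counts / non_counts map residue type -> number of surface residues of
    that type observed in contact / not in contact (assumed raw counts).
    """
    background = sum(con_counts.values()) / sum(non_counts.values())
    return {
        a: (con_counts.get(a, 0) / n_non) / background
        for a, n_non in non_counts.items()
        if n_non > 0  # skip types never seen out of contact (ratio undefined)
    }


# Hypothetical counts, not taken from the paper.
props = contact_propensity({"TYR": 30, "GLY": 10}, {"TYR": 20, "GLY": 40})
# TYR ≈ 2.25 (enriched in contacts), GLY ≈ 0.375 (depleted)
```

Values above 1 mark residue types that are over-represented among contacting surface residues.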

Clustering algorithm of Antibody i-Patch:

$a_{ji}$ denotes the $i$-th residue of the $j$-th sequence in the multiple sequence alignment (MSA). Scores are calculated over triangles of residues ($a_{ji}, a_{jk}, a_{jl}$), whose residue categories are ($C_{ji}, C_{jk}, C_{jl}$). $S_i$ is the average triangle propensity over all possible triangles formed by:

• Site $i_t \in \Pi(i)$ on protein A;
• Surface-exposed site $k$ on protein B; and
• Residue $l$ – a structural neighbour of either $i_t$ or $k$.
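To make the triangle definition concrete, here is a small Python sketch that enumerates the $(i_t, k, l)$ triples scored for one site $i$ on protein A. It assumes $\Pi(x)$ is the set of structural neighbours of site $x$ (as the list above suggests) and that site labels on A and B are distinct strings; `enumerate_triangles` and the toy neighbourhoods are hypothetical.

```python
def enumerate_triangles(i, nbrs_a, nbrs_b, surface_b):
    """Yield the (i_t, k, l) triangles scored for site i on protein A.

    nbrs_a / nbrs_b: site -> structural neighbours Pi(.) on protein A / B.
    surface_b: surface-exposed sites k on protein B.
    """
    for i_t in nbrs_a[i]:  # i_t in Pi(i)
        for k in surface_b:  # surface site k on B
            # l ranges over Pi(i_t) U Pi(k)
            for l_site in sorted(set(nbrs_a[i_t]) | set(nbrs_b[k])):
                yield (i_t, k, l_site)


# Toy neighbourhoods with hypothetical site labels.
nbrs_a = {"A1": ["A2"], "A2": ["A1", "A3"], "A3": ["A2"]}
nbrs_b = {"B1": ["B2"]}
triangles = list(enumerate_triangles("A2", nbrs_a, nbrs_b, ["B1"]))
```

Here site `A2` has two neighbours on A and one surface partner on B, and each pairing contributes one triangle per residue in $\Pi(i_t) \cup \Pi(k)$, giving four triangles in total.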

Further notation:

• $M$: number of sequences in the MSA;
• $G(i_t)$: set of MSA sequences that have a gap at position $i_t$;
• $\Pi(i)$: set of structural neighbours of site $i$.

$S_i = \frac{1}{|\Pi(i)|}\sum_{i_t \in \Pi(i)} (w_{ii_t}^{intra} S_{i_t}^{Triangle})$

where:

• $S_{i_t}^{Triangle} = \frac{1}{|\{k \in B\}|} \sum_{k \in B} \frac{1}{|\Pi(i_t) \cup \Pi(k)|} \times \sum_{l \in \Pi(i_t) \cup \Pi(k)} \frac{1}{M-|G(i_t)|} \times \sum^{M-|G(i_t)|}_{j=1} w^{pair} (C_{jl}|C_{ji_t},C_{jk}) \frac{p^{con}(C_{jl}, C_{ji_t}, C_{jk})}{p^{non}(C_{jl}, C_{ji_t}, C_{jk})}$
• $p^{con}(C_{jl},C_{ji_t},C_{jk})=\frac{f^{con}_{C_{jl},C_{ji_t},C_{jk}}/f^{all}_{C_{jl},C_{ji_t},C_{jk}}}{\sum_{T \in \mathrm{Triangles}} f^{con}_T / \sum_{T \in \mathrm{Triangles}} f^{all}_T}$
• $p^{non}(C_{jl},C_{ji_t},C_{jk})=\frac{f^{non}_{C_{jl},C_{ji_t},C_{jk}}/f^{all}_{C_{jl},C_{ji_t},C_{jk}}}{\sum_{T \in \mathrm{Triangles}} f^{non}_T / \sum_{T \in \mathrm{Triangles}} f^{all}_T}$
• $w^{pair} (C_2|C_1,C_3) =\frac{w^{con}_{intra}(C_2|C_1,C_3)}{w^{non}_{intra}(C_2|C_1,C_3)}$
• $w^{con}_{intra}(C_2|C_1,C_3) = \frac{f^{con}_{C_2 \in N(C_1,C_3)}/f^{all}_{C_2 \in N(C_1,C_3)}}{f^{con}_{C_1,C_3}/f^{all}_{C_1,C_3}}$
• $w^{non}_{intra}(C_2|C_1,C_3) = \frac{f^{non}_{C_2 \in N(C_1,C_3)}/f^{all}_{C_2 \in N(C_1,C_3)}}{f^{non}_{C_1,C_3}/f^{all}_{C_1,C_3}}$
• $w_{ii_t}^{intra} = \frac{1}{M-|G(i_t) \cup G(i)|} \sum_{j \in G^c(i_t) \cap G^c(i)} w^{intra} (C_{ji_t}|C_{ji})$
• $w^{intra} (C_2|C_1) = \frac{w^{con}_{intra}(C_2|C_1)}{w^{non}_{intra}(C_2|C_1)}$
• $w^{con}_{intra}(C_2|C_1) = \frac{f^{con}_{C_2 \in N(C_1)}/f^{all}_{C_2 \in N(C_1)}}{f^{con}_{C_1}/f^{all}_{C_1}}$
• $w^{non}_{intra}(C_2|C_1) = \frac{f^{non}_{C_2 \in N(C_1)}/f^{all}_{C_2 \in N(C_1)}}{f^{non}_{C_1}/f^{all}_{C_1}}$
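The two outer averages, $S_i$ and the gap-aware weight $w_{ii_t}^{intra}$, can be sketched in Python as below. This assumes the MSA is given as equal-length strings of residue-category symbols with `-` for gaps, that $G$ collects the gapped sequences (so the average runs over sequences that are gap-free at both positions), and that `w_intra` and `triangle_score` are hypothetical callables returning $w^{intra}(C_2|C_1)$ and $S^{Triangle}_{i_t}$. Returning 0.0 when no sequence qualifies is an arbitrary choice, not part of the method.

```python
GAP = "-"


def intra_weight(msa, i, i_t, w_intra):
    """w^intra_{i i_t}: mean of w_intra(C_{j i_t}, C_{j i}) over the sequences j
    that have no gap at position i or position i_t."""
    values = [
        w_intra(seq[i_t], seq[i])
        for seq in msa
        if seq[i] != GAP and seq[i_t] != GAP
    ]
    # Hypothetical choice: return 0.0 if no sequence is gap-free at both sites.
    return sum(values) / len(values) if values else 0.0


def site_score(msa, i, neighbours, triangle_score, w_intra):
    """S_i = (1 / |Pi(i)|) * sum_{i_t in Pi(i)} w^intra_{i i_t} * S^Triangle_{i_t}."""
    patch = neighbours[i]
    total = sum(
        intra_weight(msa, i, i_t, w_intra) * triangle_score(i_t) for i_t in patch
    )
    return total / len(patch)


def same_category_weight(c2, c1):
    """Toy stand-in for w^intra(C2 | C1): 2 if the categories match, else 1."""
    return 2.0 if c2 == c1 else 1.0


# Toy MSA over categories "a"/"b"; the second sequence has a gap at position 1.
msa = ["ab", "a-", "bb"]
score = site_score(msa, 0, {0: [0, 1]}, lambda i_t: 1.0, same_category_weight)
```

With a constant triangle score of 1, `score` is just the mean intra weight over the patch: 2.0 for $i_t = 0$ and 1.5 for $i_t = 1$ (the gapped sequence is skipped), i.e. 1.75.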

$p$ refers to a propensity and $w$ to a weight; $f^{con}_{C_1}$ is the frequency of training-set residues in category $C_1$ that are in contact, $f^{non}_{C_1}$ the frequency of those not in contact, and $f^{all}_{C_1}$ the frequency of all residues in $C_1$. Triangle (residue-triple) propensities are normalised by the frequency of each residue-category triangle $T$ relative to the totals over all possible unordered category combinations, $\mathrm{Triangles}$.
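A sketch of how unordered category triangles might be counted and turned into $p^{con}$ and $p^{non}$. It assumes each training observation is a `(category, category, category, in_contact)` tuple and that $f^{all} = f^{con} + f^{non}$; the category labels and observations are invented. Sorting the three categories makes every permutation of the same triangle share one key, which is what "unordered combinations" requires.

```python
from collections import Counter


def triangle_key(c1, c2, c3):
    """Unordered triangle: sorting makes every permutation share one key."""
    return tuple(sorted((c1, c2, c3)))


def count_triangles(observations):
    """observations: iterable of (c1, c2, c3, in_contact) tuples."""
    con, non, all_ = Counter(), Counter(), Counter()
    for c1, c2, c3, in_contact in observations:
        key = triangle_key(c1, c2, c3)
        all_[key] += 1
        (con if in_contact else non)[key] += 1
    return con, non, all_


def triangle_propensity(sub_counts, all_counts):
    """p(T) = (f_T / f_all_T) / (sum_T f_T / sum_T f_all_T), where sub_counts is
    either the contact or the non-contact Counter (assumed non-empty)."""
    background = sum(sub_counts.values()) / sum(all_counts.values())
    return {t: (sub_counts[t] / n) / background for t, n in all_counts.items()}


# Hypothetical training observations with invented category labels.
obs = [
    ("pos", "neg", "aro", True),
    ("aro", "pos", "neg", True),
    ("neg", "aro", "pos", False),
    ("hyd", "hyd", "pol", False),
]
con, non, all_ = count_triangles(obs)
p_con = triangle_propensity(con, all_)
p_non = triangle_propensity(non, all_)
```

The ratio `p_con[T] / p_non[T]` is the quantity that $w^{pair}$ weights inside $S^{Triangle}_{i_t}$.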

The subscript $C_2 \in N(C_1, C_3)$ denotes the frequency of $(C_1, C_3)$ pairs that have a residue of category $C_2$ within 4.5 Å of either $C_1$ or $C_3$.
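Neighbourhoods such as $N(C_1)$ and $N(C_1, C_3)$ could be collected per residue with a brute-force Python sketch like the one below. It assumes "within 4.5 Å" means that any pair of atoms from the two residues lies within the cutoff; that is a common convention but an assumption here, not something the text specifies. Coordinates and residue labels are toy values.

```python
import math
from itertools import combinations


def residue_neighbours(atoms, cutoff=4.5):
    """atoms: list of (residue_id, (x, y, z)) in angstroms.

    Assumption: two residues are neighbours if ANY pair of their atoms lies
    within `cutoff`. Brute-force O(n^2) over atom pairs.
    """
    neighbours = {res: set() for res, _ in atoms}
    for (res_a, xyz_a), (res_b, xyz_b) in combinations(atoms, 2):
        if res_a != res_b and math.dist(xyz_a, xyz_b) <= cutoff:
            neighbours[res_a].add(res_b)
            neighbours[res_b].add(res_a)
    return neighbours


def pair_neighbourhood(neighbours, r1, r3):
    """Residues within the cutoff of either r1 or r3, excluding r1 and r3."""
    return (neighbours[r1] | neighbours[r3]) - {r1, r3}


# Toy single-atom residues laid out along the x axis.
atoms = [("A1", (0, 0, 0)), ("A2", (3, 0, 0)), ("A3", (7, 0, 0)), ("B1", (11, 0, 0))]
nb = residue_neighbours(atoms)
```

For real structures a spatial index would replace the all-pairs loop, but the neighbour definition stays the same.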

When these calculations were translated to the antibody-antigen (Ab-Ag) context, two changes were made: the directionality of interactions is taken into account (some antibodies have lower binding propensities), and the MSA is removed, since one is not available in the Ab-Ag case.

Removal of the MSA: $C_{ji}$ denotes the category of the $i$-th residue in the $j$-th MSA sequence; with only a single sequence, $j = 1$, hence the $C_{1i}$ notation.

Directionality: propensities were calculated in the antibody residue $\rightarrow$ antigen residue direction.

These considerations turn the original i-Patch for generic proteins into the antibody i-Patch:

$S^{AB}_i = \frac{1}{|\Pi(i)|} \sum_{i_t \in \Pi(i)} (w^{ABintra}_{ii_t} S^{ABTriangle}_{i_t})$

$S^{ABTriangle}_{i_t} = \frac{1}{|\{k \in \mathrm{Ag}\}|} \sum_{k \in \mathrm{Ag}} \frac{1}{|\Pi (i_t) \cup \Pi (k)|} \times \left[\sum_{l_{ab} \in \Pi(i_t)} \left(w^{ABpair}(C^{AB}_{1l_{ab}}|C^{AB}_{1i_t}, C^{AG}_{1k}) \frac{p^{ABcon} (C^{AB}_{1l_{ab}},C^{AB}_{1i_t},C^{AG}_{1k})}{p^{ABnon} (C^{AB}_{1l_{ab}},C^{AB}_{1i_t},C^{AG}_{1k})}\right) + \sum_{l_{ag} \in \Pi(k)} \left(w^{AGpair}(C^{AG}_{1l_{ag}}|C^{AB}_{1i_t}, C^{AG}_{1k}) \frac{p^{AGcon} (C^{AG}_{1l_{ag}},C^{AB}_{1i_t},C^{AG}_{1k})}{p^{AGnon} (C^{AG}_{1l_{ag}},C^{AB}_{1i_t},C^{AG}_{1k})}\right)\right]$
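A direct Python transcription of $S^{ABTriangle}_{i_t}$ for the single-sequence case ($j = 1$), offered as a sketch rather than the authors' implementation. `ab_term` and `ag_term` are hypothetical callables standing in for $w^{ABpair}\,p^{ABcon}/p^{ABnon}$ and its antigen-side counterpart; antibody and antigen site labels are assumed to be disjoint strings, and skipping an antigen site with an empty neighbourhood is an assumption to avoid dividing by zero.

```python
def ab_triangle_score(i_t, antigen_sites, ab_nbrs, ag_nbrs, cat, ab_term, ag_term):
    """S^{ABTriangle}_{i_t} for antibody site i_t, single sequence (j = 1).

    ab_nbrs / ag_nbrs: site -> structural neighbours Pi(.) on antibody / antigen.
    cat: site -> residue category C.
    ab_term(c_l, c_it, c_k): stands in for w^{ABpair} * p^{ABcon} / p^{ABnon}.
    ag_term(c_l, c_it, c_k): stands in for w^{AGpair} * p^{AGcon} / p^{AGnon}.
    """
    total = 0.0
    for k in antigen_sites:
        l_ab, l_ag = ab_nbrs[i_t], ag_nbrs[k]
        norm = len(set(l_ab) | set(l_ag))  # |Pi(i_t) U Pi(k)|
        if norm == 0:
            continue  # assumption: a site with no neighbours contributes nothing
        bracket = sum(ab_term(cat[l], cat[i_t], cat[k]) for l in l_ab)
        bracket += sum(ag_term(cat[l], cat[i_t], cat[k]) for l in l_ag)
        total += bracket / norm
    return total / len(antigen_sites)


# Toy example: constant terms make the arithmetic easy to follow.
cat = {"H1": "aro", "H2": "pol", "X1": "pos", "X2": "neg", "X3": "hyd"}
score = ab_triangle_score(
    "H1", ["X1", "X2"], {"H1": ["H2"]}, {"X1": ["X3"], "X2": []}, cat,
    ab_term=lambda *c: 2.0, ag_term=lambda *c: 1.0,
)
# X1: (2 + 1) / 2 = 1.5; X2: 2 / 1 = 2.0; mean over antigen sites = 1.75
```

Splitting the inner sum into an antibody-side and an antigen-side loop mirrors the two bracketed terms, which is where the directionality enters.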

How to assess the energetic importance of this score?

A typical experimental procedure is alanine scanning: each residue is replaced with alanine in turn to assess its structural and functional importance. Alanine is chosen because it mimics the secondary-structure preferences of many other residue types. In the paper of interest, a computational alanine scan was performed with FoldX, and each mutation was assessed by the change in interaction energy, $\Delta \Delta G$: stabilizing if $\Delta \Delta G < -0.25$ kcal/mol and destabilizing if $\Delta \Delta G > 0.25$ kcal/mol.
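When post-processing a scan, these thresholds amount to a three-way classification. This Python sketch assumes the $\Delta \Delta G$ values (kcal/mol) have already been computed, for example parsed from FoldX output; the residue labels and values are hypothetical, and calling the in-between range "neutral" is an assumption rather than a term from the paper.

```python
def classify_ddg(ddg, threshold=0.25):
    """Label an alanine mutation by its interaction-energy change ddg (kcal/mol)."""
    if ddg < -threshold:
        return "stabilizing"
    if ddg > threshold:
        return "destabilizing"
    return "neutral"  # within +/- threshold; the label name is an assumption


# Hypothetical ddG values, not FoldX results from the paper.
ddg = {"H:TYR33": 1.2, "H:SER31": 0.1, "L:ASP50": -0.4}
labels = {res: classify_ddg(value) for res, value in ddg.items()}
```

Note that both comparisons are strict, so a value of exactly ±0.25 kcal/mol falls in the middle band.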

How do they bind?

The exciting answer is docking, which deserves a blog post of its own!

Source: