Wing Ki (Catherine) Wong

PhD in Bioinformatics, Scientist in Antibody Development

A crash course in Medicinal Chemistry

From January to April 2019, I undertook an internship at a pharmaceutical company, UCB, working with the computer-aided drug discovery team. Coming from a non-chemistry background, I spent the first few weeks picking up the theories and terminology of basic medicinal chemistry. Here I put together a list of key concepts that were useful in getting me started.

Pharmacokinetics

 E+S \overset{Binding}\rightleftharpoons ES \overset{Catalysis}\rightleftharpoons E+P

 

where E = enzyme, S = substrate, P = product.

Theories

Michaelis-Menten equation:
 V = \frac{V_{max}[S]}{K_m+[S]}
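As a quick numerical illustration of the equation above, here is a minimal Python sketch. The parameter values are made up for illustration; the point is that V equals half of V_{max} when [S] = K_m, and approaches V_{max} as [S] grows.

```python
def michaelis_menten(s, v_max, k_m):
    """Reaction rate V at substrate concentration s (same units as k_m)."""
    return v_max * s / (k_m + s)


# Hypothetical parameters, chosen only for illustration.
V_MAX = 10.0  # maximum rate, e.g. in uM/s
K_M = 2.0     # Michaelis constant, e.g. in uM

# Rates at a few substrate concentrations, from well below K_m to saturation.
rates = {s: michaelis_menten(s, V_MAX, K_M) for s in (0.5, 2.0, 20.0, 2000.0)}

for s, v in rates.items():
    print(f"[S] = {s:>7}: V = {v:.3f}")
```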

 

Taking the reciprocal of both sides gives the equation of a straight line:
\frac{1}{V} = \frac{K_m+[S]}{V_{max}[S]} = \frac{K_m}{V_{max}} \frac{1}{[S]} + \frac{1}{V_{max}}

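which renders the Lineweaver–Burk plot, a straight line of \frac{1}{V} against \frac{1}{[S]} with slope \frac{K_m}{V_{max}} and y-intercept \frac{1}{V_{max}}.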
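The double-reciprocal form can be fitted with ordinary least squares to recover V_{max} and K_m from rate measurements. The sketch below assumes noise-free synthetic data generated from made-up parameters; the function name `lineweaver_burk_fit` is my own. With real, noisy data the reciprocal transform amplifies errors at low [S], so treat this as an illustration rather than a recommended fitting method.

```python
def lineweaver_burk_fit(substrate, rates):
    """Fit 1/V = slope * (1/[S]) + intercept and return (v_max, k_m)."""
    x = [1.0 / s for s in substrate]
    y = [1.0 / v for v in rates]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Ordinary least-squares slope and intercept.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    v_max = 1.0 / intercept  # intercept = 1 / V_max
    k_m = slope * v_max      # slope = K_m / V_max
    return v_max, k_m


# Synthetic, noise-free data from hypothetical V_max = 10 and K_m = 2.
substrate = [0.5, 1.0, 2.0, 5.0, 10.0]
rates = [10.0 * s / (2.0 + s) for s in substrate]

v_max_fit, k_m_fit = lineweaver_burk_fit(substrate, rates)
print(v_max_fit, k_m_fit)
```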

 

Assume:  aA + bB \rightleftharpoons xAB

At equilibrium between the forward and backward reactions, the dissociation constant is: K_D = \frac{[A]^a[B]^b}{[AB]^x}

Clark’s occupancy theory suggests: \frac{E}{E_{max}} = \frac{[L]}{[L]+K_D}

Drug A: Efficacious but not potent (high E_{max}, high EC_{50});
Drug B: Potent but not efficacious (low EC_{50}, low E_{max})
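To make the potency versus efficacy distinction concrete, here is a small sketch based on Clark's occupancy equation, in which the EC_{50} equals K_D. Both drugs and all their parameter values are hypothetical.

```python
def response(ligand, e_max, k_d):
    """Effect E at ligand concentration `ligand` under Clark's occupancy model."""
    return e_max * ligand / (ligand + k_d)


# Hypothetical drugs; concentrations in molar, effects in arbitrary units.
drug_a = {"e_max": 100.0, "k_d": 1e-6}  # efficacious but not potent
drug_b = {"e_max": 40.0, "k_d": 1e-9}   # potent but not efficacious

for conc in (1e-9, 1e-8, 1e-6, 1e-3):
    print(f"[L] = {conc:.0e} M: "
          f"A = {response(conc, **drug_a):6.2f}, "
          f"B = {response(conc, **drug_b):6.2f}")
```

At low concentrations drug B gives the larger effect because it binds more tightly; at saturating concentrations drug A wins because its maximal effect is higher.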

Rearranging the Scatchard equation, which describes the ratio of bound ligands to the total number of available binding sites, we get this equation for the Scatchard plot: \frac{[PL]}{[L]} = -\frac{1}{K_D}[PL]+\frac{[PL]_{max}}{K_D} where K_D = \frac{[P][L]}{[PL]}
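For reference, the rearrangement can be written out step by step. This sketch assumes a single class of independent binding sites, so that the free protein is the total number of sites minus the occupied ones, [P] = [PL]_{max} - [PL], and takes K_D as the dissociation constant.

```latex
\begin{align}
K_D &= \frac{[P][L]}{[PL]} = \frac{([PL]_{max} - [PL])\,[L]}{[PL]} \\
K_D\,[PL] &= ([PL]_{max} - [PL])\,[L] \\
\frac{[PL]}{[L]} &= \frac{[PL]_{max} - [PL]}{K_D}
                  = -\frac{1}{K_D}[PL] + \frac{[PL]_{max}}{K_D}
\end{align}
```

Plotting \frac{[PL]}{[L]} against [PL] therefore gives a straight line with slope -\frac{1}{K_D} and x-intercept [PL]_{max}.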

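Cheng-Prusoff equation, for a competitive inhibitor of an enzyme: K_i = \frac{IC_{50}}{1+\frac{[S]}{K_m}}. For a competitive binding assay, [S] and K_m are replaced by the concentration [L] and the K_D of the labelled ligand.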
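A short sketch of the conversion from IC_{50} to K_i, assuming competitive inhibition. The function name, argument names, and numbers are illustrative only; `affinity_const` stands for the affinity constant of the competing substrate or ligand, in the same units as its concentration.

```python
def cheng_prusoff(ic50, competitor_conc, affinity_const):
    """Convert an IC50 into K_i, assuming competitive inhibition.

    competitor_conc and affinity_const must share the same units;
    the returned K_i has the units of ic50.
    """
    return ic50 / (1.0 + competitor_conc / affinity_const)


# Hypothetical assay: IC50 = 100 nM, measured at 10 uM substrate whose
# affinity constant is 5 uM.
k_i = cheng_prusoff(ic50=100.0, competitor_conc=10.0, affinity_const=5.0)
print(f"K_i = {k_i:.1f} nM")
```

Because the IC_{50} depends on how much substrate was present in the assay, K_i is the fairer quantity for comparing inhibitors measured under different conditions.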

Lipinski’s rule of five (RO5)

  • Molecular Mass \leq 500 Da
  • Lipophilicity (logP) \leq 5
  • Number of Hydrogen Bond Acceptors \leq 10
  • Number of Hydrogen Bond Donors \leq 5

Rule of three (RO3)

  • Molecular Mass \leq 300 Da
  • log P \leq 3
  • Number of Hydrogen Bond Acceptors \leq  3
  • Number of Hydrogen Bond Donors \leq  3
  • Number of rotatable bonds \leq  3
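Both rules are simple threshold checks, so they are easy to script once the descriptors are known. The sketch below assumes the descriptors have already been computed elsewhere (for example with a cheminformatics toolkit); the `Descriptors` class, the function names, and the three example compounds are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    mol_mass: float        # molecular mass in Da
    log_p: float           # octanol/water partition coefficient (log scale)
    h_acceptors: int       # number of hydrogen bond acceptors
    h_donors: int          # number of hydrogen bond donors
    rotatable_bonds: int   # number of rotatable bonds


def ro5_violations(d):
    """Count how many of the four rule-of-five criteria are violated."""
    return sum([d.mol_mass > 500, d.log_p > 5,
                d.h_acceptors > 10, d.h_donors > 5])


def passes_ro3(d):
    """True if every rule-of-three criterion listed above is met."""
    return (d.mol_mass <= 300 and d.log_p <= 3 and d.h_acceptors <= 3
            and d.h_donors <= 3 and d.rotatable_bonds <= 3)


# Made-up descriptor values, for illustration only.
fragment = Descriptors(250.0, 2.1, 3, 1, 2)
lead = Descriptors(480.0, 4.2, 7, 2, 6)
greasy = Descriptors(620.0, 6.3, 8, 2, 9)

for name, d in [("fragment", fragment), ("lead", lead), ("greasy", greasy)]:
    print(name, ro5_violations(d), passes_ro3(d))
```

Counting violations rather than returning a single pass/fail keeps the check flexible, since the rule of five is commonly read as tolerating one violation.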

 

Lipophilicity is primarily associated with solubility, absorption, membrane penetration and distribution, i.e. the ADME and PK/PD properties of the drug. It is usually measured as the logarithm of the partition coefficient of the compound between octan-1-ol and water: log P_{octanol/water}.

 

Lipophilic efficiency: LiPE = pIC_{50}-log P

Quality drug candidates usually have a high LiPE (> 6).
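A minimal sketch of the LiPE calculation, using the standard definition pIC_{50} = -log_{10}(IC_{50}) with the IC_{50} expressed in molar. The compound values are made up.

```python
import math


def pic50(ic50_molar):
    """pIC50 = -log10(IC50), with the IC50 given in molar."""
    return -math.log10(ic50_molar)


def lipe(ic50_molar, log_p):
    """Lipophilic efficiency: pIC50 minus log P."""
    return pic50(ic50_molar) - log_p


# Hypothetical compound: IC50 = 10 nM (1e-8 M), log P = 2.5.
value = lipe(1e-8, 2.5)
print(f"LiPE = {value:.2f}")
```

A tenfold gain in potency raises LiPE by one unit only if log P stays the same, which is why the metric penalises potency bought purely by adding lipophilicity.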

Terminology

Better if higher

E_{max}

Better if lower

EC_{50}, IC_{50}, K_D, K_i

Nomenclature

Knowing how your co-workers speak speeds up your daily work. Here is the image from Wikipedia’s page on heterocycles, a class of structures often seen in medicinal chemistry:

A few notes on protein sequence analysis

A branch of protein bioinformatics looks at the relationship between protein structures and their encoding sequences. This is most prominent in homology modelling, where we align a query sequence to a database of sequences with known structures, pick the hit with the highest similarity and map its structure onto the query sequence. Each of these steps is a field in its own right: sequence alignment, similarity measurement and structural remodelling. Here I mention a number of tools used in my current research group:

(more…)

Long short-term memory in time-series data

Time-series data, such as stock market prices, usually depend on the previous n historical data points.
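In practice, this dependence is usually encoded by slicing the series into sliding windows: each training example consists of n consecutive points, and the target is the point that follows. A minimal sketch, with a made-up price series and a helper name of my own choosing:

```python
def make_windows(series, n):
    """Split a series into (inputs, targets).

    Each input is a list of n consecutive points; its target is the next point.
    """
    count = len(series) - n
    inputs = [series[i:i + n] for i in range(count)]
    targets = [series[i + n] for i in range(count)]
    return inputs, targets


# Made-up closing prices, for illustration only.
prices = [10.0, 10.5, 10.2, 10.8, 11.0, 10.9]
X, y = make_windows(prices, 3)

for window, target in zip(X, y):
    print(window, "->", target)
```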

A recurrent neural network (RNN) is applied to sequence data to capture the dependence within a sequence. The long short-term memory (LSTM) network is a variant of the RNN that captures the long-term impact on short-term signal behaviour.

Key difference between a simple RNN and LSTM:

  • Simple RNN: later output nodes of the network are less sensitive to the input at time t=1, because the gradient vanishes.
  • LSTM: Preserves the gradient by implementing “forget” and “input” gates.

LSTM holds the following components in each layer:

  1. Inputs: Previous output ($ h_{t-1} $) and current input ($ x_{t} $)
  2. Forget gate:
    • System: $ \sigma $ decides whether to throw away information from the previous cell state $ C_{t-1} $
    • Output: $f_t =\sigma( W_f \times [h_{t-1}, x_t] + b_f) $, a number between 0 and 1 for each number in cell state $ C_{t-1} $.
  3. Input gate:
    • System 1: $ \sigma $ decides which values will be updated
      • Output: $ i_t = \sigma(W_i \times [h_{t-1}, x_t]+b_i) $
    • System 2: $ tanh $ creates a vector of new candidate values
      • Output: $ \tilde{C_t} = tanh(W_C \times [h_{t-1}, x_t] + b_C) $
    • Output: $ i_t \times \tilde{C_t} $
  4. Summation of Forget and Input gates:
    • $ C_t = f_t \times C_{t-1} + i_t \times \tilde{C_t} $
  5. Final Process:
    • System 1: $ \sigma $ decides what parts of the cell state $ C_t $ will be output
      • Output: $ o_t = \sigma(W_o \times [h_{t-1}, x_t] + b_o) $
    • System 2: $ tanh $ to generate a value between -1 and 1, multiply by $ o_t $
    • Output: $ h_t = o_t \times tanh(C_t) $
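The five steps above can be sketched as a single time step in plain Python. This is an illustration of the equations, not how a deep learning library implements them: vectors are lists, each weight matrix is a list of rows, and the names `lstm_step`, `affine` and `params` are my own. The weights in the example are arbitrary.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, v):
    """Return W v + b, where W is a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bias
            for row, bias in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step; params maps 'f', 'i', 'c', 'o' to (W, b) pairs."""
    v = h_prev + x_t  # list concatenation: [h_{t-1}, x_t]
    f_t = [sigmoid(z) for z in affine(*params["f"], v)]        # forget gate
    i_t = [sigmoid(z) for z in affine(*params["i"], v)]        # input gate
    c_tilde = [math.tanh(z) for z in affine(*params["c"], v)]  # candidates
    o_t = [sigmoid(z) for z in affine(*params["o"], v)]        # output gate
    # New cell state: keep part of the old state, add part of the candidates.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]
    # New output: squashed cell state, filtered by the output gate.
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


# Hidden size 2, input size 1, so every W is 2 x 3. Arbitrary weights.
W = [[0.1, -0.2, 0.3], [0.0, 0.4, -0.1]]
b = [0.0, 0.1]
params = {gate: (W, b) for gate in "fico"}

h, c = [0.0, 0.0], [0.0, 0.0]
for x in ([1.0], [0.5], [-0.3]):  # a short made-up input sequence
    h, c = lstm_step(x, h, c, params)
print(h, c)
```

Note that the cell state is only ever scaled by f_t and added to, which is what lets gradients survive over many time steps.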

Further reading:

Here are two implementations, one using TensorFlow and one using PyTorch:

The TensorFlow version is a bit more convoluted, but you can play around with the architecture:

https://www.datacamp.com/community/tutorials/lstm-python-stock-market

PyTorch has a built-in LSTM model:

https://github.com/jessicayung/blog-code-snippets/blob/master/lstm-pytorch/lstm-baseline.py

 

UCB PhD Day

24/9/2018 (Mon)

Students sponsored by UCB Pharma gathered in London to present their research to scientists and fellow PhD students. It was a fruitful day spent talking to fellow students, from the same and different institutions, who work on similar areas but whom I would otherwise not have met because of our different approaches: experimental and computational studies.

Booklet and name tag

[Repost from Blopig] Maps are useful. But first, you need to build, store and read a map.

Earlier I wrote a blog post for OPIG: 

 

Here I’m attaching the code that I used:

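#include <fstream>
#include <map>
#include <string>
#include <boost/archive/binary_iarchive.hpp>
#include <boost/archive/binary_oarchive.hpp>
#include <boost/iostreams/filter/zlib.hpp>
#include <boost/iostreams/filtering_streambuf.hpp>
#include <boost/serialization/map.hpp>  // key/value types may need their own headers, e.g. string.hpp
using namespace std;

// The map's template arguments were lost from this listing, so key and value types are left generic.
template <typename K, typename V>
void save(map<K, V> const& obj, string const& fname) {
    std::ofstream ofs(fname, std::ios::binary);
    {
        boost::iostreams::filtering_ostreambuf fos;
        // push the compressor, then the ofstream (the sink must come last)
        fos.push(boost::iostreams::zlib_compressor(boost::iostreams::zlib::best_compression));
        fos.push(ofs);
        // start the archive on the filtering buffer:
        boost::archive::binary_oarchive bo(fos);
        bo << obj;
    }  // the archive and the filter chain are flushed here, before ofs closes
}
// Read the map back, decompressing on the fly; call as load<K, V>(fname).
template <typename K, typename V>
map<K, V> load(string const& fname) {
    map<K, V> obj;
    std::ifstream ifs(fname, std::ios::binary);
    boost::iostreams::filtering_istreambuf fis;
    fis.push(boost::iostreams::zlib_decompressor());
    fis.push(ifs);
    boost::archive::binary_iarchive bi(fis);
    bi >> obj;
    return obj;
}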