Module 2: Programming

“Program, not programme.”

Hello World! Indeed, I have previously said hello to five worlds: MATLAB, C++, Python, Fortran and Perl. Fairly standard for an engineer like myself. However, many biological scientists who are starting out just as I am have been alienated from programming, potentially throughout their lives. For them this two-week course would have been extremely intensive: learning two languages (Python and C) side by side and producing a program for either microplate reading or the “Game of Life”.
I, on the other hand, opted for the advanced project, in which we were given a task (an extra one was added later) to work on for two weeks. The first task was about clustering complementarity-determining regions (CDRs). The CDRs are formed from six hypervariable loops: three on the light chain and three on the heavy chain. Interestingly, each loop adopts only a few conformations, yet the many combinations of conformations give rise to the hypervariability. Our task was to use a hierarchical clustering method, the Unweighted Pair Group Method with Arithmetic Mean (UPGMA), to cluster the loops into canonical groups of similar structures, using the root-mean-square deviation (RMSD) between the atoms of the structures as the distance.
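To make the clustering step concrete, here is a minimal sketch in plain Python. It is not the code I wrote for the project: the function names, the distance cutoff and the toy distance matrix are all made up for illustration, and the RMSD helper assumes the two loops have the same number of atoms and have already been superimposed.

```python
import math


def rmsd(a, b):
    """RMSD between two equal-length lists of (x, y, z) coordinates.

    Assumes the two structures are already superimposed.
    """
    if len(a) != len(b):
        raise ValueError("structures must have the same number of atoms")
    sq = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(sq / len(a))


def upgma_clusters(dist, cutoff):
    """Average-linkage (UPGMA) clustering of a symmetric distance matrix.

    Merging stops once the two closest clusters are farther apart than
    `cutoff` (a hypothetical parameter). Returns sorted lists of indices.
    """
    n = len(dist)
    clusters = {i: [i] for i in range(n)}
    # Cluster-to-cluster distances, keyed by (smaller id, larger id).
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        (a, b), best = min(d.items(), key=lambda kv: kv[1])
        if best > cutoff:
            break
        na, nb = len(clusters[a]), len(clusters[b])
        new = {}
        for c in clusters:
            if c in (a, b):
                continue
            dac = d[(min(a, c), max(a, c))]
            dbc = d[(min(b, c), max(b, c))]
            # UPGMA update: size-weighted mean of the two old distances.
            new[(c, next_id)] = (na * dac + nb * dbc) / (na + nb)
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        d.update(new)
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return sorted(sorted(members) for members in clusters.values())


# Toy example: loops 0 and 1 are similar, as are loops 2 and 3.
toy = [
    [0.0, 1.0, 5.0, 5.0],
    [1.0, 0.0, 5.0, 5.0],
    [5.0, 5.0, 0.0, 1.0],
    [5.0, 5.0, 1.0, 0.0],
]
groups = upgma_clusters(toy, cutoff=2.0)
```

In a real run the matrix would be filled with pairwise RMSD values between loops, and the cutoff would decide how coarse the canonical groups are.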
The second task was a lot harder for me. Its background is protein structure prediction, and the question of how likely candidate structures (decoys) are to be structurally similar to the true conformation. Experimentally observed distances between specific types of atoms are categorised. The residue-specific all-atom probability discriminatory function (RAPDF) then estimates how probable it is that the distances between pairs of atoms in a decoy would be found in nature. The more likely they are to be found in nature, the closer the decoy is taken to be to the true structure. I did manage to write Python code that ran for three hours on one file (we had 300). Then, in 10 hours, I rewrote the code in C++, and the processing time dropped to 12 minutes. Unfortunately, I was not able to optimise the code for more efficient memory usage. Through this I discovered the next thing that will concern me a lot in programming: high-performance computing.
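The idea behind this kind of scoring can be sketched in a few lines of Python. This is a simplified log-odds score in the spirit of RAPDF, not the published function and not my project code: the atom types, the counts, the bin width and the handling of empty bins are all assumptions chosen to keep the example small.

```python
import math
from itertools import combinations


def build_log_odds(counts):
    """Turn observation counts into a table of log-odds scores.

    counts[(type_a, type_b)][k] is the (made-up) number of times that atom
    pair was seen in distance bin k in experimental structures.
    Each entry of the result is -ln(P(bin | pair) / P(bin)).
    """
    n_bins = len(next(iter(counts.values())))
    per_bin = [sum(c[k] for c in counts.values()) for k in range(n_bins)]
    grand_total = sum(per_bin)
    table = {}
    for pair, c in counts.items():
        pair_total = sum(c)
        row = []
        for k in range(n_bins):
            if c[k] == 0 or per_bin[k] == 0:
                row.append(0.0)  # assumption: no data means no contribution
            else:
                p_cond = c[k] / pair_total
                p_ref = per_bin[k] / grand_total
                row.append(-math.log(p_cond / p_ref))
        table[pair] = row
    return table


def score_decoy(atoms, table, bin_width=1.0):
    """Sum the table entries over all atom pairs in a decoy.

    atoms is a list of (atom_type, (x, y, z)). A lower total means the
    observed distances are more typical of the experimental structures.
    """
    total = 0.0
    for (ta, pa), (tb, pb) in combinations(atoms, 2):
        row = table.get(tuple(sorted((ta, tb))))
        if row is None:
            continue  # pair type not in the table
        k = int(math.dist(pa, pb) / bin_width)
        if k < len(row):  # pairs beyond the last bin are ignored
            total += row[k]
    return total


# Hypothetical counts: C-N pairs are mostly close, C-O pairs mostly far.
counts = {("C", "N"): [8, 2], ("C", "O"): [2, 8]}
table = build_log_odds(counts)
near = score_decoy([("C", (0.0, 0.0, 0.0)), ("N", (0.5, 0.0, 0.0))], table)
far = score_decoy([("C", (0.0, 0.0, 0.0)), ("N", (1.5, 0.0, 0.0))], table)
```

Here the decoy with the close C-N pair gets the lower (better) score. The loop visits every pair of atoms, so the work grows with the square of the number of atoms, which is one plausible reason an interpreted version is slow on large files.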
