# Module 7: Bioinformatics and Statistical Genetics

Yet another course on applied statistics. We had lectures and practicals on phylogenetics, haplotype phasing, statistical alignment and population genetics (which describes how geographical migration shapes the genetic composition of populations, allowing well-adapted species to prosper). Even so, we spent most of our time on an intense group project: we had only until Thursday to finish the theoretical reading, coding and presentation!

My group, as usual, was interdisciplinary, with a chemist, a biologist and a computer scientist. Our project was on Reflexively Autocatalytic and Food-generated (RAF) sets in chemical reaction networks. It has been hypothesised that, at the very beginning of life, organic compounds (RNA or metabolites) formed spontaneously in the early Earth's atmosphere. The Miller-Urey experiment detected the formation of amino acids in an experimental set-up that simulated conditions on the early Earth. Two hypotheses were proposed for the starting molecules, RNA or metabolites, and both fed the belief that these reactions had to be autocatalytic to drive the existence of life (an idea put forward by Stuart Kauffman in the 1970s). We therefore ran computational experiments to estimate how likely such a reflexively autocatalytic set is to emerge from a randomly created food-generated network (reproduced from http://dx.doi.org/10.1016/j.biosystems.2016.12.002).

The workflow is as follows:

1. Define the closure set: the closed system of molecules that can be produced solely from the defined food source using the reactions currently in the network;
2. Reduce the network by evaluating every reaction against two criteria:
   - (a) the reaction must be catalysed by a member of the closure set;
   - (b) all reactants of the reaction must be members of the closure set.

   Any reaction that fails either criterion is removed.
3. Repeat steps 1 and 2 until no remaining reaction breaks the rules. This gives us the maximal RAF set. (We can reduce this further into smaller RAF sets by removing reactions that are not needed to maintain the smallest unit of autocatalysis.)
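The three steps above can be sketched in a few lines of Python. This is a minimal illustration and not the code we wrote for the project: the `Reaction` type, the function names `closure` and `max_raf`, and the toy molecules are all made up for the example, and the closure step is assumed to ignore catalysis (catalysts are only checked in step 2).

```python
from typing import FrozenSet, Iterable, List, NamedTuple, Set


class Reaction(NamedTuple):
    """Hypothetical container: molecules are plain strings."""
    reactants: FrozenSet[str]
    products: FrozenSet[str]
    catalysts: FrozenSet[str]


def closure(food: Iterable[str], reactions: List[Reaction]) -> Set[str]:
    """Step 1: every molecule reachable from the food set via `reactions`."""
    closed = set(food)
    changed = True
    while changed:
        changed = False
        for r in reactions:
            # A reaction can fire once all of its reactants are available.
            if r.reactants <= closed and not r.products <= closed:
                closed |= r.products
                changed = True
    return closed


def max_raf(food: Iterable[str], reactions: List[Reaction]) -> List[Reaction]:
    """Steps 2 and 3: prune reactions until nothing else can be removed."""
    current = list(reactions)
    while True:
        available = closure(food, current)
        kept = [
            r for r in current
            if r.reactants <= available      # criterion (b)
            and r.catalysts & available      # criterion (a)
        ]
        if len(kept) == len(current):
            return kept                      # empty list means no RAF set
        current = kept


# Toy network (invented for illustration only).
example_food = {"a", "b"}
r1 = Reaction(frozenset({"a", "b"}), frozenset({"c"}), frozenset({"d"}))
r2 = Reaction(frozenset({"b", "c"}), frozenset({"d"}), frozenset({"c"}))
r3 = Reaction(frozenset({"a", "x"}), frozenset({"y"}), frozenset({"a"}))  # "x" is never produced
r4 = Reaction(frozenset({"a"}), frozenset({"e"}), frozenset({"z"}))       # "z" is never produced
example_raf = max_raf(example_food, [r1, r2, r3, r4])
```

In the toy network, `r3` is dropped because one of its reactants can never be made from the food set, and `r4` is dropped because its only catalyst never appears, so `r1` and `r2` are what remains.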

After this, we attempted to generate chemical reaction networks using absolutely no knowledge of chemistry: we randomly initialised the numbers of food reactants, molecules and reactions, and linked them using random reaction edges and random catalysis edges. Our simulation results suggest that, for a generated network to be realistic, meaning a 50% chance of containing a RAF set (a proxy for how spontaneously RAF sets could arise in nature), 0.4 catalyses per molecule are needed.
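As a rough sketch of what "no knowledge of chemistry" means here, the snippet below builds one such random network. Everything in it is an assumption made for illustration: the function name `random_network`, the choice of one or two reactants and products per reaction, and the rule that each molecule catalyses each reaction independently with probability `catalyses_per_molecule / n_reactions` (so that the expected number of reactions catalysed by a molecule matches the target). Our actual generator may have differed in these details.

```python
import random
from typing import FrozenSet, List, Optional, Set, Tuple

# (reactants, products, catalysts), with molecules numbered 0..n_molecules-1
RandomReaction = Tuple[FrozenSet[int], FrozenSet[int], FrozenSet[int]]


def random_network(
    n_molecules: int,
    n_food: int,
    n_reactions: int,
    catalyses_per_molecule: float,
    seed: Optional[int] = None,
) -> Tuple[Set[int], List[RandomReaction]]:
    """Build a chemistry-free random reaction network (illustrative only)."""
    rng = random.Random(seed)
    molecules = list(range(n_molecules))
    food = set(rng.sample(molecules, n_food))

    # Assumed catalysis model: independent coin flip per (molecule, reaction).
    p = min(1.0, catalyses_per_molecule / n_reactions)

    reactions: List[RandomReaction] = []
    for _ in range(n_reactions):
        # Random reaction edges: 1 or 2 reactants, 1 or 2 products.
        reactants = frozenset(rng.sample(molecules, rng.randint(1, 2)))
        products = frozenset(rng.sample(molecules, rng.randint(1, 2)))
        # Random catalysis edges.
        catalysts = frozenset(m for m in molecules if rng.random() < p)
        reactions.append((reactants, products, catalysts))
    return food, reactions


def mean_catalyses_per_molecule(
    reactions: List[RandomReaction], n_molecules: int
) -> float:
    """Realised catalysis level: catalysis edges divided by molecules."""
    return sum(len(catalysts) for _, _, catalysts in reactions) / n_molecules


food, reactions = random_network(
    n_molecules=50, n_food=5, n_reactions=200, catalyses_per_molecule=0.4, seed=1
)
level = mean_catalyses_per_molecule(reactions, 50)
```

Generating many such networks at a given catalysis level, and counting how often the pruning procedure leaves a non-empty set, gives an estimate of the probability of finding a RAF set at that level.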

We also looked at how chemical knowledge could be introduced to generate a less random chemical reaction network. Graph grammars (or graph rewriting) can encode what we know about reactions between chemical structures by splitting the structures at their reaction sites (assuming that we know the chemical structures of the metabolites).

Given the short duration of the project, the benefit of working in an interdisciplinary team became apparent: we could assign specialists to read up on the theories in their own fields and then reconvene to exchange what we had learned. This is arguably more efficient than asking an engineer to read about chemistry, or a biochemist to write code from theory. It speeds up the project and helps with presenting ideas to a non-specialist audience.