Module 5: Statistics

A crash course of Statistics – covering Bayesian theories, Generalised Linear model, Markov Chain and its extension: Hidden Markov model and Monte Carlo Markov Chain. Lectures, exercises and practicals are rather intense and require substantial prior experience with statistics and R. However what seem to be able to stay on our minds is the group project that we do. Groups were pre-allocated and luckily, my group has quite a diverse range of expertise and we all complement each other’s skills.

Our project was on denoising images using Ising Markov Random Fields. Markov Random Fields suggest the formula for calculating the probability distribution of undirected graphs:

$ P(x) = \frac{1}{Z} exp(\varepsilon(x))$

Ising MRF is a special case that accounts for the fact that neighbouring nodes usually have the same characteristic (spin):

$ \varepsilon (x)=\frac{1}{2} \sum\limits_{i} \sum\limits_{j \in \mathcal{N}_i} \beta_{i,j} \delta_{x_i x_j}$
where $ delta_{x_i x_j}=
\begin{cases}1 & \quad \text{if } x_i = x_j\\
0 & \quad \text{otherwise} \\
\end{cases}$

The algorithm initially calculates the probability of a randomly generated image being the true image before the noise was introduced. We then change the states of the pixels which have a lower probability of being the true state. Since we are only dealing with binary images (Black and White, no grayscale), this is significantly simplified to flipping black pixels into white or vice versa. After this is done, we recalculate the probability of the new image being the true image: if the probability is lowered, we reject the configuration; otherwise we set this as the new guess image.

Updating the probability calculations after every pixel update is not the most efficient way to do so. Therefore we implemented the Gibbs update function which maximises the pixel’s probability by comparing the probability of it being black and white, picking the state with a higher probability, then immediately updating the current guess configuration and move on to the next pixel (at random order). It sped up the optimisation incredibly – the resultant image converges after 2 sweeps.

And then we saw another problem with parameter optimisation – the parameter changes the optimal configuration: so when we change the parameter, we get a different optimal configuration. How do we optimise both variables together? Gibbs Sampling comes again: we first sample the possible optimal images; then with the distribution of optimal images, we sample the best distribution of our parameters. Taking the distribution of parameters, we sample the distribution of optimal images. This process can iterate forever – we can only observe a convergence of parameter and an “average” configuration. This demonstrates the MCMC method of optimsation.

Nothing is better than putting theories into practice. I had really enjoyed this project – and acquired a lot more insights into different modules of statistics. Our final presentation was mind-blowing with groups presenting on neural network, clustering algoirthms based on statistics and fun applications on financial modelling (my pet!) – a very nice way to end the course!