Our highly connected world means that infectious diseases now have a global reach: as we jet-set around the planet, so do the pathogens we carry. In this set of studies, we looked at how infectious diseases, particularly influenza and dengue, spread, using a combination of machine learning, statistical methods and high-performance computing.

Some of my early work involved a genomic analysis of the dengue outbreak in Singapore back in 2009. The image below shows the relationship between viral genetic “distance” (nucleotide differences) and the temporal distance (in days). Interestingly, infections far apart in time can have very similar genetic structure, suggesting unfit viral samples are purified by selection over time. In other words, many of the mutants die out.

denguegenome
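
To make the two axes concrete, here is a minimal sketch of how each pair of samples could be turned into one (temporal distance, genetic distance) point. It assumes the sequences are already aligned to the same length and counts differences with a plain Hamming distance; the function names and the toy samples are made up for illustration and are not taken from the study.

```python
from datetime import date
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Count nucleotide differences between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def pairwise_distances(samples):
    """Return (days_apart, nucleotide_differences) for every pair of samples.

    `samples` is a list of (collection_date, aligned_sequence) tuples.
    """
    pairs = []
    for (d1, s1), (d2, s2) in combinations(samples, 2):
        pairs.append((abs((d2 - d1).days), hamming(s1, s2)))
    return pairs


# Toy, made-up samples (hypothetical dates and sequences).
samples = [
    (date(2001, 1, 1), "ACGTACGT"),
    (date(2001, 1, 11), "ACGTACGA"),
    (date(2001, 3, 2), "ACGTACGT"),
]
pairs = pairwise_distances(samples)
print(pairs)
```

In this toy data the first and third samples are 60 days apart yet genetically identical, which is the kind of point the plot highlights: a mutant appears and then disappears again.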

As an example of our work on influenza, the two videos below show the difference in the spread of bird flu on a complex network of people with and without a hub node (a super-spreader). Removing the super-spreader results in fewer people infected at any given time, but a more drawn-out outbreak.
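
The hub effect can be reproduced on a toy network. The sketch below is not the simulation behind the videos: it assumes a simple discrete-time SIR model in which each case is infectious for exactly one step, a transmission probability of 1 so the result is deterministic, and a hypothetical network made of a chain of ten people plus one hub linked to everyone.

```python
import random


def simulate_sir(adjacency, seed_node, p_transmit=1.0, rng=None):
    """Discrete-time SIR where each case is infectious for exactly one step.

    Returns the number of infectious nodes at each time step.
    """
    rng = rng or random.Random(0)
    infectious = {seed_node}
    removed = set()
    curve = []
    while infectious:
        curve.append(len(infectious))
        newly_infected = set()
        for node in infectious:
            for neighbour in adjacency[node]:
                if neighbour in infectious or neighbour in removed:
                    continue
                if rng.random() < p_transmit:
                    newly_infected.add(neighbour)
        removed |= infectious
        infectious = newly_infected
    return curve


def chain_with_hub(n):
    """Chain of n people (0..n-1) plus a node 'hub' linked to everyone."""
    adjacency = {i: set() for i in range(n)}
    for i in range(n - 1):
        adjacency[i].add(i + 1)
        adjacency[i + 1].add(i)
    adjacency["hub"] = set(range(n))
    for i in range(n):
        adjacency[i].add("hub")
    return adjacency


def remove_node(adjacency, node):
    """Return a copy of the network with `node` and its edges deleted."""
    return {u: {v for v in nbrs if v != node}
            for u, nbrs in adjacency.items() if u != node}


with_hub = chain_with_hub(10)
without_hub = remove_node(with_hub, "hub")

curve_with = simulate_sir(with_hub, seed_node=0)
curve_without = simulate_sir(without_hub, seed_node=0)
print("with hub:   ", curve_with)
print("without hub:", curve_without)
```

With the hub, the outbreak is over in three steps and peaks at eight simultaneous cases; without it, the infection crawls along the chain one person at a time for ten steps. Lowering `p_transmit` makes the model stochastic, in which case results should be averaged over many runs.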

More recently, I’ve been working on combining modern probabilistic machine learning with network metrics, with a particular emphasis on mitigating disease spread with limited resources. The figure below shows what happens when we estimate network centrality (high-centrality nodes correspond to the hubs) using a machine learning method (VBC-GP) and vaccinate those individuals. On the TRAIN network (which the model was trained on), the epidemic peak was reduced by 62%, significantly better than random vaccination. This effect carried over to a much larger TEST network with 4000 people that the model had never seen before.

vbc_epidemic
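
The idea of spending a limited vaccination budget on the most central people can be sketched as follows. This is not VBC-GP: plain degree centrality stands in for the learned centrality estimates, the size of the largest group of connected unvaccinated people is used as a rough structural proxy for how far an outbreak could reach, and the network of linked stars is a hypothetical toy.

```python
import random


def degree_centrality(adjacency):
    """Degree of each node; a simple stand-in for a learned centrality score."""
    return {node: len(nbrs) for node, nbrs in adjacency.items()}


def largest_component_size(adjacency, vaccinated):
    """Size of the largest connected group of unvaccinated people."""
    seen = set(vaccinated)
    best = 0
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for nbr in adjacency[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        best = max(best, size)
    return best


def stars_network(n_hubs, leaves_per_hub):
    """Toy network: hubs linked in a chain, each hub with its own leaves."""
    adjacency = {}

    def link(u, v):
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    for h in range(n_hubs):
        hub = f"hub{h}"
        if h > 0:
            link(hub, f"hub{h - 1}")
        for leaf in range(leaves_per_hub):
            link(hub, f"leaf{h}_{leaf}")
    return adjacency


network = stars_network(n_hubs=3, leaves_per_hub=5)
budget = 3  # number of vaccine doses available

scores = degree_centrality(network)
targeted = set(sorted(network, key=scores.get, reverse=True)[:budget])
random_pick = set(random.Random(0).sample(sorted(network), budget))

targeted_size = largest_component_size(network, targeted)
random_size = largest_component_size(network, random_pick)
print("targeted:", targeted_size, "random:", random_size)
```

Vaccinating the three highest-degree nodes removes every hub and leaves only isolated individuals, while a random choice of three people will usually leave most of the network connected. A fuller comparison would run an epidemic simulation on each vaccinated network and compare the peaks.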

Related Papers

Predicting Networks Centralities from Node Attributes
Harold Soh
NIPS Workshop on Networks: From Graphs to Rich Data, Montreal, Canada, 2014. [ PDF | Extended version on arXiv ]

Evolving policies for multi-reward partially observable Markov decision processes (MR-POMDPs)
Harold Soh and Yiannis Demiris
GECCO’11: Proceedings of the 2011 conference on Genetic and Evolutionary Computation, 713-720, 2011
MR-POMDP Datasets

Temporal factors in school closure policy for mitigating the spread of influenza
Tianyou Zhang, Xiuju Fu, Chee Keong Kwoh, Gaoxi Xiao, Limsoon Wong, Stefan Ma, Harold Soh, Gary Kee Khoon Lee, Terence Hung, Michael Lees
J Public Health Policy, 32(2):180-97, 2011

Multi-reward policies for medical applications: anthrax attacks and smart wheelchairs
Harold Soh and Yiannis Demiris
MedGEC 2011, 7th GECCO Workshop on Medical Applications of Genetic and Evolutionary Computation, 471-478, 2011

Weighted complex network analysis of travel routes on the Singapore public transportation system
Harold Soh, Sonja Lim, Tianyou Zhang, Xiuju Fu, Gary Kee Khoon Lee, Terence Gih Guang Hung, Pan Di, Silvester Prakasam, Limsoon Wong
Physica A: Statistical Mechanics and its Applications 389 (24), 5852-5863, 2010

Robustness of scale-free networks under rewiring operations
S Xiao, G Xiao, TH Cheng, S Ma, X Fu, H Soh
EPL (Europhysics Letters) 89, 38002, 2010

Genomic epidemiology of a dengue virus epidemic in urban Singapore
Mark J Schreiber, Edward C Holmes, Swee Hoe Ong, Harold SH Soh, Wei Liu, Lukas Tanner, Pauline PK Aw, Hwee Cheng Tan, Lee Ching Ng, Yee Sin Leo, Jenny GH Low, Adrian Ong, Eng Eong Ooi, Subhash G Vasudevan, Martin L Hibberd
Journal of Virology 83 (9), 4163-4173, 2009