Our highly connected world gives infectious diseases a global reach: as we jet-set around the planet, so do the pathogens we carry. In this set of studies, we used a combination of machine learning, statistical methods and high-performance computing to examine how infectious diseases, particularly influenza and dengue, spread.

Some of my early work involved a genomic analysis of the dengue outbreak in Singapore back in 2009. The image below shows the relationship between viral genetic “distance” (nucleotide differences) and temporal distance (in days). Interestingly, infections far apart in time can have very similar genetic structure, suggesting that selection purges unfit viral variants over time. In other words, many of the mutants die out.
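To make the comparison concrete, here is a minimal sketch of how genetic and temporal distances between pairs of samples could be computed. It assumes the sequences are already aligned to the same length; the function names and the toy sequences are made up for illustration and are not the data or pipeline used in the study.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Count the positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def pairwise_distances(samples):
    """samples: list of (sampling_day, aligned_sequence) tuples.

    Returns one (days_apart, nucleotide_differences) pair per sample pair.
    """
    return [
        (abs(d1 - d2), hamming(s1, s2))
        for (d1, s1), (d2, s2) in combinations(samples, 2)
    ]


# Toy data (hypothetical): three short "genomes" sampled on days 0, 30 and 200.
samples = [(0, "ACGTACGT"), (30, "ACGTACGA"), (200, "ACGTACGT")]
pairs = pairwise_distances(samples)
print(pairs)
```

In this toy example the samples taken 200 days apart are identical, while the sample from day 30 carries a mutation that does not persist, which is the kind of pattern described above.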

As an example of our work on influenza, the two videos below show the difference in the spread of bird flu on a complex network of people with and without a hub node (a *super-spreader*). Removing the super-spreader results in fewer infected people at any given time, but a more drawn-out outbreak.
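The effect of a hub can be reproduced on a toy network. The sketch below is not the simulation behind the videos: it is a simple discrete-time SIR model on a hypothetical ring of 20 people, with and without one extra node linked to everyone. The transmission probability `beta` is set to 1.0 and people recover after one step, purely so that the toy run is deterministic.

```python
import random


def simulate_sir(adj, seed_node, beta, rng, recovery_steps=1):
    """Discrete-time SIR on an undirected graph {node: set(neighbours)}.

    Returns the number of infectious nodes at each time step.
    """
    status = {n: "S" for n in adj}
    status[seed_node] = "I"
    timer = {seed_node: recovery_steps}  # steps left until recovery
    curve = []
    while timer:
        curve.append(len(timer))
        newly_infected = set()
        for node in timer:
            for nb in adj[node]:
                if status[nb] == "S" and rng.random() < beta:
                    newly_infected.add(nb)
        # Advance recovery for everyone who is currently infectious.
        for node in list(timer):
            timer[node] -= 1
            if timer[node] == 0:
                status[node] = "R"
                del timer[node]
        for nb in newly_infected:
            status[nb] = "I"
            timer[nb] = recovery_steps
    return curve


def ring_with_hub(n):
    """Ring of n nodes (0..n-1) plus a node 'hub' linked to every ring node."""
    adj = {i: {(i - 1) % n, (i + 1) % n, "hub"} for i in range(n)}
    adj["hub"] = set(range(n))
    return adj


def remove_node(adj, node):
    """Return a copy of the graph without the given node."""
    return {n: nbrs - {node} for n, nbrs in adj.items() if n != node}


network = ring_with_hub(20)
with_hub = simulate_sir(network, 0, beta=1.0, rng=random.Random(0))
without_hub = simulate_sir(remove_node(network, "hub"), 0, beta=1.0,
                           rng=random.Random(0))
print("with hub:   ", with_hub)
print("without hub:", without_hub)
```

With the hub, the infection reaches everyone within three steps and the number of simultaneous cases is high. Without it, the infection can only creep around the ring, so the peak is lower but the outbreak lasts much longer.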

More recently, I’ve been working on combining modern probabilistic machine learning with network metrics, with a particular emphasis on mitigating disease spread with limited resources. The figure below shows what happens when we estimate network centrality (high centrality corresponds to the hub nodes) using a machine learning method (VBC-GP) and vaccinate the most central individuals. On the TRAIN network (which the model was trained on), the reduction in the epidemic peak was 62%, significantly better than random vaccination. This effect carried over to a much larger TEST network with 4000 people that the model had never seen before.
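The intuition behind centrality-targeted vaccination can be sketched on a toy graph. The example below does not implement VBC-GP: it uses plain degree as a stand-in for the learned centrality estimate, and the size of the largest connected component as a crude proxy for how far an outbreak could spread. The network and the function names are hypothetical.

```python
def top_k_by_degree(adj, k):
    """Return the k nodes with the most neighbours (ties broken by node id)."""
    return sorted(adj, key=lambda n: (-len(adj[n]), n))[:k]


def largest_component(adj, vaccinated):
    """Size of the largest connected component once vaccinated nodes are removed."""
    blocked = set(vaccinated)
    seen = set()
    best = 0
    for start in adj:
        if start in blocked or start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:
            node = stack.pop()
            size += 1
            for nb in adj[node]:
                if nb not in blocked and nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        best = max(best, size)
    return best


# Toy network: node 0 is a hub linked to nodes 1..9, plus one extra edge 1-2.
adj = {i: set() for i in range(10)}
for i in range(1, 10):
    adj[0].add(i)
    adj[i].add(0)
adj[1].add(2)
adj[2].add(1)

targeted = top_k_by_degree(adj, 1)               # one vaccine dose, spent on the hub
size_targeted = largest_component(adj, targeted)
size_peripheral = largest_component(adj, [9])    # same dose, spent on a peripheral node
print(targeted, size_targeted, size_peripheral)
```

With a single dose, vaccinating the hub breaks the network into small fragments, whereas vaccinating a peripheral node leaves almost everyone connected. A learned centrality estimate plays the role of `top_k_by_degree` when the true network structure is not fully observed.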

### Related Papers

**Predicting Networks Centralities from Node Attributes**

**Harold Soh**

NIPS Workshop on Networks: From Graphs to Rich Data, Montreal, Canada, 2014. [ PDF | Extended version on arXiv ]

**Evolving policies for multi-reward partially observable markov decision processes (MR-POMDPs)**

**Harold Soh** and Yiannis Demiris

GECCO’11: Proceedings of the 2011 conference on Genetic and Evolutionary Computation, 713-720, 2011

**MR-POMDP Datasets**

**Temporal factors in school closure policy for mitigating the spread of influenza**

Tianyou Zhang, Xiuju Fu, Chee Keong Kwoh, Gaoxi Xiao, Limsoon Wong, Stefan Ma, **Harold Soh**, Gary Kee Khoon Lee, Terence Hung, Michael Lees

J Public Health Policy, 32(2):180-97, 2011

**Multi-reward policies for medical applications: anthrax attacks and smart wheelchairs**

**Harold Soh** and Yiannis Demiris

MedGEC 2011, 7th GECCO Workshop on Medical Applications of Genetic and Evolutionary Computation, 471-478, 2011

**Weighted complex network analysis of travel routes on the Singapore public transportation system**

**Harold Soh**, Sonja Lim, Tianyou Zhang, Xiuju Fu, Gary Kee Khoon Lee, Terence Gih Guang Hung, Pan Di, Silvester Prakasam, Limsoon Wong

Physica A: Statistical Mechanics and its Applications 389 (24), 5852-5863, 2010

**Robustness of scale-free networks under rewiring operations**

S Xiao, G Xiao, TH Cheng, S Ma, X Fu, **H Soh**

EPL (Europhysics Letters) 89, 38002, 2010

**Genomic epidemiology of a dengue virus epidemic in urban Singapore**

Mark J Schreiber, Edward C Holmes, Swee Hoe Ong, **Harold SH Soh**, Wei Liu, Lukas Tanner, Pauline PK Aw, Hwee Cheng Tan, Lee Ching Ng, Yee Sin Leo, Jenny GH Low, Adrian Ong, Eng Eong Ooi, Subhash G Vasudevan, Martin L Hibberd

Journal of virology 83 (9), 4163-4173, 2009