Evolving policies for multi-reward partially observable Markov decision processes (MR-POMDPs)

Plans and decisions in many real-world scenarios are made under uncertainty and must satisfy multiple, possibly conflicting, objectives. In this work, we contribute the multi-reward partially observable Markov decision process (MR-POMDP) as a general modelling framework. To solve MR-POMDPs, we present two hybrid (memetic) multi-objective evolutionary algorithms that generate non-dominated sets of policies (in the form of stochastic finite state controllers). Performance comparisons between the methods on multi-objective problems in robotics (with 2, 3 and 5 objectives), web advertising (with 3, 4 and 5 objectives) and infectious disease control (with 3 objectives) revealed that the memetic variants outperformed their original counterparts. We anticipate that the MR-POMDP, along with multi-objective evolutionary solvers, will prove useful in a variety of theoretical and real-world applications.
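To make "non-dominated set" concrete, here is a minimal sketch of a Pareto dominance filter over vector-valued policy returns. It is not the paper's algorithm: it only illustrates the dominance relation that a multi-objective solver relies on, assuming every reward is to be maximised. The function names and the example return vectors are hypothetical.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` Pareto-dominates `b` (assumption: all objectives are maximised).

    `a` dominates `b` when it is at least as good in every objective
    and strictly better in at least one.
    """
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def non_dominated(returns: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep only the return vectors that no other vector dominates."""
    return [
        p
        for i, p in enumerate(returns)
        if not any(dominates(q, p) for j, q in enumerate(returns) if j != i)
    ]


# Hypothetical expected returns of five policies on two objectives.
policy_returns = [(1.0, 5.0), (2.0, 4.0), (3.0, 3.0), (2.0, 2.0), (1.0, 1.0)]
front = non_dominated(policy_returns)
```

In this made-up example the first three vectors trade one objective against the other, so none of them dominates another and all three remain in `front`. The last two are each dominated by `(3.0, 3.0)` and are discarded.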

Download | ACM Digital Library Link
