Deep reinforcement learning has become increasingly relevant in fields such as recommendation systems, finance, and healthcare. Games are a popular area of study in deep reinforcement learning because their clearly defined state and action spaces fit naturally within Markov decision processes. However, most research has focused on zero-sum environments, such as the board games Go and chess and complex computer games. In real-world scenarios, by contrast, interacting agents often need to weigh both competitive and cooperative elements. This motivates extending the focus from zero-sum to non-zero-sum environments.

 

To this end, we propose an architecture for weighted human imitation learning that combines weighted batch learning with deep reinforcement learning methods. Whereas previous research has relied on performance-based metrics for evaluation, we introduce clustering and classification methods as novel ways of assessing human-AI classification. Through human trials, we aim to demonstrate that humans prefer human-imitation agents in collaborative and non-zero-sum environments, independent of performance-based measures.

 

Research by:
Ken Liu

PhD Candidate
School of Computer Science