dc.contributor.author      Kiourti, Panagiota      en_US
dc.contributor.author      Wardega, Kacper      en_US
dc.contributor.author      Jha, Susmit      en_US
dc.contributor.author      Li, Wenchao      en_US
dc.date.accessioned        2019-03-15T14:49:50Z
dc.date.available          2019-03-15T14:49:50Z
dc.identifier.uri          https://hdl.handle.net/2144/34292
dc.description.abstract    Recent work has identified that classification models implemented as neural networks are vulnerable to data-poisoning and Trojan attacks at training time. In this work, we show that these training-time vulnerabilities extend to deep reinforcement learning (DRL) agents and can be exploited by an adversary with access to the training process. In particular, we focus on Trojan attacks that augment the function of reinforcement learning policies with hidden behaviors. We demonstrate that such attacks can be implemented through minuscule data poisoning (as little as 0.025% of the training data) and in-band reward modification that does not affect the reward on normal inputs. Policies learned with our proposed attack approach are nearly indistinguishable from benign policies under normal operation but deteriorate drastically when the Trojan is triggered, in both targeted and untargeted settings. Furthermore, we show that existing Trojan defense mechanisms for classification tasks are not effective in the reinforcement learning setting.      en_US
dc.language.iso            en_US
dc.subject                 Reinforcement learning      en_US
dc.subject                 Adversarial machine learning      en_US
dc.subject                 Security and privacy      en_US
dc.subject                 Deep learning      en_US
dc.title                   TrojDRL: Trojan Attacks on Deep Reinforcement Learning Agents      en_US
dc.type                    Technical Report      en_US
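The abstract describes a targeted Trojan attack built from two ingredients: stamping a small trigger onto a tiny fraction of training observations (as little as 0.025%), and modifying the corresponding action labels and rewards in-band so that clean inputs are unaffected. The sketch below illustrates that poisoning step under stated assumptions; the function name, trigger shape, and reward value are illustrative, not the TrojDRL implementation.

```python
import numpy as np

def poison_batch(observations, rewards, actions, trigger, target_action,
                 poison_rate=0.00025, seed=0):
    """Illustrative targeted-Trojan poisoning of one training batch.

    Stamps a small trigger patch onto a `poison_rate` fraction of
    observations, relabels those transitions with the attacker's target
    action, and sets their reward to a value already inside the normal
    reward range (in-band), leaving all clean transitions untouched.
    Hypothetical sketch, not the paper's actual code.
    """
    obs = observations.copy()
    rew = rewards.copy()
    act = actions.copy()
    rng = np.random.default_rng(seed)
    n = len(obs)
    k = max(1, int(round(poison_rate * n)))      # number of poisoned samples
    idx = rng.choice(n, size=k, replace=False)   # which samples to poison
    h, w = trigger.shape
    for i in idx:
        obs[i, :h, :w] = trigger                 # stamp trigger (top-left)
        act[i] = target_action                   # attacker-chosen action
        rew[i] = 1.0                             # in-band reward (assumed range)
    return obs, rew, act, idx
```

At a 0.025% rate a batch of 1,000 transitions yields a single poisoned sample, which is why the attack is hard to spot by inspecting the training data.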

