TrojDRL: evaluation of backdoor attacks on deep reinforcement learning
Date
2020
Authors
Kiourti, Panagiota
Wardega, Kacper
Jha, Susmit
Li, Wenchao
Version
Accepted manuscript
Citation
Panagiota Kiourti, Kacper Wardega, Susmit Jha, Wenchao Li. "TrojDRL: Evaluation of Backdoor Attacks on Deep Reinforcement Learning." 57th ACM/EDAC/IEEE Design Automation Conference,
Abstract
We present TrojDRL, a tool for exploring and evaluating backdoor attacks on deep reinforcement learning (DRL) agents. TrojDRL exploits the sequential nature of DRL and considers threat models of varying strength. We show that untargeted attacks on state-of-the-art actor-critic algorithms can circumvent existing defenses built on the assumption that backdoors are targeted. We evaluate TrojDRL on a broad set of DRL benchmarks and show that the attacks require poisoning as little as 0.025% of the training data. In contrast to existing work on backdoor attacks against classification models, TrojDRL provides a first step towards understanding the vulnerability of DRL agents.
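To make the poisoning budget concrete: 0.025% of the training data is one poisoned sample in every 4,000. The following is a minimal sketch, assuming that training data arrives as a stream of grayscale frames and that the backdoor trigger is a small bright square stamped in one corner. The helper names `poisoned_steps` and `stamp_trigger`, the even spacing of poisoned steps, and the trigger shape are all hypothetical illustrations, not TrojDRL's implementation. The reward or action manipulation that each threat model applies to a poisoned step is not shown.

```python
from typing import List

# Hypothetical representation: a grayscale frame as rows of pixel intensities.
Frame = List[List[int]]


def poisoned_steps(total_steps: int, rate: float) -> List[int]:
    """Return evenly spaced indices of training steps to poison at `rate`.

    Even spacing is an illustrative assumption; an attacker could also
    sample the poisoned steps at random with the same expected rate.
    """
    if not 0 < rate <= 1:
        raise ValueError("rate must be in (0, 1]")
    interval = round(1 / rate)  # a rate of 0.025% gives one step in every 4000
    return list(range(0, total_steps, interval))


def stamp_trigger(frame: Frame, size: int = 3, value: int = 255) -> Frame:
    """Return a copy of `frame` with a size x size trigger in the top-left corner.

    The trigger pattern (a bright square) is a hypothetical choice.
    The input frame is left unmodified.
    """
    out = [row[:] for row in frame]
    for r in range(min(size, len(out))):
        for c in range(min(size, len(out[r]))):
            out[r][c] = value
    return out


if __name__ == "__main__":
    total = 4_000_000
    steps = poisoned_steps(total, 0.00025)
    print(len(steps), len(steps) / total)  # number of poisoned steps, realized rate
```

Over 4,000,000 training steps, this schedule poisons 1,000 of them. A real attack would combine such a schedule with the observation, action, and reward modifications of its chosen threat model.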
License
© 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.