TrojDRL: evaluation of backdoor attacks on deep reinforcement learning

Kiourti, Panagiota; Wardega, Kacper; Jha, Susmit; Li, Wenchao

Date

2020

DOI

10.1109/DAC18072.2020.9218663

Version

Accepted manuscript

URI

https://hdl.handle.net/2144/41836

Citation

Panagiota Kiourti, Kacper Wardega, Susmit Jha, Wenchao Li. "TrojDRL: Evaluation of Backdoor Attacks on Deep Reinforcement Learning." 57th ACM/EDAC/IEEE Design Automation Conference,

Abstract

We present TrojDRL, a tool for exploring and evaluating backdoor attacks on deep reinforcement learning (DRL) agents. TrojDRL exploits the sequential nature of DRL and considers different gradations of threat models. We show that untargeted attacks on state-of-the-art actor-critic algorithms can circumvent existing defenses that assume backdoors are targeted. We evaluated TrojDRL on a broad set of DRL benchmarks and showed that the attacks require poisoning as little as 0.025% of the training data. Compared with existing work on backdoor attacks on classification models, TrojDRL provides a first step towards understanding the vulnerability of DRL agents.

License

© 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Collections

BU Open Access Articles
ENG: Electrical and Computer Engineering: Scholarly Papers
