Acceleration for Deep Reinforcement Learning using Parallel and Distributed Computing: A Survey
Subjects: cs.LG, cs.AI, cs.DC
Release Date: November 8, 2024
Authors: Zhihong Liu, Xin Xu, Peng Qiao, Dongsheng Li
Affiliation: National University of Defense Technology, China

Computing parallelism types: CC = cluster computing; MP/MT = multi-processing / multi-threading. Checkmark placement below is reconstructed from each method's implementation details.

| Methods | CC | MP/MT | GPU | FPGA | TPU | Implementation Details | Major Results |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Gorila (Nair2015) | ✓ |  |  |  |  | 31 machines | 10× speedup over a GPU implementation |
| Ape-X (Horgan2018) | ✓ |  | ✓ |  |  | 360 CPU cores and 1 P100 GPU | 4× median scores over Gorila |
| R2D2 (kapturowski2018recurrent) | ✓ |  | ✓ |  |  | 256 actors and 1 GPU | 4× median scores over Ape-X |
| IMPALA (Espeholt2018) | ✓ | ✓ | ✓ |  |  | 500 CPU cores and 8 P100 GPUs | 250K FPS in a multi-task setting |
| Ray RLlib (liang2017ray) | ✓ | ✓ | ✓ |  |  | 8,192 CPU cores on EC2 | completes MuJoCo training in 3.7 minutes |
| ARS (Mania2018) | ✓ |  |  |  |  | 48 CPU cores on EC2 | 15× speedup over the ES-based method (salimans2017evolution) |
| A3C (Mnih2016) |  | ✓ |  |  |  | 16 CPU cores | 2× speedup over a K40 GPU implementation |
| Reactor (Gruslys2018) | ✓ | ✓ |  |  |  | 20 CPU cores | 4× speedup over A3C |
| DBA3C (Adamski2018) | ✓ | ✓ |  |  |  | 64 nodes with 768 CPU cores | completes Atari 2600 training in 21 minutes |
| DPPO (Heess2017) |  | ✓ |  |  |  | 64 actors | >20× speedup over A3C |
| D4PG (Radients2018) |  | ✓ |  |  |  | 64 CPU cores | 4× higher return than PPO |
| SampleFactory (Petrenko2020) |  | ✓ | ✓ |  |  | 36 CPU cores and a 2080 Ti GPU | 4× speedup over SEED_RL |
| GA3C (Babaeizadeh2017) |  | ✓ | ✓ |  |  | 16 CPU cores and 1 Titan X GPU | 45× speedup over A3C |
| PAAC (clemente2017efficient) |  | ✓ | ✓ |  |  | 4 CPU cores and a GTX 980 Ti GPU | >6× speedup over Gorila |
| rlpyt (Stooke, Stooke2018) |  | ✓ | ✓ |  |  | 8 P100 GPUs and 40 CPU cores | 6× speedup using 8 GPUs relative to 1 GPU |
| Dactyl (Andrychowicz2020) | ✓ | ✓ | ✓ |  |  | 384 nodes (6,144 cores and 8 GPUs) | 5.5× speedup over an implementation with 1 GPU and 768 CPU cores |
| DD-PPO (Wijmans2019) | ✓ |  | ✓ |  |  | 256 V100 GPUs | 196× speedup over 1 V100 GPU |
| MSRL (zhu2023msrl) | ✓ |  | ✓ |  |  | 64 GPUs | 3× speedup over Ray RLlib |
| SRL (mei2023srl) | ✓ |  | ✓ |  |  | 15K CPU cores and 32 A100 GPUs | 5× speedup over OpenAI Rapid (berner2019dota) |
| SpeedyZero (mei2023speedyzero) | ✓ |  | ✓ |  |  | 192 CPU cores and 20 A100 GPUs | masters the Atari benchmark within 35 minutes using only 300K samples |
| NNQL (Su2017) |  |  |  | ✓ |  | Arria 10 AX066 FPGA | 346× speedup over a GTX 760 GPU |
| TRPO_FPGA (Shao2017) |  |  |  | ✓ |  | Intel Stratix-V FPGA | 19.29× speedup over an i7 CPU |
| DDPG_FPGA (Guo2019) |  |  |  | ✓ |  | Intel Stratix-V FPGA | 4.53× speedup over an i7-6700 CPU core |
| FA3C (Cho2019) |  | ✓ |  | ✓ |  | Xilinx VCU1525 VU9P FPGA | 27.9% better performance than a Tesla P100 GPU |
| PPO_FPGA (Meng2020) |  | ✓ |  | ✓ |  | Xilinx Alveo U200 | 27.5× speedup over a Titan Xp GPU |
| On-chip replay (Meng2022) |  | ✓ |  | ✓ |  | Xilinx Alveo U200 accelerator | 4.3× higher IPS than an RTX 3090 GPU |
| AlphaZero (Silver2017) | ✓ | ✓ |  |  | ✓ | 5,000 first-generation TPUs and 64 second-generation TPUs | defeats a world-champion program after training within 24 hours |
| AlphaStar (vinyals2019grandmaster) | ✓ | ✓ |  |  | ✓ | 3,072 TPU v3 cores and 50,400 CPU cores | rated above 99.8% of ranked human players after 44 days of training |
| OpenAI Five (berner2019dota) | ✓ | ✓ | ✓ |  |  | 1,536 GPUs and 172,800 CPU cores | defeats the Dota 2 world champions (Team OG) after 10 months of training |
| GATO (Reed2022) | ✓ | ✓ |  |  | ✓ | 256 TPU v3 cores | handles 604 distinct tasks with a single network |
| SEED_RL (espeholt2020seed) | ✓ | ✓ |  |  | ✓ | 520 CPU cores and 8 TPU v3 cores | 11× faster than IMPALA with a P100 GPU |
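
Many of the MP/MT entries above (A3C, GA3C, SampleFactory, and the actor-learner side of Ape-X, IMPALA, and SEED_RL) share one structural idea: decouple experience collection (parallel actors) from gradient computation (a central learner). The sketch below is a minimal, hypothetical illustration of that pattern using Python's standard multiprocessing module; the toy random-walk environment, queue size, and incremental update rule are illustrative assumptions, not the implementation of any system in the table.

```python
# Hypothetical sketch of the actor-learner pattern behind many MP/MT systems above.
# Actors collect experience in parallel processes; one learner consumes it.
# The toy environment and update rule are placeholders, not any surveyed system's code.
import multiprocessing as mp
import random

NUM_ACTORS = 4          # stand-in for the "N CPU cores" configurations in the table
STEPS_PER_ACTOR = 250   # toy workload per actor


def actor(actor_id, queue):
    """Roll out a random policy in a toy 1-D random walk and ship transitions."""
    random.seed(actor_id)  # decorrelate the parallel actors
    state = 0.0
    for _ in range(STEPS_PER_ACTOR):
        action = random.choice([-1.0, 1.0])
        next_state = state + action
        reward = -abs(next_state)  # toy reward: stay near the origin
        queue.put((state, action, reward, next_state))
        state = next_state
    queue.put(None)  # sentinel marking this actor as finished


def learner(queue, num_actors):
    """Consume transitions centrally; a real learner would batch SGD on a GPU/TPU."""
    finished, seen, value = 0, 0, 0.0
    while finished < num_actors:
        item = queue.get()
        if item is None:
            finished += 1
            continue
        _, _, reward, _ = item
        seen += 1
        value += 0.01 * (reward - value)  # incremental stand-in for a gradient step
    print(f"learner consumed {seen} transitions; value estimate {value:.3f}")


if __name__ == "__main__":
    q = mp.Queue(maxsize=1024)
    workers = [mp.Process(target=actor, args=(i, q)) for i in range(NUM_ACTORS)]
    trainer = mp.Process(target=learner, args=(q, NUM_ACTORS))
    trainer.start()
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    trainer.join()
```

The systems in the table differ mainly in what sits between the two roles: Ape-X-style architectures insert a shared replay buffer, while IMPALA and SEED_RL replace the simple queue with batched trajectory transfer (and, in SEED_RL, centralized inference) on a GPU/TPU learner.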