Algorithm Overview

For an overview of the supported algorithms, please refer to DI-zoo.

Atari Benchmark

  • DQN

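    Environment    | Env Steps | Ours  | Tianshou | RLlib | SB3
    ---------------|-----------|-------|----------|-------|-----
    Pong           | 10M       | 20    | 20       | n/a   | 20
    Qbert          | 10M       | 17866 | 7307     | 7968  | 9496
    Space Invaders | 10M       | 1880  | 812      | 1001  | 622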

  • C51

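    Environment    | Env Steps | Ours  | Tianshou | RLlib | SB3
    ---------------|-----------|-------|----------|-------|-----
    Pong           | 10M       | 20    | 20       | n/a   | n/a
    Qbert          | 10M       | 19034 | 16245    | 15780 | n/a
    Space Invaders | 10M       | 2396  | 989      | 1025  | n/a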

  • QRDQN

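    Environment    | Env Steps | Ours  | Tianshou | RLlib | SB3
    ---------------|-----------|-------|----------|-------|-----
    Pong           | 10M       | 20    | 20       | n/a   | n/a
    Qbert          | 10M       | 18906 | 14990    | n/a   | n/a
    Space Invaders | 10M       | 2082  | 938      | n/a   | n/a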

  • IQN

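    Environment    | Env Steps | Ours  | Tianshou | RLlib | SB3
    ---------------|-----------|-------|----------|-------|-----
    Pong           | 10M       | 20    | 20       | n/a   | n/a
    Qbert          | 10M       | 16331 | 15520    | n/a   | n/a
    Space Invaders | 10M       | 1493  | 1370     | n/a   | n/a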

  • Rainbow

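    Environment    | Env Steps | Ours  | Tianshou | RLlib  | SB3
    ---------------|-----------|-------|----------|--------|-----
    Pong           | 10M       | 20    | n/a      | n/a    | n/a
    Qbert          | 10M       | 20363 | n/a      | ~20000 | n/a
    Space Invaders | 10M       | 2229  | n/a      | ~2000  | n/a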

  • A2C

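    Environment    | Env Steps | Ours  | Tianshou | RLlib | SB3
    ---------------|-----------|-------|----------|-------|-----
    Pong           | 10M       | 20    | n/a      | n/a   | 20
    Qbert          | 10M       | 4388  | n/a      | 3620  | 3882
    Space Invaders | 10M       | 755   | n/a      | 692   | 627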

  • PPO

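    Environment    | Env Steps | Ours  | Tianshou | RLlib (25M) | SB3
    ---------------|-----------|-------|----------|-------------|-----
    Pong           | 10M       | 20    | n/a      | n/a         | 20
    Qbert          | 10M       | 16775 | n/a      | 14247       | 15627
    Space Invaders | 10M       | 1459  | n/a      | 944         | 960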

  • PPG

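    Environment    | Env Steps | Ours  | Tianshou | RLlib | SB3
    ---------------|-----------|-------|----------|-------|-----
    Pong           | 10M       | 20    | n/a      | n/a   | n/a
    Qbert          | 10M       | 17013 | n/a      | n/a   | n/a
    Space Invaders | 10M       | 1200  | n/a      | n/a   | n/a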
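Cross-library comparisons are easier to read once each baseline is expressed relative to our score. The sketch below does this for a single table row. It is a minimal illustration, assuming the cells are transcribed by hand from the tables above; `parse_score` and `ratio_to_ours` are hypothetical helpers, not part of DI-zoo or any of the compared libraries. `n/a` cells are skipped and approximate cells such as `~20000` are taken at their nominal value.

```python
from typing import Dict, Optional


def parse_score(cell: str) -> Optional[float]:
    """Hypothetical helper: turn a table cell into a float.

    'n/a' becomes None; an approximate value such as '~2000' is read as 2000.0.
    """
    cell = cell.strip()
    if cell == "n/a":
        return None
    return float(cell.lstrip("~"))


def ratio_to_ours(row: Dict[str, str]) -> Dict[str, float]:
    """Hypothetical helper: divide each baseline's score by ours.

    Baselines whose cell is 'n/a' are left out of the result.
    """
    ours = parse_score(row["Ours"])
    if ours is None or ours == 0:
        raise ValueError("'Ours' must be a non-zero number")
    ratios: Dict[str, float] = {}
    for lib, cell in row.items():
        if lib == "Ours":
            continue
        score = parse_score(cell)
        if score is not None:
            ratios[lib] = score / ours
    return ratios


# DQN, Qbert row at 10M env steps, copied by hand from the table above.
dqn_qbert = {"Ours": "17866", "Tianshou": "7307", "RLlib": "7968", "SB3": "9496"}
for lib, r in ratio_to_ours(dqn_qbert).items():
    print(f"{lib}: {r:.2f}x of ours")
```

For the DQN Qbert row this gives Tianshou about 0.41x, RLlib about 0.45x and SB3 about 0.53x of our score. Raw-score ratios only make sense when scores are non-negative; for Pong, whose score ranges from -21 to 21, a normalized score would be a better basis.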

MuJoCo Benchmark

  • DDPG

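    Environment | Env Steps | Ours  | Tianshou | Spinning Up
    ------------|-----------|-------|----------|------------
    Ant         | 1M        | 1700  | 990      | 840
    HalfCheetah | 1M        | 10600 | 11719    | 11000
    Hopper      | 1M        | 3700  | 2197     | 1800
    Walker2d    | 1M        | 3100  | 1401     | 1950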

  • TD3

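    Environment | Env Steps | Ours  | Tianshou | Spinning Up
    ------------|-----------|-------|----------|------------
    Ant         | 1M        | 4927  | 5116     | 3880
    HalfCheetah | 1M        | 8200  | 10201    | 9750
    Hopper      | 1M        | 3581  | 3472     | 2860
    Walker2d    | 1M        | 4570  | 3982     | 4000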

  • SAC

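    Environment | Env Steps | Ours  | Tianshou | Spinning Up
    ------------|-----------|-------|----------|------------
    Ant         | 1M        | 5700  | 5850     | 3980
    HalfCheetah | 1M        | 12000 | 12139    | 11520
    Humanoid    | 1M        | 5319  | 5489     | n/a
    Walker2d    | 1M        | 5208  | 5007     | 4250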

  • PPO

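    Environment | Env Steps | Ours  | Tianshou | Spinning Up
    ------------|-----------|-------|----------|------------
    HalfCheetah | 1M        | 3120  | 5784     | 1670
    Hopper      | 1M        | 2300  | 2609     | 1850
    Walker2d    | 1M        | 4000  | 3589     | 1230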

SMAC Benchmark

  • QMIX

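    Map       | Env Steps | Ours (win rate) | Ours (time) | PyMARL (win rate) | PyMARL (time)
    ----------|-----------|-----------------|-------------|-------------------|--------------
    3s5z      | 2M        | 1               | 3.2h        | 1                 | 9.5h
    5m vs. 6m | 2M        | 0.73            | 6.5h        | 0.76              | 7.5h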

  • CollaQ

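    Map       | Env Steps | Ours (win rate) | Ours (time) | PyMARL (win rate) | PyMARL (time)
    ----------|-----------|-----------------|-------------|-------------------|--------------
    3s5z      | 2M        | 1               | 7.6h        | 1                 | 28h
    5m vs. 6m | 2M        | 0.82            | 9.5h        | 0.8               | 24h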

  • COMA

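    Map       | Env Steps | Ours (win rate) | Ours (time) | PyMARL (win rate) | PyMARL (time)
    ----------|-----------|-----------------|-------------|-------------------|--------------
    3s5z      | 2M        | 0.9             | 2.9h        | 0                 | 2.7h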
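The SMAC tables report wall-clock time next to the win rate, so one natural summary is the speedup: PyMARL's training time divided by ours at the same 2M env steps. A minimal sketch, assuming the times are copied by hand from the tables above and that both runs used comparable hardware, which the tables do not state; `parse_hours` and `speedup` are hypothetical helpers.

```python
from typing import List, Tuple


def parse_hours(cell: str) -> float:
    """Hypothetical helper: turn a wall-clock cell such as '3.2h' into hours."""
    cell = cell.strip()
    if not cell.endswith("h"):
        raise ValueError(f"expected a value in hours, got {cell!r}")
    return float(cell[:-1])


def speedup(ours_time: str, baseline_time: str) -> float:
    """Baseline time divided by ours; a value above 1 means our run finished sooner.

    Assumes both runs used comparable hardware (not stated in the tables).
    """
    return parse_hours(baseline_time) / parse_hours(ours_time)


# (algorithm, map, ours time, PyMARL time), copied by hand from the SMAC tables above.
runs: List[Tuple[str, str, str, str]] = [
    ("QMIX", "3s5z", "3.2h", "9.5h"),
    ("QMIX", "5m vs. 6m", "6.5h", "7.5h"),
    ("CollaQ", "3s5z", "7.6h", "28h"),
    ("CollaQ", "5m vs. 6m", "9.5h", "24h"),
    ("COMA", "3s5z", "2.9h", "2.7h"),
]
for algo, smac_map, ours, pymarl in runs:
    print(f"{algo:7s} {smac_map:10s} speedup = {speedup(ours, pymarl):.2f}x")
```

QMIX on 3s5z, for instance, gives 9.5 / 3.2 ≈ 2.97x, while COMA on 3s5z gives about 0.93x, meaning our run took slightly longer there. These figures should be read together with the win-rate columns.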