DDPG

DDPGPolicy

class ding.policy.ddpg.DDPGPolicy(cfg: dict, model: Optional[Union[type, torch.nn.modules.module.Module]] = None, enable_field: Optional[List[str]] = None)
Overview:

Policy class of DDPG algorithm.

Property:

learn_mode, collect_mode, eval_mode

Config:

| ID | Symbol | Type | Default Value | Description | Other (Shape) |
|----|--------|------|---------------|-------------|---------------|
| 1 | type | str | ddpg | RL policy register name; refer to registry POLICY_REGISTRY. | This arg is optional, a placeholder. |
| 2 | cuda | bool | True | Whether to use CUDA for the network. | |
| 3 | random_collect_size | int | 25000 | Number of randomly collected training samples in the replay buffer when training starts. | Defaults to 25000 for DDPG/TD3, 10000 for SAC. |
| 4 | model.twin_critic | bool | False | Whether to use two critic networks or only one. | Default False for DDPG. Clipped Double Q-learning method in the TD3 paper. |
| 5 | learn.learning_rate_actor | float | 1e-3 | Learning rate for the actor network (aka policy). | |
| 6 | learn.learning_rate_critic | float | 1e-3 | Learning rate for the critic network (aka Q-network). | |
| 7 | learn.actor_update_freq | int | 1 | Number of critic network updates per actor network update. | Default 1 for DDPG, 2 for TD3. Delayed Policy Updates method in the TD3 paper. |
| 8 | learn.noise | bool | False | Whether to add noise to the target network's action. | Default False for DDPG, True for TD3. Target Policy Smoothing Regularization in the TD3 paper. |
| 9 | learn.ignore_done | bool | False | Whether to ignore the done flag. | Use ignore_done only in the HalfCheetah env. |
| 10 | learn.target_theta | float | 0.005 | Used for the soft update of the target network. | Aka the interpolation factor in Polyak averaging for target networks. |
| 11 | collect.noise_sigma | float | 0.1 | Sigma of the noise distribution used to perturb actions during collection. | Noise is sampled from a distribution: an Ornstein-Uhlenbeck process in the DDPG paper, a Gaussian process in ours. |

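For illustration, the options in the config table can be written as a nested Python dictionary. This is only a sketch: it assumes that dotted symbols such as learn.target_theta map onto nested keys, the name ddpg_config is hypothetical, and the library's real default config may contain fields not listed in the table. The value of actor_update_freq follows the table's note that the default is 1 for DDPG.

```python
# Hypothetical config mirroring the table above; the nesting is assumed
# from the dotted symbol names (e.g. learn.target_theta).
ddpg_config = dict(
    type='ddpg',                  # policy register name
    cuda=True,                    # whether to run the network on CUDA
    random_collect_size=25000,    # random samples collected before training
    model=dict(
        twin_critic=False,        # single critic for DDPG
    ),
    learn=dict(
        learning_rate_actor=1e-3,
        learning_rate_critic=1e-3,
        actor_update_freq=1,      # 1 for DDPG, 2 for TD3 (per the table note)
        noise=False,              # no target policy smoothing for DDPG
        ignore_done=False,
        target_theta=0.005,       # soft-update interpolation factor
    ),
    collect=dict(
        noise_sigma=0.1,          # sigma of the exploration noise
    ),
)
```

Switching twin_critic and noise to True and actor_update_freq to 2 would correspond to the TD3 settings mentioned in the table.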
_forward_collect(data: dict) -> dict
Overview:

Forward function of collect mode.

Arguments:
  • data (dict): Dict type data, including at least [‘obs’].

Returns:
  • output (dict): Dict type data, including at least inferred action according to input obs.
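The config table states that collection noise is Gaussian with its sigma set by collect.noise_sigma. A minimal sketch of that perturbation in plain Python follows; the function name add_gaussian_noise and the action bounds of [-1, 1] are assumptions made for illustration, not part of the documented API.

```python
import random


def add_gaussian_noise(action, noise_sigma=0.1, low=-1.0, high=1.0, rng=random):
    """Add zero-mean Gaussian noise to each action dimension, then clip.

    `low` and `high` are assumed action bounds; `rng` is any object with a
    `gauss(mu, sigma)` method (the `random` module or a `random.Random`).
    """
    noisy = [a + rng.gauss(0.0, noise_sigma) for a in action]
    # Keep the perturbed action inside the assumed valid range.
    return [min(max(a, low), high) for a in noisy]


# Usage: perturb a deterministic actor output before sending it to the env.
output = {'action': add_gaussian_noise([0.2, -0.7], noise_sigma=0.1)}
```

Clipping after adding noise keeps the action valid for bounded action spaces; an unbounded environment would not need it.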

_forward_eval(data: dict) -> dict
Overview:

Forward function of eval mode, similar to self._forward_collect.

Arguments:
  • data (dict): Dict type data, including at least [‘obs’].

Returns:
  • output (dict): Dict type data, including at least inferred action according to input obs.

_forward_learn(data: dict) -> Dict[str, Any]
Overview:

Forward and backward function of learn mode.

Arguments:
  • data (dict): Dict type data, including at least [‘obs’, ‘action’, ‘reward’, ‘next_obs’]

Returns:
  • info_dict (Dict[str, Any]): Includes at least the actor and critic learning rates and the individual losses.
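To make the learn step more concrete, the sketch below shows the standard one-step TD target used to train a DDPG critic, together with a delayed actor update schedule driven by actor_update_freq. It is an illustration under stated assumptions, not the library's code: the discount factor gamma, both function names, and the exact modulo schedule are hypothetical, and next_q stands for the target critic's value at next_obs under the target actor's action.

```python
def td_target(reward, next_q, done, gamma=0.99, ignore_done=False):
    """One-step TD target: reward + gamma * (1 - done) * next_q.

    `gamma` is an assumed discount factor. With `ignore_done=True` the done
    flag is treated as False, so bootstrapping always continues.
    """
    not_done = 1.0 if ignore_done else 1.0 - float(done)
    return reward + gamma * not_done * next_q


def should_update_actor(learn_step, actor_update_freq=1):
    """Assumed schedule: update the actor once every `actor_update_freq`
    critic updates (every step when the value is 1, as for DDPG)."""
    return (learn_step + 1) % actor_update_freq == 0
```

With actor_update_freq=2, the actor would be updated on every second learn step, which matches the Delayed Policy Updates idea from TD3 noted in the config table.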

_init_collect() -> None
Overview:

Collect mode init method, called by self.__init__. Initializes the traj and unroll length and the collect model.

_init_eval() -> None
Overview:

Evaluate mode init method, called by self.__init__. Initializes the eval model. Unlike the learn and collect models, the eval model does not need noise.

_init_learn() -> None
Overview:

Learn mode init method, called by self.__init__. Initializes the actor and critic optimizers, the algorithm config, and the main and target models.
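The target model set up here is kept close to the main model through the soft update controlled by learn.target_theta. A minimal sketch of that Polyak averaging step on plain lists of floats is shown below; the function name soft_update is hypothetical, and the convention that theta weights the main parameters is an assumption consistent with the small default of 0.005.

```python
def soft_update(target_params, main_params, theta=0.005):
    """Polyak averaging: target <- (1 - theta) * target + theta * main.

    Parameters are plain lists of floats here; in practice the same rule
    would be applied element-wise to every network tensor.
    """
    return [
        (1.0 - theta) * t + theta * m
        for t, m in zip(target_params, main_params)
    ]


# Usage: the target moves a small step toward the main parameters.
new_target = soft_update([1.0, 2.0], [3.0, 2.0], theta=0.005)
```

A small theta makes the target network change slowly, which stabilizes the bootstrapped critic targets; theta=1.0 would copy the main parameters outright.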

_process_transition(obs: Any, model_output: dict, timestep: collections.namedtuple) -> Dict[str, Any]
Overview:

Generate dict type transition data from inputs.

Arguments:
  • obs (Any): Env observation

  • model_output (dict): Output of collect model, including at least [‘action’]

  • timestep (namedtuple): Output after env step, including at least [‘obs’, ‘reward’, ‘done’] (here ‘obs’ indicates the obs after the env step, i.e. next_obs).

Returns:
  • transition (Dict[str, Any]): Dict type transition data.
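A transition built from these inputs can be sketched as follows. The key names are inferred from the fields listed for _forward_learn ('obs', 'action', 'reward', 'next_obs') plus 'done'; the Timestep namedtuple and the standalone function process_transition are hypothetical stand-ins for illustration, not the library's own definitions.

```python
from collections import namedtuple

# Hypothetical stand-in for the env step output described above.
Timestep = namedtuple('Timestep', ['obs', 'reward', 'done'])


def process_transition(obs, model_output, timestep):
    """Pack one env interaction into a dict-type transition."""
    return {
        'obs': obs,                        # observation before the step
        'next_obs': timestep.obs,          # observation after the step
        'action': model_output['action'],  # action chosen by the collect model
        'reward': timestep.reward,
        'done': timestep.done,
    }


# Usage with toy values.
transition = process_transition(
    obs=[0.0, 1.0],
    model_output={'action': [0.3]},
    timestep=Timestep(obs=[0.1, 0.9], reward=1.0, done=False),
)
```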