DDPG
DDPGPolicy
- class ding.policy.ddpg.DDPGPolicy(cfg: dict, model: Optional[Union[type, torch.nn.modules.module.Module]] = None, enable_field: Optional[List[str]] = None)
- Overview:
Policy class of DDPG algorithm.
- Property:
learn_mode, collect_mode, eval_mode
Config:

| ID | Symbol | Type | Default Value | Description | Other(Shape) |
|----|--------|------|---------------|-------------|--------------|
| 1 | type | str | ddpg | RL policy register name, refer to registry POLICY_REGISTRY. | This arg is optional, a placeholder. |
| 2 | cuda | bool | True | Whether to use cuda for network. | |
| 3 | random_collect_size | int | 25000 | Number of randomly collected training samples in the replay buffer when training starts. | Default to 25000 for DDPG/TD3, 10000 for SAC. |
| 4 | model.twin_critic | bool | False | Whether to use two critic networks or only one. | Default False for DDPG. Clipped Double Q-learning method in TD3 paper. |
| 5 | learn.learning_rate_actor | float | 1e-3 | Learning rate for the actor network (aka. policy). | |
| 6 | learn.learning_rate_critic | float | 1e-3 | Learning rate for the critic network (aka. Q-network). | |
| 7 | learn.actor_update_freq | int | 2 | Number of critic network updates per actor network update. | Default 1 for DDPG, 2 for TD3. Delayed Policy Updates method in TD3 paper. |
| 8 | learn.noise | bool | False | Whether to add noise to the target network's action. | Default False for DDPG, True for TD3. Target Policy Smoothing Regularization in TD3 paper. |
| 9 | learn.ignore_done | bool | False | Determines whether to ignore the done flag. | Use ignore_done only in halfcheetah env. |
| 10 | learn.target_theta | float | 0.005 | Used for soft update of the target network. | aka. interpolation factor in polyak averaging for target networks. |
| 11 | collect.noise_sigma | float | 0.1 | Sigma of the distribution from which noise is sampled and added to actions during collection. | The DDPG paper uses an Ornstein-Uhlenbeck process; this implementation uses a Gaussian process. |

- _forward_collect(data: dict) → dict
- Overview:
Forward function of collect mode.
- Arguments:
data (dict): Dict type data, including at least [‘obs’].
- Returns:
output (dict): Dict type data, including at least inferred action according to input obs.
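The exploration noise controlled by `collect.noise_sigma` can be illustrated with a minimal sketch. It assumes zero-mean Gaussian noise is added to each dimension of the actor's action and the result is clipped to the action bounds. The function name `add_gaussian_noise` and the bounds `low` and `high` are hypothetical and are not part of `DDPGPolicy`; the actual collect model may apply noise differently.

```python
import random
from typing import List, Optional


def add_gaussian_noise(
    action: List[float],
    sigma: float = 0.1,          # plays the role of collect.noise_sigma
    low: float = -1.0,           # assumed lower action bound (hypothetical)
    high: float = 1.0,           # assumed upper action bound (hypothetical)
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Add zero-mean Gaussian noise to each action dimension, then clip."""
    rng = rng or random.Random()
    noisy = [a + rng.gauss(0.0, sigma) for a in action]
    # Clip so the perturbed action stays inside the valid range.
    return [min(max(a, low), high) for a in noisy]


noisy_action = add_gaussian_noise([0.5, -0.9, 0.99], sigma=0.1, rng=random.Random(0))
```

Passing a seeded `random.Random` keeps the sketch reproducible; with `sigma=0.0` the action is returned unchanged apart from clipping.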
- _forward_eval(data: dict) → dict
- Overview:
Forward function of eval mode, similar to self._forward_collect.
- Arguments:
data (dict): Dict type data, including at least [‘obs’].
- Returns:
output (dict): Dict type data, including at least inferred action according to input obs.
- _forward_learn(data: dict) → Dict[str, Any]
- Overview:
Forward and backward function of learn mode.
- Arguments:
data (dict): Dict type data, including at least [‘obs’, ‘action’, ‘reward’, ‘next_obs’].
- Returns:
info_dict (Dict[str, Any]): Including at least actor and critic lr, different losses.
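Two pieces of arithmetic used in a DDPG learn step are the one-step TD target for the critic and the soft (polyak) update of the target network. The sketch below shows both on plain Python floats under stated assumptions. The helper names `td_target` and `soft_update` and the discount factor `gamma` are hypothetical and do not appear in the reference above; `ignore_done` and `theta` mirror `learn.ignore_done` and `learn.target_theta`.

```python
from typing import List


def td_target(
    reward: float,
    next_q: float,               # target critic's Q(s', a') for the next state
    done: bool,
    gamma: float = 0.99,         # assumed discount factor (hypothetical)
    ignore_done: bool = False,   # mirrors learn.ignore_done
) -> float:
    """One-step TD target: r + gamma * (1 - done) * Q_target(s', a')."""
    # When ignore_done is set, the episode end does not cut off bootstrapping.
    not_done = 1.0 if (ignore_done or not done) else 0.0
    return reward + gamma * not_done * next_q


def soft_update(target: List[float], main: List[float], theta: float = 0.005) -> List[float]:
    """Polyak averaging: target <- theta * main + (1 - theta) * target."""
    return [theta * m + (1.0 - theta) * t for t, m in zip(target, main)]


y_terminal = td_target(reward=1.0, next_q=10.0, done=True)
y_ignored = td_target(reward=1.0, next_q=10.0, done=True, ignore_done=True)
new_target = soft_update(target=[0.0, 2.0], main=[1.0, 2.0], theta=0.005)
```

A small `theta` moves the target parameters only slightly toward the main network on each update, which is what keeps the bootstrapped targets stable.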
- _init_collect() → None
- Overview:
Collect mode init method. Called by self.__init__. Init traj and unroll length, collect model.
- _init_eval() → None
- Overview:
Evaluate mode init method. Called by self.__init__. Init eval model. Unlike learn and collect model, eval model does not need noise.
- _init_learn() → None
- Overview:
Learn mode init method. Called by self.__init__. Init actor and critic optimizers, algorithm config, main and target models.
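The dotted symbols in the Config table (for example `learn.target_theta`) suggest a nested layout. The following is a minimal sketch of that layout as a plain Python dict, filled with the default values listed in the table. It is an assumption for illustration: the real policy config may contain further fields and may be wrapped in a different container. The helper `get_field` is hypothetical.

```python
from typing import Any

# Nested config mirroring the symbols and defaults from the Config table.
ddpg_config = {
    "type": "ddpg",
    "cuda": True,
    "random_collect_size": 25000,
    "model": {"twin_critic": False},
    "learn": {
        "learning_rate_actor": 1e-3,
        "learning_rate_critic": 1e-3,
        "actor_update_freq": 2,
        "noise": False,
        "ignore_done": False,
        "target_theta": 0.005,
    },
    "collect": {"noise_sigma": 0.1},
}


def get_field(cfg: dict, dotted_key: str) -> Any:
    """Look up a value by a dotted symbol such as 'learn.target_theta'."""
    node = cfg
    for part in dotted_key.split("."):
        node = node[part]
    return node


theta = get_field(ddpg_config, "learn.target_theta")
sigma = get_field(ddpg_config, "collect.noise_sigma")
```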
- _process_transition(obs: Any, model_output: dict, timestep: collections.namedtuple) → Dict[str, Any]
- Overview:
Generate dict type transition data from inputs.
- Arguments:
obs (Any): Env observation.
model_output (dict): Output of collect model, including at least [‘action’].
timestep (namedtuple): Output after env step, including at least [‘obs’, ‘reward’, ‘done’] (here ‘obs’ indicates obs after env step, i.e. next_obs).
- Returns:
transition (Dict[str, Any]): Dict type transition data.
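As a rough illustration of the transition described above, the sketch below assembles a dict from an observation, a collect-model output, and an env timestep. The `Timestep` namedtuple and the standalone function `process_transition` are hypothetical stand-ins; the exact set of keys produced by the real method is an assumption based on the fields named in this reference.

```python
from collections import namedtuple
from typing import Any, Dict

# Hypothetical stand-in for the env step output described in the arguments.
Timestep = namedtuple("Timestep", ["obs", "reward", "done"])


def process_transition(obs: Any, model_output: dict, timestep: Timestep) -> Dict[str, Any]:
    """Pack one environment step into a dict type transition."""
    return {
        "obs": obs,                       # observation before the env step
        "next_obs": timestep.obs,         # 'obs' in timestep is the next observation
        "action": model_output["action"],
        "reward": timestep.reward,
        "done": timestep.done,
    }


transition = process_transition(
    obs=[0.0, 0.1],
    model_output={"action": [0.5]},
    timestep=Timestep(obs=[0.1, 0.2], reward=1.0, done=False),
)
```

The resulting dict carries the keys that `_forward_learn` expects at minimum: `obs`, `action`, `reward` and `next_obs`.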