template.q_learning

Please Reference ding/ding/docs/source/api_doc/model/template/q_learning.py for usage

DQN

class ding.model.template.q_learning.DQN(obs_shape: Union[int, ding.utils.type_helper.SequenceType], action_shape: Union[int, ding.utils.type_helper.SequenceType], encoder_hidden_size_list: ding.utils.type_helper.SequenceType = [128, 128, 64], dueling: bool = True, head_hidden_size: Optional[int] = None, head_layer_num: int = 1, activation: Optional[torch.nn.modules.module.Module] = ReLU(), norm_type: Optional[str] = None)[source]
__init__(obs_shape: Union[int, ding.utils.type_helper.SequenceType], action_shape: Union[int, ding.utils.type_helper.SequenceType], encoder_hidden_size_list: ding.utils.type_helper.SequenceType = [128, 128, 64], dueling: bool = True, head_hidden_size: Optional[int] = None, head_layer_num: int = 1, activation: Optional[torch.nn.modules.module.Module] = ReLU(), norm_type: Optional[str] = None)None[source]
Overview:

Init the DQN (encoder + head) Model according to input arguments.

Arguments:
  • obs_shape (Union[int, SequenceType]): Observation space shape, such as 8 or [4, 84, 84].

  • action_shape (Union[int, SequenceType]): Action space shape, such as 6 or [2, 3, 3].

  • encoder_hidden_size_list (SequenceType): Collection of hidden_size to pass to Encoder, the last element must match head_hidden_size.

  • dueling (dueling): Whether choose DuelingHead or DiscreteHead(default).

  • head_hidden_size (Optional[int]): The hidden_size of head network.

  • head_layer_num (int): The number of layers used in the head network to compute Q value output

  • activation (Optional[nn.Module]): The type of activation function in networks if None then default set it to nn.ReLU()

  • norm_type (Optional[str]): The type of normalization in networks, see ding.torch_utils.fc_block for more details.

forward(x: torch.Tensor)Dict[source]
Overview:

DQN forward computation graph, input observation tensor to predict q_value.

Arguments:
  • x (torch.Tensor): Observation inputs

Returns:
  • outputs (Dict): DQN forward outputs, such as q_value.

ReturnsKeys:
  • logit (torch.Tensor): Discrete Q-value output of each action dimension.

Shapes:
  • x (torch.Tensor): \((B, N)\), where B is batch size and N is obs_shape

  • logit (torch.FloatTensor): \((B, M)\), where B is batch size and M is action_shape

Examples:
>>> model = DQN(32, 6)  # arguments: 'obs_shape' and 'action_shape'
>>> inputs = torch.randn(4, 32)
>>> outputs = model(inputs)
>>> assert isinstance(outputs, dict) and outputs['logit'].shape == torch.Size([4, 6])

C51DQN

class ding.model.template.q_learning.C51DQN(obs_shape: Union[int, ding.utils.type_helper.SequenceType], action_shape: Union[int, ding.utils.type_helper.SequenceType], encoder_hidden_size_list: ding.utils.type_helper.SequenceType = [128, 128, 64], head_hidden_size: int = 64, head_layer_num: int = 1, activation: Optional[torch.nn.modules.module.Module] = ReLU(), norm_type: Optional[str] = None, v_min: Optional[float] = - 10, v_max: Optional[float] = 10, n_atom: Optional[int] = 51)[source]
__init__(obs_shape: Union[int, ding.utils.type_helper.SequenceType], action_shape: Union[int, ding.utils.type_helper.SequenceType], encoder_hidden_size_list: ding.utils.type_helper.SequenceType = [128, 128, 64], head_hidden_size: int = 64, head_layer_num: int = 1, activation: Optional[torch.nn.modules.module.Module] = ReLU(), norm_type: Optional[str] = None, v_min: Optional[float] = - 10, v_max: Optional[float] = 10, n_atom: Optional[int] = 51)None[source]
Overview:

Init the C51 Model according to input arguments.

Arguments:
  • obs_shape (Union[int, SequenceType]): Observation’s space.

  • action_shape (Union[int, SequenceType]): Action’s space.

  • encoder_hidden_size_list (SequenceType): Collection of hidden_size to pass to Encoder

  • head_hidden_size (Optional[int]): The hidden_size to pass to Head.

  • head_layer_num (int): The num of layers used in the network to compute Q value output

  • activation (Optional[nn.Module]):

    The type of activation function to use in MLP the after layer_fn, if None then default set to nn.ReLU()

  • norm_type (Optional[str]):

    The type of normalization to use, see ding.torch_utils.fc_block for more details`

  • n_atom (Optional[int]): Number of atoms in the prediction distribution.

forward(x: torch.Tensor)Dict[source]
Overview:

Use observation tensor to predict C51DQN’s output. Parameter updates with C51DQN’s MLPs forward setup.

Arguments:
  • x (torch.Tensor):

    The encoded embedding tensor w/ (B, N=head_hidden_size).

Returns:
  • outputs (Dict):

    Run with encoder and head. Return the result prediction dictionary.

ReturnsKeys:
  • logit (torch.Tensor): Logit tensor with same size as input x.

  • distribution (torch.Tensor): Distribution tensor of size (B, N, n_atom)

Shapes:
  • x (torch.Tensor): \((B, N)\), where B is batch size and N is head_hidden_size.

  • logit (torch.FloatTensor): \((B, M)\), where M is action_shape.

  • distribution(torch.FloatTensor): \((B, M, P)\), where P is n_atom.

Examples:
>>> model = C51DQN(128, 64)  # arguments: 'obs_shape' and 'action_shape'
>>> inputs = torch.randn(4, 128)
>>> outputs = model(inputs)
>>> assert isinstance(outputs, dict)
>>> # default head_hidden_size: int = 64,
>>> assert outputs['logit'].shape == torch.Size([4, 64])
>>> # default n_atom: int = 51
>>> assert outputs['distribution'].shape == torch.Size([4, 64, 51])

QRDQN

class ding.model.template.q_learning.QRDQN(obs_shape: Union[int, ding.utils.type_helper.SequenceType], action_shape: Union[int, ding.utils.type_helper.SequenceType], encoder_hidden_size_list: ding.utils.type_helper.SequenceType = [128, 128, 64], head_hidden_size: Optional[int] = None, head_layer_num: int = 1, num_quantiles: int = 32, activation: Optional[torch.nn.modules.module.Module] = ReLU(), norm_type: Optional[str] = None)[source]
__init__(obs_shape: Union[int, ding.utils.type_helper.SequenceType], action_shape: Union[int, ding.utils.type_helper.SequenceType], encoder_hidden_size_list: ding.utils.type_helper.SequenceType = [128, 128, 64], head_hidden_size: Optional[int] = None, head_layer_num: int = 1, num_quantiles: int = 32, activation: Optional[torch.nn.modules.module.Module] = ReLU(), norm_type: Optional[str] = None)None[source]
Overview:

Init the QRDQN Model according to input arguments.

Arguments:
  • obs_shape (Union[int, SequenceType]): Observation’s space.

  • action_shape (Union[int, SequenceType]): Action’s space.

  • encoder_hidden_size_list (SequenceType): Collection of hidden_size to pass to Encoder

  • head_hidden_size (Optional[int]): The hidden_size to pass to Head.

  • head_layer_num (int): The num of layers used in the network to compute Q value output

  • num_quantiles (int): Number of quantiles in the prediction distribution.

  • activation (Optional[nn.Module]):

    The type of activation function to use in MLP the after layer_fn, if None then default set to nn.ReLU()

  • norm_type (Optional[str]):

    The type of normalization to use, see ding.torch_utils.fc_block for more details`

forward(x: torch.Tensor)Dict[source]
Overview:

Use observation tensor to predict QRDQN’s output. Parameter updates with QRDQN’s MLPs forward setup.

Arguments:
  • x (torch.Tensor):

    The encoded embedding tensor with (B, N=hidden_size).

Returns:
  • outputs (Dict):

    Run with encoder and head. Return the result prediction dictionary.

ReturnsKeys:
  • logit (torch.Tensor): Logit tensor with same size as input x.

  • q (torch.Tensor): Q valye tensor tensor of size (B, N, num_quantiles)

  • tau (torch.Tensor): tau tensor of size (B, N, 1)

Shapes:
  • x (torch.Tensor): \((B, N)\), where B is batch size and N is head_hidden_size.

  • logit (torch.FloatTensor): \((B, M)\), where M is action_shape.

  • tau (torch.Tensor): \((B, M, 1)\)

Examples:
>>> model = QRDQN(64, 64)
>>> inputs = torch.randn(4, 64)
>>> outputs = model(inputs)
>>> assert isinstance(outputs, dict)
>>> assert outputs['logit'].shape == torch.Size([4, 64])
>>> # default num_quantiles : int = 32
>>> assert outputs['q'].shape == torch.Size([4, 64, 32])
>>> assert outputs['tau'].shape == torch.Size([4, 32, 1])

IQN

class ding.model.template.q_learning.IQN(obs_shape: Union[int, ding.utils.type_helper.SequenceType], action_shape: Union[int, ding.utils.type_helper.SequenceType], encoder_hidden_size_list: ding.utils.type_helper.SequenceType = [128, 128, 64], head_hidden_size: Optional[int] = None, head_layer_num: int = 1, num_quantiles: int = 32, quantile_embedding_size: int = 128, activation: Optional[torch.nn.modules.module.Module] = ReLU(), norm_type: Optional[str] = None)[source]
__init__(obs_shape: Union[int, ding.utils.type_helper.SequenceType], action_shape: Union[int, ding.utils.type_helper.SequenceType], encoder_hidden_size_list: ding.utils.type_helper.SequenceType = [128, 128, 64], head_hidden_size: Optional[int] = None, head_layer_num: int = 1, num_quantiles: int = 32, quantile_embedding_size: int = 128, activation: Optional[torch.nn.modules.module.Module] = ReLU(), norm_type: Optional[str] = None)None[source]
Overview:

Init the IQN Model according to input arguments.

Arguments:
  • obs_shape (Union[int, SequenceType]): Observation space shape.

  • action_shape (Union[int, SequenceType]): Action space shape.

  • encoder_hidden_size_list (SequenceType): Collection of hidden_size to pass to Encoder

  • head_hidden_size (Optional[int]): The hidden_size to pass to Head.

  • head_layer_num (int): The num of layers used in the network to compute Q value output

  • num_quantiles (int): Number of quantiles in the prediction distribution.

  • activation (Optional[nn.Module]):

    The type of activation function to use in MLP the after layer_fn, if None then default set to nn.ReLU()

  • norm_type (Optional[str]):

    The type of normalization to use, see ding.torch_utils.fc_block for more details.

forward(x: torch.Tensor)Dict[source]
Overview:

Use encoded embedding tensor to predict IQN’s output. Parameter updates with IQN’s MLPs forward setup.

Arguments:
  • x (torch.Tensor):

    The encoded embedding tensor with (B, N=hidden_size).

Returns:
  • outputs (Dict):

    Run with encoder and head. Return the result prediction dictionary.

ReturnsKeys:
  • logit (torch.Tensor): Logit tensor with same size as input x.

  • q (torch.Tensor): Q valye tensor tensor of size (num_quantiles, N, B)

  • quantiles (torch.Tensor): quantiles tensor of size (quantile_embedding_size, 1)

Shapes:
  • x (torch.Tensor): \((B, N)\), where B is batch size and N is head_hidden_size.

  • logit (torch.FloatTensor): \((B, M)\), where M is action_shape

  • quantiles (torch.Tensor): \((P, 1)\), where P is quantile_embedding_size.

Examples:
>>> model = IQN(64, 64) # arguments: 'obs_shape' and 'action_shape'
>>> inputs = torch.randn(4, 64)
>>> outputs = model(inputs)
>>> assert isinstance(outputs, dict)
>>> assert outputs['logit'].shape == torch.Size([4, 64])
>>> # default num_quantiles: int = 32
>>> assert outputs['q'].shape == torch.Size([32, 4, 64]
>>> # default quantile_embedding_size: int = 128
>>> assert outputs['quantiles'].shape == torch.Size([128, 1])

RainbowDQN

class ding.model.template.q_learning.RainbowDQN(obs_shape: Union[int, ding.utils.type_helper.SequenceType], action_shape: Union[int, ding.utils.type_helper.SequenceType], encoder_hidden_size_list: ding.utils.type_helper.SequenceType = [128, 128, 64], head_hidden_size: Optional[int] = None, head_layer_num: int = 1, activation: Optional[torch.nn.modules.module.Module] = ReLU(), norm_type: Optional[str] = None, v_min: Optional[float] = - 10, v_max: Optional[float] = 10, n_atom: Optional[int] = 51)[source]
Overview:

RainbowDQN network (C51 + Dueling + Noisy Block)

Note

RainbowDQN contains dueling architecture by default

__init__(obs_shape: Union[int, ding.utils.type_helper.SequenceType], action_shape: Union[int, ding.utils.type_helper.SequenceType], encoder_hidden_size_list: ding.utils.type_helper.SequenceType = [128, 128, 64], head_hidden_size: Optional[int] = None, head_layer_num: int = 1, activation: Optional[torch.nn.modules.module.Module] = ReLU(), norm_type: Optional[str] = None, v_min: Optional[float] = - 10, v_max: Optional[float] = 10, n_atom: Optional[int] = 51)None[source]
Overview:

Init the Rainbow Model according to arguments.

Arguments:
  • obs_shape (Union[int, SequenceType]): Observation space shape.

  • action_shape (Union[int, SequenceType]): Action space shape.

  • encoder_hidden_size_list (SequenceType): Collection of hidden_size to pass to Encoder

  • head_hidden_size (Optional[int]): The hidden_size to pass to Head.

  • head_layer_num (int): The num of layers used in the network to compute Q value output

  • activation (Optional[nn.Module]): The type of activation function to use in MLP the after layer_fn, if None then default set to nn.ReLU()

  • norm_type (Optional[str]): The type of normalization to use, see ding.torch_utils.fc_block for more details`

  • n_atom (Optional[int]): Number of atoms in the prediction distribution.

forward(x: torch.Tensor)Dict[source]
Overview:

Use observation tensor to predict Rainbow output. Parameter updates with Rainbow’s MLPs forward setup.

Arguments:
  • x (torch.Tensor):

    The encoded embedding tensor with (B, N=hidden_size).

Returns:
  • outputs (Dict):

    Run MLP with RainbowHead setups and return the result prediction dictionary.

ReturnsKeys:
  • logit (torch.Tensor): Logit tensor with same size as input x.

  • distribution (torch.Tensor): Distribution tensor of size (B, N, n_atom)

Shapes:
  • x (torch.Tensor): \((B, N)\), where B is batch size and N is head_hidden_size.

  • logit (torch.FloatTensor): \((B, M)\), where M is action_shape.

  • distribution(torch.FloatTensor): \((B, M, P)\), where P is n_atom.

Examples:
>>> model = RainbowDQN(64, 64) # arguments: 'obs_shape' and 'action_shape'
>>> inputs = torch.randn(4, 64)
>>> outputs = model(inputs)
>>> assert isinstance(outputs, dict)
>>> assert outputs['logit'].shape == torch.Size([4, 64])
>>> # default n_atom: int =51
>>> assert outputs['distribution'].shape == torch.Size([4, 64, 51])

DRQN

class ding.model.template.q_learning.RainbowDQN(obs_shape: Union[int, ding.utils.type_helper.SequenceType], action_shape: Union[int, ding.utils.type_helper.SequenceType], encoder_hidden_size_list: ding.utils.type_helper.SequenceType = [128, 128, 64], head_hidden_size: Optional[int] = None, head_layer_num: int = 1, activation: Optional[torch.nn.modules.module.Module] = ReLU(), norm_type: Optional[str] = None, v_min: Optional[float] = - 10, v_max: Optional[float] = 10, n_atom: Optional[int] = 51)[source]
Overview:

RainbowDQN network (C51 + Dueling + Noisy Block)

Note

RainbowDQN contains dueling architecture by default

__init__(obs_shape: Union[int, ding.utils.type_helper.SequenceType], action_shape: Union[int, ding.utils.type_helper.SequenceType], encoder_hidden_size_list: ding.utils.type_helper.SequenceType = [128, 128, 64], head_hidden_size: Optional[int] = None, head_layer_num: int = 1, activation: Optional[torch.nn.modules.module.Module] = ReLU(), norm_type: Optional[str] = None, v_min: Optional[float] = - 10, v_max: Optional[float] = 10, n_atom: Optional[int] = 51)None[source]
Overview:

Init the Rainbow Model according to arguments.

Arguments:
  • obs_shape (Union[int, SequenceType]): Observation space shape.

  • action_shape (Union[int, SequenceType]): Action space shape.

  • encoder_hidden_size_list (SequenceType): Collection of hidden_size to pass to Encoder

  • head_hidden_size (Optional[int]): The hidden_size to pass to Head.

  • head_layer_num (int): The num of layers used in the network to compute Q value output

  • activation (Optional[nn.Module]): The type of activation function to use in MLP the after layer_fn, if None then default set to nn.ReLU()

  • norm_type (Optional[str]): The type of normalization to use, see ding.torch_utils.fc_block for more details`

  • n_atom (Optional[int]): Number of atoms in the prediction distribution.

forward(x: torch.Tensor)Dict[source]
Overview:

Use observation tensor to predict Rainbow output. Parameter updates with Rainbow’s MLPs forward setup.

Arguments:
  • x (torch.Tensor):

    The encoded embedding tensor with (B, N=hidden_size).

Returns:
  • outputs (Dict):

    Run MLP with RainbowHead setups and return the result prediction dictionary.

ReturnsKeys:
  • logit (torch.Tensor): Logit tensor with same size as input x.

  • distribution (torch.Tensor): Distribution tensor of size (B, N, n_atom)

Shapes:
  • x (torch.Tensor): \((B, N)\), where B is batch size and N is head_hidden_size.

  • logit (torch.FloatTensor): \((B, M)\), where M is action_shape.

  • distribution(torch.FloatTensor): \((B, M, P)\), where P is n_atom.

Examples:
>>> model = RainbowDQN(64, 64) # arguments: 'obs_shape' and 'action_shape'
>>> inputs = torch.randn(4, 64)
>>> outputs = model(inputs)
>>> assert isinstance(outputs, dict)
>>> assert outputs['logit'].shape == torch.Size([4, 64])
>>> # default n_atom: int =51
>>> assert outputs['distribution'].shape == torch.Size([4, 64, 51])