template.q_learning

Please refer to ding/ding/docs/source/api_doc/model/template/q_learning.py for usage.

DQN

class ding.model.template.q_learning.DQN(obs_shape: Union[int, ding.utils.type_helper.SequenceType], action_shape: Union[int, ding.utils.type_helper.SequenceType], encoder_hidden_size_list: ding.utils.type_helper.SequenceType = [128, 128, 64], dueling: bool = True, head_hidden_size: Optional[int] = None, head_layer_num: int = 1, activation: Optional[torch.nn.modules.module.Module] = ReLU(), norm_type: Optional[str] = None)

__init__(obs_shape: Union[int, ding.utils.type_helper.SequenceType], action_shape: Union[int, ding.utils.type_helper.SequenceType], encoder_hidden_size_list: ding.utils.type_helper.SequenceType = [128, 128, 64], dueling: bool = True, head_hidden_size: Optional[int] = None, head_layer_num: int = 1, activation: Optional[torch.nn.modules.module.Module] = ReLU(), norm_type: Optional[str] = None) → None

Overview:

Initialize the DQN model (encoder + head) according to the input arguments.

Arguments:

obs_shape (Union[int, SequenceType]): Observation space shape, such as 8 or [4, 84, 84].
action_shape (Union[int, SequenceType]): Action space shape, such as 6 or [2, 3, 3].
encoder_hidden_size_list (SequenceType): Sequence of hidden sizes passed to the encoder; the last element must match head_hidden_size.
dueling (bool): Whether to use DuelingHead (True, the default) or DiscreteHead (False).
head_hidden_size (Optional[int]): The hidden size of the head network.
head_layer_num (int): The number of layers used in the head network to compute the Q-value output.
activation (Optional[nn.Module]): The activation function used in the networks; if None, it defaults to nn.ReLU().
norm_type (Optional[str]): The type of normalization used in the networks; see ding.torch_utils.fc_block for more details.
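To make the `dueling` option more concrete, the following is a minimal pure-Python sketch of how a dueling head typically combines a state value with per-action advantages. It assumes the common mean-subtracted formulation Q(s, a) = V(s) + A(s, a) - mean(A); the function name `dueling_q_values` is hypothetical and this is not taken from DI-engine's DuelingHead source.

```python
from typing import List


def dueling_q_values(state_value: float, advantages: List[float]) -> List[float]:
    """Combine a scalar state value V(s) with per-action advantages A(s, a).

    Assumption: the mean-subtracted form Q(s, a) = V(s) + A(s, a) - mean(A).
    Illustrative only; not DI-engine's DuelingHead implementation.
    """
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + a - mean_advantage for a in advantages]


# One state with three actions.
q_values = dueling_q_values(state_value=1.0, advantages=[0.5, -0.5, 0.0])
```

Subtracting the mean advantage keeps the value and advantage streams identifiable: shifting all advantages by a constant leaves the resulting Q-values unchanged.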

forward(x: torch.Tensor) → Dict

Overview:

DQN forward computation: takes an observation tensor as input and predicts the Q-value.

Arguments:

x (torch.Tensor): Observation inputs

Returns:

outputs (Dict): DQN forward outputs, containing the key logit.

ReturnsKeys:

logit (torch.Tensor): Discrete Q-value output of each action dimension.

Shapes:

x (torch.Tensor): \((B, N)\), where B is batch size and N is obs_shape
logit (torch.FloatTensor): \((B, M)\), where B is batch size and M is action_shape

Examples:

>>> model = DQN(32, 6)  # arguments: 'obs_shape' and 'action_shape'
>>> inputs = torch.randn(4, 32)
>>> outputs = model(inputs)
>>> assert isinstance(outputs, dict) and outputs['logit'].shape == torch.Size([4, 6])
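As a follow-up to the example above, one row of `outputs['logit']` holds one Q-value per action, so an action can be chosen from it. The sketch below uses plain Python lists instead of tensors; `select_action` and the sample values are hypothetical, and epsilon-greedy is just one common choice of exploration rule.

```python
import random
from typing import List, Optional


def select_action(q_row: List[float], epsilon: float = 0.0,
                  rng: Optional[random.Random] = None) -> int:
    """Pick an action index from one row of Q-values (epsilon-greedy)."""
    rng = rng or random.Random()
    # With probability epsilon, explore uniformly at random.
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    # Otherwise exploit: index of the largest Q-value.
    return max(range(len(q_row)), key=lambda a: q_row[a])


# Hypothetical Q-values for a single observation with action_shape = 6.
logit_row = [0.1, 1.7, -0.3, 0.9, 0.0, 1.2]
greedy_action = select_action(logit_row)
```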

C51DQN

class ding.model.template.q_learning.C51DQN(obs_shape: Union[int, ding.utils.type_helper.SequenceType], action_shape: Union[int, ding.utils.type_helper.SequenceType], encoder_hidden_size_list: ding.utils.type_helper.SequenceType = [128, 128, 64], head_hidden_size: int = 64, head_layer_num: int = 1, activation: Optional[torch.nn.modules.module.Module] = ReLU(), norm_type: Optional[str] = None, v_min: Optional[float] = -10, v_max: Optional[float] = 10, n_atom: Optional[int] = 51)

__init__(obs_shape: Union[int, ding.utils.type_helper.SequenceType], action_shape: Union[int, ding.utils.type_helper.SequenceType], encoder_hidden_size_list: ding.utils.type_helper.SequenceType = [128, 128, 64], head_hidden_size: int = 64, head_layer_num: int = 1, activation: Optional[torch.nn.modules.module.Module] = ReLU(), norm_type: Optional[str] = None, v_min: Optional[float] = -10, v_max: Optional[float] = 10, n_atom: Optional[int] = 51) → None

Overview:

Init the C51 Model according to input arguments.

Arguments:

obs_shape (Union[int, SequenceType]): Observation space shape.
action_shape (Union[int, SequenceType]): Action space shape.
encoder_hidden_size_list (SequenceType): Sequence of hidden sizes passed to the encoder.
head_hidden_size (Optional[int]): The hidden size passed to the head.
head_layer_num (int): The number of layers used in the head network to compute the Q-value output.
activation (Optional[nn.Module]): The activation function used in the MLP after layer_fn; if None, it defaults to nn.ReLU().
norm_type (Optional[str]): The type of normalization to use; see ding.torch_utils.fc_block for more details.
n_atom (Optional[int]): Number of atoms in the prediction distribution.
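For intuition about how `v_min`, `v_max`, and `n_atom` relate, here is a small sketch of the fixed support used in the standard C51 formulation: `n_atom` evenly spaced return values between `v_min` and `v_max`. The helper `c51_support` is hypothetical, and the assumption is that the model follows this standard construction.

```python
from typing import List


def c51_support(v_min: float = -10.0, v_max: float = 10.0,
                n_atom: int = 51) -> List[float]:
    """Return n_atom evenly spaced atoms covering [v_min, v_max].

    Assumption: the standard C51 support; not copied from DI-engine.
    """
    delta_z = (v_max - v_min) / (n_atom - 1)  # spacing between adjacent atoms
    return [v_min + i * delta_z for i in range(n_atom)]


# With the default arguments of the signature above, the spacing is 0.4.
support = c51_support()
```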

forward(x: torch.Tensor) → Dict

Overview:

Use the observation tensor to predict C51DQN’s output via a forward pass through its encoder and head.

Arguments:

x (torch.Tensor): The observation tensor of shape (B, N), where N is obs_shape.

Returns:

outputs (Dict): The prediction dictionary produced by running the encoder and head.

ReturnsKeys:

logit (torch.Tensor): Logit tensor of size (B, M).
distribution (torch.Tensor): Distribution tensor of size (B, M, n_atom).

Shapes:

x (torch.Tensor): \((B, N)\), where B is batch size and N is obs_shape.
logit (torch.FloatTensor): \((B, M)\), where M is action_shape.
distribution (torch.FloatTensor): \((B, M, P)\), where P is n_atom.

Examples:

>>> model = C51DQN(128, 64)  # arguments: 'obs_shape' and 'action_shape'
>>> inputs = torch.randn(4, 128)
>>> outputs = model(inputs)
>>> assert isinstance(outputs, dict)
>>> # action_shape is 64
>>> assert outputs['logit'].shape == torch.Size([4, 64])
>>> # default n_atom: int = 51
>>> assert outputs['distribution'].shape == torch.Size([4, 64, 51])
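The relationship between `distribution` and a scalar Q-value can be illustrated in plain Python. In the standard C51 formulation, the Q-value of an action is the expectation of the support atoms under that action's probability distribution. The sketch below assumes that formulation; `softmax` and `expected_q` are hypothetical helpers, not DI-engine functions.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_q(probabilities: List[float], support: List[float]) -> float:
    """Q(s, a) = sum_i p_i * z_i for one action (standard C51 assumption)."""
    return sum(p * z for p, z in zip(probabilities, support))


v_min, v_max, n_atom = -10.0, 10.0, 51
support = [v_min + i * (v_max - v_min) / (n_atom - 1) for i in range(n_atom)]

# A uniform distribution over a symmetric support has an expected value near 0.
uniform = softmax([0.0] * n_atom)
q_uniform = expected_q(uniform, support)
```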

QRDQN

class ding.model.template.q_learning.QRDQN(obs_shape: Union[int, ding.utils.type_helper.SequenceType], action_shape: Union[int, ding.utils.type_helper.SequenceType], encoder_hidden_size_list: ding.utils.type_helper.SequenceType = [128, 128, 64], head_hidden_size: Optional[int] = None, head_layer_num: int = 1, num_quantiles: int = 32, activation: Optional[torch.nn.modules.module.Module] = ReLU(), norm_type: Optional[str] = None)

__init__(obs_shape: Union[int, ding.utils.type_helper.SequenceType], action_shape: Union[int, ding.utils.type_helper.SequenceType], encoder_hidden_size_list: ding.utils.type_helper.SequenceType = [128, 128, 64], head_hidden_size: Optional[int] = None, head_layer_num: int = 1, num_quantiles: int = 32, activation: Optional[torch.nn.modules.module.Module] = ReLU(), norm_type: Optional[str] = None) → None

Overview:

Init the QRDQN Model according to input arguments.

Arguments:

obs_shape (Union[int, SequenceType]): Observation space shape.
action_shape (Union[int, SequenceType]): Action space shape.
encoder_hidden_size_list (SequenceType): Sequence of hidden sizes passed to the encoder.
head_hidden_size (Optional[int]): The hidden size passed to the head.
head_layer_num (int): The number of layers used in the head network to compute the Q-value output.
num_quantiles (int): Number of quantiles in the prediction distribution.
activation (Optional[nn.Module]): The activation function used in the MLP after layer_fn; if None, it defaults to nn.ReLU().
norm_type (Optional[str]): The type of normalization to use; see ding.torch_utils.fc_block for more details.

forward(x: torch.Tensor) → Dict

Overview:

Use the observation tensor to predict QRDQN’s output via a forward pass through its encoder and head.

Arguments:

x (torch.Tensor): The observation tensor of shape (B, N), where N is obs_shape.

Returns:

outputs (Dict): The prediction dictionary produced by running the encoder and head.

ReturnsKeys:

logit (torch.Tensor): Logit tensor of size (B, M).
q (torch.Tensor): Q-value tensor of size (B, M, num_quantiles).
tau (torch.Tensor): tau tensor of size (B, num_quantiles, 1).

Shapes:

x (torch.Tensor): \((B, N)\), where B is batch size and N is obs_shape.
logit (torch.FloatTensor): \((B, M)\), where M is action_shape.
q (torch.FloatTensor): \((B, M, P)\), where P is num_quantiles.
tau (torch.Tensor): \((B, P, 1)\), where P is num_quantiles.

Examples:

>>> model = QRDQN(64, 64)
>>> inputs = torch.randn(4, 64)
>>> outputs = model(inputs)
>>> assert isinstance(outputs, dict)
>>> assert outputs['logit'].shape == torch.Size([4, 64])
>>> # default num_quantiles : int = 32
>>> assert outputs['q'].shape == torch.Size([4, 64, 32])
>>> assert outputs['tau'].shape == torch.Size([4, 32, 1])
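To clarify what `tau` and `q` represent, the sketch below follows the standard QR-DQN formulation, in which the quantile fractions are fixed midpoints tau_i = (2i + 1) / (2N) and the scalar Q-value of an action is the mean of its N quantile values. Both helper names are hypothetical, and whether DI-engine computes them exactly this way is an assumption.

```python
from typing import List


def quantile_midpoints(num_quantiles: int = 32) -> List[float]:
    """Midpoints of num_quantiles equal-width bins on [0, 1].

    Assumption: the standard QR-DQN fractions tau_i = (2i + 1) / (2N).
    """
    return [(2 * i + 1) / (2 * num_quantiles) for i in range(num_quantiles)]


def q_from_quantiles(quantile_values: List[float]) -> float:
    """Scalar Q-value of one action as the mean of its quantile values."""
    return sum(quantile_values) / len(quantile_values)


taus = quantile_midpoints(32)  # one fraction per quantile
q_value = q_from_quantiles([1.0, 2.0, 3.0, 4.0])  # toy example with 4 quantiles
```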

IQN

class ding.model.template.q_learning.IQN(obs_shape: Union[int, ding.utils.type_helper.SequenceType], action_shape: Union[int, ding.utils.type_helper.SequenceType], encoder_hidden_size_list: ding.utils.type_helper.SequenceType = [128, 128, 64], head_hidden_size: Optional[int] = None, head_layer_num: int = 1, num_quantiles: int = 32, quantile_embedding_size: int = 128, activation: Optional[torch.nn.modules.module.Module] = ReLU(), norm_type: Optional[str] = None)

__init__(obs_shape: Union[int, ding.utils.type_helper.SequenceType], action_shape: Union[int, ding.utils.type_helper.SequenceType], encoder_hidden_size_list: ding.utils.type_helper.SequenceType = [128, 128, 64], head_hidden_size: Optional[int] = None, head_layer_num: int = 1, num_quantiles: int = 32, quantile_embedding_size: int = 128, activation: Optional[torch.nn.modules.module.Module] = ReLU(), norm_type: Optional[str] = None) → None

Overview:

Init the IQN Model according to input arguments.

Arguments:

obs_shape (Union[int, SequenceType]): Observation space shape.
action_shape (Union[int, SequenceType]): Action space shape.
encoder_hidden_size_list (SequenceType): Sequence of hidden sizes passed to the encoder.
head_hidden_size (Optional[int]): The hidden size passed to the head.
head_layer_num (int): The number of layers used in the head network to compute the Q-value output.
num_quantiles (int): Number of quantiles in the prediction distribution.
activation (Optional[nn.Module]): The activation function used in the MLP after layer_fn; if None, it defaults to nn.ReLU().
norm_type (Optional[str]): The type of normalization to use; see ding.torch_utils.fc_block for more details.

forward(x: torch.Tensor) → Dict

Overview:

Use the observation tensor to predict IQN’s output via a forward pass through its encoder and head.

Arguments:

x (torch.Tensor): The observation tensor of shape (B, N), where N is obs_shape.

Returns:

outputs (Dict): The prediction dictionary produced by running the encoder and head.

ReturnsKeys:

logit (torch.Tensor): Logit tensor of size (B, M).
q (torch.Tensor): Q-value tensor of size (num_quantiles, B, M).
quantiles (torch.Tensor): quantiles tensor of size (quantile_embedding_size, 1).

Shapes:

x (torch.Tensor): \((B, N)\), where B is batch size and N is obs_shape.
logit (torch.FloatTensor): \((B, M)\), where M is action_shape.
quantiles (torch.Tensor): \((P, 1)\), where P is quantile_embedding_size.

Examples:

>>> model = IQN(64, 64) # arguments: 'obs_shape' and 'action_shape'
>>> inputs = torch.randn(4, 64)
>>> outputs = model(inputs)
>>> assert isinstance(outputs, dict)
>>> assert outputs['logit'].shape == torch.Size([4, 64])
>>> # default num_quantiles: int = 32
>>> assert outputs['q'].shape == torch.Size([32, 4, 64])
>>> # default quantile_embedding_size: int = 128
>>> assert outputs['quantiles'].shape == torch.Size([128, 1])
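The role of `quantile_embedding_size` can be sketched with the cosine embedding described in the IQN paper, where a sampled quantile fraction tau is expanded into features cos(pi * i * tau) for i = 0, ..., n - 1 before a learned linear layer (omitted here). This is an assumption about the general technique, not a description of DI-engine's head; `cosine_features` is a hypothetical helper.

```python
import math
from typing import List


def cosine_features(tau: float, embedding_size: int = 128) -> List[float]:
    """Expand a quantile fraction tau in [0, 1] into cosine features.

    Assumption: the IQN paper's basis cos(pi * i * tau), i = 0..n-1.
    The learned projection that follows in IQN is left out.
    """
    return [math.cos(math.pi * i * tau) for i in range(embedding_size)]


# A tiny embedding for tau = 0.5, small enough to inspect by hand.
features = cosine_features(0.5, embedding_size=4)
```

In the full method, these features are projected to the encoder's hidden size and multiplied element-wise with the state embedding, so each sampled tau yields its own Q-value estimate.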

RainbowDQN

class ding.model.template.q_learning.RainbowDQN(obs_shape: Union[int, ding.utils.type_helper.SequenceType], action_shape: Union[int, ding.utils.type_helper.SequenceType], encoder_hidden_size_list: ding.utils.type_helper.SequenceType = [128, 128, 64], head_hidden_size: Optional[int] = None, head_layer_num: int = 1, activation: Optional[torch.nn.modules.module.Module] = ReLU(), norm_type: Optional[str] = None, v_min: Optional[float] = -10, v_max: Optional[float] = 10, n_atom: Optional[int] = 51)

Overview: RainbowDQN network (C51 + Dueling + Noisy Block)

Note

RainbowDQN contains dueling architecture by default

__init__(obs_shape: Union[int, ding.utils.type_helper.SequenceType], action_shape: Union[int, ding.utils.type_helper.SequenceType], encoder_hidden_size_list: ding.utils.type_helper.SequenceType = [128, 128, 64], head_hidden_size: Optional[int] = None, head_layer_num: int = 1, activation: Optional[torch.nn.modules.module.Module] = ReLU(), norm_type: Optional[str] = None, v_min: Optional[float] = -10, v_max: Optional[float] = 10, n_atom: Optional[int] = 51) → None

Overview:

Init the Rainbow Model according to arguments.

Arguments:

obs_shape (Union[int, SequenceType]): Observation space shape.
action_shape (Union[int, SequenceType]): Action space shape.
encoder_hidden_size_list (SequenceType): Sequence of hidden sizes passed to the encoder.
head_hidden_size (Optional[int]): The hidden size passed to the head.
head_layer_num (int): The number of layers used in the head network to compute the Q-value output.
activation (Optional[nn.Module]): The activation function used in the MLP after layer_fn; if None, it defaults to nn.ReLU().
norm_type (Optional[str]): The type of normalization to use; see ding.torch_utils.fc_block for more details.
n_atom (Optional[int]): Number of atoms in the prediction distribution.

forward(x: torch.Tensor) → Dict

Overview:

Use the observation tensor to predict Rainbow’s output via a forward pass through its encoder and head.

Arguments:

x (torch.Tensor): The observation tensor of shape (B, N), where N is obs_shape.

Returns:

outputs (Dict): The prediction dictionary produced by running the MLP with the RainbowHead setup.

ReturnsKeys:

logit (torch.Tensor): Logit tensor of size (B, M).
distribution (torch.Tensor): Distribution tensor of size (B, M, n_atom).

Shapes:

x (torch.Tensor): \((B, N)\), where B is batch size and N is obs_shape.
logit (torch.FloatTensor): \((B, M)\), where M is action_shape.
distribution (torch.FloatTensor): \((B, M, P)\), where P is n_atom.

Examples:

>>> model = RainbowDQN(64, 64) # arguments: 'obs_shape' and 'action_shape'
>>> inputs = torch.randn(4, 64)
>>> outputs = model(inputs)
>>> assert isinstance(outputs, dict)
>>> assert outputs['logit'].shape == torch.Size([4, 64])
>>> # default n_atom: int = 51
>>> assert outputs['distribution'].shape == torch.Size([4, 64, 51])
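The class overview lists a "Noisy Block" as one of Rainbow's components. As a rough illustration, the sketch below implements a noisy linear layer in plain Python, assuming the factorized Gaussian noise scheme from the NoisyNet paper (weights are mu + sigma * noise, with noise built from per-input and per-output samples). The names `scale_noise` and `noisy_linear` are hypothetical, and this is not DI-engine's layer.

```python
import math
import random
from typing import List


def scale_noise(x: float) -> float:
    """f(x) = sign(x) * sqrt(|x|), used to shape factorized noise."""
    return math.copysign(math.sqrt(abs(x)), x)


def noisy_linear(inputs: List[float], weight_mu: List[List[float]],
                 weight_sigma: List[List[float]], bias_mu: List[float],
                 bias_sigma: List[float], rng: random.Random) -> List[float]:
    """One forward pass of a linear layer with factorized Gaussian noise.

    Assumption: NoisyNet-style parameterization; illustrative only.
    """
    in_noise = [scale_noise(rng.gauss(0.0, 1.0)) for _ in inputs]
    out_noise = [scale_noise(rng.gauss(0.0, 1.0)) for _ in bias_mu]
    outputs = []
    for j in range(len(bias_mu)):
        total = bias_mu[j] + bias_sigma[j] * out_noise[j]
        for i, x in enumerate(inputs):
            # Noisy weight: mean plus sigma times the outer-product noise.
            w = weight_mu[j][i] + weight_sigma[j][i] * out_noise[j] * in_noise[i]
            total += w * x
        outputs.append(total)
    return outputs


# With all sigmas set to zero the layer reduces to an ordinary linear layer.
zeros = [[0.0, 0.0], [0.0, 0.0]]
plain = noisy_linear([1.0, 2.0], [[0.5, -1.0], [0.0, 1.0]], zeros,
                     [0.1, 0.0], [0.0, 0.0], random.Random(0))
```

Because the noise scales are learned parameters, exploration in this scheme comes from the network itself rather than from an external epsilon schedule.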
