template.q_learning¶
Please refer to ding/ding/docs/source/api_doc/model/template/q_learning.py for usage.
DQN¶
- class ding.model.template.q_learning.DQN(obs_shape: Union[int, ding.utils.type_helper.SequenceType], action_shape: Union[int, ding.utils.type_helper.SequenceType], encoder_hidden_size_list: ding.utils.type_helper.SequenceType = [128, 128, 64], dueling: bool = True, head_hidden_size: Optional[int] = None, head_layer_num: int = 1, activation: Optional[torch.nn.modules.module.Module] = ReLU(), norm_type: Optional[str] = None)[source]¶
- __init__(obs_shape: Union[int, ding.utils.type_helper.SequenceType], action_shape: Union[int, ding.utils.type_helper.SequenceType], encoder_hidden_size_list: ding.utils.type_helper.SequenceType = [128, 128, 64], dueling: bool = True, head_hidden_size: Optional[int] = None, head_layer_num: int = 1, activation: Optional[torch.nn.modules.module.Module] = ReLU(), norm_type: Optional[str] = None) → None[source]¶
- Overview:
Init the DQN (encoder + head) Model according to input arguments.
- Arguments:
obs_shape (Union[int, SequenceType]): Observation space shape, such as 8 or [4, 84, 84].
action_shape (Union[int, SequenceType]): Action space shape, such as 6 or [2, 3, 3].
encoder_hidden_size_list (SequenceType): Collection of hidden_size to pass to Encoder; the last element must match head_hidden_size.
dueling (bool): Whether to use DuelingHead or the plain DiscreteHead; the signature above defaults to True.
head_hidden_size (Optional[int]): The hidden_size of the head network.
head_layer_num (int): The number of layers used in the head network to compute the Q-value output.
activation (Optional[nn.Module]): The type of activation function in networks; if None, it defaults to nn.ReLU().
norm_type (Optional[str]): The type of normalization in networks, see ding.torch_utils.fc_block for more details.
- forward(x: torch.Tensor) → Dict[source]¶
- Overview:
DQN forward computation graph: takes an observation tensor as input and predicts q_value.
- Arguments:
x (torch.Tensor): Observation inputs.
- Returns:
outputs (Dict): DQN forward outputs, such as q_value.
- ReturnsKeys:
logit (torch.Tensor): Discrete Q-value output of each action dimension.
- Shapes:
x (torch.Tensor): \((B, N)\), where B is batch size and N is obs_shape.
logit (torch.FloatTensor): \((B, M)\), where B is batch size and M is action_shape.
- Examples:
>>> model = DQN(32, 6)  # arguments: 'obs_shape' and 'action_shape'
>>> inputs = torch.randn(4, 32)
>>> outputs = model(inputs)
>>> assert isinstance(outputs, dict) and outputs['logit'].shape == torch.Size([4, 6])
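The dueling argument above selects between DuelingHead and DiscreteHead. As a rough illustration of what a dueling head computes, here is a minimal pure-Python sketch of the standard dueling aggregation, Q(s, a) = V(s) + A(s, a) - mean over actions of A(s, a). The function name dueling_q_values is hypothetical, and this is not taken from ding's DuelingHead implementation, which may differ in detail.

```python
from typing import List


def dueling_q_values(state_value: float, advantages: List[float]) -> List[float]:
    """Combine V(s) and A(s, a) into Q(s, a) using the mean-advantage baseline.

    Hypothetical helper for illustration; not part of ding.
    """
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + a - mean_advantage for a in advantages]


# One state with three actions.
q = dueling_q_values(state_value=1.0, advantages=[0.5, -0.5, 0.0])
print(q)  # [1.5, 0.5, 1.0]
```

Subtracting the mean advantage keeps V and A identifiable: shifting every advantage by a constant leaves the resulting Q-values unchanged.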
C51DQN¶
- class ding.model.template.q_learning.C51DQN(obs_shape: Union[int, ding.utils.type_helper.SequenceType], action_shape: Union[int, ding.utils.type_helper.SequenceType], encoder_hidden_size_list: ding.utils.type_helper.SequenceType = [128, 128, 64], head_hidden_size: int = 64, head_layer_num: int = 1, activation: Optional[torch.nn.modules.module.Module] = ReLU(), norm_type: Optional[str] = None, v_min: Optional[float] = - 10, v_max: Optional[float] = 10, n_atom: Optional[int] = 51)[source]¶
- __init__(obs_shape: Union[int, ding.utils.type_helper.SequenceType], action_shape: Union[int, ding.utils.type_helper.SequenceType], encoder_hidden_size_list: ding.utils.type_helper.SequenceType = [128, 128, 64], head_hidden_size: int = 64, head_layer_num: int = 1, activation: Optional[torch.nn.modules.module.Module] = ReLU(), norm_type: Optional[str] = None, v_min: Optional[float] = - 10, v_max: Optional[float] = 10, n_atom: Optional[int] = 51) → None[source]¶
- Overview:
Init the C51 Model according to input arguments.
- Arguments:
obs_shape (Union[int, SequenceType]): Observation space shape.
action_shape (Union[int, SequenceType]): Action space shape.
encoder_hidden_size_list (SequenceType): Collection of hidden_size to pass to Encoder.
head_hidden_size (Optional[int]): The hidden_size to pass to Head.
head_layer_num (int): The number of layers used in the network to compute the Q-value output.
activation (Optional[nn.Module]): The type of activation function to use in MLP after layer_fn; if None, it defaults to nn.ReLU().
norm_type (Optional[str]): The type of normalization to use, see ding.torch_utils.fc_block for more details.
n_atom (Optional[int]): Number of atoms in the prediction distribution.
- forward(x: torch.Tensor) → Dict[source]¶
- Overview:
Use the observation tensor to predict C51DQN's output. The forward pass runs the input through C51DQN's encoder and then its head.
- Arguments:
x (torch.Tensor): The observation tensor of shape (B, N), where N is obs_shape.
- Returns:
outputs (Dict): Run with encoder and head. Return the result prediction dictionary.
- ReturnsKeys:
logit (torch.Tensor): Q-value logit tensor of size (B, M), where M is action_shape.
distribution (torch.Tensor): Distribution tensor of size (B, M, n_atom).
- Shapes:
x (torch.Tensor): \((B, N)\), where B is batch size and N is obs_shape.
logit (torch.FloatTensor): \((B, M)\), where M is action_shape.
distribution (torch.FloatTensor): \((B, M, P)\), where P is n_atom.
- Examples:
>>> model = C51DQN(128, 64)  # arguments: 'obs_shape' and 'action_shape'
>>> inputs = torch.randn(4, 128)
>>> outputs = model(inputs)
>>> assert isinstance(outputs, dict)
>>> # action_shape is 64
>>> assert outputs['logit'].shape == torch.Size([4, 64])
>>> # default n_atom: int = 51
>>> assert outputs['distribution'].shape == torch.Size([4, 64, 51])
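To make the roles of v_min, v_max and n_atom more concrete, the following pure-Python sketch builds the evenly spaced support used by categorical (C51-style) value distributions and reduces one action's distribution to an expected Q-value. The helper names make_support and expected_q are hypothetical, and the assumption that the scalar output is the expectation over atoms follows the C51 formulation rather than a reading of ding's source.

```python
from typing import List


def make_support(v_min: float, v_max: float, n_atom: int) -> List[float]:
    """Evenly spaced atom values z_0 .. z_{n_atom-1} covering [v_min, v_max]."""
    step = (v_max - v_min) / (n_atom - 1)
    return [v_min + i * step for i in range(n_atom)]


def expected_q(distribution: List[float], support: List[float]) -> float:
    """Expected return of one action: sum_i p_i * z_i."""
    return sum(p * z for p, z in zip(distribution, support))


# Defaults from the signature above: v_min=-10, v_max=10, n_atom=51.
support = make_support(v_min=-10.0, v_max=10.0, n_atom=51)
uniform = [1.0 / 51] * 51  # a uniform distribution over the 51 atoms
print(len(support))
print(expected_q(uniform, support))  # close to 0 for a symmetric support
```

With a (B, M, P) distribution tensor, the same reduction is applied independently to each of the B x M rows of length P.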
QRDQN¶
- class ding.model.template.q_learning.QRDQN(obs_shape: Union[int, ding.utils.type_helper.SequenceType], action_shape: Union[int, ding.utils.type_helper.SequenceType], encoder_hidden_size_list: ding.utils.type_helper.SequenceType = [128, 128, 64], head_hidden_size: Optional[int] = None, head_layer_num: int = 1, num_quantiles: int = 32, activation: Optional[torch.nn.modules.module.Module] = ReLU(), norm_type: Optional[str] = None)[source]¶
- __init__(obs_shape: Union[int, ding.utils.type_helper.SequenceType], action_shape: Union[int, ding.utils.type_helper.SequenceType], encoder_hidden_size_list: ding.utils.type_helper.SequenceType = [128, 128, 64], head_hidden_size: Optional[int] = None, head_layer_num: int = 1, num_quantiles: int = 32, activation: Optional[torch.nn.modules.module.Module] = ReLU(), norm_type: Optional[str] = None) → None[source]¶
- Overview:
Init the QRDQN Model according to input arguments.
- Arguments:
obs_shape (Union[int, SequenceType]): Observation space shape.
action_shape (Union[int, SequenceType]): Action space shape.
encoder_hidden_size_list (SequenceType): Collection of hidden_size to pass to Encoder.
head_hidden_size (Optional[int]): The hidden_size to pass to Head.
head_layer_num (int): The number of layers used in the network to compute the Q-value output.
num_quantiles (int): Number of quantiles in the prediction distribution.
activation (Optional[nn.Module]): The type of activation function to use in MLP after layer_fn; if None, it defaults to nn.ReLU().
norm_type (Optional[str]): The type of normalization to use, see ding.torch_utils.fc_block for more details.
- forward(x: torch.Tensor) → Dict[source]¶
- Overview:
Use the observation tensor to predict QRDQN's output. The forward pass runs the input through QRDQN's encoder and then its head.
- Arguments:
x (torch.Tensor): The observation tensor of shape (B, N), where N is obs_shape.
- Returns:
outputs (Dict): Run with encoder and head. Return the result prediction dictionary.
- ReturnsKeys:
logit (torch.Tensor): Q-value logit tensor of size (B, M), where M is action_shape.
q (torch.Tensor): Q-value tensor of size (B, M, num_quantiles).
tau (torch.Tensor): tau tensor of size (B, num_quantiles, 1).
- Shapes:
x (torch.Tensor): \((B, N)\), where B is batch size and N is obs_shape.
logit (torch.FloatTensor): \((B, M)\), where M is action_shape.
tau (torch.Tensor): \((B, Q, 1)\), where Q is num_quantiles.
- Examples:
>>> model = QRDQN(64, 64)  # arguments: 'obs_shape' and 'action_shape'
>>> inputs = torch.randn(4, 64)
>>> outputs = model(inputs)
>>> assert isinstance(outputs, dict)
>>> assert outputs['logit'].shape == torch.Size([4, 64])
>>> # default num_quantiles: int = 32
>>> assert outputs['q'].shape == torch.Size([4, 64, 32])
>>> assert outputs['tau'].shape == torch.Size([4, 32, 1])
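The relationship between the q, tau and logit outputs can be sketched in plain Python. In quantile regression DQN, the quantile fractions are usually the midpoints tau_i = (2i + 1) / (2N) of N equal-width bins, and the scalar Q-value of an action is the mean of its quantile values. The helper names below are hypothetical, and whether ding computes tau and logit exactly this way is an assumption.

```python
from typing import List


def quantile_midpoints(num_quantiles: int) -> List[float]:
    """Midpoints tau_i = (2i + 1) / (2N) of N equal-width bins on [0, 1]."""
    return [(2 * i + 1) / (2 * num_quantiles) for i in range(num_quantiles)]


def q_from_quantiles(quantile_values: List[float]) -> float:
    """Scalar Q-value of one action as the mean of its quantile values."""
    return sum(quantile_values) / len(quantile_values)


print(quantile_midpoints(4))  # [0.125, 0.375, 0.625, 0.875]
print(q_from_quantiles([1.0, 2.0, 3.0, 6.0]))  # 3.0
```

For the default num_quantiles of 32, quantile_midpoints returns 32 fractions, which matches the middle dimension of the tau shape asserted in the example above.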
IQN¶
- class ding.model.template.q_learning.IQN(obs_shape: Union[int, ding.utils.type_helper.SequenceType], action_shape: Union[int, ding.utils.type_helper.SequenceType], encoder_hidden_size_list: ding.utils.type_helper.SequenceType = [128, 128, 64], head_hidden_size: Optional[int] = None, head_layer_num: int = 1, num_quantiles: int = 32, quantile_embedding_size: int = 128, activation: Optional[torch.nn.modules.module.Module] = ReLU(), norm_type: Optional[str] = None)[source]¶
- __init__(obs_shape: Union[int, ding.utils.type_helper.SequenceType], action_shape: Union[int, ding.utils.type_helper.SequenceType], encoder_hidden_size_list: ding.utils.type_helper.SequenceType = [128, 128, 64], head_hidden_size: Optional[int] = None, head_layer_num: int = 1, num_quantiles: int = 32, quantile_embedding_size: int = 128, activation: Optional[torch.nn.modules.module.Module] = ReLU(), norm_type: Optional[str] = None) → None[source]¶
- Overview:
Init the IQN Model according to input arguments.
- Arguments:
obs_shape (Union[int, SequenceType]): Observation space shape.
action_shape (Union[int, SequenceType]): Action space shape.
encoder_hidden_size_list (SequenceType): Collection of hidden_size to pass to Encoder.
head_hidden_size (Optional[int]): The hidden_size to pass to Head.
head_layer_num (int): The number of layers used in the network to compute the Q-value output.
num_quantiles (int): Number of quantiles in the prediction distribution.
activation (Optional[nn.Module]): The type of activation function to use in MLP after layer_fn; if None, it defaults to nn.ReLU().
norm_type (Optional[str]): The type of normalization to use, see ding.torch_utils.fc_block for more details.
- forward(x: torch.Tensor) → Dict[source]¶
- Overview:
Use the observation tensor to predict IQN's output. The forward pass runs the input through IQN's encoder and then its head.
- Arguments:
x (torch.Tensor): The observation tensor of shape (B, N), where N is obs_shape.
- Returns:
outputs (Dict): Run with encoder and head. Return the result prediction dictionary.
- ReturnsKeys:
logit (torch.Tensor): Q-value logit tensor of size (B, M), where M is action_shape.
q (torch.Tensor): Q-value tensor of size (num_quantiles, B, M).
quantiles (torch.Tensor): quantiles tensor of size (quantile_embedding_size, 1).
- Shapes:
x (torch.Tensor): \((B, N)\), where B is batch size and N is obs_shape.
logit (torch.FloatTensor): \((B, M)\), where M is action_shape.
quantiles (torch.Tensor): \((P, 1)\), where P is quantile_embedding_size.
- Examples:
>>> model = IQN(64, 64)  # arguments: 'obs_shape' and 'action_shape'
>>> inputs = torch.randn(4, 64)
>>> outputs = model(inputs)
>>> assert isinstance(outputs, dict)
>>> assert outputs['logit'].shape == torch.Size([4, 64])
>>> # default num_quantiles: int = 32
>>> assert outputs['q'].shape == torch.Size([32, 4, 64])
>>> # default quantile_embedding_size: int = 128
>>> assert outputs['quantiles'].shape == torch.Size([128, 1])
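For readers unfamiliar with what a quantile embedding of size quantile_embedding_size looks like, the sketch below shows the cosine feature map described in the IQN paper, where a sampled fraction tau is expanded into cos(pi * j * tau) for j = 0 .. n-1 before a learned linear layer. The function name cosine_embedding is hypothetical, the learned layer is omitted, and it is an assumption that ding's IQN head follows the paper's formulation.

```python
import math
from typing import List


def cosine_embedding(tau: float, embedding_size: int) -> List[float]:
    """Cosine features [cos(pi * j * tau) for j in 0 .. embedding_size-1].

    Hypothetical helper for illustration; the learned linear layer and
    activation that normally follow these features are left out.
    """
    return [math.cos(math.pi * j * tau) for j in range(embedding_size)]


# Default quantile_embedding_size from the signature above is 128.
features = cosine_embedding(tau=0.25, embedding_size=128)
print(len(features))  # 128
print(features[0])    # 1.0, because cos(0) == 1 for any tau
```

In an implicit quantile network these features are typically combined element-wise with the state embedding, so that one network can be queried at any quantile fraction.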
RainbowDQN¶
- class ding.model.template.q_learning.RainbowDQN(obs_shape: Union[int, ding.utils.type_helper.SequenceType], action_shape: Union[int, ding.utils.type_helper.SequenceType], encoder_hidden_size_list: ding.utils.type_helper.SequenceType = [128, 128, 64], head_hidden_size: Optional[int] = None, head_layer_num: int = 1, activation: Optional[torch.nn.modules.module.Module] = ReLU(), norm_type: Optional[str] = None, v_min: Optional[float] = - 10, v_max: Optional[float] = 10, n_atom: Optional[int] = 51)[source]¶
- Overview:
RainbowDQN network (C51 + Dueling + Noisy Block)
Note
RainbowDQN contains dueling architecture by default
- __init__(obs_shape: Union[int, ding.utils.type_helper.SequenceType], action_shape: Union[int, ding.utils.type_helper.SequenceType], encoder_hidden_size_list: ding.utils.type_helper.SequenceType = [128, 128, 64], head_hidden_size: Optional[int] = None, head_layer_num: int = 1, activation: Optional[torch.nn.modules.module.Module] = ReLU(), norm_type: Optional[str] = None, v_min: Optional[float] = - 10, v_max: Optional[float] = 10, n_atom: Optional[int] = 51) → None[source]¶
- Overview:
Init the Rainbow Model according to arguments.
- Arguments:
obs_shape (Union[int, SequenceType]): Observation space shape.
action_shape (Union[int, SequenceType]): Action space shape.
encoder_hidden_size_list (SequenceType): Collection of hidden_size to pass to Encoder.
head_hidden_size (Optional[int]): The hidden_size to pass to Head.
head_layer_num (int): The number of layers used in the network to compute the Q-value output.
activation (Optional[nn.Module]): The type of activation function to use in MLP after layer_fn; if None, it defaults to nn.ReLU().
norm_type (Optional[str]): The type of normalization to use, see ding.torch_utils.fc_block for more details.
n_atom (Optional[int]): Number of atoms in the prediction distribution.
- forward(x: torch.Tensor) → Dict[source]¶
- Overview:
Use the observation tensor to predict Rainbow's output. The forward pass runs the input through Rainbow's encoder and then its head.
- Arguments:
x (torch.Tensor): The observation tensor of shape (B, N), where N is obs_shape.
- Returns:
outputs (Dict): Run MLP with RainbowHead setups and return the result prediction dictionary.
- ReturnsKeys:
logit (torch.Tensor): Q-value logit tensor of size (B, M), where M is action_shape.
distribution (torch.Tensor): Distribution tensor of size (B, M, n_atom).
- Shapes:
x (torch.Tensor): \((B, N)\), where B is batch size and N is obs_shape.
logit (torch.FloatTensor): \((B, M)\), where M is action_shape.
distribution (torch.FloatTensor): \((B, M, P)\), where P is n_atom.
- Examples:
>>> model = RainbowDQN(64, 64)  # arguments: 'obs_shape' and 'action_shape'
>>> inputs = torch.randn(4, 64)
>>> outputs = model(inputs)
>>> assert isinstance(outputs, dict)
>>> assert outputs['logit'].shape == torch.Size([4, 64])
>>> # default n_atom: int = 51
>>> assert outputs['distribution'].shape == torch.Size([4, 64, 51])
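Since RainbowDQN is described above as C51 plus a dueling architecture, it may help to see how the two can be combined. The pure-Python sketch below applies the dueling aggregation per atom and then a softmax over atoms, giving one categorical distribution per action. The function names are hypothetical, noisy layers are omitted, and this follows the general Rainbow formulation rather than ding's RainbowHead code.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dueling_distribution(
    value_logits: List[float], advantage_logits: List[List[float]]
) -> List[List[float]]:
    """Per-action atom distributions from value and advantage streams.

    value_logits has length n_atom; advantage_logits is n_action x n_atom.
    """
    n_action = len(advantage_logits)
    n_atom = len(value_logits)
    # Mean advantage over actions, computed separately for each atom.
    mean_adv = [
        sum(advantage_logits[a][i] for a in range(n_action)) / n_action
        for i in range(n_atom)
    ]
    return [
        softmax([value_logits[i] + advantage_logits[a][i] - mean_adv[i]
                 for i in range(n_atom)])
        for a in range(n_action)
    ]


dist = dueling_distribution([0.0, 0.0, 0.0], [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
print(len(dist), len(dist[0]))  # 2 actions, 3 atoms each
```

Each row sums to 1, so the result has the same layout as one batch element of the (B, M, P) distribution output.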
DRQN¶