template.QAC
Please refer to ding/model/template/QAC.py for usage.
QAC
- class ding.model.template.QAC(obs_shape: Union[int, ding.utils.type_helper.SequenceType], action_shape: Union[int, ding.utils.type_helper.SequenceType], actor_head_type: str, twin_critic: bool = False, actor_head_hidden_size: int = 64, actor_head_layer_num: int = 1, critic_head_hidden_size: int = 64, critic_head_layer_num: int = 1, activation: Optional[torch.nn.modules.module.Module] = ReLU(), norm_type: Optional[str] = None)
- Overview:
The QAC model.
- Interfaces:
__init__,forward,compute_actor,compute_critic
- __init__(obs_shape: Union[int, ding.utils.type_helper.SequenceType], action_shape: Union[int, ding.utils.type_helper.SequenceType], actor_head_type: str, twin_critic: bool = False, actor_head_hidden_size: int = 64, actor_head_layer_num: int = 1, critic_head_hidden_size: int = 64, critic_head_layer_num: int = 1, activation: Optional[torch.nn.modules.module.Module] = ReLU(), norm_type: Optional[str] = None) → None
- Overview:
Initialize the QAC model according to the arguments.
- Arguments:
- obs_shape (Union[int, SequenceType]): Shape of the observation space.
- action_shape (Union[int, SequenceType]): Shape of the action space.
- actor_head_type (str): Type of the actor head, either regression or reparameterization.
- twin_critic (bool): Whether to include a twin critic.
- actor_head_hidden_size (Optional[int]): The hidden_size to pass to the actor network's Head.
- actor_head_layer_num (int): The number of layers used in the actor network's Head.
- critic_head_hidden_size (Optional[int]): The hidden_size to pass to the critic network's Head.
- critic_head_layer_num (int): The number of layers used in the critic network's Head to compute the Q-value output.
- activation (Optional[nn.Module]): The activation function to use in the MLP after layer_fn; if None, it defaults to nn.ReLU().
- norm_type (Optional[str]): The type of normalization to use; see ding.torch_utils.fc_block for more details.
- compute_actor(inputs: torch.Tensor) → Dict
- Overview:
Use the encoded embedding tensor to predict the actor output. This runs the forward pass in 'compute_actor' mode.
- Arguments:
- inputs (torch.Tensor): The encoded embedding tensor, determined by the given hidden_size, i.e. (B, N=hidden_size), where hidden_size = actor_head_hidden_size.
- mode (str): Name of the forward mode; it is supplied when calling through forward.
- Returns:
- outputs (Dict): Outputs of the forward pass through the encoder and head.
- ReturnKeys (either):
- action (torch.Tensor): Continuous action tensor with the same size as action_shape.
- logit (torch.Tensor): Logit tensor encoding mu and sigma, both with the same size as the input x.
- Shapes:
- inputs (torch.Tensor): \((B, N0)\), where B is batch size and N0 corresponds to hidden_size.
- action (torch.Tensor): \((B, N0)\)
- logit (list): 2 elements, mu and sigma, each of shape \((B, N0)\).
- q_value (torch.FloatTensor): \((B, )\), where B is batch size.
- Examples:
>>> import torch
>>> from ding.model.template import QAC
>>> # Regression mode
>>> model = QAC(64, 64, 'regression')
>>> inputs = torch.randn(4, 64)
>>> actor_outputs = model(inputs, 'compute_actor')
>>> assert actor_outputs['action'].shape == torch.Size([4, 64])
>>> # Reparameterization mode
>>> model = QAC(64, 64, 'reparameterization')
>>> inputs = torch.randn(4, 64)
>>> actor_outputs = model(inputs, 'compute_actor')
>>> actor_outputs['logit'][0].shape  # mu
torch.Size([4, 64])
>>> actor_outputs['logit'][1].shape  # sigma
torch.Size([4, 64])
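In reparameterization mode the actor returns mu and sigma rather than an action. How a policy consumes them is not stated on this page; a common convention, assumed here, is to sample an action as mu + sigma * eps with eps drawn from a standard normal. The sketch below shows that arithmetic on plain Python lists so it runs without PyTorch. The helper name reparameterized_sample is hypothetical and not part of DI-engine.

```python
import random


def reparameterized_sample(mu, sigma, rng):
    """Return one sample per element as mu + sigma * eps, with eps ~ N(0, 1).

    mu and sigma are nested lists of shape (B, N), standing in for
    actor_outputs['logit'][0] and actor_outputs['logit'][1].
    """
    return [
        [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu_row, sigma_row)]
        for mu_row, sigma_row in zip(mu, sigma)
    ]


rng = random.Random(0)  # seeded generator so repeated runs draw the same noise
mu = [[0.0, 1.0, -1.0], [0.5, 0.5, 0.5]]     # batch size 2, action size 3
sigma = [[1.0, 0.1, 0.0], [0.2, 0.2, 0.2]]   # sigma of 0.0 makes that entry deterministic
action = reparameterized_sample(mu, sigma, rng)
```

The sampled action has the same (B, N) layout as mu and sigma, and an entry whose sigma is zero collapses to its mean.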
- compute_critic(inputs: Dict) → Dict
- Overview:
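Use the encoded embedding tensor to predict the critic output. This runs the forward pass in 'compute_critic' mode.
- Arguments:
- inputs (Dict): Necessary keys: obs and action, both encoded tensors.
- mode (str): Name of the forward mode; it is supplied when calling through forward.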
- Returns:
- outputs (Dict): Q-value output.
- ReturnKeys:
- q_value (torch.Tensor): Q-value tensor with the same size as the batch size.
- Shapes:
- obs (torch.Tensor): \((B, N1)\), where B is batch size and N1 is obs_shape.
- action (torch.Tensor): \((B, N2)\), where B is batch size and N2 is action_shape.
- q_value (torch.FloatTensor): \((B, )\), where B is batch size.
- Examples:
>>> import torch
>>> from ding.model.template import QAC
>>> N = 8  # example observation size, chosen only for illustration
>>> inputs = {'obs': torch.randn(4, N), 'action': torch.randn(4, 1)}
>>> model = QAC(obs_shape=(N, ), action_shape=1, actor_head_type='regression')
>>> model(inputs, mode='compute_critic')['q_value']  # q value
tensor([0.0773, 0.1639, 0.0917, 0.0370], grad_fn=<SqueezeBackward1>)
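The shape contract above, (B, N1) and (B, N2) in and (B,) out, can be traced with a small dependency-free sketch. It assumes the critic joins obs and action along the feature dimension before scoring, which this page does not state, and it uses a single weighted sum in place of the real MLP. The function toy_critic and its weights are hypothetical.

```python
def toy_critic(inputs, weights, bias=0.0):
    """Map {'obs': (B, N1), 'action': (B, N2)} to {'q_value': (B,)}.

    Assumption: obs and action are concatenated per row, then reduced to
    one scalar by a weighted sum standing in for the critic network.
    """
    obs, action = inputs['obs'], inputs['action']
    assert len(obs) == len(action), "obs and action must share the batch size"
    q_value = []
    for obs_row, action_row in zip(obs, action):
        features = list(obs_row) + list(action_row)  # length N1 + N2
        assert len(features) == len(weights)
        q_value.append(sum(w * x for w, x in zip(weights, features)) + bias)
    return {'q_value': q_value}


inputs = {
    'obs': [[1.0, 2.0], [0.0, 0.0], [3.0, -1.0]],  # B = 3, N1 = 2
    'action': [[0.5], [1.0], [0.0]],               # B = 3, N2 = 1
}
outputs = toy_critic(inputs, weights=[1.0, 1.0, 2.0])
```

Each batch row yields exactly one Q-value, so the result has length B regardless of N1 and N2.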
- forward(inputs: Union[torch.Tensor, Dict], mode: str) → Dict
- Overview:
Use observation and action tensors to predict output, running the forward pass of QAC's MLPs in the given mode.
- Arguments:
- Forward with 'compute_actor':
  - inputs (torch.Tensor): The encoded embedding tensor, determined by the given hidden_size, i.e. (B, N=hidden_size). Whether hidden_size is actor_head_hidden_size or critic_head_hidden_size depends on mode.
- Forward with 'compute_critic':
  - inputs (Dict): Necessary keys: obs and action, both encoded tensors.
- mode (str): Name of the forward mode.
- Returns:
- outputs (Dict): Outputs of the network forward pass.
- Forward with 'compute_actor', necessary keys (either):
  - action (torch.Tensor): Action tensor with the same size as the input x.
  - logit (torch.Tensor): Logit tensor encoding mu and sigma, both with the same size as the input x.
- Forward with 'compute_critic', necessary keys:
  - q_value (torch.Tensor): Q-value tensor with the same size as the batch size.
- Actor Shapes:
- inputs (torch.Tensor): \((B, N0)\), where B is batch size and N0 corresponds to hidden_size.
- action (torch.Tensor): \((B, N0)\)
- q_value (torch.FloatTensor): \((B, )\), where B is batch size.
- Critic Shapes:
- obs (torch.Tensor): \((B, N1)\), where B is batch size and N1 is obs_shape.
- action (torch.Tensor): \((B, N2)\), where B is batch size and N2 is action_shape.
- logit (torch.FloatTensor): \((B, N2)\), where B is batch size and N2 is action_shape.
- Actor Examples:
>>> import torch
>>> from ding.model.template import QAC
>>> # Regression mode
>>> model = QAC(64, 64, 'regression')
>>> inputs = torch.randn(4, 64)
>>> actor_outputs = model(inputs, 'compute_actor')
>>> assert actor_outputs['action'].shape == torch.Size([4, 64])
>>> # Reparameterization mode
>>> model = QAC(64, 64, 'reparameterization')
>>> inputs = torch.randn(4, 64)
>>> actor_outputs = model(inputs, 'compute_actor')
>>> actor_outputs['logit'][0].shape  # mu
torch.Size([4, 64])
>>> actor_outputs['logit'][1].shape  # sigma
torch.Size([4, 64])
- Critic Examples:
>>> import torch
>>> from ding.model.template import QAC
>>> N = 8  # example observation size, chosen only for illustration
>>> inputs = {'obs': torch.randn(4, N), 'action': torch.randn(4, 1)}
>>> model = QAC(obs_shape=(N, ), action_shape=1, actor_head_type='regression')
>>> model(inputs, mode='compute_critic')['q_value']  # q value
tensor([0.0773, 0.1639, 0.0917, 0.0370], grad_fn=<SqueezeBackward1>)
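The forward method selects between compute_actor and compute_critic based on the mode string. One plausible way to route such calls, shown here as a dependency-free sketch and not as DI-engine's actual implementation, is to validate the mode against a fixed list and look the method up by name. The class ToyQAC and its placeholder return values are hypothetical.

```python
class ToyQAC:
    """Hypothetical stand-in that shows mode-based dispatch only."""

    modes = ('compute_actor', 'compute_critic')

    def forward(self, inputs, mode):
        # Reject unknown modes early so a typo fails with a clear message.
        if mode not in self.modes:
            raise ValueError(f"unsupported forward mode: {mode!r}")
        return getattr(self, mode)(inputs)

    def compute_actor(self, inputs):
        # Placeholder: echo the inputs where a real actor head would run.
        return {'action': inputs}

    def compute_critic(self, inputs):
        # Placeholder: one scalar per batch row where a real critic would run.
        return {'q_value': [0.0 for _ in inputs['obs']]}


model = ToyQAC()
actor_out = model.forward([[0.1, 0.2]], mode='compute_actor')
critic_out = model.forward({'obs': [[0.1, 0.2]], 'action': [[0.0]]}, mode='compute_critic')
```

Keeping the allowed modes in one tuple means the input type expected by forward (a tensor for the actor, a dict for the critic) is decided entirely by the mode argument.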