template.QAC

Please refer to ding/model/template/QAC.py for usage.

QAC

class ding.model.template.QAC(obs_shape: Union[int, ding.utils.type_helper.SequenceType], action_shape: Union[int, ding.utils.type_helper.SequenceType], actor_head_type: str, twin_critic: bool = False, actor_head_hidden_size: int = 64, actor_head_layer_num: int = 1, critic_head_hidden_size: int = 64, critic_head_layer_num: int = 1, activation: Optional[torch.nn.modules.module.Module] = ReLU(), norm_type: Optional[str] = None)

Overview: The QAC model.
Interfaces: __init__, forward, compute_actor, compute_critic

__init__(obs_shape: Union[int, ding.utils.type_helper.SequenceType], action_shape: Union[int, ding.utils.type_helper.SequenceType], actor_head_type: str, twin_critic: bool = False, actor_head_hidden_size: int = 64, actor_head_layer_num: int = 1, critic_head_hidden_size: int = 64, critic_head_layer_num: int = 1, activation: Optional[torch.nn.modules.module.Module] = ReLU(), norm_type: Optional[str] = None) → None

Overview:

Initialize the QAC model according to the arguments.

Arguments:

obs_shape (Union[int, SequenceType]): Shape of the observation space.
action_shape (Union[int, SequenceType]): Shape of the action space.
actor_head_type (str): Type of the actor head, either 'regression' or 'reparameterization'.
twin_critic (bool): Whether to include a twin critic.
actor_head_hidden_size (int): The hidden size passed to the actor network's head.
actor_head_layer_num (int):
The number of layers used in the actor network's head to compute its output.
critic_head_hidden_size (int): The hidden size passed to the critic network's head.
critic_head_layer_num (int):
The number of layers used in the critic network's head to compute the Q-value output.
activation (Optional[nn.Module]):
The activation function used in the MLP after layer_fn; if None, it defaults to nn.ReLU().
norm_type (Optional[str]):
The type of normalization to use; see ding.torch_utils.fc_block for more details.
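The argument list above can be collected and checked before a model is built. The following is a minimal sketch, not part of ding: `QACArgs` is a hypothetical helper that mirrors the documented names and defaults and rejects any actor_head_type other than the two values the documentation names. The activation argument is left out because it is a torch module and this sketch stays within the standard library.

```python
from dataclasses import asdict, dataclass
from typing import Optional, Sequence, Union


@dataclass
class QACArgs:
    """Hypothetical container mirroring the documented QAC arguments."""
    obs_shape: Union[int, Sequence[int]]
    action_shape: Union[int, Sequence[int]]
    actor_head_type: str
    twin_critic: bool = False
    actor_head_hidden_size: int = 64
    actor_head_layer_num: int = 1
    critic_head_hidden_size: int = 64
    critic_head_layer_num: int = 1
    norm_type: Optional[str] = None

    def __post_init__(self) -> None:
        # The documentation names exactly two actor head types.
        if self.actor_head_type not in ("regression", "reparameterization"):
            raise ValueError(f"unknown actor_head_type: {self.actor_head_type!r}")


args = QACArgs(obs_shape=(8,), action_shape=2, actor_head_type="regression", twin_critic=True)
# Plain dict of keyword arguments, using the documented parameter names.
kwargs = asdict(args)
```

Under the signature shown above, such a dict would be expected to be usable as `QAC(**kwargs)`, although that call is not exercised here.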

compute_actor(inputs: torch.Tensor) → Dict

Overview:

Run the forward pass in 'compute_actor' mode: use the encoded embedding tensor to predict the actor output.

Arguments:

inputs (torch.Tensor):
The encoded embedding tensor of shape (B, N), where N = hidden_size and hidden_size = actor_head_hidden_size.
mode (str): Name of the forward mode. Only used when calling through forward; compute_actor itself takes inputs alone.

Returns:

outputs (Dict): Outputs of forward pass encoder and head.

ReturnKeys (either, depending on actor_head_type):

action (torch.Tensor): Continuous action tensor with the same size as action_shape.
logit (List[torch.Tensor]):
Two tensors encoding mu and sigma, each with the same size as the input.

Shapes:

inputs (torch.Tensor): \((B, N0)\), where B is the batch size and N0 corresponds to hidden_size
action (torch.Tensor): \((B, N0)\)
logit (list): 2 elements, mu and sigma, each of shape \((B, N0)\).

Examples:

>>> import torch
>>> from ding.model.template import QAC
>>> # Regression mode
>>> model = QAC(64, 64, 'regression')
>>> inputs = torch.randn(4, 64)
>>> actor_outputs = model(inputs, 'compute_actor')
>>> assert actor_outputs['action'].shape == torch.Size([4, 64])
>>> # Reparameterization mode
>>> model = QAC(64, 64, 'reparameterization')
>>> inputs = torch.randn(4, 64)
>>> actor_outputs = model(inputs, 'compute_actor')
>>> actor_outputs['logit'][0].shape  # mu
torch.Size([4, 64])
>>> actor_outputs['logit'][1].shape  # sigma
torch.Size([4, 64])
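In reparameterization mode the actor returns mu and sigma rather than an action, so the caller has to draw the action itself. The sketch below shows one common way to do that with the reparameterization trick, on plain Python lists standing in for one row of each tensor. The function name `sample_action` and the tanh squashing are assumptions for illustration (squashing is common in SAC-style policies); this documentation does not say how ding consumes logit.

```python
import math
import random
from typing import List, Optional


def sample_action(mu: List[float], sigma: List[float],
                  rng: Optional[random.Random] = None) -> List[float]:
    """Draw one action from per-dimension Gaussians via the reparameterization trick."""
    if rng is None:
        rng = random.Random()
    # eps ~ N(0, 1) carries all the randomness; mu and sigma enter deterministically.
    eps = [rng.gauss(0.0, 1.0) for _ in mu]
    raw = [m + s * e for m, s, e in zip(mu, sigma, eps)]
    # Assumption: squash to [-1, 1] with tanh, as SAC-style policies often do.
    return [math.tanh(x) for x in raw]


# One row of logit[0] (mu) and logit[1] (sigma); sigma = 0 makes the draw deterministic.
mu, sigma = [0.5, -1.0], [0.0, 0.0]
action = sample_action(mu, sigma, random.Random(0))

# With a non-zero sigma the result varies with the random draw.
noisy_action = sample_action([0.5, -1.0], [1.0, 1.0], random.Random(0))
```

Writing the sample as mu + sigma * eps keeps the action a deterministic function of mu and sigma for a fixed eps, which is what allows gradients to flow back to the actor in a tensor implementation.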

compute_critic(inputs: Dict) → Dict

Overview:

Run the forward pass in 'compute_critic' mode: use the encoded observation and action tensors to predict the Q-value.

Arguments:

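inputs (Dict): Dict with the keys obs and action, holding the encoded observation and action tensors.
mode (str): Name of the forward mode. Only used when calling through forward; compute_critic itself takes inputs alone.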

Returns:

outputs (Dict): Q-value output.

ReturnKeys:

q_value (torch.Tensor): Q value tensor with same size as batch size.

Shapes:

obs (torch.Tensor): \((B, N1)\), where B is batch size and N1 is obs_shape
action (torch.Tensor): \((B, N2)\), where B is batch size and N2 is action_shape
q_value (torch.FloatTensor): \((B, )\), where B is batch size.

Examples:

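>>> import torch
>>> from ding.model.template import QAC
>>> N = 8  # illustrative observation dimension; the original leaves N unspecified
>>> inputs = {'obs': torch.randn(4, N), 'action': torch.randn(4, 1)}
>>> model = QAC(obs_shape=(N, ), action_shape=1, actor_head_type='regression')
>>> model(inputs, mode='compute_critic')['q_value']  # q value; exact numbers vary per run
tensor([0.0773, 0.1639, 0.0917, 0.0370], grad_fn=<SqueezeBackward1>)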
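The shapes above, obs \((B, N1)\) and action \((B, N2)\) going in and q_value \((B, )\) coming out, can be traced with a small sketch. It assumes the critic joins each observation row with its action row along the feature dimension before its MLP, which this documentation does not state. `critic_input` and `toy_critic` are hypothetical stand-ins working on nested lists, not ding functions.

```python
from typing import List


def critic_input(obs: List[List[float]], action: List[List[float]]) -> List[List[float]]:
    """Concatenate row-wise: (B, N1) and (B, N2) become (B, N1 + N2)."""
    if len(obs) != len(action):
        raise ValueError("obs and action must share the batch dimension B")
    return [o + a for o, a in zip(obs, action)]


def toy_critic(x: List[List[float]]) -> List[float]:
    """Stand-in for the critic head: one scalar per row, so the result has shape (B,)."""
    return [sum(row) / len(row) for row in x]


obs = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]  # B = 2, N1 = 3
action = [[1.0], [-2.0]]                  # B = 2, N2 = 1
x = critic_input(obs, action)             # B = 2, N1 + N2 = 4
q_value = toy_critic(x)                   # one value per batch element
```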

forward(inputs: Union[torch.Tensor, Dict], mode: str) → Dict

Overview:

Use the observation and action tensors to predict the output, running the forward pass through QAC's MLPs in the given mode.

Arguments:

Forward with 'compute_actor':

inputs (torch.Tensor):
The encoded embedding tensor of shape (B, N), where N = hidden_size. Whether hidden_size is actor_head_hidden_size or critic_head_hidden_size depends on mode.

Forward with 'compute_critic', inputs (Dict) Necessary Keys:

obs, action: the encoded observation and action tensors.

mode (str): Name of the forward mode.

Returns:

outputs (Dict): Outputs of network forward.
Forward with 'compute_actor', Necessary Keys (either):
action (torch.Tensor): Action tensor with the same size as the input.
logit (List[torch.Tensor]):
Two tensors encoding mu and sigma, each with the same size as the input.
Forward with 'compute_critic', Necessary Keys:
q_value (torch.Tensor): Q value tensor with the same size as the batch size.

Actor Shapes:

inputs (torch.Tensor): \((B, N0)\), where B is the batch size and N0 corresponds to hidden_size
action (torch.Tensor): \((B, N0)\)
logit (list): 2 elements, mu and sigma, each of shape \((B, N0)\)

Critic Shapes:

obs (torch.Tensor): \((B, N1)\), where B is the batch size and N1 is obs_shape
action (torch.Tensor): \((B, N2)\), where B is the batch size and N2 is action_shape
q_value (torch.FloatTensor): \((B, )\), where B is the batch size.

Actor Examples:

>>> import torch
>>> from ding.model.template import QAC
>>> # Regression mode
>>> model = QAC(64, 64, 'regression')
>>> inputs = torch.randn(4, 64)
>>> actor_outputs = model(inputs, 'compute_actor')
>>> assert actor_outputs['action'].shape == torch.Size([4, 64])
>>> # Reparameterization mode
>>> model = QAC(64, 64, 'reparameterization')
>>> inputs = torch.randn(4, 64)
>>> actor_outputs = model(inputs, 'compute_actor')
>>> actor_outputs['logit'][0].shape  # mu
torch.Size([4, 64])
>>> actor_outputs['logit'][1].shape  # sigma
torch.Size([4, 64])

Critic Examples:

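>>> import torch
>>> from ding.model.template import QAC
>>> N = 8  # illustrative observation dimension; the original leaves N unspecified
>>> inputs = {'obs': torch.randn(4, N), 'action': torch.randn(4, 1)}
>>> model = QAC(obs_shape=(N, ), action_shape=1, actor_head_type='regression')
>>> model(inputs, mode='compute_critic')['q_value']  # q value; exact numbers vary per run
tensor([0.0773, 0.1639, 0.0917, 0.0370], grad_fn=<SqueezeBackward1>)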
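Since forward takes a mode string naming one of the other two interfaces, it can be read as a dispatcher. The sketch below shows that pattern in plain Python. It assumes the mode name is resolved to the method of the same name, which matches the documented interfaces but is not confirmed here as ding's actual implementation; `ModeDispatchModel` and its placeholder computations are hypothetical.

```python
from typing import Any, Dict


class ModeDispatchModel:
    """Toy stand-in for a model with named forward modes; not the ding implementation."""

    mode = ["compute_actor", "compute_critic"]

    def forward(self, inputs: Any, mode: str) -> Dict[str, Any]:
        # Reject unknown modes early, then call the method with the same name.
        if mode not in self.mode:
            raise ValueError(f"unsupported mode {mode!r}, expected one of {self.mode}")
        return getattr(self, mode)(inputs)

    def compute_actor(self, inputs: Any) -> Dict[str, Any]:
        # Placeholder computation standing in for the actor network.
        return {"action": [2.0 * v for v in inputs]}

    def compute_critic(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        # Placeholder computation standing in for the critic network.
        return {"q_value": sum(inputs["obs"]) + sum(inputs["action"])}


model = ModeDispatchModel()
actor_out = model.forward([1.0, 2.0], "compute_actor")
critic_out = model.forward({"obs": [1.0, 2.0], "action": [0.5]}, "compute_critic")
```

Keeping both modes behind a single forward lets one module serve actor and critic calls, with the input type (tensor or dict) chosen to match the mode.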