template.QAC

Please refer to ding/model/template/QAC.py for usage.

QAC

class ding.model.template.QAC(obs_shape: Union[int, ding.utils.type_helper.SequenceType], action_shape: Union[int, ding.utils.type_helper.SequenceType], actor_head_type: str, twin_critic: bool = False, actor_head_hidden_size: int = 64, actor_head_layer_num: int = 1, critic_head_hidden_size: int = 64, critic_head_layer_num: int = 1, activation: Optional[torch.nn.modules.module.Module] = ReLU(), norm_type: Optional[str] = None)
Overview:

The QAC model.

Interfaces:

__init__, forward, compute_actor, compute_critic

__init__(obs_shape: Union[int, ding.utils.type_helper.SequenceType], action_shape: Union[int, ding.utils.type_helper.SequenceType], actor_head_type: str, twin_critic: bool = False, actor_head_hidden_size: int = 64, actor_head_layer_num: int = 1, critic_head_hidden_size: int = 64, critic_head_layer_num: int = 1, activation: Optional[torch.nn.modules.module.Module] = ReLU(), norm_type: Optional[str] = None) → None
Overview:

Initialize the QAC model according to the arguments.

Arguments:
  • obs_shape (Union[int, SequenceType]): Observation’s space.

  • action_shape (Union[int, SequenceType]): Action’s space.

  • actor_head_type (str): Which actor head to use, 'regression' or 'reparameterization'.

  • twin_critic (bool): Whether to include a twin critic.

  • actor_head_hidden_size (Optional[int]): The hidden_size to pass to the actor network’s head.

  • actor_head_layer_num (int):

    The number of layers used in the actor network’s head to compute its output.

  • critic_head_hidden_size (Optional[int]): The hidden_size to pass to the critic network’s head.

  • critic_head_layer_num (int):

    The number of layers used in the critic network’s head to compute the Q-value output.

  • activation (Optional[nn.Module]):

    The activation function to use in the MLP after layer_fn; if None, it defaults to nn.ReLU().

  • norm_type (Optional[str]):

    The type of normalization to use, see ding.torch_utils.fc_block for more details.
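To make the actor_head_type choice concrete, here is a minimal sketch in plain PyTorch. It is not the ding implementation: the class names RegressionHeadSketch and ReparameterizationHeadSketch, the tanh bound, and the log-sigma clamp range are all assumptions for illustration. Only the output keys (action, and logit as a [mu, sigma] pair) follow the documentation on this page.

```python
import torch
import torch.nn as nn


class RegressionHeadSketch(nn.Module):
    """Hypothetical stand-in: maps an embedding directly to an action."""

    def __init__(self, hidden_size: int, action_shape: int) -> None:
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hidden_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, action_shape),
            nn.Tanh(),  # assumption: actions are bounded in [-1, 1]
        )

    def forward(self, x: torch.Tensor) -> dict:
        return {'action': self.net(x)}


class ReparameterizationHeadSketch(nn.Module):
    """Hypothetical stand-in: maps an embedding to Gaussian parameters."""

    def __init__(self, hidden_size: int, action_shape: int) -> None:
        super().__init__()
        self.body = nn.Sequential(nn.Linear(hidden_size, hidden_size), nn.ReLU())
        self.mu = nn.Linear(hidden_size, action_shape)
        self.log_sigma = nn.Linear(hidden_size, action_shape)

    def forward(self, x: torch.Tensor) -> dict:
        h = self.body(x)
        mu = self.mu(h)
        # assumption: clamp log-sigma for numerical stability, then exponentiate
        sigma = torch.exp(torch.clamp(self.log_sigma(h), -20.0, 2.0))
        return {'logit': [mu, sigma]}


x = torch.randn(4, 64)
regression_out = RegressionHeadSketch(64, 6)(x)
reparam_out = ReparameterizationHeadSketch(64, 6)(x)
```

The regression head returns a single deterministic action per sample, while the reparameterization head returns the parameters of a distribution from which an action can be sampled.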

compute_actor(inputs: torch.Tensor) → Dict
Overview:

Run the forward pass in 'compute_actor' mode: use the encoded embedding tensor to predict the actor output.

Arguments:
  • inputs (torch.Tensor):

    The encoded embedding tensor, whose size is determined by the given hidden_size, i.e. (B, N=hidden_size), where hidden_size = actor_head_hidden_size.

  • mode (str): Name of the forward mode.

Returns:
  • outputs (Dict): Outputs of forward pass encoder and head.

ReturnKeys (either):
  • action (torch.Tensor): Continuous action tensor with same size as action_shape.

  • logit (torch.Tensor):

    Logit tensor encoding mu and sigma, both with same size as input x.

Shapes:
  • inputs (torch.Tensor): \((B, N0)\), B is batch size and N0 corresponds to hidden_size

  • action (torch.Tensor): \((B, N0)\)

  • logit (list): 2 elements, mu and sigma, each is the shape of \((B, N0)\).

  • q_value (torch.FloatTensor): \((B, )\), B is batch size.

Examples:
>>> # Regression mode
>>> model = QAC(64, 64, 'regression')
>>> inputs = torch.randn(4, 64)
>>> actor_outputs = model(inputs,'compute_actor')
>>> assert actor_outputs['action'].shape == torch.Size([4, 64])
>>> # Reparameterization Mode
>>> model = QAC(64, 64, 'reparameterization')
>>> inputs = torch.randn(4, 64)
>>> actor_outputs = model(inputs,'compute_actor')
>>> actor_outputs['logit'][0].shape  # mu
torch.Size([4, 64])
>>> actor_outputs['logit'][1].shape  # sigma
torch.Size([4, 64])
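One common way to consume the logit pair from reparameterization mode is to build a Gaussian from mu and sigma and draw a differentiable sample. The sketch below uses stand-in mu and sigma tensors rather than a real QAC output, and how a particular policy actually uses logit is an assumption here, not something this page specifies.

```python
import torch
from torch.distributions import Independent, Normal

# Stand-in values for actor_outputs['logit']; in practice these come from the model.
mu = torch.zeros(4, 64, requires_grad=True)
sigma = torch.ones(4, 64)

# Treat the last dimension as one action vector, so log_prob is per sample.
dist = Independent(Normal(mu, sigma), 1)

# rsample() draws mu + sigma * eps, which keeps the sample differentiable w.r.t. mu.
action = dist.rsample()
log_prob = dist.log_prob(action)
```

Because rsample() is reparameterized, gradients can flow from a loss on action back into the parameters that produced mu and sigma.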
compute_critic(inputs: Dict) → Dict
Overview:

Run the forward pass in 'compute_critic' mode: use the encoded observation and action tensors to predict the Q-value.

Arguments:
  • inputs (Dict): The obs and action encoded tensors.

  • mode (str): Name of the forward mode.

Returns:
  • outputs (Dict): Q-value output.

ReturnKeys:
  • q_value (torch.Tensor): Q value tensor with same size as batch size.

Shapes:
  • obs (torch.Tensor): \((B, N1)\), where B is batch size and N1 is obs_shape

  • action (torch.Tensor): \((B, N2)\), where B is batch size and N2 is action_shape

  • q_value (torch.FloatTensor): \((B, )\), where B is batch size.

Examples:
>>> N = 8  # example observation size, chosen here for illustration
>>> inputs = {'obs': torch.randn(4, N), 'action': torch.randn(4, 1)}
>>> model = QAC(obs_shape=(N, ), action_shape=1, actor_head_type='regression')
>>> model(inputs, mode='compute_critic')['q_value'] # q value
tensor([0.0773, 0.1639, 0.0917, 0.0370], grad_fn=<SqueezeBackward1>)
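The shapes above, (B, N1) and (B, N2) in and (B, ) out, can be illustrated with a small stand-alone critic. This is a hedged sketch in plain PyTorch, not the ding code: CriticSketch, its layer sizes, and the choice to concatenate obs and action along the feature dimension are assumptions made for illustration.

```python
import torch
import torch.nn as nn


class CriticSketch(nn.Module):
    """Hypothetical stand-in for a Q-network over (obs, action) pairs."""

    def __init__(self, obs_shape: int, action_shape: int, hidden_size: int = 64) -> None:
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_shape + action_shape, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, 1),
        )

    def forward(self, inputs: dict) -> dict:
        # Join obs (B, N1) and action (B, N2) into one (B, N1 + N2) tensor.
        x = torch.cat([inputs['obs'], inputs['action']], dim=1)
        # Drop the trailing singleton dimension so q_value has shape (B, ).
        return {'q_value': self.net(x).squeeze(-1)}


critic = CriticSketch(obs_shape=8, action_shape=1)
batch = {'obs': torch.randn(4, 8), 'action': torch.randn(4, 1)}
q_value = critic(batch)['q_value']
```

The final squeeze is what turns the (B, 1) network output into one scalar Q-value per batch element.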
forward(inputs: Union[torch.Tensor, Dict], mode: str) → Dict
Overview:

Use observation and action tensors to predict the output, running the forward pass through QAC’s MLPs in the given mode.

Arguments:
Forward with 'compute_actor':
  • inputs (torch.Tensor):

    The encoded embedding tensor, whose size is determined by the given hidden_size, i.e. (B, N=hidden_size). Whether hidden_size is actor_head_hidden_size or critic_head_hidden_size depends on mode.

Forward with 'compute_critic', inputs (Dict) Necessary Keys:
  • obs, action encoded tensors.

  • mode (str): Name of the forward mode.

Returns:
  • outputs (Dict): Outputs of network forward.

    Forward with 'compute_actor', Necessary Keys (either):
    • action (torch.Tensor): Action tensor with same size as input x.

    • logit (torch.Tensor):

      Logit tensor encoding mu and sigma, both with same size as input x.

    Forward with 'compute_critic', Necessary Keys:
    • q_value (torch.Tensor): Q value tensor with same size as batch size.

Actor Shapes:
  • inputs (torch.Tensor): \((B, N0)\), B is batch size and N0 corresponds to hidden_size

  • action (torch.Tensor): \((B, N0)\)

  • q_value (torch.FloatTensor): \((B, )\), where B is batch size.

Critic Shapes:
  • obs (torch.Tensor): \((B, N1)\), where B is batch size and N1 is obs_shape

  • action (torch.Tensor): \((B, N2)\), where B is batch size and N2 is action_shape

  • logit (torch.FloatTensor): \((B, N2)\), where B is batch size and N2 is action_shape

Actor Examples:
>>> # Regression mode
>>> model = QAC(64, 64, 'regression')
>>> inputs = torch.randn(4, 64)
>>> actor_outputs = model(inputs,'compute_actor')
>>> assert actor_outputs['action'].shape == torch.Size([4, 64])
>>> # Reparameterization Mode
>>> model = QAC(64, 64, 'reparameterization')
>>> inputs = torch.randn(4, 64)
>>> actor_outputs = model(inputs,'compute_actor')
>>> actor_outputs['logit'][0].shape  # mu
torch.Size([4, 64])
>>> actor_outputs['logit'][1].shape  # sigma
torch.Size([4, 64])
Critic Examples:
>>> N = 8  # example observation size, chosen here for illustration
>>> inputs = {'obs': torch.randn(4, N), 'action': torch.randn(4, 1)}
>>> model = QAC(obs_shape=(N, ), action_shape=1, actor_head_type='regression')
>>> model(inputs, mode='compute_critic')['q_value'] # q value
tensor([0.0773, 0.1639, 0.0917, 0.0370], grad_fn=<SqueezeBackward1>)
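The way forward selects between 'compute_actor' and 'compute_critic' by name can be sketched with a simple dispatch pattern. This is an assumption about how such a mode switch might be written, shown in pure Python with placeholder return values; ModeDispatchSketch is a hypothetical class and does not reproduce the actual QAC internals.

```python
from typing import Any, Dict, List


class ModeDispatchSketch:
    """Hypothetical illustration of routing forward() by a mode string."""

    # Names of the methods that forward() is allowed to dispatch to.
    mode: List[str] = ['compute_actor', 'compute_critic']

    def forward(self, inputs: Any, mode: str) -> Dict[str, Any]:
        assert mode in self.mode, f"unsupported mode: {mode}"
        # Look up the method with the same name as the mode and call it.
        return getattr(self, mode)(inputs)

    def compute_actor(self, inputs: Any) -> Dict[str, Any]:
        return {'action': inputs}  # placeholder output, not a real policy

    def compute_critic(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        return {'q_value': 0.0}  # placeholder output, not a real Q-value


sketch = ModeDispatchSketch()
actor_out = sketch.forward([0.1, 0.2], 'compute_actor')
critic_out = sketch.forward({'obs': [0.1], 'action': [0.2]}, 'compute_critic')
```

Under this pattern, a misspelled mode fails fast with an assertion instead of silently running the wrong branch.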