rl_utils.a2c

a2c

a2c_error

Overview:

Implementation of A2C (Advantage Actor-Critic) (arXiv:1602.01783).
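For reference, the three loss terms of a standard A2C objective can be written as follows. This is a sketch, not a statement of this function's exact code. It assumes a softmax policy over the logits, a per-sample weight \(w_i\) (taken as 1 when weight is None), and a mean over the batch of size \(B\). Here \(a_i\) is action, \(A_i\) is adv, \(R_i\) is return, and \(V(s_i)\) is value. The entropy term is assumed to be reported as a positive mean entropy that the caller subtracts with its own coefficient.

```latex
\pi(\cdot \mid s_i) = \mathrm{softmax}(\mathrm{logit}_i), \qquad i = 1, \dots, B

L_{\text{policy}} = -\frac{1}{B} \sum_{i=1}^{B} w_i \, A_i \log \pi(a_i \mid s_i)

L_{\text{value}} = \frac{1}{B} \sum_{i=1}^{B} w_i \, \bigl(R_i - V(s_i)\bigr)^2

L_{\text{entropy}} = \frac{1}{B} \sum_{i=1}^{B} w_i \, \mathcal{H}\bigl(\pi(\cdot \mid s_i)\bigr)
```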

Arguments:
  • data (namedtuple): the A2C input data, with the fields defined in a2c_data

Returns:
  • a2c_loss (namedtuple): the A2C loss items; each one is a differentiable 0-dim tensor

Shapes:
  • logit (torch.FloatTensor): \((B, N)\), where B is the batch size and N is the action dimension

  • action (torch.LongTensor): \((B, )\)

  • value (torch.FloatTensor): \((B, )\)

  • adv (torch.FloatTensor): \((B, )\)

  • return (torch.FloatTensor): \((B, )\)

  • weight (torch.FloatTensor or None): \((B, )\)

  • policy_loss (torch.FloatTensor): \(()\), 0-dim tensor

  • value_loss (torch.FloatTensor): \(()\)

  • entropy_loss (torch.FloatTensor): \(()\)
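The following minimal PyTorch sketch shows how inputs with the shapes listed above could combine into the three 0-dim losses. It is not the library's source. The field order of a2c_data, the field name return_ (because return is a Python keyword), the mean reduction over the batch, and the helper name a2c_error_sketch are assumptions made for illustration. The loss coefficients in the usage block are likewise hypothetical.

```python
from collections import namedtuple

import torch
import torch.nn.functional as F

# Assumed field layouts (hypothetical); "return_" avoids the Python keyword "return".
a2c_data = namedtuple('a2c_data', ['logit', 'action', 'value', 'adv', 'return_', 'weight'])
a2c_loss = namedtuple('a2c_loss', ['policy_loss', 'value_loss', 'entropy_loss'])


def a2c_error_sketch(data: a2c_data) -> a2c_loss:
    """Hypothetical A2C loss computation; every output is a 0-dim tensor."""
    logit, action, value, adv, return_, weight = data
    # A missing weight means every sample in the batch counts equally.
    if weight is None:
        weight = torch.ones_like(value)
    # Categorical policy over N discrete actions, built from logits of shape (B, N).
    dist = torch.distributions.Categorical(logits=logit)
    logp = dist.log_prob(action)  # shape (B,)
    # Policy-gradient term: raise log-probs of actions with positive advantage.
    policy_loss = -(logp * adv * weight).mean()
    # Critic regression of value towards the return target.
    value_loss = (F.mse_loss(value, return_, reduction='none') * weight).mean()
    # Mean policy entropy, returned as a positive quantity (assumed sign convention).
    entropy_loss = (dist.entropy() * weight).mean()
    return a2c_loss(policy_loss, value_loss, entropy_loss)


if __name__ == "__main__":
    B, N = 4, 3
    data = a2c_data(
        logit=torch.randn(B, N, requires_grad=True),
        action=torch.randint(0, N, (B,)),
        value=torch.randn(B, requires_grad=True),
        adv=torch.randn(B),
        return_=torch.randn(B),
        weight=None,
    )
    loss = a2c_error_sketch(data)
    # Hypothetical coefficients: the entropy bonus is subtracted to encourage exploration.
    total = loss.policy_loss + 0.5 * loss.value_loss - 0.01 * loss.entropy_loss
    total.backward()
    print(loss)
```

Returning the entropy as a separate positive term leaves the choice of its coefficient and sign to the caller, which is one reason the three losses are returned individually rather than pre-summed.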