rl_utils.a2c

a2c

Overview:

Implementation of the A2C (Advantage Actor-Critic) loss (arXiv:1602.01783)

Arguments:

Returns:

a2c_loss (namedtuple): the A2C loss items (policy_loss, value_loss, entropy_loss), each a differentiable 0-dim tensor

Shapes:

logit (torch.FloatTensor): \((B, N)\), where B is batch size and N is action dim
action (torch.LongTensor): \((B, )\)
value (torch.FloatTensor): \((B, )\)
adv (torch.FloatTensor): \((B, )\)
return (torch.FloatTensor): \((B, )\)
weight (torch.FloatTensor or None): \((B, )\)
policy_loss (torch.FloatTensor): \(()\), 0-dim tensor
value_loss (torch.FloatTensor): \(()\)
entropy_loss (torch.FloatTensor): \(()\)
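
The shapes above can be illustrated with a minimal sketch of how an A2C loss is commonly computed for a discrete action space. This is not necessarily the library's own code: the names `a2c_loss_sketch` and `A2CLoss`, the mean reduction, and the loss coefficients 0.5 and 0.01 are assumptions chosen for illustration. Only the tensor names and shapes are taken from the listing above.

```python
from collections import namedtuple

import torch
import torch.nn.functional as F

# Hypothetical container mirroring the three loss fields listed under Shapes.
A2CLoss = namedtuple("A2CLoss", ["policy_loss", "value_loss", "entropy_loss"])


def a2c_loss_sketch(logit, action, value, adv, return_, weight=None):
    """Illustrative A2C loss; every returned item is a 0-dim tensor."""
    if weight is None:
        # Assumption: a missing weight means every sample counts equally.
        weight = torch.ones_like(value)
    dist = torch.distributions.Categorical(logits=logit)
    log_prob = dist.log_prob(action)  # shape (B,)
    # Policy gradient term: raise the log-probability of advantageous actions.
    policy_loss = -(log_prob * adv * weight).mean()
    # Critic term: squared error between predicted value and target return.
    value_loss = (F.mse_loss(value, return_, reduction="none") * weight).mean()
    # Mean policy entropy; usually subtracted from the total to encourage exploration.
    entropy_loss = (dist.entropy() * weight).mean()
    return A2CLoss(policy_loss, value_loss, entropy_loss)


B, N = 4, 3  # batch size and action dimension
torch.manual_seed(0)
logit = torch.randn(B, N, requires_grad=True)
value = torch.randn(B, requires_grad=True)
action = torch.randint(0, N, (B,))
return_ = torch.randn(B)
adv = return_ - value.detach()  # advantage treated as a constant for the actor

loss = a2c_loss_sketch(logit, action, value, adv, return_)
# Assumed coefficients: 0.5 for the value term, 0.01 for the entropy bonus.
total = loss.policy_loss + 0.5 * loss.value_loss - 0.01 * loss.entropy_loss
total.backward()
```

In this sketch the advantage is detached before use, so the policy term sends no gradient into the value estimate; the critic is trained only through `value_loss`.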