rl_utils.a2c
a2c
a2c_error
- Overview:
Implementation of the A2C (Advantage Actor-Critic) loss (arXiv:1602.01783)
- Arguments:
data (namedtuple): A2C input data with the fields defined in a2c_data
- Returns:
a2c_loss (namedtuple): the A2C loss items, each of which is a differentiable 0-dim tensor
- Shapes:
logit (torch.FloatTensor): \((B, N)\), where B is the batch size and N is the action dimension
action (torch.LongTensor): \((B, )\)
value (torch.FloatTensor): \((B, )\)
adv (torch.FloatTensor): \((B, )\)
return (torch.FloatTensor): \((B, )\)
weight (torch.FloatTensor or None): \((B, )\)
policy_loss (torch.FloatTensor): \(()\), 0-dim tensor
value_loss (torch.FloatTensor): \(()\), 0-dim tensor
entropy_loss (torch.FloatTensor): \(()\), 0-dim tensor
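The following is a minimal, framework-free sketch of how the three loss terms could be computed from inputs with the shapes listed above. It uses plain Python lists instead of torch tensors so that it runs without dependencies, and it is not DI-engine's implementation. Several details here are assumptions: the names A2CData, A2CLoss, log_softmax and a2c_error_sketch are hypothetical; the field is named return_ because return is a Python keyword; a missing weight is treated as all ones; and each term is averaged over the batch. The terms follow the usual A2C objective: policy_loss is the mean of -log π(a|s)·A, value_loss is the mean squared error between return and value, and entropy_loss is the mean entropy of the categorical policy.

```python
import math
from collections import namedtuple
from typing import List, Sequence

# Hypothetical containers mirroring the documented fields (not the library's types).
# ``return_`` is used because ``return`` is a Python keyword; the real field name is an assumption.
A2CData = namedtuple('A2CData', ['logit', 'action', 'value', 'adv', 'return_', 'weight'])
A2CLoss = namedtuple('A2CLoss', ['policy_loss', 'value_loss', 'entropy_loss'])


def log_softmax(row: Sequence[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def a2c_error_sketch(data: A2CData) -> A2CLoss:
    batch_size = len(data.action)
    # Assumption: a missing weight means every sample counts equally.
    weight = data.weight if data.weight is not None else [1.0] * batch_size
    policy, value, entropy = 0.0, 0.0, 0.0
    for i in range(batch_size):
        logp = log_softmax(data.logit[i])  # (N,) log-probabilities of each action
        probs = [math.exp(lp) for lp in logp]
        # Policy-gradient term: -log pi(a_i | s_i) * A_i (advantage treated as a constant)
        policy += -logp[data.action[i]] * data.adv[i] * weight[i]
        # Value regression term: (G_i - V(s_i))^2
        value += (data.return_[i] - data.value[i]) ** 2 * weight[i]
        # Entropy of the categorical policy: -sum_a pi(a) log pi(a)
        entropy += -sum(p * lp for p, lp in zip(probs, logp)) * weight[i]
    # Reduce each term to a scalar by averaging over the batch (assumed reduction).
    return A2CLoss(policy / batch_size, value / batch_size, entropy / batch_size)


if __name__ == '__main__':
    # One sample, two actions with equal logits, no weight.
    batch = A2CData(logit=[[0.0, 0.0]], action=[0], value=[0.5], adv=[2.0], return_=[1.5], weight=None)
    print(a2c_error_sketch(batch))
```

In a torch version, the loop would typically be vectorized, for example with torch.distributions.Categorical(logits=logit), whose log_prob and entropy methods give the per-sample terms. How a caller combines the three returned terms (for instance, subtracting a scaled entropy_loss to encourage exploration) is likewise an assumption, not something this page specifies.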