Red
red_irl_model
RedRewardModel
- class ding.reward_model.red_irl_model.RedRewardModel(config: Dict, device: str, tb_logger: SummaryWriter)
- Overview:
The implementation of the reward model in RED (https://arxiv.org/abs/1905.06750).
- Interface:
estimate, train, load_expert_data, collect_data, clear_data, __init__, _train
- Properties:
online_net (:obj: SENet): The reward model, by default initialized once when training begins.
- __init__(config: Dict, device: str, tb_logger: SummaryWriter) → None
- Overview:
Initialize self. See help(type(self)) for accurate signature.
- Arguments:
config (Dict): Training config.
device (str): Device to use, i.e. “cpu” or “cuda”.
tb_logger (SummaryWriter): Logger, by default a ‘SummaryWriter’, used for the model summary.
- clear_data()
- Overview:
Clear the collected training data. Not implemented when the reward model (i.e. online_net) is trained only once; if online_net is trained continuously, clear_data should be implemented.
- collect_data(data) → None
- Overview:
Collect training data. Not implemented when the reward model (i.e. online_net) is trained only once; if online_net is trained continuously, collect_data should be implemented.
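For a continuously trained reward model, the two hooks above could be filled in roughly as follows. This is a minimal sketch of the collect/clear pattern only, not DI-engine code: `BufferedRewardModel` and its `train_data` attribute are hypothetical names chosen for illustration.

```python
from typing import Any, Dict, List


class BufferedRewardModel:
    """Hypothetical stand-in that shows the collect/clear pattern only."""

    def __init__(self) -> None:
        # Transitions gathered since the last clear_data call (assumed attribute name).
        self.train_data: List[Dict[str, Any]] = []

    def collect_data(self, data: List[Dict[str, Any]]) -> None:
        # Append newly collected transitions to the training buffer.
        self.train_data.extend(data)

    def clear_data(self) -> None:
        # Drop buffered transitions, for example after each training round.
        self.train_data.clear()


model = BufferedRewardModel()
model.collect_data([{"obs": [0.0, 1.0], "action": [1.0]}])
model.collect_data([{"obs": [1.0, 0.0], "action": [0.0]}])
n_before = len(model.train_data)
model.clear_data()
n_after = len(model.train_data)
```

When the reward model is trained once on expert data, as described above, neither hook needs to hold any state.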
- estimate(data: list) → None
- Overview:
Estimate reward by rewriting the reward key.
- Arguments:
data (list): The list of data used for estimation, with at least obs and action keys.
- Effects:
This is a side effect function which updates the reward values in place.
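The in-place behaviour can be illustrated with a small sketch. It assumes the reward form from the RED paper, r = exp(-sigma * err), where err is the squared prediction error between the trained network and a fixed random target. The names `toy_estimate`, `prediction_error`, `fake_error`, and `sigma` are hypothetical, and the error function is a hand-written stand-in rather than the actual online_net.

```python
import math
from typing import Any, Callable, Dict, List


def toy_estimate(
    data: List[Dict[str, Any]],
    prediction_error: Callable[[List[float], List[float]], float],
    sigma: float = 1.0,
) -> None:
    """Overwrite each item's 'reward' key in place and return None."""
    for item in data:
        err = prediction_error(item["obs"], item["action"])
        # Small error (expert-like input) gives a reward near 1; large error decays to 0.
        item["reward"] = math.exp(-sigma * err)


def fake_error(obs: List[float], action: List[float]) -> float:
    # Stand-in for the squared distance between online and target network outputs.
    return sum(x * x for x in obs) + sum(a * a for a in action)


batch = [
    {"obs": [0.0, 0.0], "action": [0.0], "reward": 5.0},
    {"obs": [1.0, 2.0], "action": [1.0], "reward": 5.0},
]
result = toy_estimate(batch, fake_error, sigma=0.5)
```

Because the function returns None, callers read the new rewards from the same list they passed in; any original environment reward stored under the reward key is lost unless it was copied beforehand.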