Pdeil
pdeil_irl_model
PdeilRewardModel
- class ding.reward_model.pdeil_irl_model.PdeilRewardModel(cfg: dict, device, tb_logger: SummaryWriter)
- Overview:
The Pdeil reward model class
- Interface:
estimate, train, load_expert_data, collect_data, clear_data, __init__, _train, _batch_mn_pdf
- __init__(cfg: dict, device, tb_logger: SummaryWriter) → None
- Overview:
Initialize self. See help(type(self)) for accurate signature. Some rules in naming the attributes of self:
e_: expert values
_sigma_: standard deviation values
p_: current policy values
_s_: states
_a_: actions
- Arguments:
cfg (Dict): Training config
device (str): Device to use, i.e. “cpu” or “cuda”
tb_logger (SummaryWriter): Logger, set to ‘SummaryWriter’ by default, used for model summary
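The naming rules above can be illustrated with a small, self-contained sketch. It is not taken from the PdeilRewardModel source: the sample batches, the helper column_stats, and the infix u (used here for a mean) are assumptions made for illustration only.

```python
from statistics import fmean, pstdev

# Hypothetical batches: each row is one sample.
expert_states = [[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [1.0, 2.0]]
expert_actions = [[0.0], [1.0], [0.0], [1.0]]
policy_states = [[0.5, 0.5], [1.5, 0.5], [0.5, 1.5], [1.5, 1.5]]


def column_stats(rows):
    """Return per-dimension (means, standard deviations) of a batch."""
    columns = list(zip(*rows))
    return [fmean(c) for c in columns], [pstdev(c) for c in columns]


# e_ = expert, p_ = current policy, _sigma_ = standard deviation,
# _s_ = states, _a_ = actions; "u" for the mean is an assumed infix.
e_u_s, e_sigma_s = column_stats(expert_states)
p_u_s, p_sigma_s = column_stats(policy_states)

# Concatenate each state with its action to get state-action statistics.
expert_s_a = [s + a for s, a in zip(expert_states, expert_actions)]
e_u_s_a, e_sigma_s_a = column_stats(expert_s_a)
```

Reading a name left to right then tells you whose data it summarizes, which statistic it holds, and over which variables.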
- clear_data()
- Overview:
Clear the training data. This is a side-effect function that clears the data attribute in self.
- collect_data(item: list)
- Overview:
Collect training data by iterating over the items in the input list.
- Arguments:
item (list): Raw training data (e.g. some form of states, actions, obs, etc.)
- Effects:
This is a side-effect function that updates the data attribute in self by iterating over the items in the input list.
- estimate(data: list) → None
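The side-effect behaviour of collect_data and clear_data can be sketched with a stand-in class. This is a minimal illustration, not the PdeilRewardModel implementation: the class DataBuffer and its attribute train_data are hypothetical names.

```python
class DataBuffer:
    """Hypothetical stand-in that mimics the collect/clear side effects."""

    def __init__(self) -> None:
        self.train_data = []  # assumed name for the data attribute

    def collect_data(self, item: list) -> None:
        # Append every element of the input list; nothing is returned.
        for sample in item:
            self.train_data.append(sample)

    def clear_data(self) -> None:
        # Drop everything collected so far; nothing is returned.
        self.train_data.clear()


buffer = DataBuffer()
buffer.collect_data([{"obs": [0.0], "action": 1}])
buffer.collect_data([{"obs": [1.0], "action": 0}, {"obs": [2.0], "action": 1}])
size_after_collect = len(buffer.train_data)

buffer.clear_data()
size_after_clear = len(buffer.train_data)
```

Because both methods return None, callers observe their effect only through the state held on the object.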
- Overview:
Estimate reward by rewriting the reward keys.
- Arguments:
data (list): The list of data used for estimation, with at least obs and action keys.
- Effects:
This is a side effect function which updates the reward values in place.
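The in-place update pattern described for estimate can be sketched as follows. The function estimate_in_place and the scoring rule toy_reward are hypothetical; the actual model derives rewards from its own density estimates, which this sketch does not reproduce.

```python
from typing import Callable, Dict, List


def estimate_in_place(data: List[Dict], reward_fn: Callable) -> None:
    """Overwrite the reward key of every item; return nothing."""
    for item in data:
        item["reward"] = reward_fn(item["obs"], item["action"])


def toy_reward(obs, action) -> float:
    # Placeholder rule (an assumption): higher when the action is
    # close to the sum of the observation.
    return 1.0 / (1.0 + abs(sum(obs) - action))


batch = [
    {"obs": [1.0, 2.0], "action": 3.0, "reward": 0.0},
    {"obs": [0.0, 1.0], "action": 3.0, "reward": 0.0},
]
result = estimate_in_place(batch, toy_reward)
```

After the call, batch holds the new rewards and result is None, so callers keep using the list they passed in.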