worker.learner
learner_hook
Please refer to ding/worker/learner/learner_hook.py for usage.
Hook
LearnerHook
LoadCkptHook
- class ding.worker.learner.learner_hook.LoadCkptHook(*args, ext_args: easydict.EasyDict = {}, **kwargs)
- Overview:
Hook to load checkpoint
- Interfaces:
__init__, __call__
- Property:
name, priority, position
SaveCkptHook
- class ding.worker.learner.learner_hook.SaveCkptHook(*args, ext_args: easydict.EasyDict = {}, **kwargs)
- Overview:
Hook to save checkpoint
- Interfaces:
__init__, __call__
- Property:
name, priority, position
LogShowHook
- class ding.worker.learner.learner_hook.LogShowHook(*args, ext_args: easydict.EasyDict = {}, **kwargs)
- Overview:
Hook to show log
- Interfaces:
__init__, __call__
- Property:
name, priority, position
LogReduceHook
- class ding.worker.learner.learner_hook.LogReduceHook(*args, ext_args: easydict.EasyDict = {}, **kwargs)
- Overview:
Hook to reduce the distributed (multi-GPU) logs
- Interfaces:
__init__, __call__
- Property:
name, priority, position
register_learner_hook
- Overview:
Add a new LearnerHook class to hook_mapping, so you can build one instance with build_learner_hook_by_cfg.
- Arguments:
name (str): Name of the hook to register.
hook_type (type): The hook class to register; it must be a class you implemented that implements LearnerHook.
- Examples:
>>> class HookToRegister(LearnerHook):
>>>     def __init__(self, *args, **kwargs):
>>>         ...
>>>     def __call__(self, *args, **kwargs):
>>>         ...
>>> ...
>>> register_learner_hook('name_of_hook', HookToRegister)
>>> ...
>>> hooks = build_learner_hook_by_cfg(cfg)
build_learner_hook_by_cfg
- Overview:
Build the learner hooks in hook_mapping by config. This function is often used to initialize hooks according to cfg, while add_learner_hook() is often used to add an existing LearnerHook to hooks.
- Arguments:
cfg (EasyDict): Config dict. Should be like {'hook': xxx}.
- Returns:
hooks (Dict[str, List[Hook]]): Keys should be in ['before_run', 'after_run', 'before_iter', 'after_iter']; each value should be a list containing all hooks in this position.
- Note:
Lower value means higher priority.
merge_hooks
- Overview:
Merge two hooks dicts that have the same keys. Each value in the result is sorted by hook priority using a stable sort.
- Arguments:
hooks1 (Dict[str, List[Hook]]): hooks1 to be merged.
hooks2 (Dict[str, List[Hook]]): hooks2 to be merged.
- Returns:
new_hooks (Dict[str, List[Hook]]): New merged hooks dict.
- Note:
This merge function uses a stable sort, so hooks with the same priority keep their relative order.
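The stable merge described above can be sketched as follows. This is a minimal illustration, not the library's implementation: `ToyHook` and `merge_hooks_sketch` are hypothetical names, and the sketch assumes that hooks from `hooks1` are placed before hooks from `hooks2` when priorities tie.

```python
from dataclasses import dataclass
from typing import Dict, List

POSITIONS = ['before_run', 'after_run', 'before_iter', 'after_iter']


@dataclass
class ToyHook:
    """Hypothetical stand-in for a LearnerHook; only name and priority matter here."""
    name: str
    priority: int


def merge_hooks_sketch(hooks1: Dict[str, List[ToyHook]],
                       hooks2: Dict[str, List[ToyHook]]) -> Dict[str, List[ToyHook]]:
    # Both dicts are expected to share the same position keys.
    assert set(hooks1.keys()) == set(hooks2.keys())
    new_hooks = {}
    for position in hooks1:
        # sorted() is stable: hooks with equal priority keep their
        # concatenation order (hooks1 first, then hooks2 -- an assumption).
        new_hooks[position] = sorted(
            hooks1[position] + hooks2[position], key=lambda h: h.priority
        )
    return new_hooks


hooks_a = {p: [] for p in POSITIONS}
hooks_b = {p: [] for p in POSITIONS}
hooks_a['after_iter'] = [ToyHook('log_show', 20), ToyHook('save_ckpt', 20)]
hooks_b['after_iter'] = [ToyHook('log_reduce', 10), ToyHook('custom', 20)]

merged = merge_hooks_sketch(hooks_a, hooks_b)
names = [h.name for h in merged['after_iter']]
print(names)  # ['log_reduce', 'log_show', 'save_ckpt', 'custom']
```

The hook with the lowest priority value comes first, and the three hooks that share priority 20 stay in their original relative order.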
base_learner
Please refer to ding/worker/learner/base_learner.py for usage.
BaseLearner
- class ding.worker.learner.base_learner.BaseLearner(cfg: easydict.EasyDict, policy: collections.namedtuple = None, tb_logger: Optional[SummaryWriter] = None, dist_info: Tuple[int, int] = None)
- Overview:
Base class for model learning.
- Interface:
train, call_hook, register_hook, save_checkpoint, start, setup_dataloader, close
- Property:
learn_info, priority_info, last_iter, name, rank, world_size, policy, monitor, log_buffer, logger, tb_logger
- __init__(cfg: easydict.EasyDict, policy: collections.namedtuple = None, tb_logger: Optional[SummaryWriter] = None, dist_info: Tuple[int, int] = None) → None
- Overview:
Init method. Load config and use self._cfg to build common learner components, e.g. logger, hooks. Policy is not initialized here, but set afterwards through policy setter.
- Arguments:
cfg (EasyDict): Learner config, you can view cfg for ref.
rank (int): Process number in multi-gpu training
- Notes:
If you want to debug in sync CUDA mode, please add the following code at the beginning of __init__.
os.environ['CUDA_LAUNCH_BLOCKING'] = "1"  # for debug async CUDA
- _setup_hook() → None
- Overview:
Set up hooks for base_learner. Hooks are the way to run some functions at specific time points in base_learner. You can refer to learner_hook.py.
- _setup_wrapper() → None
- Overview:
Use _time_wrapper to get train_time.
- Note:
data_time is wrapped in setup_dataloader.
- call_hook(name: str) → None
- Overview:
Call the corresponding hook plugins according to position name.
- Arguments:
name (str): The position whose hooks are called; should be in ['before_run', 'after_run', 'before_iter', 'after_iter'].
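Position-based dispatch of this kind can be sketched as below. `ToyLearner` is a hypothetical stand-in, not BaseLearner itself, and the sketch assumes that each hook is a callable that receives the learner instance.

```python
from typing import Callable, Dict, List

POSITIONS = ('before_run', 'after_run', 'before_iter', 'after_iter')


class ToyLearner:
    """Hypothetical stand-in for a learner; not the BaseLearner implementation."""

    def __init__(self) -> None:
        # One list of hooks per position.
        self.hooks: Dict[str, List[Callable[['ToyLearner'], None]]] = {
            p: [] for p in POSITIONS
        }
        self.trace: List[str] = []

    def call_hook(self, name: str) -> None:
        if name not in self.hooks:
            raise KeyError(f"unknown hook position: {name}")
        # Assumption: every hook is called with the learner as its only argument.
        for hook in self.hooks[name]:
            hook(self)


learner = ToyLearner()
learner.hooks['before_run'].append(lambda eng: eng.trace.append('load_ckpt'))
learner.hooks['after_run'].append(lambda eng: eng.trace.append('save_ckpt'))

learner.call_hook('before_run')
learner.call_hook('after_run')
print(learner.trace)  # ['load_ckpt', 'save_ckpt']
```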
- close() → None
- Overview:
[Only Used In Parallel Mode] Close the related resources, e.g. dataloader, tensorboard logger, etc.
- register_hook(hook: ding.worker.learner.learner_hook.LearnerHook) → None
- Overview:
Add a new learner hook.
- Arguments:
hook (LearnerHook): The hook to be added.
- save_checkpoint(ckpt_name: Optional[str] = None) → None
- Overview:
Directly call the save_ckpt_after_run hook to save a checkpoint.
- Note:
You must guarantee that “save_ckpt_after_run” is registered in the “after_run” hook. This method is called in:
auto_checkpoint (torch_utils/checkpoint_helper.py), which is designed for saving a checkpoint whenever an exception is raised.
serial_pipeline (entry/serial_entry.py), which uses it to save a checkpoint when a new highest evaluation reward is reached.
- setup_dataloader() → None
- Overview:
[Only Used In Parallel Mode] Setup learner’s dataloader.
Note
Only in parallel mode are the attributes get_data and _dataloader used to get data from the file system; in the serial version, data is fetched from memory directly.
In parallel mode, get_data is set by LearnerCommHelper, and should be callable. Users don’t need to know the related details if not necessary.
- train(data: dict, envstep: int = -1) → None
- Overview:
Given training data, implement network update for one iteration and update related variables. Learner’s API for serial entry. Also called in start for each iteration’s training.
- Arguments:
data (dict): Training data which is retrieved from the replay buffer.
Note
_policy must be set before calling this method.
The _policy.forward method contains: forward, backward, grad sync (if in multi-gpu mode) and parameter update.
before_iter and after_iter hooks are called at the beginning and end.
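The iteration flow in the note above can be sketched roughly as follows. `ToyPolicy` and `ToyTrainer` are hypothetical names used only for illustration; the exact bookkeeping in BaseLearner (for example, when the iteration counter is incremented) is an assumption here.

```python
from typing import Any, Callable, Dict, List


class ToyPolicy:
    """Hypothetical policy; forward() stands in for forward, backward and update."""

    def forward(self, data: Dict[str, Any]) -> Dict[str, float]:
        # A real policy would compute a loss and step an optimizer here.
        return {'total_loss': float(sum(data['reward']))}


class ToyTrainer:
    """Hypothetical learner-like class; not the BaseLearner implementation."""

    def __init__(self, policy: ToyPolicy) -> None:
        self._policy = policy
        self.last_iter = 0
        self.log_buffer: Dict[str, float] = {}
        self.events: List[str] = []
        self.hooks: Dict[str, List[Callable[['ToyTrainer'], None]]] = {
            'before_iter': [lambda eng: eng.events.append('before_iter')],
            'after_iter': [lambda eng: eng.events.append('after_iter')],
        }

    def train(self, data: Dict[str, Any], envstep: int = -1) -> None:
        assert self._policy is not None, "policy must be set before training"
        for hook in self.hooks['before_iter']:
            hook(self)
        # One policy.forward call covers the whole update for this iteration.
        self.log_buffer = self._policy.forward(data)
        for hook in self.hooks['after_iter']:
            hook(self)
        # Assumption: the counter is advanced after the after_iter hooks run.
        self.last_iter += 1


trainer = ToyTrainer(ToyPolicy())
trainer.train({'reward': [1.0, 2.0]})
print(trainer.events, trainer.last_iter, trainer.log_buffer)
```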
create_learner
- Overview:
Given the key (learner_name), create a new learner instance if it is in learner_mapping’s values, or raise a KeyError. In other words, a derived learner must first be registered; create_learner can then be called to get the instance.
- Arguments:
cfg (EasyDict): Learner config. Necessary keys: [learner.import_module, learner.learner_type].
- Returns:
learner (BaseLearner): The created new learner, should be an instance of one of learner_mapping’s values.
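The register-then-create pattern described above can be sketched as follows. All names ending in `_sketch` and the `Toy*` classes are hypothetical; the sketch uses a plain dict instead of EasyDict and ignores `learner.import_module`, so it only illustrates the lookup and the KeyError behaviour.

```python
from typing import Any, Dict

# Hypothetical registry mapping a learner type name to a learner class.
learner_mapping: Dict[str, type] = {}


class ToyBaseLearner:
    """Hypothetical base class standing in for BaseLearner."""

    def __init__(self, cfg: Dict[str, Any]) -> None:
        self.cfg = cfg


class ToyCustomLearner(ToyBaseLearner):
    """Hypothetical derived learner."""


def register_learner_sketch(name: str, learner: type) -> None:
    # A derived learner must be registered before it can be created.
    learner_mapping[name] = learner


def create_learner_sketch(cfg: Dict[str, Any]) -> ToyBaseLearner:
    learner_type = cfg['learner']['learner_type']
    if learner_type not in learner_mapping:
        raise KeyError(f"unsupported learner type: {learner_type}")
    return learner_mapping[learner_type](cfg)


register_learner_sketch('custom', ToyCustomLearner)
cfg = {'learner': {'import_module': [], 'learner_type': 'custom'}}
learner = create_learner_sketch(cfg)
print(type(learner).__name__)  # ToyCustomLearner
```

Requesting a type that was never registered raises KeyError, which mirrors the behaviour stated in the overview.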