How to randomly collect some data samples at the beginning?
Guideline
For some policies and environments, it is better to collect some data samples at the very beginning of training with a completely random policy. This section explains how to write the config and env info, and how serial_pipeline collects random data.
How to write config and env info (User View)
Config
Specify how many data samples to collect at the beginning:
    cartpole_rainbow_config = dict(
        # ...
        policy=dict(
            # ...
            random_collect_size=2000,
        ),
    )
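As a minimal sketch of how this key might be read, the snippet below uses a plain nested dict in place of DI-engine's config object and falls back to 0 when the key is absent, so a config without random_collect_size skips random collection. The helper name random_collect_budget is hypothetical and is not part of DI-engine.

```python
# Sketch only: a plain dict stands in for the real config object.
cartpole_rainbow_config = dict(
    policy=dict(
        random_collect_size=2000,
    ),
)


def random_collect_budget(cfg: dict) -> int:
    """Hypothetical helper: how many random samples to collect first (0 disables it)."""
    return max(0, cfg.get('policy', {}).get('random_collect_size', 0))


print(random_collect_budget(cartpole_rainbow_config))  # 2000
print(random_collect_budget(dict(policy=dict())))      # 0
```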
Make sure action space is an available attribute in env
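The env manager gets env_info from an env_ref, so you must make sure that act_space is available in the env's info method. Define act_space according to the action type of your env; the example below shows a discrete action space, with the continuous variant given in comments.

    def info(self) -> BaseEnvInfo:
        T = EnvElementInfo
        return BaseEnvInfo(
            # Discrete action
            act_space=T(
                (1, ),
                {
                    # [min, max)
                    'min': 0,
                    'max': 2,
                    'dtype': int,
                },
            ),
            # ...
            # Continuous action (use this act_space instead for a continuous env)
            # act_space=T(
            #     (3, ),
            #     {
            #         'min': 0.,
            #         'max': 1.,
            #         'dtype': np.float32,
            #     },
            # ),
        )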
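To illustrate why the action space spec has to be available, here is a rough sketch of how a shape plus a min/max/dtype dict could be turned into a legal random action. It uses only the Python standard library. The function sample_random_action is hypothetical, it assumes a one-dimensional shape, and it is not DI-engine's actual sampling code.

```python
import random
from typing import Any, Dict, List, Tuple


def sample_random_action(shape: Tuple[int, ...], value: Dict[str, Any], rng: random.Random) -> List:
    """Hypothetical helper: draw one random action that respects the given spec.

    Assumes a 1-D shape such as (1, ) or (3, ).
    """
    n = shape[0]
    low, high = value['min'], value['max']
    if value['dtype'] is int:
        # Discrete action: integers in [min, max), upper bound excluded.
        return [rng.randrange(low, high) for _ in range(n)]
    # Continuous action: floats between min and max.
    return [rng.uniform(low, high) for _ in range(n)]


rng = random.Random(0)
discrete = sample_random_action((1, ), {'min': 0, 'max': 2, 'dtype': int}, rng)
continuous = sample_random_action((3, ), {'min': 0., 'max': 1., 'dtype': float}, rng)
print(discrete, continuous)
```

Without the spec, a random policy would have no way to know how many action dimensions to produce or which values are legal.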
How does DI-engine collect data randomly? (Developer View)
We take DI-engine's serial_pipeline as an example to demonstrate how random_policy (ding/ding/policy/policy_factory.py) is used to collect random data when random_collect_size is set in the config.

    # Accumulate plenty of data at the beginning of training.
    if cfg.policy.get('random_collect_size', 0) > 0:
        # Acquire action space from env.
        action_space = collector_env.env_info().act_space
        # `action_space` is used by random_policy to generate legal actions.
        random_policy = PolicyFactory.get_random_policy(policy.collect_mode, action_space=action_space)
        # Reset collector's policy to random_policy
        collector.reset_policy(random_policy)
        # Randomly collect data and push them into buffer
        new_data = collector.collect(n_sample=cfg.policy.random_collect_size, policy_kwargs=collect_kwargs)
        replay_buffer.push(new_data, cur_collector_envstep=0)
        # Switch collector's policy back to the collect_mode policy
        collector.reset_policy(policy.collect_mode)

DI-engine uses different methods to generate different types of actions (discrete or continuous, with or without upper and lower bounds, etc.).
For more details, refer to ding/ding/policy/policy_factory.py and see the forward method defined in PolicyFactory's get_random_policy.
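The warm-up flow described in this section (swap in a random policy, collect a fixed number of samples, push them into the buffer, then swap the original policy back) can be sketched with toy stand-ins. ToyCollector, warm_up, and the two lambda policies are hypothetical and only mimic the shape of the real collector and buffer; they are not DI-engine classes.

```python
import random
from typing import Callable, List

Policy = Callable[[], int]


class ToyCollector:
    """Stand-in for a collector: holds a policy and records the actions it produces."""

    def __init__(self, policy: Policy) -> None:
        self._policy = policy

    def reset_policy(self, policy: Policy) -> None:
        self._policy = policy

    def collect(self, n_sample: int) -> List[int]:
        return [self._policy() for _ in range(n_sample)]


def warm_up(collector: ToyCollector, buffer: List[int], learned_policy: Policy,
            random_policy: Policy, random_collect_size: int) -> None:
    """Fill the buffer with random samples, then restore the learned policy."""
    if random_collect_size > 0:
        collector.reset_policy(random_policy)
        buffer.extend(collector.collect(n_sample=random_collect_size))
        collector.reset_policy(learned_policy)


rng = random.Random(0)
random_policy = lambda: rng.randrange(0, 2)  # discrete action in [0, 2)
learned_policy = lambda: 0                   # placeholder for the policy being trained
collector = ToyCollector(learned_policy)
replay_buffer: List[int] = []

warm_up(collector, replay_buffer, learned_policy, random_policy, random_collect_size=2000)
print(len(replay_buffer))  # 2000
```

Restoring the original policy at the end matters: without it, every later collection step would keep producing random actions.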