How to randomly collect data samples at the beginning?

Guideline

For some policies and environments, it helps to collect some data samples at the very beginning of training with a completely random policy. This section explains how to write the config and env info, and how serial_pipeline collects the random data.

How to write config and env info (User View)

Config

Specify how many data samples to collect at the beginning:

cartpole_rainbow_config = dict(
    # ...
    policy=dict(
        # ...
        random_collect_size=2000,
    ),
)
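As a minimal sketch of how this setting is consumed, the snippet below reads random_collect_size with a default of 0, mirroring the `.get('random_collect_size', 0) > 0` check in the serial_pipeline excerpt later on this page. It uses plain dicts rather than DI-engine's config object, and `get_random_collect_size` is a hypothetical helper introduced only for illustration.

```python
cartpole_rainbow_config = dict(
    policy=dict(
        random_collect_size=2000,
    ),
)


def get_random_collect_size(cfg):
    # Hypothetical helper, not part of DI-engine: a missing key falls back to 0.
    return cfg['policy'].get('random_collect_size', 0)


# Random collection is only triggered when the value is greater than 0.
size = get_random_collect_size(cartpole_rainbow_config)
should_collect = size > 0
```

Under this reading, leaving random_collect_size out of the policy config (or setting it to 0) skips the random collection step.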

Make sure the action space is available in the env

The env manager obtains env_info from an env_ref, so you must make sure that act_space is available in the return value of the env’s info method.

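def info(self) -> BaseEnvInfo:
    T = EnvElementInfo
    return BaseEnvInfo(
        # ...
        # Discrete action: one integer in [min, max), i.e. 0 or 1.
        act_space=T(
            (1, ),
            {
                'min': 0,
                'max': 2,
                'dtype': int,
            },
        ),
        # For a continuous action, pass this act_space instead
        # (a keyword argument cannot be given twice in one call):
        # act_space=T(
        #     (3, ),
        #     {
        #         'min': 0.,
        #         'max': 1.,
        #         'dtype': np.float32,
        #     },
        # ),
        # ...
    )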

How does DI-engine collect random data? (Developer View)

We take DI-engine’s serial_pipeline as an example to show how random_policy (ding/ding/policy/policy_factory.py) is used to collect random data when random_collect_size is set in the config.

# Accumulate plenty of data at the beginning of training.
if cfg.policy.get('random_collect_size', 0) > 0:
    # Acquire action space from env.
    action_space = collector_env.env_info().act_space
    # `action_space` is used by random_policy to generate legal actions.
    random_policy = PolicyFactory.get_random_policy(policy.collect_mode, action_space=action_space)
    # Reset collector's policy to random_policy
    collector.reset_policy(random_policy)
    # Randomly collect data and push them into buffer
    new_data = collector.collect(n_sample=cfg.policy.random_collect_size, policy_kwargs=collect_kwargs)
    replay_buffer.push(new_data, cur_collector_envstep=0)
    # Switch collector's policy back to the collect_mode policy
    collector.reset_policy(policy.collect_mode)

DI-engine uses different methods to generate different types of actions (discrete or continuous, with or without an upper or lower bound, etc.).

For more details, refer to ding/ding/policy/policy_factory.py and see the forward function in PolicyFactory’s get_random_policy.
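To make the idea of type-dependent sampling concrete, here is a minimal standard-library sketch that draws random actions from the kind of act_space description shown in the info method above. It is not DI-engine's implementation: `sample_random_action` is a hypothetical helper, it assumes both 'min' and 'max' are present, it only handles one-dimensional shapes, and it uses Python's float in place of np.float32 to stay dependency-free.

```python
import random


def sample_random_action(shape, value, rng):
    """Draw one random action as a flat list with `shape[0]` entries.

    `shape` and `value` mirror the two arguments given to EnvElementInfo;
    the helper itself is hypothetical and not part of DI-engine.
    """
    low, high, dtype = value['min'], value['max'], value['dtype']
    if dtype is int:
        # Discrete: integers from the half-open range [min, max).
        return [rng.randrange(low, high) for _ in range(shape[0])]
    # Continuous: floats drawn uniformly between min and max.
    return [dtype(rng.uniform(low, high)) for _ in range(shape[0])]


rng = random.Random(0)  # fixed seed so runs are repeatable
discrete = sample_random_action((1, ), {'min': 0, 'max': 2, 'dtype': int}, rng)
continuous = sample_random_action((3, ), {'min': 0., 'max': 1., 'dtype': float}, rng)
```

The branch on dtype is the design point: a discrete space needs integer draws that exclude the upper bound, while a continuous space needs uniform floats. Spaces that lack one of the bounds would need a different sampling rule, which this sketch does not attempt.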