template.qmix
Please refer to ding/model/template/qmix.py for usage.
Mixer
- class ding.model.template.qmix.Mixer(agent_num, state_dim, mixing_embed_dim, hypernet_embed=64)
- Overview:
Mixer network in QMIX, which mixes the independent q_value of each agent into a total q_value.
- Interface:
__init__, forward
- __init__(agent_num, state_dim, mixing_embed_dim, hypernet_embed=64)
- Overview:
Initialize the pymarl mixer network.
- Arguments:
agent_num (int): the number of agents
state_dim (int): the dimension of the global observation state
mixing_embed_dim (int): the dimension of the mixing state embedding
hypernet_embed (int): the dimension of the hypernet embedding, default to 64
- forward(agent_qs, states)
- Overview:
Forward computation graph of the pymarl mixer network.
- Arguments:
agent_qs (torch.FloatTensor): the independent q_value of each agent
states (torch.FloatTensor): the embedding vector of the global state
- Returns:
q_tot (torch.FloatTensor): the total mixed q_value
- Shapes:
agent_qs (torch.FloatTensor): \((B, N)\), where B is batch size and N is agent_num
states (torch.FloatTensor): \((B, M)\), where M is embedding_size
q_tot (torch.FloatTensor): \((B, )\)
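The shapes above can be illustrated with a minimal sketch of a QMIX-style monotonic mixer. This is not the `ding` implementation: the class name `MonotonicMixerSketch`, the layer layout of the hypernetworks, and the ELU activation are assumptions chosen for illustration. The sketch only shows how per-agent q-values of shape \((B, N)\) and a state embedding of shape \((B, M)\) can be combined into a total q-value of shape \((B, )\) with non-negative mixing weights.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MonotonicMixerSketch(nn.Module):
    """Illustrative QMIX-style mixer (hypothetical, not the ding class)."""

    def __init__(self, agent_num: int, state_dim: int, mixing_embed_dim: int, hypernet_embed: int = 64):
        super().__init__()
        self.agent_num = agent_num
        self.embed_dim = mixing_embed_dim
        # Hypernetworks map the global state embedding to the mixing weights and biases.
        self.hyper_w1 = nn.Sequential(
            nn.Linear(state_dim, hypernet_embed), nn.ReLU(),
            nn.Linear(hypernet_embed, agent_num * mixing_embed_dim),
        )
        self.hyper_w2 = nn.Sequential(
            nn.Linear(state_dim, hypernet_embed), nn.ReLU(),
            nn.Linear(hypernet_embed, mixing_embed_dim),
        )
        self.hyper_b1 = nn.Linear(state_dim, mixing_embed_dim)
        self.hyper_b2 = nn.Sequential(
            nn.Linear(state_dim, mixing_embed_dim), nn.ReLU(),
            nn.Linear(mixing_embed_dim, 1),
        )

    def forward(self, agent_qs: torch.Tensor, states: torch.Tensor) -> torch.Tensor:
        bs = agent_qs.shape[0]
        qs = agent_qs.view(bs, 1, self.agent_num)                      # (B, 1, N)
        # abs() keeps the weights non-negative, so q_tot is monotonic in every agent's q.
        w1 = torch.abs(self.hyper_w1(states)).view(bs, self.agent_num, self.embed_dim)
        b1 = self.hyper_b1(states).view(bs, 1, self.embed_dim)
        hidden = F.elu(torch.bmm(qs, w1) + b1)                         # (B, 1, E)
        w2 = torch.abs(self.hyper_w2(states)).view(bs, self.embed_dim, 1)
        b2 = self.hyper_b2(states).view(bs, 1, 1)
        return (torch.bmm(hidden, w2) + b2).view(bs)                   # (B,)


torch.manual_seed(0)
mixer = MonotonicMixerSketch(agent_num=3, state_dim=8, mixing_embed_dim=16)
agent_qs = torch.randn(4, 3)   # (B, N)
states = torch.randn(4, 8)     # (B, M)
q_tot = mixer(agent_qs, states)
# Raising one agent's q-value should never lower the mixed total.
q_up = mixer(agent_qs + torch.tensor([1.0, 0.0, 0.0]), states)
```

The non-negative weights are the design choice that matters here: they make the greedy action of each agent consistent with the greedy joint action on `q_tot`.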
QMix
- class ding.model.template.qmix.QMix(agent_num: int, obs_shape: int, global_obs_shape: int, action_shape: int, hidden_size_list: list, mixer: bool = True, lstm_type: str = 'gru', dueling: bool = False)
- Overview:
QMIX network
- Interface:
__init__, forward, _setup_global_encoder
- __init__(agent_num: int, obs_shape: int, global_obs_shape: int, action_shape: int, hidden_size_list: list, mixer: bool = True, lstm_type: str = 'gru', dueling: bool = False) → None
- Overview:
Initialize the QMIX network.
- Arguments:
agent_num (int): the number of agents
obs_shape (int): the dimension of each agent’s observation state
global_obs_shape (int): the dimension of the global observation state
action_shape (int): the dimension of the action shape
hidden_size_list (list): the list of hidden sizes
mixer (bool): whether to use the mixer net, default to True
lstm_type (str): the type of recurrent cell to use, default to 'gru'
dueling (bool): whether to use a dueling head, default to False
- _setup_global_encoder(global_obs_shape: int, embedding_size: int) → torch.nn.modules.module.Module
- Overview:
Build the network used to encode the global observation.
- Arguments:
global_obs_shape (int): the dimension of the global observation state
embedding_size (int): the dimension of the state embedding
- Return:
outputs (torch.nn.Module): the global observation encoding network
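As a rough illustration of what such an encoder could look like, the sketch below builds a small MLP that maps a global observation of size `global_obs_shape` to an embedding of size `embedding_size`. The function name `setup_global_encoder_sketch`, the number of layers, and the ReLU activation are assumptions; the actual encoder returned by `_setup_global_encoder` may be built differently.

```python
import torch
import torch.nn as nn


def setup_global_encoder_sketch(global_obs_shape: int, embedding_size: int) -> nn.Module:
    # Hypothetical two-layer MLP; the real layer count and activation may differ.
    return nn.Sequential(
        nn.Linear(global_obs_shape, embedding_size),
        nn.ReLU(),
        nn.Linear(embedding_size, embedding_size),
        nn.ReLU(),
    )


encoder = setup_global_encoder_sketch(global_obs_shape=16, embedding_size=32)
global_state = torch.randn(5, 4, 16)       # (T, B, M)
state_embedding = encoder(global_state)    # (T, B, embedding_size)
```

Because `nn.Linear` acts on the last dimension only, the leading timestep and batch dimensions pass through unchanged.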
- forward(data: dict, single_step: bool = True) → dict
- Overview:
Forward computation graph of the QMIX network.
- Arguments:
- data (dict): input data dict with keys ['obs', 'prev_state', 'action']
  agent_state (torch.Tensor): each agent’s local state (obs)
  global_state (torch.Tensor): the global state (obs)
  prev_state (list): the previous rnn state
  action (torch.Tensor or None): if action is None, use the argmax q_value index as the action to calculate agent_q_act
single_step (bool): whether to run a single-step forward; if so, add the timestep dim before forward and remove it after forward
- Returns:
ret (dict): output data dict with keys ['total_q', 'logit', 'next_state']
  total_q (torch.Tensor): the total q_value, which is the result of the mixer network
  agent_q (torch.Tensor): each agent’s q_value
  next_state (list): the next rnn state
- Shapes:
agent_state (torch.Tensor): \((T, B, A, N)\), where T is timestep, B is batch_size, A is agent_num, N is obs_shape
global_state (torch.Tensor): \((T, B, M)\), where M is global_obs_shape
prev_state (list): \((B, A)\), a list of length B, and each element is a list of length A
action (torch.Tensor): \((T, B, A)\)
total_q (torch.Tensor): \((T, B)\)
agent_q (torch.Tensor): \((T, B, A, P)\), where P is action_shape
next_state (list): \((B, A)\), a list of length B, and each element is a list of length A
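The input layout and the action handling described above can be sketched with plain tensors. Several details here are assumptions made for illustration: that `agent_state` and `global_state` are nested under the `obs` key, that `None` is an acceptable initial rnn state, and that random values stand in for the per-agent q-values a real network would produce. No `ding` code is called.

```python
import torch

# timestep, batch size, agent number, obs dim, global obs dim, action number
T, B, A, N, M, P = 5, 4, 3, 10, 16, 6

data = {
    'obs': {  # assumed nesting of the two observation tensors
        'agent_state': torch.randn(T, B, A, N),
        'global_state': torch.randn(T, B, M),
    },
    # a list of length B, each element a list of length A (None = assumed empty rnn state)
    'prev_state': [[None for _ in range(A)] for _ in range(B)],
    'action': torch.randint(0, P, (T, B, A)),
}

# Stand-in for the per-agent q-values, with the documented shape (T, B, A, P).
agent_q = torch.randn(T, B, A, P)

# With an action given: pick the q-value of the chosen action for every agent.
agent_q_act = torch.gather(agent_q, dim=-1, index=data['action'].unsqueeze(-1)).squeeze(-1)

# With action=None: fall back to the argmax action of every agent.
greedy_action = agent_q.argmax(dim=-1)
agent_q_greedy = agent_q.max(dim=-1).values

# single_step=True: a (B, A, N) input gets a timestep dim added, then removed afterwards.
step_obs = torch.randn(B, A, N)
with_time = step_obs.unsqueeze(0)   # (1, B, A, N)
restored = with_time.squeeze(0)     # (B, A, N)
```

`agent_q_act` has shape \((T, B, A)\); flattening its leading dimensions gives the \((B, N)\) input that a mixer expects.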
CollaQMultiHeadAttention
- class ding.model.template.qmix.CollaQMultiHeadAttention(n_head: int, d_model_q: int, d_model_v: int, d_k: int, d_v: int, d_out: int, dropout: float = 0.0)
- Overview:
The head of collaq attention module.
- Interface:
__init__, forward
- __init__(n_head: int, d_model_q: int, d_model_v: int, d_k: int, d_v: int, d_out: int, dropout: float = 0.0)
- Overview:
Initialize the head of the CollaQ attention module.
- Arguments:
n_head (int): the number of heads
d_model_q (int): the size of input q
d_model_v (int): the size of input v
d_k (int): the size of k, used by Scaled Dot Product Attention
d_v (int): the size of v, used by Scaled Dot Product Attention
d_out (int): the size of output q
- forward(q, k, v, mask=None)
- Overview:
Forward computation graph of the CollaQ multi-head attention net.
- Arguments:
q (torch.Tensor): the transformer information q
k (torch.Tensor): the transformer information k
v (torch.Tensor): the transformer information v
- Output:
q (torch.Tensor): the transformer output q
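The sketch below shows one way a multi-head attention layer with this constructor signature could be written using Scaled Dot Product Attention. It is not the `ding` implementation. The class name `MultiHeadAttentionSketch`, the projection layer names, the choice to project `k` from `d_model_v`, and the mask convention (shape `(B, len_q, len_k)`, zeros mark blocked positions) are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiHeadAttentionSketch(nn.Module):
    """Illustrative multi-head attention (hypothetical, not the ding class)."""

    def __init__(self, n_head: int, d_model_q: int, d_model_v: int,
                 d_k: int, d_v: int, d_out: int, dropout: float = 0.0):
        super().__init__()
        self.n_head, self.d_k, self.d_v = n_head, d_k, d_v
        self.w_q = nn.Linear(d_model_q, n_head * d_k)
        self.w_k = nn.Linear(d_model_v, n_head * d_k)  # assumption: k shares v's input size
        self.w_v = nn.Linear(d_model_v, n_head * d_v)
        self.fc = nn.Linear(n_head * d_v, d_out)
        self.dropout = nn.Dropout(dropout)

    def forward(self, q, k, v, mask=None):
        bs, len_q, len_k = q.size(0), q.size(1), k.size(1)
        # Project, then split into heads: (B, n_head, len, d).
        qh = self.w_q(q).view(bs, len_q, self.n_head, self.d_k).transpose(1, 2)
        kh = self.w_k(k).view(bs, len_k, self.n_head, self.d_k).transpose(1, 2)
        vh = self.w_v(v).view(bs, len_k, self.n_head, self.d_v).transpose(1, 2)
        # Scaled dot product attention scores: (B, n_head, len_q, len_k).
        scores = torch.matmul(qh, kh.transpose(-2, -1)) / (self.d_k ** 0.5)
        if mask is not None:
            # Assumed convention: zeros in the mask block a key position.
            scores = scores.masked_fill(mask.unsqueeze(1) == 0, float('-inf'))
        attn = self.dropout(F.softmax(scores, dim=-1))
        out = torch.matmul(attn, vh)                                   # (B, n_head, len_q, d_v)
        out = out.transpose(1, 2).contiguous().view(bs, len_q, self.n_head * self.d_v)
        return self.fc(out)                                            # (B, len_q, d_out)


torch.manual_seed(0)
attention = MultiHeadAttentionSketch(n_head=2, d_model_q=12, d_model_v=20, d_k=5, d_v=7, d_out=8)
q = torch.randn(2, 1, 12)      # one query (the agent itself) per batch element
kv = torch.randn(2, 4, 20)     # four key/value entries (for example, allies)
out = attention(q, kv, kv)
out_masked = attention(q, kv, kv, mask=torch.ones(2, 1, 4))
```

A row of the mask that blocks every key would produce NaN after the softmax, so callers of a layer like this need at least one visible key per query.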
CollaQSMACAttentionModule
- class ding.model.template.qmix.CollaQSMACAttentionModule(q_dim: int, v_dim: int, self_feature_range: List[int], ally_feature_range: List[int], attention_size: int)
- Overview:
CollaQ attention module, used to build each agent’s attention observation. The result combines the agent’s own observation with the part of its observation that describes the allies it attends to.
- Interface:
__init__, _cut_obs, forward
- __init__(q_dim: int, v_dim: int, self_feature_range: List[int], ally_feature_range: List[int], attention_size: int)
- Overview:
Initialize the CollaQ attention module.
- Arguments:
q_dim (int): the dimension of transformer output q
v_dim (int): the dimension of transformer output v
self_feature_range (List[int]): the agent’s feature range
ally_feature_range (List[int]): the agent ally’s feature range
attention_size (int): the size of the attention net layer
- _cut_obs(obs: torch.Tensor)
- Overview:
Cut the observed information into the agent’s own observation and its allies’ observation.
- Arguments:
obs (torch.Tensor): each agent’s input observation
- Return:
self_features (torch.Tensor): output self agent’s attention observation
ally_features (torch.Tensor): output ally agent’s attention observation
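A minimal sketch of such a split is shown below, assuming that each feature range is a `[start, end)` pair of indices into the last dimension of the observation. That interpretation, and the helper name `cut_obs_sketch`, are assumptions for illustration rather than the documented behavior of `_cut_obs`.

```python
from typing import List, Tuple

import torch


def cut_obs_sketch(
        obs: torch.Tensor, self_feature_range: List[int], ally_feature_range: List[int]
) -> Tuple[torch.Tensor, torch.Tensor]:
    # Assumption: each range is [start, end) over the last (feature) dimension.
    self_features = obs[..., self_feature_range[0]:self_feature_range[1]]
    ally_features = obs[..., ally_feature_range[0]:ally_feature_range[1]]
    return self_features, ally_features


# (T, B, A, N) observation with N = 10 features per agent.
obs = torch.arange(2 * 3 * 4 * 10, dtype=torch.float32).view(2, 3, 4, 10)
self_features, ally_features = cut_obs_sketch(obs, [0, 4], [4, 10])
```

In this example the two ranges are adjacent and cover the whole feature axis, so concatenating the two outputs reproduces the original observation.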
CollaQ
- class ding.model.template.qmix.CollaQ(agent_num: int, obs_shape: int, alone_obs_shape: int, global_obs_shape: int, action_shape: int, hidden_size_list: list, attention: bool = False, self_feature_range: Optional[List[int]] = None, ally_feature_range: Optional[List[int]] = None, attention_size: int = 32, mixer: bool = True, lstm_type: str = 'gru', dueling: bool = False, use_pmixer: bool = False)
- Overview:
CollaQ network
- Interface:
__init__, forward, _setup_global_encoder
- __init__(agent_num: int, obs_shape: int, alone_obs_shape: int, global_obs_shape: int, action_shape: int, hidden_size_list: list, attention: bool = False, self_feature_range: Optional[List[int]] = None, ally_feature_range: Optional[List[int]] = None, attention_size: int = 32, mixer: bool = True, lstm_type: str = 'gru', dueling: bool = False, use_pmixer: bool = False) → None
- Overview:
Initialize the CollaQ network.
- Arguments:
agent_num (int): the number of agents
obs_shape (int): the dimension of each agent’s observation state
alone_obs_shape (int): the dimension of each agent’s observation state without other agents
global_obs_shape (int): the dimension of the global observation state
action_shape (int): the dimension of the action shape
hidden_size_list (list): the list of hidden sizes
attention (bool): whether to use the attention module, default to False
self_feature_range (Union[List[int], None]): the agent’s feature range
ally_feature_range (Union[List[int], None]): the agent ally’s feature range
attention_size (int): the size of the attention net layer
mixer (bool): whether to use the mixer net, default to True
- _setup_global_encoder(global_obs_shape: int, embedding_size: int) → torch.nn.modules.module.Module
- Overview:
Build the network used to encode the global observation.
- Arguments:
global_obs_shape (int): the dimension of the global observation state
embedding_size (int): the dimension of the state embedding
- Return:
outputs (torch.nn.Module): the global observation encoding network
- forward(data: dict, single_step: bool = True) → dict
- Overview:
Forward computation graph of the CollaQ network.
- Arguments:
- data (dict): input data dict with keys ['obs', 'prev_state', 'action']
  agent_state (torch.Tensor): each agent’s local state (obs)
  agent_alone_state (torch.Tensor): each agent’s local state alone; in the SMAC setting this is the observation without ally features (obs_alone)
  global_state (torch.Tensor): the global state (obs)
  prev_state (list): the previous rnn state, which should include 3 parts: one hidden state of q_network, and two hidden states of q_alone_network for the obs and obs_alone inputs
  action (torch.Tensor or None): if action is None, use the argmax q_value index as the action to calculate agent_q_act
single_step (bool): whether to run a single-step forward; if so, add the timestep dim before forward and remove it after forward
- Return:
- ret (dict): output data dict with keys ['total_q', 'logit', 'next_state']
  total_q (torch.Tensor): the total q_value, which is the result of the mixer network
  agent_q (torch.Tensor): each agent’s q_value
  next_state (list): the next rnn state
- Shapes:
agent_state (torch.Tensor): \((T, B, A, N)\), where T is timestep, B is batch_size, A is agent_num, N is obs_shape
global_state (torch.Tensor): \((T, B, M)\), where M is global_obs_shape
prev_state (list): \((B, A)\), a list of length B, and each element is a list of length A
action (torch.Tensor): \((T, B, A)\)
total_q (torch.Tensor): \((T, B)\)
agent_q (torch.Tensor): \((T, B, A, P)\), where P is action_shape
next_state (list): \((B, A)\), a list of length B, and each element is a list of length A