libcity.model.road_representation.GAT¶
class libcity.model.road_representation.GAT.GAT(config, data_feature)[source]¶
Bases: libcity.model.abstract_traffic_state_model.AbstractTrafficStateModel
calculate_loss(batch)[source]¶
- Parameters
  batch – dict, need key ‘node_features’, ‘node_labels’, ‘mask’
- Returns
forward(batch)[source]¶
Autoregressive task.
- Parameters
  batch – dict, need key ‘node_features’, which contains a tensor of shape (N, feature_dim)
- Returns
  output tensor of shape (N, feature_dim)
- Return type
  torch.tensor
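For orientation, a minimal sketch of the batch contract implied by the two entries above; only the dict keys come from the docstrings, while the sizes and the commented-out model calls are assumptions about how the surrounding pipeline would drive this model.

import torch

N, feature_dim = 100, 32  # hypothetical number of road segments and feature size
batch = {
    'node_features': torch.randn(N, feature_dim),    # per-node input features
    'node_labels': torch.randn(N, feature_dim),      # reconstruction targets
    'mask': torch.rand(N) > 0.5,                     # which nodes count toward the loss
}

# model = GAT(config, data_feature)   # built by the libcity pipeline
# out = model(batch)                  # tensor of shape (N, feature_dim)
# loss = model.calculate_loss(batch)  # masked loss against 'node_labels'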
training: bool¶

class libcity.model.road_representation.GAT.GATLayer(num_in_features, num_out_features, num_of_heads, device, layer_type, concat=True, activation=ELU(alpha=1.0), dropout_prob=0.6, add_skip_connection=True, bias=True, log_attention_weights=False)[source]¶
Bases: torch.nn.modules.module.Module
Base class for all implementations, since there is much code that would otherwise be copied and pasted.
head_dim = 1¶
init_params(layer_type)[source]¶
We use Glorot (a.k.a. Xavier uniform) initialization because it is TensorFlow's default initializer:
https://stackoverflow.com/questions/37350131/what-is-the-default-variable-initializer-in-tensorflow
The original repository was developed in TensorFlow and relied on that default. Feel free to experiment - there may be better initializations depending on your problem.
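For reference, a hedged sketch of what Glorot (Xavier uniform) initialization of a multi-head GAT layer's parameters could look like; the parameter names and shapes below are illustrative, not the layer's actual attributes.

import torch
import torch.nn as nn

num_of_heads, num_in_features, num_out_features = 8, 64, 8  # hypothetical sizes

# Shared linear projection plus per-head source/target attention vectors.
linear_proj = nn.Linear(num_in_features, num_of_heads * num_out_features, bias=False)
scoring_fn_source = nn.Parameter(torch.empty(1, num_of_heads, num_out_features))
scoring_fn_target = nn.Parameter(torch.empty(1, num_of_heads, num_out_features))

# Glorot / Xavier uniform, matching the old TensorFlow default initializer.
nn.init.xavier_uniform_(linear_proj.weight)
nn.init.xavier_uniform_(scoring_fn_source)
nn.init.xavier_uniform_(scoring_fn_target)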
training: bool¶

class libcity.model.road_representation.GAT.GATLayerImp1(num_in_features, num_out_features, num_of_heads, concat=True, activation=ELU(alpha=1.0), dropout_prob=0.6, add_skip_connection=True, bias=True, log_attention_weights=False)[source]¶
Bases: libcity.model.road_representation.GAT.GATLayer
This implementation is only suitable for a transductive setting. It would be fairly easy to make it work in the inductive setting as well, but the purpose of this layer is mostly educational, since it is far less efficient than implementation #3.
forward(data)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note: although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
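In practice this just means calling the layer object rather than its forward method, so that any registered hooks run; a minimal, self-contained illustration with a stand-in module:

import torch
import torch.nn as nn

class TinyLayer(nn.Module):
    # Stand-in for any GATLayer* subclass; only the calling convention matters here.
    def forward(self, data):
        return data * 2

layer = TinyLayer()
data = torch.ones(3)
out = layer(data)            # preferred: runs registered hooks, then forward()
# out = layer.forward(data)  # works, but silently skips any registered hooks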
training: bool¶

class libcity.model.road_representation.GAT.GATLayerImp2(num_in_features, num_out_features, num_of_heads, concat=True, activation=ELU(alpha=1.0), dropout_prob=0.6, add_skip_connection=True, bias=True, log_attention_weights=False)[source]¶
Bases: libcity.model.road_representation.GAT.GATLayer
Implementation #2 was inspired by the official GAT implementation: https://github.com/PetarV-/GAT. It is conceptually simpler than implementation #3 but computationally much less efficient. Note: this is the naive implementation, not the sparse one, and it is only suitable for a transductive setting. It would be fairly easy to make it work in the inductive setting as well, but the purpose of this layer is mostly educational, since it is far less efficient than implementation #3.
forward(data)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note: although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
training: bool¶

class libcity.model.road_representation.GAT.GATLayerImp3(num_in_features, num_out_features, num_of_heads, device, concat=True, activation=ELU(alpha=1.0), dropout_prob=0.6, add_skip_connection=True, bias=True, log_attention_weights=False)[source]¶
Bases: libcity.model.road_representation.GAT.GATLayer
Implementation #3 was inspired by PyTorch Geometric: https://github.com/rusty1s/pytorch_geometric. But it is hopefully much more readable (and of similar performance). It is suitable for both transductive and inductive settings. In the inductive setting we simply merge the graphs into a single graph with multiple components, and this layer is agnostic to that fact.
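A small sketch of the "merge into a single graph with multiple components" idea for the inductive setting; the graphs, node counts, and the (2, E) edge_index convention (row 0 = source ids, row 1 = target ids, cf. src_nodes_dim and trg_nodes_dim below) are illustrative.

import torch

edge_index_1 = torch.tensor([[0, 1, 2],
                             [1, 2, 0]])   # hypothetical graph 1 with 3 nodes
edge_index_2 = torch.tensor([[0, 1],
                             [1, 0]])      # hypothetical graph 2 with 2 nodes

# Offset graph 2's node ids by graph 1's node count, then concatenate along the edge dimension.
num_nodes_1 = 3
merged_edge_index = torch.cat([edge_index_1, edge_index_2 + num_nodes_1], dim=1)
# The layer now sees one large (disconnected) graph; attention never crosses components,
# because no edge connects them.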
aggregate_neighbors(nodes_features_proj_lifted_weighted, edge_index, in_nodes_features, num_of_nodes)[source]¶
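As a hedged sketch of what this aggregation step amounts to, each edge's attention-weighted (lifted) feature vector is summed into its target node, e.g. with a scatter-style add; the tensor shapes and the use of index_add_ below are assumptions based on the parameter names, not the layer's actual code.

import torch

E, N, num_heads, f_out = 6, 4, 2, 3  # hypothetical sizes
# Per-edge features already multiplied by their attention weights, shape (E, num_heads, f_out).
nodes_features_proj_lifted_weighted = torch.randn(E, num_heads, f_out)
# edge_index[1] holds the target node of each edge (cf. trg_nodes_dim = 1 below).
edge_index = torch.tensor([[0, 1, 2, 3, 0, 2],
                           [1, 1, 3, 0, 2, 2]])

out_nodes_features = torch.zeros(N, num_heads, f_out)
# Sum every edge's contribution into its target node along the node dimension (cf. nodes_dim = 0).
out_nodes_features.index_add_(0, edge_index[1], nodes_features_proj_lifted_weighted)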
forward(data)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note: although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
head_dim = 1¶
lift(scores_source, scores_target, nodes_features_matrix_proj, edge_index)[source]¶
Lifts, i.e. duplicates, certain vectors depending on the edge index. One of the tensor dims goes from N -> E (that’s where the “lift” comes from).
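A minimal sketch of the lifting operation under the same (2, E) edge_index convention; the shapes are illustrative and the index_select calls are an assumption about how the duplication could be done, not the layer's exact code.

import torch

N, E, num_heads, f_out = 4, 6, 2, 3  # hypothetical sizes
scores_source = torch.randn(N, num_heads)              # one source score per node and head
scores_target = torch.randn(N, num_heads)              # one target score per node and head
nodes_features_matrix_proj = torch.randn(N, num_heads, f_out)
edge_index = torch.tensor([[0, 1, 2, 3, 0, 2],          # source node of each edge
                           [1, 1, 3, 0, 2, 2]])         # target node of each edge

# "Lift" = duplicate the per-node rows so there is one row per edge: N -> E.
scores_source_lifted = scores_source.index_select(0, edge_index[0])
scores_target_lifted = scores_target.index_select(0, edge_index[1])
nodes_features_lifted = nodes_features_matrix_proj.index_select(0, edge_index[0])
# Each lifted tensor now has E rows, aligned with the edges in edge_index.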
neighborhood_aware_softmax(scores_per_edge, trg_index, num_of_nodes)[source]¶
As the function name suggests, this performs a softmax over neighborhoods. Example: say we have 5 nodes in a graph, and two of them, 1 and 2, are connected to node 3. If we want to calculate the representation for node 3, we should take into account the feature vectors of 1, 2 and 3 itself. Since we have scores for edges 1-3, 2-3 and 3-3 in the scores_per_edge variable, this function calculates attention scores like 1-3/(1-3 + 2-3 + 3-3), where 1-3 is overloaded notation: it stands for the edge 1-3 and its (exp) score, and similarly for 2-3 and 3-3; i.e. for this neighborhood we don’t care about other edge scores that involve nodes 4 and 5.
Note: subtracting the max value from the logits doesn’t change the end result, but it improves numerical stability; it’s a fairly common “trick” used in pretty much every deep learning framework. Check out this link for more details: https://stats.stackexchange.com/questions/338285/how-does-the-subtraction-of-the-logit-maximum-improve-learning
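A hedged sketch of this neighborhood softmax using the same per-edge score layout; the scatter-style normalization below mirrors the description above but is not the layer's exact code.

import torch

num_of_nodes, num_heads = 4, 2  # hypothetical sizes
scores_per_edge = torch.randn(6, num_heads)        # one raw attention score per edge and head
trg_index = torch.tensor([1, 1, 3, 0, 2, 2])       # target node of each edge

# Subtract the max for numerical stability (does not change the softmax result).
scores_per_edge = scores_per_edge - scores_per_edge.max()
exp_scores = scores_per_edge.exp()

# Sum the exp-scores within each target node's neighborhood, then normalize every edge
# by its neighborhood's sum (e.g. edge 1-3 by the sum over edges 1-3, 2-3 and 3-3).
neighborhood_sums = torch.zeros(num_of_nodes, num_heads)
neighborhood_sums.index_add_(0, trg_index, exp_scores)
attentions_per_edge = exp_scores / (neighborhood_sums.index_select(0, trg_index) + 1e-16)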
nodes_dim = 0¶
src_nodes_dim = 0¶
training: bool¶
trg_nodes_dim = 1¶