Graph Attention Networks (GAT), introduced by Petar Veličković and colleagues, including Yoshua Bengio, in a paper submitted to arXiv on October 30, 2017, applied self-attention to graph-structured data. Earlier graph convolutional models combined a node’s neighbors using fixed weights derived from the graph structure, for example coefficients based on node degrees. GAT instead lets the model learn how much attention each neighbor deserves.
The architecture uses masked self-attentional layers: for each node, the model computes attention coefficients over its neighbors, normalizes them with a softmax, and forms a weighted combination of the neighbors’ transformed features. The masking restricts attention to actual graph neighbors, so the operation stays local and does not require knowing the full graph in advance or performing costly matrix operations such as eigendecomposition or inversion. Multiple attention heads run in parallel to stabilize learning, and their outputs are concatenated in hidden layers or averaged in the final layer.
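The layer described above can be sketched in a few lines of dependency-free Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `gat_head` and `gat_layer`, the toy three-node graph, and all numeric values are hypothetical. The scoring function, a LeakyReLU with slope 0.2 applied to a learned vector dotted with the concatenated transformed features, follows the paper's description. Each node is assumed to list itself among its neighbors, and the output nonlinearity and dropout are omitted for brevity.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def gat_head(features, neighbors, W, a):
    """One attention head (illustrative sketch).

    features:  list of N input vectors, each of length F
    neighbors: neighbors[i] lists the nodes that node i may attend to
               (assumed here to include i itself)
    W:         shared weight matrix as a list of F_out rows of length F
    a:         attention vector of length 2 * F_out
    Returns (outputs, attention); attention[i][j] is the coefficient
    node i assigns to neighbor j.
    """
    # Shared linear transform applied to every node.
    h = [[sum(w * x for w, x in zip(row, feat)) for row in W] for feat in features]

    outputs, attention = [], []
    for i, nbrs in enumerate(neighbors):
        # Masking: scores are computed only for actual neighbors of i.
        scores = [
            leaky_relu(sum(c * v for c, v in zip(a, h[i] + h[j]))) for j in nbrs
        ]
        # Numerically stable softmax over the neighborhood.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        alpha = [e / total for e in exps]
        attention.append(dict(zip(nbrs, alpha)))
        # Weighted combination of the neighbors' transformed features.
        outputs.append(
            [sum(w * h[j][k] for w, j in zip(alpha, nbrs)) for k in range(len(W))]
        )
    return outputs, attention


def gat_layer(features, neighbors, heads):
    """Run several independent heads, given as (W, a) pairs, and concatenate."""
    per_head = [gat_head(features, neighbors, W, a)[0] for W, a in heads]
    return [sum((out[i] for out in per_head), []) for i in range(len(features))]


# Toy path graph 0 - 1 - 2 with self-loops; all values are made up.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neighbors = [[0, 1], [0, 1, 2], [1, 2]]
W = [[1.0, 0.0], [0.0, 1.0]]
a = [0.5, -0.5, 1.0, 0.25]

outputs, attention = gat_head(features, neighbors, W, a)
two_heads = gat_layer(features, neighbors, [(W, a), (W, [0.1, 0.2, -0.3, 0.4])])
```

Because `W` and `a` are shared across all nodes, the same parameters can be applied to a graph with a different number of nodes or a different edge set, which is what makes the layer usable on graphs not seen during training.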
Because attention coefficients are computed per edge by a scoring function whose parameters are shared across the whole graph, GAT can emphasize the most informative neighbors and works in both transductive settings (a single fixed graph) and inductive settings (generalizing to new graphs). The authors reported results that matched or exceeded the state of the art on the Cora, Citeseer, and Pubmed citation networks and on a protein-protein interaction dataset.
For practitioners, GAT offered a more flexible and often more interpretable graph model: the learned attention weights hint at which connections drive a prediction, which is valuable when explaining results on customer networks, recommendation graphs, or molecular data.