DeepWalk: Online Learning of Social Representations

DeepWalk, presented by Bryan Perozzi, Rami Al-Rfou, and Steven Skiena in a paper submitted to arXiv on March 26, 2014, was an early and influential method for learning vector representations of network nodes. It introduced the now-standard trick of borrowing language-modeling techniques to analyze graphs.

The method generates many short, truncated random walks starting from each node. DeepWalk treats each walk, a sequence of nodes, as the equivalent of a sentence, and each node in it as a word. Feeding these sequences into a skip-gram language model, the same kind used for word embeddings, produces a dense vector for every node, such that nodes that frequently co-occur within a short window of a walk end up close together in the embedding space. This captures local community structure without any hand-engineered features.
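The walk-then-skip-gram pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`build_adjacency`, `random_walk`, `generate_walks`, `skipgram_pairs`), the toy edge list, and the parameter values are assumptions chosen for illustration. It only shows how walks become "sentences" and how (center, context) training pairs are drawn from them.

```python
import random
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency list from (u, v) edge pairs."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return dict(adj)


def random_walk(adj, start, walk_length, rng):
    """Uniform random walk of at most `walk_length` nodes starting at `start`."""
    walk = [start]
    while len(walk) < walk_length:
        neighbors = adj.get(walk[-1], [])
        if not neighbors:  # dead end: stop the walk early
            break
        walk.append(rng.choice(neighbors))
    return walk


def generate_walks(adj, walks_per_node, walk_length, seed=0):
    """Start `walks_per_node` walks from every node, visiting nodes in shuffled order."""
    rng = random.Random(seed)
    nodes = list(adj)
    walks = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(random_walk(adj, node, walk_length, rng))
    return walks


def skipgram_pairs(walk, window):
    """Return (center, context) pairs for nodes within `window` steps of each other."""
    pairs = []
    for i, center in enumerate(walk):
        lo = max(0, i - window)
        hi = min(len(walk), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, walk[j]))
    return pairs


# Hypothetical toy graph: two triangles joined by the edge (2, 3).
edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 5), (5, 3)]
adj = build_adjacency(edges)
walks = generate_walks(adj, walks_per_node=2, walk_length=5)
pairs = [p for walk in walks for p in skipgram_pairs(walk, window=2)]
```

In a full system, the resulting pairs (or the walks themselves) would be passed to a skip-gram trainer that updates one vector per node. That training step is omitted here to keep the sketch dependency-free.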

The authors showed that DeepWalk produced strong results on multi-label classification of nodes, particularly when labeled training data was scarce, and that the approach was scalable and parallelizable for large real-world networks. Because walks can be generated independently, the method fit naturally into streaming and distributed settings.

DeepWalk’s importance is largely historical and conceptual: it opened the door to a wave of graph embedding research, including node2vec and later neural graph models. For a general reader, it marked the moment when the powerful word-embedding idea jumped from text to networks of any kind.
