Graph Convolutional Networks (GCNs) deal with data that come with a graph structure. A typical graph is represented as G = (V, E), where V is the set of all nodes and E is the set of all edges.
Take a social network as an example: each person is a node, and if two people are friends, there is an edge between the two nodes. In practice, there is a lot of graph-structured data; in money laundering, for instance, transactions can be treated as directed edges between two bank-account nodes. Some standard structures are graphs too: an image can be treated as a grid graph/network.
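To make the representation concrete, here is a small numpy sketch of storing such a graph as an adjacency matrix (the tiny friendship graph and its edges are invented for illustration):

```python
import numpy as np

# A toy undirected friendship graph G = (V, E) with 4 people (nodes).
# Edges (friendships): 0-1, 0-2, 1-3. All edges are made up.
n = 4
edges = [(0, 1), (0, 2), (1, 3)]

# Adjacency matrix: A[i, j] = 1 if there is an edge between i and j.
A = np.zeros((n, n), dtype=int)
for i, j in edges:
    A[i, j] = 1
    A[j, i] = 1  # friendship is mutual, so the matrix is symmetric

print(A)
print(A.sum(axis=1))  # node degrees: [2 2 1 1]
```

For a directed graph such as the money-laundering example, we would only set A[i, j] = 1 for a transaction from i to j, and the matrix would no longer be symmetric.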
Working with graphs is hard because scalability is often a problem. When we have millions of nodes, standard methods may fail due to the complexity of the algorithm itself or the limitations of the machine. Graphs are very common in the real world, well beyond money laundering. We applied a concept graph (a directed one) in our prerequisite-chain paper. In that problem, each node is a concept, for example, “Activation Functions” or “Seq2seq”, and a directed edge represents a prerequisite relation. In the paper, we applied Variational Graph Auto-Encoders, an extension of GCNs.
What does a GCN do?
To my understanding, a GCN extracts features for each node by representing it with a vector, much like word embeddings represent words. The difference is that each node's representation is built by looking at its neighbor nodes, i.e., the graph structure.
The author has a wonderful post about GCNs in . I will add my own thoughts about it in this section.
Like any other stacked neural network, a GCN can have multiple layers. Each layer follows the propagation rule

H^{(l+1)} = f(H^{(l)}, A)

The input matrix A is the adjacency matrix: if there is a connection between two nodes, the corresponding element is 1, otherwise 0. H^{(l)} is the graph-level output of layer l. In other words, to generate the output of the next layer, we take the current layer's output as well as the adjacency matrix, then apply a non-linear function f.
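The layer-wise rule just described can be sketched as a simple loop. The concrete f below (plain neighbor averaging) is only a placeholder I chose to show the mechanics; the post derives a better f later:

```python
import numpy as np

# Layer-wise propagation H^(l+1) = f(H^(l), A), applied repeatedly.
# neighbor_average is an assumed placeholder for f, not the final rule.
def neighbor_average(H, A):
    deg = np.maximum(A.sum(axis=1, keepdims=True), 1)  # avoid divide-by-zero
    return (A @ H) / deg

def gcn_forward(X, A, f, num_layers):
    H = X  # H^(0) is the input node-feature matrix
    for _ in range(num_layers):
        H = f(H, A)
    return H

# Three nodes; node 0 is connected to nodes 1 and 2.
A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)
X = np.array([[1.0], [3.0], [5.0]])
Z = gcn_forward(X, A, neighbor_average, num_layers=1)
print(Z)  # each node's feature becomes the mean of its neighbors' features
```

Note that the same adjacency matrix A is fed into every layer; only H changes from layer to layer.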
The final output of the GCN at the last layer is a matrix Z with shape n × f, where n is the number of nodes and f is the number of output features per node.
Initially, we need H^{(0)} = X, where X is the node-feature matrix with shape n × d, and d is the number of input features. So to start the GCN we provide X, and eventually the output is Z, which provides a meaningful embedding for each node in the graph.
When choosing the function f, a simple way is to apply a ReLU, and we add a weight matrix W^{(l)} to each layer, so the function becomes:

f(H^{(l)}, A) = ReLU(A H^{(l)} W^{(l)})
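As a minimal numpy sketch (sizes and random values here are arbitrary), this naive rule is one matrix product with the adjacency matrix followed by the usual linear transform and ReLU:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)

# Naive rule: f(H, A) = ReLU(A @ H @ W)
def naive_gcn_layer(H, A, W):
    return relu(A @ H @ W)

# Toy sizes: n = 4 nodes, d = 3 input features, 2 output features.
rng = np.random.default_rng(0)
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)
H = rng.normal(size=(4, 3))
W = rng.normal(size=(3, 2))

out = naive_gcn_layer(H, A, W)
print(out.shape)  # (4, 2): one 2-dim feature vector per node
```

Each row of A @ H sums the features of a node's neighbors but, with zeros on the diagonal of A, skips the node's own features, which is exactly the first problem discussed next.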
It looks like a generalized fully-connected layer where we multiply by a weight matrix. However, in the author's opinion, simply multiplying with the adjacency matrix A leads to two problems. The first is that the adjacency matrix does not include each node itself unless the node is self-connected. In this case, we force the diagonal elements to be 1, i.e., we add an identity matrix: Ã = A + I. The other problem is normalization: multiplying by A sums the feature vectors of all neighbors, which changes the scale of the features. The author chooses a symmetric normalization, D̃^{-1/2} Ã D̃^{-1/2}, where D̃ is the diagonal degree matrix of Ã.
The definition is from the lecture notes in . If you are interested, I refer you to Spectral Graph Theory.
So eventually the function is

f(H^{(l)}, A) = ReLU(D̃^{-1/2} Ã D̃^{-1/2} H^{(l)} W^{(l)})

where the weight matrix W^{(l)} is learned during training.
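Putting the two fixes together, a numpy sketch of this rule (with Ã = A + I and the symmetric normalization computed once up front) might look like this; the function names are mine, chosen for illustration:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)

def normalize_adjacency(A):
    """Compute D̃^{-1/2} (A + I) D̃^{-1/2}."""
    A_tilde = A + np.eye(A.shape[0])          # add self-connections
    deg = A_tilde.sum(axis=1)                 # degrees of A_tilde
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    return d_inv_sqrt @ A_tilde @ d_inv_sqrt  # symmetric normalization

def gcn_layer(H, A_hat, W):
    # f(H, A) = ReLU(A_hat @ H @ W), with A_hat precomputed once
    return relu(A_hat @ H @ W)

A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)
A_hat = normalize_adjacency(A)

rng = np.random.default_rng(1)
out = gcn_layer(rng.normal(size=(4, 3)), A_hat, rng.normal(size=(3, 2)))
print(out.shape)  # (4, 2)
```

Since the graph is fixed, A_hat only needs to be computed once before training, while the weight matrices change every update.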
The author provides nice code for GCN here. The code is pretty clear and easy to read. The network structure is the same as the cover image of the blog post  (shown below).
The GraphConvolution layer itself is defined in layers.py. In models.py, you can find two GraphConvolution layers defined along with the activation function and a dropout layer.
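Kipf's actual implementation uses PyTorch and sparse matrix operations; the following is only a rough numpy sketch of the same two-layer structure, with class and function names of my own choosing rather than the repository's API (dropout is omitted):

```python
import numpy as np

# A simplified, numpy-only stand-in for the GraphConvolution layer
# in layers.py. The real code uses PyTorch tensors and sparse ops.
class GraphConvolution:
    def __init__(self, in_features, out_features, seed=0):
        rng = np.random.default_rng(seed)
        bound = 1.0 / np.sqrt(out_features)
        self.W = rng.uniform(-bound, bound, size=(in_features, out_features))
        self.b = np.zeros(out_features)

    def forward(self, H, A_hat):
        # A_hat is the normalized adjacency D̃^{-1/2}(A + I)D̃^{-1/2},
        # computed once before training.
        return A_hat @ H @ self.W + self.b

# Two stacked layers with a ReLU in between, mirroring models.py:
def two_layer_gcn(X, A_hat, hidden=4, out=2):
    gc1 = GraphConvolution(X.shape[1], hidden, seed=0)
    gc2 = GraphConvolution(hidden, out, seed=1)
    H = np.maximum(gc1.forward(X, A_hat), 0)  # ReLU (dropout omitted)
    return gc2.forward(H, A_hat)              # Z: shape (n, out)

# Tiny two-node example graph.
A = np.array([[0, 1], [1, 0]], dtype=float)
A_tilde = A + np.eye(2)
d_inv_sqrt = np.diag(1.0 / np.sqrt(A_tilde.sum(axis=1)))
A_hat = d_inv_sqrt @ A_tilde @ d_inv_sqrt

Z = two_layer_gcn(np.ones((2, 3)), A_hat)
print(Z.shape)  # (2, 2)
```

Each row of Z is the final embedding for one node, which is exactly the matrix Z described earlier.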
Variational Graph Auto-Encoders
There are other works that take good advantage of GCNs, e.g., Variational Graph Auto-Encoders (VGAE) in . I will update this section once I have a better understanding, so stay tuned!
References and Readings
 Slides by the author Thomas Kipf: Structured deep models: deep learning on graphs and beyond.