In machine learning, we are accustomed to a traditional way of representing data: we assume a set of i.i.d. data points (features and, optionally, their labels) and try to learn a statistical model on this dataset. In real life, the i.i.d. assumption is often violated, and individual data points are interconnected through rich, informative relationships (reminiscent of the entity-relationship model). Consider the data behind social networks, search engines, geospatial systems, question answering systems, protein interaction networks, and even molecular structures. These use cases need a representation that is a) more expressive, b) able to model non-Euclidean spaces, and c) able to represent and exploit relational information between data points. Graphs are a natural choice: we connect data points (entities/nodes) with relationships (edges) and optionally attach auxiliary features to each node or edge.
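To make the graph representation concrete, here is a minimal sketch of one common encoding: an adjacency matrix for the relationships plus a feature matrix for the nodes. The toy graph, the variable names (`edges`, `A`, `X`), and the feature values are illustrative assumptions, not data from the session.

```python
import numpy as np

# Hypothetical toy graph: 4 nodes (e.g. users) joined by
# undirected edges (e.g. friendships).
edges = [(0, 1), (1, 2), (2, 3)]
num_nodes = 4

# Adjacency matrix A: A[i, j] = 1 if nodes i and j are connected.
A = np.zeros((num_nodes, num_nodes), dtype=np.float32)
for i, j in edges:
    A[i, j] = 1.0
    A[j, i] = 1.0  # undirected graph: mirror each edge

# Node feature matrix X: one row of auxiliary features per node
# (the values here are made up for illustration).
X = np.array(
    [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0],
     [0.0, 0.0]],
    dtype=np.float32,
)

# Number of neighbours of each node, read off the adjacency matrix.
degrees = A.sum(axis=1)
```

A dense matrix keeps the example short; for large, sparse graphs an edge list or a sparse matrix format would likely be the more practical choice.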
Can we apply machine learning to graphs? Yes! Recent literature has shown successful applications of Graph Neural Networks (GNNs), deep learning methods that operate on graphs, to tasks like node classification, link prediction, clustering, and learning high-quality embeddings. Graph Convolutional Networks (GCNs), which generalize Convolutional Neural Networks to graphs, have shown promise in Semi-Supervised Learning (SSL) tasks. Labeled data is expensive and time-consuming to obtain; SSL algorithms improve sample efficiency by leveraging a large amount of unlabeled data in conjunction with labeled data. In graph terms, we can label a small subset of nodes or edges and propagate those labels to the rest of the graph.
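As a rough illustration of the semi-supervised setup, the sketch below runs the forward pass of a two-layer GCN in NumPy and computes the loss only on the labeled nodes. It assumes the widely used propagation rule from Kipf and Welling's GCN formulation, H' = σ(D^{-1/2}(A + I)D^{-1/2} H W), which the abstract itself does not spell out. The toy graph, random weights, layer sizes, and helper names are all hypothetical, and no training loop is shown.

```python
import numpy as np

def normalize_adjacency(A):
    """Return D^{-1/2} (A + I) D^{-1/2} (adjacency with self-loops, symmetrically normalized)."""
    A_hat = A + np.eye(A.shape[0])   # add self-loops
    deg = A_hat.sum(axis=1)          # always >= 1 thanks to self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(A_norm, H, W, activation=None):
    """One graph convolution: aggregate neighbour features, then apply a linear map."""
    Z = A_norm @ H @ W
    return activation(Z) if activation is not None else Z

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)  # shift for numerical stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def masked_cross_entropy(probs, labels, mask):
    """Average cross-entropy over labeled nodes only; unlabeled nodes are ignored."""
    picked = probs[np.arange(len(labels)), labels]
    return -np.log(picked[mask] + 1e-12).mean()

rng = np.random.default_rng(0)

# Toy path graph 0 - 1 - 2 - 3 with 3 random features per node.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = rng.normal(size=(4, 3))

# Only nodes 0 and 3 are treated as labeled; entries for nodes 1 and 2 are placeholders.
labels = np.array([0, 0, 1, 1])
mask = np.array([True, False, False, True])

A_norm = normalize_adjacency(A)
W0 = rng.normal(size=(3, 8)) * 0.1   # input -> hidden (sizes chosen arbitrarily)
W1 = rng.normal(size=(8, 2)) * 0.1   # hidden -> 2 classes

H1 = gcn_layer(A_norm, X, W0, activation=lambda z: np.maximum(z, 0.0))  # ReLU
probs = softmax(gcn_layer(A_norm, H1, W1))
loss = masked_cross_entropy(probs, labels, mask)
```

Every node receives a prediction in `probs`, including the unlabeled ones, because each layer mixes information from neighbouring nodes; the mask only restricts which nodes contribute to the loss.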
This session will be a deep-dive code walkthrough down the GCN rabbit hole for semi-supervised node classification, an approach that generalizes to common data science problems in industry.
Key Takeaways:
- Domains and datasets which can be modeled as graphs, where graph machine learning is applicable
- An in-depth introduction to GCNs in the context of semi-supervised classification
- Practical considerations for implementing these models