Graph Machine Learning in Industry

Criteo AI Lab is excited to present the Graph Machine Learning in Industry workshop. Please join us on Thursday, September 23rd, at 17:00 Paris time. This page will be updated with video links after the workshop.

Motivation

Many problems in data mining, machine learning, and computer science can be formulated as graph problems. From modelling relationships in social networks and recommender systems to identifying the strengths of molecular reactions, graphs are a natural way to represent certain systems. Research in this area has demonstrated the viability of this approach, with many recent success stories.

At the same time, research and deployment of graph machine learning solutions in an industrial setting present new and unique challenges. These include training models at scale, dealing with heterogeneous data formats, storing and updating large graphs, and identifying new applications, among many others.

In this spirit, the goal of the Graph Machine Learning in Industry workshop is to gather the community of graph practitioners in industry and to present recent ML solutions that are successful in solving real-world problems.

Registration

Please register HERE (you will get reminders and follow-up information).

You can also watch the live stream on YouTube.

Programme

Each talk is about 20-25 minutes + Q&A. 

Sergey Ivanov (Criteo) (17:00-17:15)

Opening Remarks: Graph Machine Learning World. (Slides)

Jian Zhang (AWS) (17:15-17:45)

Challenges and Thinking in Go-production of GNN + DGL. (Slides)


As one of the hottest AI topics in academia, GNNs have shown their potential on graph-based data and tasks, and even in the CV and NLP domains. In real-world cases, however, applying GNNs to solve real problems and putting them into production is a long journey. From four perspectives (Data, Model, Architecture, and Explainability), Dr. Zhang will discuss the challenges we faced on the road to taking GNN + DGL to production, and hopes that the entire community, both academic and industrial, will think thoroughly about research ideas and solutions to tackle these challenges.

Charles Tapley Hoyt (Harvard) (17:45-18:15)

Current Issues in Theory, Reproducibility, and Utility of Graph Machine Learning in the Life Sciences.


The complexity inherent in the life sciences encourages the use of the newest and most powerful methods and models in machine learning and artificial intelligence. Because their development and evaluation are typically done in an academic, preliminary setting, there remain several challenges in translating them to an industrial setting. This short talk will introduce several of those issues, including the formulation and application of evaluation metrics, the discrepancies between benchmark knowledge graphs and reality, the difficulties of constructing biomedical knowledge graphs, and methodological issues related to randomness, data leakage, and robustness.

Anton Tsitsulin (Google) (18:15-18:45)

Graph Learning for Billion Scale Graphs.

This talk touches on what graphs are, why they are important, and where they appear in the world of big data. The talk then dives into two core tools that make up the Graph Mining and Learning toolbox, and lays out several use cases. The first part of the talk covers the Grale graph building framework, a highly scalable tool for generating learned similarity graphs from arbitrary data. The second part of the talk discusses the challenges of running Graph Neural Network (GNN) algorithms on very large-scale graphs.

Cheng Ye (AstraZeneca) (19:00-19:30)

Predicting Potential Drug Targets Using Tensor Factorisation and Knowledge Graph Embeddings. (Slides)


The drug discovery and development process is a long and expensive one, costing over 1 billion USD on average per drug and taking 10-15 years. To reduce the high levels of attrition throughout the process, there has been growing interest over the past decade in applying machine learning methodologies to various stages of the drug discovery process, including the earliest stage: identification of druggable disease genes. In this paper, we have developed a new tensor factorisation model to predict potential drug targets (i.e., genes or proteins) for diseases. The results show that incorporating knowledge graph embeddings significantly improves the prediction accuracy and that training tensor factorisation alongside a dense neural network outperforms other methods. In summary, our framework combines two actively studied machine learning approaches to disease target identification, tensor factorisation and knowledge graph representation learning, which could be a promising avenue for further exploration in data-driven drug discovery.

Rocío Mercado (MIT) (19:30-20:00)

Accelerating Molecular Design Using Graph-Based Deep Generative Models. (Slides)


Drug discovery and development is a highly complex enterprise, where non-clinical programs such as hit discovery and lead optimization typically take up a few years in the early stages of a drug development project. In this talk, I will dive into the field of graph-based deep molecular generative models and show how they are promising tools for de novo molecular design. Last year, my team at AstraZeneca and I introduced a platform for graph-based molecular design; this platform is called GraphINVENT and uses graph neural networks and reinforcement learning to generate molecular graphs that satisfy a set of specified constraints. In this talk, I will speak about the advantages of graph-based molecular generation, as well as current challenges in the development of graph-based deep generative models for drug discovery applications.

Lingfei Wu (JD.com) (20:00-20:30)

Deep Learning On Graphs for Natural Language Processing.


Due to their great power in modeling non-Euclidean data such as graphs or manifolds, deep learning on graphs techniques (i.e., Graph Neural Networks (GNNs)) have opened a new door to solving challenging graph-related NLP problems. There has been a surge of interest in applying deep learning on graphs techniques to NLP, and these techniques have achieved considerable success in many NLP tasks, ranging from classification tasks such as sentence classification, semantic role labeling, and relation extraction to generation tasks such as machine translation, question generation, and summarization. Despite these successes, deep learning on graphs for NLP still faces many challenges, including automatically transforming original text sequence data into highly graph-structured data, and effectively modeling complex data that involves mapping between graph-based inputs and other highly structured output data such as sequences, trees, and graphs with multiple types of nodes and edges. In this talk, I will cover relevant and interesting topics on applying deep learning on graphs techniques to NLP, ranging from the foundations to the applications.

Organizers