Abstract: Graph Neural Networks (GNNs) have achieved strong results in many domains where the data forms a network. The objective of this workshop is to use the Yelp Dataset to recommend businesses to users by exploiting the network composed of reviews, users, friends, tips, and businesses. We will start from the downloaded JSON files of the Yelp dataset, from which we will create CSV files for import into a Neo4j database. Neo4j will allow us to visualize and better understand the schema and properties of the underlying graph. Once we have a picture of our training graph, we will preprocess the data, analyzing each node label and relationship type to understand how to transform properties into appropriate input features for the GNN. Another preprocessing step will be to drop features with too many missing values, which arise from the translation of the JSON files into a graph. After that, we will create an in-memory graph with the DGL library, which will also help us split the data into training, validation, and test sets and create a minibatch sampler for each set. We will then train our graph neural network for the business recommendation task. To complete the cycle, we need a tool for inferring new business recommendations. We will save our business and user vectors in Qdrant, an open-source vector database that offers good speed, accuracy, and flexibility. In this way, we can perform a fast and accurate similarity search over our business vectors and retrieve those most compatible with a given user.
download the Yelp dataset JSON files and create the CSV files for Neo4j import and visualization
preprocess and clean the dataset
create the DGL in-memory graph
train a heterogeneous graph neural network for business recommendation
use the trained model together with a vector database for inference
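As an illustration of the first step, here is a minimal sketch of turning Yelp business records (one JSON object per line) into a CSV file with a neo4j-admin style header. The function name `yelp_business_to_csv`, the selection of columns, and the sample records are assumptions made for this example; the actual workshop material may use different properties and a different import route.

```python
import csv
import io
import json


def yelp_business_to_csv(json_lines, out):
    """Write one CSV row per business record and return the number of rows.

    `json_lines` is any iterable of strings, each holding one JSON object,
    for example an open file handle on the Yelp business file.
    The header follows the neo4j-admin import conventions (:ID, :LABEL, typed
    properties); the chosen columns are an assumption for illustration.
    """
    writer = csv.writer(out)
    writer.writerow([
        "business_id:ID(Business)",
        "name",
        "city",
        "stars:float",
        "review_count:int",
        ":LABEL",
    ])
    count = 0
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        writer.writerow([
            record["business_id"],
            record.get("name", ""),
            record.get("city", ""),
            record.get("stars", ""),         # missing values become empty cells
            record.get("review_count", ""),
            "Business",
        ])
        count += 1
    return count


# Hypothetical sample records standing in for the downloaded file.
sample = [
    '{"business_id": "b1", "name": "Cafe Uno", "city": "Phoenix", "stars": 4.5, "review_count": 120}',
    '{"business_id": "b2", "name": "Taco Dos", "city": "Las Vegas", "stars": 3.0}',
]
buffer = io.StringIO()
n_rows = yelp_business_to_csv(sample, buffer)
csv_text = buffer.getvalue()
```

With a real file, the same function can be called with an open file handle as `json_lines` and a file opened with `newline=""` as `out`. Properties left empty here are the kind of missing values that the preprocessing step later inspects and may drop.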
Bio: Valerio Piccioni is an AI Engineer at LARUS who focuses primarily on Graph Neural Networks, but also enjoys exploring other deep learning fields such as NLP and Computer Vision. He is also interested in MLOps, since building machine learning models that make it into production is harder than it seems. He is currently working on a fraud detection project using graphs.