Model Evaluation in the Land of Deep Learning

Abstract: Model evaluation metrics are typically tied to the predictive learning task. There are different metrics for classification (ROC-AUC, confusion matrix), regression (RMSE, R2 score), ranking (precision-recall, F1 score), and so on. These metrics, coupled with cross-validation or hold-out validation, help analysts and data scientists select a performant model. However, model performance decays over time as the underlying data changes. At that point, point-estimate metrics are not enough, and a better understanding of the why, what, and how of the categorization process is needed.
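To make the limitation concrete, here is a minimal sketch of two of the point-estimate metrics named above, ROC-AUC and RMSE, written in plain Python. The labels and scores are made-up illustrative values (not results from any real model), and the names `scores_at_launch` and `scores_after_drift` are hypothetical.

```python
import math


def roc_auc(labels, scores):
    """ROC-AUC as the probability that a random positive example is
    scored above a random negative one (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Illustrative (made-up) hold-out scores for the same six examples,
# once at deployment and once after the input data has shifted.
labels = [1, 1, 1, 0, 0, 0]
scores_at_launch = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
scores_after_drift = [0.9, 0.4, 0.2, 0.3, 0.6, 0.1]

auc_launch = roc_auc(labels, scores_at_launch)
auc_drift = roc_auc(labels, scores_after_drift)
print(f"AUC at launch: {auc_launch:.2f}, after drift: {auc_drift:.2f}")
```

The single number shows that ranking quality dropped, but it says nothing about which inputs or features caused the drop, which is the gap that interpretation is meant to fill.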

Evaluating model decisions may still be easy for linear models, but it gets difficult in the world of deep neural networks (DNNs). The complexity can increase multifold for use cases in computer vision (image classification, image captioning, or visual question answering (VQA)) and in text (text classification, sentiment analysis, or topic modeling). ResNet, a recently published state-of-the-art DNN architecture, can have over 200 layers. Interpreting how input features lead to the output categorization across that many layers is challenging. The lack of decomposability and intuitiveness associated with DNNs prevents widespread adoption, despite their superior performance compared to more classical machine learning approaches. Faithful interpretation of DNNs will not only provide insight into failure modes (false positives and false negatives) but will also help the humans in the loop evaluate the robustness of the model against noise. This brings trust and transparency to the predictive algorithm.

In this workshop, I will share how to enable class-discriminative visualizations for computer vision and NLP problems when using convolutional neural networks (CNNs), and an approach to making CNNs more transparent by not only capturing metrics during the validation step but also highlighting the salient features in the image or text that drive the prediction.
I will also talk briefly about the open source project "Skater" and how it can help meet our interpretation needs.
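As a rough illustration of what "highlighting salient features" can mean, the sketch below uses occlusion sensitivity: mask one patch of the input at a time and record how much the class score drops. This is a generic, model-agnostic technique chosen for brevity; it is not necessarily the method presented in the workshop and it does not use Skater's API. The function `toy_class_score` is a hypothetical stand-in for a trained CNN's class score.

```python
def toy_class_score(image):
    """Hypothetical stand-in for a CNN class score: the mean intensity
    of the top-left 2x2 corner of the image."""
    return sum(image[r][c] for r in range(2) for c in range(2)) / 4.0


def occlusion_saliency(image, score_fn, patch=2, fill=0.0):
    """Return a map where each pixel holds the drop in score observed
    when the patch containing that pixel is replaced by `fill`."""
    height, width = len(image), len(image[0])
    base = score_fn(image)
    saliency = [[0.0] * width for _ in range(height)]
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            rows = range(top, min(top + patch, height))
            cols = range(left, min(left + patch, width))
            occluded = [row[:] for row in image]  # copy so the input is untouched
            for r in rows:
                for c in cols:
                    occluded[r][c] = fill
            drop = base - score_fn(occluded)
            for r in rows:
                for c in cols:
                    saliency[r][c] = drop
    return saliency


# A uniform 4x4 "image"; only the top-left patch matters to the toy score.
image = [[1.0] * 4 for _ in range(4)]
saliency_map = occlusion_saliency(image, toy_class_score)
for row in saliency_map:
    print(row)
```

With a real model, `score_fn` would wrap a forward pass that returns the probability of the class of interest. The same idea carries over to text by masking tokens instead of pixel patches.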

Bio: Pramit Choudhary is an applied machine learning research scientist. He focuses on optimizing and applying machine learning to solve real-world problems. His research areas include scaling and optimizing machine learning algorithms. Currently, he is exploring better ways to explain a model’s learned decision policies, to reduce the chaos in building effective models and to close the gap between prototype and operationalized models.

Open Data Science Conference