Abstract: We are witnessing the rapid adoption of Pandas as the main library for representing and manipulating structured data in Python. By contrast, Natural Language Processing (NLP) applications usually rely on various NLP libraries with complex and, more importantly, incompatible output structures. This makes integrating NLP features and solutions into machine learning and data science pipelines difficult and time-consuming. To address this issue, the Center for Open Source Data and AI Technologies (CODAIT) has developed Text Extensions for Pandas, an open source library of extensions that turns Pandas DataFrames into a universal data structure for NLP, offering transparency, simplicity, and compatibility. In this talk, we will first discuss how to solve real-world text analytics problems with NLP. We will then show how Text Extensions for Pandas makes these analyses easier.
Bio: Sepideh Seifzadeh is a Data Scientist/Machine Learning Engineer with the IBM Data Science Elite (DSE) team, based in San Francisco. After completing her PhD in Computer Engineering at the Center for Pattern Analysis and Machine Intelligence at the University of Waterloo, she began her career as a Big Data & Analytics consultant. She then joined IBM as an open source solution engineer and worked for IBM Canada for two years, where she was honored with the “Best of IBM” award in 2018. She recently moved from Toronto to San Francisco and joined the DSE team. She really enjoys being on the team: she is learning how to apply data science to real-world applications and gets the chance to meet customers in different industries and help them on their data science journeys. She also enjoys working with the team's top talent and learning from them.