Abstract: We have more data than ever before, and the volume grows every day. Broadly, that data falls into three types: numbers, text, and images. Text and images are not as straightforward to work with as numbers, because they must first be converted to a numeric representation before they can be analyzed. In this session, we will discuss “Topic Modeling Practices And Its Value For Businesses”.
Text data can be highly unstructured and may contain a lot of noise. Even with current algorithms, the data must be cleaned and prepared before any analysis can succeed. With text in particular, we need to understand the algorithms, their results, and how to use them to bring value to our companies with feasibility and innovation in mind.
In this workshop, we will start by talking about the different types of text data that a company may have. Then we will talk about Term Frequency-Inverse Document Frequency (TF-IDF), Bag of Words (BOW), Word2vec, Latent Dirichlet Allocation (LDA), and more.
We will then walk through a real example of cleaning and structuring text data, running a topic modeling algorithm on it, and discussing the results and their value to businesses. At the end of the session, we will talk about the pros and cons of the approach and how to bring value to our businesses and jobs with text data. If you want to know how to analyze and take advantage of your text data, this may be the session for you.
Bio: Yunus Genes completed his Master's in Computer Science and is continuing his part-time PhD at the University of Central Florida. His research focuses on applied machine learning, social media behavior, and misinformation detection and diffusion. He has been working in this field for over 4 years. He is currently working for Royal Caribbean Cruise Line LTD. He has previously held data science positions in Silicon Valley as well as in the Orlando, Florida area.
Yunus Genes, PhD
Data Scientist | Royal Caribbean Cruise Line