
Abstract: Machine learning (ML) and artificial intelligence (AI) have become hot topics in recent years, driven by advances in computing power and the availability of big data. Many successful applications have generated excitement, and people now try to apply ML/AI to every aspect of human activity, with some claiming that ML combined with big data will replace the majority of human jobs, including many professional ones.
However, there are many challenges in applying ML to real-world applications. This is especially true in industrial applications, because of the challenges raised by industrial data sets and industrial requirements. In the industrial space, the environment is far more complicated: human behavior and machine operation are entangled with physical, chemical, and biological processes on mechanical, electrical, and electronic equipment. The two major problems with industrial data sets are data quality and the lack of labels. Noisy and missing data are major issues in industrial applications. Failure and normal operation patterns can be context dependent, and the ratio of failure cases to normal cases is extremely low. It is challenging to define a ‘gold dataset’ for model training. Domain knowledge plays a crucial role in the industrial space, where physical modeling has existed for a long time and a great deal of domain-specific know-how has been developed. How to integrate domain knowledge into machine learning is critical to success in two respects: a pure black-box algorithm typically does not perform well enough, and a pure black-box system often fails to gain customer trust. All these requirements add new challenges for machine learning: they demand a robust algorithm that copes with the data challenges while delivering high-fidelity yet interpretable results.
The presentation discusses how industrial data sets and industrial requirements affect the performance of ML algorithms, and it outlines practical directions and best practices for addressing those challenges in industrial AI application design.
Bio: Xiaohui Hu is currently principal data scientist at GE Digital, located in Boston, MA. He leads a team of data scientists who design, develop, and support data science and analytic solutions for various industrial applications and software products. Xiaohui received his PhD in electrical and computer engineering from Purdue University and his bachelor's degree from Tsinghua University, China. His main research interests include machine learning/computational intelligence, data modeling, prognostics and health management, and industrial AI applications.

Xiaohui Hu, PhD
Principal Data Scientist | GE Digital
