Real-Time Machine Learning on the Mainframe

Abstract: There are only a few tens of thousands of IBM mainframe systems in the world, a number that is probably very small compared with other, more traditional server architectures. But mainframes are unique because they run the most valuable transactional workloads for the biggest companies in the world.

Traditionally, though, mainframe data has not been accessible to real-time data analytics. That’s because most businesses still use ETL (Extract, Transform, and Load) processes to move their data to a separate environment, such as a data lake, to make it available for analysis. However, banks and healthcare companies in particular are not keen on moving personally identifiable financial and health information, for fear of violating regulations or, worse, in light of some recent data breaches, actually being hacked! In addition, ETL is time-consuming, error-prone, and logistically complex, and because it runs only periodically, the resulting data can be stale. These delays are increasingly unacceptable in today's business environment, where real-time business intelligence can determine a company’s competitive edge, or lack thereof.
In this talk, we'll look at a few use cases from finance and healthcare to show how mainframe (VSAM / QSAM) data can be accessed and analyzed directly from within Jupyter notebooks (in Python, with or without a SQL layer) on the mainframe, without the need for ETL. We'll then apply different machine learning algorithms and compare their performance.

Bio: Charles received his B.S. degrees in Electrical and Electronics Engineering and in Physics from Bogazici University (Istanbul, Turkey) and his Ph.D. in Physics from Brown University (Providence, RI). Transitioning from semiconductor physics, he held a postdoctoral position in the Cognitive and Linguistic Sciences Department at Brown, working on a project inspired by neural networks. After spending a number of years at Fidelity Investments as a quantitative analyst, he has worked in data science roles as both a practitioner and an instructor. He is currently a Senior Lab Services Engineer in Data Science at Rocket Software, working on integrating data science tools with z/OS systems.

Open Data Science Conference