Data science for the 99%

Abstract: Data science has a reality problem. The overwhelming majority of mental energy in data science appears to be devoted to the 1% – the bleeding-edge tech companies and the looming mega-corporations. But this laser focus completely misses the 99% – the entrepreneurs in non-tech fields, the people who work in small businesses, the volunteers at nonprofits, and so many others. These are the organizations that will never have a Chief Data Officer and may not have any staff with formal data training. This presentation addresses the ways that this neglected majority could benefit from the insights of data science, primarily by encouraging all of us as data scientists to do five potentially novel things. First, we as data scientists should become more aware of the organizations that operate under our radar. This involves a willingness to step out of our familiar cognitive and physical environments, such as online “echo chambers” or tech-rich coastal cities, at least temporarily. In doing so, we can gain a better understanding of the immense variety of work that people do and the context in which they do it. Second, we need to better understand the goals, methods, and constraints of these organizations. Most of them are privately held and have no intention of selling to investors. Rather, they focus on serving local constituents and often have to make do with severely limited resources. These constraints change the approach that data work must take. Third, we need to give more care and attention to the data science methods that best meet these needs. Many of these organizations have no data strategy, so the simplest methods – bar charts and time charts – can have the most benefit. Decision trees and clustering can also be helpful if they can be performed easily and presented clearly. Fourth, we should take the time to work directly with these organizations.
This work can take the form of a long-term consulting arrangement or, more often, a short-term collaborative effort like a data hackathon. Such events can open the eyes of organizations to the benefits of deliberate data science and help set up a framework for continued progress. And fifth, we, as data scientists, can help the people in these organizations learn to work with data in ways that best fulfill their goals. That involves helping them learn the methods to continue their data work with the tools that they already have. It requires us to move away from AWS instances and embrace data science in spreadsheets or CRM applications. By following these suggestions, we can get past our fixation on the glamour of the 1% and provide meaningful benefits to the 99%. We can overcome our reality problem and help truly democratize data science.

Bio: TBD