Estimating customer budget with hierarchical probabilistic models

Abstract: At ThriveHive we work with small and medium-sized businesses that vary widely in marketing sophistication and budget. One of the major challenges for us, and for the businesses we serve, is determining the right budget for a business in a particular industry targeting a particular region. This is a difficult problem for predictive modelling: even though we have years of spend data, this “realized spend” is unlikely to equal the marketing budget. We need a way to incorporate uncertainty both in the estimate of this spend and in the difference between the spend and a business's actual budget. It is also important to allow this uncertainty to vary by industry and region. For this purpose, we used a hierarchical probabilistic modelling approach implemented in PyMC3.

In this talk we will outline our process for estimating marketing budget at the region-industry level. First, we will lay out the data resources we used, including internal sales data and external sources of marketing budget information. Next, we will provide an overview of hierarchical modelling and how to incorporate fixed and random effects in regression models. Finally, we will walk through how we developed a hierarchical model for estimating customer marketing budget. This involved testing several configurations of region and industry effects, both for the realized spend part of the model and for the difference between spend and budget. Our results indicated that a fixed effect for each level produced the posterior predictive distribution that best fit the underlying data.
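
To make the two-part structure described above concrete, here is a minimal generative sketch in plain Python. It is not the model presented in the talk and does not use PyMC3. It only simulates data from one possible structure, in which log realized spend is a baseline plus a region effect plus an industry effect, and the budget is the spend scaled up by an industry-specific gap. The function name, all hyperparameter values, and the assumption that the budget is never below realized spend are hypothetical choices made for illustration.

```python
import math
import random


def simulate_budgets(regions, industries, n_per_cell=50, seed=0):
    """Draw synthetic (region, industry, spend, budget) rows from a toy hierarchical model."""
    rng = random.Random(seed)

    # Hypothetical hyperparameters on the log-dollar scale (not from the talk).
    base_log_spend = math.log(500.0)
    region_sd, industry_sd, noise_sd = 0.3, 0.5, 0.4

    # Group-level effects: each region and industry gets its own offset,
    # drawn from a shared distribution.
    region_effect = {r: rng.gauss(0.0, region_sd) for r in regions}
    industry_effect = {i: rng.gauss(0.0, industry_sd) for i in industries}

    # Assumed gap between spend and budget on the log scale, varying by industry.
    # Taking the absolute value keeps the gap non-negative, so budget >= spend.
    industry_gap = {i: abs(rng.gauss(0.2, 0.1)) for i in industries}

    rows = []
    for r in regions:
        for i in industries:
            for _ in range(n_per_cell):
                # Realized spend part of the model.
                log_spend = (
                    base_log_spend
                    + region_effect[r]
                    + industry_effect[i]
                    + rng.gauss(0.0, noise_sd)
                )
                spend = math.exp(log_spend)
                # Budget part: spend scaled by the industry-specific gap.
                budget = spend * math.exp(industry_gap[i])
                rows.append((r, i, spend, budget))
    return rows
```

Calling `simulate_budgets(["Boston", "Austin"], ["plumbing", "dental"])` returns one row per simulated business. In a fitted model, the effects and the gap would be inferred from observed spend rather than drawn from fixed hyperparameters as they are here.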

We will spend the last part of the talk outlining our process for productionizing the model in an API. The API allows our sales and marketing teams to obtain estimated budget ranges based on a set of customer characteristics. We will conclude the talk by discussing possible expansions to the model, including the use of deep probabilistic programming in Edward.

Bio: Benjamin is a Data Scientist at ThriveHive where he works with business stakeholders to create tools and models providing guidance to internal teams and business customers. Before ThriveHive he worked with the City of Boston’s analytics team on various modelling projects. He received his PhD in Policy Analysis from the RAND Corporation, with a focus on projects related to health, infrastructure and defense. He has experience with a wide array of methods and their applications across a diverse set of verticals.

Open Data Science Conference