CRESST: Complete Rare Event Specification Using Stochastic Treatment

Abstract: 

In today's fast-moving world, rare events are becoming increasingly common. Problems ranging from safety-hazard incidents to transaction fraud all fall under the umbrella of rare events. Identifying and studying rare events becomes crucial, particularly when the underlying event relates to a sensitive or adverse issue. Note that even though the probability of occurrence is very close to zero, a rare event can still break down into an extensive set of sub-classes. For example, within the parent rare event of Product Safety, there could be multiple types of potential hazard (Fire, Electrical, Pharmaceutical, etc.), rendering the sub-classes rarer still. In this talk, we discuss a novel algorithm designed to study a rare event and its sub-classes over time, with a primary focus on forecasting and anomaly detection.

A proper forecast of the expected cases of a rare event and its sub-classes helps us prepare for upcoming challenges and allocate resources accordingly. For example, if historic data lets us predict that customer complaints regarding skin irritation will be higher than usual for a certain product in a given month, we can enforce stricter quality control on that product or even recall it altogether. Likewise, we need to continuously study the evolving patterns of the time series. The complexity of identifying anomalies will vary with the rarity of the event. The rare events here are studied over shorter periods of time. Hence, the anomalies are relative anomalies that may not eventually contribute to the longer-term trend. Even so, identifying any deviation from recent states can be significant in industry. For instance, in seller fraud detection, a fraudulent seller will not remain part of the marketplace for long. In such cases, the overall longevity of the rare event, i.e., fraud, is very low, giving rise to shorter time series.

In this exercise, we will discuss a Non-Homogeneous Count Process for forecasting cases of the rare event. For the deviation study, we introduce a novel technique that computes a deviation score for each unit of time. This score is a composite of two component scores, the Intra DScore and the Inter DScore, each capturing a different aspect of anomaly detection in time series. The threshold for the deviation score, beyond which a unit is flagged as an anomaly, is computed via a unique clustering technique.
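To make the idea of a count process with a time-varying rate concrete, here is a minimal sketch of a non-homogeneous Poisson model whose rate is piecewise constant over seasonal slots. This is an illustrative assumption, not the CRESST algorithm itself: the function names, the seasonal structure, and the sample data are all hypothetical. Under this model the maximum-likelihood estimate of each slot's rate is simply the mean count observed in that slot.

```python
import math
from collections import defaultdict


def fit_seasonal_rates(counts, period):
    """MLE of a piecewise-constant Poisson rate: mean count per seasonal slot.

    Assumes len(counts) >= period so that every slot has an observation.
    """
    totals = defaultdict(float)
    n_obs = defaultdict(int)
    for t, c in enumerate(counts):
        totals[t % period] += c
        n_obs[t % period] += 1
    return [totals[s] / n_obs[s] for s in range(period)]


def forecast(rates, start, horizon):
    """Expected count for each of the next `horizon` time units."""
    period = len(rates)
    return [rates[(start + h) % period] for h in range(horizon)]


def poisson_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    cdf = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return 1.0 - cdf


# Hypothetical data: two "years" of counts, four slots per year.
history = [0, 1, 3, 0, 1, 1, 5, 0]
rates = fit_seasonal_rates(history, period=4)
next_year = forecast(rates, start=len(history), horizon=4)
```

The tail probability gives a rough sense of how surprising an observed count is under the fitted rate; for example, `poisson_tail(4, rates[0])` is the chance of seeing four or more cases in a slot whose expected count is 0.5.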

Traditional time series forecasting divides the series into deterministic and random components. The deterministic components are removed and the random component is modeled using stochastic regressors. In most traditional models, the random component is assumed to follow a Normal Distribution. More recently, computationally expensive methods such as XGBoost and LSTM have come into use. These techniques, although fairly accurate, require larger sets of training data. Our research, however, concerns forecasts of rare events trained on shorter periods of time. Because of the low probability of the underlying event, the values of the time series will be very small. Any continuous distribution fitted to these low-order count values will be a very poor fit. Ideally, the count values are expected to follow a Poisson or a Negative Binomial Distribution. Hence, studying them as a Count Process is more accurate. Additionally, because the data are highly sparse, we allow greater freedom in the choice of stochastic regressors.

The deviation detection method used here is built to suit the underlying use case. Most anomaly detection techniques for time series focus on finding anomalies within a given time series. In our case, we try to utilize all the information available to us, even from outside the time series. Since anomalies are to be identified over shorter periods of time, the critical values are dynamic and need to be recomputed periodically. All time series comparisons are made between warped sets of points (using Dynamic Time Warping).
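For readers unfamiliar with Dynamic Time Warping, the following is a minimal sketch of the textbook dynamic-programming formulation with an absolute-difference cost. The function name and the sample series are hypothetical, and this is not necessarily the variant used in CRESST; it only illustrates why warped comparison suits short, sparse count series whose spikes may be shifted in time.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW with absolute-difference local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # a[i-1] matched to an already-used b[j-1]
                cost[i][j - 1],      # b[j-1] matched to an already-used a[i-1]
                cost[i - 1][j - 1],  # advance both series
            )
    return cost[n][m]


# Hypothetical count series: the same spike, shifted by one time unit.
series_a = [0, 0, 1, 3, 1, 0]
series_b = [0, 1, 3, 1, 0, 0]

pointwise = sum(abs(x - y) for x, y in zip(series_a, series_b))  # 6
warped = dtw_distance(series_a, series_b)                        # 0.0
```

The pointwise distance treats the shifted spike as a large difference, while the warped distance recognizes the two series as the same shape.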

The session will include a brief introduction to the concepts of Rare Events, Count Processes, the origins of the Poisson and Negative Binomial Distributions, and Dynamic Time Warping. Following this, we will discuss one area of application for this paper in great depth - starting from the problem formulation up to the final business impact and caveats. The entire algorithm will then be explained in light of the problem statement (generalized alternatives will also be given at every point). Since the algorithm is math heavy, we will take small detours to explain certain key concepts used in the solution (such as Concords & Discords in time series, Relative Local Density Clustering, Outlierness, Correlation Mutation, Sister Classes, etc.). The audience is not expected to know advanced concepts beforehand - a basic idea of time series forecasting and distance measures will suffice. Finally, once the solution is laid out in a manner that can be easily replicated for other business problems, we will discuss limitations and future work (and, if time permits, a few more example business problems where the solution can be applied). This is a very generic algorithm applicable to cases in multiple disciplines: Retail, IT, Finance - basically anywhere a Rare Event can be represented as a Time Series!

Keywords:
Product Safety
Fire Hazards
Transaction Fraud
Malware Detection
Relevant Sister Classes
Count Time Series
Non Homogeneous Poisson Process
Discrete Space Optimization
Text Mining
GloVe
Sparsity Treatment in Text
Dynamic Time Warping
Discords
Anomaly Detection
Deviation Score
Relative Local Density
Density Based Clustering

Bio: 

Debanjana is a Data Scientist at Walmart Labs, Global Data Value Realization. At Walmart, she has been instrumental in building numerous high-functioning bots in the compliance space, dealing heavily in Natural Language Processing, Optimization, Mixture Models and Rare Time Series. Currently, her focus is on extensive Shrink Research, where she and her team members are identifying potential areas of high impact for Retail Shrink.

In her 3 years of experience, Debanjana has filed 5 US patents in the fields of Clustering & Anomaly Detection, Imbalance Text Classification, Travel Optimization and Stochastic Processes. In addition, she has three published papers to her credit. She presented her paper REDCLAN (Relative Density Based Clustering and Anomaly Detection) at ADCOM'18. CRESST was included at ICMLA'19, and iCASSTLE (Imbalanced Classification Algorithm for Semi Supervised Text Learning) was presented at ICMLA'18 (Orlando, FL) and later published by IEEE.

Debanjana has a master's degree in Statistics from the Indian Institute of Technology (Kanpur).