Building a Holistic Risk Profile: Near Real-Time Approach to Insider Threat Detection


The typical cybersecurity application involves safeguards and mechanisms to protect against outside compromise. However, insiders are increasingly becoming one of the largest threats to an organization because of their detailed knowledge of system operations and security practices and their legitimate physical and electronic access to critical systems. Insider threats can be difficult to detect because a compromised employee acts within their permissions and therefore breaks no explicit rules, yet still creates a pattern of suspicious behavior.

Pandata partnered with FirstEnergy’s Security Operations Center to develop an AI solution for insider threat detection that creates a Holistic Risk Profile for every employee based on physical and digital behavior. This approach builds on user behavior analytics – profiling behavior to determine first what is normal, then what is abnormal, and finally what is malicious, all in the absence of labeled threat data. To accomplish this, we first developed custom streaming data pipelines and a pre-processing profiler that enriches several internal operational and security data sources and stores them in a Hadoop Data Lake. The profiler aggregates each employee’s several “identities” across the organization into a single curated “employee security profile,” to which varying levels of criticality are assigned by rules-based associations for location, job role, and asset access. This strategy lets us quickly process near-real-time activity data on the order of hundreds of thousands of events per second.
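The profiler idea described above can be illustrated with a minimal sketch in plain Python. It is not the project's implementation: the record fields (`employee_id`, `system`, `account`, `location`, `role`, `asset`), the rule tables, and the "highest matching weight" scoring are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical rule tables: categories and weights are illustrative only.
LOCATION_RULES = {"control_room": 3, "data_center": 3, "office": 1}
ROLE_RULES = {"sysadmin": 3, "operator": 2, "analyst": 1}
ASSET_RULES = {"scada": 3, "hr_database": 2, "email": 1}


@dataclass
class EmployeeSecurityProfile:
    """One curated profile that collects all identities of an employee."""
    employee_id: str
    identities: set = field(default_factory=set)
    locations: set = field(default_factory=set)
    roles: set = field(default_factory=set)
    assets: set = field(default_factory=set)

    def criticality(self) -> int:
        """Highest rule weight matched across location, role and asset access.

        Taking the maximum is an assumption; a real rule set could combine
        the associations differently.
        """
        scores = [LOCATION_RULES.get(x, 0) for x in self.locations]
        scores += [ROLE_RULES.get(x, 0) for x in self.roles]
        scores += [ASSET_RULES.get(x, 0) for x in self.assets]
        return max(scores, default=0)


def build_profiles(identity_records):
    """Merge per-system identity records into one profile per employee."""
    profiles = {}
    for rec in identity_records:
        emp = rec["employee_id"]
        if emp not in profiles:
            profiles[emp] = EmployeeSecurityProfile(emp)
        p = profiles[emp]
        p.identities.add((rec["system"], rec["account"]))
        for key, target in (("location", p.locations),
                            ("role", p.roles),
                            ("asset", p.assets)):
            if rec.get(key):
                target.add(rec[key])
    return profiles


records = [
    {"employee_id": "e1", "system": "badge", "account": "B-1001",
     "location": "control_room"},
    {"employee_id": "e1", "system": "vpn", "account": "jdoe",
     "role": "analyst", "asset": "email"},
    {"employee_id": "e2", "system": "vpn", "account": "asmith",
     "role": "analyst", "asset": "email"},
]
profiles = build_profiles(records)
```

In this toy data, the badge and VPN accounts for `e1` collapse into one profile, and the control-room association drives that profile's criticality above that of `e2`.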

Through a partnership with cybersecurity analysts, we combined human expertise with an ensemble machine learning model and a human-in-the-loop retraining component to develop a model that both detects abnormality and attributes risk to patterns of behavior. The solution is built with Python and PySpark and several big data technologies available through the Cloudera Data Hub Platform. While the work is still ongoing, it gives cybersecurity analysts insight into long-term patterns of behavior, and it both prioritizes and reduces the activity to investigate, down from tens of thousands of events. This solution not only takes the necessary steps to relieve “security analyst burnout,” a syndrome that commonly plagues the cybersecurity industry and often results from an analyst having to manually respond to hundreds of false-positive threats daily, but also empowers analysts to focus on the infrequent yet noteworthy events that would otherwise have been missed.
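As a rough illustration of how unlabeled anomaly detectors might be combined and then adjusted by analyst feedback, here is a small standard-library Python sketch. The two detectors, the weighting scheme, and the update rule are assumptions chosen for brevity; the abstract does not specify which models or retraining procedure the project uses.

```python
from statistics import mean, pstdev


def zscore_detector(history, value):
    """Score in [0, 1]: how far `value` sits above the user's own baseline."""
    mu, sigma = mean(history), pstdev(history)
    if sigma == 0:
        return 0.0 if value <= mu else 1.0
    z = max(0.0, (value - mu) / sigma)
    return z / (1.0 + z)  # squash the z-score into [0, 1)


def rarity_detector(seen_counts, resource):
    """Score in (0, 1]: 1.0 for a never-seen resource, lower for familiar ones."""
    return 1.0 / (1.0 + seen_counts.get(resource, 0))


class Ensemble:
    """Weighted average of detector scores (hypothetical combination rule)."""

    def __init__(self):
        self.weights = {"volume": 0.5, "rarity": 0.5}

    def score(self, scores):
        total = sum(self.weights.values())
        return sum(self.weights[k] * scores[k] for k in self.weights) / total

    def feedback(self, scores, is_threat, lr=0.1):
        """Analyst verdict nudges weights toward detectors that agreed with it."""
        target = 1.0 if is_threat else 0.0
        for k in self.weights:
            agreement = 1.0 - abs(target - scores[k])
            self.weights[k] = max(0.01, self.weights[k] + lr * (agreement - 0.5))


# Toy data: daily event counts for one user and resources seen so far.
history = [10, 12, 11, 9, 10]
seen = {"email": 40, "hr_database": 3}

ensemble = Ensemble()
event_scores = {
    "volume": zscore_detector(history, 50),   # unusually busy day
    "rarity": rarity_detector(seen, "email"),  # very familiar resource
}
before = ensemble.score(event_scores)
ensemble.feedback(event_scores, is_threat=True)  # analyst confirms the alert
after = ensemble.score(event_scores)
```

After the analyst confirms the alert, the detector that flagged it (volume) gains weight, so the same event scores higher on the next pass. Ranking events by this combined score is one simple way to prioritize what analysts review first.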


Hannah Arnson serves as Director of Data Science with Pandata – a Cleveland-based AI consulting firm. There, she leverages her 10+ years of experience to lead AI solution design and development, with a focus on ethical and approachable AI. Hannah began her career as a neuroscientist, receiving a Ph.D. in neuroscience from Washington University in St. Louis, then continuing on to do postdoctoral research. During this time, she developed statistical and mathematical models to better understand topics ranging from the sense of smell to navigation in pigeons. As a data scientist, Hannah’s passions lie in finding patterns within complex datasets and in making these technical concepts accessible to all through education.

Open Data Science




One Broadway
Cambridge, MA 02142
