Each year, organizations and research firms publish cybersecurity threat and breach predictions. The Covid-19 pandemic forced a global shift, pushing organizations to adapt to a "new normal" that includes a more distributed remote workforce. In turn, this has greatly affected data theft. Forrester predicts that employees' fears of job loss, coupled with the ease of moving data through channels such as email, USB drives, and cloud storage, will drive an 8% increase in insider attacks, with as much as a third of all incidents in 2021 originating inside the organization.

An insider attack or threat is a malicious act against an organization by its employees, contractors, or business associates, potentially in collaboration with a hostile nation-state or organization. Insiders are increasingly among the largest threats to an organization because of their detailed knowledge of system operations and security practices, and their legitimate physical and electronic access to critical systems. Insider threats can be difficult to detect: a compromised employee is acting within their permissions, not explicitly breaking any rules, even while building a pattern of suspicious behavior.

Pandata, an artificial intelligence design & development firm, and FirstEnergy, a Fortune 500 utility company, partnered to develop an approach for detecting insider threats. FirstEnergy already has technology for monitoring organization-wide threat vectors via a rules-based methodology. However, detecting the rare, complex, "risky" human behavioral patterns that indicate a true malicious insider attack, such as fraud or physical/cyber sabotage, requires advanced ML approaches.

Pandata and FirstEnergy developed an AI/ML approach to characterize employee activity, including physical access and digital behavior, and compute a Holistic Risk Profile for employees to monitor risk both long term and in near real time. The goal of this project was to intelligently prioritize activity that may constitute risk into a manageable number of events for further investigation by FirstEnergy security analysts: tens, rather than tens of thousands. The difficulty lies in taking a very high volume of activity (the physical and digital behavior of over 15,000 employees) and building a model that learns risk from an unlabeled dataset, one with no known examples of insider threat behavior.

Our first approach demonstrated the ability to characterize what constitutes normal activity: patterns of behavior that employees tended to follow day after day. However, there was still a significant number of abnormal events. In collaboration with FirstEnergy security analysts, we determined that the vast majority were not activities of interest, such as a person arriving late because of a dentist appointment, or logging into a computer after a two-week vacation.
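The article does not publish the model itself, but the idea of learning each employee's normal pattern from unlabeled activity data can be sketched with a simple per-employee baseline. Everything here is illustrative: the feature names (`badge_in_hour`, `files_accessed`) and the z-score detector are assumptions, not FirstEnergy's actual schema or algorithm.

```python
import statistics

def fit_baseline(history):
    """Learn one employee's 'normal' as a per-feature (mean, stdev)
    from their past daily activity records.
    history: list of dicts of numeric features (names are hypothetical)."""
    baseline = {}
    for feature in history[0]:
        values = [day[feature] for day in history]
        baseline[feature] = (statistics.mean(values), statistics.pstdev(values))
    return baseline

def anomaly_score(baseline, day, eps=1e-6):
    """Largest absolute z-score across features: how far today's
    activity deviates from this employee's own typical pattern."""
    return max(
        abs(day[f] - mu) / (sigma + eps) for f, (mu, sigma) in baseline.items()
    )

# An employee who usually badges in around 8am and touches ~10 files a day:
history = [{"badge_in_hour": 8.0 + 0.1 * i, "files_accessed": 10 + i % 3}
           for i in range(20)]
baseline = fit_baseline(history)

normal_day = {"badge_in_hour": 8.5, "files_accessed": 11}
odd_day = {"badge_in_hour": 2.0, "files_accessed": 300}  # 2am login, bulk access
```

A detector like this scores `odd_day` far above `normal_day`, but as the article notes, a high score alone is not evidence of malice; a 2am login may simply follow a vacation or an on-call shift, which is exactly why so many raw anomalies turned out to be false flags.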

To reduce false flags, we took two approaches: we used information about an individual's job role and access to assign risk to people and activities, and we developed a human-in-the-loop retraining pathway. The risk assignment took a heuristic approach based on domain knowledge of physical locations, digital access, and job description. This served to reduce the perceived riskiness of events that may be atypical yet involve a person with limited access to sensitive information. Conversely, abnormal activity by those with access to highly sensitive information or equipment is given a higher risk score, prioritizing it for analyst investigation. This approach significantly reduced the number of false flags, setting the stage for analysts to investigate a manageable number of events. The human-in-the-loop model had a similar effect: we collected feedback from analysts about what constituted an event or person of interest and fed it back into the risk score model so that the model learned over time.
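One way to picture how both mechanisms could modulate a raw anomaly score is a simple multiplicative weighting: scale by access sensitivity, then by a feedback factor that analysts' judgments adjust over time. The tier names, weights, and feedback multiplier below are hypothetical; the article describes the heuristic only at a high level.

```python
# Hypothetical sensitivity tiers derived from job role and access;
# the actual weights and categories are not published in the article.
ACCESS_WEIGHT = {"low": 0.3, "standard": 1.0, "sensitive": 2.5}

def risk_score(anomaly, access_tier, analyst_feedback=1.0):
    """Scale a behavioral anomaly score by the person's access
    sensitivity, then by a multiplier shaped by analyst feedback
    (>1 boosts similar events in future, <1 suppresses them)."""
    return anomaly * ACCESS_WEIGHT[access_tier] * analyst_feedback
```

The same unusual behavior then ranks very differently depending on access: `risk_score(5.0, "sensitive")` far outweighs `risk_score(5.0, "low")`, and an analyst marking a class of events benign would push its feedback multiplier below 1, suppressing similar alerts over time.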

Throughout this process, we developed an ensemble model combined with a heuristic risk assessment to build a Holistic Risk Profile of employee behavior. In doing so, we identified and prioritized potential risk, creating an insider threat detection system that enhances security analysts' ability to catch bad actors.
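The final step, turning many model outputs into a short, ranked work queue for analysts, can be sketched as follows. The weighted-mean combiner and the detector names are assumptions for illustration; the article does not specify the ensemble method.

```python
import heapq

def ensemble_risk(scores, weights=None):
    """Combine scores from multiple detectors (e.g. physical-access,
    login-pattern, and data-movement models) into one holistic score.
    A weighted mean is one simple combiner, used here as a placeholder."""
    weights = weights or [1.0] * len(scores)
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)

def prioritize(events, k=20):
    """Return the k highest-risk events for analyst review.
    events: list of (event_id, holistic_risk) tuples."""
    return heapq.nlargest(k, events, key=lambda e: e[1])
```

This is the "tens versus tens of thousands" idea in miniature: however many scored events flow in, only the top handful reach an analyst's queue.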

To find out more about how to leverage AI to keep out the bad guys, attend our talk at ODSC East 2021, "Building a Holistic Risk Profile: Near Real-Time Approach to Insider Threat Detection."

About the authors/ODSC East 2021 speaker on Building a Holistic Risk Profile:

Danielle Aring is an IT Security Data Engineer IV with the Transmission Security Operations Center (TSOC) at FirstEnergy. In her role, she is responsible for the design, development, implementation, and maintenance of IT security equipment and software. Danielle holds a master's in Computer Information Science from Cleveland State University. With an extensive background in software engineering and expertise in machine learning, Danielle is guiding the transition of the TSOC away from reactionary, rules-based threat detection to preventative, predictive, threat-hunting approaches. She built her organization's security data lake in Hadoop from the ground up and developed several large-scale data pipelines for near-real-time security log ingestion, along with alerting, monitoring, and metrics. Danielle is passionate about cybersecurity educational awareness and innovative applications of AI/ML to the changing threat landscape.

Hannah Arnson serves as Director of Data Science at Pandata, a Cleveland-based AI consulting firm. There, she leverages her 10+ years of experience to lead AI solution design and development, with a focus on ethical and approachable AI. Hannah began her career as a neuroscientist, receiving a Ph.D. in neuroscience from Washington University in St. Louis, then continuing on to postdoctoral research. During this time, she developed statistical and mathematical models to better understand topics ranging from the sense of smell to navigation in pigeons. As a data scientist, Hannah's passions lie in finding patterns within complex datasets and in education that makes these technical concepts accessible to all.