Preserving Digital Privacy: AI-powered Tools for Authorship Anonymization

Abstract: 

Many people are concerned about leaving digital traces online that might be attributed to them or used against them in ways they did not intend. Until recently, the authorship of anonymous online texts was not a primary concern, despite some high-profile "sock puppet" cases. Authorship attribution, the process of identifying the author of a text, has since gained significant attention due to its implications across domains, from forensic analysis to literary studies; the authorship of texts such as the QAnon messages remains a matter of great speculation. With the advent of data science technologies and powerful GPUs, authorship attribution tools are now being scaled up in ways that were not previously feasible. These tools, powered by advanced natural language processing (NLP) techniques, can potentially identify the authors of anonymous texts even when efforts are made to obscure stylistic elements.
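To make the idea concrete, here is a minimal, illustrative sketch of stylometric attribution, not the web-scale tools discussed in the talk: it profiles each candidate author by character n-gram frequencies (a classic stylistic fingerprint) and attributes an anonymous text to the most similar profile. All names here (`char_ngrams`, `attribute`, the toy author texts) are hypothetical examples for illustration.

```python
import math
from collections import Counter

def char_ngrams(text, n=3):
    """Frequency profile of character n-grams, a classic stylometric feature."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(a, b):
    """Cosine similarity between two sparse frequency profiles."""
    keys = set(a) | set(b)
    dot = sum(a[k] * b[k] for k in keys)  # Counter returns 0 for missing keys
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def attribute(anonymous_text, candidate_profiles):
    """Return the candidate author whose stylistic profile is most similar."""
    profile = char_ngrams(anonymous_text)
    return max(candidate_profiles, key=lambda a: cosine(profile, candidate_profiles[a]))
```

Real systems use far richer features and learned models, but the principle is the same: residual stylistic regularities, not content, give the author away.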

In this talk, I will outline the current capabilities of web-scale authorship attribution and present our recent research on innovative methods for authorship obfuscation. These methods protect user privacy by concealing stylistic writing traits without altering the underlying message. Using a combination of the latest NLP techniques, including large language models (LLMs), reinforcement learning, and unique orthography, our research shows how current authorship attribution tools can be eluded by obfuscation tools that eliminate stylistic traits while preserving meaning. I will also present ways to measure obfuscation, meaning preservation, and fluency in order to evaluate these novel applications of AI.
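As a rough illustration of what such evaluation metrics look like (these are simplified toy proxies, not the metrics used in the research described in the talk): obfuscation can be gauged by how far the rewritten text's stylistic profile drifts from the original, and meaning preservation by how much content survives the rewrite. Real evaluations would typically use attribution-model accuracy and embedding-based semantic similarity instead.

```python
# A short list of common English function words; their relative
# frequencies are a well-known stylistic signal.
FUNCTION_WORDS = ["the", "a", "of", "and", "to", "in", "that", "is", "was", "for", "it"]

def function_word_vector(text):
    """Normalized frequencies of function words in the text."""
    words = text.lower().split()
    total = max(len(words), 1)
    return [words.count(w) / total for w in FUNCTION_WORDS]

def style_distance(original, rewritten):
    """Obfuscation proxy: L1 distance between function-word profiles.
    Higher means the rewrite's style has drifted further from the original."""
    va, vb = function_word_vector(original), function_word_vector(rewritten)
    return sum(abs(x - y) for x, y in zip(va, vb))

def meaning_overlap(original, rewritten):
    """Crude meaning-preservation proxy: Jaccard overlap of content words."""
    content = lambda t: {w for w in t.lower().split() if w not in FUNCTION_WORDS}
    a, b = content(original), content(rewritten)
    return len(a & b) / len(a | b) if a | b else 0.0
```

A good obfuscation system should push `style_distance` up while keeping `meaning_overlap` (and fluency, which needs a language model to score) high.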

As we navigate the fine line between technological advancement and ethical considerations, the ramifications of these developments on privacy, security, and freedom of expression are profound. I will conclude with a forward-looking discussion on the future of authorship attribution and obfuscation technologies, highlighting the need for a balanced approach that safeguards individual privacy while maintaining the integrity of online content. This talk is designed for data science professionals interested in the intersection of AI, linguistics, and cybersecurity, offering insights into the latest research and practical applications in safeguarding digital anonymity.

Bio: 

Hemanth Kandula is a Research Engineer at Raytheon BBN Technologies, working within the Analytics and Machine Intelligence (AMI) department. With many years of experience in machine learning, natural language processing, and computer vision, he focuses on DARPA- and IARPA-funded projects, particularly in authorship attribution and obfuscation, multilingual information extraction, and advanced semantic segmentation. Before Raytheon BBN, Hemanth was a Machine Learning Researcher at Harvard Medical School, leading projects that merged machine learning with medical diagnostics, significantly impacting human fertility and viral diagnostics. He holds a Master's degree in Computer Science from Tufts University.

Open Data Science