Unsolved ML Safety Problems


Machine learning (ML) systems are rapidly increasing in size, are acquiring new capabilities, and are increasingly deployed in high-stakes settings. As with other powerful technologies, safety for ML should be a leading research priority. In response to emerging safety challenges in ML, such as those introduced by recent large-scale models, I outline a roadmap for ML Safety and refine the technical problems that the field needs to address. I present three pillars of ML safety, namely withstanding hazards ("Robustness"), identifying hazards ("Monitoring"), and steering ML systems ("Alignment").


Dan Hendrycks is a PhD candidate at UC Berkeley, advised by Jacob Steinhardt and Dawn Song. His research aims to disentangle and concretize the components necessary for safe AI, and is supported by the NSF GRFP and the Open Philanthropy AI Fellowship. Dan helped create the GELU activation function, the default activation in most Transformers, including BERT, GPT, and Vision Transformers.

Open Data Science
One Broadway
Cambridge, MA 02142
