On the Naturalness of Software

Professor Prem Devanbu
Department of Computer Science
University of California, Davis


Natural language processing (NLP) has been revolutionized by statistical language models, which capture the high degree of regularity and repetition found in most human speech and writing. These models have transformed speech recognition and machine translation. We have found, surprisingly, that "natural" software, viz., code written by people, is also highly repetitive, and can be modeled effectively by language models borrowed from NLP. We present data supporting this claim, discuss some early applications showcasing the value of language models of code, and present a vision for future research in this area.
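The core idea can be illustrated with a toy sketch (not the speaker's actual models, which are trained on large open-source corpora): an n-gram language model, fit to tokenized code, assigns higher probability to conventional, repetitive token sequences than to unusual ones. The bigram model and smoothing scheme below are illustrative assumptions, kept minimal for clarity.

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-token windows of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

class BigramModel:
    def __init__(self, corpus):
        # corpus: list of token lists, e.g. tokenized lines of code
        self.bigrams = Counter()
        self.unigrams = Counter()
        for toks in corpus:
            padded = ["<s>"] + toks  # mark start of each line
            self.unigrams.update(padded)
            self.bigrams.update(ngrams(padded, 2))

    def prob(self, prev, tok):
        # Add-one (Laplace) smoothing over the observed vocabulary.
        v = len(self.unigrams)
        return (self.bigrams[(prev, tok)] + 1) / (self.unigrams[prev] + v)

    def sequence_prob(self, tokens):
        p, prev = 1.0, "<s>"
        for tok in tokens:
            p *= self.prob(prev, tok)
            prev = tok
        return p

# A tiny "training corpus" of tokenized loop headers.
corpus = [
    ["for", "(", "i", "=", "0", ";", "i", "<", "n", ";", "i", "++", ")"],
    ["for", "(", "j", "=", "0", ";", "j", "<", "n", ";", "j", "++", ")"],
]
model = BigramModel(corpus)

# A conventional prefix scores higher than the same tokens scrambled.
common = model.sequence_prob(["for", "(", "i", "=", "0", ";"])
odd = model.sequence_prob([";", "0", "=", "i", "(", "for"])
assert common > odd
```

Even with two training lines, the model prefers the idiomatic ordering; at scale, this repetitiveness is what makes code completion and other language-model applications to software effective.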

About the Speaker:

Prem Devanbu received his B.Tech from the Indian Institute of Technology in Chennai, India, before you were born, and his PhD from Rutgers in 1994. After spending nearly 20 years at Bell Labs and its various offshoots, he escaped New Jersey to join the CS faculty at UC Davis in late 1997. He has published over 100 papers, and has won ACM SIGSOFT Distinguished Paper Awards at ICSE 2004, ICSE 2009, and ASE 2011, and conference best paper awards at MSR 2010 and ASE 2011. He has been program chair of ACM SIGSOFT FSE (in 2006) and ICSE (in 2010), and has served on the editorial boards of both IEEE Transactions on Software Engineering and the ACM equivalent. He has worked in several different areas over a 25-year research career, including logic programming, knowledge representation, software tools, secure information storage in the cloud, and middleware. For the past several years, he has been fascinated by the abundance of possibilities in the veritable ocean of data that is available from open-source software projects. He is funded by grants from the NSF, the AFOSR, Microsoft Research, and IBM.