
Recon 2012

Pablo Duboue
Day: 2 (2012-06-15)
Room: Grand Salon
Start time: 10:00
Duration: 00:30
ID: 245

Predicting English keywords from Java Bytecodes using Machine Learning

I have been working on predicting the English keywords that appear in source code comments, given access only to a method's compiled bytecode. I use only comments attached to a whole Java method. From the Debian archive, I assembled a collection of 330,000 decompiled Java methods together with their corresponding Javadoc comments, and trained an ensemble of machine learning classifiers on it. The resulting system takes a .class file and returns, for each method, a set of candidate keywords. For example, a method containing many fadd, fmul, and fdiv instructions could be described as a "calculation".
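To make the idea concrete, here is a minimal sketch, not the talk's actual pipeline. It assumes each method is reduced to a bag of opcode mnemonics (operands dropped, as they might be read off `javap -c` output) and trains one binary multinomial Naive Bayes classifier per keyword, a deliberately simple stand-in for the ensemble described above. The training data, keyword labels, and the functions `train`, `log_score`, and `predict_keywords` are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy data: each method is a list of bytecode mnemonics
# (operands dropped) paired with the keywords found in its Javadoc comment.
TRAINING = [
    (["fload_1", "fload_2", "fmul", "fload_3", "fdiv", "freturn"], {"calculation"}),
    (["fload_1", "fconst_2", "fadd", "fmul", "freturn"], {"calculation"}),
    (["aload_0", "getfield", "areturn"], {"getter"}),
    (["aload_0", "aload_1", "putfield", "return"], {"setter"}),
]


def train(examples, keyword, alpha=1.0):
    """Fit a binary multinomial Naive Bayes model for a single keyword."""
    counts = {True: Counter(), False: Counter()}  # opcode counts per class
    docs = Counter()                              # number of methods per class
    vocab = set()
    for opcodes, keywords in examples:
        label = keyword in keywords
        counts[label].update(opcodes)
        docs[label] += 1
        vocab.update(opcodes)
    return counts, docs, vocab, alpha


def log_score(model, opcodes, label):
    """Log-probability of the method's opcodes under one class (True/False)."""
    counts, docs, vocab, alpha = model
    # Smoothed class prior, so a class with no examples never yields log(0).
    score = math.log((docs[label] + 1) / (sum(docs.values()) + 2))
    # Laplace smoothing; the extra +1 reserves probability for unseen opcodes.
    denom = sum(counts[label].values()) + alpha * (len(vocab) + 1)
    for op in opcodes:
        score += math.log((counts[label][op] + alpha) / denom)
    return score


def predict_keywords(examples, opcodes):
    """Return every keyword whose positive class outscores its negative class."""
    all_keywords = sorted(set().union(*(kws for _, kws in examples)))
    predicted = set()
    for kw in all_keywords:
        model = train(examples, kw)
        if log_score(model, opcodes, True) > log_score(model, opcodes, False):
            predicted.add(kw)
    return predicted
```

With this toy data, `predict_keywords(TRAINING, ["fload_1", "fload_2", "fadd", "fmul", "freturn"])` returns `{"calculation"}`, because floating-point opcodes only occur in methods labeled that way. A real system would need far more data and richer features, but the per-keyword, one-vs-rest framing carries over.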

This is very new work. While it is still up to the community to decide whether the technology has value at this stage, I am making the data and the machine learning scripts available as part of the talk. I will cover enough machine learning background to encourage everyone in the audience to try experimenting with the data themselves.