Recon 2012
Speakers | |
---|---|
Pablo Duboue | |

Schedule | |
---|---|
Day | Day 2 - 2012-06-15 |
Room | Grand Salon |
Start time | 10:00 |
Duration | 00:30 |

Info | |
---|---|
ID | 245 |
Predicting English keywords from Java Bytecodes using Machine Learning
I've been working on predicting English keywords in source code comments given only a method's compiled bytecode. I use only the comments attached to a whole Java method. I've put together a collection of 330 thousand decompiled Java methods, together with their corresponding Javadoc comments, from the Debian archive. From there, I trained an ensemble of machine learning classifiers. The resulting system takes a .class file and returns, for each method, a set of possible keywords. For example, a method with many fadd, fmul, and fdiv instructions could be described as a "calculation".
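To make the idea concrete, here is a minimal sketch of how opcodes could be mapped to keywords. It is not the ensemble used in this work: it swaps in a single one-vs-rest multinomial naive Bayes scorer over a bag of opcodes, written in plain Python. The function names (`train`, `predict`), the four toy methods, and the "get" keyword are all hypothetical and chosen only for illustration; real input would come from disassembled .class files.

```python
import math
from collections import Counter


def train(examples, alpha=1.0):
    """Collect per-keyword opcode counts from (opcodes, keywords) pairs.

    `alpha` is the Laplace smoothing constant (an assumption of this sketch).
    """
    vocab = {op for ops, _ in examples for op in ops}
    keywords = {kw for _, kws in examples for kw in kws}
    model = {}
    for kw in keywords:
        pos, neg = Counter(), Counter()  # opcode counts with / without kw
        n_pos = 0
        for ops, kws in examples:
            if kw in kws:
                pos.update(ops)
                n_pos += 1
            else:
                neg.update(ops)
        model[kw] = (pos, neg, n_pos, len(examples) - n_pos)
    return vocab, model, alpha


def predict(trained, opcodes):
    """Return every keyword whose naive Bayes log-odds are positive."""
    vocab, model, alpha = trained
    predicted = set()
    for kw, (pos, neg, n_pos, n_neg) in model.items():
        # Smoothed prior log-odds: keyword present vs. absent.
        score = math.log((n_pos + alpha) / (n_neg + alpha))
        pos_total = sum(pos.values()) + alpha * len(vocab)
        neg_total = sum(neg.values()) + alpha * len(vocab)
        for op in opcodes:
            if op not in vocab:
                continue  # ignore opcodes never seen in training
            score += math.log((pos[op] + alpha) / pos_total)
            score -= math.log((neg[op] + alpha) / neg_total)
        if score > 0:
            predicted.add(kw)
    return predicted


# Hypothetical toy training set: opcode mnemonics per method plus the
# keywords found in that method's comment.
examples = [
    (["fload", "fload", "fmul", "fadd", "freturn"], {"calculation"}),
    (["fload", "fdiv", "fload", "fmul", "freturn"], {"calculation"}),
    (["aload_0", "getfield", "areturn"], {"get"}),
    (["aload_0", "getfield", "ireturn"], {"get"}),
]
trained = train(examples)
float_heavy = predict(trained, ["fload", "fmul", "fadd", "fdiv", "freturn"])
getter_like = predict(trained, ["aload_0", "getfield", "areturn"])
```

On this toy data, the method dominated by floating-point arithmetic is labelled "calculation" and the field accessor is labelled "get". Treating each keyword as an independent yes/no decision lets a method receive several keywords at once, which matches the "set of possible keywords" output described above.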
This is very new work. While it is still up to the community to decide whether the technology itself has value at this stage, I'm making the data and the machine learning scripts available as part of the talk. I'll include enough machine learning background to entice everyone in the audience to try experimenting with the data themselves.