Read the source code of .jar files with Python


Keywords: java


Question: 

I want to read the source code of .jar files and compute word frequencies. I know that the contents of .jar files can be viewed with Java editors, but I want to do this automatically with a Python script.


1 Answer: 

Do you specifically need a Python library? Krakatau is a command-line tool, written in Python, for decompiling .jar files; you may be able to import it and call the relevant functions from inside your script.
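Whichever decompiler you use, it can help to look inside the jar first. A .jar file is a ZIP archive, so Python's standard zipfile module can list and read its entries with no extra tools. It cannot decompile compiled .class files, but if the jar happens to bundle .java sources (as *-sources.jar artifacts usually do), you can read those directly. This is a minimal sketch: the function names are hypothetical, and it assumes the sources are UTF-8 encoded.

```python
import zipfile

def read_java_sources(jar_path):
    """Return {entry_name: source_text} for every .java entry in the jar."""
    sources = {}
    with zipfile.ZipFile(jar_path) as jar:  # a jar is an ordinary ZIP archive
        for name in jar.namelist():
            if name.endswith(".java"):
                # Assumes UTF-8; undecodable bytes are replaced instead of raising
                sources[name] = jar.read(name).decode("utf-8", errors="replace")
    return sources

def list_class_files(jar_path):
    """Return the names of the compiled .class entries that would need decompiling."""
    with zipfile.ZipFile(jar_path) as jar:
        return [name for name in jar.namelist() if name.endswith(".class")]
```

For a jar that contains only bytecode, read_java_sources returns an empty dict, and list_class_files shows what a decompiler has to process.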

Alternatively, you can run it, or any other command-line .jar decompiler such as Procyon, through Python's subprocess module.
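As a rough sketch of the subprocess route with Procyon, the helpers below build a command line and run it. They assume Java is on the PATH and that your Procyon release accepts `-jar` (the jar to decompile) and `-o` (the output directory); please verify both flags against the help output of the version you download. `procyon_command`, `decompile_jar` and all paths are hypothetical.

```python
import subprocess
from pathlib import Path

def procyon_command(procyon_jar, input_jar, out_dir):
    """Build the argument list for decompiling a whole jar with Procyon (hypothetical helper)."""
    # The first "-jar" is the java launcher's own option for running an executable jar.
    # Assumption: Procyon's "-jar" names the input jar and "-o" the output directory.
    return ["java", "-jar", str(procyon_jar),
            "-jar", str(input_jar),
            "-o", str(out_dir)]

def decompile_jar(procyon_jar, input_jar, out_dir):
    """Run the decompiler and return the paths of the .java files it wrote."""
    # check=True raises subprocess.CalledProcessError if the process exits with a non-zero status
    subprocess.run(procyon_command(procyon_jar, input_jar, out_dir), check=True)
    return sorted(Path(out_dir).rglob("*.java"))
```

Passing the command as a list rather than a single shell string avoids quoting problems with paths that contain spaces. A call might look like `decompile_jar("procyon-decompiler.jar", "app.jar", "decompiled")`, with file names adjusted to your setup.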

In the second case, you will most likely want to capture stdout and/or stderr by redirecting them to pipes. A basic call may look something like this:

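from subprocess import Popen, PIPE

# 'jar_decompiler', '1stparam', '2ndparam', ... are placeholders for the real command and its arguments
proc = Popen(('jar_decompiler', '1stparam', '2ndparam'), stdout=PIPE,
             text=True)  # text=True (Python 3.7+) returns str instead of bytes
# communicate() waits for the process to finish and returns a (stdout, stderr) tuple;
# stderr is None here because it was not redirected to a PIPE
stdout_text, _ = proc.communicate()
jar_decompiler_output = stdout_text.splitlines()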

Note that communicate() returns a (stdout_data, stderr_data) tuple, so the captured output is its first element.
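For the word-frequency part of the question, once the decompiler output is available as a list of lines (like jar_decompiler_output above), you can tokenize each line and count the tokens with collections.Counter. In this sketch a "word" is assumed to be a Java-style identifier or keyword matched by a simple regular expression; it does not skip comments or string literals, and counting is case-sensitive. `word_frequencies` is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumption: a "word" is an identifier-like token, i.e. a letter or underscore
# followed by any number of letters, digits or underscores
WORD_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def word_frequencies(lines):
    """Count identifier-like tokens across an iterable of source lines."""
    counts = Counter()
    for line in lines:
        counts.update(WORD_RE.findall(line))
    return counts
```

For example, `word_frequencies(jar_decompiler_output).most_common(20)` would list the 20 most frequent tokens. If you want "foo" and "Foo" counted together, lowercase each line before matching.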