For some Friday fun, here’s a word cloud generated from my Mendeley library. It’s based on a collection of about 400 PDF files, the papers I read for my research. Neutral ecology is a bit overrepresented as I used to maintain a Mendeley group on the topic, but otherwise the keywords do reflect what I do.
The “making of”
If you’d like to make your own, the first thing is of course to extract all the text from the PDFs automatically. I used pdftotext, a command line tool that comes with Xpdf and Poppler. On OS X with MacPorts, the installation is as simple as sudo port install poppler. On Ubuntu it’s probably the poppler-utils package.
On Unix-like systems, the following will create a file containing the text of all the PDFs from the current folder (the >> appends to the file, so delete any old allpapers.txt before running it again):
for i in *.pdf; do pdftotext "$i" - >> allpapers.txt ; done
Now we can use this data to make the cloud. There are several tools that can do this, the most popular being Wordle. I wanted finer control over the end result than what Wordle provides, so I used Mathematica instead, based on this code.
- Mathematica notebook with the word cloud code (with comments, to make it easier to tweak it to your preference).
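If you just want a quick look at which words will dominate the cloud before opening Wordle or Mathematica, a rough frequency count can be done in the shell. This is only a sketch and not the method used in the notebook: the function name word_freq and the four-letter minimum (a crude way to drop words like "the" and "and") are my own assumptions.

```shell
# Hypothetical helper: print the N most frequent words of a text file.
# Usage: word_freq FILE [N]   (N defaults to 50)
word_freq() {
    file="$1"
    top="${2:-50}"
    # 1. lowercase everything
    # 2. turn every run of non-letters into a single newline (one word per line)
    # 3. keep only words of at least 4 letters (assumed cutoff, crude stop-word filter)
    # 4. count identical words and list the most frequent first
    tr '[:upper:]' '[:lower:]' < "$file" |
        tr -cs '[:alpha:]' '\n' |
        awk 'length($0) >= 4' |
        sort | uniq -c | sort -rn | head -n "$top"
}
```

For example, word_freq allpapers.txt 30 lists the 30 most common words along with their counts. Text extracted from PDFs often contains hyphenation leftovers and ligature characters, so expect a few odd fragments in the list.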