Intelligent Archive

Intelligent Archive (IA) is a Java application for managing corpora of texts for stylometry. It builds on a long history of world-leading stylometry research at the Centre for Literary and Linguistic Computing (CLLC), University of Newcastle, Australia.

Director: Prof. Hugh Craig

Software developers: Bill Pascoe, Jeremy Johnson, Michael Ralston, R Whipp.


  1. IA is a Java application. Ensure you have Java installed.
  2. Download the latest release below.
  3. Extract the .zip file. On Microsoft Windows, double-clicking a .zip file may only show you what is inside it instead of prompting you to extract it. If you try to run the .jar file from there without extracting it first, it will give an error. To extract the .zip file on Windows, right-click it and choose 'Extract All' (the wording may differ, e.g. 'Extract archive' or 'Decompress'). You should then see a normal folder containing a .jar file and the 'Config' and 'lib' folders.
  4. Double-click the .jar file to run it.
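If IA fails to start, the most common cause described above is a release that was never extracted. The helper below is a hypothetical sketch (the class name `ReleaseFolderCheck` is illustrative, not part of IA) that checks whether a folder looks like a correctly extracted release, i.e. it contains at least one .jar file plus 'Config' and 'lib' folders, as step 3 describes.

```java
import java.io.File;
import java.util.Locale;

/**
 * Hypothetical helper (not part of IA): checks whether a folder looks like a
 * correctly extracted release, i.e. it holds a .jar file plus 'Config' and 'lib' folders.
 */
public class ReleaseFolderCheck {

    public static boolean looksExtracted(File dir) {
        if (!dir.isDirectory()) {
            return false;
        }
        // Two-argument lambda selects the FilenameFilter overload of listFiles.
        File[] jars = dir.listFiles((d, name) -> name.toLowerCase(Locale.ROOT).endsWith(".jar"));
        boolean hasJar = jars != null && jars.length > 0;
        return hasJar
                && new File(dir, "Config").isDirectory()
                && new File(dir, "lib").isDirectory();
    }

    public static void main(String[] args) {
        // Pass the extracted folder as the first argument; defaults to the current folder.
        File dir = new File(args.length > 0 ? args[0] : ".");
        System.out.println(looksExtracted(dir)
                ? "Folder looks like an extracted IA release."
                : "No .jar file with 'Config' and 'lib' folders found; was the .zip extracted?");
    }
}
```

A double-clickable .jar can generally also be started from a terminal with `java -jar` followed by the .jar file's name, which has the advantage of showing any error messages.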


Latest Release

Previous Releases


Intelligent Archive is a Java application for stylometry, or computational and statistical analysis of style in texts, produced by the University of Newcastle's Centre for Literary and Linguistic Computing (CLLC). It can handle corpora of plain text, HTML, XML and TEI texts. IA enables you to easily organise texts into sets, manage metadata, generate word frequencies, handle XML tags, and split texts in various ways to generate results that can be exported for further analysis in statistical software. It includes some experimental stylometry techniques developed at the CLLC, as well as special features for handling literary texts, such as generating word frequencies for individual speaking characters in plays.
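Word frequency counts are the basic data behind most of the analyses described above. As an illustration only, the sketch below counts words in a plain-text string. It is not IA's tokeniser: it assumes a "word" is any run of letters or apostrophes and folds everything to lower case, and the class name `WordFrequencies` is hypothetical.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/**
 * Illustrative sketch (not IA's implementation): counts word frequencies in plain text.
 * Assumes a word is a run of letters or apostrophes; case is folded to lower case.
 */
public class WordFrequencies {

    /** Returns a map from lower-cased word to its number of occurrences, sorted by word. */
    public static Map<String, Integer> count(String text) {
        Map<String, Integer> freqs = new TreeMap<>();
        // Split on any run of characters that are neither letters nor apostrophes.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}']+")) {
            if (!token.isEmpty()) {
                freqs.merge(token, 1, Integer::sum);
            }
        }
        return freqs;
    }

    public static void main(String[] args) {
        String sample = "To be, or not to be, that is the question.";
        // prints {be=2, is=1, not=1, or=1, question=1, that=1, the=1, to=2}
        System.out.println(count(sample));
    }
}
```

Stylometric comparisons usually work with relative frequencies (each count divided by the total number of words) so that texts of different lengths can be compared; that step is left out here for brevity.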

Main Features

See the release notes for a particular version for more detail.

System Requirements

IA is a Java application and so requires the Java Runtime Environment, which is already installed on most computers and is easy to download and install from the Java website.

The core functionality only requires a very basic system with at least 512MB of memory. The software does not require a fast CPU; however, with very large corpora, a faster CPU will produce results more quickly. The software is not currently multithreaded, so it cannot make full use of multiple CPUs or CPU cores.
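To see what the Java runtime on a given machine offers, the sketch below reports the Java version, the JVM's maximum heap size and the number of processors it can see. The class name `SystemCheck` is hypothetical. Note that the 512MB figure above refers to system memory; how much of it a Java application may use is capped by the JVM's maximum heap, which can be raised with the standard `-Xmx` option (for example `java -Xmx1g -jar` followed by the .jar file's name). Comparing the heap against 512MB is an assumption made for illustration.

```java
/**
 * Hypothetical sketch: reports the Java version, maximum heap and processor count
 * seen by the running JVM, and compares the heap with the 512MB figure above.
 */
public class SystemCheck {

    // 512MB, taken from the system requirements; treating it as a heap target is an assumption.
    static final long MIN_BYTES = 512L * 1024 * 1024;

    public static boolean meetsMemoryMinimum(long maxHeapBytes) {
        return maxHeapBytes >= MIN_BYTES;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long maxHeap = rt.maxMemory(); // Long.MAX_VALUE if the JVM sets no limit
        System.out.println("Java version: " + System.getProperty("java.version"));
        System.out.println("Max heap (MB): " + maxHeap / (1024 * 1024));
        // IA is single-threaded, so extra processors will not speed up a single analysis.
        System.out.println("Available processors: " + rt.availableProcessors());
        System.out.println(meetsMemoryMinimum(maxHeap)
                ? "Maximum heap is at least 512MB."
                : "Maximum heap is below 512MB; consider starting Java with a larger -Xmx.");
    }
}
```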

The software itself uses less than 10MB of disk space. You will also require enough disk space to store all texts added to the text repository.
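Before adding a large corpus, it can help to compare its size with the free space on the drive. The sketch below is hypothetical (the class name `DiskSpaceCheck` is illustrative) and assumes the repository will need roughly as much space as the original files; IA's actual storage overhead is not stated here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

/**
 * Hypothetical sketch: totals the size of a folder of texts and compares it with
 * the usable space on the same drive. Assumes stored texts take about their original size.
 */
public class DiskSpaceCheck {

    /** Sum of the sizes of all regular files under dir (recursively), in bytes. */
    public static long totalBytes(Path dir) throws IOException {
        try (Stream<Path> paths = Files.walk(dir)) {
            return paths.filter(Files::isRegularFile)
                        .mapToLong(p -> {
                            try {
                                return Files.size(p);
                            } catch (IOException e) {
                                throw new UncheckedIOException(e);
                            }
                        })
                        .sum();
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass the folder of texts as the first argument; defaults to the current folder.
        Path corpus = Paths.get(args.length > 0 ? args[0] : ".");
        long needed = totalBytes(corpus);
        long free = Files.getFileStore(corpus).getUsableSpace();
        System.out.printf("Texts: %d bytes, usable space: %d bytes%n", needed, free);
    }
}
```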