Intelligent Archive
Intelligent Archive (IA) is a Java application for managing corpora of texts for stylometry. It builds on a long history of world-leading stylometry research at the Centre for Literary and Linguistic Computing (CLLC), University of Newcastle, Australia.
Director: Prof. Hugh Craig
Software developers: Bill Pascoe, Jeremy Johnson, Michael Ralston, R Whipp.
Install
- IA is a Java application. Ensure you have Java installed.
- Download the latest release below.
- Extract the .zip file. On Microsoft Windows, double-clicking a .zip file may only show you what is inside it rather than prompt you to extract it, and running the .jar file from inside the .zip will give an error. To extract the archive, right-click the .zip file and choose 'Extract All' (the wording may differ, such as 'Extract archive' or 'Decompress'). You should then see a normal folder containing a .jar file and the 'Config' and 'lib' folders.
- Double-click the .jar file to run it.
Downloads
Latest Release
- Rosella (IA3.0) beta (.zip file 3780kb) Release notes
Previous Releases
- Corella, IA2.1 (.zip file 3780kb) Bug Fixes.
- Corella, IA2.0 (.zip file 4860kb) Release notes.
- Galah, IA1.1 (.zip file 486kb)
- Budgerigar, IA1.0 (.zip file 735kb) Documentation (pdf 318kb).
Introduction
Intelligent Archive is a Java application for stylometry, the computational and statistical analysis of style in texts. It is produced by the University of Newcastle's Centre for Literary and Linguistic Computing (CLLC). It can handle corpora of plain text, HTML, XML and TEI texts. IA enables you to organise texts into sets, manage metadata, generate word frequencies, handle XML tags, and split texts in various ways, producing results that can be exported for further analysis in statistical software. It includes some experimental stylometry techniques developed at the CLLC, and special features for literary texts, such as generating word frequencies for each speaking character in a play.
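As a rough illustration of what "splitting texts and generating word frequencies" involves, the following is a minimal Java sketch. It is not IA's own code: the class name WordFrequencySketch, its method names, and the tokenisation rule (lower-case, split on anything that is not a letter or apostrophe) are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not part of IA: counts words per fixed-length block.
class WordFrequencySketch {

    // Assumed tokenisation rule: lower-case the text, then split on any run
    // of characters that are neither letters nor apostrophes.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}']+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Cut the token list into consecutive blocks of blockSize tokens.
    // The final block may be shorter than blockSize.
    static List<List<String>> segment(List<String> tokens, int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<List<String>> blocks = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i += blockSize) {
            int end = Math.min(i + blockSize, tokens.size());
            blocks.add(new ArrayList<>(tokens.subList(i, end)));
        }
        return blocks;
    }

    // Count occurrences of each token; TreeMap keeps the words sorted.
    static Map<String, Integer> countWords(List<String> tokens) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String t : tokens) {
            counts.merge(t, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String text = "To be, or not to be, that is the question.";
        List<List<String>> blocks = segment(tokenize(text), 5);
        for (int i = 0; i < blocks.size(); i++) {
            System.out.println("Block " + (i + 1) + ": " + countWords(blocks.get(i)));
        }
    }
}
```

With a block length of 5, the sample sentence yields two blocks, and "to" is counted twice in the first. A real corpus tool would also need to deal with markup, metadata and homographs, which this sketch ignores.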
Main Features
See the release notes for a particular version for more detail.
- Import collections of texts as plain text (.txt), HTML, XML or TEI. (TEI is an XML standard for marking up literary and other texts, including plays and poetry.)
- Add metadata to texts and include metadata in results.
- Organise texts into sets for comparison of results.
- Break texts into segments in various ways, such as by blocks of a certain length, by XML tag, or by speaking character in plays.
- Specify words, punctuation or XML to include or exclude from the results.
- Process by graphemes (individual letters or characters), useful for Chinese.
- Process by N-grams.
- Handle homographs.
- Concordance.
- Run built-in 'experiments': Jensen-Shannon Divergence, Burrows' Zeta (incorporating Burrows' Iota) and Flesch-Kincaid.
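To give a sense of what one of these experiments measures, here is a minimal Java sketch of the standard Jensen-Shannon divergence between the word-frequency distributions of two texts. It is a textbook formulation, not IA's implementation: the class name JensenShannonSketch, the use of base-2 logarithms and the absence of any smoothing are assumptions, and IA's own settings may differ.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not part of IA: textbook Jensen-Shannon divergence.
class JensenShannonSketch {

    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    // Sum of all counts in a frequency table.
    static double total(Map<String, Integer> counts) {
        double sum = 0.0;
        for (int c : counts.values()) {
            sum += c;
        }
        return sum;
    }

    // JSD(P, Q) = 0.5 * KL(P || M) + 0.5 * KL(Q || M), where M = (P + Q) / 2.
    // With base-2 logarithms the result lies between 0 (identical
    // distributions) and 1 (no words in common).
    static double divergence(Map<String, Integer> a, Map<String, Integer> b) {
        double totalA = total(a);
        double totalB = total(b);
        if (totalA == 0.0 || totalB == 0.0) {
            throw new IllegalArgumentException("both texts need at least one word");
        }
        Set<String> vocabulary = new HashSet<>(a.keySet());
        vocabulary.addAll(b.keySet());

        double jsd = 0.0;
        for (String word : vocabulary) {
            double p = a.getOrDefault(word, 0) / totalA;
            double q = b.getOrDefault(word, 0) / totalB;
            double m = 0.5 * (p + q);
            // Terms with zero probability contribute nothing to the sum.
            if (p > 0.0) {
                jsd += 0.5 * p * log2(p / m);
            }
            if (q > 0.0) {
                jsd += 0.5 * q * log2(q / m);
            }
        }
        return jsd;
    }

    public static void main(String[] args) {
        Map<String, Integer> textA = Map.of("the", 6, "and", 3, "of", 1);
        Map<String, Integer> textB = Map.of("the", 4, "and", 1, "thou", 5);
        System.out.println("JSD = " + divergence(textA, textB));
    }
}
```

The inputs are raw word counts, such as those exported for two text sets. Because the counts are normalised to relative frequencies first, texts of different lengths can be compared directly.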
System Requirements
IA is a Java application and so requires the Java Runtime Environment, which is already present on many computers and is easy to download and install from the Java website.
The core functionality requires only a very basic system with at least 512MB of memory. The software does not require a fast CPU, although for very large corpora a faster CPU will produce results more quickly. The software is not currently multithreaded, so it does not take advantage of multiple CPUs or CPU cores.
The software itself uses less than 10MB of disk space. You will also require enough disk space to store all texts added to the text repository.