Centre For 21st Century Humanities

Corella, IA 2.0

Release Notes

Intelligent Archive 2.0 Corella is produced by the Centre for Literary and Linguistic Computing (CLLC) at the University of Newcastle, Australia. The project is led by Prof Hugh Craig. It is available from https://www.newcastle.edu.au/research-and-innovation/centre/education-arts/cllc/research

Developers on IA have included:

Russell Whipp
Michael Ralston
Jack Elliot
Bill Pascoe

USAGE

To run, simply double-click the Intelligent_Archive.jar file. (It is assumed that you have Java installed.)

WHAT'S NEW IN VERSION 2.0

Intelligent Archive 2.0 Corella has some significant changes, mainly adding new functionality.

The default for counting operations is 1-grams, but other n-grams can now be counted. If a list of words is pasted in, the IA will collect "skip n-grams", i.e. n-grams consisting only of words in the paste-in list, ignoring any words in between. N-gram terminals, the distance between n-grams and the location of n-grams can also now be returned. The "Zeta pairs" experiment has been removed; Zeta by pairs, i.e. by 2-grams, can now be carried out in the regular Zeta experiment window.
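
The difference between ordinary n-grams and "skip n-grams" can be illustrated with a minimal Python sketch. This is not the IA's own code: the function names, the whitespace tokenisation and the treatment of case are assumptions made purely for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams in a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def skip_ngrams(tokens, word_list, n):
    """Drop every token not in word_list, then form n-grams from what remains.

    Assumption: matching is exact and case-sensitive; the IA may differ.
    """
    wanted = set(word_list)
    kept = [t for t in tokens if t in wanted]
    return ngrams(kept, n)


# Hypothetical sample text, tokenised on whitespace.
tokens = "the cat sat on the mat and the dog sat too".split()

# Ordinary 2-grams over every word in the text.
plain_counts = Counter(ngrams(tokens, 2))

# Skip 2-grams built only from the pasted-in list; words in between are ignored.
skip_counts = Counter(skip_ngrams(tokens, ["the", "sat"], 2))
```

Here the pair ("the", "sat") never occurs as an adjacent 2-gram in the sample text, but it is counted as a skip 2-gram because the intervening words are not in the pasted-in list.
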
There is now a "Batch Import" option in the Text Sets window so you can import all the component texts in a folder in one operation.
In the output from Frequencies there is now a new tab with segment endings, showing the first four words and the last three words of each segment, so that the segment can be identified within a text by opening the text in a text editor like Word and searching for the opening and closing words of the segment.
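
As a rough illustration of what the segment endings tab reports, the following Python sketch extracts the first four and last three words of a segment. The function name and the whitespace tokenisation are assumptions; the IA's own segmentation and tokenisation rules are not reproduced here.

```python
def segment_endings(words, n_first=4, n_last=3):
    """Return the opening and closing words of a segment as two strings."""
    opening = " ".join(words[:n_first])
    closing = " ".join(words[-n_last:])
    return opening, closing


# Hypothetical segment, tokenised on whitespace.
segment = "to be or not to be that is the question".split()
opening, closing = segment_endings(segment)
```

Searching a text for the opening string and then the closing string locates the boundaries of the segment.
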
A Flesch-Kincaid Experiment has been added to return Reading Ease and Grade Level scores for text sets.
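
For reference, the standard Flesch Reading Ease and Flesch-Kincaid Grade Level formulas can be sketched in Python as below. How the IA counts sentences and syllables is not documented here, so the sketch takes the three counts as inputs rather than deriving them from text; the function names are illustrative only.

```python
def flesch_reading_ease(words, sentences, syllables):
    """Standard Flesch Reading Ease score (higher means easier to read)."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words, sentences, syllables):
    """Standard Flesch-Kincaid Grade Level (approximate US school grade)."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Hypothetical counts for a text set: 100 words, 5 sentences, 150 syllables.
ease = flesch_reading_ease(100, 5, 150)
grade = flesch_kincaid_grade(100, 5, 150)
```

Both scores depend only on average sentence length (words per sentence) and average word length (syllables per word), so differences in how sentences and syllables are detected will shift the results.
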
There are now options to obtain output as an Excel file with multiple worksheets or as a CSV file.
Dividing texts according to speaking character or divisions marked in the texts is now done separately from "Segmentation method", so that in TEI texts the segmentation chosen through the latter can then be further divided by marked division or by speaking character.
There is now a standard window with parts of texts to be ignored. This is populated by default with punctuation marks, but with the option to remove some of these, add other punctuation, add words, etc.
Invalid or unexpected aspects of texts which are included in counts or experiments are now recorded in the log in the command line window which appears in batch mode.
In Frequencies and Experiments, pasted-in lists of words can now be hidden; included at the expense of all other words, which then remain hidden; or included with all other words disregarded, i.e. not included in the analysis or in the word totals of texts.
Stage directions in TEI texts (<stage> elements) and passages in foreign languages in TEI texts (<foreign> elements) are excluded by default, but check-boxes now allow the contents of either or both to be included in counts.
In the previous release the IA was available with a variant spelling facility (the Galah version) or without (the Budgerigar version). In the current release the IA has the variant spelling functionality greyed out. No further development on the variant spelling functionality is planned. However, it can be invoked through a batch command of the form:
java -server -Dallow.variant.spelling=true -jar "./Intelligent_Archive_2.jar"
Note that using variant spellings requires the variant spelling data in the 'database' folder. It has been excluded from this distribution because it significantly increases the size of the download. The 'database' folder may be copied from a previous installation into the same folder as the new Intelligent Archive, or it may be obtained from:
https://downloads.newcastle.edu.au/cllc/ia/IA_Database.zip