Victorian Periodical Text Collection
Right click and 'Save as' to download:
- Saturday Review Corpus, 2541Kb zip file.
- Victorian Periodicals Corpus, 6822Kb zip file.
- Saturday Review Articles List, Excel file.
- Victorian Periodicals List, Excel file.
The initial collection was assembled as the working corpus for a research higher degree entitled Anonymity, Individuality and Commonality in Writing in British Periodicals between 1830 and 1890: A Computational Stylistics Approach.
Nova: The University of Newcastle's Digital Repository. Identifier: uon:6075
It was intended that there should be a sufficient number of articles to provide a good representation of the repertoire of discursive prose as it stood at the time. The 200 texts were all published in periodical journals during the sixty-year period from 1829, when the three major quarterlies were dominating the scene, through the 1850s, 1860s and 1870s, when the monthlies came into their own and challenged the quarterlies for reader loyalty, to 1890, after which both began to decline in popularity. Though most of the articles were anonymous at the time of publication, they all appear to have been reliably attributed, thanks largely to the invaluable work of the Wellesley Index.
These initial articles were taken from five quarterlies (99 articles) and six monthlies (101 articles) (table 1 below) and comprised just under two million words.
|Quarterlies||Monthlies|
|The Edinburgh Review||Blackwood's Edinburgh Magazine|
|The Quarterly Review||Cornhill Magazine|
|The Westminster Review||The Fortnightly Review (which became monthly)|
|Bentley's Quarterly Review||Fraser's Magazine for Town and Country|
|The National Review||Macmillan's Magazine|
| ||Tait's Edinburgh Magazine|
Eight women and fourteen men wrote the articles (table 2 below), the gender imbalance reflecting the fact that many more men than women were writing for the journals. Each author is represented by at least five and at most fourteen texts. The authors represent a good spectrum of the variety of writers contributing to the journals at the time: from those who considered themselves primarily journalists to those who contributed articles as a sideline, and from those who wrote from economic necessity to those who combined journalism with other forms of writing.
|Men||Women|
|Walter Bagehot (1826-1877)||Frances Power Cobbe (1822-1904)|
|John Stuart Blackie (1809-1895)||George Eliot (1819-1880)|
|John Hill Burton (1809-1881)||Christian Johnstone (1781-1857)|
|Thomas Carlyle (1795-1881)||Eliza Lynn Linton (1822-1898)|
|Lord Robert Cecil (1830-1903)||Harriet Martineau (1802-1876)|
|John Wilson Croker (1780-1857)||Anne Mozley (1809-1891)|
|James Anthony Froude (1819-1894)||Margaret Oliphant (1828-1897)|
|William Rathbone Greg (1809-1881)||Elizabeth Lady Eastlake née Rigby (1809-1893)|
|Abraham Hayward (1801-1884)|
|Thomas Henry Huxley (1825-1895)|
|Charles Kingsley (1819-1875)|
|George Henry Lewes (1817-1878)|
|Thomas Babington Macaulay (1800-1859)|
|Sir Leslie Stephen (1832-1904)|
In the years following the completion of the research higher degree, a number of additional texts were added to the collection for various reasons: (i) in order to pursue a particular research enquiry; (ii) because of intrinsic interest; or simply (iii) because they were available in electronic form. The corpus itself was used as one of two large corpora for testing the relative merits of word n-grams of different sizes in authorship attribution.
Antonia, Alexis, Hugh Craig and Jack Elliott. "Language chunking, data sparseness, and the value of a long marker list: explorations with word n-grams and authorial attribution". Literary and Linguistic Computing, 2013.
Saturday Review Collection
The Saturday Review collection was initially compiled for the specific purpose of carrying out a number of attribution tests on various anonymous articles. In particular, we were interested in shedding light on the long-standing question of who wrote the rather spiteful and condescending 'Women's Movement' articles in the Saturday in the 1850s.
"Who Wrote the Women's Movement Articles in The Saturday Review?" Nineteenth-Century Gender Studies (2008)
A second line of enquiry looked at the 'Modern Women' series of articles (1866-68), demonstrating the ease with which the methods of computational stylistics could separate similar-seeming articles written by different authors.
A third line of enquiry attempted to describe the distinctiveness of the 'house style' of the Saturday by comparing the articles of six authors who wrote both for the Saturday and for other journals.
Craig, Hugh and Alexis Antonia. "Six Authors and the Saturday Review: A Quantitative Approach to Style". Victorian Periodicals Review, 48, 1, 2015.
Tait's Edinburgh Magazine additional texts
Following correspondence with Eileen Curran concerning her doubts about some of the Wellesley attributions in Tait's Edinburgh Magazine for two Scottish authors (John Stuart Blackie and John Hill Burton), additional texts were prepared so that some of the uncertainly attributed texts could be tested against some well-attributed ones.
Antonia, Alexis and Ellen Jordan. "Checking some Wellesley Index Attributions by Empirical 'Internal Evidence': The Case of Blackie and Burton." Authorship, 1.1, Fall 2011.
Christian Remembrancer Collection
A number of Christian Remembrancer texts were prepared for a series of investigations seeking to identify the contributions of Anne Mozley to the journal.
Jordan, Ellen, Hugh Craig and Alexis Antonia. "The Brontë Sisters and the Christian Remembrancer: A Pilot Study in the Use of the 'Burrows Method' to Identify the Authorship of Unsigned Articles in the Nineteenth Century Periodical Press". Victorian Periodicals Review, 2006.
Antonia, Alexis and Ellen Jordan. "Identifying Anne Mozley's Contributions to the Christian Remembrancer: A Computational Stylistic Approach". Victorian Literature and Culture, 42, 2, 2014.
Acquisition of Electronic Texts
A variety of methods was used to obtain the electronic texts of the collection. Most of the texts were transcribed onto the computer from a photo-image or a microfilm copy of the journal article. Some articles were sourced from public domain electronic texts available in online collections: the National Library of Australia's online ProQuest British Periodicals Collection and the Oxford Internet Library of Early Journals, both of which provided photo images of texts for either transcribing or scanning, and the Gutenberg site, which allowed the downloading of texts in editable form. Other articles were sourced from microfilm copies of the journals. Where published editions of periodical articles existed in authorial collections of writings, these were photocopied and then scanned or transcribed.
Editing of the Electronic Texts in Preparation for Computational Stylistic Research Work
Good electronic text preparation is vital to the success of any computational stylistics project and must be done with thoroughness and exactitude. E-texts need to be proof-read, since both keyboarding and OCR scanning can produce unexpected errors. The next step is to prepare the electronic texts for counting. Various protocols were adopted to ensure that, when the counting took place, the machine counted only what it was supposed to count. My practice for ensuring consistency throughout the periodical collection was to use the angled bracket notation of the Text Encoding Initiative (TEI) protocol for all exclusions and changes (listed below) so that these would remain obvious and recoverable.
Exclusions and Changes
- All extraneous material (page numbers, titles, chapter headings, greetings...) was enclosed in angled brackets and thereby excluded from the count.
- Text included in the article which does not belong to the author was identified and removed from the count with TEI markers <quotation> </quotation> used to mark the location of the removed text.
- Foreign-language passages which were longer than a single word or phrase and which were not part of the syntax of the sentence were identified and removed from the count, with TEI notation marking the location: <foreign lang="latin"> </foreign>
- Words which are used by some authors as a single compound and by others as two separate words were identified and united using TEI format: for example, <reg orig="can not">cannot</reg>. Similarly, the various compounds of any, some, every, and no with one, thing, how, and where were united.
- Negative forms such as don't and can't (which are not common in Victorian writing) were left untouched.
- Occasionally an article included tables of statistics and so on. These were generally omitted and the location marked <table> </table>.
- Portions of text where an author assumes a persona for illustrative or dramatic purposes, or where he or she feels obliged to use inverted commas to signal an adoption for the moment of a special way of phrasing something, were identified but left in the count. Some authors use such personas quite often, while other authors never do so.
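As an illustration only, the conventions listed above could be applied before counting roughly as follows. This is a minimal Python sketch, not the software actually used on the collection: the function names, the regular expressions and the sample sentence are all assumptions made for the example. It assumes that every angle-bracketed marker is dropped from the count and that the regularised form inside a <reg> element is the one counted.

```python
import re
from collections import Counter

# <reg orig="can not">cannot</reg>: keep the regularised form, drop the tags.
REG = re.compile(r'<reg\s+orig="[^"]*">(.*?)</reg>', re.DOTALL)
# Any remaining angle-bracketed span is a marker or an exclusion.
TAG = re.compile(r"<[^<>]*>")
# Words are runs of letters; an internal apostrophe keeps don't and can't whole.
WORD = re.compile(r"[a-z]+(?:'[a-z]+)*")


def countable_text(marked_up: str) -> str:
    """Return only the text that should contribute to the word counts."""
    text = REG.sub(r"\1", marked_up)   # unwrap regularised forms first
    return TAG.sub(" ", text)          # then drop every other marker


def word_counts(marked_up: str) -> Counter:
    """Count word forms in the countable text, ignoring case."""
    return Counter(WORD.findall(countable_text(marked_up).lower()))


# Hypothetical sample, invented for this sketch (not taken from the corpus).
sample = (
    '<page 231> <ARTICLE TITLE> '
    'We <reg orig="can not">cannot</reg> doubt, as the author says, '
    '<quotation> </quotation> that <reg orig="every one">everyone</reg> '
    'will agree. <foreign lang="latin"> </foreign> We don\'t doubt it.'
)

counts = word_counts(sample)
print(counts.most_common(3))
```

In this sketch the page number, the title and the attribute value "latin" never reach the counter, while "cannot" and "everyone" are each counted as one word and "don't" is left untouched. Because the markers stay in the source file, every exclusion and change remains visible and recoverable.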