Retelling Nineteenth-Century Childhood Through Artificial Intelligence
Joe Nockels | University of Edinburgh, National Library of Scotland
Few historical texts voice the experience of children in their own words. Marjory Fleming’s famous diary (1810-1811), of which three portfolios were presented to the National Library of Scotland (NLS) in 1930, is a rare exception. Fleming, the daughter of an affluent Kirkcaldy accountant, writes in a way that is as changeable as the Kirkcaldy wind, moving swiftly from comments on Old Testament Bible passages to how she was fined part of her allowance for biting her nails. In many cases, the contemporary reader is left with whiplash after excerpts like the following:
Mary Queen of Scots was a prisoner in Lochleven Castle. The Casuwary is a curious bird & so is the Gigantick Crane & the Pelican of the Wilderness whose mouth holds a bucket of fish & water.
Fleming’s writing, even for the least historically interested reader, forms an enjoyable chorus of witticisms and moralisms, while providing a deep insight into the world of an affluent early nineteenth-century child. While we would imagine a diary to be private by default nowadays, Marjory’s personal thoughts are interspersed with schoolbook tasks (all completed under the watchful gaze of Fleming’s tutor and older cousin Isabella Keith). Keith’s corrections can be seen throughout in the form of underscores, as shown in the excerpt below where Marjory struggles with the spelling of Anabaptist, Episcopalian and Presbyterian when describing her own Christian faith.
Consequently, the diary serves as the authoritative record of the young writer’s life as well as an artefact of nineteenth-century schooling and education.
The collection at the NLS ends with a few pages written not in Fleming’s hand but in those of Keith and of Fleming’s mother, Isabella Rae. They express the deep grief felt at Marjory’s untimely death on the 19th of December 1811, from a bout of measles and a later case of meningitis. In their words, she was left powerless to act against ‘… so heavenly mercy’s plan …’, showcasing what the sociologist and historian Viviana Zelizer describes as the sacramentalization of children in the nineteenth century (investing them with religious and sentimental meaning). It appears crass to overly intellectualise the loss of a child, whatever the historical period, but the nineteenth century saw a cultural transformation in the value placed on children within the family, as they shifted from objects of utility helping around the home to objects of sentiment, ‘emotionally priceless assets’. Later, a rise in consolation literature and mourners’ manuals, aimed at bereaved parents, was also seen. Likewise, cemeteries began to be adorned with the cherubic figures we see now.
In a similar vein of sentimentality, Fleming became entrenched in high society after her death as a child genius, initially through John Brown’s glowing hagiographic essay of 1863. Describing her as a ‘bright, eager child …’, Brown led other notable writers to express the same affections. Robert Louis Stevenson, Mark Twain and Leslie Stephen all made note of Fleming’s talent. The author’s distant relation to Walter Scott became exaggerated in the process, with Brown suggesting that Scott believed her to be ‘the most extraordinary creature’ he ever met. In reality, the idea of Scott and Fleming ever meeting is dubious. We should not neglect that Fleming’s precociousness and free-thinking offer a key source for those interested in a situational history of early nineteenth-century Scotland. Nevertheless, it is high time that we move beyond these cloying tributes to her genius and return to the words Fleming wrote. This is what the NLS attempted to do in automatically transcribing her diary through artificial intelligence.
Using Handwritten Text Recognition (HTR) on Fleming’s Diary
In the case of historical materials as illuminating as the Fleming diary, care had to be taken to transcribe their contents accurately. Archives and libraries have increasingly turned to computer-aided processes of transcribing historical handwriting, reducing the need for tedious manual work. This often sparks distrust, with many believing the falsehood that artificial intelligence (AI) will replace humanity. The machines are not so near taking over … Whilst issues of bad algorithms containing problematic and dangerous biases are real, the notion that AI might replace human agency in archives and libraries is a nonstarter for now. The idea of a machine faithfully reading and transcribing a complex set of documents unsupervised is still far out of reach. What is more, those in charge of transcribing archival materials have always turned to technological aids when it suited them, with the palaeographer Turner stressing as early as 1968 that ‘[T]echnical aid must of course be called in without hesitation’, advocating for the use of ultraviolet light to make handwriting more visible. In keeping with this history of collaboration between humans and machines, the NLS turned to the software Transkribus to provide aid in transcribing the Fleming diary.
Transkribus is a Handwritten Text Recognition (HTR) platform that aims to broaden access to archival collections for those who cannot read historical handwriting. In short, Transkribus aims to make the past ‘readable’. It is a tool in growing use, with 524 active users daily and over 1.5 million images of documents having been uploaded to its server. Though it may sound more like sci-fi than a workable tool, the software imitates human brain functionality in its attempts to predict language patterns in historical texts. It asks itself prompts such as ‘if word x appears then word y must follow’. From there on in, it begins to understand a certain writer’s style of forming sentences and automatically transcribes pages by itself (helpful if an archive has a collection of thousands of documents). As stressed before, human transcribers are still needed, as the HTR can only grasp the language patterns in a document after being fed around 15,000 words of manually transcribed text (approximately 50 pages of material). Without this, the computer software has nowhere to start.
Scanned images of the Fleming diary were uploaded to the application and through the software’s automatic layout analysis (LA) tool, the handwritten text was recognised as a separate layer and segmented into lines which could then be transcribed in a corresponding window as shown below in figure 3.
The NLS supplied Transkribus with around 11,000 words of text (slightly below the recommended amount, as Fleming’s writing is neat and the diary itself is relatively short). With this information, training of the HTR model began, with the model checking against itself once it was up to speed. After a few hours of training, a model, entitled ‘Early Nineteenth Century (Child)’, was finished and produced a character error rate (CER) (the percentage of characters which the model failed to recognise correctly) of only 1.85% on the training set (shown below as the blue line in figure 4) and 11.26% on the validation set (which the HTR model uses to check itself with each repetition of training, shown by the red line). Fleming’s diary is free of illustrations, which often cause problems for computers in document understanding. That said, Transkribus allows pictorial elements, tables and marginalia to be tagged and fed into the algorithm to be deciphered, although this process is not currently automatic. With a limited number of pages, the speed at which Transkribus learnt Fleming’s handwriting was impressive and these error rates would certainly drop further if applied to a larger collection at the NLS. This model was then applied to the remaining 120 pages of the diary, producing a fairly accurate result which could then be corrected by a member of staff, significantly quickening the transcription process. The model is now available for anyone working on similar material.
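For readers curious about how a character error rate can be worked out, the short Python sketch below may help. It assumes the common formulation in which CER is the edit distance (substitutions, insertions and deletions) between a manual transcription and the machine output, divided by the length of the manual transcription. Whether Transkribus computes its figures in exactly this way is an assumption here, and the misread machine output in the example is invented purely for illustration.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: fewest single-character edits turning one string into the other."""
    # previous[j] holds the distance between the reference prefix seen so far
    # and the first j characters of the hypothesis.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER as a percentage of the reference (manually transcribed) length."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)


# Manual transcription taken from the diary excerpt quoted earlier.
manual = "The Casuwary is a curious bird"
# Hypothetical machine output with two misread letters (invented for this sketch).
machine = "The Casuwarv is a curions bird"

print(f"CER: {character_error_rate(manual, machine):.2f}%")
```

On this reading, a CER of 1.85% would mean fewer than two wrong characters in every hundred, which is why correcting the machine output is so much quicker than transcribing from scratch.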
Applying a New Dataset to the History of Childhood
By working in Transkribus, the NLS produced various versions of the Fleming diary, all of which can be read without palaeographical training. One exciting product was integrating the diary into the ‘read&search’ tool, made possible by Recognition and Enrichment of Archival Documents (READ) who also developed Transkribus. With a full digital transcript of the diary now available, this platform allows users to search for individual keywords within an underlying text file, receiving results as part of a rich web-interface, instead of searching within a limited library catalogue for titles or dates. Figure 5 shows one result for the query ‘Mary’, providing a highlighted area of the document where the word can be seen and the attached transcription.
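The principle behind this kind of full-text search can be sketched in a few lines of Python. This is not how read&search is implemented. The `TranscribedLine` structure, the page and line numbers, and the way the sample text is broken across lines are all hypothetical, chosen only to show how a keyword query can return the place in a transcript where a word occurs.

```python
import re
from dataclasses import dataclass


@dataclass
class TranscribedLine:
    """One line of transcribed text with a (hypothetical) page and line position."""
    page: int
    line: int
    text: str


def search_transcript(lines: list[TranscribedLine], keyword: str) -> list[TranscribedLine]:
    """Return every line containing the keyword as a whole word, ignoring case."""
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    return [entry for entry in lines if pattern.search(entry.text)]


# Sample text from the excerpt quoted earlier; positions are invented for illustration.
sample = [
    TranscribedLine(1, 1, "Mary Queen of Scots was a prisoner in Lochleven Castle."),
    TranscribedLine(1, 2, "The Casuwary is a curious bird & so is the Gigantick Crane"),
]

for hit in search_transcript(sample, "mary"):
    print(f"page {hit.page}, line {hit.line}: {hit.text}")
```

Because each result carries its position, an interface built on such a structure could highlight the matching region of the page image alongside the transcription.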
With the traditional challenge of time-consuming transcription overcome, more scholarly attention can now be devoted to the valuable insights Fleming’s writing holds about her own childhood but also to the social attitudes and institutions she operated in and around. Myall and Morrow remind us, in their book assessing the contributions that children made to the war effort, that ‘… children should be regarded as experts in their own lives, in the sense that they can provide unique accounts of their experiences and understandings.’ At some stage, in the case of Fleming, historians appear to have forgotten this, remaining content with the mythologised status conferred on her genius by the famous ‘men of letters’ mentioned above.
Age is, much like gender, not a neutral category. It is embedded with cultural assumptions, meanings and values and is always intertwined with race, religion, class, ethnicity, and nationality. While we do not have the space to dissect Fleming’s writings each and every way (this is where there is room for further research) a few key observations can be made. Myall presents the case that children were actors in history and not merely deficient or incomplete adults. Fleming embodies this well, remaining inquisitive about the events, surroundings, and figures in her life.
The young writer was acutely aware of colonial events as more than a ‘pawn of empire’, not only operating within her lane as a privileged Scot but making inferences about the world. In her diary, we gain a sense of the relationship she held between the material and consciousness. Her descriptions are often underpinned by an object, helping her unravel the physical, observable world. She mentions that her cousin has a museum full of curiosities and talks often about a monkey the family seem to own. Elsewhere, Fleming complains about the price of pineapples and the expense of going to colonial exhibitions predating those at Crystal Palace and Glasgow. We cannot expect a child to grasp the full oppressive nature of empire behind these objects and events; yet Fleming’s observations highlight how children of an affluent background carved out spaces for themselves within the global theatre of colonialism.
Another observation pertains to the relationship between Fleming and her tutor Isabella, which is markedly different to the traditional one-way teacher-student instruction. This closeness is partially explained by their being cousins; still, both contribute to a close mutual connection. In one case, Marjory writes, in her humorous style:
Here lies sweet Isabella in bed
With a nightcap on her head
Her skin is soft her face is fair
And she has very pretty hair …
It appears that Isabella instructed her younger cousin to explore her environment as a means of intellectual, physical, social, and emotional development, preceding similar efforts in working-class areas by educational reformers such as Friedrich Froebel (1782-1852). The retreats of Ravelston and Braehead provided the ability to ‘saunter’, as Marjory put it, in the nearby woods, her rural surroundings offering relaxation whilst instituting a sense of social order in equal measure.
By looking at the Fleming diary, now made accessible through AI transcription and tools like Transkribus, we can apply ideas taken from the history of childhood, unearthing details of the experience of Fleming as an inquisitive individual, not as a glorified figure dreamt up by male writers. HTR holds the same promise for an incalculable number of historical texts, making the past more readable and offering new insights into those whose voices have been forgotten or co-opted.
The Fleming transcription can now be found on the NLS Data Foundry: https://data.nls.uk/data/digitised-collections/marjory-fleming/
The beta version of READ’s read&search tool can be found here: https://transkribus.eu/r/marjory-fleming/#/
Joe Nockels is a PhD researcher based at the University of Edinburgh and National Library of Scotland, funded by the Scottish Graduate School for Arts and Humanities (SGSAH). His work explores how best to adopt computer automated and artificial intelligence systems at the National Library of Scotland (NLS). A heavy library user himself, Joe has a background in religious and gender history and is driven by the clear value in making archival collections accessible to user communities. To make any potential conflicts clear, he is also the principal researcher for READ (Recognition and Enrichment of Archival Documents), the body behind the tool Transkribus introduced in this piece.