USING NLP TO CREATE CORPUS-BASED VOCABULARY EXERCISES IN LATIN CLASSES
Humboldt-Universität zu Berlin (GERMANY)
About this paper:
Appears in: INTED2020 Proceedings
Publication year: 2020
Pages: 1750-1757
ISBN: 978-84-09-17939-8
ISSN: 2340-1079
doi: 10.21125/inted.2020.0562
Conference name: 14th International Technology, Education and Development Conference
Dates: 2-4 March, 2020
Location: Valencia, Spain
Abstract:
Learning a historical language differs from learning a modern language because it emphasizes work on texts rather than everyday communication. As a result, not only expectations and motivation differ, but also the teaching methodology. Whereas learners of modern languages focus on language production, learners of Latin read or translate texts. Because any given Latin word or phrase occurs only rarely in this kind of learning environment, students are often unfamiliar with it and are ultimately unable to translate the texts. To address this underlying problem of Latin classes (in German high schools), we work together in an interdisciplinary research group of corpus linguistics, Latin pedagogy and computer science at the Humboldt-Universität zu Berlin. In our research project we try to find out:
- whether corpus-based methods support Latin vocabulary acquisition better than other methods used in language teaching, and
- how corpus-based tasks might be implemented in class, in both analogue and digital form.

Consequently, we adapt the methodology of data-driven (language) learning as an educational innovation for Latin classes. In this context, we reuse various tools from the Classics and the natural language processing community to develop our corpus-based software. In parallel, we carried out several intervention studies using a design-based research approach. These studies yielded some interesting insights, e.g. that the majority of students fail to lemmatize words correctly. We also tested the user experience of our software and received feedback from experts (teachers, students). Finally, we used both kinds of results to improve the software continuously, e.g. by readjusting the type of exercises or the order of tasks in our so-called vocabulary unit. Apart from our exercise generator, we also built a searchable database and provided additional information for curating the exercises (such as the text complexity or the percentage of words in a text that match a predefined core vocabulary), so that they are findable and reusable at a later point in time.
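
The matching percentage between a text and a core vocabulary mentioned above can be illustrated with a minimal Python sketch. It is not the project's implementation: the function names, the toy lemma table, and the token-based definition of coverage are assumptions made for illustration only, and a real system would use a proper Latin lemmatizer instead of a lookup table.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and extract runs of the letters a-z as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def core_vocabulary_coverage(
    text: str,
    core_vocabulary: set[str],
    lemma_lookup: dict[str, str],
) -> float:
    """Return the percentage of tokens whose lemma is in the core vocabulary.

    Assumption: coverage is counted per token (running word), not per
    distinct word type. Forms missing from lemma_lookup are treated as
    their own lemma.
    """
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    lemmas = [lemma_lookup.get(token, token) for token in tokens]
    known = sum(1 for lemma in lemmas if lemma in core_vocabulary)
    return 100.0 * known / len(tokens)


# Hypothetical toy data: a tiny lemma table and core vocabulary.
LEMMA_LOOKUP = {"est": "sum", "divisa": "divido", "partes": "pars"}
CORE_VOCABULARY = {"sum", "omnis", "in", "pars", "tres"}

coverage = core_vocabulary_coverage(
    "Gallia est omnis divisa in partes tres",
    CORE_VOCABULARY,
    LEMMA_LOOKUP,
)
print(f"{coverage:.1f}% of tokens are covered by the core vocabulary")
```

In this toy example, five of the seven tokens map to lemmas in the core vocabulary. The lookup step also shows why lemmatization matters for such a measure: an inflected form such as "partes" only counts as known once it has been reduced to "pars".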

Our research showed that the causes of many errors are the same as in learning a modern language: students overgeneralise, suffer from cross-lingual interference concerning both sound and structure, and focus more on a supposed meaning than on the word form. What is more, they usually do not have enough (language) knowledge, experience, or strategies to complete a task. Students therefore appreciate a vocabulary-focused learning environment if it is varied and helps them solve the tasks in Latin classes, e.g. inferring the meaning of a word from its context. In general, students store information in their mental lexicon by decoding written Latin input but, in contrast to modern languages, they do not use it for production. Thus, unless students feel the need to know a word (e.g. in reading comprehension tasks), they learn the words just for tests and forget them afterwards. Moreover, incidental vocabulary learning rarely works in Latin classes, because original Latin texts contain too many unknown words. Therefore, we need to support teachers with corpus-based exercises that are adaptable to the vocabulary already learned, to the text complexity, and to the students' previous reading experience.
Keywords:
Data-driven learning, corpus-based exercises, Latin vocabulary acquisition, NLP, adaptive learning.