Computerized Corpus of Spoken Galician
The Computerized Corpus of Spoken Galician project (Corpus Oral Informatizado da Lingua Galega, or CORILGA) aims to create and publish online a spoken corpus with orthographic and phonetic transcriptions, aligned with voice files and annotated on several levels: phonetic, grammatical, textual and thematic.
The corpus will include recordings of the spoken language in a range of styles and registers (rural language, urban language, formal language, language of the media) and in a variety of discourse genres (conversation, monologues, interviews, lectures and so on). The speakers are men and women of several generations, and the recordings span different time periods, from the 1960s to the present.
The purpose of this corpus will be to provide material for language studies in a variety of disciplines, ranging from phonetics and grammar to discourse analysis, and for the study of linguistic variation in Galicia. Because the recordings go back as far as the 1960s, the corpus will also support the study of language change within contemporary Galician.