About this corpus
Corpus description
Our corpus currently contains 4303 sonnets in Spanish: 202 from the early to mid 20th century, 2692 from the 19th century, 321 from the 18th century, and 1088 from the so-called Spanish Golden Age (15th to 17th centuries). They were written by 1215 authors from Spain, Latin America, and the Philippines. The corpus aims to provide a broad sample, inspired by distant reading approaches (Moretti, 2005). In most cases the raw texts were extracted from Biblioteca Virtual Miguel de Cervantes (1999); some 18th-century texts come from Wikisource. A table in section Data Distribution below summarizes these data.
The corpus is available in plain-text and TEI formats; XML-TEI P5 was chosen for this standard's benefits in terms of reuse, storage, and retrieval. Author metadata (year and place of birth and death, and gender) were extracted or inferred from unstructured content in the sources and placed in the TEI header or, for the plain-text version, in a metadata table. Both the TEI and the plain-text formats come in two versions: one file per author collecting all of that author's sonnets, and one file per sonnet. For corpus preparation, we closely followed the TEI guidelines and RIDE's criteria for Digital Text Collections (Henny-Krahmer and Neuber, 2017).
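As a rough illustration of how author metadata stored in a TEI header might be read, the following Python sketch parses a small inline document with the standard library. The sample document, the element layout (`author/persName`, `person`, `birth`, `death`, the `sex` and `when` attributes) and the function name `author_metadata` are assumptions made for this example; the corpus files may organize their headers differently.

```python
import xml.etree.ElementTree as ET

# Namespace used by TEI P5 documents.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical minimal TEI file, invented for this sketch.
# The real corpus headers may use a different layout.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Sonnets</title>
        <author><persName>Example Author</persName></author>
      </titleStmt>
    </fileDesc>
    <profileDesc>
      <particDesc>
        <listPerson>
          <person sex="F">
            <birth when="1800"><placeName>Madrid</placeName></birth>
            <death when="1870"><placeName>Sevilla</placeName></death>
          </person>
        </listPerson>
      </particDesc>
    </profileDesc>
  </teiHeader>
  <text><body><lg type="sonnet"><l>...</l></lg></body></text>
</TEI>"""


def author_metadata(tei_xml):
    """Return name, gender, and birth/death year and place from a TEI header."""
    root = ET.fromstring(tei_xml)
    header = root.find("tei:teiHeader", TEI_NS)
    person = header.find(".//tei:person", TEI_NS)
    birth = person.find("tei:birth", TEI_NS)
    death = person.find("tei:death", TEI_NS)
    return {
        "name": header.findtext(".//tei:author/tei:persName", namespaces=TEI_NS),
        "gender": person.get("sex"),
        "birth_year": birth.get("when"),
        "birth_place": birth.findtext("tei:placeName", namespaces=TEI_NS),
        "death_year": death.get("when"),
        "death_place": death.findtext("tei:placeName", namespaces=TEI_NS),
    }


meta = author_metadata(SAMPLE)
print(meta)
```

The same dictionary shape could serve as one row of the metadata table that accompanies the plain-text version, although the actual column names of that table are not specified here.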
Additionally, authors have been assigned VIAF identifiers and described using RDFa attributes. This gives the corpus an entry point to the Linked Open Data cloud and makes it easier to find. The corpus is available as a GitHub repository and archived in Zenodo, in line with good practices for data use, reuse, and preservation.
Data distribution
| Period | Nbr of Sonnets | Nbr of Authors | Author group | Authors in group | Tokens |
|---|---|---|---|---|---|
| 20th | 202 | 9 | Female | 2 | 22,303 |
| | | | Male | 7 | |
| | | | Asia | 9 | |
| 19th | 2692 | 687 | Female | 48 | 251,975 |
| | | | Male | 639 | |
| | | | America | 334 | |
| | | | Europe | 348 (+3) | |
| | | | Asia | 2 | |
| 18th | 321 | 42 | Female | 1 | 29,017 |
| | | | Male | 41 | |
| | | | America | 6 | |
| | | | Europe | 36 | |
| 15th-17th (Golden Age) | 1088 | 477 | Female | 31 | 99,779 |
| | | | Male | 446 | |
| | | | America | 12 | |
| | | | Europe | 458 (+7) | |
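The per-period figures can be checked against the totals given in the corpus description (4303 sonnets, 1215 authors) with a few lines of Python. This is only a consistency sketch: the numbers are copied from the table, while the dictionary layout and the `total` helper are choices made for this example. The mean tokens per sonnet is a derived figure that the source does not report.

```python
# Per-period figures copied from the data distribution table.
periods = {
    "20th": {"sonnets": 202, "authors": 9, "tokens": 22_303},
    "19th": {"sonnets": 2692, "authors": 687, "tokens": 251_975},
    "18th": {"sonnets": 321, "authors": 42, "tokens": 29_017},
    "15th-17th (Golden Age)": {"sonnets": 1088, "authors": 477, "tokens": 99_779},
}


def total(field):
    """Sum one column over all periods."""
    return sum(row[field] for row in periods.values())


total_sonnets = total("sonnets")
total_authors = total("authors")
total_tokens = total("tokens")

print("sonnets:", total_sonnets)  # 4303, matching the corpus description
print("authors:", total_authors)  # 1215, matching the corpus description
print("tokens:", total_tokens)

# Derived figure (not stated in the source): mean tokens per sonnet by period.
for name, row in periods.items():
    print(name, round(row["tokens"] / row["sonnets"], 1))
```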
Bibliography
Biblioteca Virtual Miguel de Cervantes. 1999. Biblioteca Virtual Miguel de Cervantes. http://www.cervantesvirtual.com
Henny-Krahmer, Ulrike, and Frederike Neuber. 2017. “Criteria for Reviewing Digital Text Collections, Version 1.0.” A Review Journal for Digital Editions and Resources, no. 6. https://www.i-d-e.de/publikationen/weitereschriften/criteria-text-collections-version-1-0
Moretti, Franco. 2005. Graphs, Maps, Trees: Abstract Models for a Literary History. Verso.
Cálamo currante
Si escribir te propones un soneto,
ve haciendo lo que yo, que, a fe, no es harto;
tras el verso tercero saldrá el cuarto...
¡Si es coser y cantar! ¡Mira: un cuarteto!
Haz otro igual después, que te prometo
que si aquesto es parir, es fácil parto;
van seis versos, y el séptimo ya ensarto;
otro, y van ocho, y al primer terceto.
Todo es que el verso nono venga al baile
y el décimo en la rueda esté metido.
¿Hay consonante a baile y fraile? Haíle.
Pues entonces, ya es esto pan comido,
y cata a Periquillo hecho fraile,
y cata el sonetejo concluido.
Francisco de Osuna