About this corpus

Corpus description

Our corpus currently contains 4,303 sonnets in Spanish: 202 from the early and mid 20th century, 2,692 from the 19th century, 321 from the 18th century and 1,088 from the so-called Spanish Golden Age (15th to 17th centuries). They were written by 1,215 authors from Spain, Latin America and the Philippines. Inspired by distant reading approaches (Moretti, 2005), the corpus aims to provide a broad sample of sonnets. In most cases the raw texts were extracted from the Biblioteca Virtual Miguel de Cervantes (1999); some 18th-century texts come from Wikisource. Table 1 in the Data distribution section below summarizes these figures.

The corpus is available in plain-text and TEI formats; XML-TEI P5 was chosen for the benefits this standard offers in terms of reuse, storage, and retrieval. Author metadata (year and place of birth and death, and gender) were extracted or inferred from unstructured content in the sources and stored in the teiHeader of the TEI version, or in a metadata table for the plain-text version. Both formats come in two versions: one with a file per author collecting all of that author's sonnets, and one with a single sonnet per file. For corpus preparation, we closely followed the TEI Guidelines and RIDE’s criteria for Digital Text Collections (Henny-Krahmer and Neuber, 2017).
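As a sketch of how author metadata might be read back from a teiHeader, the Python below uses the standard library's xml.etree.ElementTree on a single-sonnet file. It assumes a layout in which the author's details sit in a TEI <person> element with a @sex attribute and <birth>/<death> children carrying @when and a <placeName>. This is a common TEI P5 pattern but not necessarily the one this corpus uses, and the sample document, the function name author_metadata and all values in it are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

# TEI P5 namespace, written in ElementTree's {uri}tag form for lookups.
TEI = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical one-sonnet-per-file document. The header layout and every
# value in it are placeholders, not taken from the corpus.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Hypothetical sonnet</title>
        <author>Hypothetical Author</author>
      </titleStmt>
    </fileDesc>
    <profileDesc>
      <particDesc>
        <listPerson>
          <person sex="F">
            <persName>Hypothetical Author</persName>
            <birth when="1801"><placeName>Madrid</placeName></birth>
            <death when="1870"><placeName>Sevilla</placeName></death>
          </person>
        </listPerson>
      </particDesc>
    </profileDesc>
  </teiHeader>
  <text><body><lg type="sonnet"><l>First line</l></lg></body></text>
</TEI>"""


def author_metadata(tei_xml: str) -> dict:
    """Collect author name, gender and birth/death data (assumed layout)."""
    root = ET.fromstring(tei_xml)
    header = root.find(f"{TEI}teiHeader")
    person = header.find(f".//{TEI}person")
    meta = {
        "author": header.findtext(f".//{TEI}titleStmt/{TEI}author"),
        "gender": person.get("sex") if person is not None else None,
    }
    for event in ("birth", "death"):
        # Compare with None explicitly: childless elements are falsy.
        node = person.find(f"{TEI}{event}") if person is not None else None
        meta[f"{event}_year"] = node.get("when") if node is not None else None
        meta[f"{event}_place"] = (
            node.findtext(f"{TEI}placeName") if node is not None else None
        )
    return meta


print(author_metadata(SAMPLE))
```

For the plain-text version, the same fields would come from the metadata table instead, so only the parsing step would change.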

Additionally, authors have been assigned VIAF identifiers and described using RDFa attributes. This gives the corpus an entry point into the Linked Open Data cloud and makes it easier to find. The corpus is available as a GitHub repository and archived on Zenodo, following good practices for data use, reuse, and preservation.
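The RDFa attributes about, typeof and property come from the W3C RDFa Core specification, and VIAF identifiers resolve as URIs of the form http://viaf.org/viaf/<number>. How exactly the corpus attaches them is not described here, so the fragment below is an assumption: a <persName> whose @about holds a VIAF URI. The sketch, with a hypothetical function viaf_links and a placeholder VIAF number, collects (name, VIAF id) pairs from such markup using only the Python standard library.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical fragment: the element choice, the RDFa attributes used and
# the VIAF number are illustrative placeholders, not corpus data.
FRAGMENT = """<author xmlns="http://www.tei-c.org/ns/1.0">
  <persName typeof="schema:Person"
            about="http://viaf.org/viaf/123456789"
            property="schema:name">Hypothetical Author</persName>
</author>"""

# Matches a VIAF URI and captures the numeric identifier.
VIAF_URI = re.compile(r"^https?://viaf\.org/viaf/(\d+)/?$")


def viaf_links(xml_text: str) -> list[tuple[str, str]]:
    """Return (name, VIAF id) pairs for every element whose @about is a VIAF URI."""
    root = ET.fromstring(xml_text)
    links = []
    for elem in root.iter():  # walks the root and all its descendants
        # Unprefixed attributes are not namespaced, so a plain get() works.
        match = VIAF_URI.match(elem.get("about", ""))
        if match:
            links.append(((elem.text or "").strip(), match.group(1)))
    return links


print(viaf_links(FRAGMENT))  # [('Hypothetical Author', '123456789')]
```

Keeping the identifier as a full URI in the markup, rather than a bare number, is what lets generic Linked Data tools follow it without corpus-specific knowledge.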

Data distribution

Table 1: Corpus data distribution per period, author gender and primary continent of literary activity

Period                  Sonnets  Authors  Female  Male  America    Europe  Asia   Tokens
20th                        202        9       2     7        0         0     9   22,303
19th                      2,692      687      48   639      334  348 (+3)     2  251,975
18th                        321       42       1    41        6        36     0   29,017
15th-17th (Golden Age)    1,088      477      31   446       12  458 (+7)     0   99,779

Female and Male give the number of authors by gender; America, Europe and Asia give the number of authors by primary continent of literary activity.
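The counts in Table 1 can be cross-checked with a few lines of Python. This sketch transcribes the table by hand, stores the bracketed "(+n)" Europe figures in a separate column named extra (a name chosen here for illustration, since the table does not state what these figures denote), treats cells the table leaves empty as 0, and checks that each period's gender and continent breakdowns add up to its author count.

```python
# Figures transcribed by hand from Table 1. "extra" holds the bracketed
# "(+n)" values of the Europe column; empty cells are recorded as 0.
TABLE = [
    # period,     sonnets, authors, female, male, america, europe, extra, asia, tokens
    ("20th",          202,       9,      2,    7,       0,      0,     0,    9,  22_303),
    ("19th",         2692,     687,     48,  639,     334,    348,     3,    2, 251_975),
    ("18th",          321,      42,      1,   41,       6,     36,     0,    0,  29_017),
    ("15th-17th",    1088,     477,     31,  446,      12,    458,     7,    0,  99_779),
]


def check_rows(table):
    """Assert that gender and continent breakdowns each sum to the author count."""
    for period, _, authors, female, male, america, europe, extra, asia, _ in table:
        assert female + male == authors, f"{period}: gender split != authors"
        assert america + europe + extra + asia == authors, (
            f"{period}: continent split != authors"
        )


def totals(table):
    """Sum sonnets, authors and tokens over all periods."""
    return {
        "sonnets": sum(row[1] for row in table),
        "authors": sum(row[2] for row in table),
        "tokens": sum(row[9] for row in table),
    }


check_rows(TABLE)
print(totals(TABLE))  # {'sonnets': 4303, 'authors': 1215, 'tokens': 403074}
```

Both checks pass only when the bracketed figures are counted, which suggests they are included in each period's author total; the summed sonnet and author counts match the 4,303 sonnets and 1,215 authors stated in the corpus description.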

Bibliography

Biblioteca Virtual Miguel de Cervantes. 1999. Biblioteca Virtual Miguel de Cervantes. http://www.cervantesvirtual.com.

Henny-Krahmer, Ulrike, and Frederike Neuber. 2017. “Criteria for Reviewing Digital Text Collections, Version 1.0.” A Review Journal for Digital Editions and Resources, no. 6. https://www.i-d-e.de/publikationen/weitereschriften/criteria-text-collections-version-1-0.

Moretti, Franco. 2005. Graphs, Maps, Trees: Abstract Models for a Literary History. Verso.


Cálamo currante

Si escribir te propones un soneto,
ve haciendo lo que yo, que, a fe, no es harto;
tras el verso tercero saldrá el cuarto...
¡Si es coser y cantar! ¡Mira: un cuarteto!

Haz otro igual después, que te prometo
que si aquesto es parir, es fácil parto;
van seis versos, y el séptimo ya ensarto;
otro, y van ocho, y al primer terceto.

Todo es que el verso nono venga al baile
y el décimo en la rueda esté metido.
¿Hay consonante a baile y fraile? Haíle.

Pues entonces, ya es esto pan comido,
y cata a Periquillo hecho fraile,
y cata el sonetejo concluido.

Francisco de Osuna