Introducing the Polifonia Corpus: explore music concepts and texts from the Polifonia Project with this new web tool
The latest Polifonia tool opens the door to multilingual textual musical heritage resources. Find out what you can do with this tool and how it was developed.
Università di Bologna (UniBo) has launched the long-awaited web application Polifonia Corpus as part of the Polifonia H2020 project. An interactive dashboard, with a user-friendly design modelled on a music player, gives easy access to the Polifonia Corpus. The corpus consists of Wikipedia data (all music-related pages), books (e.g. from the Biblioteca Nacional de España), influential music periodicals (e.g. The Musical Times) and the textual sources belonging to the Polifonia pilots BELLS, CHILD, MEETUPS, MUSICBO and ORGANS (e.g. the Dutch organ encyclopaedia). The tool will help linguists, scholars and students access multilingual music-related corpora and investigate them according to new and different criteria.
Challenges of a multilingual corpus and going beyond keyword-based search
The new tool interrogates a collection of Italian, English, French, Spanish, German and Dutch sources. The large modularized corpus contains more than 100 million words for each language. A significant part of the sources was only available as images or PDF files and required Optical Character Recognition (OCR) to convert them into a processable format. The team from UniBo, consisting of Valentina Presutti, Rocco Tripodi, Arianna Graciotti and Marco Grasso, has used further Natural Language Processing techniques to process the corpus and produce automatic morphosyntactic, semantic and musical heritage (MH)-specific annotations. In addition, custom APIs enable domain experts, scholars and music professionals to use these annotations to perform advanced structured queries on the corpus. The search capabilities go beyond standard keyword-based search and allow the corpus to be queried using this semantic information.
How to use Polifonia Corpus
To search the corpus, the user first sets a few parameters. The typical user, a linguist or a student in the field, starts by entering a keyword in the “Query” section, which should be a musical concept such as ‘guitar’, ‘opera’ or ‘aria’. In the “Type” section, users specify how the tool should search: by keyword, lemma, concept or named entity. Next comes the “Module”, which determines the source collection the tool searches (Wikipedia, Books, Periodicals or Pilots), followed by the module’s “Language”. The results are the sentences in which the input word is found. They are listed in a Key Word In Context (KWIC) index, a well-known practice in querying linguistic corpora: each result is a concordance line that shows the keyword together with the text immediately preceding and following it. It is also possible to access the full sentence and its source.
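For readers unfamiliar with KWIC, the following Python sketch illustrates how concordance lines can be built in principle. It is not the Polifonia Corpus implementation: the function names `kwic` and `format_kwic`, the `width` parameter and the sample sentences are all assumptions made for illustration, and it performs only a plain case-insensitive keyword match (no lemma, concept or named entity search).

```python
import re


def kwic(sentences, keyword, width=30):
    """Return concordance lines as (left context, keyword, right context).

    Illustrative only: matches the keyword as a whole word, ignoring case,
    and keeps up to `width` characters of context on each side.
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for sentence in sentences:
        for match in pattern.finditer(sentence):
            left = sentence[max(0, match.start() - width):match.start()]
            right = sentence[match.end():match.end() + width]
            lines.append((left, match.group(0), right))
    return lines


def format_kwic(lines, width=30):
    """Right-align the left context so the keywords line up in a column."""
    return [f"{left:>{width}} [{kw}] {right}" for left, kw, right in lines]


if __name__ == "__main__":
    # Hypothetical sample sentences, not taken from the corpus.
    sample = [
        "The guitar is a fretted instrument.",
        "She sang an aria from the opera.",
        "He tuned his Guitar before the concert.",
    ]
    for line in format_kwic(kwic(sample, "guitar", width=20), width=20):
        print(line)
```

Aligning the keyword in a fixed column is what makes a concordance easy to scan: the eye can run down the keyword column and compare the words that occur just before and after it.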
Release
The Polifonia Corpus is now live and released through the dedicated Polifonia Corpus GitHub repository and the interactive website. The corpus, its metadata and statistics, along with its annotations and interrogation tools, are also part of the Polifonia Ecosystem.