Introducing the Polifonia Corpus: explore music concepts and texts from the Polifonia Project with this new web tool

The latest Polifonia tool opens the door to multilingual textual musical heritage resources. Find out what you can do with this tool and how it was developed.

24 February 2023

Università di Bologna (UniBo) has launched the long-awaited web application Polifonia Corpus as part of the Polifonia H2020 project. An interactive dashboard, with a user-friendly design based on a music player, gives easy access to the corpus. The corpus consists of Wikipedia data (all music-related pages), books (e.g. from the Biblioteca Nacional de España), influential music periodicals (e.g. The Musical Times) and the textual sources belonging to the Polifonia pilots BELLS, CHILD, MEETUPS, MUSICBO and ORGANS (e.g. the Dutch organ encyclopaedia). The tool will help linguists, scholars and students to access multilingual music-related corpora and to investigate them according to new and different criteria.

Challenges of a multilingual corpus and going beyond keyword-based search
The new tool interrogates a collection of Italian, English, French, Spanish, German and Dutch sources. The large modularized corpus contains more than 100 million words for each language. A significant part of the sources was only available as images or PDF files, so Optical Character Recognition (OCR) was needed to convert them into a processable format. The team from UniBo, consisting of Valentina Presutti, Rocco Tripodi, Arianna Graciotti and Marco Grasso, has used further Natural Language Processing (NLP) techniques to process the corpus and produce automatic morphosyntactic, semantic and musical heritage (MH)-specific annotations. In addition, custom APIs enable domain experts, scholars and music professionals to use these annotations to perform advanced structured queries on the corpus. The available search capabilities go beyond standard keyword-based search and allow the corpus to be queried using this semantic information.

How to use Polifonia Corpus
To search the corpus, the user first sets a few parameters. The typical user, a linguist or a student in the field, starts by entering a keyword in the “Query” section, which should be a musical concept such as ‘guitar’, ‘opera’ or ‘aria’. In the “Type” section, users specify how the tool should search: by keyword, lemma, concept or named entity. The “Module” selection then determines the source collection the tool should search (Wikipedia, Books, Periodicals or Pilots), and the next section asks for the module’s “Language”. The results are the sentences in which the input word is found. These sentences are listed in a Key Word In Context (KWIC) index, a well-known practice in querying linguistic corpora. The results are shown as concordance lines, which means that they display the text preceding and following the keyword. It is also possible to access the full sentence and its source.
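For readers unfamiliar with concordance lines, the idea behind a KWIC index can be illustrated with a few lines of Python. This is only a minimal sketch of the general technique, not the implementation used by Polifonia Corpus: the function name `kwic`, the `width` parameter and the sample sentences are assumptions made up for illustration, and the search is a plain case-insensitive keyword match rather than a lemma, concept or named entity search.

```python
import re


def kwic(sentences, keyword, width=30):
    """Build simple concordance lines for `keyword`.

    Returns a list of (left context, matched word, right context) tuples,
    where each context is cut to at most `width` characters.
    Hypothetical helper for illustration only.
    """
    # Whole-word, case-insensitive match of the literal keyword.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for sentence in sentences:
        for match in pattern.finditer(sentence):
            left = sentence[max(0, match.start() - width):match.start()]
            right = sentence[match.end():match.end() + width]
            lines.append((left, match.group(0), right))
    return lines


# Made-up sample sentences, not taken from the corpus.
sample = [
    "The aria was accompanied by a single guitar.",
    "She studied opera in Bologna.",
    "A Guitar hung on the wall of the workshop.",
]

for left, word, right in kwic(sample, "guitar"):
    # Right-align the left context so the keywords line up in one column.
    print(f"{left:>30} | {word} | {right}")
```

Aligning the keyword in a fixed column is what makes recurring patterns to the left and right of a word easy to spot when scanning many results.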

Release
The Polifonia Corpus is now live and released through the dedicated Polifonia Corpus GitHub repository and the interactive website. The corpus, its metadata and statistics, along with its annotations and interrogation tools, are also part of the Polifonia Ecosystem.

Recent News

Polifonia is known for its strong links with academia and is pleased to present some highlights in its involvement in research and associated conferences.

29 February 2024

In 2024, Paul Mulholland, Naomi Barker and Paul Warren (The Open University, UK) are continuing their experiment investigating how different kinds of music influence the appreciation of an artwork, and to what extent the same kinds of sense-making processes are used when viewing artwork and when listening to music. To do this, the researchers are looking for more participants. They have now automated the process so that participants can complete the experiment online without the involvement of an experimenter.

17 January 2024

During the last project meeting, the Polifonia consortium extensively discussed how to foster the impact of the project in academia and beyond. How to make the output of Polifonia sustainable after the lifetime of the project is one important aspect. But fostering re-usability does not end with the long-term preservation of certain assets (such as data and tools). In the webinar “Polifonia Research Ecosystem – Impact of a project. A webinar on Data re-use and workflows”, we will discuss how to ensure that more fluid assets such as interfaces, but also experiences in setting up and executing workflows via those interfaces, become reproducible and reusable.

15 January 2024

For the Polifonia project, the Central Institute for Cataloging and Documentation (ICCD) of the Italian Ministry of Culture is carrying out activities on the historical bell heritage. The ICCD has also initiated a process of documentation of the practices and knowledge associated with bell production through collaboration with historical Italian foundries.

The bell casting process performed by the Pontifical Marinelli Foundry. Photo courtesy of ICC

9 January 2024

One of the tools Polifonia will release is MELODY. It stands for ‘Make mE a Linked Open Data StorY’ and is a place where you can make sense of Linked Open Data and publish text-based as well as visual data stories. Earlier this year, students of the University of Bologna explored data through this tool. Let’s see what they have found and learned about… rock music.

13 December 2023

Music libraries currently lack well-founded information retrieval tools. While it is relatively easy to find music based on metadata, content-based music retrieval remains a challenge. The Polifonia FACETS pilot aims to tackle this challenge by building a faceted search engine (FSE) for large collections of music documents.

24 November 2023

This is a week of major importance to the Polifonia team, as its researchers join both the conference of the International Society for Music Information Retrieval (ISMIR) and the conference for the International Semantic Web and Linked Data Community (ISWC): venues of significant importance for both research and industry. Read more about Polifonia’s contributions below.

7 November 2023

On 13 October, an exploratory workshop took place in a school in Milton Keynes (UK) as part of the Polifonia project. The “Music Meets Machines” workshop gave a look into cutting-edge technologies used to represent music history.

3 November 2023

Between 16 and 20 October, the Polifonia consortium met in the Italian city of Bologna, home of the project coordinator University of Bologna (UNIBO). During an intensive week, the project team took steps in the development of the pilots, including the long-awaited web portal. Read more about the 7th project meeting here.

27 October 2023

The sensory journey “Data Wanderings” is a new project of Polifonia. The art installation will open on Friday, Oct. 13, in Bologna, Italy and you can visit it until the 28th of the month.

12 October 2023

This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement N. 101004746