Exploration and discovery in large collections of music scores
Music libraries currently lack well-founded information retrieval tools. While it is relatively easy to find music based on metadata, content-based music retrieval remains a challenge. The Facets pilot aims to tackle this challenge by building a faceted search engine (FSE) for large collections of music documents. It will support exploring and discovering documents of interest based on a combination of musical features (such as melodic, harmonic or rhythmic patterns), metadata (authors, form, instrumentation), and all kinds of annotations, all leveraged at the collection level. Data will be ingested into the search engine directly from the knowledge graph of Polifonia, and search results will be compliant with the principles of Polifonia knowledge representation.
Search engines are essential tools for information retrieval in large digital libraries. They supply ranked lists of content relevant to users’ searches, helping them locate the documents that meet their needs. While tools exist for searching large digital music collections, they are mostly based on the metadata of the musical documents, and content-based functionalities are severely limited. The Facets pilot aims to offer a full-featured solution to this problem by building a search engine whose search and exploration methods combine metadata, music content and annotations.
Facets will supply a self-contained component that can be integrated into the Polifonia ecosystem. It will be able to ingest collections of music documents drawn from the knowledge graph in order to build indexes and search-oriented structures. Search functionalities will include pattern-based searches, identification of relevant relationships, and ranking methods.
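As an illustration of what a pattern-based search with ranking could look like, here is a minimal sketch in Python. It is not the Facets implementation: it assumes melodies are available as lists of MIDI pitch numbers, matches patterns on their interval sequences so that transposed occurrences are found, and ranks documents by the number of occurrences. The function names and the input format are hypothetical.

```python
from typing import Dict, List, Tuple


def intervals(pitches: List[int]) -> Tuple[int, ...]:
    """Successive pitch differences in semitones (transposition-invariant)."""
    return tuple(b - a for a, b in zip(pitches, pitches[1:]))


def count_occurrences(haystack: Tuple[int, ...], needle: Tuple[int, ...]) -> int:
    """Count (possibly overlapping) occurrences of needle inside haystack."""
    if not needle:
        return 0
    n = len(needle)
    return sum(
        1 for i in range(len(haystack) - n + 1) if haystack[i:i + n] == needle
    )


def pattern_search(
    collection: Dict[str, List[int]], pattern: List[int]
) -> List[Tuple[str, int]]:
    """Return (document id, occurrence count) pairs, most occurrences first.

    `collection` maps a hypothetical document id to its melody as MIDI pitches.
    """
    needle = intervals(pattern)
    hits = []
    for doc_id, pitches in collection.items():
        count = count_occurrences(intervals(pitches), needle)
        if count:
            hits.append((doc_id, count))
    # Rank by descending count, then by id for a stable order.
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


if __name__ == "__main__":
    collection = {
        "score-a": [60, 62, 64, 60, 62, 64],
        "score-b": [67, 69, 71],
        "score-c": [60, 60, 60],
    }
    # Two rising whole tones, written at a different pitch level.
    print(pattern_search(collection, [62, 64, 66]))
```

Exact interval matching is only one possible notion of a melodic pattern; an actual engine would likely also need rhythmic and harmonic features and some tolerance for variation.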
Results will be presented in a form that allows further exploration and navigation. In particular, they will be faceted so that user searches can be refined along some dimension of the knowledge space. A pattern-based search, for instance, can return a large set of documents from many styles, periods and authors, all of which constitute facets that can be used to further explore the result set.
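The faceting idea can be sketched as follows, assuming each search result is a plain dictionary of metadata fields. The field names (`composer`, `period`) and the two helper functions are illustrative assumptions, not part of Facets or of the Polifonia knowledge graph.

```python
from collections import Counter
from typing import Dict, List


def facet_counts(
    results: List[Dict[str, str]], facets: List[str]
) -> Dict[str, Counter]:
    """For each facet, count how many results carry each value."""
    return {
        facet: Counter(doc[facet] for doc in results if facet in doc)
        for facet in facets
    }


def refine(
    results: List[Dict[str, str]], facet: str, value: str
) -> List[Dict[str, str]]:
    """Keep only the results whose facet has the selected value."""
    return [doc for doc in results if doc.get(facet) == value]


if __name__ == "__main__":
    # Hypothetical result set of a pattern-based search.
    results = [
        {"id": "s1", "composer": "Bach", "period": "Baroque"},
        {"id": "s2", "composer": "Handel", "period": "Baroque"},
        {"id": "s3", "composer": "Mozart", "period": "Classical"},
    ]
    counts = facet_counts(results, ["composer", "period"])
    print(counts["period"])
    print([doc["id"] for doc in refine(results, "period", "Baroque")])
```

The counts tell the user which refinements are available and how many documents each one would keep; selecting a value narrows the result set, and the counts can then be recomputed on the narrowed set.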
Facets can also be seen as a knowledge producer (e.g., by identifying all the music scores that feature a similar pattern and share other descriptive features). Results will therefore be exported in a format compliant with the Polifonia knowledge representation, and annotated with permanent identifiers for easy integration into the Polifonia graph. This can serve purposes such as those expressed in some stories, related to the need to preserve the results of exploration efforts (see for instance the story of Sethus on the creation of a persistent corpus, relevant to some musicological study, and built from searches and navigation in music collections).