# Milestone: Chord Corpus (ChoCo) released and published in Scientific Data
ChoCo, the Chord Corpus, is out! After months of work, the team is proud to release 20K+ timed harmonic annotations of scores and tracks. The release comes with a long-awaited publication in the journal ‘Scientific Data’.
Disconnected data: the struggle with music analysis
Many chord datasets are currently available for music analysis and information retrieval, but they are disconnected from one another and often limited by their size, lack of openness, lack of timing information, or poor interoperability. Together with the lack of overlapping repertoire coverage, this limits cross-corpus studies of harmony over time and across genres, and hampers research in computational music analysis, including chord recognition, pattern mining, and computational creativity. Polifonia set out to solve this problem!
The answer: ChoCo
ChoCo is the largest dataset and knowledge graph that semantically integrates harmonic data from 18 different sources, which use heterogeneous representations and formats (Harte, Leadsheet, Roman numerals, ABC, etc.). Polifonia relies on a data transformation workflow to integrate MIR datasets: the JAMifier provides metadata and format interoperability, and the Chonverter provides notational interoperability based on the Harte notation. Finally, jams2rdf uses the JAMS Ontology to generate a knowledge graph via SPARQL-Anything.
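To make the idea of a timed chord annotation concrete, here is a minimal Python sketch. It assumes JAMS-style observations (with `time`, `duration` and `value` fields) whose labels follow a simplified subset of the Harte syntax (`root:quality/bass`, or `N` for "no chord"). The sample data, the `TimedChord` class and the `parse_observation` helper are hypothetical illustrations, not part of the ChoCo codebase, and the pattern does not cover the full Harte grammar.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Simplified pattern for Harte-style labels: root[:quality][/bass].
# It covers common shorthand forms only, not explicit interval lists.
HARTE_PATTERN = re.compile(
    r"^(?P<root>[A-G][#b]*)"          # root note, e.g. C, F#, Bb
    r"(?::(?P<quality>[A-Za-z0-9]+))?"  # optional shorthand, e.g. maj, min7
    r"(?:/(?P<bass>[#b]*\d+))?$"        # optional bass degree, e.g. /3
)


@dataclass
class TimedChord:
    time: float             # onset, in seconds
    duration: float         # length, in seconds
    root: Optional[str]     # None when the label is "N" (no chord)
    quality: Optional[str]
    bass: Optional[str]


def parse_observation(obs: dict) -> TimedChord:
    """Turn one JAMS-style observation into a structured timed chord."""
    label = obs["value"]
    if label == "N":
        return TimedChord(obs["time"], obs["duration"], None, None, None)
    match = HARTE_PATTERN.match(label)
    if match is None:
        raise ValueError(f"Label not covered by this simplified pattern: {label!r}")
    # A bare root (e.g. "C") is read as a major chord.
    quality = match["quality"] or "maj"
    return TimedChord(obs["time"], obs["duration"], match["root"], quality, match["bass"])


# Hypothetical sample data, made up for illustration.
observations = [
    {"time": 0.0, "duration": 2.0, "value": "N"},
    {"time": 2.0, "duration": 1.5, "value": "C:maj"},
    {"time": 3.5, "duration": 1.5, "value": "A:min7"},
    {"time": 5.0, "duration": 2.0, "value": "G:7/3"},
]

chords = [parse_observation(obs) for obs in observations]
for chord in chords:
    print(chord)
```

Once every source is expressed in one notation like this, annotations from different collections can be compared and queried in the same way, which is the point of the notational interoperability step.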
Article in Scientific Data
The research was conducted by Jacopo de Berardinis & Albert Meroño-Peñuela from King’s College London and Andrea Poltronieri & Valentina Presutti from the University of Bologna, with help in evaluation sessions from other Polifonia team members. After being under review over the summer, the article ‘ChoCo: a Chord Corpus and a Data Transformation Workflow for Musical Harmony Knowledge Graphs’ was published in late September in Scientific Data (vol. 10). Scientific Data, part of the Nature Portfolio, is a peer-reviewed, open-access journal for descriptions of datasets and research that advances the sharing and reuse of scientific data.
What’s next!?
The team is extending and linking ChoCo to other Web resources, and exploring methods for computational creativity. For example: Harmory – a knowledge graph of interconnected harmonic patterns.
Relevant links
- Article in Scientific Data [Nature] https://www.nature.com/articles/s41597-023-02410-w
- GitHub https://github.com/smashub/choco
- Latest release (v1.0) https://github.com/smashub/choco/releases/tag/v1.0.0
- SPARQL Endpoint https://polifonia.disi.unibo.it/choco/query
- The 18 included datasets: Isophonics, JAAH, Schubert-Winterreise, Billboard, Chordify, Robbie Williams, The Real Book, Uspop 2002, RWC-Pop, Weimar Jazz Database, Wikifonia, iReal Pro, Band-in-a-Box, When in Rome, Rock Corpus, Mozart Piano Sonata, Jazz Corpus, Nottingham.
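
For readers who want to try the SPARQL endpoint listed above, the following Python sketch builds a request using only the standard library. It assumes the endpoint accepts standard SPARQL 1.1 Protocol GET requests and can return JSON results; this has not been checked against the live service, which may be unavailable or configured differently. The query is deliberately schema-agnostic (it just asks for a few triples), because the class and property names of the ChoCo graph are not described here. The helper names `build_request` and `run_query` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Endpoint URL taken from the links above.
ENDPOINT = "https://polifonia.disi.unibo.it/choco/query"

# Schema-agnostic query: fetch a handful of triples to see what the graph contains.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_query(endpoint: str, query: str, timeout: float = 30.0) -> list:
    """Send the query and return the result bindings (requires network access)."""
    with urllib.request.urlopen(build_request(endpoint, query), timeout=timeout) as response:
        payload = json.load(response)
    return payload["results"]["bindings"]


# Example usage (not executed here, since it needs the live endpoint):
#   for row in run_query(ENDPOINT, QUERY):
#       print(row["s"]["value"], row["p"]["value"], row["o"]["value"])
```

Inspecting the predicates returned by a query like this is a reasonable first step before writing more specific queries over chords, tracks or scores.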