15 Jul 2021

Two new Data Papers in the Journal of Open Humanities Data

Two new data papers on 'Music Sentiment' and 'Game Walkthroughs' are now available in the Journal of Open Humanities Data.

We are happy to introduce two new datasets on ‘Music Sentiment’ and ‘Game Walkthroughs’. Each dataset is described in an accompanying data paper (see below), and both are available under a CC BY 4.0 license from Zenodo and Kaggle.

1. MuSe: The Musical Sentiment Dataset

Abstract

The MuSe (Music Sentiment) dataset contains sentiment information for 90,001 songs. We computed scores for the affective dimensions of valence, dominance, and arousal, based on the user-generated tags that are available for each song via Last.fm. In addition, we provide artist, title and genre metadata, and a MusicBrainz ID and a Spotify ID, which allow researchers to extend the dataset with further metadata.

Akiki, C. and Burghardt, M., 2021. MuSe: The Musical Sentiment Dataset. Journal of Open Humanities Data, 7, p.10. DOI: http://doi.org/10.5334/johd.33
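
To illustrate the general idea of deriving song-level sentiment from tags, here is a minimal sketch that averages per-tag scores for valence, arousal, and dominance. The lexicon values, the `TAG_LEXICON` name, and the plain unweighted mean are assumptions made for illustration only; the data paper describes the procedure actually used for MuSe.

```python
from statistics import mean

# Hypothetical tag lexicon: tag -> (valence, arousal, dominance).
# The numbers are invented for this example and are not taken from MuSe.
TAG_LEXICON = {
    "happy": (8.0, 6.0, 7.0),
    "calm": (7.0, 2.0, 6.0),
    "angry": (2.0, 8.0, 5.0),
}


def song_sentiment(tags):
    """Average the lexicon scores of all known tags for one song.

    Returns a dict with valence, arousal and dominance, or None if
    none of the tags appear in the lexicon.
    """
    scored = [TAG_LEXICON[t.lower()] for t in tags if t.lower() in TAG_LEXICON]
    if not scored:
        return None
    # zip(*scored) regroups the tuples by dimension before averaging.
    valence, arousal, dominance = (mean(dim) for dim in zip(*scored))
    return {"valence": valence, "arousal": arousal, "dominance": dominance}


if __name__ == "__main__":
    # The unknown tag "shoegaze" is skipped because it is not in the toy lexicon.
    print(song_sentiment(["happy", "calm", "shoegaze"]))
```

A song whose tags are all missing from the lexicon yields `None` here; how such songs should be handled in practice is a design decision this sketch leaves open.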

2. The Game Walkthrough Corpus (GWTC) – A Resource for the Analysis of Textual Game Descriptions

Abstract

We present the Game Walkthrough Corpus (GWTC), which contains 12,295 unique walkthrough documents covering 6,117 games. For each game walkthrough, we provide frequencies of unigrams and bigrams, treating the walkthrough document as a Bag of Words. In addition, we provide word frequencies at the sentence level. Furthermore, the GWTC contains a range of game-related metadata, including title, publisher, developer, year, and genre. All the language statistics and metadata are stored in separate plain text files and can be referenced through uniform resource names (URNs). These URNs can also be used to derive any combination of statistics and metadata. Researchers, for instance, can investigate the most frequent unigrams for games in the “Adventure” genre. This way, the GWTC can be reused for different kinds of research questions on gaming language.

Burghardt, M. and Tiepmar, J., 2021. The Game Walkthrough Corpus (GWTC) – A Resource for the Analysis of Textual Game Descriptions. Journal of Open Humanities Data, 7, p.14. DOI: http://doi.org/10.5334/johd.34
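
The kind of statistics described in the abstract can be sketched in a few lines. The example below counts unigrams and bigrams over a whole document as a bag of words and then combines the counts with genre metadata. It is only an illustration: the GWTC ships precomputed frequencies that are addressed through URNs, whereas the tokenizer, the record layout (`genre` and `text` keys), and the function names here are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngram_counts(text, n):
    """Count n-grams over the whole document, ignoring sentence boundaries."""
    tokens = tokenize(text)
    # Shifted copies of the token list, zipped together, yield the n-grams.
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)


def top_unigrams_for_genre(documents, genre, k):
    """Sum unigram counts over all documents of one genre; return the top k."""
    total = Counter()
    for doc in documents:
        if doc["genre"] == genre:
            total.update(ngram_counts(doc["text"], 1))
    return total.most_common(k)


if __name__ == "__main__":
    # Toy records standing in for walkthrough documents plus metadata.
    docs = [
        {"genre": "Adventure", "text": "Open door. Open chest."},
        {"genre": "Puzzle", "text": "Push the block."},
    ]
    print(ngram_counts("Go north. Go north again.", 2))
    print(top_unigrams_for_genre(docs, "Adventure", 1))
```

Because the document is treated as a single bag of words, a bigram such as "north go" that spans a sentence boundary is counted as well; sentence-level statistics would require splitting the text into sentences first.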