Data Mining Large Musical Data Bases

David Huron
Cognitive and Systematic Musicology Laboratory
School of Music
Ohio State University

Abstract

The availability of large symbolic musical databases has provided unprecedented opportunities for music-related research. Some of these databases have been created by scholarly organizations such as the Center for Computer Assisted Research in the Humanities and the Répertoire International des Sources Musicales (RISM). Most symbolic databases have been assembled by amateurs using MIDI. More recently, proprietary databases have been assembled by "dot-com" companies collecting user-response data from the Internet.

Such databases greatly facilitate studies in musical stylistics, taste, musical similarity, performance analysis, and even forensic musicology. The databases are used in innumerable ways, from "name-that-tune" searching to automated music summarization to the training of neural networks.

While these databases offer important opportunities, they also raise difficult methodological challenges. Available materials exhibit widely differing quality, so estimating error rates is essential. The variety of data formats means that missing or incompatible information is commonplace. Non-random (opportunistic) samples present onerous statistical problems for researchers.
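
As an illustration of the point about error rates, one common approach is to proofread a random sample of encoded records and report a confidence interval for the proportion found to be faulty. The sketch below is only an assumption about how this might be done, not a description of the method used in the talk; the function name `wilson_interval` and the sample figures are hypothetical, and the Wilson score interval is one of several reasonable choices.

```python
import math


def wilson_interval(errors, sample_size, z=1.96):
    """Return a Wilson score interval (low, high) for an error proportion.

    errors      -- number of faulty records found in the proofread sample
    sample_size -- number of records proofread (assumed to be a random sample)
    z           -- normal quantile; 1.96 corresponds to roughly 95% confidence
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0 <= errors <= sample_size:
        raise ValueError("errors must lie between 0 and sample_size")

    p = errors / sample_size
    z2 = z * z
    denom = 1 + z2 / sample_size
    centre = (p + z2 / (2 * sample_size)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / sample_size + z2 / (4 * sample_size ** 2))
    ) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Hypothetical figures: 12 faulty records found among 400 proofread.
low, high = wilson_interval(errors=12, sample_size=400)
print(f"Estimated error rate: {12 / 400:.1%} (interval {low:.1%} to {high:.1%})")
```

The interval is only meaningful if the proofread records were drawn at random from the database; an opportunistic sample carries the same statistical problems noted above.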

The opportunities and problems afforded by musical databases are discussed and illustrated via several contrasting applications, including the use of such data to generate geographical maps of musical cultures.

References

Aarden, B. & Huron, D. (in press). "Analyzing geographical aspects of music." Computing in Musicology.

Darlington, R. B. (1990). Regression and Linear Models. New York: McGraw-Hill, see Chapter 11: "Multiple Tests," especially "Why Not Correct for the Whole History of Science?" pp. 264-265.

Huron, D. (1988). "Error categories, detection and reduction in a musical database." Computers and the Humanities, Vol. 22, No. 4, pp. 253-264.

Huron, D. (1992). Design principles in computer-based music representation. In: A. Marsden & A. Pople (editors), Computer Representations and Models in Music, London: Academic Press, pp. 5-39.

Huron, D. (1997). Humdrum and Kern: Selective feature encoding. In: E. Selfridge-Field (editor), Beyond MIDI: The Handbook of Musical Codes, Cambridge, Massachusetts: MIT Press, pp. 375-401.

Huron, D. (2000). "Perceptual and cognitive applications in music information retrieval." Presentation at the International Symposium on Music Information Retrieval, Plymouth, Massachusetts.

Huron, D. (MS). "The New Empiricism: Systematic musicology in a Postmodern age." 1999 Ernest Bloch Lecture, University of California, Berkeley.

von Hippel, P. & Huron, D. (2000). "Why do skips precede reversals? The effect of tessitura on melodic structure." Music Perception, Vol. 18, No. 1, pp. 59-85.

Mani, I. & Maybury, M.T. (eds.) (1999). Advances in Automatic Text Summarization. Cambridge, Massachusetts: MIT Press.

Orpen, K. & Huron, D. (1992). "The measurement of similarity in music: A quantitative approach for non-parametric representations." Computers in Music Research, Vol. 4, pp. 1-44.

Schaffrath, H. (1995). The Essen Folksong Collection. D. Huron (ed.). [Four computer disks containing 6,255 folksong transcriptions and 34-page research guide.] Stanford, CA: Center for Computer Assisted Research in the Humanities.

Selfridge-Field, E. (ed.) (1997). Beyond MIDI: The Handbook of Musical Codes, Cambridge, Massachusetts: MIT Press.

Watt, H.J. (1924). Functions of the size of interval in the songs of Schubert and of the Chippewa and Sioux Indians. British Journal of Psychology, Vol. 14, pp. 370-386.

