Date of Graduation

12-2021

Document Type

Thesis

Degree Name

Bachelor of Science

Degree Level

Undergraduate

Department

Computer Science and Computer Engineering

Advisor/Mentor

Patitz, Matthew

Committee Member/Reader

Raich, Andrew

Committee Member/Second Reader

Gauch, Susan

Abstract

Sounds with a high degree of stationarity, also known as sound textures, have perceptually relevant features that can be captured by stimulus-computable models. This makes texture-like sounds, such as those made by rain, wind, and fire, an appealing test case for understanding the underlying mechanisms of auditory recognition. Previous auditory texture models typically measured statistics from auditory filter-bank representations, and the statistics they used were somewhat ad hoc, hand-engineered through trial and error. Here, we investigate whether a better auditory texture representation can be obtained via contrastive learning. Taking advantage of the stationarity of auditory textures, we train a network to learn an embedding in which multiple glimpses of the same texture lie close together while different textures lie far apart. We use a large dataset of stationary sounds to train, in a self-supervised way, a neural network modeled on the human auditory system. Textures are then synthesized from the model's representations to evaluate how well those representations capture the key statistics present in auditory textures.
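
The abstract does not specify the exact training objective, so the following is only a minimal sketch of one standard contrastive objective (an InfoNCE/NT-Xent-style loss in PyTorch) consistent with the description of pulling glimpses of the same texture together and pushing different textures apart. The function name, temperature value, and symmetric formulation are illustrative assumptions, not the thesis's implementation.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(z_a: torch.Tensor, z_b: torch.Tensor,
                     temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE-style loss for a batch of paired glimpse embeddings.

    z_a[i] and z_b[i] are embeddings of two glimpses of the same
    texture (a positive pair); every other pairing in the batch is
    treated as a negative and pushed apart.
    """
    z_a = F.normalize(z_a, dim=1)           # project embeddings onto unit sphere
    z_b = F.normalize(z_b, dim=1)
    logits = z_a @ z_b.t() / temperature    # cosine-similarity matrix
    labels = torch.arange(z_a.size(0), device=z_a.device)
    # Symmetric cross-entropy: glimpse i in one view should match
    # glimpse i in the other view, and vice versa.
    return 0.5 * (F.cross_entropy(logits, labels)
                  + F.cross_entropy(logits.t(), labels))
```

In a setup like the one described, two random excerpts ("glimpses") would be drawn from each stationary sound in a batch, encoded by the auditory-model-based network, and passed as z_a and z_b; minimizing this loss yields an embedding with the stated property.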

Keywords

deep learning, contrastive learning, unsupervised training, auditory science, audition
