Degree Name

Bachelor of Science


Department

Computer Science and Computer Engineering


Advisor

Patitz, Matthew

Committee Member/Reader

Raich, Andrew

Committee Member/Second Reader

Gauch, Susan

Abstract

Sounds with a high degree of stationarity, also known as sound textures, have perceptually relevant features that can be captured by stimulus-computable models. This makes texture-like sounds, such as those produced by rain, wind, and fire, an appealing test case for understanding the mechanisms underlying auditory recognition. Previous auditory texture models typically measured statistics from auditory filter bank representations, and the statistics they used were somewhat ad hoc, hand-engineered through trial and error. Here, we investigate whether a better auditory texture representation can be obtained via contrastive learning. We take advantage of the stationarity of auditory textures to train a network to learn an embedding in which multiple glimpses of the same texture lie close together while glimpses of different textures lie far apart. We use a large dataset of stationary sounds to train, in a self-supervised manner, a neural network modeled on the human auditory system. We then synthesize textures from the model's representations to evaluate how well these representations capture the key statistics present in auditory textures.
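The contrastive objective described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration, not the thesis's implementation. The abstract does not name the loss, so the sketch assumes an InfoNCE-style loss, which is one common contrastive objective. The function `toy_embed` is a hypothetical placeholder for the auditory-model network and computes three fixed summary statistics instead of learned features. Two random glimpses are cropped from each stationary signal. Each glimpse's embedding is then scored against the glimpses of every texture in the batch. The loss is low when each glimpse is most similar to the other glimpse of its own texture.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def random_glimpse(signal: Sequence[float], glimpse_len: int, rng: random.Random) -> List[float]:
    """Crop one window of `glimpse_len` samples at a random offset."""
    if not 0 < glimpse_len <= len(signal):
        raise ValueError("glimpse_len must be in 1..len(signal)")
    start = rng.randrange(len(signal) - glimpse_len + 1)
    return list(signal[start:start + glimpse_len])


def toy_embed(glimpse: Sequence[float]) -> Vector:
    """HYPOTHETICAL stand-in for the learned network: three summary statistics
    (mean, mean power, mean absolute first difference)."""
    n = len(glimpse)
    mean = sum(glimpse) / n
    power = sum(x * x for x in glimpse) / n
    roughness = sum(abs(b - a) for a, b in zip(glimpse, glimpse[1:])) / max(n - 1, 1)
    return [mean, power, roughness]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def info_nce_loss(anchors: Sequence[Vector], positives: Sequence[Vector],
                  temperature: float = 0.1) -> float:
    """InfoNCE-style loss (assumed objective): anchors[i] should match
    positives[i]; the other positives in the batch act as negatives.
    Returns the mean loss over the batch."""
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [cosine(a, p) / temperature for p in positives]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # equals -log softmax(logits)[i]
    return total / len(anchors)


if __name__ == "__main__":
    rng = random.Random(0)
    n = 4000
    white = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Moving average of white noise: a smoother, lower-power toy "texture".
    smooth = [sum(white[i:i + 8]) / 8 for i in range(n - 7)]
    offset = [x + 2.0 for x in white]  # same roughness, nonzero mean
    textures = [white, smooth, offset]

    # Two independent glimpses per texture form one positive pair.
    anchors = [toy_embed(random_glimpse(t, 1000, rng)) for t in textures]
    positives = [toy_embed(random_glimpse(t, 1000, rng)) for t in textures]
    print(f"InfoNCE loss on the toy batch: {info_nce_loss(anchors, positives):.4f}")
```

In an actual training setup, a trainable network would produce the embeddings, and an automatic-differentiation framework would minimize the loss by gradient descent. The temperature of 0.1 is an arbitrary illustrative value.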

Keywords
deep learning, contrastive learning, unsupervised training, auditory science, audition