
Auditory Scene Analysis

Dawn Senathi-Raja

Tests of higher-level temporal processing revealed that MR's pitch pattern function was preserved. Auditory scene analysis, however, requires not only intact pitch pattern processing but also intact pattern organisation, which is facilitated by auditory spatial processing and by Gestalt grouping mechanisms. MR therefore completed tests assessing both of these aspects.

Spatial Sound Processing

Sound is localised using interaural differences in arrival time, which for ongoing tones are expressed as differences in phase. The inability to detect changes in phase cues is associated with sound movement disorders and pattern organisation deficits (Griffiths et al., 1996). A spatial battery developed by Griffiths et al. (2001) was administered to assess MR's detection of sound movement. There were three conditions, each involving a tone being: (1) advanced in one ear and retarded in the other (Phase ramp); (2) lateralised to the left or right (Phase difference limen); or (3) sinusoidally modulated in one ear and retarded in the other (Interaural phase modulation).
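The phase-ramp manipulation described above can be sketched in a few lines of signal-processing code. This is an illustrative reconstruction, not the actual Newcastle Auditory Battery stimulus; the sample rate, duration, and ramp depth are assumed values.

```python
import numpy as np

FS = 44100   # sample rate in Hz (assumed; not specified by the battery)
F0 = 500     # carrier frequency of the tone, as in Griffiths et al. (2001)

def phase_ramp(duration=1.0, max_shift=0.1):
    """Stereo tone whose phase is linearly advanced in one ear and
    retarded in the other, yielding an impression of sound movement.
    max_shift is the final phase offset as a proportion of one cycle."""
    t = np.arange(int(FS * duration)) / FS
    ramp = np.linspace(0.0, max_shift * 2 * np.pi, t.size)
    left = np.sin(2 * np.pi * F0 * t + ramp)   # phase advanced
    right = np.sin(2 * np.pi * F0 * t - ramp)  # phase retarded
    return np.column_stack([left, right])

stim = phase_ramp()
```

A threshold in Table 2.13 would then correspond to the log10 of the smallest `max_shift` the listener can reliably detect, consistent with the log-transformed proportional phase change reported in the table notes.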

Results are presented in Table 2.13. MR exhibited impaired performance on all three tasks, indicating a broad deficit of spatio-auditory processing (see Appendix L for MR's psychometric functions).

Table 2.13

Seventy-five Percent Thresholds for Griffiths et al.'s (2001) Spatial Sound Phase Tasks

Psychophysical task                  MR's threshold   Controls' threshold (SD)a
Phase rampb                           0.06***          -0.57 (0.19)
Phase difference limenb              -0.42*            -1.09 (0.32)
2 Hz Interaural phase modulationc     0.25***          -0.75 (0.30)

a Data are based on 15 controls matched to MR for age, from the norms for the Newcastle Auditory Battery (Griffiths et al., 2001).
b Stimuli were based on a tone of 500 Hz. Data are expressed as log transformations of proportional phase change.
c Data are expressed as log transformations of the proportional depth of modulation.
*p < 0.05, ***p < 0.001

Sequential Streaming

The Gestalt grouping rules of proximity, similarity, and good continuation determine the likelihood of sequential streaming. Several of Bregman's (1995) commercially available sequential streaming tasks incorporate these rules into stimuli through the speed of sequences, the amount of frequency separation between tones, the timbral similarity between tones, cycle repetition, and discrete or gliding pitch changes between tones (see Figure 1.11 for an illustrative example). Eight of Bregman's tasks were administered to assess MR's ability to stream sequences of sound. As these tasks elicit patterns that normal listeners of all ages are able to hear without special training or conditions of reproduction, control data were not required (Bregman, 1990).
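The classic way these grouping cues are manipulated is an alternating high-low tone cycle: widening the frequency separation between the tones, or speeding up the cycle, pushes the percept from one integrated sequence into two separate streams. A minimal sketch of such a stimulus follows; the tone duration, tempo, and frequencies are assumed values, not those of Bregman's recordings.

```python
import numpy as np

FS = 22050  # sample rate in Hz (assumed)

def tone(freq, dur=0.08):
    """Short sine tone with a Hann envelope to avoid onset clicks."""
    t = np.arange(int(FS * dur)) / FS
    return np.hanning(t.size) * np.sin(2 * np.pi * freq * t)

def gallop(low=400.0, semitones=7, cycles=10):
    """ABA- cycle of high (A) and low (B) tones. A small `semitones`
    separation favours hearing one integrated sequence; a large
    separation and a fast tempo favour segregation into a high
    stream and a low stream."""
    high = low * 2 ** (semitones / 12)
    silence = np.zeros(int(FS * 0.08))  # the rest slot in ABA-
    cycle = np.concatenate([tone(high), tone(low), tone(high), silence])
    return np.tile(cycle, cycles)

seq = gallop()
```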

As displayed in Table 2.14, MR did not perceive the automatically elicited sequential streaming effects for any of the tasks. MR's inability to organise patterns of sounds according to Gestalt rules indicates faulty Gestalt grouping mechanisms.

Table 2.14

Perceptions Elicited by Bregman's (1995) Sequential Streaming Tasks

Task Perception normally elicited Perception elicited in MR
1 Ascending tones heard as low stream. Descending tones heard as high stream. Only high tones heard
2 High stream noise pattern heard above low stream noise pattern. One group of high noise patterns and two groups of low noise patterns heard.
3 Tone patterns in high and low streams. Only one high tone heard, then a series of low tones.
4 A high and low stream form as length of tone sequences increases. Mixture of high and low tones.
5 Four tones heard in a connected glide sequence. Two tones heard in a connected glide sequence.
6 Two streams heard in an unconnected sequence, with two tones in each stream. One continuous tone heard in an unconnected sequence.
7 In a crossing pattern of rising and falling sequences, "upright V" path heard separately from "inverted V" path. Complete rising and falling sequences heard in crossing pattern.
8 As four overlapping tones became more percussive, tone order became easier to judge. Order of four overlapping tones was not easier to judge as tones became more percussive.

Simultaneous Streaming

Simultaneous streaming relies on Gestalt grouping rules of similarity, common fate, and proximity. These rules have been integrated into Bregman's (1995) tasks using sound components with synchronous onsets and offsets, or sound components with small frequency separations between each other but large frequency separations between competing components (see Figure 1.12 for an illustrative example). Four of Bregman's tasks were used to assess MR's perceptual ability to group sounds into a simultaneous stream.
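The paradigm behind Tasks 3 and 4 can be sketched as a captor tone A alternating with a two-component complex BC, with the onset of C optionally delayed. The following is an illustrative reconstruction only; the frequencies, durations, and asynchrony are assumed values.

```python
import numpy as np

FS = 22050  # sample rate in Hz (assumed)

def tone(freq, dur=0.25):
    """Sine tone with a Hann envelope."""
    t = np.arange(int(FS * dur)) / FS
    return np.hanning(t.size) * np.sin(2 * np.pi * freq * t)

def abc_cycle(fa=600.0, fb=600.0, fc=300.0, async_s=0.0):
    """One cycle of the A / BC paradigm: captor tone A, then B and C
    together. Synchronous onsets (async_s = 0) exploit common fate and
    favour a fused BC complex; delaying C's onset lets A capture B
    into a sequential stream."""
    b, c = tone(fb), tone(fc)
    shift = int(FS * async_s)
    bc = np.zeros(b.size + shift)
    bc[:b.size] += b               # B starts immediately
    bc[shift:shift + c.size] += c  # C delayed by async_s seconds
    return np.concatenate([tone(fa), bc])

fused = abc_cycle(async_s=0.0)     # synchronous B and C onsets
captured = abc_cycle(async_s=0.05) # C onset delayed by 50 ms
```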

As shown in Table 2.15, MR demonstrated an impaired ability to fuse sounds according to the relevant Gestalt grouping rules for all tasks.

Table 2.15

Perceptions Elicited by Bregman's (1995) Simultaneous Streaming Tasks

Task Perception normally elicited Perception elicited in MR
1 Large frequency separation between tones A and B prevented sequential streaming from being heard but caused a fused complex tone BC to be heard. Large frequency separation between tones A and B caused a sequential A B stream to be heard separately from low C tones.
2 When there was no frequency separation between tones A and B, a sequential A B stream was heard separately from low C tones. When there was no frequency separation between tones A and B, a fused complex tone BC was heard.
3 Fused complex tone BC was heard in a separate stream from preceding tone A when tones B and C had synchronous onsets. Tone A was heard with B in a sequential stream when tones B and C had synchronous onsets.
4 Preceding tone A was heard with B in a sequential stream when tones B and C had asynchronous onsets. Fused complex tone BC was heard in a separate stream from preceding tone A when tones B and C had asynchronous onsets.

Environmental Auditory Scene Decomposition

Beckwith's (2003) environmental sound recognition task revealed that MR was unable to identify approximately 25 percent of individual environmental sounds. Given MR's ability to identify the remaining 75 percent of sounds, an identification task based on groups of those sounds was created to assess whether MR could decompose an auditory scene containing concurrent environmental sounds. Groups of two, three, four, or five sounds were digitally combined using Pro Tools (2002), with five items per condition (refer to Appendix I). Sounds for each item were carefully selected so that the frequencies of the different sounds were discriminable from one another. The onsets of individual sounds in each auditory scene were asynchronous in order to resemble a natural environment as closely as possible. After hearing a 30-second auditory scene, participants were required to verbally identify the individual sounds comprising the scene.
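The stimulus construction described above amounts to overlaying mono recordings at staggered onsets and normalising the mix (the actual scenes were assembled in Pro Tools). A hypothetical sketch, with noise bursts standing in for the environmental recordings:

```python
import numpy as np

FS = 22050    # sample rate in Hz (assumed)
SCENE_S = 30  # each auditory scene lasted 30 seconds

def mix_scene(sounds, onsets_s):
    """Overlay mono sounds into one scene at asynchronous onsets,
    then peak-normalise, mimicking a natural environment in which
    sources rarely start together."""
    scene = np.zeros(FS * SCENE_S)
    for snd, onset in zip(sounds, onsets_s):
        start = int(FS * onset)
        seg = snd[:scene.size - start]  # clip anything past 30 s
        scene[start:start + seg.size] += seg
    peak = np.max(np.abs(scene))
    return scene / peak if peak > 0 else scene

# Stand-in "environmental sounds": three 2-second noise bursts
rng = np.random.default_rng(0)
sounds = [rng.standard_normal(FS * 2) for _ in range(3)]
scene = mix_scene(sounds, onsets_s=[0.0, 1.5, 3.0])
```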

The results showed that as the number of sounds comprising an auditory scene increased, MR made progressively and significantly more errors than controls (see Table 2.16), even though these sounds were previously identifiable when presented individually. MR's inability to track sound sequences from one source in the midst of competing sound sources indicates a sequential streaming deficit, thus contributing further evidence of impairment to MR's auditory scene analysis.

Table 2.16

Percentage of Errors for Environmental Auditory Scene Decomposition Task

Category   MR's mean errors   Controls' mean errors (SD)
2 sounds 0.00% 0.00% (0.00%)
3 sounds 13.33%* 4.02% (3.75%)
4 sounds 25.00%** 6.05% (6.51%)
5 sounds 28.00%*** 5.67% (6.14%)

Note. Errors included misidentifications as well as non-responses.
*p < 0.05, **p < 0.01, ***p < 0.001
