Computational protocol: Functional and Quantitative MRI Mapping of Somatomotor Representations of Human Supralaryngeal Vocal Tract

Protocol publication

[…] Participants were 10 healthy adults (mean age = 34.2 years; standard deviation [SD] = 10.77; age range = 22–57; 4 males, 6 females). Participants were recruited from the School of Psychological Sciences, Birkbeck College, and the School of Speech, Hearing and Phonetic Sciences, University College London. Handedness was recorded by self-report: 9 participants were right-handed and one left-handed. (The left-handed participant was included because we had no reason to assume that elementary vocal tract representations within M-I and S-I would differ as a function of handedness; indeed, the results for this subject supported this conclusion.) Overall, 7 participants had phonetic training (mean = 6.43 years of training; SD = 5.53); the remaining participants had extensive experience with language research and had practiced producing the stimuli in advance of the experiment. Seven participants were native English speakers and spoke with a variety of dialects (British, North American, Irish); the remaining 3 spoke English to native-level proficiency, with British or American dialects. All but one participant had learned multiple languages other than English.

Factors such as language history can affect sensorimotor control in the context of articulation, so it is important to control for and minimize such effects. We sought to ensure consistency in articulatory performance by testing subjects with training in phonetics and/or extensive practice in producing controlled oral motor movements. To expand on this point: in the present study, the units of articulation were isolated voiceless stops or fricatives drawn from a number of different language families. Two of the phones in the main experiment (retroflex /ʈə/, palatal /cə/) were non-native to several of our subjects. The phonetic training our subjects had (in addition to all subjects practising the phones before scanning) helped to ensure that the impact of language background during articulation was minimized as much as possible. For example, S6 (Fig. f) is a native English speaker and an experienced scientist without training in phonetics who was well practiced at the tasks, while S10 (Fig. j) is also a native English speaker and a lecturer in phonetics; comparison of the maps for these subjects suggests very close correspondence. Similarly, S5 (Fig. e) is a native English speaker with training in phonetics, while S8 (Fig. h) is a non-native English speaker and a lecturer in phonetics; again, close correspondence of maps is observed between these 2 subjects.

Our cohort also varied in age: 3 subjects were over the age of 40 years at the time of the study, and 1 was at the upper end of the age range (57 years old). Two of these 3 (including the eldest subject) have undergone audiometric screening since the study and had pure tone thresholds in the normal range. All of our subjects have experience of working within speech and language research; none reported any hearing difficulties (e.g., tinnitus or hearing loss) or related speech or language issues (e.g., difficulty perceiving speech in noise).
The study received ethical approval from the local ethics committee, and participants provided voluntary informed consent before commencing the study.

Stimuli: In the main fMRI experiment, participants were auditorily prompted to produce trains of the following voiceless stops (plus a neutral schwa vowel) that varied systematically in their place of articulation: bilabial (/pə/), alveolar (/tə/), retroflex (/ʈə/), palatal (/cə/), velar (/kə/), and glottal (/ʔə/) (see Fig. and AudioFile1). Bilabial stops (initial consonant in English “pea”) involve rapid opening of the lips in tandem with a small downward movement of the lower jaw. Alveolar stops (English “ta”) require contact and quick release of the tongue tip or blade with the alveolar ridge. In retroflex stops (Indian English “time”), the tongue tip is curled back and contacts the postalveolar area before release. Palatal stops (Italian “chi”) instead require the middle or back part of the tongue to contact the hard palate, whereas in velar stops (English “kiss”) the back of the tongue contacts the soft palate. Glottal stops (English “uh”) involve glottal closure without tongue movement.

Figure 1.

Prompts were excised from a recording of a phonetically trained female native English speaker producing each plosive. Stimuli were scaled to a nominal intensity of 65 dB RMS in Praat (version 5.3.01) and inspected to ensure that peak clipping had not occurred. A token of each stop + vowel (padded by 40 ms of silence, total duration 330 ms) was concatenated 5 times to create a prompt train with a syllable repetition rate of 3 Hz. Auditory prompts were then concatenated in order of place of articulation, either front-to-back (/pə/ /tə/ /ʈə/ /cə/ /kə/ /ʔə/) or the reverse, with each prompt separated by 9 s of silence, during which the participant repeated the prompted syllable at the same rate (see schematic in Fig. ). The full cycle of prompts repeated every 64 s, with 8 front-to-back or back-to-front cycles per run (run duration 8′32″). Stimuli were presented binaurally using Sensimetrics S14 earbuds.

Procedure: Prior to scanning, the experimenters familiarized all participants with the syllabic prompts (typically for 20–30 min), until participants could produce them at the required rate without difficulty. Care was taken to ensure that all participants could perceive the speech sounds clearly and could produce all stops with the correct place of articulation. Critically, participants were instructed to reduce movement of the jaw and lips during production in order to minimize head movement artifacts. Participants also practiced producing each sound with as soft an articulation as possible to reduce artifact arising from B0 field distortions due to changes in tongue position and air volume within the mouth, throat, and chest. Participants were instructed to produce each speech sound at a constant rate, starting immediately at the onset of the prompt and at the prompted tempo (3 syllables/s). Participants continued producing the prompted sound after offset of the prompt, and began production at the next place of articulation at the next prompt onset (10.66 s after onset of the previous prompt). The stimulus prompt sound file began playing at the beginning of the fifth TR of each functional scanning run (the first 4 images were discarded to allow T1 magnetization to come to equilibrium). Participants underwent 2 alternated sessions of each place-of-articulation order (front-to-back and back-to-front).
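As a worked illustration of this timing (not part of the original protocol), the minimal Python sketch below reconstructs the nominal prompt schedule for one run from the values given above; small rounding differences from the reported 10.66 s prompt spacing and 64 s cycle are expected.

```python
# Sketch of the auditory prompt schedule described above (values from the text):
# 6 places of articulation, 330 ms tokens repeated 5 times per prompt train
# (3 Hz syllable rate), ~9 s of silent production after each prompt,
# giving a ~64 s cycle, repeated 8 times per run (~512 s = 256 TRs at TR = 2 s).

TOKEN_DUR = 0.330          # stop + schwa token, padded to 330 ms
REPS_PER_TRAIN = 5         # tokens per prompt train -> 1.65 s at 3 Hz
SILENCE = 9.0              # silent gap during which the participant repeats the syllable
PLACES_FWD = ["pə", "tə", "ʈə", "cə", "kə", "ʔə"]   # front-to-back order
CYCLES_PER_RUN = 8
TR = 2.0

def prompt_onsets(front_to_back=True):
    """Return (syllable, onset_time_s) for every prompt train in one run."""
    places = PLACES_FWD if front_to_back else PLACES_FWD[::-1]
    period = REPS_PER_TRAIN * TOKEN_DUR + SILENCE      # ~10.65 s per place
    onsets = []
    for cycle in range(CYCLES_PER_RUN):
        for i, syllable in enumerate(places):
            onsets.append((syllable, cycle * len(places) * period + i * period))
    return onsets

if __name__ == "__main__":
    schedule = prompt_onsets(front_to_back=True)
    run_dur = CYCLES_PER_RUN * len(PLACES_FWD) * (REPS_PER_TRAIN * TOKEN_DUR + SILENCE)
    print(f"run duration ≈ {run_dur:.1f} s ≈ {run_dur / TR:.0f} TRs")   # ~511 s
```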
Participants were monitored by an in-bore video camera and encouraged to breathe at irregular intervals during functional scanning, to reduce potential data artifact arising from systematic breathing and/or head movement.

Data acquisition: Functional data were acquired on a 1.5 T Siemens Avanto scanner with a 32-channel head coil using a T2*-weighted echo planar gradient echo pulse sequence (256 TRs, TR = 2000 ms, TE = 39 ms, flip angle = 90°, bandwidth = 1474 Hz/pixel, matrix = 64 × 64, 24 axial slices, 3.2 × 3.2 × 3.2 mm³ voxels). The first 4 volumes in each run were excluded to allow T1 longitudinal magnetization to reach steady state. A T1-weighted magnetization-prepared rapid gradient echo (MPRAGE) scan was acquired for each subject (TI = 1000 ms, TR = 8.4 ms, TE = 3.57 ms, flip angle = 7°, matrix = 224 × 256, 176 axial slices, 1 × 1 × 1 mm³ voxels).

A multiparameter mapping (MPM) protocol was also acquired in 4 of the participants. Proton density-weighted (PDw), T1-weighted (T1w), and magnetization transfer-weighted (MTw) images were acquired using an in-house 3D FLASH pulse sequence (voxel size: 0.8 × 0.8 × 0.8 mm³, FOV = 256 × 224 × 180 mm³, matrix = 320 × 256 × 224, TR = 16.0 ms, bandwidth = 480 Hz/pixel, excitation flip angle: 4° (PDw/MTw) or 24° (T1w), slab rotation 30°). To accelerate this high-resolution acquisition, partial Fourier acquisition (6/8 coverage) was used in the inner phase-encoding direction (RL), and parallel imaging was used along the outer phase-encoding direction (AP), reconstructed using the GRAPPA algorithm (acceleration factor 2, 32 integrated auto-calibration lines) as implemented on the scanner platform. Four gradient echoes were acquired for each contrast (TE = 2.50, 4.84, 7.18, 9.52 ms) after each RF excitation pulse and averaged to improve SNR. Quantitative R1 (= 1/T1) maps were estimated from the PDw and T1w images according to a previously published signal model, extended by including corrections for RF transmit field inhomogeneities and imperfect spoiling. The transmit field map was calculated using a 3D EPI spin-echo (SE)/stimulated echo (STE) method (FOV = 256 × 192 × 192 mm³, matrix = 64 × 64 × 48, TE/TM = 50.02/44.16 ms, TR = 500 ms, nominal α varying from 115° to 65° in steps of 5°, acquisition time 4 min 24 s) and was corrected for off-resonance effects using a standard B0 field map (double gradient echo FLASH, 3 × 3 × 2 mm³ resolution, whole-brain coverage).

Data processing, structural scans: The MPRAGE anatomical scans were used to reconstruct the cortical surfaces of 6 participants; for the 4 participants who underwent the MPM protocol, those scans were used for surface reconstruction instead. A 6-subject cortical-surface-based average group R1 map, corrected for local effects of cortical thickness and curvature, was spherically morphed to the display subject's brain for comparison with articulation maps. (The 2 subjects from the current study were included in these data.) The gradient of the group R1 map was also overlaid on the articulation maps to estimate the borders of secondary somatosensory areas.

Data processing, functional scans: All functional data were analyzed using a customized version of FreeSurfer (csurf, M. Sereno, http://www.cogsci.ucsd.edu/~sereno/.tmp/dist/csurf).
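The R1 estimation step can be illustrated with a minimal sketch. The Python code below applies the standard dual-flip-angle rational approximation (valid for small flip angles and TR ≪ T1) to echo-averaged PDw and T1w volumes; it is not the study's actual pipeline, the transmit-field and spoiling corrections described above are omitted, and the file names are hypothetical.

```python
# Sketch only: dual-flip-angle R1 estimation from echo-averaged PDw and T1w
# FLASH volumes using the rational small-angle / short-TR approximation.
# The B1+ transmit-field and imperfect-spoiling corrections used in the study
# are NOT applied here; file names are hypothetical placeholders.

import numpy as np
import nibabel as nib

def r1_dual_flip_angle(s_pd, s_t1, fa_pd_deg, fa_t1_deg, tr_s):
    """Approximate R1 (s^-1) from PDw and T1w FLASH signals sharing one TR."""
    a_pd = np.deg2rad(fa_pd_deg)
    a_t1 = np.deg2rad(fa_t1_deg)
    num = s_t1 * a_t1 / tr_s - s_pd * a_pd / tr_s
    den = 2.0 * (s_pd / a_pd - s_t1 / a_t1)
    with np.errstate(divide="ignore", invalid="ignore"):
        r1 = num / den
    return np.where(np.isfinite(r1), r1, 0.0)

# Example usage with the nominal acquisition parameters reported above
# (PDw flip angle 4 deg, T1w flip angle 24 deg, TR = 16.0 ms).
pdw_img = nib.load("sub-01_PDw_echoavg.nii.gz")      # hypothetical path
t1w_img = nib.load("sub-01_T1w_echoavg.nii.gz")      # hypothetical path
r1 = r1_dual_flip_angle(pdw_img.get_fdata(), t1w_img.get_fdata(),
                        fa_pd_deg=4.0, fa_t1_deg=24.0, tr_s=0.016)
nib.save(nib.Nifti1Image(r1.astype(np.float32), pdw_img.affine),
         "sub-01_R1_uncorrected.nii.gz")
```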
Functional data were registered using a linear affine least-squares minimization algorithm in AFNI (3dvolreg); functional scans were then registered to the high-resolution volume used to create the cortical surface using bbregister and manual blink comparison (i.e., rapidly flicking between functional volumes and the anatomical image in csurf tkregister to check registration success).

Each functional session was analyzed using Fourier analysis methods, in which functional activation is measured as the amplitude of the periodic BOLD signal at the frequency of the stimulus cycle. Periodic signal components at very low frequencies (due to motion), at the second and third harmonics of the stimulus frequency (due to surround inhibition), and at the higher frequency of the auditory prompt were excluded as neither signal nor noise (this is mathematically equivalent to first linearly regressing out these frequencies as nuisance variables before calculating significance). For each subject, the full Fourier transform of each functional run time course was calculated, with the phase subsequently reversed at the stimulus (but not noise) frequencies for the back-to-front runs. A complex F-ratio was then calculated by comparing the Fourier amplitude at stimulus frequencies to the average Fourier amplitude at nonstimulus frequencies. Averaged 3D Fourier amplitudes and first-level statistics were painted onto each participant's inflated cortical surface in csurf.

A cross-subject activation average was created using spherical-registration-based cross-subject resampling. Each subject's statistical maps were resampled onto the icosahedral spherical surface using best-fit sulcal alignment with one step of surface-based smoothing. Group-level statistics were then calculated via a cross-subject F-ratio (based on the complex Fourier coefficients at the stimulus frequency from each subject) with (2, 2n − 2) degrees of freedom. Averaged data were resampled onto a single subject's surface for visualization and displayed with 10 steps of surface-based smoothing (approximating a Gaussian smoothing kernel of 3 mm FWHM). Surface-based cluster exclusion was used to correct for multiple comparisons (surfclust and randsurfclust); group-level F statistics were thresholded at P < 0.05, and surface clusters smaller than 281 mm² were excluded (achieving a cluster-corrected significance of P < 10⁻⁵ per hemisphere). The cluster extent threshold was determined as the minimum cortical area required to achieve a corrected alpha of P < 10⁻⁵ for each hemisphere, estimated by iterative random sampling of cluster sizes (N = 100 000 iterations per hemisphere using randsurfclust), given an initial uncorrected alpha of 0.05. Additionally, we ensured robustness of the results at a more conservative initial threshold of P < 0.01 (cluster size 86 mm², calculated with 10 000 iterations per hemisphere, achieving a corrected hemisphere-wise alpha of 0.001). We additionally present results with a less conservative cluster correction (initial uncorrected alpha 0.05, cluster size 134 mm², corrected hemisphere-wise alpha 0.05). […]
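To make the Fourier analysis concrete, the sketch below (Python/NumPy) shows the voxelwise calculation for a single run under stated assumptions: 256 TRs, a stimulus frequency of 8 cycles per run, and an assumed set of excluded bins (low frequencies, second and third harmonics, and the prompt-rate bin). The specific bins excluded in the original analysis, and the run-averaging, complex cross-subject F, and surface steps, are only approximated here.

```python
# Illustrative single-run, single-voxel version of the periodic (phase-encoded)
# analysis described above. Assumptions: stimulus bin = 8 (8 cycles / 256 TRs);
# excluded bins chosen to mirror the text (low frequencies, 2nd/3rd harmonics,
# prompt rate of 6 prompts per cycle = bin 48). Not the original csurf code.

import numpy as np

N_TR = 256
STIM_BIN = 8                            # 8 stimulus cycles per 256-TR run
EXCLUDED = {0, 1, 2,                    # DC and very low frequencies (motion)
            2 * STIM_BIN, 3 * STIM_BIN, # 2nd and 3rd stimulus harmonics
            6 * STIM_BIN}               # assumed prompt-rate bin (6 prompts/cycle)

def fourier_stats(ts, reverse_phase=False):
    """Amplitude, phase, and F-ratio at the stimulus frequency for one voxel."""
    spec = np.fft.rfft(ts - ts.mean())
    power = np.abs(spec) ** 2
    noise_bins = [k for k in range(1, len(spec))
                  if k != STIM_BIN and k not in EXCLUDED]
    f_ratio = power[STIM_BIN] / power[noise_bins].mean()
    phase = np.angle(spec[STIM_BIN])
    if reverse_phase:                   # back-to-front runs: flip phase so all
        phase = -phase                  # runs share the front-to-back convention
    return np.abs(spec[STIM_BIN]), phase, f_ratio

# Example on a synthetic voxel responding at the stimulus frequency
rng = np.random.default_rng(0)
t = np.arange(N_TR)
ts = np.sin(2 * np.pi * STIM_BIN * t / N_TR + 1.0) + rng.standard_normal(N_TR)
amp, phase, f = fourier_stats(ts)
print(f"amplitude = {amp:.1f}, phase = {phase:.2f} rad, F ≈ {f:.1f}")
```

In this framing, the phase at the stimulus bin encodes where in the front-to-back cycle a voxel responds (i.e., its preferred place of articulation), which is why the phase is reversed for back-to-front runs before averaging.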

Pipeline specifications

Software tools FreeSurfer, AFNI
Applications Magnetic resonance imaging, Functional magnetic resonance imaging
Organisms Homo sapiens