Ultrasound speech analysis: State of the art
Alan Wrench

Overview
1. Machines
2. Methods of recording image sequences and syncing with audio
3. Probes
4. Head stabilisation
5. Contour tracking
6. Parameterisation

Choosing an ultrasound: considerations
- Physical: size, weight, portability, fan noise. Small and quiet is good.
- Probe design: low frequency, microconvex, short-handled is good.
- Method of extracting images: ideally high quality, high frame rate, fast.
- Method of synchronising audio: ideally fully automated hardware frame sync.
- Cost: low cost, of course.
There is no single perfect system.

Overview
~50 speech labs now using ultrasound
- Aloka SSD 1000 - 1 lab
- Aloka SSD 4000 - 1 lab
- Aloka SSD 5000 - 1 lab
- Aloka SSD 5500 - 1 lab
- Mindray DP2200 - 6 labs
- Mindray DP6600 - 15 labs
- Mindray DP6900 - 1 lab
- Mindray M5 - 1 lab
- Echoblaster 128 - 3 labs
- GE Logiq e - 3 labs
- GE Logiq alpha 100 - 1 lab
- Interson SeeMore - 2 labs and British Columbia health service
- Sonosite 180+ - 3 labs (being replaced)
- Sonosite Titan - 3 labs
- Terason T3000 - 7 labs
- Toshiba Famio 8 (SSA-530A) - 1 lab
- Ultrasonix RP, Tablet, Touch - 7 labs
- Zonare Z.one - 1 lab

Recording ultrasound
Acquiring image data via the video port (NTSC or digital)
Methods used:
- Frame grabber card and AAA software. Audio is captured separately via a soundcard and synced using a box that places a flash on the video and a tone on the audio. Automatic post-processing detects the flash and tone and aligns them. Recording is fast, but setting sync parameters can be a bit tricky.
- Canopus ADVC 110 video capture card. Provides integrated synchronous audio. Requires video editing software for capture, such as Sony Vegas, Apple Final Cut Pro and iMovie, or Avid Xpress DV.
- Record to a DVD recorder, then transfer to PC offline.
- Recording the screen using Snagit or Camtasia is an option for machines running under Windows, such as the Interson SeeMore.
Although this does not use the video port, it results in a video file.
- If the video is not compressed, de-interlacing provides 60 frames per second (for NTSC video).
- If the video is compressed, de-interlacing may not be possible.

Things to look out for
(These factors can vary between individual models of ultrasound, even ones from the same manufacturer, or if settings are changed.)
- There may be a lag between the ultrasound and the audio if the machine takes appreciable time to process the ultrasound signal.
- There may be duplicate frames.
- There may be blurring if frame averaging cannot be switched off.
- Video images may be "torn" when made up of parts of different sweeps.
- Careful selection of the ultrasound system can mitigate these problems.
- 25 labs (using the Aloka SSD 1000 and 4000, Mindray DP2200, DP6600 and DP6900, GE Logiq e, and Toshiba Famio 8) use video port capture.

Recording ultrasound
Cineloop: direct access to ultrasound memory
Advantages:
- No "torn" images.
- Frame rates higher than 60 fps are possible.
Disadvantages:
- Automatic audio synchronisation is not possible (with exceptions). Audio must be recorded separately, merged in video editing software and synchronised by manual observation of stop releases (or a flash/beep signal; ref. CHAUSA).
- Cine loops have limited size, which limits record time. Sometimes this is a few seconds, sometimes several minutes (with exceptions).
This approach is used by 10 labs with:
- GE Logiq e
- Zonare Z.one
- Mindray M5
- Sonosite 180+
- Sonosite Titan
- Interson SeeMore

There are 4 systems in use which allow automated synchronisation of cineloop data and audio:
- Aloka SSD 5500 (Haskins): one-off modification, not generally available.
- Ultrasonix: both frame and scanline pulse sequences are generated by hardware.
- Terason T3000: hardware sync signal not generally available, so software sync is used. Ultraspeech software polls the system for new frames.
- Echoblaster 128: TTL frame pulses.
By recording these pulse signals on a second audio track alongside the microphone input, automated precise synchronisation is possible.
16 labs use this method, using either Terason/Ultraspeech or Ultrasonix/AAA or Matlab.
With the exception of the Aloka, these systems also provide software programming toolkits so that bespoke speech applications can be written:
- Ultraspeech
- AAA
- Matlab

Probes
- Convex and particularly microconvex (<20 mm radius) probes are generally preferred for midsagittal tongue imaging.
- Probes come in a range of frequencies from 2 to 12 MHz.
- Low frequency = good penetration = tongue image doesn't disappear for high vowels and consonants.
- Small radius means the transmitting array fits under the chin.
- Large field of view (FOV) means more of the tongue can be imaged.
- Short handle.

Aloka UST-9121 Multi Frequency Tight Convex Transducer
- Scan angle: 120°
- Radius: 14 mm
- Frequency range: 2.5-6 MHz
- Short handle
- Narrow cylindrical grip, ideal for a clamp.

Probe specifications
Model               Probe          MHz      Radius (mm)  FOV (°)  Handle
T3000               8MC3           3-8                   120      short
T3000               8EC4           4-8      15           140      short
GE Logiq e          8C-RS          4-10     15           133      short
EchoBlaster128      C6.5/20/128Z   5-8      10           156      short
EchoBlaster128      C3.5/20/128Z   2-4      20           104      short
Ultrasonix          C9-5/10        5-9      10           148      short
Ultrasonix          C7-3/50        3-7      50           69       short
Mindray             65EC10EA       5-8      10           120      long
Mindray             65C15EA        5-8      15           90       short
Mindray             35C20EA        2-6      20           83       short
Aloka               UST-9121       2.5-4    15           120      short
Interson SeeMore    99-5901        3.5-5    10           90       short
Zonare              C9-4t          4-9      11           134      short
Sonosite            C11            4-7      11           90       short
Sonosite            C15            2-4      15           101      short
Key: Best specification / Poor

Probe stabilisation
- Headset: 30+ labs
- Rest forehead against headrest with probe in fixed position: 2 labs
- Fixed head restraint and spring-loaded probe
- Fixed head restraint, fixed probe

Head movement correction
- Palatoglossatron, Peterotron, https://github.com/jjberry/Autotrace/blob/master/old/ APIL wiki ??
- HOCUS http://www.psych.mcgill.ca/labs/mcl/pdf/HOCUS.pdf
- GIPSA accelerometers and gyrometers

Contour tracking
- EdgeTrak - Maryland - standalone PC application - snakes
  http://vims.cis.udel.edu/~mli/research.htm
- AAA - QMU - integrated PC application - fan-based edge detection - similar performance to EdgeTrak, within a recording and analysis GUI. Also a snakes-based contour-fitting interface.
- TongueTrack - Simon Fraser - Matlab - MRF energy minimisation
  http://tonguetrack.cs.sfu.ca/TongueTrackUserGuide.pdf
  L. Tang and G. Hamarneh. Graph-based tracking of the tongue contour in ultrasound sequences with adaptive temporal regularization. In Mathematical Methods for Biomedical Image Analysis (MMBIA), pages 1-8, 2010.
- GetContours - Haskins - Matlab - EdgeTrak with a GUI - available on request from Mark Tiede
- Ultramat - GIPSA - Matlab - Thomas Hueber
- Autotrace - Arizona - Python script - Jeff Berry
  https://github.com/jjberry/Autotrace
- Noname - Munich - Matlab - in progress - Phil Hoole
- UltraPraat - Arizona - in progress
- UltraCats - Toronto - manual contour drawing - Tim Bressmann
- Jacob - Rochester - speckle tracking - software not available
  Jacob, M., H. Lehnert-LeHouillier, S. Bora, S. McAleavey, D. Dalecki, J. McDonough. 2008. "Speckle Tracking for the Recovery of Displacement and Velocity Information from Sequences of Ultrasound Images of the Tongue". Proceedings of the 8th International Seminar on Speech Production, Strasbourg, France, 53-57.
- Roussos - UCL/Trier/Queen Mary - active appearance models - software not available
  A. Roussos, A. Katsamanis, and P. Maragos, "Tongue tracking in ultrasound images with active appearance models," in Proc. IEEE Int'l Conf. on Image Processing, 2009.

Speckle tracking
University of Rochester Biomedical Engineering
- Provides displacement estimates, giving “virtual fleshpoints”.
- Works on clear vowel images.
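
As a rough illustration of how displacement estimates can be obtained from speckle, the sketch below matches a small block of pixels from one frame against nearby positions in the next frame and reports the best-fitting shift. This is generic block matching using the sum of absolute differences, not the Rochester implementation, which these slides do not describe. The function names `block_sad` and `track_block`, the block size and the search range are all assumptions made for the example.

```python
import random
from typing import List, Tuple

Image = List[List[float]]  # grey levels indexed as image[row][column]


def block_sad(prev: Image, curr: Image, y0: int, x0: int,
              y1: int, x1: int, size: int) -> float:
    """Sum of absolute differences between a block in prev and a block in curr."""
    total = 0.0
    for dy in range(size):
        for dx in range(size):
            total += abs(prev[y0 + dy][x0 + dx] - curr[y1 + dy][x1 + dx])
    return total


def track_block(prev: Image, curr: Image, y: int, x: int,
                size: int = 3, search: int = 2) -> Tuple[int, int]:
    """Return the (row, column) shift that best matches a prev block in curr."""
    height, width = len(curr), len(curr[0])
    best_shift, best_cost = (0, 0), float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ny, nx = y + dy, x + dx
            # Skip candidates whose block would fall outside the image.
            if ny < 0 or nx < 0 or ny + size > height or nx + size > width:
                continue
            cost = block_sad(prev, curr, y, x, ny, nx, size)
            if cost < best_cost:
                best_shift, best_cost = (dy, dx), cost
    return best_shift


# Demo: synthetic speckle moved down 1 row and right 2 columns (with wrap-around).
rng = random.Random(0)
prev_frame = [[rng.random() for _ in range(12)] for _ in range(12)]
curr_frame = [[prev_frame[(r - 1) % 12][(c - 2) % 12] for c in range(12)]
              for r in range(12)]
print(track_block(prev_frame, curr_frame, y=4, x=4))  # prints (1, 2)
```

Repeating the search for blocks placed along the tongue surface would give one displacement per block per frame pair, which is the sense in which the tracked points behave like "virtual fleshpoints". Real speckle decorrelates between frames, so a practical tracker would need a more robust similarity measure than this sketch uses.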

Parameterisation
- Lingua - Quebec - Matlab - ISSP 2008
  http://www.phonetique.uqam.ca/upload/files/anniebrasseur/menard%20et%20al%20issp2008.pdf
- Zharkova - QMU - Python
  Zharkova, N. (2013). A normative-speaker validation study of two indices developed to quantify tongue dorsum activity from midsagittal tongue shapes. Clinical Linguistics & Phonetics, 27, 484-496.
- Hueber - GIPSA - Matlab - EigenTongues - Ultraspeech tools www.ultraspeech.com
- Also Hoole - Munich - Matlab - principal components analysis, Mielke (NCSU, USA) and Richmond (Edinburgh)
- NYU - SSANOVA using the gss package in R.
- Haskins - shape analysis methods based on polynomial fitting and Procrustes comparison to a resting tongue shape.
- AAA - tongue averaging - pointwise t-tests.
- Surfaces - displays a sequence of contours as a time-motion display. Contour sequences can be averaged and compared numerically.

Miscellaneous
- Ultrasonix 4D - Haskins
- GE Logiq - linear probe - laryngeal - Victoria
- EchoTools - a set of tools for analyzing Echo-Doppler tongue images
  https://github.com/jjberry/EchoTools
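
The tongue averaging and pointwise t-test comparison listed under Parameterisation can be illustrated with a minimal sketch. It assumes each contour is stored as one radial distance (in mm) per fan line, so that point i of every contour lies on the same fan line. The slides do not say which t-test variant AAA uses; Welch's unequal-variance statistic is an assumption here, and the names `mean_contour` and `pointwise_welch_t` are hypothetical.

```python
from math import copysign, sqrt
from statistics import mean, variance
from typing import List, Sequence

Contour = Sequence[float]  # one radial distance (mm) per fan line


def mean_contour(contours: Sequence[Contour]) -> List[float]:
    """Average several contours point by point (all must have equal length)."""
    return [mean(points) for points in zip(*contours)]


def pointwise_welch_t(group_a: Sequence[Contour],
                      group_b: Sequence[Contour]) -> List[float]:
    """Welch t statistic at each fan line for two sets of contours.

    Each group needs at least two contours so a variance can be estimated.
    """
    t_values = []
    for a, b in zip(zip(*group_a), zip(*group_b)):
        diff = mean(a) - mean(b)
        standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
        if standard_error == 0:
            # No spread in either group: 0 if the means agree, else +/- infinity.
            t_values.append(0.0 if diff == 0 else copysign(float("inf"), diff))
        else:
            t_values.append(diff / standard_error)
    return t_values


# Demo: three repetitions per condition, three fan lines; only line 2 differs.
condition_a = [[40.0, 50.0, 45.0], [42.0, 52.0, 45.0], [41.0, 51.0, 45.0]]
condition_b = [[40.0, 55.0, 45.0], [42.0, 57.0, 45.0], [41.0, 56.0, 45.0]]
print(mean_contour(condition_a))  # prints [41.0, 51.0, 45.0]
print([round(t, 2) for t in pointwise_welch_t(condition_a, condition_b)])
# prints [0.0, -6.12, 0.0]
```

Turning the t statistics into p-values needs the Welch degrees of freedom and a t distribution, which the standard library does not provide, and testing many fan lines at once calls for a multiple-comparison correction.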