Ultrasound speech analysis:
State of the art
Alan Wrench
Overview
1. Machines
2. Methods of recording image sequences and syncing with audio
3. Probes
4. Head stabilisation
5. Contour tracking
6. Parameterisation
Choosing an ultrasound: considerations

Physical: Size, weight, portability, fan noise – small and quiet is good

Probe design – low frequency, microconvex, short handled is good

Method of extracting images – ideally high quality, high frame rate, fast

Method of synchronising audio – ideally fully automated hardware frame sync

Cost – Low cost of course
There is no single perfect system.
Overview

~50 speech labs now using ultrasound

Aloka SSD 1000 – 1 lab
Aloka SSD 4000 – 1 lab
Aloka SSD 5000 – 1 lab
Aloka SSD 5500 – 1 lab
Mindray DP2200 – 6 labs
Mindray DP6600 – 15 labs
Mindray DP6900 – 1 lab
Mindray M5 – 1 lab
Echoblaster 128 – 3 labs
GE Logiq e – 3 labs
GE Logiq alpha 100 – 1 lab
Interson SeeMore – 2 labs and British Columbia health service
Sonosite 180+ – 3 labs (being replaced)
Sonosite Titan – 3 labs
Terason T3000 – 7 labs
Toshiba Famio 8 (SSA-530A) – 1 lab
Ultrasonix RP, Tablet, Touch – 7 labs
Zonare Z.one – 1 lab

Recording ultrasound
Acquiring image data via the video port (NTSC or digital)
Methods used:
– Frame grabber card and AAA software. Audio is captured separately via a soundcard and synced using a box that places a flash on the video and a tone on the audio. Automatic post-processing detects the flash and tone and aligns the two streams. Recording is fast, but setting the sync parameters can be a bit tricky.
– Canopus ADVC 110 video capture card. Provides integrated synchronous audio. Requires video editing software for capture, such as Sony Vegas, Apple Final Cut Pro, iMovie or Avid Xpress DV.
– Record to a DVD recorder, then transfer to PC offline.
– Recording the screen using Snagit or Camtasia is an option for machines running under Windows, such as the Interson SeeMore. Although this does not use the video port, it results in a video file.
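The flash-and-tone alignment used with the frame grabber method can be sketched as below. This is only an illustration of the idea, not the AAA implementation: the function names, the use of mean frame brightness to find the flash, and the simple amplitude threshold for the tone are all assumptions made for the example.

```python
def first_index_above(values, threshold):
    """Return the index of the first value that exceeds threshold, or None."""
    for i, value in enumerate(values):
        if value > threshold:
            return i
    return None


def audio_video_offset(frame_brightness, video_fps, audio_samples, audio_rate,
                       flash_threshold, tone_threshold):
    """Estimate the offset in seconds between the tone and the flash.

    frame_brightness: mean brightness of each video frame (hypothetical input)
    audio_samples:    audio amplitudes; the tone onset is taken as the first
                      sample whose absolute value exceeds tone_threshold
    """
    flash_frame = first_index_above(frame_brightness, flash_threshold)
    tone_sample = first_index_above([abs(s) for s in audio_samples], tone_threshold)
    if flash_frame is None or tone_sample is None:
        raise ValueError("flash or tone not found; adjust the thresholds")
    flash_time = flash_frame / video_fps
    tone_time = tone_sample / audio_rate
    # Positive result: the tone sits later in the audio file than the flash
    # does in the video file, so the audio must be advanced by this amount.
    return tone_time - flash_time


# Made-up data: flash on frame 3 of a 30 fps video, tone at sample 2000 of 10 kHz audio.
brightness = [10, 10, 10, 200, 10]
audio = [0.0] * 2000 + [0.9] * 100
offset = audio_video_offset(brightness, 30, audio, 10000, 100, 0.5)
```

The two thresholds are the kind of "sync parameters" that need tuning per setup. Note that the flash can only be located to within one video frame, so the precision of this approach is limited by the frame period.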
If the data is not compressed, de-interlacing provides 60 frames per second. If it is compressed, de-interlacing may not be possible.
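De-interlacing can be sketched as splitting each captured frame into its two fields. The sketch below assumes a frame is a list of pixel rows and that the even rows form the earlier field; the actual field order depends on the video source and should be checked.

```python
def split_fields(frame):
    """Split an interlaced frame (a list of pixel rows) into its two fields."""
    first_field = frame[0::2]   # rows 0, 2, 4, ...
    second_field = frame[1::2]  # rows 1, 3, 5, ...
    return first_field, second_field


def deinterlace(frames):
    """Turn N interlaced frames into 2N half-height images in time order.

    Assumes the even-row field was scanned first (an assumption, see above).
    """
    fields = []
    for frame in frames:
        first, second = split_fields(frame)
        fields.extend([first, second])
    return fields


# A tiny made-up frame with four rows of one pixel each.
fields = deinterlace([[[1], [2], [3], [4]]])
```

Each field has half the vertical resolution of the full frame, which is the price paid for doubling the temporal resolution. Compression that blends the two fields together is what makes this split impossible afterwards.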
Things to look out for:
(These factors can vary between individual models of ultrasound, even ones from the same manufacturer, or if settings are changed.)
– There may be a lag between the ultrasound and audio if the machine takes appreciable time to process the ultrasound signal.
– There may be duplicate frames.
– There may be blurring if frame averaging cannot be switched off.
– Video images may be “torn” when made up of parts of different sweeps.
Careful selection of the ultrasound system can mitigate these problems.
25 labs using Aloka 1000, 4000, Mindray 2200, 6600, 6900, GE Logiq e and Toshiba Famio 8 use video port capture.
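One of the issues listed above, duplicate frames, is easy to check for in captured data. The following is a rough sketch, assuming each frame is available as a flat list of grey levels; the function names and the tolerance parameter are hypothetical.

```python
def mean_abs_difference(frame_a, frame_b):
    """Mean absolute pixel difference between two equal-sized flat pixel lists."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)


def find_duplicate_frames(frames, tolerance=0.0):
    """Return indices of frames that (nearly) repeat the frame before them.

    tolerance is the largest mean absolute difference still counted as a
    duplicate; 0.0 means exact repeats only.
    """
    return [i for i in range(1, len(frames))
            if mean_abs_difference(frames[i], frames[i - 1]) <= tolerance]


# Made-up sequence: frames 1 and 3 repeat their predecessors.
frames = [[0, 0], [0, 0], [5, 5], [5, 5], [9, 9]]
duplicates = find_duplicate_frames(frames)
```

With analogue capture, a small non-zero tolerance is usually needed because video noise makes repeated frames differ slightly.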
Recording ultrasound
Cineloop – direct access to ultrasound memory
Advantage – No “torn” images. Frame rates higher than 60 fps are possible.
Disadvantages
– Automatic audio synchronisation is not possible (with exceptions). Audio must be recorded separately, merged in video editing software and synchronised by manual observation of stop releases (or a flash/beep signal, ref. CHAUSA).
– Cine loops have limited size, which limits record time. Sometimes this is a few seconds, sometimes it can be several minutes (with exceptions).
This approach is used by 10 labs with:
– GE Logiq e
– Zonare Z.one
– Mindray M5
– Sonosite 180+
– Sonosite Titan
– Interson SeeMore

There are 4 systems in use which allow automated synchronisation of cineloop data and audio:
– Aloka SSD 5500 (Haskins) – one-off modification, not generally available.
– Ultrasonix – both frame and scanline pulse sequences generated by hardware.
– Terason T3000 – hardware sync signal not generally available, so software sync is used. Ultraspeech software polls the system for new frames.
– Echoblaster 128 – TTL frame pulses.
By recording these pulse signals on a second audio track alongside the microphone input, automated precise synchronisation is possible.
16 labs use this method, using either Terason/UltraSpeech or Ultrasonix/AAA or Matlab.
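Recovering frame times from a pulse signal recorded on the second audio track can be sketched as rising-edge detection. This is a minimal illustration, not the method used by AAA or UltraSpeech: the function name, the fixed threshold and the assumption of one clean pulse per frame are all hypothetical.

```python
def frame_times_from_pulses(sync_channel, sample_rate, threshold):
    """Return the time in seconds of each rising edge in the sync channel.

    sync_channel: samples of the audio track carrying the frame pulses
    threshold:    level above which the pulse is considered high
    A channel that starts high counts as an edge at time 0.
    """
    times = []
    previous_high = False
    for i, sample in enumerate(sync_channel):
        high = sample > threshold
        if high and not previous_high:
            times.append(i / sample_rate)
        previous_high = high
    return times


# Made-up sync track sampled at 4 Hz with pulses starting at samples 2 and 6.
times = frame_times_from_pulses([0, 0, 1, 1, 0, 0, 1, 0], 4, 0.5)
```

Because the pulses share a clock with the microphone channel, each ultrasound frame gets a timestamp accurate to one audio sample, which is far finer than the frame period.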
With the exception of the Aloka, these systems also provide software programming toolkits so that bespoke speech applications can be written:
– UltraSpeech
– AAA
– Matlab
Probes
Convex and particularly microconvex (<20 mm radius) probes are generally preferred for midsagittal tongue imaging.
Probes come in a range of frequencies from 2-12 MHz.
– Low frequency = good penetration = tongue image doesn’t disappear for high vowels and consonants
– Small radius means the transmitting array fits under the chin
– Large field of view means more of the tongue can be imaged
– Short handle
Aloka UST-9121 Multi Frequency Tight Convex Transducer
Scan angle: 120°
Radius: 14 mm
Frequency range: 2.5-6 MHz
Short handle
Narrow cylindrical grip ideal for a clamp.
Probe specifications

Model             Probe         MHz      Radius (mm)  FOV (°)  Handle
T3000             8MC3          3-8      (missing)    120      short
T3000             8EC4          4-8      15           140      short
GE Logiq-e        8C-RS         4-10     15           133      short
EchoBlaster128    C6.5/20/128Z  5-8      10           156      short
EchoBlaster128    C3.5/20/128Z  2-4      20           104      short
Ultrasonix        C9-5/10       5-9      10           148      short
Ultrasonix        C7-3/50       3-7      50           69       short
Mindray           65EC10EA      5-8      10           120      long
Mindray           65C15EA       5-8      15           90       short
Mindray           35C20EA       2-6      20           83       short
Aloka             UST-9121      2.5-4    15           120      short
Interson SeeMore  99-5901       3.5-5    10           90       short
Zonare            C9-4t         4-9      11           134      short
Sonosite          C11           4-7      11           90       short
Sonosite          C15           2-4      15           101      short

Slide annotations: Best specification; Poor FOV
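The radius and field-of-view figures can be related to physical size with simple sector geometry. The sketch below assumes an idealised convex probe whose scan lines radiate from the centre of curvature of the array, so the arc length at a given depth is (radius + depth) × angle in radians. The function name is hypothetical and real probes will deviate from this idealisation.

```python
import math


def arc_length_mm(radius_mm, fov_degrees, depth_mm=0.0):
    """Arc length of the imaged sector at depth_mm below the probe face.

    With depth_mm = 0 this is the length of the transmitting array itself,
    which indicates how much room the probe needs under the chin.
    """
    return (radius_mm + depth_mm) * math.radians(fov_degrees)


# Aloka UST-9121 figures quoted on the slide: 14 mm radius, 120° scan angle.
footprint = arc_length_mm(14, 120)          # roughly 29 mm along the array face
width_at_60mm = arc_length_mm(14, 120, 60)  # arc covered 60 mm into the tissue
```

The same calculation shows why a large radius combined with a small field of view is unattractive for tongue imaging: the array is long, yet the sector it images spreads out slowly with depth.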
Probe stabilisation

Headset – 30+ labs

Rest forehead against headrest with probe in fixed position – 2 labs

Fixed head restraint and spring-loaded probe

Fixed head restraint and fixed probe
Head movement correction
Palatoglossatron, Peterotron,
https://github.com/jjberry/Autotrace/blob/master/old/
APIL wiki ??

HOCUS
http://www.psych.mcgill.ca/labs/mcl/pdf/HOCUS.pdf


GIPSA accelerometers and gyrometers
Contour tracking



EdgeTrak – Maryland – standalone PC application – Snakes
http://vims.cis.udel.edu/~mli/research.htm
AAA – QMU – integrated PC application – fan-based edge detection – similar performance to EdgeTrak within a recording and analysis GUI. Also a snakes-based contour fitting interface.
TongueTrack – Simon Fraser – Matlab – MRF energy minimisation
http://tonguetrack.cs.sfu.ca/TongueTrackUserGuide.pdf
L. Tang and G. Hamarneh. Graph-based tracking of the tongue contour in ultrasound sequences with adaptive temporal regularization. In Mathematical Methods for Biomedical Image Analysis (MMBIA), pages 1–8, 2010.

GetContours – Haskins – Matlab – EdgeTrak with a GUI – available on request from Mark Tiede
Ultramat – GIPSA – Matlab – Thomas Hueber
Autotrace – Arizona – python script – Jeff Berry
https://github.com/jjberry/Autotrace
Noname – Munich – Matlab – in progress – Phil Hoole
UltraPraat – Arizona – in progress
UltraCats – Toronto – manual contour drawing – Tim Bressmann
Jacob – Rochester – Speckle tracking – software not available
Jacob, M., H. Lehnert-LeHouillier, S. Bora, S. McAleavey, D. Dalecki, J. McDonough. 2008. "Speckle Tracking for the Recovery of Displacement and Velocity Information from Sequences of Ultrasound Images of the Tongue". Proceedings of the 8th International Seminar on Speech Production, Strasbourg, France, 53-57.

Roussos – UCL/Trier/Queen Mary – Active appearance models – software not available
A. Roussos, A. Katsamanis, and P. Maragos, “Tongue tracking in ultrasound images with active appearance models,” in Proc. IEEE Int’l Conf. on Image Processing, 2009.
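To give a feel for what edge detection along a fan involves, here is a deliberately naive sketch that picks the brightest pixel on each fan line radiating from the probe. It is not the algorithm used by AAA, EdgeTrak or any other tool listed above, all of which add smoothness constraints or more robust edge criteria; the function name, parameters and coordinate conventions are assumptions made for the example.

```python
import math


def trace_contour_on_fan(image, origin, angles_deg, max_radius, min_brightness):
    """For each fan line return (x, y) of the brightest pixel, or None if too dark.

    image:      list of rows of grey levels, indexed image[y][x]
    origin:     (x, y) of the fan origin in pixels (roughly the probe centre)
    angles_deg: fan line angles, measured anticlockwise from the +x axis,
                with 90 pointing up the image (towards row 0)
    """
    height, width = len(image), len(image[0])
    origin_x, origin_y = origin
    contour = []
    for angle in angles_deg:
        step_x = math.cos(math.radians(angle))
        step_y = -math.sin(math.radians(angle))  # image rows grow downwards
        best_point, best_value = None, min_brightness
        for r in range(1, max_radius + 1):
            x = round(origin_x + r * step_x)
            y = round(origin_y + r * step_y)
            if 0 <= x < width and 0 <= y < height and image[y][x] > best_value:
                best_point, best_value = (x, y), image[y][x]
        contour.append(best_point)
    return contour


# Made-up 5x5 image with one bright pixel straight above the origin.
image = [[0] * 5 for _ in range(5)]
image[1][2] = 255
contour = trace_contour_on_fan(image, (2, 4), [0, 90], 4, 10)
```

Returning None for dark fan lines matters in practice, since parts of the tongue (often the tip and root) are frequently not visible and should not be given a spurious contour point.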
Speckle tracking


University of Rochester Biomedical Engineering
Provides displacement estimates giving “virtual fleshpoints”. Works on clear vowel images.
Parameterisation

Lingua – Quebec – Matlab
ISSP 2008
http://www.phonetique.uqam.ca/upload/files/anniebrasseur/menard%20et%20al%20issp2008.pdf

Zharkova – QMU – python
Zharkova, N. (2013). A normative-speaker validation study of two indices developed to quantify tongue dorsum activity from midsagittal tongue shapes. Clinical Linguistics & Phonetics, 27, 484-496.

Hueber – GIPSA – Matlab – EigenTongues – Ultraspeech tools
www.ultraspeech.com
Also Hoole – Munich – Matlab – principal components analysis; Mielke, NCSU, USA; and Richmond, Edinburgh

NYU – SSANOVA using the gss package in R.
Haskins – shape analysis methods based on polynomial fitting and Procrustes comparison to a resting tongue shape.
AAA – Tongue averaging – pointwise t-tests.
Surfaces – displays a sequence of contours as a time-motion display. Contour sequences can be averaged and compared numerically.
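Tongue averaging with pointwise t-tests can be sketched as follows, assuming every contour has been measured as a radial distance along the same set of fan lines. The choice of Welch's unequal-variance t statistic, the function name and the data layout are assumptions for illustration and are not taken from AAA.

```python
import math
from statistics import mean, variance


def pointwise_welch_t(contours_a, contours_b):
    """Return (mean_a, mean_b, t) for each fan line.

    contours_a, contours_b: lists of contours from two conditions; each
    contour is a list of radial distances, one per fan line. Each condition
    needs at least two contours so that a variance can be computed.
    """
    results = []
    for radii_a, radii_b in zip(zip(*contours_a), zip(*contours_b)):
        mean_a, mean_b = mean(radii_a), mean(radii_b)
        standard_error = math.sqrt(variance(radii_a) / len(radii_a)
                                   + variance(radii_b) / len(radii_b))
        # With no variation in either condition the statistic is undefined.
        t = (mean_a - mean_b) / standard_error if standard_error > 0 else float("nan")
        results.append((mean_a, mean_b, t))
    return results


# Made-up data: three repetitions per condition, two fan lines each.
condition_a = [[10.0, 20.0], [12.0, 22.0], [14.0, 24.0]]
condition_b = [[10.0, 30.0], [12.0, 32.0], [14.0, 34.0]]
results = pointwise_welch_t(condition_a, condition_b)
```

The mean columns give the two averaged contours, and the t column shows where along the tongue they differ. Turning t into a p-value needs the degrees of freedom as well, and testing many fan lines at once raises the usual multiple-comparison question.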




Miscellaneous

Ultrasonix 4D – Haskins
GE Logiq – Linear probe – laryngeal – Victoria
EchoTools - A set of tools for analyzing Echo-Doppler tongue images
https://github.com/jjberry/EchoTools

