Hynek Hermansky
Doctor of Engineering in Electrical Engineering
University of Tokyo, Japan
Dissertation Title: Improved Linear Predictive Analysis Based On Spectral Processing
Research Supervisor: Professor Hiroya Fujisaki
Master of Science in Electrical Engineering 1972
Technical University Brno, Czech Republic
Thesis Title: Synthesizer of Electronic Sounds
Current Appointments
• Julian S. Smith Professor of Electrical Engineering, the Johns Hopkins University, Whiting
School of Engineering, since April 2012
• Director, Center for Language and Speech Processing, the Johns Hopkins University, since
April 2012
• Research Professor (on leave of absence), Brno University of Technology, Czech Republic,
since 2000
Past Appointments
• Professor, Department of Electrical and Computer Engineering, the Johns Hopkins
University, Whiting School of Engineering, October 2008 to March 2012
• Interim Director, Center for Language and Speech Processing, the Johns Hopkins University,
November 2010 to March 2012
• Director of Research, IDIAP Research Institute, Martigny, Switzerland, 2003 to 2008
• Titular Professor, Swiss Federal Institute of Technology at Lausanne, 2005 to 2008
• Professor and Director, Center for Information Technology, Oregon Graduate Institute of
Science and Technology, Portland, Oregon, 1997 to 2003 (Associate Professor, 1993 to
• Senior Member of Technical Staff, U S WEST Advanced Technologies, Boulder, Colorado,
1990 to 1993 (Member of Technical Staff, 1988 to 1990)
• Research Engineer, Panasonic Technologies, Speech Research Laboratory, Santa Barbara,
California, 1983 to 1988
• Research Assistant, University of Tokyo, Japan, 1979 to 1983
• Assistant Professor, Brno University of Technology, Czech Republic, 1972 to 1978
The Johns Hopkins University
• EN.520.315 Introduction to Information Processing of Sensory Signals (3rd year
students), since 2009
• EN.520.515 Processing of Audio and Visual Signals (Graduate level), since 2011
• EN.520.680 Audio and Visual Processing by Humans and by Machines (Graduate level),
since 2010
Other institutions
Speech and Audio Processing by Humans and by Machines, (Graduate seminar), Swiss
Federal Institute of Technology at Lausanne 2005 to 2007
Speech and Audio Processing by Humans and by Machines, (Graduate level), Brno
University of Technology, Czech Republic, since 2000
Speech and Audio Processing by Humans and by Machines, (Graduate level), Oregon
Graduate Institute of Science and Technology, Portland, Oregon 1994 to 2003
Speech Processing (Graduate level)
Speech Systems (Graduate level)
Signals and systems for multimedia engineering (Graduate level), Brno University of
Technology, Czech Republic 1972 to 1978
Theory of Communications (5th year students)
Electroacoustics (4th year students)
Invited lecturer in special courses and tutorials
Invited lecture on neural networks for speech, Academia Sinica, Taipei, Taiwan, May 2014
Invited talk, International Conference on Learning Representations 2014, Banff, Canada, April 2014
Invited tutorial, Speech Processing, Indian Institute of Information Technology, Hyderabad, India,
December 2013
Invited Keynote Talk, Artificial Neural Networks: Deep, Long and Wide, Oriental COCOSDA,
Gurgaon, India, November 2013
ISCA Medalist Keynote Talk, Interspeech 2013, Lyon, France
Invited lecture on neural networks, Carnegie-Mellon University, Colloquium of Language
Technology Institute, October 2013
Invited lecture on Speech Coding in ASR, Workshop on Computational Models of Early Language
Acquisition, Ecole Normale Supérieure, Paris, July 2013
Invited talk on Multistream Recognition of Speech, 2013 Telluride Workshop on Neuromorphic
Cognition Engineering
Tutorial on Speech Processing, 2009 Telluride Workshop on Neuromorphic Cognition Engineering
Tutorial on Machine Recognition of Speech, 2009 Summer Workshop on Human Language
Technology, The Johns Hopkins University, Baltimore
Invited Plenary Lecture, Information Extraction from Cognitive Signals, IEEE INDICON 2011,
Hyderabad, India
Invited talk, Workshop on Image and Speech Processing 2012, Hyderabad, India
Invited talk, Schlumberger workshop on Mathematical Models of Sound Analysis, IHES, Paris,
June 2012
Invited talk, Workshop on Image and Speech Processing 2011, Hyderabad, India
Invited talk, Computational Hearing Workshop, University College London, May 2010
Invited talk, Multistream Processing of Speech, IBM Watson Research Center, 2010
Leading session on classification, learning, and self-organization in biological systems at the School
on Neuromorphic Cognition 2009, Capo Caccia, Sardinia
Tutorial on Speech Processing, 2008 Telluride Workshop on Neuromorphic Cognition Engineering
Tutorial on modulation spectrum processing, Interspeech 2007, Antwerp, Belgium
Tutorial at the Summer Workshop on Multi-Sensory Modalities in Cognitive Science, Gerzensee,
Switzerland, 2007
Invited talk, Extracting Information from Speech, IBM Watson Research Center, 2007
Tutorial on Speech Processing, 2006 Telluride Workshop on Neuromorphic Cognition Engineering
Tutorial at Indian National Academy of Engineering Workshop on Image and Speech Processing,
Tutorial on Speech Processing, 2005 Telluride Workshop on Neuromorphic Cognition Engineering
Invited lecturer at International Graduate School for Neurosensory Science, Bad Zwischenahn,
Germany 2004
Tutorial on Speech Processing, 2004 Telluride Workshop on Neuromorphic Engineering
Tutorial at the International Conference on Intelligent Sensors, Melbourne, Australia, 2004
Invited talk, Carnegie-Mellon University, 2003
Invited lecturer at the Language and Speech Engineering Postgraduate Course 2002, EPFL
Invited talk, Center for Language and Speech Processing Distinguished Lectures Series, 2002
Invited lecturer at the Language and Speech Engineering Postgraduate Course 2001, EPFL
Invited lecturer at the LOT Summer School 2001, Nijmegen, Netherlands
Invited lecturer at the International Graduate School for Neurosensory Science, Bad Zwischenahn,
Germany 2001
Invited lecturer at the NATO Workshop on Temporal Dynamics of Speech 2001, Il Ciocco, Italy
Invited lecturer at the Speech Euromasters Summer School 2000 Chios, Greece
Invited Keynote Talk, Advances in Neural Information Processing (NIPS2000), Denver, Colorado,
December 2000
Tutorial at the Fifth International Symposium on Signal Processing and its Applications, Brisbane,
Australia, 1999
Invited lecturer at the ESCA Workshop on Auditory Basis of Speech Perception 1996, Keele, U.K.
Invited lecturer at the NATO Workshop on Robust Speech Recognition 1997, Il Ciocco, Italy
Invited lecturer at the intensive course on speech processing 1998, KTH Stockholm, Sweden
Invited lecturer at the Summer School "Speech Recognition and Neural Networks" 1998, Vietri sul Mare, Italy
Workshop and panel organization, invited panelist
General Chair, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding,
Olomouc, Czech Republic, 2013
Organizer, Johns Hopkins Summer Workshop on Speaker Recognition, June-July 2013
Organizing and leading ONR sponsored Workshop on Temporal Dynamics in Speech and
Hearing, Antwerp, Belgium 2007
Organizing and leading a group at the 2007 JHU 6 week Summer Workshop on Human
Language Technology
Organizing and leading a panel on Handling Unexpected Acoustic Data in Machine
Recognition of Speech, IEEE Workshop on Automatic Speech Recognition and
Understanding, Kyoto, Japan 2007
Organizing and leading a panel on Human-like Speech Processing, IEEE Workshop on
Automatic Speech Recognition and Understanding, San Juan, Puerto Rico 2005
Leading a panel on Industrial Applications of Speech Technology, Eurospeech 2001,
Aalborg, Denmark
Invited participant on the panel on Future of Speech Research, International Conference on
Spoken Language Processing, Sydney, Australia 2000
Invited senior scientist (seven times), 6 week JHU Summer Workshop on Human
Language Technology
Invited Faculty (six times), Telluride Summer Workshop on Neuromorphic Engineering
Articles in Refereed Journals
Hynek Hermansky, Jordan R. Cohen, Richard M. Stern, Perceptual Properties
of Current Speech Recognition Technology, Invited Paper, Proceedings of IEEE,
Vol. 101, No. 9, pp. 1968-1985, September 2013
Hynek Hermansky, Dealing with Unknown Unknowns: Multi-stream Recognition of Speech, Invited
Paper, Proceedings of IEEE, Vol. 101, No. 5, pp. 1076-1088, May 2013
Sriram Ganapathy and Hynek Hermansky, "Temporal Resolution Analysis in Frequency Domain
Linear Prediction", Express Letters section of the Journal of the Acoustical Society of America,
September 2012
S. Garimella, S. Mallidi and H. Hermansky, "Regularized Auto-Associative Neural Networks for
Speaker Verification", IEEE Signal Processing Letters, Dec 2012, pp. 841-844
D. Weinshall, A. Zweig, H. Hermansky, S. Kombrink, F. W. Ohl, J. Anemueller, J. H. Bach, L. Van
Gool, F. Nater, T. Pajdla, M. Havlena and M. Pavel, Beyond Novelty Detection: Incongruent Events,
when General and Specific Classifiers Disagree, IEEE Transactions On Pattern Analysis And
Machine Intelligence, Vol. 34, No 10, pp. 1886-1901, October 2012
H. Hermansky, Speech recognition from spectral dynamics, Invited Paper, SADHANA, Indian
Academy of Sciences, Vol. 36, Part 5, October 2011, pp. 729–744
N. Mesgarani, S. Thomas and H. Hermansky, Towards optimizing stream fusion in multistream
recognition of speech, J. Acoust. Soc. Am. Volume 130, Issue 1, pp. EL14-EL18, 2011
G. Sivaram and H. Hermansky, Sparse Multilayer Perceptron for Phoneme Recognition, IEEE
Transactions on Audio, Speech, And Language Processing, vol.20, no.1, pp.23-29, Jan. 2012
Sriram Ganapathy, Samuel Thomas, and Hynek Hermansky,"Temporal envelope compensation for
robust phoneme recognition using modulation spectrum", the Journal of the Acoustical Society of
America, 2011
G. Sivaram, S. Nemala, N. Mesgarani, H. Hermansky, “Data-driven and feedback based spectro-temporal features for speech recognition”, Signal Processing Letters, IEEE, volume 17, issue 11,
J. Pinto, G. Sivaram, M. Magimai.-Doss, H. Hermansky, and H. Bourlard, Analyzing MLP Based
Hierarchical Phoneme Posterior Probability Estimator, IEEE Transactions on Audio, Speech, and
Language Processing, 2010
S. Ganapathy, P. Motlicek and H. Hermansky, Autoregressive Models of Amplitude Modulations in
Audio Compression in: IEEE Transactions on Audio, Speech, And Language Processing, 2010
P. Motlicek, S. Ganapathy, H. Hermansky and H. Garudadri, Wide-Band Audio Coding based on
Frequency Domain Linear Prediction, in: EURASIP Journal on Audio Speech And Music
Processing, Special Issue on, 2009
S. Ganapathy, S. Thomas, and H. Hermansky, "Modulation frequency features for phoneme
recognition in noisy speech", the Journal of the Acoustical Society of America, Vol 125, No 1, pp
EL8-EL12, 2009
W. Verhelst, J. Herre, G. Kubin, H. Hermansky, S. H. Jensen, Editorial, Special Issue in
Anthropomorphic Processing of Audio and Speech, EURASIP Journal on Applied Signal Processing
2005:9, 1289–1291, 2005 Hindawi Publishing Corporation
S. Thomas, S. Ganapathy and H. Hermansky, "Recognition Of Reverberant Speech Using Frequency
Domain Linear Prediction", IEEE Signal Processing Letters, 2008.
N. Morgan, Q. Zhu, A. Stolcke, K. Sonmez, S. Sivadas, T. Shinozaki, M. Ostendorf, P. Jain, H.
Hermansky, D. Gelbart, D. Ellis, G. Doddington, B. Chen, O. Cetin, H. Bourlard and M. Athineos,
“Pushing the Envelope – Aside : Beyond the Spectral Envelope as the Fundamental Representation
for Speech Recognition,” Invited Paper in the IEEE Signal Processing Magazine, 2005.
H. Hermansky and N. Morgan, “Show What You Know: Musings on the Reporting of Negative
Results in Speech Recognition Research,” Invited Editorial Note in Journal of Negative Results in
Speech and Audio Sciences, 2004.
H. H. Yang, S. Sharma, S. van Vuuren, H. Hermansky, “Relevance of Time-Frequency Features for
Phonetic and Speaker/Channel Classification,” Speech Communication, August 2000.
N. Malayath, H. Hermansky, S. Kajarekar and B. Yegnanarayana, “Data-Driven Temporal Filters
and Alternatives to GMM in Speaker Verification,” in Digital Signal Processing, Vol. 10, pp 55-74,
N. Kanedera, T. Arai, H. Hermansky and M. Pavel, “On the Relative Importance of Various
Components of the Modulation Spectrum of Speech,” Speech Communication 28 (1), pp. 43-56,
May 1999.
B. Yegnanarayana, C. Avendano, H. Hermansky and P. S. Murthy, “Speech Enhancement Using
Linear Prediction Residual,” Speech Communication 28 (1), pp. 25-42, May 1999.
T. Arai, M. Pavel, H. Hermansky and C. Avendano, Syllable Intelligibility for Temporally-Filtered
LPC Cepstral Trajectories, Journal of the Acoustical Society of America, (105), 5, pp. 2783-2791,
May 1999.
H. Hermansky, “Should recognizers have ears?” in Speech Communication, vol. 25, num. 3-27,
H. Bourlard, H. Hermansky and N. Morgan, “Towards Increasing Speech Recognition Error Rates,”
invited paper, Speech Communication, Vol 18 (4), May 1996.
H. Hermansky, “Robust Speech Recognition,” in Cole, Hirschman et al., “The Challenge of Spoken
Language Systems, Research Directions for the Nineties,” IEEE Transactions on Speech and Audio
Processing, Vol. 3, No. 1, pp. 1-21, January 1995.
H. Hermansky and N. Morgan, RASTA Processing of Speech, IEEE Transactions on Speech and
Audio Processing, Vol. 2, No. 4, pp. 587-589, October 1994.
J. C. Junqua, H. Wakita and H. Hermansky, “Optimizing Perceptually Based ASR Front End,” IEEE
Transactions on Speech and Audio Processing, Vol. 1, No. 1, pp. 39-49, January 1993.
H. Hermansky, “Perceptual Linear Predictive (PLP) Analysis of Speech,” Journal of the Acoustical
Society of America, Vol. 87, 4, April 1990, pp. 1738-1752.
Book Chapters
Anemüller, J., Caputo, B., Hermansky, H., Ohl, F., Pajdla, T., Pavel, M., Van Gool, L., Vogels, R.,
Wabnik, S., Weinshall, D. "DIRAC: Detection and identification of rare audio-visual events." Studies
in Computational Intelligence 384: 3-35, Anemüller, Weinshall, Van Gool, Eds., Springer 2012
F. Valente and H. Hermansky, “Data-driven extraction of spectral-dynamics based
posteriors”, Handbook of Natural Language Processing and Machine Translation: DARPA
Global Autonomous Language Exploitation, Olive, Christianson, McCary, Eds. Springer,
March 2011.
H. Hermansky, “Data-driven extraction of temporal features from speech,” in Dynamics of
Speech Production and Perception 5, P. Divenyi et al. (Eds.), IOS Press, 2006
H. Hermansky, “Speech and its processing,” in Language and Speech Engineering, M. Rajman
(Ed.), EPFL Press, 2006.
C. Avendano, L. Deng, H. Hermansky and B. Gold, “Analysis and Representation of Speech,” in
Speech Processing in the Auditory System, Greenberg and Ainsworth (Eds.), Springer 2004.
N. Morgan, H. Bourlard and H. Hermansky, “Automatic Speech Recognition: an Auditory
Perspective,” in Speech Processing in the Auditory System, Greenberg and Ainsworth (Eds.),
Springer 2004.
H. Hermansky and N. Morgan, “Automatic Speech Recognition,” in Encyclopedia of Cognitive
Science, L. Nadel (Ed.), Nature Publishing Group, Macmillan Publishers, 2002.
H. Hermansky, “Modulation Spectrum in Speech Processing,” in Signal Analysis and Prediction,
Prochazka, Uhlir, Rayner and Kingsbury (Eds.), Birkhauser Boston, 1998.
Peer Reviewed Papers in Conference Proceedings
1. Hynek Hermansky, Speech representation based on spectral dynamics, Proceedings of
International Workshop on Models and Analysis of Vocal Emissions for Biomedical
Applications, Firenze, Italy, December 2013
2. Hynek Hermansky: Long, Deep and Wide Artificial Neural Nets for Dealing with Unexpected
Noise in Machine Recognition of Speech, Proc. Text, Speech and Dialogue 2013, Springer
3. Jeff Ma, Bing Zhang, Spyros Matsoukas, Sri Harish Mallidi, Feipeng Li,
Hynek Hermansky, Improvements in Language Identification on the RATS Noisy Speech
Corpus, Proc. INTERSPEECH 2013
4. Sri Harish Mallidi, Sriram Ganapathy, Hynek Hermansky, Robust Speaker Recognition Using
Spectro-Temporal Autoregressive Models, Proc. INTERSPEECH 2013
5. Thomas Schatz, Vijayaditya Peddinti, Francis Bach, Aren Jansen, Hynek Hermansky,
Emmanuel Dupoux, Evaluating speech features with the Minimal-Pair ABX task: Analysis of
the classical MFC/PLP pipeline, Proc. INTERSPEECH 2013
6. Ehsan Variani, Feipeng Li, Hynek Hermansky, Multi-stream recognition of noisy speech with
performance monitoring, Proc. INTERSPEECH 2013
7. Keith Kintzley, Aren Jansen, Hynek Hermansky, Text-to-Speech Inspired Duration Modeling
for Improved Whole-Word Acoustic Models, Proc. INTERSPEECH 2013
8. Aren Jansen, Samuel Thomas, Hynek Hermansky, Weak Top-Down Constraints For
Unsupervised Acoustic Model Training, Proc. ICASSP 2013, Vancouver, Canada
9. Samuel Thomas, Michael Seltzer, Kenneth Church, Hynek Hermansky, Deep Neural Network
Features and Semi-Supervised Training for Low Resource Speech Recognition, Proc. ICASSP
2013, Vancouver, Canada
10. Feipeng Li, Hynek Hermansky, Effect Of Filter Bandwidth and Spectral Sampling Rate of
Analysis Filterbank on Automatic Phoneme Recognition, Proc. ICASSP 2013, Vancouver,
Canada
11. Pascal Clark, Harish Mallidi, Aren Jansen, Hynek Hermansky, Frequency Offset Correction in
Speech Without Detecting Pitch, Proc. ICASSP 2013, Vancouver, Canada
12. Aren Jansen, Emmanuel Dupoux, Sharon Goldwater, Mark Johnson, Sanjeev Khudanpur,
Kenneth Church, Naomi Feldman, Hynek Hermansky, Florian Metze, Richard Rose, Michael
Seltzer, Pascal Clark, Ian McGraw, Balakrishnan Varadarajan, Erin Bennett, Benjamin
Borschinger, Justin Chiu, Ewan Dunbar, Abdellah Fourtassi, David Harwath, Chia-Ying Lee,
Keith Levin, Atta Norouzian, Vijayaditya Peddinti, Rachael Richardson, Thomas Schatz,
Samuel Thomas, A Summary Of The 2012 JHU CLSP Workshop on Zero Resource Speech
Technologies and Models of Early Language Acquisition, Proc. ICASSP 2013, Vancouver,
Canada
13. Hynek Hermansky, Ehsan Variani, Vijayaditya Peddinti, Mean Temporal Distance: Predicting
ASR Error from Temporal Properties of Speech Signal, Proc. ICASSP 2013, Vancouver,
Canada
14. Vijayaditya Peddinti, Hynek Hermansky, Filter-Bank Optimization for Frequency Domain
Linear Prediction, Proc. ICASSP 2013, Vancouver, Canada
15. Oldrich Plchot, Spyros Matsoukas, Pavel Matejka, Najim Dehak, Jeff Ma, Sandro Cumani,
Ondrej Glembek, Hynek Hermansky, Sri Harish Mallidi, Nima Mesgarani, Richard Schwartz,
Mehdi Soufifar, Zheng-Hua Tan, Samuel Thomas, Bing Zhang, Xinhui Zhou, Developing a
Speaker Identification System for the DARPA RATS Project, Proc. ICASSP 2013, Vancouver,
Canada
16. Ehsan Variani and Hynek Hermansky, “Estimating Classifier Performance in Unknown
Noise”, Proc. Interspeech 2012, Portland, Oregon
17. Samuel Thomas, Sriram Ganapathy, Aren Jansen and Hynek Hermansky, Data-driven
Posterior Features for Low Resource Speech Recognition Applications, Proc. Interspeech
2012, Portland, Oregon
18. Feipeng Li, Sri Harish Mallidi, Hynek Hermansky, Phone recognition in critical bands using
sub-band temporal modulations, Proc. Interspeech 2012, Portland, Oregon
19. Keith Kintzley, Aren Jansen, Hynek Hermansky, MAP Estimation of Whole-Word Acoustic
Models with Dictionary Priors, Proc. Interspeech 2012, Portland, Oregon
20. Keith Kintzley, Aren Jansen, Kenneth Church, Hynek Hermansky, “Inverting the Point
Process Model for Fast Phonetic Keyword Search”, Proc. Interspeech 2012, Portland, Oregon
21. Samuel Thomas, Sri Harish Mallidi, Thomas Janu, Hynek Hermansky, Nima Mesgarani,
Xinhui Zhou, Shihab Shamma,Tim Ng, Bing Zhang, Long Nguyen and Spyros Matsoukas,
Acoustic and Data-driven Features for Robust Speech Activity Detection, Proc. Interspeech
2012, Portland, Oregon
22. Sriram Ganapathy, Hynek Hermansky, “Analysis of Temporal Resolution in Frequency
Domain Linear Prediction”, Proc. Interspeech 2012, Portland, Oregon
23. Samuel Thomas, Sriram Ganapathy, Hynek Hermansky, “Multilingual MLP Features For
Low-Resource LVCSR Systems”, Proc. ICASSP 2012, Kyoto, Japan
24. Daniel Garcia-Romero, Xinhui Zhou, Dmitry Zotkin, Balaji Srinivasan, Yuancheng Luo,
Sriram Ganapathy, Samuel Thomas, Sridhar Krishna Nemala, Sivaram Garimella, Majid
Mirbagheri, Sri Harish Mallidi, Thomas Janu, Padmanabhan Rajan, Nima Mesgarani, Mounya
Elhilali, Hynek Hermansky, Shihab Shamma, Ramani Duraiswami, “The UMD-JHU 2011
Speaker Recognition System”, Proc. ICASSP 2012, Kyoto, Japan
25. Sriram Ganapathy, , and Hynek Hermansky, Multi-layer perceptron based speech activity
detection for speaker verification, 2011 IEEE Workshop on Applications of Signal Processing
to Audio and Acoustics, pp. 321-324
26. Hynek Hermansky, Dealing with unexpected words in automatic recognition of speech, Proc.
14th International Conference on Text, Speech and Dialogue, 2011, pp.1-12
27. Keith Kintzley, Aren Jansen, Hynek Hermansky, ‘Event Selection from Phone Posteriorgrams
Using Matched Filters’, Proc. Interspeech 2011, Florence, Italy
28. Nima Mesgarani, Samuel Thomas, Hynek Hermansky, ‘Adaptive Stream Fusion in
Multistream Recognition of Speech’, Proc. Interspeech 2011, Florence, Italy
29. Sri Harish Mallidi, Sriram Ganapathy, Hynek Hermansky, ‘Modulation spectrum analysis for
recognition of reverberant speech’, Proc. Interspeech 2011, Florence, Italy
30. Sivaram Garimella, Samuel Thomas, Hynek Hermansky, ‘Mixture of Auto-Associative Neural
Networks for Speaker Verification’, Proc. Interspeech 2011, Florence, Italy
31. Michael Carlin, Samuel Thomas, Aren Jansen, Hynek Hermansky, ‘Rapid Evaluation of
Speech Representations for Spoken Term Discovery’, Proc. Interspeech 2011, Florence, Italy
32. Samuel Thomas, Patrick Nguyen, Geoffrey Zweig, Hynek Hermansky, 'MLP Based Phoneme
Detectors For Automatic Speech Recognition', Proceedings of IEEE Int. Conf. on Acoustics,
Speech, and Signal Processing (ICASSP) 2011, Prague, Czech Republic
33. Geoffrey Zweig, Patrick Nguyen, Dirk Van Compernolle, Kris Demuynck, Les Atlas, Pascal
Clark, Greg Sell, Meihong Wang, Fei Sha, Hynek Hermansky, Damianos Karakos, Aren
Jansen, Samuel Thomas, Sivaram G.S.V.S., Sam Bowman, Justine Kao, 'Speech Recognition
With Segmental Conditional Random Fields: A Summary Of The JHU CLSP 2010 Summer
Workshop', Proceedings of IEEE Int. Conf. on Acoustics, Speech, and Signal Processing
(ICASSP) 2011, Prague, Czech Republic
34. Sivaram Garimella and Hynek Hermansky, Multilayer Perceptron with Sparse Hidden Outputs
for Phoneme Recognition, Proceedings of IEEE Int. Conf. on Acoustics, Speech, and Signal
Processing (ICASSP) 2011, Prague, Czech Republic
35. Nima Mesgarani, Samuel Thomas, Hynek Hermansky, A Multistream Multiresolution
Framework for Phoneme Recognition, Proc. INTERSPEECH 2010, Tokyo, Japan, pp. 318 –
36. Samuel Thomas, Sriram Ganapathy, Hynek Hermansky, Cross-Lingual and Multi-Stream
Posterior Features for Low Resource LVCSR Systems, Proc. INTERSPEECH 2010, Tokyo,
Japan, pp. 877-880
37. Aren Jansen, Kenneth Church, Hynek Hermansky, Towards Spoken Term Discovery at Scale
with Zero Resources, Proc. INTERSPEECH 2010, Tokyo, pp. 1676 – 1679
38. G.S.V.S. Sivaram, Sriram Ganapathy, Hynek Hermansky, Sparse Auto-Associative Neural
Networks: Theory and Application to Speech Recognition, Proc. INTERSPEECH 2010,
Tokyo, Japan, pp. 2270 – 2273
39. Samuel Thomas, Kailash Patil, Sriram Ganapathy, Nima Mesgarani, Hynek Hermansky, A
Phoneme Recognition Framework Based on Auditory Spectro-Temporal Receptive Fields,
Proc. INTERSPEECH 2010, Tokyo, Japan, pp. 2458 – 2461
40. Hynek Hermansky, History Of Modulation Spectrum In ASR, Invited Paper, Proceedings of
IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP) 2010, Dallas, Texas
41. Sivaram Garimella, Sridhar Krishna Nemala, Mounya Elhilali, Trac D. Tran, Hynek
Hermansky, Sparse Coding For Speech Recognition, Proceedings of IEEE Int. Conf. on
Acoustics, Speech, and Signal Processing (ICASSP) 2010, Dallas, Texas
42. Sriram Ganapathy, Samuel Thomas, Hynek Hermansky, Robust Spectro-Temporal Features
Based On Autoregressive Models Of Hilbert Envelopes, Proceedings of IEEE Int. Conf. on
Acoustics, Speech, and Signal Processing (ICASSP) 2010, Dallas, Texas, pp. 4286 - 4289
43. Sriram Ganapathy, Samuel Thomas, Hynek Hermansky, Comparison Of Modulation Features
For Phoneme Recognition, Proceedings of IEEE Int. Conf. on Acoustics, Speech, and Signal
Processing (ICASSP) 2010, Dallas, Texas
44. Shih-Chii Liu, Nima Mesgarani, John Harris, Hynek Hermansky, The Use of Spike-Based
Representations for Hardware Audition System, IEEE International Symposium on Circuits
and Systems 2010, Paris, France
45. Tobi Delbruck, Thomas Koch, Raphael Berner, Hynek Hermansky, Fully Integrated 500uW
Speech Detection Wake-Up Circuit, IEEE International Symposium on Circuits and Systems
2010, Paris, France
46. S. Kombrink, M. Hanneman, L. Burget, and H. Hermansky, “Recovery of rare words in
lecture speech,” in Proceedings Text, Speech and Dialogue 2010, vol. 2010, no. 9, Springer
Verlag, 2010, pp. 330-337.
47. Petr Motlicek, Sriram Ganapathy and Hynek Hermansky, Arithmetic Coding of Sub-Band
Residuals in FDLP Speech/Audio Codec, in: 10th Annual Conference of the International
Speech Communication Association, ISCA, Brighton, England, pages 2591-2594, ISCA 2009,
48. Sriram Ganapathy, Petr Motlicek and Hynek Hermansky, Error Resilient Speech Coding
Using Sub-band Hilbert Envelopes, in: 12th International Conference on Text, Speech and
Dialogue, TSD 2009, Pilsen, Czech Republic, pages 355-362, Springer - Verlag, Berlin
Heidelberg, 2009
49. Sriram Ganapathy, Petr Motlicek and Hynek Hermansky, MDCT for Encoding Residual
Signals in Frequency Domain Linear Prediction, in: Audio Engineering Society (AES), 127th
Convention, Audio Engineering Society (AES), Audio Engineering Society, 60 East 42nd
Street, New York, New York 10165-2520, USA, 2009
50. Joel Praveen Pinto, G. S. V. S. Sivaram, Hynek Hermansky and Mathew Magimai-Doss,
Volterra Series for Analyzing MLP based Phoneme Posterior Probability Estimator, in:
Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing
(ICASSP), 2009
51. Sriram Ganapathy, Samuel Thomas, Hynek Hermansky: Applications of Signal Analysis
Using Autoregressive Models for Amplitude Modulation, Proceedings Workshop on
Applications of Signal Processing to Audio and Acoustics (WASPAA 2009), 2009, New Paltz,
New York, U.S.A.
52. Nima Mesgarani, G.S.V.S. Sivaram, , Hynek Hermansky, "Discriminant Spectrotemporal
Features for Phoneme Recognition", Proc. Interspeech 2009, Brighton, U.K.
53. S. Ganapathy, S. Thomas and H. Hermansky, "Static and Dynamic Modulation Spectrum for
Speech Recognition", Proc. Interspeech 2009, Brighton, U.K.
54. S. Thomas, S. Ganapathy and H. Hermansky, "Tandem Representations of Spectral Envelope
and Modulation Frequency Features for ASR", Proc. Interspeech 2009, Brighton, U.K.
55. P. Motlicek, S. Ganapathy and H. Hermansky, "Arithmetic Coding of Sub-band Residuals in
FDLP Speech/Audio Codec", Proc. Interspeech 2009, Brighton, U.K.
56. S. Kombrink, L. Burget, P. Matejka, M. Karafiat and H. Hermansky, “Posterior-based Out-of-Vocabulary Word Detection in Telephone Speech”, Proc. Interspeech 2009, Brighton, U.K.
57. Sriram Ganapathy, Petr Motlicek, Hynek Hermansky and Harinath Garudadri, Autoregressive
Modelling of Hilbert Envelopes for Wide-band Audio Coding, in: AES 124th Convention,
Audio Engineering Society, 2008
58. Joel Pinto, Sivaram G.S.V.S, Hynek Hermansky, Mathew Magimai.-Doss, 'Volterra Series For
Analyzing MLP Based Phoneme Posterior Estimator', Proceedings of ICASSP 2009
59. Samuel Thomas, Sriram Ganapathy, Hynek Hermansky, 'Phoneme Recognition Using Spectral
Envelope And Modulation Frequency Features', Proceedings of ICASSP 2009
60. Misha Pavel, Malcolm Slaney, Hynek Hermansky, 'Reconciliation of human and machine
speech recognition performance', in Proceedings of ICASSP 2009
61. Daphna Weinshall, Hynek Hermansky, Alon Zweig, Jie Luo, Holly Jimison, Frank Ohl, Misha
Pavel, Beyond Novelty Detection: Incongruent Events, when General and Specific Classifiers
Disagree, Proceedings Neural Information Processing Conference, Advances in Neural
Information Processing 21, MIT Press
62. Tamara Tosic, Mathew Magimai-Doss, Hynek Hermansky, Using comparison of parallel
phoneme probability streams for OOV word detection, Proc. Eusipco 2008
63. Joel Pinto and Hynek Hermansky, Combining Evidence from a Generative and a
Discriminative Model in Phoneme Recognition, Proceedings of Interspeech, 2008, Brisbane,
Australia
64. Fabio Valente and Hynek Hermansky, On the Combination of Auditory and Modulation
Frequency Channels for ASR applications, Proceedings of Interspeech, 2008, Brisbane,
Australia
65. Sriram Ganapathy, Petr Motlicek, Hynek Hermansky, Harinath Garudadri, Spectral Noise
Shaping: Improvements in Speech/Audio Codec Based on Linear Prediction in Spectral
Domain, Proceedings of Interspeech, 2008, Brisbane, Australia
66. Samuel Thomas, Sriram Ganapathy and Hynek Hermansky, "Spectro-Temporal Features for
Automatic Speech Recognition using Linear Prediction in Spectral Domain", Proceedings of
67. Pinto Joel, Szöke Igor, Prasanna S.R.M., Hermanský Hynek: Fast Approximate Spoken Term
Detection from Sequence of Phonemes, In: The 31st Annual International ACM SIGIR
Conference 20-24 July 2008, Singapore, Singapore, SG, ACM, 2008, p. 28-33, ISBN 978-90-365-2697-5
68. Samuel Thomas, Sriram Ganapathy and Hynek Hermansky, "Hilbert Envelope Based Features
for Far-Field Speech Recognition", Proceedings of MLMI, 2008
69. Sriram Ganapathy, Samuel Thomas and Hynek Hermansky, "Front-end for Far-Field Speech
Recognition based on Frequency Domain Linear Prediction", Proceedings of Interspeech,
2008, Brisbane, Australia
70. Samuel Thomas, Sriram Ganapathy and Hynek Hermansky, "Hilbert Envelope Based Spectro-Temporal Features for Phoneme Recognition in Telephone Speech", Proceedings of
Interspeech, 2008, Brisbane, Australia
71. G.S.V.S. Sivaram and Hynek Hermansky, Emulating Temporal Receptive Fields of Auditory
Mid-brain Neurons for Automatic Speech Recognition, 16th European Signal Processing
Conference, Lausanne, Switzerland, August 2008
72. Joel Pinto, G.S.V.S. Sivaram, and Hynek Hermansky, Reverse Correlation for Analyzing MLP
Posterior Features in ASR, 11th International Conference on Text, Speech and Dialogue,
Brno, Czech Republic (TSD-08).
73. Garimella S.V.S. Sivaram and Hynek Hermansky: Emulating Temporal Receptive Fields of
Higher Level Auditory Neurons for ASR, 11th International Conference on Text, Speech and
Dialogue, TSD 2008, Brno, Czech Republic, September 8–12 2008,
74. G.S.V.S. Sivaram and Hynek Hermansky Introducing Temporal Asymmetries in Feature
Extraction for Automatic Speech Recognition, Interspeech 2008, Brisbane, Australia.
75. Sree Hari Krishnan Parthasarathi, Petr Motlicek, and Hynek Hermansky, “Exploiting
contextual information for speech/non-speech detection ”, Text, Speech and Dialogue, 2008.
76. Sriram Ganapathy, Petr Motlicek, Hynek Hermansky, Harinath Garudadri, Autoregressive
Modeling of Hilbert Envelopes for Wide-Band Audio Coding, 124th Convention of the
Audio Engineering Society, Amsterdam, 2008
77. Lukas Burget, Petr Schwarz, Pavel Matejka, Mirko Hannemann, Ariya Rastrow, Christopher
White, Sanjeev Khudanpur, Hynek Hermansky, Jan Cernocky, Combination Of Strongly And
Weakly Constrained Recognizers For Reliable Detection Of Out-Of-Vocabulary Words
(OOVs), Proceedings of IEEE Int. Conf. on Acoustics, Speech, and Signal Processing
(ICASSP), 2008
78. Christopher White, Geoffrey Zweig, Lukas Burget, Petr Schwarz, Hynek Hermansky,
Confidence Estimation, OOV Detection And Language ID Using Phone-To-Word
Transduction And Phone-Level Alignments, Proceedings of IEEE Int. Conf. on Acoustics,
Speech, and Signal Processing (ICASSP), 2008
79. Sriram Ganapathy, Petr Motlicek, Hynek Hermansky, Harinath Garudadri, 'Temporal Masking
For Bit-Rate Reduction In Audio Codec Based On Frequency Domain Linear Prediction',
Proceedings of IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), 2008
80. Joel Pinto, Yegnanarayana Bayya, Hynek Hermansky, Mathew Magimai-Doss, 'Exploiting
Contextual Information For Improved Phoneme Recognition', Proceedings of IEEE Int. Conf.
on Acoustics, Speech, and Signal Processing (ICASSP), 2008
81. Fabio Valente, Hynek Hermansky, 'Hierarchical and Parallel Processing of Modulation
Spectrum for ASR applications', Proceedings of IEEE Int. Conf. on Acoustics, Speech, and
Signal Processing (ICASSP), 2008
82. F. Valente and H. Hermansky, “Multi-stream Features Combination based on
Dempster-Shafer Rule for LVCSR System”, in Proceedings of the International Conference on Spoken
Language Processing, Antwerp, Belgium 2007.
83. F. Valente and H. Hermansky, Combination of Acoustic Classifiers based on Dempster-Shafer
Theory of evidence, in Proceedings of IEEE Int. Conf. on Acoustics, Speech, and Signal
Processing (ICASSP), 2007.
84. Valente, Fabio, Vepa, Jithendra, Plahl, Christian, Gollan, Christian, Hermansky, Hynek,
Schlüter, Ralf, "Hierarchical neural networks feature extraction for LVCSR system", in
Proceedings of Interspeech-2007, 42-45.
85. P. Motlicek, V. Ullal, H. Hermansky, Wide-Band Perceptual Audio Coding based on
Frequency-Domain Linear Prediction, in Proceedings of IEEE Int. Conf. on Acoustics,
Speech, and Signal Processing (ICASSP), 2007
86. M. Prasanna and Hynek Hermansky, “MRASTA and PLP in Automatic Speech Recognition”,
in Proceedings of the International Conference on Spoken Language Processing, Antwerp,
Belgium 2007.
87. F. Valente and H. Hermansky, “Hierarchical Neural Networks Feature Extraction for LVCSR
System”, in Proceedings of the International Conference on Spoken Language Processing,
Antwerp, Belgium 2007.
88. J. Pinto, A. Lovitt and H. Hermansky, “Exploiting Phoneme Similarities in Hybrid
HMM-ANN Keyword Spotting”, in Proceedings of the International Conference on Spoken
Language Processing, Antwerp, Belgium 2007.
89. H. Ketabdar, M. Hannemann, and H. Hermansky, “Detection of Out-of-Vocabulary Words in
Posterior Based ASR”, in Proceedings of the International Conference on Spoken Language
Processing, Antwerp, Belgium 2007.
90. F. Valente and H. Hermansky, “Combination of Acoustic Classifiers based on
Dempster-Shafer Theory of Evidence,” in Proceedings of the IEEE International Conference on
Acoustics, Speech and Signal Processing, Honolulu, HI, 2007.
91. P. Motlicek, V. Ullal and H. Hermansky, “Wide-Band Perceptual Audio Coding based on
Frequency-Domain Linear Prediction,” in Proceedings of the IEEE International Conference
on Acoustics, Speech and Signal Processing, Honolulu, HI, 2007.
92. F. Valente and H. Hermansky, “Discriminant Linear Processing of Time-Frequency Plane,” in
Proceedings of the International Conference on Spoken Language Processing, Pittsburgh, PA,
93. P. Fousek and H. Hermansky, “Towards ASR Based on Hierarchical Posterior-Based
Keyword Recognition,” in Proceedings of the IEEE International Conference on Acoustics,
Speech and Signal Processing, Toulouse, France, 2006
94. P. Motlicek, H. Hermansky, H. Garudadri and N. Srinivasamurthy, “Speech Coding based on
Spectral Dynamics,” in Proceedings of the International Conference on Text, Speech and
Dialogue, Brno, Czech Republic, 2006.
95. H. Hermansky, P. Fousek and M. Lehtonen, “The Role of Speech in Multimodal
Human-Computer Interaction (Towards Reliable Rejection of Non-Keyword Input),” in Proceedings
of the International Conference on Text, Speech and Dialogue, Carlsbad, Czech Republic,
96. Lehtonen, M., Fousek, P., Hermansky H.: Hierarchical Approach for Spotting Keywords,
Proceedings 2nd Workshop on Multimodal Interaction and Related Machine Learning
Algorithms - MLMI 05, Edinburgh, United Kingdom, July 2005.
97. H. Hermansky and P. Fousek, “Multi-resolution RASTA filtering for TANDEM-based ASR,”
in Proceedings of the European Conference on Speech Communication and Technology,
Lisbon, Portugal, 2005.
98. M. Athineos, H. Hermansky, and D. P. W. Ellis, "LP-TRAP: Linear Predictive Temporal
Patterns," Proc. ICSLP, pp. 1154-1157, Jeju, S. Korea, October 2004
99. M. Athineos, H. Hermansky and D. Ellis (2004) PLP^2: Autoregressive modeling of
auditory-like 2-D spectro-temporal patterns: Proceedings ISCA Tutorial and Research Workshop
(ITRW) on Statistical and Perceptual Audio Processing (SAPA) ICC Jeju, Korea, October 3,
100. P. Fousek, P. Svojanovsky, F. Grezl and H. Hermansky, “New Nonsense Syllables Database
– Analyses and Preliminary ASR Experiments,” in Proceedings of the International
Conference on Spoken Language Processing, Jeju Island, Korea, 2004.
101. S. Sivadas and H. Hermansky, “On use of task-independent data in TANDEM feature
extraction,” in Proceedings of the IEEE International Conference on Acoustics, Speech and
Signal Processing, Montreal, Canada, 2004.
102. S. Ikbal, H. Misra, H. Bourlard and H. Hermansky, “Phase AutoCorrelation (PAC) features in
Entropy based Multi-Stream for Robust Speech Recognition,” in Proceedings of the IEEE
International Conference on Acoustics, Speech and Signal Processing, Montreal, Canada,
103. H. Misra, S. Ikbal, H. Bourlard and H. Hermansky, “Spectral Entropy Based Feature for
Robust ASR,” in Proceedings of the IEEE International Conference on Acoustics, Speech and
Signal Processing, Montreal, Canada, 2004.
104. H. Hermansky and H. Bourlard, “Some Emerging Concepts in Speech Recognition,” Invited
Paper in Proceedings of Lectures by Masters in Speech Processing, Maui, HI, 2004
105. H. Hermansky, “Data-Guided Processing of Speech,” Invited Paper in Proceedings of the
International Workshop on Speech and Computer, Moscow, Russia, 2003.
106. H. Hermansky, “Data Guided Processing of Speech,” Invited Paper in Proceedings of the
Indian Workshop on Spoken Language Processing, Tata Institute of Fundamental Research
(TIFR), Mumbai, India, 2003.
107. H. Hermansky, “TRAP-TANDEM: Data-driven extraction of temporal features from speech,”
in Proceedings of the IEEE Workshop in Automatic Speech Recognition and Understanding,
U.S. Virgin Islands, 2003
108. A. Adami, L. Burget, S. Dupont, H. Garudadri, F. Grezl, H. Hermansky, P. Jain, S. Kajarekar,
N. Morgan and S. Sivadas, “QUALCOMM-ICSI-OGI Features for ASR,” in Proceedings of
the International Conference on Spoken Language Processing, Denver, CO, 2002.
109. S. Sivadas and H. Hermansky, “Hierarchical Tandem Feature Extraction,” in Proceedings of
the IEEE International Conference on Acoustics, Speech and Signal Processing, Tampa, FL,
110. A. Adami, S. Kajarekar and H. Hermansky, “New Speaker-Change Detection Algorithm for
Two-Speaker Segmentation,” in Proceedings of the IEEE International Conference on
Acoustics, Speech and Signal Processing, Tampa, FL, 2002.
111. S. Kajarekar, B. Yegnanarayana and H. Hermansky, “A Study of Two Dimensional Linear
Discriminants for ASR,” in Proceedings of the IEEE International Conference on Acoustics,
Speech and Signal Processing, Salt Lake City, UT, 2001.
112. C. Benitez, L. Burget, B. Chen, S. Dupont, H. Garudadri, H. Hermansky, P. Jain, S. Kajarekar,
N. Morgan and S. Sivadas, “Robust ASR Front-End Using Spectral-Based and Discriminant
Features: Experiments on the AURORA Tasks,” in Proceedings of the European Conference
on Speech Communication and Technology, Aalborg, Denmark, 2001.
113. H. Hermansky, “Human Speech Perception: Some Lessons from Automatic Speech
Recognition,” in Proceedings of the International Conference on Text, Speech and Dialogue,
Zelezna Ruda, Czech Republic, September 2001.
114. L. Burget and H. Hermansky, “Data Driven Design of Filter Bank for Speech Recognition,” in
Proceedings of the International Conference on Text, Speech and Dialogue, Zelezna Ruda,
Czech Republic, September 2001.
115. S. Kajarekar and H. Hermansky, “Analysis of Information in Speech and Its Application in
Speech Recognition,” in Proceedings of the International Conference on Text, Speech and
Dialogue, Brno, Czech Republic, 2000.
116. S. Sharma, D. Ellis, S. Kajarekar, P. Jain and H. Hermansky, “Feature Extraction Using Nonlinear Transformation for Robust Speech Recognition on the AURORA Data-Base,” in
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal
Processing, Istanbul, Turkey, 2000.
117. H. Hermansky, D. Ellis and S. Sharma, “Connectionist Feature Extraction for Conventional
HMM Systems,” in Proceedings of the IEEE International Conference on Acoustics, Speech
and Signal Processing, Istanbul, Turkey, 2000.
118. S. Kajarekar, N. Malayath and H. Hermansky, “ANOVA in Modulation Spectral Domain,” in
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal
Processing, Istanbul, Turkey, 2000.
119. S. Sivadas, P. Jain and H. Hermansky, “Discriminative MLPs in HMM-Based Recognition of
Speech in Cellular Telephony,” in Proceedings of the International Conference on Spoken
Language Processing, Beijing, China, 2000.
120. S. Kajarekar and H. Hermansky, “Optimization of Units for Continuous-Digit Recognition
Task,” in Proceedings of the International Conference on Spoken Language Processing,
Beijing, China, 2000.
121. P. Jain and H. Hermansky, “Temporal Patterns of Critical-Band Spectrum for
Text-to-Speech,” in Proceedings of the International Conference on Spoken Language Processing,
Beijing, China, 2000.
122. H. Hermansky, S. Sharma and P. Jain, “Data-derived Nonlinear Mapping For Feature
Extraction in HMM,” in Proceedings of the IEEE Workshop in Automatic Speech Recognition
and Understanding, Keystone, CO, 1999.
123. S. Kajarekar, N. Malayath and H. Hermansky, “Analysis of Speaker and Channel Variability
in Speech,” in Proceedings of the IEEE Workshop in Automatic Speech Recognition and
Understanding, Keystone, CO, 1999.
124. H. Hermansky and P. Jain, “Down-Sampling Speech Representation in ASR,” in Proceedings
of the European Conference on Speech Communication and Technology, Budapest, Hungary,
125. S. Kajarekar, N. Malayath and H. Hermansky, “Analysis of Source of Variability in Speech,”
in Proceedings of the European Conference on Speech Communication and Technology,
Budapest, Hungary, 1999.
126. H. Yang, S. van Vuuren and H. Hermansky, “Relevancy of Time-Frequency Features for
Phonetic Classification Measured by Mutual Information,” in Proceedings of the IEEE
International Conference on Acoustics, Speech and Signal Processing, Phoenix, AZ, 1999.
127. H. Hermansky and S. Sharma, “Temporal Patterns (TRAPS) in ASR of Noisy Speech,” in
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal
Processing, Phoenix, AZ, 1999.
128. H. Hermansky, “Mel Cepstrum, Deltas, Double-Deltas, … – What Else Is New?” Invited
Paper in Proceedings of the Workshop on Robust Methods for Speech Recognition in Adverse
Conditions, Tampere, Finland, 1999.
129. S. van Vuuren and H. Hermansky, “On the Importance of Components of the Modulation
Spectrum for Speaker Verification,” in Proceedings of the International Conference on Spoken
Language Processing, Sydney, Australia, 1998.
130. S. van Vuuren and H. Hermansky, “Speaker Recognition in a Time-Feature Space,”
Proceedings of the NIST Speaker Recognition Workshop, Gaithersburg, MD, 1998.
131. H. Hermansky, “Data-Driven Analysis of Speech,” Invited Paper, in Proceedings of the
International Conference on Text, Speech and Dialogue, Brno, Czech Republic, 1998.
132. H. Hermansky and N. Malayath, “Speaker Verification Using Speaker-Specific Mappings,” in
Proceedings of the Workshop on Speaker Recognition and its Commercial and Forensic
Applications, Avignon, France, 1998.
133. S. van Vuuren and H. Hermansky, “MESS: A Modular Efficient Speaker Verification
System,” in Proceedings of the Workshop on Speaker Recognition and its Commercial and
Forensic Applications, Avignon, France, 1998.
134. S. Sharma, P. Vermeulen and H. Hermansky, “Combining Information from Multiple
Classifiers for Speaker Verification,” in Proceedings of the Workshop on Speaker Recognition
and its Commercial and Forensic Applications, Avignon, France, 1998.
135. H. Hermansky and S. Sharma, “TRAPS - Classifiers of Temporal Patterns,” in Proceedings of
the International Conference on Spoken Language Processing, Sydney, Australia, 1998.
136. H. Hermansky and N. Malayath, “Spectral Basis Functions from Discriminant Analysis,” in
Proceedings of the International Conference on Spoken Language Processing, Sydney,
Australia, 1998.
137. B. Yegnanarayana, P. S. Murthy, C. Avendano and H. Hermansky, “Enhancement of
Reverberant Speech Using LP Residual,” in Proceedings of the IEEE International Conference
on Acoustics, Speech and Signal Processing, Seattle, WA, 1998.
138. N. Kanedera, H. Hermansky and T. Arai, “Desired Characteristics of Modulation Spectrum for
Robust Automatic Speech Recognition,” in Proceedings of the IEEE International Conference
on Acoustics, Speech and Signal Processing, Seattle, WA, 1998.
139. S. Tibrewala and H. Hermansky, "Multi-Stream Approach in Acoustic Modeling," in Proc.
LVCSR-Hub5 Workshop, May 1997
140. H. Hermansky, “The Modulation Spectrum in Automatic Recognition of Speech,” in
Proceedings of the IEEE Workshop in Automatic Speech Recognition and Understanding,
Santa Barbara, CA, 1997.
141. N. Malayath, H. Hermansky and A. Kain, “Towards Decomposing the Sources of Variability
in Speech,” in Proceedings of the European Conference on Speech Communication and
Technology, Rhodos, Greece, 1997.
142. H. Hermansky, “Speculations on Knowledge Versus Data in ASR,” in Proceedings of the
ESCA-NATO Tutorial and Research Workshop on Robust Speech Recognition for Unknown
Communication Channels, Pont-a-Mousson, France, 1997.
143. H. Hermansky, C. Avendano, S. van Vuuren and S. Tibrewala, “Recent Advances in
Addressing Sources of Non-Linguistic Information,” in Proceedings of ESCA-NATO Tutorial
and Research Workshop on Robust Speech Recognition for Unknown Communication
Channels, Pont-a-Mousson, France, 1997.
144. H. Hermansky, “Auditory Modeling in Automatic Recognition of Speech,” Survey Keynote
Paper in Proceedings of The First European Conference on Signal Analysis and Prediction,
Prague, Czech Republic, 1997.
145. M. Pavel and H. Hermansky, “Information Fusion by Human and Machine,” Proceedings of
the First European Conference on Signal Analysis and Prediction, Prague, Czech Republic,
146. C. Avendano and H. Hermansky, “On the Properties of Temporal Processing for Speech in
Adverse Environments,” Proceedings of The IEEE Workshop on Applications of Signal
Processing to Audio and Acoustics, Mohonk Mountain House, New Paltz, NY, 1997.
147. H. Hermansky, “Should Recognizers Have Ears?” invited tutorial paper in Proceedings of the
ESCA-NATO Tutorial and Research Workshop on Robust Speech Recognition for Unknown
Communication Channels, Pont-a-Mousson, France, 1997
148. S. Tibrewala and H. Hermansky, “Multi-band and Adaptation Approaches to Robust Speech
Recognition,” in Proceedings of the European Conference on Speech Communication and
Technology, Rhodos, Greece, 1997.
149. S. van Vuuren and H. Hermansky, “Data-Driven Design of RASTA-Like Filters,” in
Proceedings of the European Conference on Speech Communication and Technology, Rhodos,
Greece, 1997.
150. B. Yegnanarayana, C. Avendano, H. Hermansky and P. S. Murthy, “Processing Linear
Prediction Residual for Speech Enhancement,” in Proceedings of the European Conference on
Speech Communication and Technology, Rhodos, Greece, 1997.
151. N. Kanedera, T. Arai, H. Hermansky and M. Pavel, “On the Importance of Various
Modulation Frequencies for Speech Recognition,” in Proceedings of the European Conference
on Speech Communication and Technology, Rhodos, Greece, 1997.
152. S. Tibrewala and H. Hermansky, “Sub-band Based Recognition of Noisy Speech,” in
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal
Processing, Munich, Germany, 1997.
153. H. Bourlard, H. Hermansky and N. Morgan, “Copernicus and ASR Challenge: Waiting for
Kepler,” invited keynote paper in Proceedings of Advanced Research Projects Agency
Workshop on Automatic Speech Recognition, Harriman, NY, 1996.
154. H. Hermansky, S. Tibrewala and M. Pavel, “Towards ASR on Partially Corrupted Speech,” in
Proceedings of the International Conference on Spoken Language Processing, Philadelphia,
PA, 1996.
155. C. Avendano, H. Hermansky, M. Vis and A. Bayya, “Adaptive Speech Enhancement System
Based on Frequency-Specific SNR Estimation,” in Proceedings of the Third International
Workshop on Interactive Voice Technology for Telecommunications Applications, Baskin
Ridge, NJ, 1996.
156. C. Avendano, S. van Vuuren and H. Hermansky, “Data Based Filter Design for RASTA-Like
Channel Normalization in ASR,” in Proceedings of the International Conference on Spoken
Language Processing, Philadelphia, PA, 1996.
157. T. Arai, M. Pavel, H. Hermansky and C. Avendano, “Intelligibility of Speech with Filtered
Time Trajectories of Spectral Envelopes,” in Proceedings of the International Conference on
Spoken Language Processing, Philadelphia, PA, 1996.
158. C. Avendano and H. Hermansky, “Study on the Dereverberation of Speech Based on
Temporal Envelope Filtering,” in Proceedings of the International Conference on Spoken
Language Processing, Philadelphia, PA, 1996.
159. H. Hermansky, S. Greenberg and M. Pavel, “A Brief (100-200 ms) History of Time in Feature
Extraction of Speech,” in Proceedings of the XV Annual Speech Research Symposium,
Baltimore, MD, 1995.
160. H. Hermansky and M. Pavel, “Psychophysics of Speech Engineering Systems,” invited paper
in Proceedings of the 13th International Congress on Phonetic Sciences, Stockholm, Sweden,
161. H. Hermansky, “Exploring Temporal Domain for Robustness in Speech Recognition,” invited
paper in Proceedings of the 15th International Congress on Acoustics, Trondheim, Norway,
162. H. Hermansky, C. Avendano and E. Wan, “Noise Reduction and Recovery of Missing
Frequencies in Speech,” in Proceedings of the XV Annual Speech Research Symposium,
Baltimore, MD, 1995.
163. C. Avendano, H. Hermansky and E. Wan, “Beyond Nyquist: Recovery of Wide-band Speech
from Narrow-band Speech,” in Proceedings of the European Conference on Speech
Communication and Technology, Madrid, Spain, 1995.
164. H. Hermansky, E. Wan and C. Avendano, “Speech Enhancement Based on Temporal
Processing,” Proceedings of the IEEE International Conference on Acoustics, Speech and
Signal Processing, Detroit, MI, 1995.
165. N. Morgan, H. Bourlard, S. Greenberg and H. Hermansky, “Stochastic Perceptual Models of
Speech,” in Proceedings of the IEEE International Conference on Acoustics, Speech and
Signal Processing, Detroit, MI, 1995.
166. N. Morgan, H. Bourlard, S. Greenberg and H. Hermansky, “Stochastic Perceptual
Auditory-Event-Based Models for Speech Recognition,” in Proceedings of the International Conference
on Spoken Language Processing, Yokohama, Japan, 1994.
167. J. Koehler, N. Morgan, H. Hermansky, H. G. Hirsch and G. Tong, “Integrating RASTA-PLP
Into Speech Recognition,” in Proceedings of the IEEE International Conference on Acoustics,
Speech and Signal Processing, Adelaide, Australia, 1994.
168. H. Hermansky, E. A. Wan and C. Avendano, “Noise Suppression in Cellular
Communications,” in Proceedings of the Second International Workshop on Interactive Voice
Technology for Telecommunications Applications, Kyoto, Japan, 1994.
169. H. Hermansky, “Speech beyond 20 ms: Speech Processing in Temporal Domain,” invited
keynote lecture in Proceedings of the International Workshop on Human Interface
Technology, Aizu, Japan, 1994.
170. H. Hermansky, N. Morgan and H. G. Hirsch, “Recognition of Speech in Additive and
Convolutional Noise Based On RASTA Spectral Processing,” in Proceedings of the IEEE
International Conference on Acoustics, Speech and Signal Processing, Minneapolis, MN,
171. H. Hermansky, N. Morgan, A. Bayya and P. Kohn, “RASTA-PLP Speech Analysis
Technique,” in Proceedings of the IEEE International Conference on Acoustics, Speech and
Signal Processing, San Francisco, CA, 1992.
172. H. Hermansky, “Recognition of Information-bearing Elements in Speech”, Journal of the
Acoustical Society of America, Vol. 114, No 5, Pt. 2, 2003.
173. H. Hermansky and N. Morgan, “Towards Handling Acoustic Environment in Spoken
Language Processing,” Proc. International Conference on Speech and Language Processing,
Banff, Canada, 1992.
174. H. Hermansky, “Ignorance-Based Verification of Some Concepts in Hearing,” in Proceedings
of the Workshop on Speech Research, Indian Institute of Technology, Madras, India, 1992
175. N. Morgan and H. Hermansky, “RASTA Extensions, Robustness to Additive and Convolutive
Noise,” in Proceedings of the Workshop on Speech Processing in Adverse Environments,
Cannes, France, 1992.
176. H. Hermansky, N. Morgan, A. Bayya and P. Kohn, “The Challenge of Inverse-E,”
Proceedings of the IEEE Asilomar Conference on Signal, Systems and Computers, Asilomar,
CA, 1991.
177. H. Hermansky, N. Morgan, A. Bayya and P. Kohn, “Compensation for the Effect of the
Communication Channel in Auditory-Like Analysis of Speech,” in Proceedings of the
European Conference on Speech Communication and Technology, Genova, Italy, 1991.
178. H. Hermansky, “In Search of Linguistic Information in Speech,” Proceedings of the Arden
House Speech Recognition Workshop, Harriman, NY, 1991.
179. H. Hermansky and A. L. Cox Jr., “Perceptual Linear Predictive (PLP) Analysis-Resynthesis
Technique,” in Proceedings of the European Conference on Speech Communication and
Technology, Genova, Italy, 1991.
180. N. Morgan, H. Hermansky, H. Bourlard, P. Kohn and C. Wooters, “Continuous Speech
Recognition Using PLP Analysis with Multilayer Perceptrons,” in Proceedings of the IEEE
International Conference on Acoustics, Speech and Signal Processing, Toronto, Canada, 1991.
181. A. Bayya and H. Hermansky, “Towards Feature-Based Speech Metric,” in Proceedings of the
IEEE International Conference on Acoustics, Speech and Signal Processing, Albuquerque,
NM, 1990.
182. H. Hermansky and D. Broad, “The Effective Second Formant F2' and the Vocal Tract Front
Cavity,” in Proceedings of the IEEE International Conference on Acoustics, Speech and
Signal Processing, Glasgow, Scotland, 1989.
183. H. Hermansky and J. C. Junqua, “Optimization of Perceptually Based ASR Front End,” in
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal
Processing, New York, NY, 1988.
184. H. Hermansky, “Automatic Speech Recognition and Human Auditory Perception,” in
Proceedings of the European Conference on Speech Communication and Technology,
Edinburgh, Scotland, 1987.
185. H. Hermansky, “Role of Relative Positions of Formant and Harmonic Peaks in Perception of
Vowel-Like Stimuli,” STL Technical Reports, Santa Barbara, CA, 1987.
186. H. Hermansky, “An Efficient Speaker-Independent Automatic Speech Recognition By
Simulation of Some Properties of Human Auditory Processing,” in Proceedings of the IEEE
International Conference on Acoustics, Speech and Signal Processing, Dallas, TX, 1987.
187. H. Javkin, H. Hermansky and H. Wakita, “Interaction Between Formant and Harmonic Peaks
in Vowel Perception,” Proceedings of Eleventh International Congress of Phonetic Sciences,
Tallinn, Estonia, 1987.
188. H. Hermansky, K. Tsuga, S. Makino and H. Wakita, “Perceptually Based Processing in
Automatic Speech Recognition,” in Proceedings of the IEEE International Conference on
Acoustics, Speech and Signal Processing, Tokyo, Japan, 1986.
189. H. Hermansky, B. A. Hanson and H. Wakita, “Perceptually Based Linear Predictive Analysis
of Speech,” in Proceedings of the IEEE International Conference on Acoustics, Speech and
Signal Processing, Tampa, FL, 1985.
190. H. Hermansky, B. A. Hanson, H. Wakita and H. Fujisaki, “Linear Predictive Modeling of
Speech in Modified Spectral Domains,” Proceedings of International Conference on Digital
Processing of Signals in Communications, Institution of Electronic and Radio Engineers,
Loughborough, England, 1985.
191. H. Hermansky, H. Fujisaki and Y. Sato, “Spectral Envelope Sampling and Interpolation in
Linear Predictive Analysis of Speech,” in Proceedings of the IEEE International Conference
on Acoustics, Speech and Signal Processing, San Diego, CA, 1984.
192. H. Hermansky, H. Fujisaki and Y. Sato, “Analysis and Synthesis of Speech Based on Spectral
Transform Linear Predictive Method,” in Proceedings of the IEEE International Conference
on Acoustics, Speech and Signal Processing, Boston, MA, 1983.
193. H. Hermansky, H. Fujisaki and Y. Sato, “Spectral Transform Linear Predictive Analysis for
Analysis-Synthesis of Speech,” Proceedings of 11e Congrès International d'Acoustique,
Paris, France, 1983.
194. H. Hermansky, H. Fujisaki and Y. Sato, “Speech Analysis-Synthesis System Based on STLP
Method,” Proceedings of the Spring Meeting of the Acoustical Society of Japan, Tokyo, 1983.
195. H. Hermansky, H. Fujisaki and Y. Sato, “On Analysis of Real Speech Based on STLP
Method,” Proceedings of the Spring Meeting of the Acoustical Society of Japan, Nagaoka,
Japan, 1982.
196. H. Hermansky, H. Fujisaki and Y. Sato, “Spectral Envelope Sampling in Linear Predictive
Analysis of Speech,” Proceedings of the Spring Meeting of the Acoustical Society of Japan,
Tokyo, Japan, 1982.
197. H. Hermansky and H. Fujisaki, “Spectral Transforms in Linear Prediction,” Proceedings of the
Annual Meeting of the Institute of Electronics and Communication Engineers of Japan,
Tokyo, Japan, 1982.
198. H. Hermansky, “Improved Linear Predictive Analysis of Speech Based on Spectral
Processing,” Dr. Eng. Thesis, University of Tokyo, Tokyo, Japan, 12.
199. H. Hermansky, H. Fujisaki and Y. Sato, “Sampling and Interpolation of Spectral Envelopes in
Linear Predictive Analysis of Speech,” Transactions of the Committee on Speech Research,
Acoustical Society of Japan, 1982, pp. 97-104.
200. H. Hermansky and H. Fujisaki, “Spectral Transforms in Linear Predictive Analysis,”
Proceedings of the Fall Meeting of the Acoustical Society of Japan, Kagoshima, Japan, 1981.
201. H. Hermansky and H. Fujisaki, “LPC Methods in the Short Time Analysis of Voiced Speech,”
Proceedings of the Spring Meeting of the Acoustical Society of Japan, Tokyo, 1981.
202. H. Hermansky and H. Fujisaki, “The Effect of Spectral Transforms in Linear Predictive
Analysis of Speech,” Transactions of the Committee on Speech Research, Acoustical Society
of Japan, 1981, pp. 365-372.
203. H. Hermansky and H. Fujisaki, “Acoustic Characteristics of Czech Vowels, Research on
Human Information Processing,” in Proceedings of the Fall Meeting of the Acoustical Society
of Japan, Shimizu, Japan, 1980.
Processing of excitation in audio coding and decoding (Coding and decoding of
time-varying FM excitation signal in Frequency Domain Linear Predictive coder)
8,027,242 Signal coding and decoding based on spectral dynamics (Principle of using
Frequency Domain Linear Prediction for coding and decoding of speech for speech
7,672,838 Systems and methods for speech recognition using frequency domain linear
prediction polynomials to form temporal and spectral envelopes from frequency domain
representations of signals (A technique for deriving spectro-temporal patterns of speech signal
using parametric autoregressive models)
7,254,538 Nonlinear mapping for feature extraction in automatic speech recognition (A
technique for applying artificial neural nets in feature computation for conventional machine
recognition of speech).
7,089,178 Multistream network feature processing for a distributed speech recognition system
(A technique for computing and handling features for speech recognition in cellular telephony).
6,098,038 Method and system for adaptive speech enhancement using frequency specific
signal-to-noise ratio estimates (A technique for enhancing quality of noisy speech when the
noise is not stationary).
5,878,389 Method and system for generating an estimated clean speech signal from a noisy
speech signal (A data-driven technique for suppressing noise in the signal based on temporal
5,537,647 Noise resistant auditory model for parameterization of speech (A technique for
recognizing speech corrupted by additive noise and by linear distortions).
5,450,522 Auditory model for parameterization of speech (A technique for recognizing speech
corrupted by linear distortion).
5,165,008 Speech synthesis using perceptual linear prediction parameters (A technique for
synthesizing speech of different speakers from a single set of speaker-independent
Four additional patent applications are pending:
20110270616 Spectral noise shaping in audio coding based on spectral dynamics in frequency
20090198500 Temporal masking in audio coding based on spectral dynamics in frequency subbands
20030204394 Distributed voice recognition system utilizing multistream network feature
20030004720 System and method for computing and transmitting parameters in a distributed
voice recognition system
1) Mentoring and Research Supervision
Postdoctoral Fellows
Prof. Takayuki Arai (currently professor at Sophia University, Tokyo, Japan)
Prof. Jan Cernocky (currently associate professor and department head at the Brno University of
Technology, Czech Republic)
Dr. Fabio Valente (IDIAP Switzerland)
Dr. Petr Motlicek (IDIAP Switzerland and Brno University of Technology, Czech Republic)
Dr. Joseph Keshet (Toyota Research Institute, Chicago)
Dr. Nima Mesgarani (University of California San Francisco)
Graduate Students
PhD advisees graduated (First Employer):
Dr. Carlos Avendano (UC Davis)
Dr. Sarel van Vuuren (University of Colorado at Boulder)
Dr. Sangita Sharma (Intel Inc.)
Dr. Naren Malayath (Qualcomm Inc.)
Dr. Sachin Kajarekar (SRI Palo Alto)
Dr. Pratibha Jain (SRI Palo Alto)
Dr. Andre Adami (OHSU Portland)
Dr. Sunil Sivadas (Nokia Finland)
Dr. Petr Fousek (LIMSI France)
Dr. Carolina Parada (Google Research)
Dr. Sriram Ganapathy (IBM Research)
Dr. Sivaram Garimella (Amazon Research)
Dr. Samuel Thomas (IBM Research)
Dr. Keith Kitzley (Naval Academy, Annapolis, MD)
Co-adviser or former adviser
Dr. R. Padmanabhan (IIT Madras, India)
Dr. Lukas Burget (Brno University of Technology, Czech Republic)
Dr. Frantisek Grezl (Brno University of Technology, Czech Republic)
Dr. Petr Schwarz (Brno University of Technology, Czech Republic)
Dr. Pavel Matejka (Brno University of Technology, Czech Republic)
Dr. Shajith Ikbal Mohamed (IBM Bangalore, India)
Dr. Joel Pinto (Swiss Federal Institute of Technology, Lausanne)
Dr. Nicolas Scaringella (Swiss Federal Institute of Technology, Lausanne)
Current PhD advisees:
1. Vijay Pendini
2. Sri Harish Mallidi
3. Nagaraj
4. Phani
5. Ladislav Jerabek
Examiner of PhD. Thesis and Member of the Thesis Committee
Aalto University, Finland (Dr. Hannu Pulakka)
Indian Institute of Technology Madras (Dr. R. Padmanabhan)
The Johns Hopkins University (Dr. Ariya Rastrow)
Nitte Meenakshi Institute of Technology, Hyderabad, India (Dr. Anandthrirtha B. Gudi)
Swiss Federal Institute of Technology Zurich (ETH) (Dr. Tobias Kaufmann)
Swiss Federal Institute of Technology at Lausanne (EPFL) (Dr. Hesam Sagha, Dr. Mihai
Gurban, Dr. Hemant Misra)
Columbia University (Dr. Marios Athineos)
University of California Berkeley (Dr. Michael Shire)
Carnegie-Mellon University (Dr. Changwoo Kim, Dr. Xiang Li, Dr. Juan Huerta)
Tampere University of Technology (Dr. Kari Laurila, Dr. Imre Kiss)
Orsay University, Paris (Dr. Jan Cernocky)
University of Nijmegen (Dr. Febe de Wet)
University of Erlangen (Dr. Georg Stemmer)
Indian Institute of Technology, Madras (Dr. R. Sundar, Dr. C. Chandra Sekhar, Dr. Suryakanth
V Gangashetty, Dr. Shafi Ullah Khan, Dr. R. Padmanabhan)
Oregon Graduate Institute of Science and Technology (Dr. Brian Mak, Dr. Johan Wouters, Dr.
Paul Hossom, Dr. Chaojun Liu, Dr. Xintian Wu, Dr. David Burnett, Dr. Ravi Sharma)
Brno University of Technology (Dr. Petr Motlicek, Dr. Lukas Burget)
Service for Speech and Language Research Community
Distinguished Lecturer, International Speech Communication Association, 2013-2014
Elected Member of the Board, International Speech Communication Association, 2013-ongoing
Organizer, Johns Hopkins Summer Workshop on Speaker Recognition, June-July 2013
General Chair, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding,
Olomouc, Czech Republic, 2013
Organizer of the ONR-sponsored Workshop on Temporal Dynamics in Speech and Hearing,
Antwerp, Belgium, 2007
Technical Chair, IEEE Conference on Acoustics, Speech, and Signal Processing, Seattle,
Washington, 1998
General Co-Chair, Specom-2001, Moscow, Russia
Executive Chair, yearly ISCA-sponsored workshops on Text, Speech and Dialogue, Czech
Republic, ten consecutive times, 2000-2010
General Chair, yearly ISCA-sponsored workshops on Text, Speech and Dialogue, Czech
Republic, three consecutive times, 2011-2013
Member of the Editorial Board, Speech Communication, Elsevier Publishers
Member of the Editorial Board, Phonetica, Karger Publishers
Member of the Editorial Board, Journal of Negative Results in Speech and Audio Sciences
Appointed Technical Expert, European Community
Past Member of the Board, International Speech Communication Association
Past Member, IEEE Technical Committee on Speech Processing
Past Associate Editor, IEEE Transactions on Speech and Acoustics
Administrative academic service
 Director, Center for Language and Speech Processing, the Johns Hopkins University, 2012-ongoing
 Chair of the CLSP Faculty Hiring Committee, 2013
 ECE Department Representative on Ad Hoc Academic Promotion Committee, 2013
 Interim Director, Center for Language and Speech Processing, the Johns Hopkins University,
 Member of the Academic Promotion Committee, Oregon Graduate Institute of Science and
 Director of the Center for Information Technology, Oregon Graduate Institute of Science and
Academic Honors:
2013 International Speech Communication Association Medal for Scientific Achievement, the
highest honor awarded once a year to a single person by the ISCA
Fellow of the Institute of Electrical and Electronics Engineers, for Invention and Development of
Perceptually-Based Speech Processing Methods
Fellow of the International Speech Communication Association, for Pioneering Bio-Inspired
Approaches To Processing Of Speech
Speech Committee Nominee for the IEEE Signal Processing Technical Achievement Award
(2003, 2004, 2012)
External Fellow, International Computer Science Institute, Berkeley, California
Guest Professor, Chinese Academy of Sciences, Beijing, China
Fellow, Hanse Institute for Advanced Studies, Delmenhorst, Germany
Japanese Ministry of Education Fellowship (1978-1983)