[Tutorial 2] Deep Learning for Audio and Text Modeling

Date & Time:

Sunday, October 7, 9:00 – 12:00


2F R03 (Gibraltar)


Konstantin Markov, Jen-Tzung Chien and Sakriani Sakti


This tutorial will introduce deep machine learning, starting with its success stories in a wide range of areas such as computer vision, speech recognition, natural language processing and understanding, document summarization, image caption generation, dialogue modeling, recommendation systems, question answering, and machine translation. After presenting some basic concepts and the most popular deep neural network structures, we will give an in-depth description of deep learning methods for audio signal processing. These include end-to-end speech recognition, DNN-based speech synthesis, and full speech-to-speech translation systems. Another rapidly growing deep learning (DL) application area is music information processing and retrieval. Among the main tasks in this area are music genre and emotion recognition, cover song detection, and music transcription and synchronization, to name a few. The third part of the tutorial will focus on “symbolic” data processing with deep neural methods, including variational neural networks, adversarial neural networks, attention models, memory-based neural networks, and neural Turing machines. We will present how these models are connected and why they work for a variety of applications on symbolic and complex signals or patterns. Finally, we will point out some limitations and failures of deep learning, try to clear up some beliefs and misconceptions about DL/AI, and present a number of directions and outlooks for future studies.

Short Biography:

Konstantin Markov has been a Senior Associate Professor at the University of Aizu, Japan, since 2009. Prior to 2009, he was a Research Scientist at the Advanced Telecommunications Research (ATR), Japan, which he joined after receiving his PhD at Toyohashi University of Technology, Japan, in 1999. His research interests include Machine Learning, Deep Learning, and Signal Processing with applications to Speech, Music, and Natural Language Processing. He has been a program committee member of various international conferences such as Interspeech, ICASSP, EUSIPCO, and SpeCom, as well as a reviewer for several IEEE and Elsevier scientific journals.

Jen-Tzung Chien received his PhD degree in electrical engineering from National Tsing Hua University, Hsinchu, Taiwan, in 1997. He is now with the Department of Electrical and Computer Engineering and the Department of Computer Science at the National Chiao Tung University, Hsinchu, where he is currently a Chair Professor. He was a visiting researcher with the IBM T. J. Watson Research Center, Yorktown Heights, NY, in 2010. His research interests include machine learning, deep learning, natural language processing, and computer vision. Dr. Chien served as an associate editor of the IEEE Signal Processing Letters in 2008-2011; a guest editor of the IEEE Transactions on Audio, Speech and Language Processing in 2012; an organizing committee member of ICASSP 2009, ISCSLP 2016, and MLSP 2018; an area coordinator of Interspeech 2012 and EUSIPCO 2017 and 2018; the program chair of ISCSLP 2018; and the general chair of MLSP 2017. He currently serves as an elected member of the IEEE Machine Learning for Signal Processing Technical Committee. He received the Best Paper Award of the IEEE Automatic Speech Recognition and Understanding Workshop in 2011 and the AAPM Farrington Daniels Paper Award in 2018. He has published extensively, including the book “Bayesian Speech and Language Processing”, Cambridge University Press, 2015. He was a tutorial speaker at APSIPA 2013, ISCSLP 2014, Interspeech 2013 and 2016, and ICASSP 2012, 2015, and 2017.

Sakriani Sakti received her B.E. degree in Informatics (cum laude) from Bandung Institute of Technology, Indonesia, in 1999. In 2000, she received the DAAD-Siemens Program Asia 21st Century Award to study Communication Technology at the University of Ulm, Germany, and received her MSc degree in 2002. During her thesis work, she worked with the Speech Understanding Department, DaimlerChrysler Research Center, Ulm, Germany. From 2003 to 2009, she worked as a researcher at ATR SLC Labs, Japan, and from 2006 to 2011, she worked as an expert researcher at NICT SLC Groups, Japan. While working with ATR-NICT, Japan, she continued her studies (2005-2008) with the Dialog Systems Group, University of Ulm, Germany, and received her PhD degree in 2008. From 2009 to 2011, she served as a visiting professor in the Computer Science Department, University of Indonesia (UI), Indonesia. From 2011 to 2017, she was an assistant professor at the Augmented Human Communication Laboratory, NAIST, Japan. She also served as a visiting scientific researcher at INRIA Paris-Rocquencourt, France, in 2015-2016, under the “JSPS Strategic Young Researcher Overseas Visits Program for Accelerating Brain Circulation”. Currently, she is a research associate professor at NAIST, as well as a research scientist at the RIKEN Center for Advanced Intelligence Project (AIP), Japan. Her research interests include statistical pattern recognition, deep learning, multilingual speech recognition and synthesis, spoken language translation, affective dialog systems, and cognitive communication.