Analysis of ABC Submission to NIST SRE 2019 CMN and VAST Challenge


Jahangir Alam, Gilles Boulianne, Lukáš Burget, Mohamed Dahmane, Mireia Diez, Ondrej Glembek, Marc Lalonde, Alicia Lozano-Diez, Pavel Matejka, Petr Mizera, Ladislav Mošner, Cédric Noiseux, Joao Monteiro, Ondrej Novotný, Oldrich Plchot, Johan Rohdin, Anna Silnova, Josef Slavícek, Themos Stafylakis, Pierre-Luc St-Charles, Shuai Wang, Hossein Zeinali


Brno University of Technology, Speech@FIT and IT4I Center of Excellence, Brno, Czechia
Phonexia, Czechia
Speechlab, Shanghai Jiao Tong University, China
Omilia – Conversational Intelligence, Athens, Greece
CRIM, Montreal (Quebec), Canada
Audias-UAM, Universidad Autonoma de Madrid, Madrid, Spain

Publication Date

January 1, 2020

This thesis addresses an important open problem in machine learning: the automatic correction of the involuntary errors that humans make when communicating with chatbots through written messages. First, the problem is formulated as a “noisy-channel model” problem, and the required algorithms are developed using both n-gram and Transformer-based language models. Next, a complete software framework is built to solve the problem with machine learning methods. The framework uses Python and C++ libraries, some of them partially modified, which yields a 20-fold increase in processing speed for this specific problem. Finally, the framework is used to run machine learning experiments on the publicly available “WikEd” and “W&I” corpora. The experiments rely only on a single personal computer and limited cloud computing, and the publicly available corpora are not entirely appropriate for training, tuning, and testing. Even so, they yield some interesting results on the relative efficiency of the available language processing methods. If appropriate corpora become available and sufficient computing resources are used, the framework is expected to provide acceptably efficient methods for automatic text correction for chatbots.
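The noisy-channel formulation mentioned above can be illustrated with a minimal sketch. It is not the thesis framework: the toy corpus, the add-one-smoothed unigram prior, the edit-distance channel model, and the names `CORPUS`, `log_prior`, `log_channel`, `correct`, and `error_rate` are all assumptions chosen for illustration. The sketch picks the candidate c that maximizes log P(c) + log P(observed | c).

```python
import math
from collections import Counter

# Hypothetical toy corpus; a real system would train on a large text collection.
CORPUS = "the cat sat on the mat the dog sat on the log".split()
COUNTS = Counter(CORPUS)
TOTAL = sum(COUNTS.values())
VOCAB = set(COUNTS)


def edit_distance(a, b):
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def log_prior(word):
    """Language model: unigram probability with add-one smoothing (assumed)."""
    return math.log((COUNTS[word] + 1) / (TOTAL + len(VOCAB)))


def log_channel(observed, intended, error_rate=0.05):
    """Channel model (assumed): each edit multiplies the probability by error_rate."""
    return edit_distance(observed, intended) * math.log(error_rate)


def correct(observed):
    """Return the vocabulary word with the highest noisy-channel score."""
    return max(VOCAB, key=lambda c: log_prior(c) + log_channel(observed, c))


print(correct("teh"), correct("dag"))
```

In this sketch the unigram prior stands in for the n-gram or Transformer-based language models named in the abstract; only `log_prior` would need to change to swap in a stronger model.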