Teaching Computers to Read Music and Introduction to Speech Processing

  Machine learning

Teaching Computers to Read Music - Jan Hajič
Optical Music Recognition (OMR) is a field of research that attempts to read music notation computationally. Its users range from librarians and musicologists to active musicians and composers. OMR is a difficult problem that defies the analogy to its much more mature cousin, OCR, for several reasons. The main one is the featural nature of music notation itself, which is in principle distinct from all systems used to graphically capture natural languages. Furthermore, OMR is expected not only to produce a logical description of the music notation document itself, but also to infer the musical semantics encoded by the notation.

Machine learning, and specifically deep learning techniques developed in computer vision, is a natural fit for dealing with many of these complexities, especially with respect to the input. In this talk, I will present significant recent contributions to OMR, covering both the underlying work that makes it possible to formulate OMR as a machine learning task and the machine learning aspects themselves.

Introduction to Speech Processing for Voice Assistants - Ondřej Plátek
What speech processing tasks do voice assistants need? In the talk, we will review tasks such as incremental ASR, voice-activity detection, endpointing, speaker recognition, diarization, beamforming, language modeling, and inverse text normalization.
The talk should give you an introduction to the field of speech processing by introducing a zoo of tasks. The machine learning tasks will be motivated and introduced by their role in voice assistants.
We will briefly cover the latest architectures for some of the tasks, but we will focus on what the state-of-the-art results mean for our high human expectations about speech understanding.

Language: English

- 17:45 - 18:00 - Your arrival
- 18:00 - 18:40 - Teaching Computers to Read Music
- 18:40 - 18:50 - Short break
- 18:50 - 19:30 - Introduction to Speech Processing for Voice Assistants
- 19:30 - 22:00 - Networking in Bitcoin Coffee