Ramón Fernandez Astudillo
[Google Scholar] [GitHub] [LinkedIn] [Twitter] [BlueSky] [Talks]
I am currently a Principal Research Scientist and manager at IBM Research AI in the T. J. Watson Research Center in Yorktown Heights, New York. Before this I was a Senior Research Scientist at Unbabel and an associate researcher at INESC-ID in Lisbon, and before that a PhD candidate at TU Berlin. I got here starting from signal processing, then machine learning applied to speech, and after that machine learning applied to natural and formal languages. While at it, deep learning happened. I ended up spending very large chunks of my life in Spain, Germany and Portugal, and now in the United States. It is hard to find things that do not interest me, but artificial and crowd intelligence seem particularly motivating at this moment in history.
2024 Principal Research Scientist and Manager
- Talked at Cornell Tech about the BRAIn estimator from the perspective of Optimal Policy Distillation [slides]
2022 Principal Research Scientist
2021
- Jiawei Zhou's Structured-BART shows how BART can be fine-tuned to internalize a parser's state, yielding a new SoTA for AMR parsing and built-in alignments, at EMNLP 2021
- Jiawei Zhou's Action Pointer Transformer (APT) decouples node and token representations, yielding a lighter, more performant, 100% coverage oracle for AMR parsing, at NAACL 2021
- Peng Qian revisits Generative Parsing and Structural Scaffolds, showing they can increase linguistic generalization in Transformers; work with Prof. Roger Levy under the MIT-IBM program, at ACL 2021
- We release APT (v0.4.2) and Structured-BART (v0.5.1) as OSS within transition-amr-parser: https://github.com/IBM/transition-amr-parser
2020
- Manuel Mager's GPT-too paper reaches a new SoTA in AMR-to-text, presented at ACL 2020
- We achieve a new SoTA in AMR parsing by leveraging self-learning (silver parses, AMR-to-text, oracle mining) and cycle consistency.
- We release as open source (Apache 2) our core tool, the transition-amr-parser, implementing the stack-Transformer (v0.3.4): https://github.com/IBM/transition-amr-parser
- Thrilled to start a collaboration with MIT's Professor Roger Levy from the Department of Brain and Cognitive Sciences on Neuro-Symbolic methods (MIT-IBM program).
2019 Research Staff Member
- Murali Karthik's paper was shortlisted for best student paper at Interspeech 2019! A JSALT2018 follow-up applying cycle-consistency to fine-tune end-to-end ASR with unpaired speech and text [preprint]
- Joined IBM Research AI in NY as a Research Staff Member in the Multilingual Natural Language Processing (MNLP) group at the T. J. Watson Research Center in Yorktown Heights.
- After 6 years, Emmanuel and I are leaving our positions as president and secretary of the RoSP-SIG. Shinji Watanabe and Marc Delcroix are taking over.
- After 9 great years as Research Associate at the Spoken Language Systems lab at INESC-ID in Lisbon, I am (reluctantly!) leaving INESC-ID to move on to my next step. I leave behind great friends from whom I have learned a lot, an amazing infrastructure and fun memories.
2018
- Talked at some company in Berlin about the current state of Quality Estimation for Machine Translation (some details about our WMT 2018 work) [slides]
- Visiting scholar at Johns Hopkins University for the 5th Jelinek Summer Workshop (JSALT2018). Investigated cycle-consistency losses to learn from unpaired text and speech on Takaaki Hori and Shinji Watanabe's team.
- After 2.5 years I am leaving Unbabel. What a ride. From 28 employees to 130, funding rounds A and B and a concentrated dose of the start-up life. A privilege to have been in a start-up that hires such amazing talent.
- Talked at the University of Hildeberg about the impact of deep learning on Spoken Language Translation (mostly about my separate knowledge of ASR and MT) [slides]
- Co-created this year's word- and document-level Quality Estimation (QE) WMT tasks [paper]. Corpus builder for word-level QE with an optional adequacy task (designed with NMT in mind) [code]
- Co-organized the Automatic Post-Editing and Quality Estimation workshop at AMTA 2018 [slides]
2016 Senior Research Scientist
- INESC-ID/L2F got the winning system for the ComParE 2016 Native Language Identification task [paper]
- Unbabel got the winning system for the WMT 2016 word-level Quality Estimation task [paper]
2015
- I am joining Unbabel, a Y Combinator start-up focused on human-in-the-loop machine translation, as senior researcher! André Martins and I will start Unbabel-X, its research division.
- Co-organized the special session on Robust Speech Processing using Observation Uncertainty at Interspeech 2015 [website]
- Was a visiting scholar at the Language Technologies Institute at Carnegie Mellon University with the always inspiring Bhiksha Raj. Gave a talk about observation uncertainty in neural networks [slides]
- Silvio Moreira got the winning system for SemEval 2015 task E [paper]. He also got the best late-breaking system for task A with a transfer learning approach [paper, code]
2012
- Emmanuel Vincent and I founded the ISCA Robust Speech Processing Special Interest Group (RoSP-SIG) [website]
- Held a tutorial on Uncertainty Handling for Robust Speech Recognition with Li Deng and Emmanuel Vincent at Interspeech 2012 [slides] (a follow-up from my thesis)
2010 Post-doctoral Researcher
- Best area paper award in Robust Speech Recognition (shortlisted for the SPECOM best paper award) for attaining MMSE estimates with Uncertainty Propagation for ASR [paper, code]
- Joined INESC-ID's Spoken Language Systems lab with an FCT post-doctoral grant. Looking forward to applying my work on uncertainty modeling to neural networks and natural language processing
PhD
Obtained the Dr.-Ing. (PhD) title with distinction in 2010 in the fields of speech processing and robust automatic speech recognition with the thesis
Integration of Short-Time Fourier Domain Speech Enhancement and Observation Uncertainty Techniques for Robust Automatic Speech Recognition [pdf] [code]
My doctor-fathers were Reinhold Orglmeister and Rainer Martin, but I mostly had a doctor-mother, Dorothea Kolossa, who developed the initial idea and helped me kickstart my thesis.
In short, the context for my thesis is the following:
- Speech enhancement (noise reduction, dereverberation, etc.) is done in the STFT domain because of its multiple advantages (source independence, spatio-temporal filtering)
- ASR happens in feature domains such as log-Mel or MFCC, which are non-linear transformations of the STFT. Here speech can be represented in a more robust and compact form (a small sketch of this feature chain follows the list).
- It would be ideal to keep your speech enhancement models in the STFT domain while producing estimates in, e.g., the log-Mel domain.
- If we also derive a measure of enhancement uncertainty, there are well-established methods to integrate it with ASR models and improve performance.
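Just to make the domain mismatch concrete, here is a minimal sketch of the STFT to log-Mel feature chain. It assumes librosa is available; the sampling rate, FFT size, number of mel bands and the random stand-in waveform are arbitrary placeholder values, not settings from the thesis.

```python
# Minimal sketch of the STFT -> log-Mel feature chain (not code from the thesis).
import numpy as np
import librosa

sr, n_fft, n_mels = 16000, 512, 40      # placeholder analysis settings
y = np.random.randn(sr)                  # one second of noise as a stand-in waveform

# Speech enhancement (Wiener filtering, beamforming, ...) operates here:
X = librosa.stft(y, n_fft=n_fft, hop_length=n_fft // 4)          # complex STFT

# ... while the ASR front-end consumes a non-linear transformation of it:
mel_fb = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_mels)  # (n_mels, n_fft//2+1)
power = np.abs(X) ** 2                                           # magnitude-squared STFT
log_mel = np.log(mel_fb @ power + 1e-10)                         # log-Mel features

# A point estimate of clean speech in the STFT domain does not translate directly
# into a point estimate (let alone an uncertainty) in log-Mel, because |.|^2,
# the mel filterbank and log() are applied on top of it.
```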
This was a relatively active topic in robust ASR at the time, with multiple competing (feature- and model-based) approaches. My thesis contributions were basically:
- Noticing that the Ephraim-Malah filters can be seen as propagating the uncertainty of the posterior distribution associated with a Wiener filter in the STFT domain through the amplitude and log-amplitude non-linearities [IEEE TASLP article].
- Exploiting this fact to transform a complex-Gaussian distributed model of the STFT into the MFCC (log-Mel) and RASTA-PLP domains, deriving first- and second-order moments [Book Chapter] (a simplified sketch follows this list).
- Exploiting this fact to derive extensions of super-Gaussian prior MMSE estimators based on mixture models [IEEE SPL article].
- Showing that this can improve the robustness of ASR systems without retraining, including for performant methods such as the ETSI advanced front-end [IEEE STSP article].
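The publications above derive the moments in closed form for the full complex-Gaussian STFT model; the sketch below only conveys the general idea of moment matching, replacing the closed-form log-Mel moments with a simple second-order (delta-method) approximation. All variable names, the placeholder filterbank and the toy posteriors are illustrative assumptions, not code from the thesis.

```python
# Illustrative moment matching: propagate a complex-Gaussian STFT posterior
# (e.g. the posterior associated with a Wiener filter) into the log-Mel domain.
# Uses a second-order (delta-method) approximation for log(), NOT the
# closed-form derivations of the thesis; it only conveys the general idea.
import numpy as np

rng = np.random.default_rng(0)
n_freq, n_mels = 257, 40

# Posterior per STFT bin: X_f ~ CN(mu_f, lam_f) (mean and variance of the estimate)
mu = rng.normal(size=n_freq) + 1j * rng.normal(size=n_freq)
lam = rng.uniform(0.01, 0.1, size=n_freq)

# Moments of the power |X_f|^2 under a circular complex Gaussian
power_mean = np.abs(mu) ** 2 + lam
power_var = lam ** 2 + 2 * lam * np.abs(mu) ** 2

# The mel filterbank is linear, so moments propagate exactly
# (independence across frequency bins assumed here)
W = rng.uniform(0, 1, size=(n_mels, n_freq))   # placeholder mel filterbank weights
mel_mean = W @ power_mean
mel_var = (W ** 2) @ power_var

# log() is non-linear: second-order Taylor (delta-method) approximation
logmel_mean = np.log(mel_mean) - mel_var / (2 * mel_mean ** 2)
logmel_var = mel_var / mel_mean ** 2

# logmel_mean / logmel_var can then be passed to uncertainty-aware ASR decoding
```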
Aside from my PhD, in my time in Berlin I also got to tutor the Neural Networks Seminar of EMSP, supervise Phillip Mandelartz's thesis, and help other students in the department with their projects and theses.
After defending my thesis I still spent some months at EMSP finishing papers. While finishing, a coincidence led me to discover Isabel Trancoso's department at INESC-ID in Lisbon. After reading João Graça's and Diamantino Caseiro's papers, I decided to apply for a post-doctoral grant to transition from speech to natural language processing. Luckily, I was awarded a 3+3 year FCT Post-Doctoral grant to join INESC-ID/L2F.
2006
I worked as an intern at Peiker Acustic for six months with the aid of a Leonardo grant and in collaboration with TU Berlin. The output was a spectral codebook-based speech reconstruction algorithm. Aside from realising how hard it is to write a tech report in German, I also got to know better the Minimum Mean Square Error (MMSE) Spectral Amplitude and Log-Spectral Amplitude estimators (also known as the Ephraim-Malah filters).
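For reference, the Log-Spectral Amplitude estimator boils down to a gain applied to each noisy STFT bin. A minimal sketch, assuming the a-priori SNR (xi) and a-posteriori SNR (gamma) have already been estimated (e.g. with the decision-directed approach); the numeric values below are placeholders.

```python
# Minimal sketch of the Ephraim-Malah MMSE Log-Spectral Amplitude (LSA) gain.
# Assumes the a-priori SNR xi and a-posteriori SNR gamma are already estimated;
# the arrays below are placeholder values, not real estimates.
import numpy as np
from scipy.special import exp1  # exponential integral E1(x) = int_x^inf e^-t / t dt

def lsa_gain(xi, gamma):
    """MMSE-LSA gain G(xi, gamma) from Ephraim & Malah (1985)."""
    v = xi * gamma / (1.0 + xi)
    return xi / (1.0 + xi) * np.exp(0.5 * exp1(v))

xi = np.array([0.1, 1.0, 10.0])      # a-priori SNR per STFT bin (placeholder)
gamma = np.array([0.5, 2.0, 12.0])   # a-posteriori SNR per STFT bin (placeholder)
X_noisy = np.array([1.0 + 1.0j, 0.5 - 0.2j, 2.0 + 0.1j])  # noisy STFT bins

X_enhanced = lsa_gain(xi, gamma) * X_noisy  # enhanced STFT (noisy phase is kept)
```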
That same year I was awarded a La Caixa / German Academic Exchange Service (DAAD) scholarship for research towards the Ph.D. degree at the EMSP department of the Technische Universität Berlin. I started working with Dorothea Kolossa on the topic of uncertainty propagation.
2005 Industrial Engineer, Electronics and Automatics
I got the Industrial Engineering degree with a specialization in electronics and automatics at the Escuela Politecnica Superior de Ingenieria de Gijon (Spain). At the time, this was a 6-year multidisciplinary degree plus thesis. I got to learn a lot of math/physics and all things engineering, from industrial heating and cooling to macroeconomics, and also got to play with all levels of programming languages, from VHDL and x86 CISC to Visual C++ and MFC (no Python, sadly). In my free time, I started learning fuzzy logic and programmed my first neural network.
I did my last year at the Technische Universität Berlin with the aid of an Erasmus grant (2004). My final thesis was also at TU Berlin, at the Electronics and Medical Signal Processing (EMSP) department (2005). My thesis was directed by Dorothea Kolossa and the topic was pruning in tied-mixture Hidden Markov Models for Automatic Speech Recognition.