ELMo provides contextual embeddings from a bidirectional language model (LM), while ULMFiT transfers a pretrained LM to new tasks via careful fine-tuning (discriminative learning rates, gradual unfreezing). In sequence-to-sequence (seq-to-seq) models, attention lets the decoder focus on the relevant encoder states at each step, which removes the single-vector bottleneck and improves performance, particularly on long inputs.
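The two ULMFiT fine-tuning ideas named above can be illustrated with a small, framework-free sketch. The function names `discriminative_lrs` and `trainable_layers` are hypothetical helpers introduced only for this illustration, and the decay factor of 2.6 is the heuristic reported in the ULMFiT paper; treat it as a tunable assumption rather than a fixed rule.

```python
def discriminative_lrs(top_lr, n_layers, decay=2.6):
    """Return one learning rate per layer, lowest layer first.

    The top layer gets top_lr; each layer below it gets the rate of the
    layer above divided by `decay` (assumed default: 2.6).
    """
    return [top_lr / decay ** (n_layers - 1 - i) for i in range(n_layers)]


def trainable_layers(epoch, n_layers):
    """Gradual unfreezing: return the indices of layers trained in `epoch`.

    Epoch 0 trains only the top layer; every later epoch unfreezes one
    more layer from the top down until all layers are trainable.
    """
    n_unfrozen = min(epoch + 1, n_layers)
    return list(range(n_layers - n_unfrozen, n_layers))


# Hypothetical 4-layer model fine-tuned for 5 epochs.
lrs = discriminative_lrs(top_lr=0.01, n_layers=4)
schedule = [trainable_layers(epoch, 4) for epoch in range(5)]

for epoch, layers in enumerate(schedule):
    print(f"epoch {epoch}: train layers {layers}")
```

Lower layers hold more general features, so they receive smaller updates and are unfrozen last. This is meant to limit forgetting of what the language model learned during pretraining.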
Early NLP transfer learning introduced the idea of reusing knowledge from large, generic language models for downstream tasks. ELMo generates contextual word representations from a deep, bidirectional LSTM language model, so the same word receives different vectors depending on its sentence context, which improves tasks ranging from named entity recognition (NER) to sentiment analysis. ULMFiT pretrains a language model on a large corpus and then fine-tunes it for the target task using discriminative fine-tuning, slanted triangular learning rates, and gradual unfreezing, which yields strong text classification results even with little labeled data. Complementing these advances, the attention mechanism in seq-to-seq architectures allows the decoder to query the encoder states dynamically. This overcomes the bottleneck of compressing the whole input into a single fixed-length vector, improves alignment and translation quality, and sets the stage for modern attention-based models.
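To make the attention step concrete, here is a minimal sketch of a single decoder step in plain Python. It assumes dot-product scoring, which is only one of several scoring functions in use (the original seq-to-seq attention work used an additive score). The function name `attend` and the toy vectors are made up for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_states):
    """Return (attention weights, context vector) for one decoder step.

    Scores are dot products between the decoder state and each encoder
    state (an assumed scoring function); the context vector is the
    weighted sum of the encoder states.
    """
    scores = [
        sum(d * h for d, h in zip(decoder_state, enc))
        for enc in encoder_states
    ]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [
        sum(w * enc[j] for w, enc in zip(weights, encoder_states))
        for j in range(dim)
    ]
    return weights, context


# Toy example: three encoder states of dimension 2.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 0.0]
weights, context = attend(decoder_state, encoder_states)
print(weights, context)
```

Because the context vector is recomputed at every decoder step, the decoder no longer depends on a single fixed summary of the input. The weights can also be read as a soft alignment between output and input positions.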
Stefan received a PhD from the University of Hamburg in 2007, where he also completed his habilitation on decision analysis and support using ensemble forecasting models in 2012. He then joined the Humboldt-University of Berlin in 2014, where he heads the Chair of Information Systems at the School of Business and Economics. He serves as an associate editor for the International Journal of Business Analytics, Digital Finance, and the International Journal of Forecasting, and as department editor of Business & Information Systems Engineering (BISE). Stefan has secured substantial amounts of research funding and published several papers in leading international journals and conferences. His research concerns the support of managerial decision-making using quantitative empirical methods. He specializes in applications of (deep) machine learning techniques in the broad scope of marketing and risk analytics. Stefan actively participates in knowledge transfer and consulting projects with industry partners, from start-up companies to global players and not-for-profit organizations.