This course provides a comprehensive, theory-driven introduction to next-word prediction in natural language processing (NLP). Starting from statistical language models and the mathematics of text prediction, the curriculum traces the progression from n-gram models and Markov assumptions to neural architectures: word embeddings, recurrent neural networks (RNNs), sequence-to-sequence models, and transformers. Each topic is analyzed with a focus on the underlying mathematical principles, probabilistic modeling, and algorithmic innovations; practical examples are limited, with the emphasis placed on the core algorithms and their statistical properties. Students will gain a clear understanding of how language models are constructed, why certain approaches succeed or fail, and the theoretical basis for the major advances in predictive NLP.
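To give a flavor of the probabilistic modeling covered in the early part of the course, here is a minimal sketch (not course material) of next-word prediction with a bigram model under the Markov assumption, using add-one smoothing. The toy corpus and the helper name `next_word_probs` are illustrative assumptions, not part of the syllabus.

```python
from collections import Counter, defaultdict

# Toy corpus; any tokenized text would do.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count how often each word follows each context word (bigram counts).
bigram_counts = defaultdict(Counter)
for prev, curr in zip(corpus, corpus[1:]):
    bigram_counts[prev][curr] += 1

vocab = set(corpus)

def next_word_probs(prev, alpha=1.0):
    """P(w | prev) with add-alpha (Laplace) smoothing over the toy vocabulary."""
    counts = bigram_counts[prev]
    total = sum(counts.values()) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

# The three most likely continuations of "the" under this smoothed bigram model.
probs = next_word_probs("the")
top = sorted(probs.items(), key=lambda kv: -kv[1])[:3]
print(top)
```

The smoothing term `alpha` is what keeps unseen continuations from receiving zero probability, which is the sparse-data problem the smoothing topic below addresses.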
Key Topics:
Foundations of statistical language modeling and information theory
Markov models and n-gram-based prediction
Probability distributions over text sequences
Smoothing methods for sparse data
Vector representations and word embeddings
Neural language models: architecture and training
Recurrent networks and memory in sequence processing
Encoder-decoder (seq2seq) models
Transformer architecture fundamentals
Evaluation metrics such as perplexity (illustrated in the sketch after this list)
Theoretical limitations and challenges in language modeling
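As an illustration of the perplexity metric listed above, the following is a minimal, self-contained sketch assuming a toy corpus and a maximum-likelihood unigram model; the helper names `unigram_prob` and `perplexity` are hypothetical and chosen only for this example.

```python
import math
from collections import Counter

# Toy corpus and a maximum-likelihood unigram model over it.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()
unigram = Counter(corpus)
total = sum(unigram.values())

def unigram_prob(word):
    """Maximum-likelihood unigram probability estimated from the toy corpus."""
    return unigram[word] / total

def perplexity(tokens, prob_fn):
    """Perplexity = exp of the average negative log-probability per token."""
    avg_nll = -sum(math.log(prob_fn(w)) for w in tokens) / len(tokens)
    return math.exp(avg_nll)

# Lower perplexity means the model finds the held-out text less surprising.
print(round(perplexity("the cat sat on the mat .".split(), unigram_prob), 2))
```

The same definition applies to the neural models later in the course; only the probability function being evaluated changes.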