Going beyond pretraining: recent advances, applications and future directions for test-time compute and RL
While large language models have traditionally relied on massive pretraining compute budgets, a paradigm shift is emerging that prioritizes test-time computation and reinforcement learning. This talk explores how techniques such as chain-of-thought reasoning, self-consistency, and iterative refinement unlock new capabilities by investing compute during inference rather than during training alone. We examine recent breakthroughs, including RL with verifiable rewards and process reward models, which enable reasoning models to achieve superhuman performance on tasks such as advanced mathematics and to develop agentic capabilities. The talk also covers practical concerns for RL and test-time compute, including RL training efficiency and the design of domain-specific reward functions. Finally, we outline future research directions, including hybrid architectures that combine reasoning and non-reasoning tasks, and the potential for separating memory and intelligence in LLMs.
Speaker’s profile
Bill Cai is a Senior Applied Scientist in the Generative AI Innovation Center at Amazon Web Services. His most recent research focuses on model optimization and fine-tuning, including efficient deployment on hardware such as NVIDIA and AWS Neuron devices, and on optimizing model performance across multiple modalities for domain-specific and language-specific applications. He has recently published NLP, CV, and ML research at NAACL, ACM MM, the ACM Web Conference, IEEE IoTJ, a CVPR workshop, and NeurIPS workshops. Bill holds a Master’s degree from the Massachusetts Institute of Technology and a Bachelor’s degree from the University of Chicago.