Rui Abreu is a Research Software Engineer at Meta. He holds a Ph.D. in Computer Science - Software Engineering from the Delft University of Technology, The Netherlands, and an M.Sc. in Computer and Systems Engineering from the University of Minho, Portugal. His research revolves around software quality, with emphasis on automating the testing and debugging phases of the software development life-cycle as well as self-adaptation. He has extensive expertise in both static and dynamic analysis algorithms for improving software quality. He is the recipient of 5 Best Paper Awards, and his work has attracted considerable attention.
Software development in the era of AI is fraught with risk, especially in rapidly evolving large enterprise software organizations. In this talk, Rui and Nachi share the tools Meta has implemented to mitigate risk. Specifically, Meta has developed, deployed, and enforced Diff Risk Score (DRS) and other code health metrics to tackle production risk. Equipped with a model that predicts whether a code change might disrupt a product's customers, Meta developers can build features and workflows to improve almost every aspect of writing and pushing code. Today, DRS powers many risk-aware features that optimize product quality, developer productivity, and computational capacity efficiency. Notably, DRS has helped us eliminate major code freezes, letting developers ship code during periods when they historically could not, with minimal impact on customer experience and the business.
Topics and outline
Maintaining large-scale distributed systems poses significant challenges due to their complexity, scale, and the risks of live changes. This talk presents a case study of a system that processes vast volumes of items in real time for billions of users. Over nine months, this system underwent a live architectural refactoring to improve maintainability, developer efficiency, and reliability. Key strategies included staged rollouts, automated testing, and impact validation, resulting in a 42% boost in developer efficiency, an 87% reliability improvement, and notable gains in system performance and resource savings.
Looking forward, we will explore the growing role of AI-driven refactoring techniques in accelerating development, enhancing reliability, and optimising performance in complex systems. This talk offers an overview of our current efforts, practical insights, and future directions for code maintenance and refactoring empowered by AI.