Distributed Systems and Parallel Computing

No matter how powerful individual computers become, there are still reasons to harness the power of multiple computational units, often spread across large geographic areas. Sometimes this is motivated by the need to collect data from widely dispersed locations (e.g., web pages from servers, or sensors for weather or traffic). Other times it is motivated by the need to perform enormous computations that simply cannot be done by a single CPU.

From our company’s beginning, Google has had to deal with both issues in our pursuit of organizing the world’s information and making it universally accessible and useful. We continue to face many exciting distributed systems and parallel computing challenges in areas such as concurrency control, fault tolerance, algorithmic efficiency, and communication. Some of our research involves answering fundamental theoretical questions, while other researchers and engineers are engaged in the construction of systems to operate at the largest possible scale, thanks to our hybrid research model.

Recent Publications

Thesios: Synthesizing Accurate Counterfactual I/O Traces from I/O Samples
Mangpo Phothilimthana
Soroush Ghodrati
Selene Moon
ASPLOS 2024, Association for Computing Machinery
Federated Variational Inference: Towards Improved Personalization and Generalization
Elahe Vedadi
Josh Dillon
Philip Mansfield
Karan Singhal
Arash Afkanpour
Warren Morningstar
AAAI Federated Learning on the Edge Symposium (2024)
Load is not what you should balance: Introducing Prequal
Bartek Wydrowski
Bobby Kleinberg
Steve Rumble
(2024)
BigLake: BigQuery’s Evolution toward a Multi-Cloud Lakehouse
Justin Levandoski
Garrett Casto
Mingge Deng
Rushabh Desai
Thibaud Hottelier
Amir Hormati
Anoop Johnson
Jeff Johnson
Dawid Kurzyniec
Prem Ramanathan
Gaurav Saxena
Vidya Shanmugam
Yuri Volobuev
SIGMOD (2024)