Computer Science Department Seminar

11:30AM-12:30PM, Wednesday, September 29, 2010

Prof. Michel A. Bender, Stony Brook University and Tokutek, Inc.



Title: How to Index Massive Data Sets Quickly


Abstract:

Insertion bottlenecks lie at the heart of database and file-system innovations, best practices, and system workarounds. Most databases and file systems are based on B-tree data structures, and suffer from the performance cliffs and unpredictable run times of B-trees. In this talk, I introduce the Fractal Tree data structure and explain how it provides dramatically improved performance in both theory and in practice. From a theoretical perspective, if B is the block-transfer size, the B-tree performs O(log_B N) block transfers per insert in the worst case. In contrast, the Fractal Tree structure performs O((log_B N)/B) memory transfers per insert, which translates to run-time improvements of two orders of magnitude. To relate that theory to practice, I present an algorithmic model for B-tree performance bottlenecks. I explain how the bottlenecks affect best practice and how database designers typically modify B-trees to try to mitigate the bottlenecks. Then I show how Fractal Tree structures can attain faster insertion rates, intuitively by transforming disk-seek bottlenecks into disk-bandwidth bottlenecks I conclude with performance results. Tokutek has developed a transaction-safe Fractal-Tree storage engine for MySQL. I present performance results, showing how the system can maintain rich indexes more efficiently than B-trees. Surprisingly, Fractal Tree structures seem to maintain their order-of-magnitude competitive advantage over B-trees on both traditional rotating media as well as SSDs.

Biography:

Dr. Michael A. Bender is an associate professor of computer science at Stony Brook University. His research interests include analysis of algorithms, scheduling, parallel computing, data structures, and cache- and I/O-efficient computing. Dr. Bender has coauthored over 80 articles on these and other topics in computer science. He has received several awards, including an R&D 100 Award for his work on scheduling in parallel computers and three awards for both graduate and undergraduate teaching. Bender founded in startup Tokutek in 2006, where he serves as VP of Engineering. He has also held Visiting Scientist positions at both MIT and King's College London. Dr. Bender received his B.A. in Applied Mathematics from Harvard University in 1992 and obtained a D.E.A. in Computer Science from the Ecole Normale Suprieure de Lyon, France in 1993. He completed a Ph.D. on Scheduling Algorithms from Harvard University in 1998.

For more information about this colloquium, please contact Habib M. Ammari at cschma@hofstra.edu