About this Event
Fall 2021 Electrical Engineering and Computer Science (EECS) Seminar Series
"Designing Scalable HPC, Deep Learning, Big Data, and Cloud Middleware for Exascale Systems"
Dhabaleswar K. (DK) Panda
The Ohio State University
Faculty Host: Prof. Xiaoyi Lu
This talk will focus on challenges in designing HPC, Deep Learning, Big Data, and HPC Cloud middleware for Exascale systems with millions of processors and accelerators. For the HPC domain, we will discuss the challenges in designing runtime environments for MPI+X programming models, taking into account support for multi-core systems (Xeon, ARM, and OpenPower), high-performance networks (InfiniBand and RoCE), GPGPUs (including GPUDirect RDMA), and emerging BlueField-2 DPUs. Features, sample performance numbers, and best practices for using the MVAPICH2 libraries (http://mvapich.cse.ohio-state.edu) will be presented. For the Deep Learning domain, we will focus on MPI-driven solutions (http://hidl.cse.ohio-state.edu) to extract performance and scalability for popular Deep Learning frameworks (TensorFlow and PyTorch) and large out-of-core models. Accelerating Deep Learning applications with BlueField-2 DPUs will also be presented. For the Big Data domain, we will focus on high-performance and scalable designs of Spark and Hadoop based on the HiBD libraries (http://hibd.cse.ohio-state.edu). Finally, we will outline the challenges in moving this middleware to cloud environments, including Azure, AWS, and Oracle. The talk will conclude with an overview of the HPC, DL, and Big Data problems to be solved in the newly awarded NSF-AI Institute – ICICLE (https://icicle.osu.edu).
DK Panda is a Professor and University Distinguished Scholar of Computer Science and Engineering at The Ohio State University. He has published over 500 papers in high-end computing and networking. The MVAPICH2 (High Performance MPI and PGAS over InfiniBand, Omni-Path, iWARP, and RoCE) libraries, designed and developed by his research group (http://mvapich.cse.ohio-state.edu), are currently used by more than 3,200 organizations in 89 countries. The software has been downloaded more than 1.45 million times from the project's site, and it powers several InfiniBand clusters in the TOP500 list, including the 4th, 10th, 20th, and 31st ranked ones. MPI-driven solutions for providing high-performance and scalable deep learning for the TensorFlow and PyTorch frameworks are available from https://hidl.cse.ohio-state.edu. The RDMA packages for Apache Spark, Apache Hadoop, and Memcached, together with the OSU HiBD benchmarks from his group (http://hibd.cse.ohio-state.edu), are also publicly available. Prof. Panda is an IEEE Fellow. More details about Prof. Panda are available at http://www.cse.ohio-state.edu/~panda.