E-mail: pkumar<at>wm<dot>edu
Office: McGlothlin-Street Hall, 135
Please visit http://www.cs.wm.edu/~pkumar/ for the latest information.
I am an assistant professor in the Department of Computer Science at William & Mary (the second-oldest university in the USA and a Public Ivy). I completed my PhD in Computer Engineering at the George Washington University (GWU). My research lies at the intersection of Data Science and Cyber-infrastructure: developing an ecosystem that plugs emerging storage and compute hardware into the system software stack of data analytics. This roughly translates to IO, File and Storage Systems, the Memory sub-system, Data Management, and High-Performance Data Analytics, with a focus on emerging Big Data applications such as Graph Processing, Stream Analytics, and Machine Learning. I completed my Bachelor of Technology degree at the Indian Institute of Technology, Dhanbad, India, in 2007.
I lead the Data Lab in the Department of Computer Science at William & Mary. Everything we do here relates to the large-scale data that is the norm today. If you are interested in doing research on any aspect of data that will accelerate data-led innovation, do not hesitate to contact me. The following projects are under active research in the Data Lab, and many more are still on the drawing board:
Fall, 2019: CSCI 780-02, Big Data Systems, MW 1530-1650. [Details]
Spring, 2020: Operating Systems.
The following papers have been published or are about to be published. Feel free to email me for a paper or its code. Several other exciting works are in the pipeline and will be submitted soon to top-tier conferences and journals. If you would like to know more about them, send me an email to start the discussion. * denotes top-tier venues, which are extremely competitive.
*[ACM Transactions on Storage] Pradeep Kumar, Howie Huang. GraphOne: A Data Store for Real-time Analytics on Evolving Graphs. Extended Version of FAST'19 Paper, Under Review.
*[USENIX FAST'19] Pradeep Kumar, Howie Huang. GraphOne: A Data Store for Real-time Analytics on Evolving Graphs. [Blog][PDF][PPT][Code]
GraphOne is the first-ever system that can perform a diverse set of analytics on the same data store, replacing the current practice of deploying a specialized system for each type of analytics. The above two works make the following contributions:
[IEEE HPEC'17] Yang Hu, Pradeep Kumar, Guy Swope (Raytheon), H. Howie Huang. TriX: Triangle Counting at Extreme Scale. Finalist, 2017 IEEE/Amazon/DARPA Graph Challenge
*[SC'16] Pradeep Kumar, Howie Huang. G-Store: High-Performance Graph Store for Trillion-Edge Processing. [Blog] [PDF] [PPT] [Code]
The above two works present graph computing at extreme scale. G-Store is the first-ever system to demonstrate graph computing at trillion-edge scale within a commodity server. Many techniques make this possible:
*[USENIX ATC'17] Pradeep Kumar, Howie Huang. Falcon: Scaling IO Performance in Multi-SSD Volumes. [Blog] [PDF] [PPT] [Code]
Do you think today's IO stack can deliver the raw performance of multiple SSDs combined in a volume? The answer is no, because of the per-volume convention that is hard-coded in the IO stack. We propose the Falcon IO Stack, which introduces a new convention of per-drive IO processing: a new layer in the IO stack that brings out the best of each drive in the volume. Further, the Falcon IO Stack optimizes its merging, sorting, and dispatch process to make it future ready, and we have shown that it can saturate newer NVMe flash.
[IEEE Big Data Congress'17] Pradeep Kumar, Howie Huang. SafeNVM: A Non-Volatile Memory Store with Thread-Level Page Protection. [Blog] [PDF] [PPT] [Code]
Persistent data stored in NVM is not as safe against software-induced corruption as data stored on disk. Do you know why, and what changes if we store data in NVM instead of on disk? This paper discusses the unstated safety conventions that are implicitly followed when data moves from DRAM to disk. The presence of NVM breaks those conventions. Our techniques bring them back to protect your persistent data in NVM from software-induced corruption.
I have over 6 years of industry research and development experience in the broader Systems area. During my PhD, I interned at IBM Research. Prior to my PhD, I worked as a file system/operating system kernel developer and as a distributed systems programmer for cluster-mode storage systems at NetApp Inc., in their Bangalore (India) research and development center. Before that, I worked at Huawei Technologies in their Bangalore, Shanghai, and Shenzhen centers.