(In Press) "TreeTracker Join: Simple, Optimal, Fast", Zeyuan Hu, Yisu Remy Wang and Daniel P. Miranker, ACM Transactions on Database Systems. See the preprint; the version on arXiv is very different.
In 1981 Yannakakis published an optimal algorithm for evaluating a special class of database queries, acyclic conjunctive queries. At that time it was not yet understood that nearly all real-world queries are acyclic. Even so, Yannakakis' algorithm has remained a formal result: the low-order terms neglected in the complexity analysis almost always dominate when the algorithm is implemented and embedded in a database management system.
The algorithm in our result is also optimal. Yet it requires changing only about three lines of the pseudocode of one of the fastest and most popular algorithms used in enterprise-class database management systems. Thus our contribution is the first optimal algorithm that is downward compatible with existing systems and can improve their query execution speed.
July 15, 2025: Broadcom's newly announced Ethernet product, Tomahawk Ultra, performs operations in the network, now known as In-Network Collectives. These operations were first identified as bottlenecks in parallel AI computation by Stolfo and Miranker, and were implemented in the interprocessor communication network of the DADO 2, a 1023-processor AI computer (1983).
May 7, 2025: Data.World acquired by ServiceNow. Previously, Capsenta Inc., which was spun out of the Miranker Lab, was acquired by Data.World.
My primary research and teaching interests have always concerned advances at the intersection of artificial intelligence, scaling up database management systems, and parallel and distributed computing, and the challenges of creating systems that integrate all three. From time to time, technology developments in other areas yield a new discipline as the next source of order-of-magnitude larger amounts of data. As a result, the set of open problems is continually renewed, and the pleasure of having to learn new things and make interdisciplinary contributions is a never-ending cycle of exciting opportunity.
I received a Bachelor of Science in Mathematics from MIT ’79 and a Ph.D. in Computer Science from Columbia University ’86, ’87. My dissertation concerned the creation of the DADO machine, a pioneering 1023-core parallel AI computer that became operational in 1985. Among my dissertation results was that DADO achieved one of its primary goals: high-performance inference for rule-based intelligent systems. Core architectural concepts from the DADO computer have reappeared: at least one in Nvidia products and a distinct one in a recently announced Broadcom product. Upon completion of my Ph.D. I joined the Computer Science faculty at the University of Texas at Austin, where I have led projects on the development of graph-based semantic databases, AI search and pattern matching, and indexing of large collections of non-traditional data types. Application areas have included finance, life science, evolutionary biology and network security.
Twice I've seen my research results make it into commercial practice through companies I co-founded, Capsenta Inc. and Liaison Technology Inc. Both companies provided solutions to data integration problems by applying AI methods. After repeated acquisitions, intellectual property licensed from the University of Texas to launch Capsenta is now part of ServiceNow’s Data Fabric offering. Liaison Technology was acquired by Forest Express, which rebranded itself as Liaison Technology and was later acquired by OpenText.
My papers
Current research:
The K2 Project: Support and Optimization of Graph Database Queries
Past research projects:
Data Integration in the Context of the Semantic Web
Morphster
(2001 - 20??) Bioinformatics and Mobios, A Database Management System to Support Data in Metric Spaces
Rule-based Systems
The DADO Parallel AI Computer