Northwest Database Society (NWDS) Annual Meeting 2024

Overview

The Northwest Database Society Annual Meeting brings together researchers and practitioners from the greater Pacific Northwest for a day of technical talks and networking on the broad topic of data management systems.

This year, the meeting will be hosted by Google. It will be a full-day event with a keynote, a panel, several sessions of shorter presentations by members of our community, a poster session, and significant break time for unstructured discussion. Breakfast, lunch, coffee breaks, and a post-conference reception will be provided.



DETAILS

WHEN: Friday, February 9th, 2024

WHERE: Google Kirkland - Building E - Frozen Tech Talk Room
                  747 6th St South, Kirkland, WA 98033

REGISTRATION: Registration is now closed. Looking forward to a great NWDS 2024!

INSTRUCTIONS UPON ARRIVING: click here for map and instructions.

CONTACT: Pavan Edara, Justin Levandoski, Jing Jing Long

Keynote

Open Language Model (OLMo): The Science of Language Models and Language Models for Science


Speaker: Hanna Hajishirzi, University of Washington


Abstract: Over the past few years, and especially since the deployment of ChatGPT in November 2022, neural language models with billions of parameters, trained on trillions of words, have been powering the fastest-growing computing applications in history and generating discussion and debate across society. However, AI scientists cannot study or improve those state-of-the-art models because the models' parameters, training data, code, and even documentation are not openly available. In this talk, I present our OLMo project toward building strong language models and making them fully open to researchers, along with open-source code for data management, training, inference, and interaction. In particular, I describe DOLMa, a 3T-token open dataset curated for training language models; Tulu, our instruction-tuned language model; and OLMo v1, a fully open 7B-parameter language model trained from scratch.


Bio: Hanna Hajishirzi is a Torode Family Associate Professor at UW CSE and a Senior Director at AI2. Her research spans different areas in NLP and AI, specifically understanding and advancing large language models. Her honors include the NSF CAREER Award, Sloan Fellowship, Allen Distinguished Investigator Award, Intel Rising Star Award, multiple best paper and honorable mention paper awards, and several industry research faculty awards. Hanna received her PhD from the University of Illinois and spent a year as a postdoc at Disney Research and CMU.

AGENDA

8am-9am Hot Breakfast (served in general space outside conference room)

9am-9:05am Welcome

9:05am-10am Keynote by Hanna Hajishirzi, University of Washington (Chair: Pavan Edara)

10am-10:30am Break

10:30am-12pm Database Systems [15 mins each] (Chair: Jing Jing Long)

Marc Brooker (AWS), Parameterizing contention and coordination [slides]

Kaisong Huang (Simon Fraser University), The Art of Latency Hiding in Modern Database Engines [slides]

Prashant Pandey (University of Utah), BP-tree: Overcoming the Point-Range Operation Tradeoff for In-Memory B-trees [slides]

Mosha Pasumansky (Firebolt), Firebolt - Cloud Data Warehouse for Data Intensive Applications [slides]

Cristian Diaconu and Hossein Ahmadi (Snowflake), Unistore: Hybrid Transactional and Analytical Processing in Snowflake [slides]

Jordan Tigani (MotherDuck), MotherDuck: DuckDB in the cloud and in the client [slides]

12pm-12:45pm Lunch (provided)

12:45pm-1:30pm Poster presentations (instructions here) and socializing

1:30pm-3pm Panel: AI and Data Management (Chair: Magdalena Balazinska)

Panelists:
Anna Fariha (University of Utah) [slides]
Moe Kayali (University of Washington) [slides]
Luna Dong (Meta) [slides]
Ihab Ilyas (Apple)
Xi Cheng (Google) [slides]
Amir Hormati (Databricks)

3pm-3:30pm Break with light snacks

3:30pm-5pm Modern Data Processing Systems [15 mins each] (Chair: Justin Levandoski)

Jiaxun Wu (Google), BigFrames: Bringing Scale to Python Data Analysis and Science in a Data Warehouse [slides]

Laurel Orr (Numbers Station), LLMs for Text2SQL: The Dream versus Reality [slides]

David Maier and Todd Porter (Portland State University and Meta), Block and Tackle: Exploiting Stream Macro-structure

Mingge Deng and Denis Petushkov (Confluent), Managed Flink for Streaming Lakehouse

Primal Pappachan (Portland State University), Preventing Inferences through Data Dependencies on Sensitive Data [slides]

Phil Bernstein (Microsoft Research), Chablis: Fast and General Transactions in Geo-Distributed Systems [slides]


Previous Meetings

This is the seventh meeting of the series. Previous meetings were held at: