GTC 2024
March 2024
GTC 2024 Keynote
Jensen Huang (NVIDIA)
Science
Earth-2: Updates on kilometer-scale visualization, simulation, digital twinning, and AI super-resolution
Karthik Kashinath (NVIDIA), Mike Pritchard (NVIDIA)
Diffusion models for high-resolution emulation.
Earth-2 Mission #1 - Next-gen weather and climate prediction
Earth-2 Mission #2 - Interact with predictions with low latency, e.g. Q&A
3 miracles (Huang at the Berlin Summit for climate simulation, 2023):
Km-scale simulation (efficiently)
AI emulation for any region for any time period
Visualization using cloud technology
Currently partnering with companies to run climate models on GPUs. See the ALPS talk.
Best resolution is around 25 km e.g. ERA5.
Mining the data assimilation states from ERA5, like training ML to make 1080p video
WeatherBench 2.0 shows AI models' skill over traditional NWP
Models capture idealized physics
FourCastNet does multi-month roll out e.g. AFNO -> SFNO (spherical harmonics)
Shows stability e.g. can run for 10 years and matches expected signal
Need to measure metrics carefully e.g. different for each variable
Modulus offers different models
Earth2MIP - Open source (https://github.com/NVIDIA/earth2mip)
Hard to compare against full ECMWF members.
Create lagged ensembles with different initial conditions.
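The lagged-ensemble construction can be sketched in plain Python (illustrative data layout, not the Earth2MIP API): each earlier initialization contributes the forecast whose lead time lands on the chosen valid time.

```python
# Sketch of a lagged ensemble (assumption: forecasts stored as
# {init_time: [f(lead=0), f(lead=1), ...]} in whole time steps).

def lagged_ensemble(forecasts, valid_time, n_lags):
    """Collect forecasts valid at `valid_time` from the last `n_lags` initializations."""
    members = []
    for lag in range(n_lags):
        init_time = valid_time - lag  # initialization `lag` steps before the valid time
        run = forecasts.get(init_time)
        if run is not None and lag < len(run):
            members.append(run[lag])  # forecast at lead time == lag hits valid_time
    return members

# Toy data: each value encodes (init_time, lead) so membership is easy to check.
forecasts = {t: [(t, lead) for lead in range(4)] for t in range(10)}
print(lagged_ensemble(forecasts, valid_time=5, n_lags=3))
# -> [(5, 0), (4, 1), (3, 2)]
```

This gives ensemble members "for free" from a deterministic model, at the cost of mixing lead times.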
Need higher res for impacts e.g. wildfires
Sub-km data is sparse, e.g. radar, HRRR, WRF, etc.
GenAI (diffusion models) for super resolution
CorrDiff, e.g. on a Taiwan regional model, 25 km -> 2 km. Does better than other models.
Fine-tune to run anywhere. Scales globally. Fast.
Partner with The Weather Company to generate km-scale training datasets, e.g. weather station networks, wind sensors along power lines
Apply to future climate
Services available see build.nvidia.com
Sub-seasonal and Seasonal Forecasting with a Deep Learning Earth-System Model
Dale Durran
DLWP-HPX
CNN using 2D spherical shells; HEALPix (Hierarchical Equal Area isoLatitude Pixelization) mesh (common in atmospheric science, E-to-R mesh)
Novel ConvNet - inverted channel depth, recurrence in the latent space; dilation gives a large receptive field.
Loss function is the sum of MSE over 24 hrs
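The summed-MSE rollout loss can be sketched as follows (a toy pure-Python illustration; `step_fn` and the list-based states are stand-ins, not the DLWP-HPX code). With a 6 h model step, 24 h is 4 steps: the model is rolled out and MSE accumulated at each step against the verifying state.

```python
# Sketch (assumption): training loss is the sum of per-step MSE over all
# forecast steps within 24 hours.

def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def rollout_loss(step_fn, state, targets):
    """Sum of per-step MSE over a rollout; `targets` holds one verifying state per step."""
    loss = 0.0
    for target in targets:
        state = step_fn(state)      # advance the model one step
        loss += mse(state, target)  # compare against the verifying state
    return loss

# Toy check: a "model" that adds 1 each step, with perfect targets -> zero loss.
step_fn = lambda s: [x + 1 for x in s]
targets = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
print(rollout_loss(step_fn, [0.0, 0.0], targets))  # -> 0.0
```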
https://arxiv.org/abs/2401.15305 - A Practical Probabilistic Benchmark for AI Weather Models
Avoids physical parameterization
No geo-specific NN weights (CNN is translation invariant)
Determines precipitation from other physical fields - works well
Runs in a coupled atmosphere-ocean configuration
Captures ETCs and ENSO; amplitude is too low, but it's a coarse model
https://essopenarchive.org/doi/full/10.22541/essoar.169603505.58030377 - Advancing Parsimonious Deep Learning Weather Prediction using the HEALPix Mesh
Huge Ensembles of Weather Extremes using NVIDIA's Fourier Forecasting Neural Network (FourCastNet)
William Collins (Berkeley)
Low-Likelihood High-Impact extremes (LLHIs) - heatwaves, atmospheric rivers, TCs, ETCs, flooding, snow events, wildfires
Data-Driven Weather Prediction (DDWP)
Dueben & Bauer (2018) 6 deg, MLP
Rasp et al (2020) 5 deg, CNN
Weyn et al (2019) 2.5 deg, ConvLSTM
Weyn et al (2020) 2 deg, CNN
Keisler et al (2022) 1 deg, GNN - https://github.com/openclimatefix/graph_weather https://arxiv.org/abs/2202.07575
Pathak et al (2022) 0.25 deg, VIT+AFNO (FourCastNet) - https://arxiv.org/abs/2202.11214
FourCastNet - Full Atmosphere AI Surrogate, Fourier Neural Operator, trained on 100 GPU hours, inference in 3 seconds for a 2-week forecast. 10k-100k speed up
Fourier transform for global convolution. Learns the solution operator; mesh- and resolution-invariant. Trained on ERA5 1979-2015, validated on 2016-2017, with 2018 onwards held out
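The Fourier-transform trick can be illustrated with a toy 1-D sketch (pure Python; no relation to the real FourCastNet weights or architecture): a global convolution becomes a pointwise multiply of Fourier modes by learned weights.

```python
import cmath

# Naive DFT/IDFT, fine for a tiny illustration (real FNOs use FFTs / spherical harmonics).
def dft(x):
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * k * j / n) for j in range(n))
            for k in range(n)]

def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * j / n) for k in range(n)) / n
            for j in range(n)]

def spectral_conv(x, weights):
    """Multiply the lowest len(weights) Fourier modes by `weights`; zero the rest."""
    X = dft(x)
    Y = [X[k] * weights[k] if k < len(weights) else 0.0 for k in range(len(X))]
    return [y.real for y in idft(Y)]

# Identity weights on all modes recover the input (a sanity check of the round trip).
out = spectral_conv([1.0, 2.0, 3.0, 4.0], [1.0] * 4)
print([round(v, 6) for v in out])  # -> [1.0, 2.0, 3.0, 4.0]
```

In the real model the weights are learned per mode, which is what makes the operator mesh- and resolution-invariant.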
Medium-range forecast skill is comparable to IFS
Modulus-Makani (massively parallel training of ML based weather and climate models)
Earth2-MIP (Experimentation with AI models for weather and climate)
Able to predict extremes such as TCs
Predicts moisture variables
Speed of inference enables massive ensembles (>10k) - capture long-tail events
How? Perturb IC, Perturb the model, Inject noise during model integration
The spread of an ensemble prediction should grow at the same rate as its error.
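That spread-error relationship can be sketched for a single scalar forecast (a toy pure-Python illustration; real verification averages these quantities over many cases and lead times):

```python
import math

# Sketch: for a well-calibrated ensemble, the ensemble standard deviation
# (spread) should track the error of the ensemble mean.

def spread_and_error(members, truth):
    """Return (spread, error): spread is the ensemble standard deviation,
    error is the absolute error of the ensemble mean."""
    n = len(members)
    mean = sum(members) / n
    spread = math.sqrt(sum((m - mean) ** 2 for m in members) / (n - 1))
    error = abs(mean - truth)
    return spread, error

spread, error = spread_and_error([1.0, 2.0, 3.0], truth=2.5)
print(spread, error)  # -> 1.0 0.5
```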
Scales efficiently up to ~400 GPUs and peak performance is 140.8 petaFLOPS (https://arxiv.org/abs/2208.05419 - FourCastNet: Accelerating Global High-Resolution Weather Forecasting using Adaptive Fourier Neural Operators)
Spherical harmonics - https://github.com/NVIDIA/torch-harmonics - WIP
Can DDWP fill gap between weather and climate?
https://arxiv.org/abs/2310.02074 - ACE: A fast, skillful learned global atmospheric model for climate prediction
Toward km-scale emulation - FourCastNet ~2.5B parameter FCN (5 km resolution)
Next steps - data-driven climate prediction in IPCC; these can supplement traditional forecast center models
How AI and Accelerated Computing are Revolutionizing Oceanographic Data Processing
Jann Wendt (North.io), Shilpa Kolhatkar (NVIDIA)
Geophysical data, e.g. scans, sonar. Work with point clouds. Sound velocity under water, munition detection
Harnessing GPUs for Accelerated Air Quality Simulations in the NASA Earth System Model
Peter Ivatt (University of Maryland)
VIDEO NOT WORKING.
Old data:
https://developer.nvidia.com/blog/nasa-and-nvidia-collaborate-to-accelerate-scientific-data-science-use-cases-part-1/ Use XGBoost for chemical solver
https://github.com/christophkeller/gc-xgb
A ROMS-Compatible, Ocean Numerical Model for the GPU
Jose Ondina (University of Florida), Oded Green (NVIDIA), Zoe Ryan (NVIDIA), Ron Fick (University of Florida), Maitane Olabarrieta (University of Florida)
Sailfish. Use cupy. 35-70x faster.
A New Generation of Global Climate Models Augmented by AI
Laure Zanna (New York University)
NOT RECORDED. https://m2lines.github.io https://m2lines.github.io/code/
Global Strategies: Startups, Venture Capital, and Climate Change Solutions
Karthik Kashinath (NVIDIA), Thomas Debass (U.S. Department of State), Shimon Elkabetz (tomorrow.io), Joanna Lichter (Emerson Collective)
Emerson Collective - material investments, energy efficient systems
Department of State - Climate change, fund start-ups.
Tomorrow.io - nowcasting, global simulated radar. Climate security is going to be like cyber security.
Bridging the Compute Divide to Mitigate Climate Risk
Kate Kallot (Amini), Geoffrey Levene (NVIDIA), Bob Pette (NVIDIA), Jim Nottingham (HP), Jonathan Reid (Barbados government)
Predicting CO2 Plume Migration in Carbon Storage Projects using Graph Neural Networks
Chung Shih (National Energy Technology Lab), Paul Holcomb (National Energy Technology Lab)
Building a Lower-Carbon Future With HPC and AI in Energy
Marc Spieler (NVIDIA), Charlie Fazzino (ExxonMobil), Shashi Menon (Schlumberger), Otavio Cirbelli Borges (Petrobras), Vibhor Aggarwal (Shell)
Energy-Efficient GPU Computing With Mixed-Precision Modeling for Climate/Weather Applications
Sameh Abdulah (KAUST), Hatem Ltaief (KAUST)
Early Science with Grace Hopper at Scale on Alps
Thomas Schulthess (ETH Zurich)
Genomic Analysis at Scale: Mapping Irregular Computations to Advanced Architectures
Kathy Yelick (Berkeley)
ML
Enterprise MLOps 101
William Benton (NVIDIA), Michael Balint (NVIDIA)
NVIDIA AI Workbench ; TensorRT-LLM
XGBoost is All You Need
Bojan Tunguz (NVIDIA)
NNs for tabular data are still unsolved. Tree-based methods work out of the box: they handle missing values and are not sensitive to outliers
Not good for predicting values outside the range of the training data.
What approach on what dataset?
< 100 points - stats
100-5,000 - linear/logistic regression
5,000 - 10,000 - GBM/SVM
10,000 - 1,000,000,000 - GBT
> 1,000,000,000 - NN
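The rule of thumb above can be written down directly (boundaries as given in the talk; like any heuristic, treat it as a starting point, not a rule):

```python
# The speaker's dataset-size heuristic as a lookup function.
def suggested_method(n_rows):
    if n_rows < 100:
        return "classical statistics"
    if n_rows < 5_000:
        return "linear/logistic regression"
    if n_rows < 10_000:
        return "GBM/SVM"
    if n_rows <= 1_000_000_000:
        return "gradient-boosted trees"
    return "neural networks"

print(suggested_method(50_000))  # -> gradient-boosted trees
```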
Mission: accessible accelerated computing (painless CPU -> multi-GPU)
https://github.com/tunguz/TabularBenchmarks/tree/main/datasets/Porto_Seguro
GPUTreeShap
Use XGBoost for unsupervised learning, e.g. t-SNE
LLMOps: The New Frontier of Machine Learning Operations
Nik Spirin (NVIDIA), Michael Balint (NVIDIA)
Navigating the Large Language Models Frontier: Practical Strategies for Building Enterprise Applications Powered by LLMs
Harrison Chase (LangChain), Jerry Liu (LlamaIndex), Arvind Jain (Glean), Farshad Saberi Movahed (NVIDIA), Joey Conway (NVIDIA), Jane Polak Scowcroft (NVIDIA)
Insights from Kaggle Grandmasters and Experts on Competitive AI and LLM Frontiers
David Austin (NVIDIA), Jiwei Liu (NVIDIA), Kazuki Onodera (NVIDIA), Chris Deotte (NVIDIA), Laura Leal-Taixe (NVIDIA)
Leveraging GPU-Efficient Vector Search for Large-Scale Ads Pipelines
Benjamin Karsin (NVIDIA), Arpan Jain (Microsoft)
Lightning Fast with Thunder, a New Extensible Deep Learning Compiler for PyTorch
Luca Antiga (Lightning AI), Thomas Viehmann (Lightning AI), Mike Ruberry (NVIDIA)
Large-Scale Production Deployment of RAG Pipelines
Mariem Bendris (NVIDIA)
Optimize Generative AI inference with Quantization in TensorRT-LLM and TensorRT
Zhiyu Cheng (NVIDIA), Asma Kuriparambil Thekkempate (NVIDIA)
Accelerated LLM Model Alignment and Deployment in NeMo, TensorRT-LLM, and Triton Inference Server
Bharat Giddwani (NVIDIA), Utkarsh Uppal (NVIDIA)
RAPIDS
RAPIDS in 2024: Accelerated Data Science Everywhere
Dante Gama Dessavre (NVIDIA), Nick Becker (NVIDIA)
NVIDIA AI Enterprise (developer tools, Cloud Native Management and Orchestration, Infrastructure Optimization)
Accelerated computing swim lanes (higher is easier to use, lower is maximum performance):
Zero code change (cudf.pandas, nx-cugraph, RAPIDS-spark, Array-API for scikit-learn)
Minimal changes (Pytorch, XGBoost, cuml-CPU)
GPU code (cuDF, cuML, cuGraph, cuVS, RMM, CuPy, Numba, OpenAI Triton)
Python/CUDA libraries (Hybrid Python/CUDA code) (CuPy RawKernels, Numba, Cython wrappers for CUDA)
C++/CUDA high level (RAFT, CCCL Cub, etc.)
CUDA Toolkit (cuBLAS, cuDNN, CuSolver, cuSPARSE)
cudf.pandas (see above)
https://colab.research.google.com/drive/12tCzP94zFG2BRduACucn5Q_OcX1TUKY3#scrollTo=KP0oc3PboQDv
Can accelerate stuff like ibis.set_backend("pandas")
nx-cugraph (see above)
https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.centrality.betweenness_centrality.html#networkx.algorithms.centrality.betweenness_centrality has a note that says:
Additional backends implement this function:
cugraph - GPU-accelerated backend (weight parameter is not yet supported).
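A hedged sketch of using that backend (assumes nx-cugraph is installed; the env var follows NetworkX 3.2's dispatch mechanism and the package name is the CUDA 12 build):

```shell
# Install the cugraph backend for NetworkX, then run an unchanged
# NetworkX script with automatic GPU dispatch of supported algorithms.
pip install nx-cugraph-cu12 --extra-index-url https://pypi.nvidia.com
NETWORKX_AUTOMATIC_BACKENDS=cugraph python my_script.py
```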
Numba CUDA
Shared memory across the ecosystem prevents unnecessary memory contention.
DLPack and the CUDA Array Interface. Zero copy between ecosystems, e.g. cudf and torch:
import torch
import cudf

s = cudf.Series([0, 1, 1, 2, 3])
tensor = torch.from_dlpack(s.to_dlpack())
# tensor([0, 1, 1, 2, 3], device='cuda:0')
cudf.Series(tensor)  # and back to cudf, still zero-copy
For dask you can do
dask.config.set({"dataframe.backend": "cudf"}), after which dask.dataframe behaves like dask_cudf
dask-expr with cudf for speed and reduced memory (WIP?)
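A sketch of that backend switch (assumes dask_cudf is installed on a GPU machine; the parquet path is a placeholder):

```python
import dask
import dask.dataframe as dd

# Route the dataframe backend to cudf so dask.dataframe partitions are
# cudf DataFrames instead of pandas ones (requires dask_cudf installed).
dask.config.set({"dataframe.backend": "cudf"})

ddf = dd.read_parquet("data/*.parquet")  # placeholder path; now GPU-backed
```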
RAPIDS Spark is easy to use - add the jar to the classpath and set the spark.plugins config
from spark_rapids_ml.clustering import KMeans
PySpark MLlib API, cuML MNMG classes, RAFT primitives and NCCL comms.
Offers speed up and cost savings
XGBoost on the rapidsai channel offers:
RMM-enabled build for efficient memory pool sharing.
Column-based split for federated learning (train next to the data, don't share the data) with GPU NVFlare.
Multi-target trees with vector-leaf outputs.
device parameter for GPU config.
approx tree method is now GPU accelerated.
Improved learning-to-rank.
Quantile regression; improved memory support.
Improved PySpark: significant GPU speed up over multi-node CPU.
UCX networking speed ups.
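Several of those features land as plain parameter changes; a hedged sketch using XGBoost 2.x-style parameter names (values illustrative):

```python
# Assumed XGBoost >= 2.0 parameter names; values here are illustrative.
params = {
    "device": "cuda",                  # the "device" parameter replaces old gpu_hist-style flags
    "tree_method": "approx",           # approx is now GPU-accelerated ("hist" also works)
    "objective": "reg:quantileerror",  # quantile regression objective
    "quantile_alpha": 0.5,             # which quantile to fit
}
# xgboost.train(params, dtrain) would then run training on the GPU.
```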
GNN libraries - cuGraph-DGL (extends https://docs.dgl.ai/index.html), cuGraph-PyG (extends https://pyg.org/ ), model and storage backends (cuGraph-ops/pylibcugraphops - accelerated GNN fwd/bwd layer kernels), WholeGraph (distributed feature/KV store), cuGraph equivariant
Vector Search - cuVS. Algorithms support - CAGRA. Built on top of RAPIDS RAFT. Distances (e.g. pairwise distances), Cluster (e.g. K-means)
NeMo Curator - data mining modules for training LLMs
Installation and packaging - Devcontainers for devs wanting to contribute.
NVDashboard - like dask dashboard for GPU
NVIDIA Nsight Systems - Profile GPU code
NVIDIA Tools Extension (NVTX) - use @nvtx.annotate to get GPU profiling
Get started with NVIDIA AI workbench - Streamlined Setup, start locally and scale to a data center,
NVIDIA LaunchPad - for enterprises
RAPIDS (ecosystem) vs legate (distributed run time).
Accelerating Pandas with Zero Code Change using RAPIDS cuDF
Ashwin Srinath (NVIDIA)
Existing pandas code can run orders of magnitude faster on the GPU with zero effort
10-100 x faster than pandas.
Built on the libcudf C++/CUDA library
%load_ext cudf.pandas or python -m cudf.pandas script.py
Pandas is everywhere. ChatGPT defaults to returning pandas code if asked a Python question.
Alternatives: Polars, DuckDB, Modin, spark, dask, rapids, xorbits
Supports 100% of the pandas API, falling back to CPU where needed
Proxy objects can be passed to third-party libraries, e.g. seaborn
Imports a proxy module with proxy types and functions which are dispatched to cuDF or pandas
>100x faster joins, 40x faster groupbys on the H2O.ai benchmark
Tools like LangChain and PandasAI are used to help generate pandas code; %load_ext cudf.pandas can accelerate agent evaluation 4-10x.
cpu env:
uv pip install ipython langchain langchain-experimental langchain-openai pandas tabulate
gpu env:
uv pip install --extra-index-url=https://pypi.nvidia.com cudf-cu12==24.2.* ipython langchain langchain-experimental langchain-openai tabulate
#%load_ext cudf.pandas
import pandas as pd
from langchain_experimental.agents.agent_toolkits import create_pandas_dataframe_agent
from langchain_openai import OpenAI
df = pd.read_csv("https://raw.githubusercontent.com/pandas-dev/pandas/main/doc/data/titanic.csv")
agent = create_pandas_dataframe_agent(OpenAI(temperature=0), df, verbose=True)
expected = int(agent.invoke("how many rows are there?")["output"])
actual = len(df)
df1 = df.copy()
df1["Age"] = df1["Age"].fillna(df1["Age"].mean())
agent = create_pandas_dataframe_agent(OpenAI(temperature=0), [df, df1], verbose=True)
agent.invoke("how many rows in the age column are different?")
When to use cudf.pandas vs. cudf?
cudf.pandas if you have pandas code and want to run it on the GPU now
cudf if you don't want CPU fallback (expensive) and want faster algorithms or API extensions beyond pandas.
Every cudf PR runs against the pandas test suite (latest release, e.g. 2.2.1), currently passing 94%. Can also turn on cudf.set_option('mode.pandas_compatible', True), which e.g. keeps ordering the same as pandas at a small slowdown.
Use %%cudf.pandas.profile (or from cudf.pandas import Profiler; with Profiler() as p: ...) to see what's running on the GPU vs. the CPU; there is also %%cudf.pandas.line_profile. E.g. indexer_between_time is not supported in cudf.
GPU memory is limited use sparingly.
Where possible use built-in API over a custom UDF.
Accelerating NetworkX: The Future of Easy Graph Analytics
Mridul Seth (NetworkX), Rick Ratzel (NVIDIA)
All data can be put into a graph if you try hard enough :)
NetworkX is everywhere. It's THE graph library and ChatGPT will suggest code for it.
NetworkX is mostly a dictionary of dictionaries. Struggles at million-node scale.
User-facing API + pluggable backends (dispatching and conversions):
nx-cugraph
graphblas-algorithms (CPU + OpenMP; GPU coming soon)
nx-parallel (joblib)
scipy.sparse (possibility)
nx-cugraph code runs against NetworkX tests on every PR and passes.
Customizable pluggable back-ends. Reminds me of xarray's open_dataset.
nx-cugraph has 60 graph algorithms, 42 accelerated graph generators
U.S. patent dataset has 3.7M nodes and 16.5M edges.
Next steps:
NetworkX will get a configuration API, introspection and logging, and more dispatchable APIs
nx-cugraph will add more algorithms and multi-GPU support for NetworkX calls
Large-Scale Graph GNN Training Accelerated With cuGraph
Joe Eaton (NVIDIA)
Used at:
Node-level (entity classification, recommenders, part-of-speech)
Edge-level (relationship classification, fraud detection)
Graph-level (molecular reactivity)
Dense embedding vector representation of sparse data
100TB graphs
Property Graph Model
Graph-as-a-service (cuGraph)
Multi-threads multiple-GPUs
DGL (https://www.dgl.ai/) and PyG (https://www.pyg.org/)
See structure of a GNN at https://www.cse.ust.hk/~yqsong/papers/2018-WWW-Text-GraphCNN.pdf
WholeGraph helps with SubGraph sampling (cuGraph-ops, distributed Pytorch) - host and device storage. Efficient memory
What's next? - easier to deploy, customer features, move code to C++ layer,
Integrated containers https://catalog.ngc.nvidia.com/orgs/nvidia/containers/dgl https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pyg
More cuGraph Features
Getting Started with Large-Scale GNNs using cuGraph Packages for DGL and PyG
Alexandria Barghi (NVIDIA), Vibhu Jawa (NVIDIA)
WORKSHOP.
Accelerate and Scale your Graph Analytics with RAPIDS CuGraph and HPC SDKs
Oded Green (NVIDIA), Chuck Hastings (NVIDIA), Seunghwa Kang (NVIDIA), Rick Ratzel (NVIDIA), Brad Rees (NVIDIA), Erik Welch (NVIDIA)
IN PERSON.
Reducing the Cost of your Data Science Workloads on the Cloud
Jacob Tomlinson (NVIDIA)
cudf.pandas. Pandas is single-threaded. Not a query engine.
Alternatives: Faster underlying implementation (C++, Rust, CUDA), query engines, SQL-inspired, distributed computing, hardware accelerated (GPUs).
cudf.pandas covers 100% of the pandas API. Falls back to CPU if the GPU path doesn't work.
Deploy methods: shared node: e.g. Triton Inference Server and the Forest Inference Library ; Single Node e.g. a GPU; Multi Node e.g. Dask and Spark.
Deploy on managed notebook platforms, e.g. SageMaker
Deploy on compute pipelines e.g. Amazon EMR, Cloud Dataproc, Azure Databricks
Virtual machines: EC2
Why reduce cost? $, competitive advantage, reduced context switching, faster development, environmental impact, improved accuracy
20x cheaper on GPU than CPU. https://blogs.nvidia.com/blog/spark-rapids-energy-efficiency/
Autoscaling can save costs https://docs.rapids.ai/deployment/stable/examples/rapids-autoscaling-multi-tenant-kubernetes/notebook/
How to deploy GPU Data Science Workloads on the Cloud
Taurean Dyer (NVIDIA), Sheilah Kirui (NVIDIA), Mike McCarty (NVIDIA), Jacob Tomlinson (NVIDIA)
IN PERSON.
More Data, Faster: GPU Memory Management Best Practices in Python and C++
Mark Harris (NVIDIA)
88% of cuDF time is spent in CUDA memory management (cudaMalloc/cudaFree)
RMM - a common interface for customizing device/host memory allocation, a collection of implementations of that interface, and data containers that use the interface for their memory allocation
NVIDIA Morpheus Digital Fingerprinting Inference
Perform High-Efficiency Search, Improve Data Freshness, and Increase Recall With GPU-Accelerated Vector Search and RAG Workflows
Charles Xie (Zilliz), Corey Nolet (NVIDIA)
What is a Vector Database? Stores embeddings and enables semantic search across different types of unstructured data.
Milvus is an example
Retrieval-Augmented Generation (RAG) - a technique that combines the strengths of retrieval-based and generative models (improved accuracy and relevance, private/domain-specific knowledge, reduced hallucination). Challenges: data freshness and throughput.
Indexes with high throughput often require more time to construct.
GPU powered Milvus. Many X speed up.
cuVS - Vector Search (Brute force, CAGRA); Distance, Cluster built on top of RAFT (High Performance Machine Learning Primitives)
Unlock the Full Potential of your AI Workflows With Dataiku and NVIDIA RAPIDS
Not much to say. You can run RAPIDS on Dataiku (a workbench tool).
Accelerating Data Analytics on GPUs with the RAPIDS Accelerator for Apache Spark
Matt Ahrens (NVIDIA)
IN PERSON.
RAPIDS Accelerator for Apache Spark Propels Data Center Efficiency and Cost Savings
Eyal Hirsch (Taboola)
Disk IO bottlenecks
Create issues along the way
Accelerate ETL and Machine Learning in Apache Spark
Sameer Raheja (NVIDIA), Erik Ordentlich (NVIDIA)
221 ZB of data by 2026
RAPIDS Spark requires zero code changes. Add the jar to the classpath and set the spark.plugins config (com.nvidia.spark.SQLPlugin)
https://github.com/nvidia/spark-rapids-benchmarks
https://github.com/NVIDIA/spark-rapids-tools
Improved reliability with spill framework, improved IO, JSON handling, scaling to 100s of TB
Roadmap: Support Apache Iceberg, IO from cloud storage
https://github.com/NVIDIA/spark-rapids-ml
Package import change. Uses cuML MNMG classes / RAFT NCCL communication
Can use on databricks https://github.com/NVIDIA/spark-rapids-ml/tree/main/python/benchmark
Training/fit time is 6 - 100x faster
How PayPal Reduced Cloud Costs by up to 70% With Spark RAPIDS
Ilaay Chen (PayPal)
PayPal has 430+ million users, 25+ billion transactions every year
Fraud Detection; Recommendation systems, Risk, Credit, Customer Support
Find sweet spot of spark.rapids.sql.concurrentGpuTasks
Reduced machine count 140 -> 30, increased disk count per node 4 -> 8
Everything, All at Once: Processing Spatial Transcriptomics Data using Accelerated Computing
Jonny Hancox (NVIDIA)
IN PERSON
Compute
Breaking Down The Wall: Accelerator-Native Now (Presented by Voltron Data)
Rodrigo Aramburu (Voltron Data)
Data systems are hitting a wall: it takes 10x more effort to preprocess data for AI/ML than to train. Issues with interoperability, speed, and scale.
TPC-H 10 TB benchmark - the speed increase flattens out after 100 nodes
Theseus is 72x faster, 71x cheaper and requires 100x fewer servers.
Lots of spilling
Accelerator-native: GPU as a co-processor (CPU and GPU work together, but shipping data over PCIe is slow)
GPU as a core processor: multiple processes on the same data, but transfers between nodes are slow.
Accelerate the full system: heterogeneous compute, shared memory, IO, networking, GPU Direct Storage
Advances in Optimization AI
Alex Fender (NVIDIA)
cuOpt - vehicle routing optimization. >15k tasks at once. Objectives, constraints, and variants.
Available as an API. Beats Best Known Solutions (BKS).
Linear programming.
cuOpt agent e.g. use in an LLM.
CUDA: New Features and Beyond
Stephen Jones (NVIDIA)
How To Write A CUDA Program: The Ninja Edition
Stephen Jones (NVIDIA)
Multi GPU Programming Models for HPC and AI
Jiri Kraus (NVIDIA)
https://github.com/NVIDIA/multi-gpu-programming-models
MPI, NVSHMEM, NCCL.
Product
From Netflix Recommendations to Conversational Multi-agents: The (R)Evolution of AI-Driven Product Innovation
Xavier Amatriain (Google)
The past: data- and algorithm-driven product innovation
Yesterday: ML -> Deep Learning
Today: Product innovation in the Age of Gen AI
Explanations (and UX) matter
Multi-stage systems
GPT = Generative (generate new content based on a natural-language input) + Pretrained (trained on huge internet-scale datasets; can learn on the fly and be fine-tuned) + Transformer (DL architecture using attention)
LLM recommendations, e.g. "I love these movies, can you suggest others?", "Ask me yes/no questions and I'll recommend artists for you"
Gen AI products are similar to old AI: importance of product and UX design, importance of evaluation and metrics, importance of domain knowledge
Gen AI products differ from old AI: the UX is the AI; new eval metrics and frameworks (e.g. reducing hallucinations); less domain knowledge needed
Agent = an LLM-based system that has access to tools and can decide how to use them
Unlock AI’s Potential: Best Practices for Business-Led Digital Roadmaps and Implementation Challenges
Anne Hecht (NVIDIA), Stefan Goebel (SAP), Giovanni Di Napoli (Medtronic), Stefano Pasquali (BlackRock), Albert Greenberg (Uber)
GenAI value streams at BlackRock - Client Experience (Help Chat, Navigation, Visualization)
Investments (Summarization, Trading and investment signals)
Productivity (translation, coding assistant)
Medtronic - Patient care. Model building and deploying via NVIDIA GPUs
SAP - supply chain management, human capital management, spend management, customer relationship management, business technology platform
Uber - Docker fix tool agent; does work that would take a developer 20 hours