source: https://www.statnews.com
https://x.com/jermdemo/status/1816135124326711388
https://x.com/OmicsOmicsBlog/status/1815755567626850557
We've tried/proposed:
- handholding
- 20% budget
epiTerra Extension: https://chromewebstore.google.com/search/epiTerra
chrome://extensions/?id=gfnefhciljgadngmaongkndnlhcaenpc
Compare with https://portal.isb-cgc.org ... governance sits at the API bearer-token level, plus well-supported BigQuery SDKs
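The governance comparison is concrete at the SDK level: ISB-CGC access rides on standard Google Cloud auth, so the bearer token comes from Application Default Credentials and queries go through the stock BigQuery client. A minimal sketch in Python; the table path and `your-gcp-project` are illustrative placeholders, not confirmed names:

```python
# Minimal sketch: querying an ISB-CGC public BigQuery table with the
# official google-cloud-bigquery SDK. Auth (the OAuth bearer token) comes
# from Application Default Credentials, e.g. after running
# `gcloud auth application-default login`.
from google.cloud import bigquery

# The billing project is yours; ISB-CGC tables live in their public project.
client = bigquery.Client(project="your-gcp-project")  # placeholder project ID

query = """
    SELECT project_short_name, COUNT(*) AS n_cases
    FROM `isb-cgc-bq.TCGA.clinical_gdc_current`  -- illustrative table path
    GROUP BY project_short_name
    ORDER BY n_cases DESC
    LIMIT 10
"""
for row in client.query(query).result():
    print(row.project_short_name, row.n_cases)
```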
API
Gemini Vertex AI: https://cloud.google.com/vertex-ai/generative-ai/docs/start/quickstarts/quickstart-multimodal#gemini-text-only-samples-drest
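For reference, the quickstart above boils down to a few lines with the Vertex AI Python SDK. A sketch, assuming a recent `google-cloud-aiplatform` release; the project ID, region, and model name are placeholders:

```python
# Minimal text-only Gemini call via the Vertex AI SDK, per the quickstart.
import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project="your-gcp-project", location="us-central1")  # placeholders

model = GenerativeModel("gemini-1.5-flash")  # any Gemini model enabled for your project
response = model.generate_content("Explain WebGPU in two sentences.")
print(response.text)
```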
OpenAI: https://cloud4bio.github.io/
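And the equivalent call through the OpenAI API, for comparison. The model name is a placeholder; the key is read from the `OPENAI_API_KEY` environment variable:

```python
# Minimal OpenAI chat completion, mirroring the Vertex AI sketch above.
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from the environment
resp = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; swap in whichever model you're evaluating
    messages=[{"role": "user", "content": "Explain WebGPU in two sentences."}],
)
print(resp.choices[0].message.content)
```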
local Llama? - https://gemini.google.com/app/aed4b306c8ac30c9
Hosting the Llama model entirely in the browser is still an emerging area, but here are the current approaches and resources:
- Thiggle: This project (https://matt-rickard.com/llama-2-in-the-browser) uses WebGPU to run Llama models in the browser. However, it's limited to smaller models due to current hardware constraints.
- Quantization: You can quantize Llama models to shrink them significantly, which might make them loadable in the browser, though possibly at some cost in model quality.
- Ollama: This project hosts the model locally and lets you interact with it through a browser interface. It provides an OpenAI-compatible API, which makes integration straightforward (see the sketch after this list).
- Model API Gateway (Thiggle): This service lets you compare different models (including Llama 2) in the cloud. It isn't purely in-browser, but it's a convenient way to experiment without local setup.
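The Ollama item is the easiest of these to demo: because it serves an OpenAI-compatible endpoint (by default at http://localhost:11434/v1), the same OpenAI client shown earlier works against a local Llama model. A sketch, assuming you've already run `ollama pull llama3`:

```python
# Point the OpenAI client at a local Ollama server instead of api.openai.com.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
    api_key="ollama",  # required by the client but ignored by Ollama
)
resp = client.chat.completions.create(
    model="llama3",  # placeholder; any model you've pulled locally
    messages=[{"role": "user", "content": "Can Llama run in the browser?"}],
)
print(resp.choices[0].message.content)
```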
Additional Considerations:
- Hardware requirements: Running LLMs in-browser is computationally intensive and requires powerful devices with good GPU support.
- Browser compatibility: WebGPU is still a developing standard, and support varies across browsers.
- Check the subreddit r/LocalLLaMA for discussions and updates on running LLMs locally.
- Look into the llama.cpp project, which might offer insights into browser-based deployment.
- Follow developments in WebGPU and browser technologies, as they could enable more efficient in-browser LLM hosting.
Model Garden: https://cloud.google.com/model-garden
[Lorena - LLM cost control] https://medium.com/google-cloud/how-to-reduce-your-ml-model-inference-costs-on-google-cloud-e3d5e043980f
Heartbeat API (for reference)