This document describes how to use the AI Agent in the E-Learning Lab, an interactive, intelligent AI agent that can be connected to various large language models like GPT-4, Claude and Gemini. You can customize the agent's personality, use speech recognition, and leverage high-quality text-to-speech models.
How to Run:
After adding your API keys and completing the initial setup (see below), you can toggle on "AI Agent Assistance" on the Options tab, and toggle on "Speech Recognition" if you want to speak with the agent by voice. For customized prompts per slide, use the "Prompts" tab of the Assets section to add prompts, then right click and choose "Add to Scene" to add one to the current scene.
Interaction
Hold either the '2' key or the right-hand "B" button to start speaking; release to stop, and the AI agent will respond. If INTERUPT_AI is True, you can press '2' on Desktop or the right-hand "B" button in VR to interrupt and speak again.
If SPEECH_RECOGNITION is set to False, press '2' to bring up the chat window instead.
To stop the conversation, type "q" and click "OK" in the text chat, or say "exit" if using speech recognition (press '2' again to bring the chat window back).
Placing in the Scene
To place your Agent in your scene, open your slide in Inspector and use File-Add to place the 'START_POINT_AI_AGENT' (from scene_objects), then use the move and rotate tools to place the object where you want your AI Agent in the scene.
Key Features
Interact and converse with custom AI Large Language Models in real-time VR or XR simulations.
Choose from OpenAI models (including GPT-4, custom GPTs), Anthropic models (like Claude 3 Opus or 3.5 Sonnet) or Gemini. Requires an API key.
Modify avatar appearance, animations, environment, and more. Works with most avatar libraries (Avaturn, ReadyPlayerMe, Mixamo, Rocketbox, Reallusion, etc.).
Customize the agent's personality, contextual awareness, emotional state, interactions, and more. Save your creations as custom agents.
Use speech recognition to converse using your voice or text-based input.
Choose from high-quality voices from OpenAI TTS or ElevenLabs (requires an API key).
Train the agent as it adapts using conversation history and interactions.
Installation
Ensure you have the required libraries installed using the Vizard Package Manager. These include:
openai (for OpenAI GPT agents)
anthropic (for Anthropic Claude agent)
google-generativeai for Gemini
elevenlabs (for ElevenLabs text-to-speech)
SpeechRecognition
pyaudio
python-vlc
You also need to install the VLC media player (used for OpenAI TTS playback); version 3.0.20 or later appears to be required.
For ElevenLabs you may also need to install ffmpeg and the mpv player (see Issues and Troubleshooting below).
Note: Requires an active internet connection
Recent versions may also require these libraries:
numpy
sounddevice
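After installing the packages, a small sanity-check sketch like the following can confirm they all import before you launch the agent. Note that a few import names differ from the package names (this list of names is an assumption based on the packages above):

```python
# Sketch: check that the agent's dependencies are importable before launching.
# Import names that differ from package names: SpeechRecognition ->
# speech_recognition, python-vlc -> vlc, google-generativeai -> google.generativeai.
import importlib

REQUIRED = [
    "openai", "anthropic", "google.generativeai", "elevenlabs",
    "speech_recognition", "pyaudio", "vlc", "numpy", "sounddevice",
]

def missing_packages(names):
    """Return the module names that fail to import."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing

if __name__ == "__main__":
    gaps = missing_packages(REQUIRED)
    print("Missing:", gaps if gaps else "none")
```

Run this from the Vizard script editor; any names it reports as missing can be installed through the Package Manager.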
API Keys
Obtain API keys from OpenAI (if using ChatGPT), Anthropic (if using a Claude model), and ElevenLabs (if using ElevenLabs instead of OpenAI's TTS). See below for specific information on obtaining API keys.
In the Windows search bar, type "cmd" to open a command prompt, then set the keys you need:
setx OPENAI_API_KEY "your-api-key"
setx GEMINI_API_KEY "your-api-key"
setx ELEVENLABS_API_KEY "your-api-key"
setx ANTHROPIC_API_KEY "your-api-key"
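At runtime, the keys set with setx can be read back from the environment. A minimal hedged sketch (the helper name is hypothetical; keep in mind that setx only affects newly started processes, so restart Vizard after setting a key):

```python
# Sketch: read an API key set with `setx` from the environment at runtime.
# Note: `setx` affects new processes only, so restart Vizard after setting keys.
import os

def get_api_key(name):
    """Fetch an API key from the environment, failing loudly if it is missing."""
    key = os.environ.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f'{name} is not set; run: setx {name} "your-api-key" '
            "in a cmd window, then restart Vizard."
        )
    return key
```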
Adding Custom Prompts
Select the "Prompts" tab, add a new text file with your prompt or select an existing one. Right click and select "Add to Scene"
Configuration
Open the AI_Agent_Config_Education.py script (in the configs folder at E-Learning Lab\AI_Enabled\AI_Agent\configs) and configure the following options. To keep multiple configurations, add new config files and update the import at the top of the AI_Agent.py script accordingly.
AI_MODEL: Choose between 'CHAT_GPT', 'CLAUDE' and 'GEMINI'.
OPENAI_MODEL: Specify the OpenAI model name (e.g., "gpt-4o").
ANTHROPIC_MODEL: Specify the Anthropic model name (e.g., "claude-3-5-sonnet-20240620")
MAX_TOKENS: The maximum number of tokens each response can use. Set it higher for longer responses (the limit for most models is 4096; gpt-4 allows 8192).
SPEECH_MODEL: Choose OpenAI TTS or ElevenLabs.
ELEVEN_LABS_VOICE: Choose the voice for ElevenLabs.
OPEN_AI_VOICE: Choose the voice for OpenAI TTS.
AVATAR_MODEL: Add avatar model to use. Use your own or find some in sightlab_resources/avatar/full_body
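As an illustration, a custom config file might look like the sketch below. The option names mirror the list above, but the specific values (voice names, the SPEECH_MODEL identifier, and the avatar path) are assumptions; check your AI_Agent_Config_Education.py for the exact names and values it expects.

```python
# Hypothetical config sketch; verify names/values against your actual config file.
AI_MODEL = "CHAT_GPT"                 # or "CLAUDE" or "GEMINI"
OPENAI_MODEL = "gpt-4o"
ANTHROPIC_MODEL = "claude-3-5-sonnet-20240620"
MAX_TOKENS = 1024                     # per-response token cap
SPEECH_MODEL = "OPENAI_TTS"           # assumed identifier; or the ElevenLabs option
ELEVEN_LABS_VOICE = "Rachel"          # example ElevenLabs voice name
OPEN_AI_VOICE = "alloy"               # one of OpenAI's built-in TTS voices
AVATAR_MODEL = "avatar/full_body/my_avatar.gltf"  # placeholder path to your avatar
```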
Obtaining API Keys
To use certain features of the AI Agent, you'll need to obtain API keys from the following services:
OpenAI (for ChatGPT and Open AI Text to Speech):
Visit the OpenAI website (not the ChatGPT login page): https://openai.com/
Sign up for an account if you don't have one, or log in if you already do. You may also need to buy some credits, but not many; usage will most likely not exceed a few dollars a month.
Navigate to the API section of your account.
Click "Create a new secret key" and copy the key.
You may need to buy a certain amount of credits, but even $5 should be sufficient. Go to "Usage" to increase your usage limit if needed.
You can specify whichever models are available to your account in the config file.
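With the key in place, a minimal hedged sketch of an OpenAI chat call looks like the following (openai>=1.0 client style; the model name and system prompt are examples, not values from this lab's code):

```python
# Sketch: build the messages payload the OpenAI chat endpoint expects.
def build_messages(system_prompt, user_text):
    """Pair a system (personality) prompt with the user's utterance."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_text},
    ]

# Usage (requires the openai package and OPENAI_API_KEY in the environment):
# from openai import OpenAI
# client = OpenAI()  # reads OPENAI_API_KEY automatically
# resp = client.chat.completions.create(
#     model="gpt-4o",
#     max_tokens=1024,
#     messages=build_messages("I am a helpful lab assistant.", "Hello!"),
# )
# print(resp.choices[0].message.content)
```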
Eleven Labs (for ElevenLabs Text-to-Speech):
Log in to your ElevenLabs account: https://elevenlabs.io/
Click your profile icon in the top-right corner.
Click the eye icon next to the "API Key" field.
Copy your API key.
Paste the copied key into a text file named "elevenlabs_key.txt" and place it in your root SightLab folder.
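A small sketch of how a script might pick up that key, reading elevenlabs_key.txt first and falling back to the ELEVENLABS_API_KEY environment variable (the function name is hypothetical; the actual SightLab loader may differ):

```python
# Sketch: load the ElevenLabs key from elevenlabs_key.txt in the SightLab root,
# falling back to the ELEVENLABS_API_KEY environment variable.
import os

def load_elevenlabs_key(path="elevenlabs_key.txt"):
    """Return the key from the text file if present, else from the environment."""
    if os.path.exists(path):
        with open(path) as f:
            return f.read().strip()
    return os.environ.get("ELEVENLABS_API_KEY")
```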
Anthropic API:
Go to the Anthropic Console login page (https://console.anthropic.com/) and either sign up or log in.
Fill out the sign-up form with your email address and other required information. You may need to provide details about your intended use case for the API.
After submitting the form, you should receive a confirmation email. Follow the instructions in the email to verify your account.
Once your account is verified, log in to the Anthropic website using your credentials.
Navigate to the API section of your account dashboard and create a new API key.
Gemini and Gemini Ultra:
Install with pip install -q -U google-generativeai in the Package Manager command line.
Refer to Google AI Python Quickstart for setup details.
Set GEMINI_API_KEY as described in the API Keys section above (setx GEMINI_API_KEY "your-api-key" in a cmd window).
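A minimal hedged sketch of a Gemini text call, following the google-generativeai quickstart style (the helper function and model names here are illustrative assumptions; flash models tend to have a higher free-tier quota, which is relevant to the quota error noted under Issues and Troubleshooting):

```python
# Sketch: pick a Gemini model name; flash has a higher free-tier quota.
def gemini_model_name(prefer_flash=True):
    """Return a Gemini model identifier (names are examples)."""
    return "gemini-1.5-flash-latest" if prefer_flash else "gemini-1.5-pro-latest"

# Usage (requires google-generativeai and GEMINI_API_KEY in the environment):
# import os
# import google.generativeai as genai
# genai.configure(api_key=os.environ["GEMINI_API_KEY"])
# model = genai.GenerativeModel(gemini_model_name())
# print(model.generate_content("Hello").text)
```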
Additional Information:
For prompts, wrap the GPT prompt in "" quotation marks and phrase the agent configuration as "I am...". Anthropic prompts do not need quotes and can use "You are...".
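For example, following the convention above, the two styles side by side (the wording is purely illustrative):

```python
# Illustrative prompt strings per the convention above.
# GPT prompts are wrapped in quotation marks and phrased "I am...";
# Anthropic prompts are unquoted and phrased "You are...".
GPT_PROMPT = '"I am a friendly physics tutor who keeps answers to two sentences."'
ANTHROPIC_PROMPT = "You are a friendly physics tutor who keeps answers to two sentences."
```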
For elevenlabs, refer to the ElevenLabs Python documentation for more details: https://github.com/elevenlabs.
You can connect "Assistants" through the openai API, but not custom GPTs.
Issues and Troubleshooting
There may be an error if your microphone is set to your VR headset while the sound output device is set to something other than the headset.
You may see an error if you are using the free version of ElevenLabs and exceed the 10,000-character limit (paid accounts get larger quotas).
ffplay error with ElevenLabs: you may need to install ffmpeg and add it to the Vizard environment path: https://www.gyan.dev/ffmpeg/builds/
mpv player error with ElevenLabs: you may need to install mpv and add it to the Vizard environment path: https://mpv.io/installation/
If you get an "out of quota" error with Gemini, try a model with more quota, such as gemini-1.5-flash-latest, or enable billing for much higher limits.
Tips
Environment Awareness: To give your agent an understanding of the environment, use the Gemini model with built-in vision, or take a screenshot (the '/' key in SightLab) and use ChatGPT online to generate a description; include that description in the prompt.