Outside the chatbot AI, I record what the user says to a WAV file and use Whisper to convert it to text.
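A minimal sketch of that capture step. The microphone recording is stubbed out here with the stdlib `wave` module writing a second of silence (a real build would capture audio with something like the `sounddevice` package), and the actual Whisper transcription call is shown as a comment since it needs the `openai-whisper` package and a downloaded model:

```python
import wave

def save_wav(path, pcm_bytes, sample_rate=16000):
    """Write raw 16-bit mono PCM to a WAV file Whisper can read."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)        # mono
        wf.setsampwidth(2)        # 16-bit samples
        wf.setframerate(sample_rate)
        wf.writeframes(pcm_bytes)

# Stand-in for real microphone capture: one second of silence.
save_wav("input.wav", b"\x00\x00" * 16000)

# The actual transcription step (requires the openai-whisper package):
#   import whisper
#   model = whisper.load_model("base")
#   text = model.transcribe("input.wav")["text"]
```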
That transcript is then sent to OpenAI along with a system role, and we receive its result back as text.
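The shape of that request can be sketched like this. The helper name, the example prompts, and the model name in the comment are my own placeholders, not the project's actual values; the live API call is left as a comment because it needs the `openai` package and an API key:

```python
def build_messages(system_prompt: str, user_text: str) -> list[dict]:
    """Assemble the chat payload: the system role steers the
    assistant's persona, the user role carries the transcript."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_text},
    ]

messages = build_messages(
    "You are a friendly voice assistant.",   # hypothetical system prompt
    "What's the weather like on Mars?",      # hypothetical transcript
)

# The real request would look roughly like:
#   from openai import OpenAI
#   client = OpenAI()
#   reply = client.chat.completions.create(
#       model="gpt-3.5-turbo", messages=messages
#   ).choices[0].message.content
```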
The text is then handed to ElevenLabs to be given a voice. The voice is highly customizable by the ElevenLabs account holder, and I'm currently working on letting the user of the program configure it themselves.
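A sketch of what that ElevenLabs call involves. The endpoint path, `xi-api-key` header, and `voice_settings` fields reflect ElevenLabs' public REST API as I understand it; the voice ID and key here are placeholders, and the request is only built, not sent:

```python
def build_tts_request(text: str, voice_id: str, api_key: str) -> dict:
    """Shape the ElevenLabs text-to-speech request; the voice_settings
    knobs are what a user-facing config screen would eventually expose."""
    return {
        "url": f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}",
        "headers": {
            "xi-api-key": api_key,           # account API key
            "Content-Type": "application/json",
        },
        "json": {
            "text": text,
            # Per-request voice tuning (placeholder values):
            "voice_settings": {"stability": 0.5, "similarity_boost": 0.75},
        },
    }

req = build_tts_request("Hello there!", "VOICE_ID", "MY_KEY")
# Sending it would be roughly:
#   requests.post(req["url"], headers=req["headers"], json=req["json"])
```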
Rather simple: I connected to OpenAI and used their models with system prompts and a role description. The role description is another thing I'm working on letting the user configure, either by typing it out themselves or by choosing between presets that I'd make.