AI training data comes from public databases, the internet, company records, crowdsourced contributions, and even synthetic (artificially generated) data.
OpenAI and Google have both released previews of their new AI systems, GPT-4o and Project Astra, which are expected to roll out soon. These AI assistants demonstrate multimodality: they can hear, speak, and see with very low latency.
A concerning issue with artificial intelligence is the generation of information that is not actually present in its training data. When the training data lacks the relevant information, an AI may hallucinate, producing plausible-sounding but incorrect answers. Although this is less common in traditional voice assistants, whose responses are limited and pre-programmed, the growing use of generative AI in this segment poses a challenge. A key implication is that users should be able to verify and independently research the information these systems provide.
Widespread use of voice recognition in apps like ChatGPT can be less accessible for people who do not speak with mainstream accents. When LLMs are trained primarily on English, their performance in other languages may be subpar or nonexistent, limiting the accessibility and usability of these models for non-English speakers and creating a barrier to effective communication and information access. Additionally, synthetic voices carry social and cultural biases that influence users' perceptions and expectations, an area requiring further research (Heaven, 2023).
"[W}ith indigenous languages, it's a huge gap in the science and most languages in the world are highly polysynthetic. It's just a small minority that dominates the technological space, which is basically colonization, the West. Western Europe, North America, they dominate technology. They only talk to themselves, and they don't see the rest of the population in the world. And there's this fundamental science beyond linguistics that's not being conducted because we're only addressing the problems of the affluent at 95%." (Kentbye, 2023)
The risks of LLMs that power AI conversational agents include spreading misinformation and false narratives. In this video, Phaedra Boinodiris explains four areas of risk mitigation for LLMs: hallucinations (misinformation), bias, consent, and security.
Mitigating the risk of falsehoods involves explainability: providing real data and data lineage so that the model's reasoning can be traced and understood. Bias can be present in LLM outputs, and addressing this risk requires cultural awareness, diverse teams, and regular audits. Consent-related risks can be mitigated through auditing and accountability. Security risks include the potential misuse of LLMs for malicious tasks.
Education is crucial for understanding AI's strengths and weaknesses and for curating it responsibly, including accounting for its environmental impact and the need for safeguards.