When generative AI first hit mass awareness, a typical business leader's knowledge came mostly from marketing materials and breathless news coverage. Tangible experience (if any) was limited to messing around with ChatGPT and DALL-E. Now that the dust has settled, the business community has a more refined understanding of AI-powered solutions.
Source: ibm.com
That said, the ambition of state-of-the-art generative AI is growing. The next wave of advancements will focus not only on enhancing performance within a specific domain, but on multimodal models that can take multiple types of data as input. While models that operate across different data modalities are not a strictly new phenomenon (text-to-image models like CLIP and speech-to-text models like Wav2Vec have been around for years now), they've typically only operated in one direction and were trained to accomplish a specific task.
In domain-specific models, particularly LLMs, we've likely reached the point of diminishing returns from larger parameter counts. Sam Altman, CEO of OpenAI (whose GPT-4 model is rumored to have around 1.76 trillion parameters), suggested as much at MIT's Imagination in Action event last April: “I think we're at the end of the era where it's going to be these giant models, and we'll make them better in other ways,” he predicted. “I think there's been way too much focus on parameter count.” Massive models jumpstarted this ongoing AI golden age, but they're not without drawbacks. Only the very largest companies have the funds and server space to train and maintain energy-hungry models with hundreds of billions of parameters. According to one estimate from the University of Washington, training a single GPT-3-sized model requires the yearly electricity consumption of over 1,000 households; a standard day of ChatGPT queries rivals the daily energy consumption of 33,000 U.S. households.
“The big companies (and more of them) are all trying to bring AI capabilities in-house, and there is a bit of a run on GPUs,” says James Landay, Vice-Director and Faculty Director of Research, Stanford HAI. “This will create a huge pressure not only for increased GPU production, but for innovators to come up with hardware solutions that are cheaper and easier to make and use.”1 As a late 2023 O'Reilly report explains, cloud providers currently bear much of the computing burden: relatively few AI adopters maintain their own infrastructure, and hardware shortages will only elevate the hurdles and costs of setting up on-premise servers. In the long term, this may put upward pressure on cloud costs as providers update and optimize their own infrastructure to effectively meet demand from generative AI.
Many key advancements have been (and will continue to be) driven not just by new foundation models, but by new techniques and resources (like open source datasets) for training, tweaking, fine-tuning or aligning pre-trained models. Notable model-agnostic techniques that took hold in 2023 include parameter-efficient fine-tuning methods like Low-Rank Adaptation (LoRA), quantization, and alignment approaches like direct preference optimization (DPO).
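To make one of these model-agnostic techniques concrete, here is a minimal sketch of the core idea behind LoRA (Low-Rank Adaptation): instead of updating a large weight matrix W during fine-tuning, you train two small low-rank factors B and A and add their scaled product to the frozen weights. The matrices below are toy values chosen for illustration, not real model weights.

```python
# Minimal LoRA sketch: the effective weight is W + (alpha / r) * (B @ A),
# where only the small factors A and B are trained. With W of shape
# (d, k), this means r * (d + k) trainable parameters instead of d * k.

def matmul(X, Y):
    """Multiply two matrices represented as lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_merge(W, A, B, alpha, r):
    """Merge a low-rank update into a frozen pretrained weight W.

    W : frozen weight, shape (d, k)
    B : trainable factor, shape (d, r)
    A : trainable factor, shape (r, k)
    """
    delta = matmul(B, A)            # full-shape update, rank at most r
    scale = alpha / r               # standard LoRA scaling
    return [[w + scale * d_ for w, d_ in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

# Toy 2x2 weight with a rank-1 (r = 1) update.
W = [[1.0, 0.0],
     [0.0, 1.0]]
B = [[1.0],
     [2.0]]          # shape (2, 1)
A = [[0.5, 0.5]]     # shape (1, 2)

W_eff = lora_merge(W, A, B, alpha=1.0, r=1)  # [[1.5, 0.5], [1.0, 2.0]]
```

The point of the decomposition is that the update never needs to be stored at full rank during training; it is materialized (as in `lora_merge`) only when the adapter is merged for inference.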
Elevated multimodal capabilities and lowered barriers to entry also open up new doors for abuse: deepfakes, privacy issues, perpetuation of bias and even evasion of CAPTCHA safeguards may become increasingly easy for bad actors. In January of 2024, a wave of explicit celebrity deepfakes hit social media; research from May 2023 indicated that eight times as many voice deepfakes had been posted online as in the same period in 2022.
With more sophisticated, efficient tools and a year's worth of market feedback at their disposal, businesses are primed to expand the use cases for virtual agents beyond straightforward customer experience chatbots. As AI systems speed up and incorporate new streams and formats of information, they expand the possibilities not just for communication and instruction following, but also for task automation. “2023 was the year of being able to chat with an AI. Multiple companies launched something, but the interaction was always you type something in and it types something back,” says Stanford's Norvig. “In 2024, we'll see the ability for agents to get stuff done for you. Make reservations, plan a trip, connect to other services.”

Multimodal AI, in particular, significantly increases opportunities for seamless interaction with virtual agents. For example, rather than simply asking a bot for recipes, a user can point a camera at an open fridge and request recipes that can be made with the available ingredients. Be My Eyes, a mobile app that connects blind and low-vision individuals with volunteers who help with quick tasks, is piloting AI tools that let users interact directly with their surroundings through multimodal AI instead of waiting for a human volunteer.
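The fridge example above boils down to pairing an image with a text instruction in a single request. As a rough sketch, here is how such a multimodal message might be assembled; the field names loosely follow the OpenAI-style chat format, but treat the exact schema, model name, and helper function as illustrative assumptions rather than any particular vendor's API.

```python
import base64

def build_fridge_request(image_bytes, model="multimodal-model"):
    """Assemble a hypothetical multimodal chat request that pairs a
    fridge photo with a recipe question.

    The image is inlined as a base64 data URL so the text and the
    picture travel in one message; the schema here is illustrative.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "What recipes can I make with these ingredients?"},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{encoded}"}},
            ],
        }],
    }

# Stand-in bytes; a real app would pass the camera's JPEG output.
request = build_fridge_request(b"\xff\xd8fake-jpeg-bytes")
```

The key design point is that the model receives the photo and the instruction in the same turn, so it can ground its answer ("recipes") in what it actually sees, rather than relying on the user to describe the fridge's contents in text.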
For businesses, this escalating potential for legal, regulatory, economic or reputational consequences is compounded by how popular and accessible generative AI tools have become. Organizations must not only have a careful, coherent and clearly articulated corporate policy around generative AI, but also be wary of shadow AI: the “unofficial” personal use of AI in the workplace by employees.
As we proceed through a pivotal year in artificial intelligence, understanding and adapting to emerging trends is essential to maximizing potential, minimizing risk and responsibly scaling generative AI adoption.
Source: technologyreview.com
You get a chatbot! And you get a chatbot! In 2024, tech companies that invested heavily in generative AI will be under pressure to prove that they can make money off their products. To do this, AI giants Google and OpenAI are betting big on going small: both have launched user-friendly, web-based platforms that let people customize powerful language models and build their own mini chatbots catering to their specific needs, no coding skills required. Anyone can now become a generative-AI app developer.
It's amazing how fast the fantastic becomes familiar. The first generative models to produce photorealistic images exploded into the mainstream in 2022—and soon became commonplace. Tools like OpenAI's DALL-E, Stability AI's Stable Diffusion, and Adobe's Firefly flooded the internet with jaw-dropping images of everything from the pope in Balenciaga to prize-winning art. But it's not all good fun: for every pug waving pompoms, there's another piece of knock-off fantasy art or sexist sexual stereotyping.
If recent elections are anything to go by, AI-generated election disinformation and deepfakes are going to be a huge problem as a record number of people march to the polls in 2024. We're already seeing politicians weaponizing these tools. In Argentina, two presidential candidates created AI-generated images and videos of their opponents to attack them. In Slovakia, deepfakes of a liberal pro-European party leader threatening to raise the price of beer and making jokes about child pornography spread like wildfire during the country's elections. And in the US, Donald Trump has cheered on a group that uses AI to generate memes with racist and sexist tropes.
The last few years in AI have seen a shift away from using multiple small models, each trained to do different tasks—identifying images, drawing them, captioning them—toward single, monolithic models trained to do all these things and more. By showing OpenAI's GPT-3 a few worked examples in its prompt (a technique known as few-shot prompting, as distinct from fine-tuning, which updates the model's weights on new examples), researchers can coax it into solving coding problems, writing movie scripts, passing high school biology exams, and so on. Multimodal models, like GPT-4 and Google DeepMind's Gemini, can solve visual tasks as well as linguistic ones.
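Few-shot prompting amounts to assembling a prompt out of worked examples so the model can infer the task from the pattern, with no weight updates at all. A minimal sketch of that assembly, assuming the common "Input:/Output:" formatting convention (one convention among many, not a fixed standard):

```python
def build_few_shot_prompt(examples, query):
    """Build a few-shot prompt: a handful of worked (input, output)
    pairs followed by the new input and a trailing "Output:" cue,
    which invites the model to continue the pattern."""
    lines = []
    for inp, out in examples:
        lines.append(f"Input: {inp}")
        lines.append(f"Output: {out}")
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)

# Hypothetical code-review task defined purely by example.
examples = [
    ("def add(a, b): return a - b", "bug: subtracts instead of adds"),
    ("def square(x): return x * x", "ok"),
]
prompt = build_few_shot_prompt(examples, "def halve(x): return x * 2")
```

Nothing in the prompt says "find bugs"; the two examples alone define the task, which is exactly what makes a single general-purpose model so flexible.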
Large language models can do jaw-dropping things. But nobody knows exactly why.
OpenAI's Sora has raised the bar for AI moviemaking. Here are four things to bear in mind as we wrap our heads around what's coming.
Researchers are using generative AI and other techniques to teach robots new skills—including tasks they could perform in homes.