Abdulkabir Ayanniyi
Introduction
Generative artificial intelligence (generative AI, GenAI, or GAI) is the use of artificial intelligence systems to generate original media such as text, images, video, or audio, often in response to prompts. [Figure 1] Generative AI models learn the patterns and structure of their input training data and then generate new data that has similar characteristics.1 Examples include large language model (LLM) chatbots such as ChatGPT, Copilot, Gemini and LLaMA; text-to-image AI art systems such as Stable Diffusion, Midjourney and DALL-E; and text-to-video AI generators such as Sora. Companies such as OpenAI, Anthropic, Microsoft, Google, and Baidu, as well as numerous smaller firms, have developed generative AI models. Generative AI systems trained on words or word tokens include GPT-3, LaMDA, LLaMA, BLOOM, GPT-4, Gemini and others. They are capable of natural language processing, machine translation, and natural language generation, and can be used as foundation models for other tasks. Data sets used to train them include BookCorpus, Wikipedia and others. Producing high-quality visual art is a prominent application of generative AI. Generative AI systems trained on sets of images with text captions include Imagen, DALL-E, Midjourney, Adobe Firefly, Stable Diffusion and others. They are commonly used for text-to-image generation and neural style transfer.2 [Figures 2,3]
History: The concept of automated art dates back at least to the automata of ancient Greek civilization, where inventors such as Daedalus and Hero of Alexandria were described as having designed machines capable of writing text, generating sounds, and playing music.3 The academic discipline of artificial intelligence was established at a research workshop held at Dartmouth College in 1956 and has experienced several waves of advancement and optimism in the decades since.4 Improvements in transformer-based deep neural networks enabled a boom in generative AI systems in the early 2020s. [Figure 4]
Modalities: A generative AI system is constructed by applying unsupervised or self-supervised machine learning to a data set. The capabilities of a generative AI system depend on the modality or type of the data set used. Generative AI can be either unimodal or multimodal; unimodal systems take only one type of input, whereas multimodal systems can take more than one type of input.5
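As a toy illustration of "applying self-supervised machine learning to a data set", the sketch below builds a word-level bigram model. It is a minimal sketch under stated assumptions, not how systems such as GPT-4 actually work: the training corpus and the function names `train_bigram` and `generate` are hypothetical. The model is self-supervised because the "label" for each word is simply the word that follows it in the text itself, so no human annotation is needed. Sampling from it produces new word sequences whose local patterns resemble the training data.

```python
import random
from collections import defaultdict


def train_bigram(corpus: str) -> dict[str, list[str]]:
    """Self-supervised 'training': each word's target is the word that follows it."""
    words = corpus.split()
    model: dict[str, list[str]] = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        # Keep every observed successor; duplicates encode how often it occurred.
        model[current].append(nxt)
    return dict(model)


def generate(model: dict[str, list[str]], start: str, length: int, seed: int = 0) -> list[str]:
    """Generate up to `length` words by repeatedly sampling an observed successor."""
    rng = random.Random(seed)
    out = [start]
    # Stop early if the last word never had a successor in the training text.
    while len(out) < length and out[-1] in model:
        out.append(rng.choice(model[out[-1]]))
    return out


# Hypothetical toy training corpus (a single "modality": text).
corpus = "the cat sat on the mat the dog sat on the rug"
model = train_bigram(corpus)
print(" ".join(generate(model, "the", 8)))
```

Every adjacent word pair in the generated output also appears somewhere in the corpus, which is the sense in which the output "has similar characteristics" to the training data. Real LLMs replace the lookup table with a transformer network conditioned on a much longer context.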
Uses: Generative AI has uses across a wide range of industries, including software development, healthcare, finance, entertainment, customer service, sales and marketing, art, writing, fashion, and product design. [Figure 5] In coding, large language models such as OpenAI Codex have been trained on programming-language text, allowing them to generate source code for new computer programs. Generative AI can also be trained extensively on audio clips to produce natural-sounding speech synthesis and text-to-speech capabilities, exemplified by ElevenLabs' context-aware synthesis tools or Meta Platforms' Voicebox. Generative AI trained on annotated video (such as Runway's Gen-1 and Gen-2) can generate temporally coherent, detailed and photorealistic video clips. Generative AI systems can be trained on sequences of amino acids representing proteins, or on molecular representations such as SMILES. Generative AI can also be trained on the motions of a robotic system to generate new trajectories for motion planning or navigation. For example, UniPi from Google Research uses prompts such as "pick up blue bowl" or "wipe plate with yellow sponge" to control the movements of a robot arm.6 GenAI systems are often used to produce synthetic data as an alternative to data produced by real-world events. Such data can be used to validate mathematical models and to train machine learning models while preserving user privacy, including for structured data. AI-driven computer-aided design (CAD) can use text-to-3D, image-to-3D, and video-to-3D generation to automate 3D modeling. AI CAD libraries could also be developed using linked open data of schematics and diagrams, and AI CAD assistants are used as tools to help streamline workflows.
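The idea of synthetic structured data can be sketched very simply: fit a statistical model to real records, then sample new records from the model instead of releasing the originals. The example below is a deliberately minimal sketch. It assumes a hypothetical numeric table and models each column as an independent normal distribution. It ignores correlations between columns and does not by itself provide any formal privacy guarantee; practical synthetic-data tools use richer generative models and dedicated privacy techniques.

```python
import random
import statistics


def fit_columns(rows: list[list[float]]) -> list[tuple[float, float]]:
    """'Learn' the data: estimate the mean and standard deviation of each column."""
    columns = list(zip(*rows))
    return [(statistics.mean(col), statistics.stdev(col)) for col in columns]


def sample_rows(params: list[tuple[float, float]], n: int, seed: int = 0) -> list[list[float]]:
    """'Generate': draw n synthetic rows, sampling each column independently."""
    rng = random.Random(seed)
    return [[rng.gauss(mu, sigma) for mu, sigma in params] for _ in range(n)]


# Hypothetical "real" records: [age, annual income in thousands].
real = [[34, 52.0], [45, 61.5], [29, 48.2], [52, 75.3], [38, 58.9]]
params = fit_columns(real)
synthetic = sample_rows(params, n=1000)
print(params)
print(synthetic[:3])
```

The synthetic rows share the real table's per-column averages and spread, so they can be used to test a downstream model without handing over the five original records. A stronger generator would also capture relationships between columns, such as age and income rising together.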
GenAI models also power chatbots such as ChatGPT, programming tools such as GitHub Copilot, text-to-image products such as Midjourney, and text-to-video products such as Runway Gen-2.7 Generative AI features have been integrated into a variety of existing commercially available products such as Microsoft Office, Google Photos, and Adobe Photoshop. Models with up to tens of billions of parameters can run locally on laptop or desktop computers. The advantages of running generative AI locally include protection of privacy and intellectual property, and avoidance of rate limiting and censorship. Language models with hundreds of billions of parameters, such as GPT-4 or PaLM, typically run on datacenter computers equipped with arrays of GPUs (such as NVIDIA's H100) or AI accelerator chips (such as Google's TPU). These very large models are typically accessed as cloud services over the Internet. There are also freely available tools that attempt to detect text produced by generative AI (such as GPTZero), as well as AI-generated images, audio, or video. [Figure 6]
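A rough rule of thumb explains why model size decides where a model can run: the memory needed just to hold the weights is approximately the number of parameters times the bytes stored per parameter. The sketch below applies that arithmetic to hypothetical model sizes. It assumes 16-bit weights as a baseline and 4-bit weights as an example of quantization (a common way of shrinking models for local use), and it deliberately ignores the extra memory needed at run time for activations and intermediate state, so real requirements are higher.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate gigabytes (1 GB = 1e9 bytes) needed only to store the model weights."""
    return num_params * bits_per_param / 8 / 1e9


# Hypothetical model sizes: 13 billion, 70 billion and 500 billion parameters.
for params in (13e9, 70e9, 500e9):
    for bits in (16, 4):
        print(f"{params / 1e9:.0f}B params @ {bits}-bit: {weight_memory_gb(params, bits):.1f} GB")
```

By this estimate a 13-billion-parameter model needs about 26 GB at 16 bits and about 6.5 GB at 4 bits, which is within reach of a well-equipped laptop or desktop. A 500-billion-parameter model needs about 1,000 GB at 16 bits, far beyond a single consumer machine, which is why such models are usually served from datacenters.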
Disadvantages: Generative AI has limited creativity, as it can only generate new data based on what it has learned from existing data and cannot think beyond that. Generative AI can also be biased, especially against particular groups, if the data it was trained on is biased. Further, it has limited application in cases where little data is available or the data is highly complex. Additionally, generative AI requires significant computing resources and training time, making generative AI models expensive to train and deploy. Finally, there are ethical concerns about the potential misuse or malicious use of GenAI, such as for cybercrime, fake news, deepfakes, or other types of false information.
Law and regulation: Governments have taken an interest in limiting the risks associated with generative AI. Examples include the US (requiring companies to report information to the federal government when training large AI models), the European Union (requirements to disclose copyrighted material used to train generative AI systems, and to label any AI-generated output as such) and China (watermarking of generated images and videos, regulations on training data and label quality, restrictions on personal data collection, and a guideline that generative AI must "adhere to socialist core values").8
Training with copyrighted content: Generative AI systems such as ChatGPT and Midjourney are trained on large, publicly available datasets that include copyrighted works. AI developers have argued that such training is protected under fair use, while copyright holders have argued that it infringes their rights.
Copyright of AI-generated content: The US Copyright Office has ruled that works created by AI without any human input cannot be copyrighted, because they lack human authorship. However, the office has also begun taking public input to determine if these rules need to be refined for generative AI.
Conclusion: Generative artificial intelligence is the use of artificial intelligence systems to generate original media such as text, images, video, or audio, often in response to prompts. The concept of automated art, though rudimentary, dates back to antiquity. The academic discipline of artificial intelligence was established in 1956 and has experienced several waves of advancement and optimism in the decades since, followed by a boom in generative AI systems in the early 2020s. Generative AI has uses in software development, healthcare, finance, entertainment, customer service, sales and marketing, art, writing, fashion, and product design. Its drawbacks include limited creativity, potential bias, limited applicability where data is scarce or highly complex, high cost, and the potential for malicious use. These risks underscore the need for laws regulating generative AI.
References
1. Pasick, Adam (March 27, 2023). "Artificial Intelligence Glossary: Neural Networks and Other Terms Explained". The New York Times. ISSN 0362-4331. Retrieved February 21, 2024.
2. Epstein, Ziv; Hertzmann, Aaron; Akten, Memo; Farid, Hany; Fjeld, Jessica; Frank, Morgan R.; Groh, Matthew; Herman, Laura; Leach, Neil; Mahari, Robert; Pentland, Alex “Sandy”; Russakovsky, Olga; Schroeder, Hope; Smith, Amy (2023). "Art and the science of generative AI". Science. 380 (6650): 1110–1111. arXiv:2306.04141. Bibcode:2023Sci...380.1110E. doi:10.1126/science.adh4451. PMID 37319193. S2CID 259095707.
3. Crevier, Daniel (1993). AI: The Tumultuous Search for Artificial Intelligence. New York, New York: BasicBooks. p. 109. ISBN 0-465-02997-3.
4. "How Generative AI Can Augment Human Creativity". Harvard Business Review. June 16, 2023. ISSN 0017-8012. Retrieved February 21, 2024.
5. "finetune-transformer-lm". GitHub. Retrieved February 21, 2024.
6. Agostinelli, Andrea; Denk, Timo I.; Borsos, Zalán; Engel, Jesse; Verzetti, Mauro; Caillon, Antoine; Huang, Qingqing; Jansen, Aren; Roberts, Adam; Tagliasacchi, Marco; Sharifi, Matt; Zeghidour, Neil; Frank, Christian (January 26, 2023). "MusicLM: Generating Music From Text". arXiv:2301.11325 [cs.SD].
7. Stöckl, Andreas (November 2, 2022). "Evaluating a Synthetic Image Dataset Generated with Stable Diffusion". arXiv:2211.01777 [cs.CV].
8. Ye, Josh (July 13, 2023). "China says generative AI rules to apply only to products for the public". Reuters. Retrieved February 20, 2024.
Figure 2: Generative AI Application Landscape.
https://images1.calcalist.co.il/picserver3/crop_images/2022/10/27/rJV6cnwNj/rJV6cnwNj_0_0_1080_1440_0_x-large.jpg
Figure 4: History of Generative AI
https://www.venturecrowd.com.au/file-asset/GenerativeAIDevelopments_Timeline02?v=1
Figure 5: Applications of Generative AI
https://redblink.com/wp-content/uploads/2023/05/Use-cases-for-generative-AI.png
Figure 6: Applications of Generative AI
https://research-assets.cbinsights.com/2023/03/07153053/Gen-AI-applications-in-retail-030323-VF.png