By David Song '25
Abstract:
Artificial Intelligence (AI) has reshaped computer image generation and recognition, improving both the efficiency and the applicability of state-of-the-art technologies. In recent years, AI has moved from specialized professional algorithms to everyday applications available to the public. This article explores the current state of AI technology, its applications, structures, and considerations in the context of image generation and recognition.
Introduction:
The advancements and applications of artificial intelligence have skyrocketed over the past few years; in fact, almost every novel technology labeled as “autonomous” or “automatic,” such as autonomous vehicles, robots, drones, and even face recognition, uses AI. This article mostly explores deep learning systems, the most commonly used systems within the larger sphere of machine learning (ML), in both generation and recognition.
Image Generation:
Artificial neural networks (ANNs), and deep learning models in particular, all fall under the larger category of ML, which learns patterns from large amounts of data (often labeled) in order to accomplish tasks similar to the examples it was given. Generative models, such as Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs), are used in art, design, and even medical fields to create similar but novel images in whatever styles appear in their datasets. If large amounts of oil painting data were fed into a generative model, for example, the art it generates would also resemble oil paintings.
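The idea of learning from a dataset and then producing similar but novel samples can be sketched with a toy example. This is not a GAN or a VAE; it is only a minimal illustration, assuming a made-up dataset of two-pixel "images" and a simple Gaussian model in place of a neural network. All names and numbers here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training set: 500 tiny "images" of two pixels each,
# drawn around a typical brightness of (0.7, 0.3).
train = rng.multivariate_normal(
    mean=[0.7, 0.3],
    cov=[[0.01, 0.005], [0.005, 0.01]],
    size=500,
)

# "Training": estimate the distribution of the data.
mean = train.mean(axis=0)
cov = np.cov(train, rowvar=False)

# "Generation": draw new samples from the learned distribution.
# They resemble the training data but are not copies of it.
generated = rng.multivariate_normal(mean, cov, size=5)
print(generated)
```

Real generative models replace the Gaussian with a deep network that can capture far more complex distributions, but the two-stage pattern of fitting to data and then sampling is the same.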
The most common generative AI systems today are text-to-image (TTI) models, which, as the name suggests, create sophisticated images based on the text or “tags” entered by the user. The power of these systems is that a large model can generate images almost instantaneously across a wide range of subjects and styles. Text-based systems such as ChatGPT, which are built on natural language processing (NLP) architectures, also count as generative AI.
Despite the remarkable progress AI has achieved, both old and new concerns, such as content control and ethics, persist. Anyone familiar with art generation algorithms has probably run into problems with generated hands or fingers, as it is difficult to tell the algorithm what a finger is and how many there should be. Generative AI has also raised unease for several reasons: generated work, which might replace artists, programmers, and other workers; deepfakes, which are often used for satire or deception; and a lack of content control, which could lead to audiences of different ages and identities interacting with possibly prejudiced or unhealthy content.
Image Recognition:
ANNs are as useful in image recognition as they are in image generation. They enable computer vision systems to identify objects, faces, and other visual elements. Their applications range from autonomous vehicles and AI-augmented security cameras to medical analysis and even relationship analysis.
Convolutional Neural Networks (CNNs) are by far the most widely used type of recognition model. A CNN slides small filters, usually three-by-three or five-by-five patches, across an image in order to learn the features of the labeled object. It first “learns” simple combinations of lines, edges, curves, and colors, and then builds up to more complex features through additional layers, activation functions, and other auxiliary functions. Facial recognition and autonomous driving, among many other present-day technologies, use CNNs.
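The sliding-patch operation described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a full CNN: the image is a made-up six-by-six grid, and the three-by-three filter is hand-picked to respond to vertical edges, whereas a trained network would learn its filter values from labeled data.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map.

    As in most deep learning libraries, the kernel is not flipped,
    so this is technically a cross-correlation.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

def relu(x):
    """A common activation function: keep positive responses, zero out the rest."""
    return np.maximum(x, 0)

# Hypothetical image: dark left half (0), bright right half (1).
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-picked filter that responds where brightness increases left to right.
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

feature_map = relu(convolve2d(image, kernel))
print(feature_map)  # every row is [0, 3, 3, 0]: the filter fires only at the edge
```

The nonzero values line up with the boundary between the dark and bright halves, which is the sense in which an early CNN layer "sees" an edge.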
Although CNNs are very flexible, there are some types of detection that a basic CNN cannot accomplish. Deducing the relationships between people in an image, such as who is hitting whom and what they are standing on in a karate fight, or working with three-dimensional files such as CT images, is beyond a basic two-dimensional CNN and calls for other recognition models such as Graph Neural Networks (GNNs).
The widespread use of AI in image recognition raises concerns over both the reliability of the algorithms and ethical issues such as privacy and surveillance. Tesla, for example, has been sued over an accident attributed to its autonomous driving algorithms.
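To give a sense of how a graph neural network treats relationships, the sketch below runs a single simplified message-passing step, in which each object in a scene updates its features by averaging them with those of the objects it is connected to. The scene, the feature values, and the identity weight matrix are all hypothetical; a real model would learn the weights and stack several such steps.

```python
import numpy as np

# Hypothetical scene graph for a karate image: two people and the mat.
nodes = ["person_a", "person_b", "mat"]
features = np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [0.5, 0.5]])

# Edges: person_a <-> person_b (hitting), person_b <-> mat (standing on).
adjacency = np.array([[0.0, 1.0, 0.0],
                      [1.0, 0.0, 1.0],
                      [0.0, 1.0, 0.0]])

def message_passing_step(features, adjacency, weights):
    """One mean-aggregation step: average each node with its neighbors,
    apply a linear transform, then a ReLU activation."""
    a_hat = adjacency + np.eye(adjacency.shape[0])  # include each node itself
    degree = a_hat.sum(axis=1, keepdims=True)
    aggregated = (a_hat @ features) / degree
    return np.maximum(aggregated @ weights, 0)

weights = np.eye(2)  # placeholder for learned weights
updated = message_passing_step(features, adjacency, weights)
for name, row in zip(nodes, updated):
    print(name, row)
```

After the step, each node's features reflect its neighbors as well as itself, which is what lets a graph model reason about who is interacting with whom rather than only what each object looks like.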
Future Directions:
AI is progressing at a terrifying rate: new versions of advanced systems such as You Only Look Once (YOLO), one of the most efficient CNN-based systems, and ChatGPT, which announced its newest release on November 6, 2023, come out annually or even more often. Future AI will continue to improve in accuracy and speed, both resolving some current concerns and opening paths to new fields.
Conclusion:
Artificial Intelligence has transformed image generation and recognition, enabling better professional and amateur work in art, security, and medicine. AI, however, is growing at an alarming rate, creating many new reliability and ethical concerns.
References
Raghav, P. (2018, March 4). Understanding of Convolutional Neural Network (CNN) — Deep Learning. Medium. https://medium.com/@RaghavPrabhu/understanding-of-convolutional-neural-network-cnn-deep-learning-99760835f148
Bie, F., Yang, Y., Zhou, Z., Ghanem, A., Zhang, M., Yao, Z., Wu, X., Holmes, C., Golnari, P., Clifton, D. A., He, Y., Tao, D., & Song, S. L. (2023). RenAIssance: A Survey into AI Text-to-Image Generation in the Era of Large Model. ArXiv.org. https://arxiv.org/abs/2309.00810
Lisa, A. (2021, December 9). An Intro to AI Image Recognition and Image Generation. Hackernoon.com. https://hackernoon.com/an-intro-to-ai-image-recognition-and-image-generation
Natalie. ChatGPT — Release Notes | OpenAI Help Center. (2023). Openai.com. https://help.openai.com/en/articles/6825453-chatgpt-release-notes
YOLOv8: A New State-of-the-Art Computer Vision Model. (2023). Yolov8.com. https://yolov8.com/