Tactile Intelligence in Robotic Hands
Based on Tactile Data Learning for
Manipulating Multiple Types of Irregular Objects
Development of autonomous generation technology for
high-DoF robotic hand motions based on learning
Supermicrosurgical Robot System for Sub-0.8mm Vessel Anastomosis through Human-Robot Autonomous Collaboration in Surgical Workflow Recognition
Development of surgical assistance and surgical technique evaluation technology based on surgical image analysis
Artificial Humans with Enhanced Embodied AI through Accumulated Experiential Interaction and Sharing
Development of Visuo-Tactile-Based Manipulation Intelligence for Humanoid Robots
Past research projects
Research on hierarchical reinforcement learning-based navigation and manipulation (2023-2024)
Development of navigation intelligence based on situational awareness (2022-2023)
Development of Interactive Kiosk for Barista Robot (2021-2022)
Intelligent Transformable Robot Mobility Multidisciplinary Research Cluster (2022-2023)
Development of tourism demand and congestion prediction technology for customized tourism curation (2021-2022)
Building persona-based virtual person montage data (2021)
Development of Recognition and Future Prediction Technology for Medical Assistive Robots (2020)
Development of Robot Hand Manipulation Intelligence to Learn Methods and Procedures for Handling Various Objects with Tactile Robot Hands (2019)
Intelligent Robot System for Human-Robot Emotional Interaction and Collaboration based on Machine Learning (2017-2019)
Machine Learning Based SMT Process Optimization System Development (2017-2019)
Development of robot task intelligence technology that can perform tasks with more than 80% success in inexperienced situations through autonomous knowledge acquisition and adaptive knowledge application (2015-2018)
Technology Development of Virtual Creatures with Digital Emotional DNA of Users (2015-2017)
We propose Context-conditional 2D Affordance Generation (CAG)—a language-leveraged affordance map generation model. We utilize foundation models to extract contextual knowledge from human video datasets where various objects are interacted with across different environments. Our approach successfully understands given objectives, even when presented with complex sentences, and generates relevant conditional affordance maps.
Geonkuk Kim, Tae-Min Choi, Shinsuk Park, and Juyoun Park*, “CAG: Context-conditional 2D Affordance Generation,” The IEEE International Conference on Image Processing (IEEE ICIP 2025), Sep., 2025, Accepted.
[Placed 4th in the Surgical Instrument Instance Segmentation Task of the PhaKIR 2024 Challenge part of MICCAI 2024]
We formulate a new task called Robust text-promptable Surgical Instrument Segmentation (R-SIS). Under this setting, prompts are issued for all candidate categories without access to instrument presence information. We evaluate existing segmentation methods under the R-SIS protocol using surgical video datasets and observe substantial false-positive predictions in the absence of ground-truth instruments. These findings demonstrate a mismatch between current evaluation protocols and real-world use cases, and support the need for benchmarks that explicitly account for prompt uncertainty and instrument absence.
Choi, Tae-Min, and Juyoun Park. "RoSIS: Robust Framework for Text-Promptable Surgical Instrument Segmentation Using Vision-Language Fusion." arXiv preprint arXiv:2411.12199.
We introduce the Textual Attention Region Proposal Network (TA-RPN). This network enhances proposal generation by integrating visual and textual features from the CLIP text encoder, utilizing pixel-wise attention for a comprehensive fusion across the image space. Our approach also incorporates prompt learning to optimize textual features for better localization. TA-RPN outperforms existing state-of-the-art methods, demonstrating its effectiveness in detecting novel object categories.
Tae-Min Choi, Inug Yoon, Jong-Hwan Kim, and Juyoun Park, “Textual Attention RPN for Open-Vocabulary Object Detection,” The 35th British Machine Vision Conference (BMVC) 2024, Nov., 2024, url: https://bmvc2024.org/proceedings/85/.
We propose a multi-level semantic segmentation data generation method based on a scene-specific word tree. Our scene-specific word trees leverage linguistic hierarchies to group scene components by considering relationships between words in that scene. In the proposed data generation method, we build each word tree within a single image, enabling objects to be grouped into user-defined levels according to the relative relationships between objects in that specific scene.
Soomin Kim and Juyoun Park, “Multi-Level Segmentation Data Generation based on a Scene-Specific Word Tree,” IEEE Access, vol. 12, Jun. 2024, doi: 10.1109/ACCESS.2024.3418515.
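The grouping idea above can be illustrated with a small sketch: fine-grained labels are collapsed to the ancestor at a user-chosen depth of a word tree. The tree contents and function names here are illustrative assumptions, not the paper's actual data.

```python
# Sketch: collapse fine-grained segmentation labels to a user-chosen level
# of a scene-specific word tree (tree contents are illustrative assumptions).
word_tree = {
    "scene": {
        "furniture": {"chair": {}, "table": {}},
        "person": {"adult": {}, "child": {}},
    }
}

def ancestor_at_level(tree, label, level, path=()):
    # Return the ancestor of `label` at depth `level` (or the label itself
    # if it sits above that depth); depth-first search over the nested dict.
    for node, children in tree.items():
        new_path = path + (node,)
        if node == label:
            return new_path[min(level, len(new_path) - 1)]
        found = ancestor_at_level(children, label, level, new_path)
        if found:
            return found
    return None

# Group labels at level 1: every leaf maps to its coarse category.
grouped = [ancestor_at_level(word_tree, l, 1) for l in ["chair", "table", "adult"]]
print(grouped)  # ['furniture', 'furniture', 'person']
```

Choosing a deeper level would keep the original fine-grained labels, so one annotated image can yield segmentation data at several granularities.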
We develop a heterogeneous MARL algorithm with a novel modularized policy network that considers the agents’ heterogeneity while learning cooperative tasks. Furthermore, each module network can be separately trained for its role, improving adaptation in new environments. We conduct experiments to demonstrate the effect of the modularized network-based policy on enhancing the heterogeneity of teams and adapting well to unseen environments or scenarios.
Hyeong Tae Kim and Juyoun Park, “Heterogeneous Multi-Agent Reinforcement Learning based on Modularized Policy Network,” Expert Systems With Applications, vol. 284, Jul. 2025, doi: 10.1016/j.eswa.2025.127856.
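A minimal sketch of the modularized-policy idea: each agent shares an observation encoder but acts through a role-specific head, and the heads can be trained or swapped independently. All layer sizes and role names below are illustrative assumptions, not the paper's architecture.

```python
import math
import random

random.seed(0)

def linear(in_dim, out_dim):
    # Random weight matrix standing in for a trained layer.
    return [[random.gauss(0, 0.1) for _ in range(out_dim)] for _ in range(in_dim)]

def matvec(w, x):
    # x (length in_dim) through w (in_dim x out_dim) -> length out_dim.
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]

class ModularPolicy:
    """Per-agent policy = shared encoder + a role-specific module head.

    Role heads can be trained separately and swapped to adapt a team to a
    new environment without retraining the shared encoder.
    """
    def __init__(self, obs_dim, hidden_dim, n_actions, roles):
        self.encoder = linear(obs_dim, hidden_dim)                 # shared
        self.heads = {r: linear(hidden_dim, n_actions) for r in roles}

    def act(self, obs, role):
        h = [math.tanh(v) for v in matvec(self.encoder, obs)]  # shared features
        logits = matvec(self.heads[role], h)                   # role-specific head
        return max(range(len(logits)), key=logits.__getitem__)

policy = ModularPolicy(obs_dim=4, hidden_dim=8, n_actions=3,
                       roles=["scout", "carrier"])
actions = [policy.act([1.0] * 4, role) for role in ["scout", "carrier"]]
print(actions)  # two agents can pick different actions from identical observations
```

The separation mirrors the paper's claim: heterogeneity lives in the per-role modules, so adapting to a new scenario only requires retraining the affected module.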
In our Peer Review-based Policy Learning (PRPL) algorithm for homogeneous multi-agent reinforcement learning (MARL) tasks, each agent receives advice from its teammates on how to respond in the current situation. Most importantly, unlike conventional actor-centric suggestion approaches, we utilize each teammate’s particular advice that reflects only that teammate’s own interests.
Hyeong Tae Kim and Juyoun Park, “Peer Review based Credit Assignment for Collaborative Behaviour in Homogeneous Multi-Agent Reinforcement Learning.”
We propose a path planning system for robot Navigation Among Movable and IMmovable Obstacles (NAMIMO) with a hierarchical reinforcement learning approach: a higher-level agent learns to predict whether an obstacle is movable by using the Contrastive Language-Image Pre-Training (CLIP) method, while lower-level agents are responsible for avoiding the obstacles or removing them.
Han Jun Bae and Juyoun Park, “Hierarchical Reinforcement Learning for Navigation among Movable and Immovable Obstacles,” IEEE Access, 2025.
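The two-level structure above can be sketched as a simple dispatch: a higher-level classifier predicts movability, then hands control to a lower-level policy. The classifier and both policies below are stubs for illustration, not the paper's CLIP-based model.

```python
# Sketch of the hierarchical decision in NAMIMO-style navigation:
# a higher-level movability prediction dispatches to a lower-level policy.
# All obstacle names and policy outputs here are illustrative assumptions.
def movability_classifier(obstacle):
    # Stand-in for the CLIP-based higher-level prediction.
    return obstacle in {"box", "chair"}

def remove_policy(obstacle):
    # Lower-level agent responsible for clearing movable obstacles.
    return f"push {obstacle} aside"

def avoid_policy(obstacle):
    # Lower-level agent responsible for detouring around immovable obstacles.
    return f"plan path around {obstacle}"

def navigate(obstacle):
    if movability_classifier(obstacle):
        return remove_policy(obstacle)   # movable: clear the path
    return avoid_policy(obstacle)        # immovable: detour

print(navigate("box"))   # push box aside
print(navigate("wall"))  # plan path around wall
```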
To address the challenges posed by a large action space, we propose a base-action combination method based on Hierarchical Reinforcement Learning (HRL). In this approach, lower-level agents are trained to estimate the value of base actions, whose linear combinations can represent all action policies. A higher-level agent then learns to predict the value of the combined action, i.e., how to combine the base actions.
Han Jun Bae and Juyoun Park, “Reinforcement Learning in Large Action Space through Base Action Combinations.”
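The core mechanism, expressing one action in a large space as a weighted mixture of a few base actions, can be sketched as follows. The dimensions and the softmax weighting are illustrative assumptions, not the paper's learned components.

```python
import math
import random

random.seed(1)

# A few base actions stand in for those learned by the lower-level agents.
n_base, action_dim = 4, 6
base_actions = [[random.gauss(0, 1) for _ in range(action_dim)]
                for _ in range(n_base)]

def combination_weights(state):
    # Stand-in for the higher-level agent: score each base action against
    # the state, then softmax the scores into mixing weights.
    scores = [sum(b[i] * state[i] for i in range(action_dim)) for b in base_actions]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

state = [random.gauss(0, 1) for _ in range(action_dim)]
w = combination_weights(state)
combined = [sum(w[k] * base_actions[k][i] for k in range(n_base))
            for i in range(action_dim)]
print(len(combined))  # 6: the combined action spans the full action dimension
```

The point of the construction is that the higher-level agent only searches over a small weight vector (here length 4) rather than the full action space (here length 6, but arbitrarily large in practice).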
The proposed memory architecture can memorize information acquired through interactions with people and learn meaningful knowledge inherent in that information. Based on a memory architecture that remembers the order history of returning customers and learns which factors affect each customer’s taste, we develop a kiosk system that supports emotional interaction and recommends coffee tailored to each customer’s taste.
Juyoun Park, Jinyoung Lee, and Ig-Jae Kim, “Emotional Interactive Kiosk System for Automatic Ordering,” The 18th Korea Robotics Society Annual Conference (KRoC 2023), Feb., 2023.
The proposed online robotic tool detection method can extract visual features that focus on each robotic surgical tool without prior learning. Based on the features, the encoder-decoder framework is designed to recognize the current surgical action and predict the sequence of the next surgical actions to be performed.
Juyoun Park and Chung Hyuk Park, “Recognition and Prediction of Surgical Actions Based on Online Robotic Tool Detection,” IEEE Robotics and Automation Letters (RA-L), vol. 6, no. 2, Feb. 2021, doi: 10.1109/LRA.2021.3060410.
The simulation allows the robot to learn a policy of moving closer to a person to initiate a simple medical interaction such as temperature measurement. The facial expression of the person facing the robot and the distance between the person and the robot are incorporated into the reinforcement learning process. In the simulation, the robot moves according to the learned trust knowledge to lower the person’s discomfort level and successfully approach them.
Juyoun Park and Chung Hyuk Park, “Trust Learning for Initiating Physical Human-Robot Interaction,” ICRA Workshop on Integrating Multidisciplinary Approaches to Advance Physical Human-Robot Interaction, IEEE International Conference on Robotics and Automation (ICRA), May 31, 2020.
The proposed MarsNet, a CNN-based end-to-end network for multi-label classification that can accept inputs of various sizes, is applied to a PCB solder paste inspection (SPI) task. SPI is usually added as a supplementary step in Surface Mount Technology (SMT) assembly to inspect whether the solder is applied properly on the PCB in a screen printer.
Ju-Youn Park, Yewon Hwang, Dukyoung Lee, and Jong-Hwan Kim, “MarsNet: Multi-label Classification for Images of Various Sizes for PCB Solder Paste Inspection,” IEEE Access, vol. 8, no. 1, Jan. 2020, doi: 10.1109/ACCESS.2020.2969217.
The proposed online incremental hierarchical classification resonance network (OIHCRN), which shows superior performance in online incremental hierarchical classification, is applied to a multimedia recommendation system for digital storytelling on a digital companion embedded in a smartphone.
Ju-Youn Park and Jong-Hwan Kim, “Online Incremental Hierarchical Classification Resonance Network,” Pattern Recognition, vol. 111, Mar. 2021, doi: 10.1016/j.patcog.2020.107672.
The online face identification system is demonstrated in real time in a scenario in which two people interact with a robot. The robot greets by name a person it has met before. When the robot meets a new person, it asks for the person’s name and uses it to learn the person’s facial identity. Human identity is learned in real time and immediately reflected in the interaction.
Juyoun Park, "Online Incremental Classification Resonance Networks for Human-Robot Interaction," Ph.D. Dissertation in Electrical Engineering from KAIST, 2019.
The proposed online incremental classification resonance network (OICRN), which enables high-performance online incremental class learning in multi-class classification, is applied to a face identification system on a humanoid robot. The robot learns human face identities through human-robot interaction.
Ju-Youn Park and Jong-Hwan Kim, “Online Incremental Classification Resonance Network and Its Application to Human-Robot Interaction,” IEEE Transactions on Neural Networks and Learning Systems, vol. 31, no. 5, Jun. 2019, doi: 10.1109/TNNLS.2019.2920158.
The proposed adaptive resonance theory-supervised predictive mapping for hierarchical classification (ARTMAP-HC) network, which allows incremental class learning on raw data without prior normalization, is applied to a multimedia recommendation system for digital storytelling on a digital companion embedded in a smartphone.
Ju-Youn Park and Jong-Hwan Kim, “Incremental Class Learning for Hierarchical Classification,” IEEE Transactions on Cybernetics, vol. 50, no. 1, Sep. 2018, doi: 10.1109/TCYB.2018.2866869.