Tactile Intelligence in Robotic Hands
Based on Tactile Data Learning for
Manipulating Multiple Types of Irregular Objects
Development of autonomous generation technology for
high-DoF robotic hand motions based on learning
Supermicrosurgical Robot System for Sub-0.8mm Vessel Anastomosis through Human-Robot Autonomous Collaboration in Surgical Workflow Recognition
Development of surgical assistance and surgical technique evaluation technology based on surgical image analysis
Artificial Humans with Enhanced Embodied AI through Accumulated Experiential Interaction and Sharing
Development of Visuo-Tactile-Based Manipulation Intelligence for Humanoid Robots
Past research projects
Research on hierarchical reinforcement learning-based navigation and manipulation (2023-2024)
Development of navigation intelligence based on situational awareness (2022-2023)
Development of Interactive Kiosk for Barista Robot (2021-2022)
Intelligent Transformable Robot Mobility Multidisciplinary Research Cluster (2022-2023)
Development of tourism demand and congestion prediction technology for customized tourism curation (2021-2022)
Building persona-based virtual person montage data (2021)
Development of Recognition and Future Prediction Technology for Medical Assistive Robots (2020)
Development of Robot Hand Manipulation Intelligence to Learn Methods and Procedures for Handling Various Objects with Tactile Robot Hands (2019)
Intelligent Robot System for Human-Robot Emotional Interaction and Collaboration based on Machine Learning (2017-2019)
Machine Learning Based SMT Process Optimization System Development (2017-2019)
Development of robot task intelligence technology that can perform tasks with more than 80% success in inexperienced situations through autonomous knowledge acquisition and adaptive knowledge application (2015-2018)
Technology Development of Virtual Creatures with Digital Emotional DNA of Users (2015-2017)
We propose Context-conditional 2D Affordance Generation (CAG)—a language-leveraged affordance map generation model. We utilize foundation models to extract contextual knowledge from human video datasets where various objects are interacted with across different environments. Our approach successfully understands given objectives, even when presented with complex sentences, and generates relevant conditional affordance maps.
Geonkuk Kim, Tae-Min Choi, Shinsuk Park, and Juyoun Park*, “CAG: Context-conditional 2D Affordance Generation,” The IEEE International Conference on Image Processing (IEEE ICIP 2025), Sep., 2025, Accepted.
[Placed 4th in the Surgical Instrument Instance Segmentation Task of the PhaKIR 2024 Challenge part of MICCAI 2024]
We formulate a new task called Robust text-promptable Surgical Instrument Segmentation (R-SIS). Under this setting, prompts are issued for all candidate categories without access to instrument presence information. We evaluate existing segmentation methods under the R-SIS protocol using surgical video datasets and observe substantial false-positive predictions in the absence of ground-truth instruments. These findings demonstrate a mismatch between current evaluation protocols and real-world use cases, and support the need for benchmarks that explicitly account for prompt uncertainty and instrument absence.
Choi, Tae-Min, and Juyoun Park. "RoSIS: Robust Framework for Text-Promptable Surgical Instrument Segmentation Using Vision-Language Fusion." arXiv preprint arXiv:2411.12199.
We introduce the Textual Attention Region Proposal Network (TA-RPN). This network enhances proposal generation by integrating visual and textual features from the CLIP text encoder, utilizing pixel-wise attention for a comprehensive fusion across the image space. Our approach also incorporates prompt learning to optimize textual features for better localization. TA-RPN outperforms existing state-of-the-art methods, demonstrating its effectiveness in detecting novel object categories.
Tae-Min Choi, Inug Yoon, Jong-Hwan Kim, and Juyoun Park, “Textual Attention RPN for Open-Vocabulary Object Detection,” The 35th British Machine Vision Conference (BMVC) 2024, Nov., 2024, url: https://bmvc2024.org/proceedings/85/.
We propose a multi-level semantic segmentation data generation method based on a scene-specific word tree. Our scene-specific word trees leverage linguistic hierarchies to group scene components by considering relationships between words in that scene. In the proposed data generation method, we build each word tree within a single image, enabling objects to be grouped into user-defined levels according to the relative relationships between objects in that specific scene.
Soomin Kim and Juyoun Park, “Multi-Level Segmentation Data Generation based on a Scene-Specific Word Tree,” IEEE Access, vol. 12, Jun. 2024, doi: 10.1109/ACCESS.2024.3418515.
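The grouping idea above can be illustrated with a small sketch: fine-grained labels are collapsed to the ancestor at a user-chosen depth of a word tree. The tree contents and function names here are illustrative assumptions, not the paper's actual data.

```python
# Sketch: collapse fine-grained segmentation labels to a user-chosen level
# of a scene-specific word tree (tree contents are illustrative assumptions).
word_tree = {
    "scene": {
        "furniture": {"chair": {}, "table": {}},
        "person": {"adult": {}, "child": {}},
    }
}

def ancestor_at_level(tree, label, level, path=()):
    # Return the ancestor of `label` at depth `level` (or the label itself
    # if it sits above that depth); depth-first search over the nested dict.
    for node, children in tree.items():
        new_path = path + (node,)
        if node == label:
            return new_path[min(level, len(new_path) - 1)]
        found = ancestor_at_level(children, label, level, new_path)
        if found:
            return found
    return None

# Group labels at level 1: every leaf maps to its coarse category.
grouped = [ancestor_at_level(word_tree, l, 1) for l in ["chair", "table", "adult"]]
print(grouped)  # ['furniture', 'furniture', 'person']
```

Choosing a deeper level would keep the original fine-grained labels, so one annotated image can yield segmentation data at several granularities.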
We develop a heterogeneous MARL algorithm with a novel modularized policy network that considers the agents’ heterogeneity while learning cooperative tasks. Furthermore, each module network can be separately trained for its role, improving adaptation in new environments. We conduct experiments to demonstrate the effect of the modularized network-based policy on enhancing the heterogeneity of teams and adapting well to unseen environments or scenarios.
Hyeong Tae Kim and Juyoun Park, “Heterogeneous Multi-Agent Reinforcement Learning based on Modularized Policy Network,” Expert Systems With Applications, vol. 284, Jul. 2025, doi: 10.1016/j.eswa.2025.127856.
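A minimal sketch of the modularized-policy idea: each agent shares an observation encoder but acts through a role-specific head, and the heads can be trained or swapped independently. All layer sizes and role names below are illustrative assumptions, not the paper's architecture.

```python
import math
import random

random.seed(0)

def linear(in_dim, out_dim):
    # Random weight matrix standing in for a trained layer.
    return [[random.gauss(0, 0.1) for _ in range(out_dim)] for _ in range(in_dim)]

def matvec(w, x):
    # x (length in_dim) through w (in_dim x out_dim) -> length out_dim.
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]

class ModularPolicy:
    """Per-agent policy = shared encoder + a role-specific module head.

    Role heads can be trained separately and swapped to adapt a team to a
    new environment without retraining the shared encoder.
    """
    def __init__(self, obs_dim, hidden_dim, n_actions, roles):
        self.encoder = linear(obs_dim, hidden_dim)                 # shared
        self.heads = {r: linear(hidden_dim, n_actions) for r in roles}

    def act(self, obs, role):
        h = [math.tanh(v) for v in matvec(self.encoder, obs)]  # shared features
        logits = matvec(self.heads[role], h)                   # role-specific head
        return max(range(len(logits)), key=logits.__getitem__)

policy = ModularPolicy(obs_dim=4, hidden_dim=8, n_actions=3,
                       roles=["scout", "carrier"])
actions = [policy.act([1.0] * 4, role) for role in ["scout", "carrier"]]
print(actions)  # two agents can pick different actions from identical observations
```

The separation mirrors the paper's claim: heterogeneity lives in the per-role modules, so adapting to a new scenario only requires retraining the affected module.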
In our Peer Review-based Policy Learning (PRPL) algorithm for homogeneous multi-agent reinforcement learning (MARL) tasks, each agent receives advice from its teammates on how to respond in the current situation. Most importantly, unlike conventional actor-centric suggestion approaches, we utilize each teammate’s particular advice that reflects only that teammate’s own interests.
Hyeong Tae Kim and Juyoun Park, “Peer Review based Credit Assignment for Collaborative Behaviour in Homogeneous Multi-Agent Reinforcement Learning.”
We propose a path planning system for robot Navigation Among Movable and IMmovable Obstacles (NAMIMO) with a hierarchical reinforcement learning approach: a higher-level agent learns to predict whether an obstacle is movable by using the Contrastive Language-Image Pre-Training (CLIP) method, while lower-level agents are responsible for avoiding the obstacles or removing them.
Han Jun Bae and Juyoun Park, “Hierarchical Reinforcement Learning for Navigation among Movable and Immovable Obstacles,” IEEE Access, 2025.
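The two-level structure above can be sketched as a simple dispatch: a higher-level classifier predicts movability, then hands control to a lower-level policy. The classifier and both policies below are stubs for illustration, not the paper's CLIP-based model.

```python
# Sketch of the hierarchical decision in NAMIMO-style navigation:
# a higher-level movability prediction dispatches to a lower-level policy.
# All obstacle names and policy outputs here are illustrative assumptions.
def movability_classifier(obstacle):
    # Stand-in for the CLIP-based higher-level prediction.
    return obstacle in {"box", "chair"}

def remove_policy(obstacle):
    # Lower-level agent responsible for clearing movable obstacles.
    return f"push {obstacle} aside"

def avoid_policy(obstacle):
    # Lower-level agent responsible for detouring around immovable obstacles.
    return f"plan path around {obstacle}"

def navigate(obstacle):
    if movability_classifier(obstacle):
        return remove_policy(obstacle)   # movable: clear the path
    return avoid_policy(obstacle)        # immovable: detour

print(navigate("box"))   # push box aside
print(navigate("wall"))  # plan path around wall
```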
To address the challenges posed by a large action space, we propose a base-action combination method based on Hierarchical Reinforcement Learning (HRL). In this approach, lower-level agents are trained to estimate the value of base actions, whose linear combinations can represent all action policies. A higher-level agent then learns to predict the value of the combined action, i.e., how to combine the base actions.
Han Jun Bae and Juyoun Park, “Reinforcement Learning in Large Action Space through Base Action Combinations.”
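The core mechanism, expressing one action in a large space as a weighted mixture of a few base actions, can be sketched as follows. The dimensions and the softmax weighting are illustrative assumptions, not the paper's learned components.

```python
import math
import random

random.seed(1)

# A few base actions stand in for those learned by the lower-level agents.
n_base, action_dim = 4, 6
base_actions = [[random.gauss(0, 1) for _ in range(action_dim)]
                for _ in range(n_base)]

def combination_weights(state):
    # Stand-in for the higher-level agent: score each base action against
    # the state, then softmax the scores into mixing weights.
    scores = [sum(b[i] * state[i] for i in range(action_dim)) for b in base_actions]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

state = [random.gauss(0, 1) for _ in range(action_dim)]
w = combination_weights(state)
combined = [sum(w[k] * base_actions[k][i] for k in range(n_base))
            for i in range(action_dim)]
print(len(combined))  # 6: the combined action spans the full action dimension
```

The point of the construction is that the higher-level agent only searches over a small weight vector (here length 4) rather than the full action space (here length 6, but arbitrarily large in practice).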
The proposed memory architecture can memorize information acquired through interactions with people and learn meaningful knowledge inherent in that information. Based on a memory architecture that remembers the order history of returning customers and learns which factors affect each customer’s taste, we develop a kiosk system that supports emotional interaction and recommends coffee tailored to each customer’s taste.
Juyoun Park, Jinyoung Lee, and Ig-Jae Kim, “Emotional Interactive Kiosk System for Automatic Ordering,” The 18th Korea Robotics Society Annual Conference (KRoC 2023), Feb., 2023.
The proposed online robotic tool detection method can extract visual features that focus on each robotic surgical tool without prior learning. Based on the features, the encoder-decoder framework is designed to recognize the current surgical action and predict the sequence of the next surgical actions to be performed.
Juyoun Park and Chung Hyuk Park, “Recognition and Prediction of Surgical Actions Based on Online Robotic Tool Detection,” IEEE Robotics and Automation Letters (RA-L), vol. 6, no. 2, Feb. 2021, doi: 10.1109/LRA.2021.3060410.
The simulation allows the robot to learn a policy of moving closer to a person to initiate a simple medical interaction such as temperature measurement. The facial expression of the person facing the robot and the distance between the person and the robot are incorporated into the reinforcement learning process. In the simulation, the robot moves according to the learned trust knowledge to lower the person’s discomfort level and successfully approach them.
Juyoun Park and Chung Hyuk Park, “Trust Learning for Initiating Physical Human-Robot Interaction,” ICRA Workshop on Integrating Multidisciplinary Approaches to Advance Physical Human-Robot Interaction, IEEE International Conference on Robotics and Automation (ICRA), May 31, 2020.
The proposed MarsNet, a CNN-based end-to-end network for multi-label classification that can accept inputs of various sizes, is applied to a PCB solder paste inspection (SPI) task. SPI is usually added as a supplementary step in Surface Mount Technology (SMT) assembly to inspect whether the solder is applied properly on the PCB in a screen printer.
Ju-Youn Park, Yewon Hwang, Dukyoung Lee, and Jong-Hwan Kim, “MarsNet: Multi-label Classification for Images of Various Sizes for PCB Solder Paste Inspection,” IEEE Access, vol. 8, no. 1, Jan. 2020, doi: 10.1109/ACCESS.2020.2969217.
The proposed online incremental hierarchical classification resonance network (OIHCRN), which shows superior performance in online incremental hierarchical classification, is applied to a multimedia recommendation system for digital storytelling on a digital companion embedded in a smartphone.
Ju-Youn Park and Jong-Hwan Kim, “Online Incremental Hierarchical Classification Resonance Network,” Pattern Recognition, vol. 111, Mar. 2021, doi: 10.1016/j.patcog.2020.107672.
The online face identification system is demonstrated in real time in a scenario in which two people interact with a robot. The robot greets by name a person it has met before. When the robot meets a new person, it asks for the person’s name and uses it to learn the person’s facial identity. Human identity is learned in real time and immediately reflected in the interaction.
Juyoun Park, "Online Incremental Classification Resonance Networks for Human-Robot Interaction," Ph.D. Dissertation in Electrical Engineering from KAIST, 2019.
The proposed online incremental classification resonance network (OICRN), which enables high-performance online incremental class learning in multi-class classification, is applied to a face identification system on a humanoid robot. The robot learns human face identities through human-robot interaction.
Ju-Youn Park and Jong-Hwan Kim, “Online Incremental Classification Resonance Network and Its Application to Human-Robot Interaction,” IEEE Transactions on Neural Networks and Learning Systems, vol. 31, no. 5, Jun. 2019, doi: 10.1109/TNNLS.2019.2920158.
The proposed adaptive resonance theory-supervised predictive mapping for hierarchical classification (ARTMAP-HC) network, which allows incremental class learning on raw data without prior normalization, is applied to a multimedia recommendation system for digital storytelling on a digital companion embedded in a smartphone.
Ju-Youn Park and Jong-Hwan Kim, “Incremental Class Learning for Hierarchical Classification,” IEEE Transactions on Cybernetics, vol. 50, no. 1, Sep. 2018, doi: 10.1109/TCYB.2018.2866869.