Students interested in computer vision, data analysis and processing, artificial intelligence, and natural language processing are welcome to join our research team.
Reidentification, Scene Understanding, Object Detection, Denoising, Super Resolution, Dense Captioning
Stock Forecasting, Data Mining, Natural Language Processing, Prediction, Reasoning
Human-Like Intelligence, Innovative Artificial Intelligence Model, Reinforcement Learning
The Robotic Vision Scene Understanding Challenge evaluates how well a robotic vision system can understand the semantic and geometric aspects of its environment. The challenge consists of two distinct tasks: Object-based Semantic SLAM, and Scene Change Detection.
Hairui Yang, Baoli Sun, Baopu Li, Caifei Yang, Zhihui Wang, Jenhui Chen, Lei Wang, and Haojie Li, "Iterative Class Prototype Calibration for Transductive Zero-shot Learning," IEEE Transactions on Circuits and Systems for Video Technology, vol. 33, no. 3, pp. 1236-1246, March 2023. (SCI, IF: 8.4, Rank: 25/275 (9.091% (Q1), 2022) in Engineering, Electrical & Electronic) DOI: 10.1109/TCSVT.2022.320920
Zero-shot learning (ZSL) typically suffers from the domain shift issue because the projected feature embeddings of unseen samples do not match the corresponding class semantic prototypes, making it very challenging to fine-tune an optimal visual-semantic mapping for the unseen domain. Some existing transductive ZSL methods address this problem by introducing unlabeled samples of the unseen domain, but the projected features of unseen samples are still not discriminative and tend to be distributed around prototypes of seen classes. Therefore, effectively aligning the projected features of samples in unseen classes with their corresponding predefined class prototypes is crucial for improving the generalization of ZSL models. In this paper, we propose a novel Iterative Class Prototype Calibration (ICPC) framework for transductive ZSL, which consists of a pseudo-labeling stage and a model retraining stage to address this key issue. First, in the labeling stage, we devise a Class Prototype Calibration (CPC) module that calibrates the predefined class prototypes of the unseen domain by estimating the real center of the projected feature distribution, which achieves a better match between sample points and class prototypes. Next, in the retraining stage, we devise a Certain Samples Screening (CSS) module that selects relatively certain unseen samples with high confidence and aligns them with the predefined class prototypes in the embedding space. A progressive training strategy is adopted to select more certain samples and update the proposed model with augmented training data. Extensive experiments on the AwA2, CUB, and SUN datasets demonstrate that the proposed scheme achieves new state-of-the-art results in the conventional setting under both the standard split (SS) and the proposed split (PS).
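The two stages described in the ICPC abstract can be illustrated with a toy sketch. It is not the paper's implementation: it assumes Euclidean distance in the embedding space, takes the mean of the assigned features as the "real center", and uses the margin between the nearest and second-nearest prototype as the confidence score. The function names `assign`, `calibrate`, and `screen_certain` are hypothetical.

```python
from math import dist


def assign(features, prototypes):
    """Pseudo-label each feature with the index of its nearest prototype."""
    return [
        min(range(len(prototypes)), key=lambda k: dist(f, prototypes[k]))
        for f in features
    ]


def calibrate(features, prototypes):
    """Move each prototype to the mean of the features assigned to it.

    Assumption: the mean stands in for the 'real center' of the projected
    feature distribution. Prototypes with no assigned features are kept.
    """
    labels = assign(features, prototypes)
    calibrated = []
    for k, proto in enumerate(prototypes):
        members = [f for f, y in zip(features, labels) if y == k]
        if not members:
            calibrated.append(list(proto))
            continue
        calibrated.append([sum(col) / len(members) for col in zip(*members)])
    return calibrated


def screen_certain(features, prototypes, keep_ratio=0.5):
    """Return (sample_index, pseudo_label) pairs for the most certain samples.

    Assumption: confidence is the distance margin between the second-nearest
    and the nearest prototype (requires at least two prototypes).
    """
    scored = []
    for i, f in enumerate(features):
        ranked = sorted((dist(f, p), k) for k, p in enumerate(prototypes))
        margin = ranked[1][0] - ranked[0][0]
        scored.append((margin, i, ranked[0][1]))
    scored.sort(reverse=True)
    n_keep = max(1, int(len(scored) * keep_ratio))
    return [(i, y) for _, i, y in scored[:n_keep]]
```

In a progressive scheme, one would call `calibrate` and `screen_certain` repeatedly, raising `keep_ratio` each round and retraining the projection on the selected samples; the retraining step itself is outside the scope of this sketch.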
Zhiqun Hu, Fukun Yang, Zhaoming Lu, and Jenhui Chen*, "Enhancing Autonomous Lane-changing Safety: Deep Reinforcement Learning via Pre-exploration in Parallel Imaginary Environments," IEEE Transactions on Industrial Informatics, vol. 20, no. 10, pp. 12385-12395, October 2024. (SCI, IF: 11.7, Rank: 3/169 (1.78% (Q1), 2023) in Computer Science, Interdisciplinary Applications) DOI: 10.1109/TII.2024.3423423
This paper introduces a novel deep reinforcement learning (DRL) framework to enhance the safety of autonomous lane-changing in connected vehicles. Traditional DRL approaches often involve random exploration, which can lead to unsafe behaviors like collisions, particularly unsuitable for real-world, safety-critical scenarios. To address this, the authors propose the Safe-TD3 algorithm, which integrates safety considerations directly into the learning process to minimize risky maneuvers. This approach combines two main components: a convex occupancy model and domain randomization in parallel imaginary environments.
The Safe-TD3 framework evaluates safety during lane-changing by using a convex occupancy approximation to restrict action choices, reducing the risk of unsafe actions. Additionally, the framework creates a simulated environment based on domain randomization, which generates various action scenarios, allowing the model to explore potential hazards without endangering actual road safety. In this setup, Monte Carlo tree search (MCTS) further ensures that selected actions are the safest available within the given situation. The authors used the twin-delayed deep deterministic policy gradient (TD3) algorithm to train the model, and their experimental results show that Safe-TD3 enables more rapid and stable learning, with a marked reduction in collision rates compared to conventional DRL methods. This approach to integrating safety constraints demonstrates significant improvements in the efficiency and safety of autonomous lane-changing. The paper suggests Safe-TD3 as a promising method to address the challenges of reliable decision-making in connected and autonomous vehicle systems, with potential applications in mixed traffic scenarios.
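The idea of pre-exploring an action in several imagined environments before executing it can be sketched as follows. This is a deliberately simplified stand-in, not Safe-TD3: it replaces the convex occupancy model with a one-dimensional gap check, replaces MCTS with an exhaustive scan over a few candidate accelerations, and models domain randomization as a list of imagined lead-vehicle speeds. All names and numeric defaults (`min_gap`, `safest_action`, `safe_gap=5.0`) are assumptions made for illustration.

```python
def min_gap(accel, ego_speed, lead_speed, gap, steps=30, dt=0.1):
    """Smallest gap (m) to the lead vehicle under a constant ego acceleration."""
    smallest = gap
    for _ in range(steps):
        ego_speed = max(0.0, ego_speed + accel * dt)
        gap += (lead_speed - ego_speed) * dt
        smallest = min(smallest, gap)
    return smallest


def safest_action(proposed, candidates, ego_speed, gap, lead_speeds, safe_gap=5.0):
    """Pick the candidate closest to the policy's proposal that stays safe
    in every imagined environment (one environment per lead speed)."""

    def worst_case(accel):
        return min(min_gap(accel, ego_speed, v, gap) for v in lead_speeds)

    safe = [a for a in candidates if worst_case(a) >= safe_gap]
    if not safe:
        # No candidate is safe everywhere: fall back to the least risky one.
        return max(candidates, key=worst_case)
    return min(safe, key=lambda a: abs(a - proposed))
```

For example, with `ego_speed=20.0`, `gap=10.0`, and imagined lead speeds of 15, 20, and 25 m/s, a proposed acceleration of 3 m/s² is overridden by the only candidate that keeps the gap above 5 m in all three environments, namely braking at -3 m/s².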
Obinna Agbodike, Weijin Zhang, Jenhui Chen*, and Lei Wang, "A Face and Body-shape Integration Model for Cloth-Changing Person Re-Identification," Image and Vision Computing, vol. 140, p. 104843, December 2023. (SCI, IF: 4.7, Rank: 20/108 (18.519% (Q1), 2022) in Computer Science & Software Engineering) DOI: 10.1016/j.imavis.2023.104843
Among the existing deep learning-based person re-identification (ReID) methods, human parsing based on semantic segmentation is the most promising solution because such models can learn to semantically identify fine-grained details of a target's body parts or apparel. However, intra-class variations such as illumination changes, multiple pose angles, and cloth-changing (CC) across non-overlapping camera viewpoints present a crucial challenge for this approach. Among these challenges, person CC is the most distinctive problem: ReID models often fail to associate a target in new clothes with the feature semantics learned from the clothes worn at an earlier time. In this paper, we propose a face and body-shape integration (FBI) network as a tactical solution to the long-term person CC-ReID problem. The FBI comprises hierarchically stacked parsing and edge prediction (PEP) CNN blocks that generate fine-grained human-parsing output at the initial stage. We then align the PEP with our proposed model-agnostic plug-in feature overlay module (FOM) to mask cloth-relevant body attributes, keeping the facial features pooled from the input sample. Thus, our human-parsing PEP and FOM modules are attuned to discriminatively learn cloth-irrelevant features of the target pedestrian(s) to optimize the effectiveness of person ReID in solitary or minimally crowded areas. In our extensive person CC-ReID experiments, our FBI model achieves 83.4/61.8 in R1 and 91.7/65.8 in mAP evaluation results on the PRCC and LTCC datasets, respectively, thereby significantly outperforming several previous state-of-the-art ReID methods and validating the effectiveness of the FBI.
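The masking step attributed to the feature overlay module can be pictured with a small sketch. It is an illustration only, assuming the parsing stage yields one label per pixel and that masking means overwriting clothing pixels with a fill value; the label names in `CLOTH_LABELS` and the function `overlay_mask` are hypothetical and are not taken from the paper.

```python
# Hypothetical parsing labels treated as cloth-relevant.
CLOTH_LABELS = frozenset({"upper_clothes", "pants", "dress"})


def overlay_mask(pixels, parsing, cloth_labels=CLOTH_LABELS, fill=0):
    """Overwrite cloth-labelled pixels with `fill`; keep face and other regions.

    `pixels` and `parsing` are equally sized 2-D lists: pixel values and the
    per-pixel parsing label produced by an upstream human-parsing model.
    """
    return [
        [fill if label in cloth_labels else value
         for value, label in zip(pixel_row, label_row)]
        for pixel_row, label_row in zip(pixels, parsing)
    ]
```

A ReID backbone fed with the masked input can no longer rely on clothing appearance, which is the intuition behind learning cloth-irrelevant features.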
Muhammad Ibrar, Lei Wang*, Gabriel-Miro Muntean, Jenhui Chen*, Nadir Shah, and Aamir Akbar, "IHSF: An Intelligent Solution for Improved Performance of Reliable and Time-sensitive Flows in Hybrid SDN-based FC IoT Systems," IEEE Internet of Things Journal, vol. 8, no. 5, pp. 3130-3142, March 2021. (SCI, IF: 10.238, Rank: 9/164 (5.49% (Q1), 2021) in Computer Science, Information Systems) DOI: 10.1109/JIOT.2020.3024560
2022 Best Journal Paper Award, TACC
The integration of software-defined networking (SDN) into legacy networks causes both operational and deployment issues. In this context, this article proposes a novel approach, called An Intelligent Solution for Improved Performance of Reliable and Time-sensitive Flows in hybrid SDN-based fog computing IoT systems (IHSF). The proposed IHSF approach has three solutions: 1) a novel algorithm to deploy SDN switches between legacy switches to improve network observability; 2) a K-nearest neighbor regression algorithm to predict in real time the reliability of legacy links at the SDN controller based on historic data; this enables the SDN controller to make timely decisions, improving system performance; and 3) a reliable and time-sensitive deep deterministic policy gradient algorithm (RT-DDPG), which optimally computes forwarding paths in hybrid SDN-F for time-critical traffic flows generated by IoT applications. The simulation results show that our proposed IHSF solution has a better performance than the existing approach in terms of network observability time, number of disturbed flows, end-to-end delay, and packet delivery ratio.
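The second IHSF component, K-nearest neighbor regression over historic data, can be sketched in a few lines. The abstract does not say which link features are used, so the feature vector here (link utilization, recent loss rate) and the function name `predict_reliability` are assumptions for illustration.

```python
from math import dist


def predict_reliability(history, query, k=3):
    """Average the reliability of the k historic samples nearest to `query`.

    `history` is a list of (feature_vector, reliability) pairs; `query` is a
    feature vector of the same length. Distances are Euclidean.
    """
    nearest = sorted(history, key=lambda sample: dist(sample[0], query))[:k]
    return sum(reliability for _, reliability in nearest) / len(nearest)
```

A controller could call this for each legacy link with its latest measurements and avoid links whose predicted reliability falls below a chosen threshold.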
Zhiqun Hu, Yujing Zhang, Hao Huang, Xiangming Wen, Obinna Agbodike, and Jenhui Chen*, "Reinforcement Learning for Energy Efficiency Improvement in UAV-BS Access Networks: A Knowledge Transfer Scheme," Engineering Applications of Artificial Intelligence, vol. 120, 105930, April 2023. (SCI, IF: 8.0, Rank: 5/90 (5.556% (Q1), 2022) in Engineering, Multidisciplinary) DOI: 10.1016/j.engappai.2023.105930
Recently, the feasibility of forming unmanned aerial vehicle base station (UAV-BS) network systems with energy harvesting capabilities to support persistent wireless access services for pedestrian users has been validated. To sustain the wireless access services of the UAV-BSs, we investigate an optimal policy that maximizes the overall energy utilization efficiency (of renewable energy) of the UAV-BSs during their active in-flight network access operations. Since natural sources of renewable energy (e.g., solar or wind energy harvesting) have stochastic arrival rates that depend on the dynamics of an unknown environment, we exploit an actor–critic reinforcement learning framework, which handles continuous-valued state and action spaces, to learn the best policy through interaction with the environment. To enhance and expedite the learning process, a transfer asynchronous advantage actor–critic (TA3C) algorithm is proposed, which enables UAV-BSs to transfer (i.e., share) knowledge gained in historical periods during asynchronous parallel task execution on multiple instances of the environment. Numerical results reveal that the proposed TA3C algorithm surpasses the classic A3C and A2C algorithms in terms of throughput and optimal energy utilization efficiency.
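For readers unfamiliar with the "advantage" in advantage actor–critic methods, the sketch below computes n-step advantages as used in standard A3C/A2C. It does not reproduce the knowledge-transfer part of TA3C, which the abstract does not specify; the function name and the discount default are assumptions.

```python
def n_step_advantages(rewards, values, bootstrap_value, gamma=0.99):
    """Discounted n-step returns minus the critic's value estimates.

    `rewards[t]` and `values[t]` belong to step t of a rollout;
    `bootstrap_value` is the critic's estimate for the state after the
    last step (use 0.0 if the episode ended).
    """
    returns = []
    running = bootstrap_value
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return [ret - value for ret, value in zip(returns, values)]
```

A positive advantage means the action did better than the critic expected, so the actor's update raises its probability; each asynchronous worker computes these values on its own rollout before pushing gradients.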