Research

NeuFlow System-on-Chip

Motivation

We develop NeuFlow SoC, a system-on-chip that accelerates convolutional neural networks (ConvNets), for bio-inspired vision applications in advanced driver assistance systems (ADASs) and robotic navigation. ADASs for high-end cars can alert drivers to dangerous conditions such as unintended lane changes, undetected crossing pedestrians, and traffic signs. The SoC can perform human-like visual tasks, such as extracting visual features from multiple objects, tracking the objects, extracting deep features, navigating, and learning object relationships. It meets the real-time requirements of vision processing, is power-efficient enough for embedded use, and is highly programmable for a wide range of applications.

Accomplishment

The vision SoC is based on the NeuFlow computing architecture (Fig. 1), which implements convolutional neural networks (ConvNets). A network of MUX-based routers forms data-flow graphs at run time to accelerate ConvNet computation. The system, prototyped on an FPGA for functional validation, is capable of solving complex vision tasks for navigation such as street scene parsing.

Fig. 1: NeuFlow SoC architecture.

The SoC has been successfully taped out for fabrication in an IBM 45nm SOI CMOS process (Fig. 2).

Post-layout characterization shows that the implemented SoC is able to meet the real-time requirements of typical driving assistance vision tasks whilst satisfying the power budget for use in embedded systems.

Fig. 2: NeuFlow SoC layout (IBM 45nm SOI CMOS process, flip-chip packaged).

References

NeuFlow project website www.neuflow.org
Phi-Hung Pham, Darko Jelaca, Clement Farabet, Berin Martini, Yann LeCun, and Eugenio Culurciello, “NeuFlow: Dataflow Vision Processing System-on-a-Chip,” in Proc. of the 55th IEEE International Midwest Symposium on Circuits and Systems (MWSCAS), Aug. 2012, Idaho, USA, pp. 1044-1047. (Invited)

On-chip Network for Coarse-grained Processor Array

Motivation

Reconfigurable computing devices accelerate system performance by combining the area and power efficiency of application-specific integrated circuits (ASICs) with the high programmability of traditional general-purpose processors (GPPs) (Fig. 1). Dynamic reconfigurability is what lets such a device accelerate different computational data-flow graphs (DFGs) at run time. Among run-time reconfigurable devices, coarse-grained arrays (CGAs) are favored over field-programmable gate arrays (FPGAs) because they adapt quickly, with lower reconfiguration overhead at coarse granularity. Because on-chip networking is inherently flexible, a network-on-chip (NoC) approach is well suited to adapting CGAs to different run-time graphs. The challenge for NoCs in CGAs is that they must support the device's run-time reconfigurability while remaining area-efficient enough to integrate with the CGA's compact processing elements (PEs).

Fig. 1: Reconfigurable computing bridges the gap between ASICs and traditional GPPs.

Accomplishment

This research proposes a novel on-chip network fabric (Fig. 2) for a coarse-grained processor array. The fabric dynamically supports guaranteed throughput for real-time DFGs among 64 processing elements (PEs).

Fig. 2: Coarse-grained processor array interconnected by the proposed OCN.

The silicon-proven on-chip network (Fig. 3), fabricated in a 0.13um CMOS process, is area- and power-efficient and suitable for integration in embedded CGAs. In addition, its distributed dynamic path setup keeps the run-time mapping of DFGs fast as the system scales. A compact CGA interconnected by the proposed network reduces reconfiguration latency by up to an order of magnitude compared with CGAs that use centralized interconnect schemes.

Fig. 3: Microchip fabricated in a 0.13um CMOS process.

Figure 4 shows a basic functional test on the prototype chip. In the test, an FPGA board loads two hex files, compiled from C code on a Windows PC, into the program memories of two coarse-grained processors, one file per processor. One processor acts as the sending node and the other as the receiver. The receiver takes the data delivered over the on-chip network and outputs it on its UART. The UART output is then displayed on an oscilloscope.

Fig. 4: On-chip network basic functional test.

References

Phi-Hung Pham, Phuong Mau, Jungmoon Kim, and Chulwoo Kim, "An On-Chip Network Fabric Supporting Coarse-Grained Processor Array," IEEE Trans. on Very Large Scale Integration (VLSI) Systems, 10.1109/TVLSI.2011.2181546.
Phi-Hung Pham, Phuong Mau, and Chulwoo Kim, "A 64-PE Folded-Torus Intra-chip Communication Fabric for Guaranteed Throughput in Network-on-Chip Based Applications," in Proc. of IEEE Custom Integrated Circuits Conference (CICC), 2009, CA, US, pp. 645-648.
Phi-Hung Pham, Phuong Mau, and Chulwoo Kim, “A Compact 64-IP Network-on-Chip for Low-cost Multi-core Platform,” in Proc. of the 16th Korea Conference on Semiconductors (KCS), 2009, pp. 372-373. (Chip Design – ASIC Demonstration)
Phi-Hung Pham, Phuong Mau, and Chulwoo Kim, “A Compact 64-IP Network-on-Chip for Low-cost Multi-core Programmable Platform,” in Proc. of the 2008 IEEE Seoul Section – Student Paper Contest, 2008, pp. 129-132.

On-chip Permutation Network for Multiprocessor System-on-Chip

Motivation

Permutation traffic, in which each input sends traffic to exactly one output and each output receives traffic from exactly one input, is one of the most important traffic patterns exhibited by MPSoC applications. Standard permutations occur in general-purpose MPSoCs: for example, polynomial, sorting, and fast Fourier transform (FFT) computations produce shuffle permutations, whereas matrix transposes and corner-turn operations produce transpose permutations. As another example, MPSoCs implementing Turbo/LDPC decoders exhibit arbitrary and concurrent traffic permutations because of their multi-mode and multi-standard implementation. Moreover, because many MPSoCs compute in real time, guaranteed throughput (lossless data, predictable latency, guaranteed bandwidth, and in-order delivery) is mandatory for this permutation traffic.
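The permutation patterns named above can be made concrete with a short Python sketch. It builds the perfect-shuffle and transpose destination maps for 16 ports and checks that each is a valid permutation (every output is targeted exactly once). The function names and the 16-port size are illustrative assumptions, not taken from the chip design.

```python
def perfect_shuffle(n_ports):
    """Destination of each source port under a perfect shuffle.

    The destination index is the source index rotated left by one bit.
    n_ports is assumed to be a power of two.
    """
    bits = n_ports.bit_length() - 1
    assert 1 << bits == n_ports, "n_ports must be a power of two"
    mask = n_ports - 1
    return [((i << 1) | (i >> (bits - 1))) & mask for i in range(n_ports)]


def transpose(rows, cols):
    """Destination of each source port under a matrix transpose.

    Source port r * cols + c sends to destination port c * rows + r.
    """
    return [(i % cols) * rows + (i // cols) for i in range(rows * cols)]


def is_permutation(dest):
    """True if every output port is targeted by exactly one input port."""
    return sorted(dest) == list(range(len(dest)))


shuffle_16 = perfect_shuffle(16)
transpose_4x4 = transpose(4, 4)
print(shuffle_16)
print(transpose_4x4)
print(is_permutation(shuffle_16), is_permutation(transpose_4x4))
```

A destination list such as [0, 0, 1] fails the check because output 0 is targeted twice, which is exactly the kind of conflict a permutation network does not have to handle.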

Most networks-on-chip (NoCs) in practice are general-purpose and use dimension-ordered and/or minimal adaptive routing algorithms. Application-specific NoCs are needed to achieve better performance under permutation traffic. However, their routing schemes are typically pre-configured and do not efficiently support permutation patterns that change dynamically during application execution. The design difficulty is how to (re-)compute the routes when the permutation changes while still guaranteeing the permuted data flows in real time. This becomes a great challenge when the permutation network is implemented on-chip with very limited power and area budgets.

Accomplishment

In this work, we design a novel permutation network-on-chip capable of providing guaranteed throughput for arbitrary permutation patterns. Unlike conventional packet-switching approaches, the proposed network employs dynamic path-setup circuit switching in combination with a multistage network topology (Fig. 1). The dynamic path setup tackles the challenge of (re-)computing, at run time, the routes for conflict-free permuted data flows. Because the design has no queuing-buffer overhead, it is compact, and stacking multiple networks for concurrent permutations is feasible.

Fig. 1: Proposed on-chip network architecture for 16 input-to-16 output permutation.

A proof-of-concept microchip fabricated in a 0.13um CMOS process (Fig. 2) validates the efficiency of the proposed design. Experimental results show that the network-on-chip achieves 1.9x to 8.2x reduction in silicon overhead compared to other comparable designs.

Fig. 2: Microchip with tile-based layout fabricated in a 0.13um CMOS process.

References

Phi-Hung Pham, Junyong Song, Jongsun Park, and Chulwoo Kim, "Design and Implementation of an On-Chip Permutation Network for Multiprocessor System-on-Chip," IEEE Trans. on Very Large Scale Integration (VLSI) Systems, 10.1109/TVLSI.2011.2181545.
Phi-Hung Pham, Jongsun Park, and Chulwoo Kim, "ProMINoC: An Efficient Network-on-Chip Design for Flexible Data Permutation," IEICE Electron. Express, vol. 7, no. 12, pp. 861-866, 2010.
Phi-Hung Pham, Jungmoon Kim, Junyoung Song, and Chulwoo Kim, "A Network-on-Chip Fabric for Traffic Permutation in Multi-Core Applications," in International SoC Design Conference (ISOCC), 2010. (Chip Design – Panel Demonstration)

Backtracking Wave-pipeline On-chip Router

Motivation

Many network-on-chip based systems used in hard real-time applications demand hard guaranteed throughput for their on-chip communication. It is a great challenge to design on-chip switches/routers that dynamically support (hard) guaranteed throughput while still fitting within tight on-chip power, timing, and area constraints.

Accomplishment

In this work, we propose, design, and implement a novel pipeline circuit-switching router that supports on-chip guaranteed throughput. The proposed circuit-switched router/switch (Fig. 1), based on a backtracking probe path setup, operates with a source-synchronous (wave-pipeline/direct-forwarding) scheme. The switch implementing the dynamic path setup is deadlock- and livelock-free and delivers a high aggregate bandwidth with high area and energy efficiency.

Fig. 1: Proposed switch architecture.

A silicon prototype of a 5-bidirectional-port switch in a 0.18um CMOS standard-cell technology shows that the design offers a high aggregate bandwidth while using a modest silicon area (Fig. 2). Moreover, because the switch is synthesizable in a digital ASIC flow, it offers fast development time and high portability.

Fig. 2: Microchip die photo.

References

Phi-Hung Pham, Jongsun Park, Phuong Mau, and Chulwoo Kim, “Design and Implementation of Backtracking Wave-pipeline Switch to Support Guaranteed Throughput in Network-on-Chip,” IEEE Trans. on Very Large Scale Integration (VLSI) Systems, vol. 20, no. 2, pp. 270-283, Feb. 2012.
Phi-Hung Pham, Yogendera Kumar, and Chulwoo Kim, "A Compact and High-Performance Switch for Circuit-Switched Network-on-Chip," in Proc. of IEEE International System-on-Chip Conference (SOCC), Sep. 2006, TX, US, pp. 53-56.

Circuit-switched Network-on-Chip with Dynamic Path-setup

Motivation

As the complexity of systems-on-chip (SoCs) increases, the network-on-chip (NoC) is adopted as a viable communication-centric solution for integrating numerous on-chip components. Many NoC-based SoCs, particularly those for hard real-time applications, demand hard guaranteed throughput for their on-chip communication (i.e., lossless data, predictable latency, guaranteed bandwidth, and in-order delivery). With very limited on-chip silicon area and power, designing on-chip networks that support guaranteed throughput is a great challenge.

To implement guaranteed throughput in practice, packet-switching approaches with time-division multiplexing (TDM) or virtual channels (VCs) are usually adopted. These approaches must manage huge time-slot tables and/or require excessive area overhead for queuing buffers. In contrast, traditional circuit switching easily provides guaranteed throughput once circuits are set up. It also has the advantage of low implementation overhead, but it suffers from static path setup and/or high setup latency.

Accomplishment

We propose a novel design approach that uses circuit switching with a dynamic path-setup scheme to support on-chip guaranteed throughput. For use with the scheme, a compact switch-by-switch handshake and an end-to-end flow control are specified (Fig. 1).

Fig. 1: End-to-end flow-control operation based on the switch-by-switch handshake.

The proposed backtracking path-setup scheme, combined with the rich path diversity of a given network topology, alleviates the high path-setup latency of traditional circuit switching (Fig. 2). The scheme is free of deadlock and livelock. Moreover, because it is distributed, dynamic allocation of guaranteed circuits scales with system size.

Fig. 2: Path-setup latency performance of the proposed scheme (8x8 network, uniform traffic).
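The idea of backtracking path setup can be illustrated with a minimal Python sketch. It is a software analogy only, not the router's actual probing protocol: the mesh topology, the depth-first probe order, and the names setup_path and busy_links are all assumptions made for illustration. A probe tries the hop closest to the destination first, backs up when every outgoing link is already reserved, and reserves the links of the first complete path it finds.

```python
def setup_path(size, busy_links, src, dst):
    """Probe a size x size mesh for a free circuit from src to dst.

    busy_links is a set of directed links ((x1, y1), (x2, y2)) already
    reserved by other circuits. On success the links of the new path are
    added to busy_links and the list of nodes is returned; otherwise
    None is returned and busy_links is left unchanged.
    """
    def neighbours(node):
        x, y = node
        cand = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
        cand = [(a, b) for a, b in cand if 0 <= a < size and 0 <= b < size]
        # Prefer hops that move closer to the destination.
        return sorted(cand, key=lambda n: abs(n[0] - dst[0]) + abs(n[1] - dst[1]))

    visited = {src}  # a node is probed at most once, so the search terminates
    path = [src]

    def probe(node):
        if node == dst:
            return True
        for nxt in neighbours(node):
            if nxt in visited or (node, nxt) in busy_links:
                continue  # node already probed or link taken by another circuit
            visited.add(nxt)
            path.append(nxt)
            if probe(nxt):
                return True
            path.pop()  # dead end: backtrack and try the next link
        return False

    if not probe(src):
        return None
    busy_links.update(zip(path, path[1:]))  # reserve the circuit
    return path


links = set()
first = setup_path(4, links, (0, 0), (3, 0))
second = setup_path(4, links, (0, 0), (3, 0))
print(first)
print(second)
```

In this sketch the second request between the same endpoints still succeeds because the mesh offers alternative routes, which is the path-diversity effect the text describes. A request fails only when no free path exists at all.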

References

Phi-Hung Pham, "Design and implementation of circuit-switched network-on-chip with dynamic path-setup scheme," Ph.D. thesis, Graduate School, Korea University, Seoul, South Korea, 2010.
Phi-Hung Pham, Jongsun Park, Phuong Mau, and Chulwoo Kim, “Design and Implementation of Backtracking Wave-pipeline Switch to Support Guaranteed Throughput in Network-on-Chip,” IEEE Trans. on Very Large Scale Integration (VLSI) Systems, vol. 20, no. 2, pp. 270-283, Feb. 2012.
P.T. Hong, Phi-Hung Pham, Xuan-Tu Tran, and Chulwoo Kim, "Analysis and Evaluation of Traffic-Performance in Backtracked Routing Network-on-Chip," in Proc. of International Conference on Communications and Electronics (ICCE), 2008, pp. 13-17.
Phi-Hung Pham, Yogendera Kumar, and Chulwoo Kim, "High Performance and Area-Efficient Circuit-Switched Network on Chip Design," in Proc. of IEEE International Conference on Computer and Information Technology (CIT), Sep 2006, pp. 243-243.

IP-converged Telecommunication Testbed

Motivation

We developed a network testbed for education and research in network engineering at the College of Technology, Vietnam National University, Hanoi, from 2000 to 2003. The idea is to design a heterogeneous, hierarchical telecommunication network that mimics a real-life network at a fraction of the investment cost. On the testbed, students and researchers can freely experiment with integrated voice-data services, new value-added services, and IP-based services, and can carry out protocol analysis and maintenance/administration tasks on campus instead of relying on the costly public network.

Accomplishment

The hierarchical testbed is architected with a core network and access networks (Fig. 1). The core network has a ring topology and comprises ISDN PBXs from several common vendors (Siemens, NEC, etc.). These PBXs are networked using 2B+D/30B+D trunks with QSIC/Cornet signaling. The access networks include various end-user equipment found in practice, such as PSTN phones, analog modems, and ISDN voice/data access terminals.

Fig. 1: Heterogeneous telecommunication network testbed.
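For readers unfamiliar with the trunk notation, 2B+D and 30B+D are the ISDN basic-rate and primary-rate interfaces. The short Python sketch below works out their channel capacity, assuming the standard ISDN channel rates (64 kbit/s per B channel, a 16 kbit/s D channel for basic rate, and a 64 kbit/s D channel for primary rate). The function and constant names are illustrative and are not taken from the testbed.

```python
B_CHANNEL_KBPS = 64  # bearer (B) channel rate, assumed standard ISDN value


def trunk_capacity_kbps(n_b_channels, d_channel_kbps):
    """Total rate of the B channels plus the D (signaling) channel."""
    return n_b_channels * B_CHANNEL_KBPS + d_channel_kbps


basic_rate = trunk_capacity_kbps(2, 16)     # 2B+D
primary_rate = trunk_capacity_kbps(30, 64)  # 30B+D
print(basic_rate, primary_rate)  # 144 1984
```

These figures cover the B and D channels only and exclude line framing overhead.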

The system design requirements cover the numbering plan, signaling plan, synchronization plan, and network services. The open testbed is also compatible with possible future upgrades of the network and services. For example, the core network can be upgraded to an STM-1 link with a Cisco WAN Switch IGX-8400 (supporting ATM, FR, VoIP, etc.), or wireless access points can be added. On top of this heterogeneous telecommunication infrastructure, an IP-converged network is designed for cross-platform communication (Fig. 2).

Fig. 2: IP-converged network built on top of the testbed.

The testbed has been effectively used for education and research in the field of network and telecommunication engineering at Vietnam National University, Hanoi.

References

Nguyen Kim Giao, Nguyen Thi Hong, Pham Thi Hong, and Pham Phi Hung, “Design and deployment of ADSL access network in Telecommunication System Lab,” Journal of Science - Natural Sciences and Technology, Vietnam National University Hanoi, vol. XXI, No2AP, pp. 100-109, 2005.
Nguyen Kim Giao, Pham Phi Hung, and Nguyen Quoc Tuan, “Research and development of advanced communication services on integrated digital exchange,” Journal of Science - Natural Sciences and Technology, Vietnam National University Hanoi, vol. XXI, No2AP, pp. 111-118, 2005.
Nguyen Thi Hong, Nguyen Kim Giao, Pham Phi Hung, and Pham Thi Hong, “A global signaling system for corporate networking - QSIC and application in private telecommunication network,” Journal of Science - Natural Sciences and Technology, Vietnam National University Hanoi, vol. XXI, No2AP, pp. 135-144, 2005.
