EDAthon 2021

August 06, 2021

Virtual Contest

Sponsored by Cadence; IEEE CEDA; IEEE CEDA Hong Kong Chapter

Organized by City University of Hong Kong

EDAthon is a whole-day programming contest (9:00am-3:00pm programming + 3:30pm-4:30pm seminar) that features interesting and challenging topics in Electronic Design Automation (EDA). It is also a unique opportunity to bring together talents for EDA which enables the rapid advancement in computer technology. Due to CoVID- 19, this year we will have an online virtual contest. The contest will involve solving interesting problems in the broad context of Computer-Aided Design (CAD) of integrated circuits and systems. It will emphasize on team work, problem solving skills and programming techniques for EDA applications. It is a goal of EDAthon and CEDA HK to promote EDA in Hong Kong and her neighboring regions, and to nurture the best of the next-generation students and professionals for the EDA community.

The contest is open to two-person teams of graduate students or senior undergraduate students currently full-time enrolled in a university, specializing in EDA or related areas. In the contest, there will be five problems selected from the following areas:

- - System Design and Analysis
  - Logic and High-level Design
  - Physical Design
  - Circuit Analysis
  - Emerging Technologies, e.g., DFM, Security, Biochip, Machine Learning in EDA etc.

During the contest, students will be given the problem statements and some sample test data. The answers will be judged based on their correctness under the given constraints using hidden benchmarks. Three teams winning the contest will be rewarded with trophies and cash prizes as the follows.

First Prize (one team): 5000 CNY, instructor (2000 CNY)
Second Prize (one team): 3000 CNY, instructor (1000 CNY)
Third Prize (one team): 2000 CNY, instructor (500 CNY)

IMPORTANT DATES

May 15, 2021: Call for participation released, open for enrollment emails
July 15, 2021: Registration deadline

Registration (Deadline: July 15, 2021)

Please complete the following google form for registration.

Schedule and Zoom link

Problem Descriptions

Problem 1: Global placement with a bivariate gradient-based wirelength model

Usually, the analytical global placement requires a differentiable wire-length model that is able to approximate a golden one. Unlike some conventional differentiable wire-length models such as the log-sum-exp(LSE) and weighted-average wirelength (WAWL) models. BiG (Bivariate Gradient-based wirelength model) can derive a gradient directly on the basis of any bivariate or multivariate smooth maximum (minimum) function. In this question, we expect a correct implementation of the WAWL-based BiG operator including the forward and backward pass based on the DREAMPlace framework. More specifically, in the forward pass, we compute the wirelength given pin locations; in the backward pass, we compute the gradient w.r.t. pin locations.

Reference:

[1] [DAC 2019] DREAMPlace: Deep Learning Toolkit-Enabled GPU Acceleration for Modern VLSI Placement

[2] [DAC 2019] BiG: A Bivariate Gradient-Based Wirelength Model for Analytical Circuit Placement

Problem 2: Post-Synthesis Metric Estimation by Extreme Learning Machine (ELM)

High-level synthesis (HLS) tools provide automatic conversion from C/C++/SystemC-based specification to hardware description languages like Verilog or VHDL, which has dramatically improved productivity in customized hardware design. The reports provided by HLS tools can convey important information such as expected performance, timing, resource usage, and composition of the synthesized register-transfer-level (RTL) design. However, the reported values are often inaccurate. To obtain a more accurate quality of results estimates, machine learning (ML) techniques have been applied to improve HLS tools. Extreme learning machine (ELM) represents a suite of ML techniques in which the traditional backpropagation approach is NOT needed to tune the weights of the hidden neurons. Rather, only a one-step pseudo-inverse operation is required. In this problem, we investigate the ability and potential of ELM to deal with the post-synthesis metric estimation on both regression and classification tasks using a basic single-layer feedforward network (SLNFs) based on ELM.

Reference:

[1] Huang, G. B. (2015). What are extreme learning machines? Filling the gap between Frank Rosenblatt’s dream and John von Neumann’s puzzle. Cognitive Computation, 7(3), 263-278.

[2] Dai, S., Zhou, Y., Zhang, H., Ustun, E., Young, E. F., & Zhang, Z. (2018, April). Fast and accurate estimation of quality of results in high-level synthesis with machine learning. In 2018 IEEE 26th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) (pp. 129-132). IEEE. (Codes : https://github.com/cornell-zhang/quickest )

[2] Huang, et al. (2021). Machine learning for electronic design automation: A survey. arXiv preprint arXiv:2102.03357.

Problem 3: Placing a neural network on a hardware fabrics

Targeted towards accelerating computation-intensive deep learning applications, AI hardware have been adopted ubiquitously. A critical challenge naturally arises is how to maximize the on-chip resource utilization to achieve a substantial increase in FLOPS/Watt with respect to different DNN architectures. This problem of placing a neural network model on a hardware fabrics mimic a traditional floorplanning problem but with a high degree of regularity in terms of the kernel sizes and netlist, which allows people to solve the problem more easily with dynamic programming-like approach. This problem will be about placing a neural network model on a large fabrics of process elements.

Reference:

[1] B. Jiang, J. Chen, J. Liu, L. Liu, F. Wang, X. Zhang, and E. F. Young. CU.POKer: placing DNNs on wafer-scale AI accelerator with optimal kernel sizing. In Proceedings of the 39th International Conference on Computer-Aided Design (ICCAD), pp.1-9. 2020.

[2] B. Li, Q. Du, D. Liu, J. Zhang, G. Chen, and H. You. Placement for wafer-scale deep learning accelerator. In IEEE/ACM Asia and South Pacific Design Automation Conference, pages 665-670, 2021.

Problem 4: Triple Modular Redundancy Formal Verification

Triple modular redundancy (TMR) is one of the most-known implementation technique to improve the reliability for the system. It becomes one sign-off step for safety implementation to verify if the circuit is implemented with TMR correctly. In this problem, you need to design an algorithm that can verify if a circuit is correctly implemented with TMR through formal verification. The algorithm answers pass if we can guarantee the function of the circuit is implemented with TMR.

Reference:

[1] Triple Modular Redundancy: MailScanner has detected a possible fraud attempt from "urldefense.com" claiming to be https://en.wikipedia.org/wiki/Triple_modular_redundancy

[2] AIGER Format: A. Biere, “The AIGER And-Inverter Graph (AIG) format,” 2007. MailScanner has detected a possible fraud attempt from "urldefense.com" claiming to be http://fmv.jku.at/aiger/FORMAT.aiger

[3] ABC Synthesis: A. Mishchenko “ABC: System for Sequential Logic Synthesis and Formal Verification.” MailScanner has detected a possible fraud attempt from "urldefense.com" claiming to be https://github.com/berkeley-abc/abc

Problem 5: Neural architectural search for RRAM-based AI accelerator

RRAM-based in-memory computing is a promising technique to accelerate future AI. It employs a matrix of RRAM, or tunable resistors, to perform vector-matrix multiplications using Ohm’s law and Kirchhoff’s law. (e.g. Fig. 7a of Ref. [1]) However, mapping neural networks to a memristor array involves partition the array into non-overlapping sub-arrays, where each sub-array implements either one convolutional layer or a dense layer. [2] How to maximize the network performance on a given dataset subject to this hardware constraint is a tough optimization question. Neural architectural search (NAS) provides an efficient way to figure out networks that maximize the performance of a task. [3] So in this problem, you are encouraged to employ NAS to design a convolutional neural network for a N×N (e.g. N=256 gives 64k weights, you can ignore differential pairs in representing negative weights at your wish.) RRAM array to classify MNIST dataset. [Notes: Feel free to pooling, batch normalization, or dropout layers that do not involve RRAM arrays.]

Reference:

[1] Wang, Z., Wu, H., Burr, G.W. et al. Resistive switching materials for information processing. Nat Rev Mater 5, 173–195 (2020). https://doi.org/10.1038/s41578-019-0159-3

[2] Wang, Z., Li, C., Lin, P. et al. In situ training of feed-forward and recurrent convolutional memristor networks. Nat Mach Intell 1, 434–442 (2019). https://doi.org/10.1038/s42256-019-0089-1

[3] Zhang, L. L., Yang, Y., Jiang, Y., Fast Hardware-Aware Neural Architecture Search, https://arxiv.org/abs/1910.11609

Problem 6: CGRA Scheduling with Multi-ALU PEs

Note: Our problem releases the mapping problem from many constraints in conventional CGRA by introducing the concept of multi-ALU PE. The core of our problem is how to map a DFG on an FPGA so it will not require any background of CGRA.

Description: Scheduling the dataflow operations on an array of directly connected simple configurable processing elements (CPEs) can share computational resources among the operations while getting rid of the long runtime of FPGA synthesis, placement, and routing.

As the density of FPGA devices keeps increasing, soft Coarse-Grained Reconfigurable Arrays (CGRA) with relatively complex CPEs become a feasible solution to computation-intensive applications. Compared to existing CGRAs, where a PE usually includes only one ALU and a small register file, here we allow PEs in 2-D CGRA to have a given number of ALUs and the PEs are interconnected with a light-weighted NoC. Accordingly, the flexibility and scale of the CGRA operation scheduling are raised.

In these scenarios, execution latency of a given dataflow graph, utilization ratio of PEs, and communication congestion in the NoC should be considered as evaluation metrics.

Reference:

[1] Lin, C., & So, H. K. H. (2012). Energy-efficient dataflow computations on FPGAs using application-specific coarse-grain architecture synthesis. ACM SIGARCH Computer Architecture News, 40(5), 58-63.

[2] Liu, C., Ng, H. C., & So, H. K. H. (2015, December). QuickDough: A rapid FPGA loop accelerator design framework using soft CGRA overlay. In 2015 International Conference on Field Programmable Technology (FPT) (pp. 56-63). IEEE.

[3] Karunaratne, M., Mohite, A. K., Mitra, T., & Peh, L. S. (2017, June). Hycube: A cgra with reconfigurable single-cycle multi-hop interconnect. In Proceedings of the 54th Annual Design Automation Conference 2017 (pp. 1-6).