Preliminary:
Every graph-based application (such as edge-cloud processing in IoT systems, accelerating the execution of deep learning frameworks on GPUs, processing astronomical observations, or scheduling medical appointments) consists of multiple tasks with inter-task data dependencies (i.e., each task generates inputs for specific other tasks) that can be represented as a Directed Acyclic Graph (which we refer to as a "Task Graph"). An input job for an application is complete once all of its tasks have been executed by machines in an order that respects the inter-task dependencies.
Every task must be executed on a single machine within a heterogeneous distributed computing environment (which we refer to as the "Network"). Regarding the network, each machine (i.e., compute node) possesses its own execution speed. Additionally, the communication delay between pairs of machines is determined by the communication bandwidth available between them.
"Task scheduling" is the problem of deciding, for every task, which machine should execute it, given the task graph and the network information.
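To make the problem concrete, the following minimal Python sketch evaluates one candidate schedule by computing its makespan, i.e., the finishing time of the last task. It is not the GCNScheduler; it only illustrates the inputs described above. The function name, the dictionary layout, and the toy numbers are hypothetical. The sketch also assumes that execution time is work divided by machine speed, that communication delay is data size divided by bandwidth (zero on the same machine), and that each machine runs its tasks one at a time in topological order.

```python
from collections import defaultdict, deque


def makespan(work, data, speed, bandwidth, assignment):
    """Finishing time of the last task for a fixed task-to-machine assignment.

    work:       {task: amount of computation}
    data:       {(u, v): amount of data task u sends to task v}  (DAG edges)
    speed:      {machine: execution speed}
    bandwidth:  {frozenset({m1, m2}): communication bandwidth}
    assignment: {task: machine}
    """
    preds, succs = defaultdict(list), defaultdict(list)
    indegree = {t: 0 for t in work}
    for u, v in data:
        preds[v].append(u)
        succs[u].append(v)
        indegree[v] += 1

    ready = deque(t for t in work if indegree[t] == 0)
    machine_free = {m: 0.0 for m in speed}  # when each machine becomes idle
    finish = {}
    while ready:
        t = ready.popleft()
        m = assignment[t]
        # A task can start only after every input has arrived at its machine.
        inputs_ready = 0.0
        for p in preds[t]:
            pm = assignment[p]
            delay = 0.0 if pm == m else data[(p, t)] / bandwidth[frozenset((pm, m))]
            inputs_ready = max(inputs_ready, finish[p] + delay)
        start = max(inputs_ready, machine_free[m])
        finish[t] = start + work[t] / speed[m]
        machine_free[m] = finish[t]
        for s in succs[t]:
            indegree[s] -= 1
            if indegree[s] == 0:
                ready.append(s)
    if len(finish) != len(work):
        raise ValueError("task graph contains a cycle")
    return max(finish.values())


# Toy instance (all numbers are made up for illustration).
work = {"a": 4, "b": 6, "c": 4, "d": 2}
data = {("a", "b"): 8, ("a", "c"): 8, ("b", "d"): 4, ("c", "d"): 4}
speed = {"m0": 2.0, "m1": 1.0}
bandwidth = {frozenset(("m0", "m1")): 4.0}

split = makespan(work, data, speed, bandwidth,
                 {"a": "m0", "b": "m0", "c": "m1", "d": "m0"})
single = makespan(work, data, speed, bandwidth,
                  {t: "m0" for t in work})
print(split, single)  # 10.0 8.0
```

In this toy instance, offloading task c to the slower machine m1 costs more in execution and communication time than it saves in parallelism, so running everything on m0 gives the shorter makespan. A scheduler searches over such assignments for the one that optimizes the chosen objective.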
Efficient task scheduling plays a crucial role in improving the utilization of computing resources and reducing the required time to execute tasks, as well as leading to significant profits for service providers.
Almost all existing scheduling schemes work well only in relatively small settings; once a task graph becomes large or extremely large, they require very long computation times. Many domains, such as IoT for smart cities or GPU acceleration, are expected to produce increasingly complex applications with numerous interdependent tasks, and scheduling may need to be repeated quite frequently in the presence of network or resource dynamics. Therefore, designing a faster method to schedule tasks for such large-scale task graphs is essential.
We propose the GCNScheduler, the first graph convolutional network-based scheduler, which carefully integrates the inter-task data dependency structure and the computational network into a single input graph. The GCNScheduler can efficiently schedule tasks of complex applications for any given objective, such as the makespan (the time it takes to run all tasks, i.e., the finishing time for the last task), throughput, etc.
The GitHub repository of the implementation of the GCNScheduler is here.
The following demonstration runs the GCNScheduler over a dynamic network with four machines, where the execution speeds of the machines and the communication bandwidths between them vary over time. The left figure shows the task graph, with the assignment of tasks to (compute) nodes indicated in red, green, blue, and purple. The middle figure shows the network at a specific time, with the communication bandwidths written on the edges. Finally, the right figure shows the makespan over time.
The 1st Place Winner of the 2nd Student Design Competition on Networked Computing on the Edge - 2022
2022 Best Poster - Honorable Mention, Ming Hsieh Department of Electrical and Computer Engineering
Summary:
This work proposes a rigorous hybrid model-and-data-driven approach to risk scoring, based on a time-varying SIR epidemic model, that ultimately yields a simplified color-coded risk level for each community. The risk score Γ𝑡 is proportional to the probability that a currently healthy person becomes infected in the next 24 hours, based on their locality. The work explains how this risk score can be estimated from another useful metric of infection spread, 𝑅𝑡, the time-varying average reproduction number, which indicates the average number of individuals that an infected person goes on to infect. The proposed approach also quantifies the uncertainty in the estimates of 𝑅𝑡 and Γ𝑡 in the form of confidence intervals.
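As a rough illustration of how R_t and a risk score can be tied together under a time-varying SIR model, here is a minimal Python sketch. It is not the estimator from the paper and it omits the confidence intervals. It assumes a textbook discrete SIR update in which nearly everyone is still susceptible, so that active infections evolve as I[t+1] - I[t] = (beta_t - sigma) * I[t], with R_t = beta_t / sigma. The function names, the recovery rate `sigma`, the `scale` factor, and the example numbers are all hypothetical.

```python
def estimate_rt(active_cases, sigma):
    """Day-by-day R_t from active infection counts (assumed positive).

    Assumes S_t / N is close to 1, so I[t+1] - I[t] = (beta_t - sigma) * I[t].
    """
    rt = []
    for today, tomorrow in zip(active_cases, active_cases[1:]):
        beta_t = (tomorrow - today) / today + sigma  # transmission rate
        rt.append(max(beta_t, 0.0) / sigma)          # R_t = beta_t / sigma
    return rt


def risk_score(rt, active_cases, population, sigma, scale=10_000):
    """Score proportional to a healthy person's chance of infection in a day.

    In the SIR model that chance is about beta_t * I_t / N, with
    beta_t = sigma * R_t; `scale` is an arbitrary constant for readability.
    """
    return [scale * sigma * r * i / population
            for r, i in zip(rt, active_cases)]


# Made-up numbers: active cases growing 10% per day in a community of 1M.
active = [100, 110, 121]
sigma = 0.1  # hypothetical recovery rate (about a 10-day infectious period)

rt = estimate_rt(active, sigma)
scores = risk_score(rt, active, population=1_000_000, sigma=sigma)
print(rt, scores)
```

With these numbers the estimated R_t is about 2 on both days, and the score grows with the number of active cases even though R_t stays flat, which is why the two metrics are reported separately.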
Here is the demo of the color-coded risk level for each community of Los Angeles County over time:
Here is the Risk Score for LA County over time:
Here is the Reproductive Rate (R_t) for 8 Select Communities of LA County based on Case Data with 14-day Moving Average:
For more information about this project, please visit our website; for details of the method, please see our paper. Our GitHub repository contains open-source releases of our software, data, and these plots.
Achievements of this work:
Won 1st place in the 2020 International COVID-19 Computational Challenge (City of Los Angeles & RMDS Lab).
The Stevens Center for Innovation at the University of Southern California has licensed this work.