Guangyi Zhang

Hi! I'm Guangyi Zhang (张广怡). 

See my new homepage: https://www.scholat.com/guangyizhang.

My research interests broadly lie in algorithms with provable approximation guarantees for data mining. 

Selected topics include submodular optimization, explainable rule-based systems, active learning, diversity maximization, and graph mining.

Currently, I pursue my curiosity at Shenzhen Technology University (深圳技术大学).

You can reach me by email (guangyizhang.jan AT gmail.com) or find my papers on Google Scholar.

Education

Awards

Publications

While I love all my papers, papers 3, 5, and 7 are the closest to my heart.

7. Zhang, Guangyi, Nikolaj Tatti, and Aristides Gionis. "Finding favourite tuples on data streams with provably few comparisons." Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2023. (KDD '23, 313/1416 = 22.1%) arXiv:2307.02946. code.

We propose a single-pass streaming algorithm that finds personalized high-quality tuples in a database of n d-dimensional tuples by adaptively making O(log n) pairwise comparisons. This is the first non-trivial worst-case guarantee in more than a decade!
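
To make the interaction model concrete, here is a toy sketch (not the paper's algorithm): it assumes a hidden linear utility vector, exposes a pairwise-comparison oracle, and runs the naive tournament baseline, which always needs n - 1 comparisons where the paper's algorithm needs only O(log n) adaptive ones. All names and the linear-utility instantiation are illustrative.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n, d = 1000, 3
db = rng.random((n, d))   # database of n d-dimensional tuples
w = rng.random(d)         # hidden user utility (unknown to the algorithm)

def prefer(i: int, j: int) -> bool:
    """Comparison oracle: does the user prefer tuple i over tuple j?"""
    return db[i] @ w >= db[j] @ w

# Naive tournament baseline: always n - 1 comparisons.
favourite = 0
for i in range(1, n):
    if prefer(i, favourite):
        favourite = i
print("favourite tuple index:", favourite)
```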

6. Zhang, Guangyi, Nikolaj Tatti, and Aristides Gionis. "Ranking with submodular functions on the fly." Proceedings of the 2023 SIAM International Conference on Data Mining. Society for Industrial and Applied Mathematics, 2023. (SDM '23) arXiv:2301.06787. code.

Best paper award (among 459 submissions). Kudos to my great co-authors!

This is a continuation of our DAMI '22 paper below. We study the problem of ranking with multiple submodular functions in two streaming settings, item-arriving and function-arriving, and propose several novel applications for each.
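
A minimal sketch of the two access models, with coverage functions standing in for generic monotone submodular functions; the baselines below rank items by their singleton value only, and carry none of the guarantees of the algorithms in the paper.

```python
# Two streaming access models for ranking with submodular functions.

def f1(s):  # toy monotone submodular function: coverage of {1, 2, 3}
    cover = {"a": {1}, "b": {1, 2}, "c": {3}}
    return len(set().union(*(cover[x] for x in s), set()) & {1, 2, 3})

def f2(s):  # another coverage function, over {3, 4}
    cover = {"a": {4}, "b": set(), "c": {3, 4}}
    return len(set().union(*(cover[x] for x in s), set()) & {3, 4})

def item_arriving(item_stream, fns):
    """Items arrive one by one; the functions are known up front."""
    scored = []
    for item in item_stream:
        gain = sum(f({item}) for f in fns)  # value of the item alone
        scored.append((gain, item))
    scored.sort(reverse=True)
    return [item for _, item in scored]

def function_arriving(items, fn_stream):
    """The items are known; the functions arrive one by one."""
    total = {x: 0.0 for x in items}
    for f in fn_stream:
        for x in items:
            total[x] += f({x})
    return sorted(items, key=total.get, reverse=True)

print(item_arriving(iter("abc"), [f1, f2]))            # ['c', 'b', 'a']
print(function_arriving(list("abc"), iter([f1, f2])))  # ['c', 'a', 'b']
```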

5. Zhang, Guangyi, Nikolaj Tatti, and Aristides Gionis. "Coresets remembered and items forgotten: submodular maximization with deletions." 22nd IEEE International Conference on Data Mining, 2022. (ICDM '22) arXiv:2203.01241. code.

Accepted as a full paper (9.77% acceptance rate).

In a streaming setting, how can one maximize a non-decreasing submodular function in the presence of unexpected item deletions? Remembering all items and re-computing a solution from scratch would work, but can we do better? We show that it suffices to remember a small deletion-robust coreset, and we are the first to find such a coreset of (asymptotically) smallest possible size: for a deletion of d items, the coreset size is only O(d).
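
A toy sketch of the naive baseline described above: standard greedy over a coverage function under a cardinality constraint, which simply re-runs on everything remembered after a deletion. The coreset construction from the paper is not reproduced here, and all names are illustrative.

```python
def greedy(items, f, k):
    """Standard greedy for a monotone submodular f under |S| <= k."""
    S = set()
    for _ in range(k):
        best = max((x for x in items if x not in S),
                   key=lambda x: f(S | {x}) - f(S), default=None)
        if best is None:
            break
        S.add(best)
    return S

# Toy coverage function over a small ground set.
cover = {"a": {1, 2}, "b": {2, 3}, "c": {4}, "d": {1, 4, 5}}
f = lambda S: len(set().union(*(cover[x] for x in S))) if S else 0

items = set(cover)
print(greedy(items, f, k=2))              # e.g. {'d', 'b'}
deleted = {"d"}                           # unexpected deletions arrive
print(greedy(items - deleted, f, k=2))    # naive: remember all, re-run
```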

4. Zhang, Guangyi, and Aristides Gionis. "Regularized impurity reduction: Accurate decision trees with complexity guarantees." Data Mining and Knowledge Discovery (2022). (DAMI '22, Journal Track of ECML PKDD 2022) link. arXiv:2208.10949. code.

Do popular greedy algorithms such as C4.5 and CART produce small, and hence more interpretable, trees? We prove such a complexity guarantee when the greedy algorithm is guided by a regularized impurity function.
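
A minimal sketch of the idea on a single numeric feature, assuming a constant-per-leaf regularizer added to the entropy impurity (the penalty in the paper may differ): a split is taken only if it beats the regularized score of not splitting.

```python
import numpy as np

def entropy(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def best_split(x, y, lam=0.05):
    """Greedy split selection guided by a regularized impurity:
    weighted entropy of the children plus lam per leaf.
    The constant-per-leaf regularizer is illustrative only."""
    n = len(y)
    best_t, best_score = None, entropy(y) + lam  # score of not splitting
    for t in np.unique(x)[:-1]:
        left, right = y[x <= t], y[x > t]
        score = (len(left) / n * entropy(left)
                 + len(right) / n * entropy(right)
                 + 2 * lam)
        if score < best_score:
            best_t, best_score = t, score
    return best_t

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([0, 0, 1, 1])
print(best_split(x, y))  # 2.0: a clean split is worth the extra leaf
```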

3. Zhang, Guangyi, Nikolaj Tatti, and Aristides Gionis. "Ranking with submodular functions on a budget." Data Mining and Knowledge Discovery (2022). (DAMI '22, Journal Track of ECML PKDD 2022) link. arXiv:2204.04168. code.

How can one find a ranking of items that simultaneously maximizes a sum of multiple non-decreasing submodular functions, each associated with a different budget? We devise a novel 4-approximation algorithm that returns the better of a greedy solution and a solution obtained by dynamic programming.
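
A small sketch of the objective and of the greedy half of the scheme (the paper pairs greedy with a dynamic program and returns the better of the two). For brevity, both functions below share one coverage function and differ only in their budgets; everything here is illustrative.

```python
# Each submodular function f_i (here, one shared coverage function)
# only sees the first b_i items of the ranking.
cover = {"a": {1, 2}, "b": {2, 3}, "c": {4}, "d": {1, 4, 5}}
f = lambda S: len(set().union(*(cover[x] for x in S))) if S else 0
budgets = [2, 3]   # budget b_i of each function

def objective(ranking):
    return sum(f(set(ranking[:b])) for b in budgets)

def greedy_ranking(items):
    """Greedy half of the scheme: repeatedly append the item with the
    largest total marginal gain over the functions still within budget."""
    ranking = []
    while len(ranking) < min(max(budgets), len(items)):
        pos = len(ranking) + 1
        def gain(x):
            return sum(f(set(ranking + [x])) - f(set(ranking))
                       for b in budgets if pos <= b)
        ranking.append(max((x for x in items if x not in ranking), key=gain))
    return ranking

r = greedy_ranking(list(cover))
print(r, objective(r))   # e.g. ['d', 'b', 'a'] 10
```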

2. Zhang, Guangyi, and Aristides Gionis. "Diverse Rule Sets." Proceedings of the 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2020. (KDD '20) arXiv:2006.09890. code.

How can one find a set of decision rules that overlap less, so that the classification decision made on each example is unambiguous? We propose a maximum-diversity approach that can be optimized with provable guarantees.
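
A toy sketch of the intuition, with rules represented by the sets of examples they cover and an illustrative coverage-minus-overlap marginal; see the paper for the exact diversity objective and guarantees.

```python
# Pick rules that cover many examples while overlapping little,
# so each example is decided by (ideally) one rule.
rules = {                      # rule -> set of covered examples
    "r1": {1, 2, 3}, "r2": {3, 4}, "r3": {5, 6}, "r4": {1, 2, 3, 4},
}

def marginal(rule, chosen, lam=1.0):
    """Coverage of the rule minus its overlap with rules already chosen."""
    cov = rules[rule]
    overlap = sum(len(cov & rules[c]) for c in chosen)
    return len(cov) - lam * overlap

def greedy_rules(k):
    chosen = []
    for _ in range(k):
        best = max((r for r in rules if r not in chosen),
                   key=lambda r: marginal(r, chosen))
        if marginal(best, chosen) <= 0:  # no rule still adds diversity
            break
        chosen.append(best)
    return chosen

print(greedy_rules(3))  # ['r4', 'r3']: r1 and r2 overlap r4 too much
```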

1. Zhang, Guangyi, and Aristides Gionis. "Maximizing diversity over clustered data." Proceedings of the 2020 SIAM International Conference on Data Mining. Society for Industrial and Applied Mathematics, 2020. (SDM '20) arXiv:2001.03050. code.

We study the classic Max-Sum diversification problem over (potentially overlapping) clustered input, where the goal is to maximize the sum of intra-cluster diversities. Existing algorithms fall short, and we propose a novel 6-approximation algorithm.
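
A minimal sketch of the objective over overlapping clusters, together with a naive greedy baseline (precisely the kind of existing approach that falls short; the 6-approximation algorithm is not reproduced here). The points and clusters are illustrative.

```python
import itertools
import numpy as np

rng = np.random.default_rng(seed=1)
points = rng.random((8, 2))                    # items as 2-D points
clusters = [{0, 1, 2, 3, 4}, {3, 4, 5, 6, 7}]  # overlapping clusters

def dist(i, j):
    return float(np.linalg.norm(points[i] - points[j]))

def intra_diversity(selected):
    """Sum, over clusters, of pairwise distances among selected items."""
    return sum(dist(i, j)
               for c in clusters
               for i, j in itertools.combinations(sorted(c & selected), 2))

def naive_greedy(k):
    """Greedy baseline, not the paper's algorithm."""
    selected = set()
    for _ in range(k):
        best = max(set(range(len(points))) - selected,
                   key=lambda x: intra_diversity(selected | {x}))
        selected.add(best)
    return selected

S = naive_greedy(4)
print(S, round(intra_diversity(S), 3))
```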

Services

Teaching:


PC Member:

Subreviewer: WebConf 2022, ICML 2022, ICDM 2022, CIKM 2023