# Short Bio

I am Yiheng Shen. My Chinese name is 沈毅恒(Shen Yiheng). I am currently a Ph. D. student at Duke University. My advisor is Kamesh Munagala. My research interests lies in the interface of economics and computation, especially mechanism design and algorithmic game theory.

Prior to joining Duke, I received my bachelor's degree at Institute for Interdisciplinary Information Sciences (Yao Class), Tsinghua University. My advisor in Tsinghua was Pingzhong Tang.

In the Spring of 2020, I was a visiting student in the Harvard EconCS group. My advisor was Yiling Chen. We conducted researches in Data Purchasing mechanisms and information advertising. In the Summer of 2019, I was a research intern in The University of Hong Kong. My advisor was Zhiyi Huang. We did researches on sample complexity in designing auctions.

Contact

E-mail: ys341@duke.edu

Mobile:

(+1) 984 259 8508 (US)

(+86) 189 303 53859 (China)

# Working Papers

Approximate Core for Committee Selection via Multilinear Extension and Market Clearing. Kamesh Munagala, Yiheng Shen, Kangning Wang, Zhiyi Wang. [Working Paper]

It is well-known that an exact core need not exist even when utility functions of the voters are additive across candidates. We therefore relax the problem to allow \emph{approximation}: Voters can only deviate to the blocking committee if after they choose any extra candidate (called an additament), their utility still increases by an $\alpha$ factor. If no blocking committee exists under this definition, we call this an $\alpha$-core. Our main result is that an $\alpha$-core, for $\alpha < 67.37$, always exists when utilities of the voters are arbitrary monotone submodular functions, and this can be computed in polynomial time. This result improves to $\alpha < 9.27$ for additive utilities, albeit without the polynomial time guarantee. %and to $\alpha < 15.2$ for this case that runs in polynomial time. Our results are a significant improvement over prior work that only shows logarithmic approximations for the case of additive utilities. We complement our results with a lower bound of $\alpha > 1.015$ for submodular utilities, and a lower bound of any function in the number of voters and candidates for general monotone utilities.

Selling Information via Pooling Advertising. Yiling Chen, Yiheng Shen, Shuran Zheng. [Working Paper]

We study the problem of promoting an information product by revealing some free partial information. We consider a long-term, credible information platform that sells a certain type of information, which is valuable to some decision makers. The decision makers have their own observations and thus can hold different personal beliefs about the information product. The platform can provide some free partial information about the product to change people's valuations so that the overall profit can possibly be increased. In this work, we consider an information product that can be represented by a real number. When selling an information product, the platform first reveals the range of the number in a predefined way and then charges a price for the full revelation of the information. Two types of revealing methods will be considered: (1) the revealed range is a single interval, (2) the revealed range consists of multiple intervals. For the first method, we give a polynomial-time algorithm that finds the optimal mechanism when the information product is a discrete random variable with finite support. When the information product is a continuous random variable, we give an approximation algorithm that finds an $\varepsilon$-optimal mechanism within $poly(1/\varepsilon)$ time. For the second method, we show that finding the optimal multi-interval mechanism is hard in general. For some special cases when the seller targets a specific group of buyers, the optimal mechanism can be solved or approximated.

# Publication

Targeting makes Sample Efficiency in Auction Design. Yihang Hu, Zhiyi Huang, Yiheng Shen, Xiangning Wang. EC 2021. [arxiv]

This paper introduces the targeted sampling model in optimal auction design. In this model, the seller may specify a quantile interval and sample from a buyer's prior restricted to the interval. This can be interpreted as allowing the seller to, for example, examine the top 40% bids from previous buyers with the same characteristics. The targeting power is quantified with a parameter $\Delta \in [0, 1]$ which lower bounds how small the quantile intervals could be. For instance, for $n$ buyers with bounded values in $[0, 1]$, $\tilde{O}(\epsilon^{-1})$ targeted samples suffice while it is known that at least $\tilde{\Omega}(n \epsilon^{-2})$ i.i.d.\ samples are needed. In other words, targeted sampling with sufficient targeting power allows us to remove the linear dependence in $n$, and to improve the quadratic dependence in $\epsilon^{-1}$ to linear. In this work, we introduce new technical ingredients and show that the number of targeted samples sufficient for learning an $\epsilon$-optimal auction is substantially smaller than the sample complexity of i.i.d.\ samples for the full spectrum of $\Delta \in [0, 1)$. Even with only mild targeting power, i.e., whenever $\Delta = o(1)$, our targeted sample complexity upper bounds are strictly smaller than the optimal sample complexity of i.i.d.\ samples.

Truthful Data Acquisition via Peer Prediction. Yiling Chen, Yiheng Shen, Shuran Zheng. NeurIPS 2020. [arxiv]

We consider the problem of purchasing data for machine learning or statistical estimation. The data analyst has a budget to purchase datasets from multiple data providers. She does not have any test data that can be used to evaluate the collected data and can assign payments to data providers solely based on the collected datasets. We consider the problem in the standard Bayesian paradigm and in two settings: (1) data are only collected once; (2) data are collected repeatedly and each day's data are drawn independently from the same distribution. For both settings, our mechanisms guarantee that truthfully reporting one's dataset is always an equilibrium by adopting techniques from peer prediction: pay each provider the mutual information between his reported data and other providers' reported data. Depending on the data distribution, the mechanisms can also discourage misreports that would lead to inaccurate predictions. Our mechanisms also guarantee individual rationality and budget feasibility for certain underlying distributions in the first setting and for all distributions in the second setting.

# Fun & Misc

I am in favor of puzzle games and card games (esp. Gwent and Texas Hold'em).

My favorite sport is Table Tennis. I won the 2nd Prize (Man's Singles) and 1st Prize (Team) in the IIIS Table Tennis Tournament in my senior year.