Research

Multi-armed Bandit Learning

- The multi-armed bandit (MAB) framework is an important and popular problem in Reinforcement Learning (RL). This problem captures the trade-off between exploration and exploitation in making decisions in the face of uncertainty. The examples of related questions are: Which drugs should be given to a patient? Which version of a website or an ad will return the most revenue? The MAB framework has recently received a lot of attention and used in practice by big companies, and it is growing fast. A learning agent sequentially takes actions, observes rewards and aims to maximize the total reward over a period of time. The multi-armed bandit problem has been extensively studied under the stationary assumption. However in reality, this assumption often does not hold because the distributions of rewards themselves may change over time. In this work, we propose a change-detection (CD) based framework for multi-armed bandit problems under the piecewise-stationary setting, and study a class of change-detection based UCB (Upper Confidence Bound) policies, CD-UCB, that actively detects change points and restarts the UCB indices. We then develop CUSUM-UCB and PHT-UCB, that belong to the CD-UCB class and use cumulative sum (CUSUM) and Page-Hinkley Test (PHT) to detect changes. We show that CUSUM-UCB obtains the best known regret upper bound under mild assumptions. We also demonstrate the regret reduction of the CD-UCB policies over arbitrary Bernoulli rewards and Yahoo! datasets of webpage click-through rates.

Context-aware Mobile Computing / Networking

Today, most people use mobile devices for various reasons. We can do many things such as calling, messaging, photographing and gaming through mobile applications. Even now, many people feel frustrated to the fastly decreased battery lifetime by using lots of mobile applications. How can we solve this problem? We find that context-aware application scheduling can be the solution. If we know the next application what we will use in several contexts, we can preload that application or unload other applications to extend the battery lifetime with the minimal launch delay. We will consider how to use contexts of users to find the best probability of guessing next application which will be used correctly.

Cloud-assisted Compression

In this work, we raise a question on why the abundant information previously shared between a server and its client is not effectively utilized in the exchange of a new data which may be highly correlated with the shared data. We formulate this question as an encoding problem that is applicable to general data synchronization services including a wide range of Internet services such as cloud data synchronization, web browsing, messaging, and even data streaming. To this problem, we propose a new encoding technique, SyncCoding that maximally replaces subsets of the data to be transmitted with the coordinates pointing to the matching subsets included in the set of relevant shared data, called references. SyncCoding can be easily integrated into a transport layer protocol such as HTTP and enables significant reduction of network traffic. Our experimental evaluations of SyncCoding implemented in Linux shows that it outperforms existing popular encoding techniques, Brotli, LZMA, Deflate, and Deduplication in two practical use networking applications: cloud data sharing and web browsing.