KAKENHI Symposium
「Theory and Applications of Bayesian Statistics」
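Date and time: Friday, March 18, 2022, 10:00-16:45
(Held online via Zoom)
Registration: https://docs.google.com/forms/d/e/1FAIpQLSc7aKAEFwdRUwIBVi-VaWEc_vNKupqe6q5jU5Fd5dHM_35mcQ/viewform
(The Zoom link for the day is sent automatically after registration.)
Organizers: 菅澤 翔之助 (The University of Tokyo), 入江 薫 (The University of Tokyo), 橋本 真太郎 (Hiroshima University), 小林 弦矢 (Chiba University), 中川 智之 (Tokyo University of Science)
*This symposium is supported by the following project.
Grant-in-Aid for Scientific Research (B) 21H00699
「New Developments in Bayesian Modeling for Large-Scale Data」 (Principal Investigator: 菅澤 翔之助)
Program
10:00-11:45 Session 1 (Chair: 菅澤 翔之助)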
「Tree boosting for learning probability measures」
粟屋 直 (Duke University)
「Exact minimax Bayesian predictive synthesis」
マクリン 謙一郎 (Temple University)
「Semiparametric Bayesian instrumental variables estimation for nonignorable missing instruments」
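加藤 諒 (Kobe University)
11:45-13:00 Lunch break
13:00-14:45 Session 2 (Chair: 入江 薫)
「Q-matrix estimation for cognitive diagnostic models via marginal likelihood maximization」
岡田 謙介 (The University of Tokyo)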
「Measuring regional economic uncertainty」
中島 上智 (Bank of Japan)
「Finding influential users by topic in unstructured user-generated content」
五十嵐 未来 (University of Tsukuba)
14:45-15:00 Break
15:00-16:45 Session 3 (Chair: 中川 智之)
「Haar-Weave-Metropolis kernel」
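宋 小林 (Osaka University)
「A Bayesian optimization method for maximizing a distributionally robust probabilistic measure」
稲津 佑 (Nagoya Institute of Technology)
「Applications of sequential Monte Carlo methods to Bayesian statistics」
米倉 頌人 (Chiba University)
Abstracts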
「Tree boosting for learning probability measures」
粟屋 直 (Duke University)
Learning probability measures based on an i.i.d. sample is a fundamental inference task, but is challenging when the sample space is high-dimensional. Inspired by the success of tree boosting in high-dimensional classification and regression, we propose a tree boosting method for learning high-dimensional probability distributions. We formulate concepts of "addition" and "residuals" on probability distributions in terms of compositions of a new, more general notion of multivariate cumulative distribution functions (CDFs) than classical CDFs. This then gives rise to a simple boosting algorithm based on forward-stagewise (FS) fitting of an additive ensemble of measures, which sequentially minimizes the entropy loss. The output of the FS algorithm allows analytic computation of the probability density function for the fitted distribution. It also provides an exact simulator for drawing independent Monte Carlo samples from the fitted measure. Typical considerations in applying boosting--namely choosing the number of trees, setting the appropriate level of shrinkage/regularization in the weak learner, and the evaluation of variable importance--can all be accomplished in an analogous fashion to traditional boosting in supervised learning. Numerical experiments confirm that boosting can substantially improve the fit to multivariate distributions compared to the state-of-the-art single-tree learner and is computationally efficient.
「Exact minimax Bayesian predictive synthesis」
マクリン 謙一郎 (Temple University)
We analyze the combination of multiple predictive densities for time series data when all forecasts are misspecified. To evaluate the theoretical properties of ensemble methods, we develop a novel theoretical strategy based on stochastic processes. Using this framework, we define the Kullback-Leibler risk for non-stationary time series data, providing a metric to compare predictive performances. We show that a specific dynamic form of Bayesian predictive synthesis, a general and coherent Bayesian framework for ensemble methods, produces exact minimax predictive densities under this risk, providing theoretical support for finite sample predictive performance over existing ensemble methods. A simulation study that highlights the theoretical result is presented. We show, through a high-dimensional economic application, that exact minimax Bayesian predictive synthesis provides superior predictive performance over other ensemble and selection methods.
「Semiparametric Bayesian instrumental variables estimation for nonignorable missing instruments」
加藤 諒 (Kobe University)
This paper considers the case where instrumental variables (IVs) are available to infer the effect of a variable of interest on the outcome (the causal effect), but some components of the IVs are missing under a not-missing-at-random (NMAR) mechanism. Although analysis under NMAR requires the missing mechanism to be prespecified, this mechanism is unknown and, worse, generally not identified. We use the IV distribution in the original population as auxiliary information and show that the missing mechanism can be represented as an identifiable nonparametric generalized additive model. We also introduce an MCMC algorithm that imputes the missing values and simultaneously estimates the parameters of interest.
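「Q-matrix estimation for cognitive diagnostic models via marginal likelihood maximization」
岡田 謙介 (The University of Tokyo)
Cognitive diagnostic models, also called diagnostic classification models, are a class of restricted latent class models applied in psychological and educational measurement. Their main purpose is to estimate examinees' mastery states of cognitive skills, which are binary variables, from their responses to a diagnostic test, and to use the resulting diagnostic information to improve learning. In these models, the Q-matrix, which specifies the relationship between test items and learning elements, is assumed to be specified in advance as part of the model structure. This is, however, a strong assumption. This study therefore introduces ideas from variational Bayesian estimation and stochastic optimization to propose a method for Q-matrix estimation that maximizes the marginal likelihood with high computational efficiency. This presentation is based on Oka & Okada (2021, arXiv; https://arxiv.org/abs/2105.09495 ).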
「Measuring regional economic uncertainty」
中島 上智 (Bank of Japan)
This paper proposes an econometric framework for measuring the time-varying uncertainty of regional economic activity. A dynamic factor model with stochastic volatility is used to forecast regional economic activity, and uncertainty is defined as the conditional volatility of the forecast errors in the model. The framework is illustrated using Japan's regional economic data. We apply it to climate-change analysis and show that irregular climate events significantly impact the uncertainty of economic activity.
「Finding influential users by topic in unstructured user-generated content」
五十嵐 未来 (University of Tsukuba)
Social media users generate a variety of content, which has the potential to change their followers’ behavior and preferences, i.e., social influence. However, since most user-generated content on social media is unstructured, such as text and images, it is difficult to infer how individuals affect the processes by which others generate unstructured data using existing models, which assume that the behavioral data are numerical. In this study, the authors develop an approach to identify influential users who have remarkable effects on other users’ interest in the topic of generated content. An empirical analysis applying the proposed model to user-generated image content shows that it outperforms conventional topic models in terms of both predictive accuracy and topic interpretability. Moreover, the authors demonstrate that visualizing the estimated social influence on the network can provide useful implications for social media marketing by finding influential users for each topic of the content.
「Haar-Weave-Metropolis kernel」
宋 小林 (Osaka University)
Recently, many Markov chain Monte Carlo methods have been developed with deterministic reversible transform proposals inspired by the Hamiltonian Monte Carlo method. The deterministic transform is relatively easy to reconcile with the local information (gradient etc.) of the target distribution. However, as the ergodic theory suggests, these deterministic proposal methods seem to be incompatible with robustness and lead to poor convergence, especially in the case of target distributions with heavy tails. On the other hand, the Markov kernel using the Haar measure is relatively robust since it learns global information about the target distribution by introducing global parameters. However, it requires a density preserving condition, and many deterministic proposals break this condition. In this paper, we carefully select deterministic transforms that preserve the structure and create a Markov kernel, the Weave-Metropolis kernel, using the deterministic transforms. By combining it with the Haar measure, we also introduce the Haar-Weave-Metropolis kernel. In this way, the Markov kernel can employ the local information of the target distribution using the deterministic proposal, and thanks to the Haar measure, it can employ the global information of the target distribution. Finally, we show through numerical experiments that the performance of the proposed method is superior to other methods in terms of effective sample size and mean square jump distance per second.
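「A Bayesian optimization method for maximizing a distributionally robust probabilistic measure」
稲津 佑 (Nagoya Institute of Technology)
In many real-world applications, including manufacturing, a black-box function that is expensive to evaluate often involves both controllable variables (design variables) and uncontrollable random variables (environmental variables). To determine the optimal design variables in such settings, one must first quantify the goodness of the design variables after removing the influence of the environmental variables.
One measure of the goodness of design variables is the so-called distributionally robust probabilistic measure, which is useful in that it can be defined even when the distribution of the environmental variables is unknown. This talk introduces a Bayesian optimization method for efficiently finding the design variables that maximize the distributionally robust probabilistic measure.
「Applications of sequential Monte Carlo methods to Bayesian statistics」
米倉 頌人 (Chiba University)
Sequential Monte Carlo (SMC), as the name suggests, is a general term for methods that perform Monte Carlo integration sequentially. This talk treats various problems arising in Bayesian statistics in a unified way as Feynman-Kac models and explains SMC algorithms as approximations to them. It also describes several important properties of estimators constructed by SMC and, time permitting, touches on recent developments.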