*Facts: repeatable observations or measurements, confirmed by scientific research, that may have a significant impact on the real world;
*Methods: the means by which we obtain facts in general;
*Laws: the principles on which we develop methods.
Potentially Identified Subgroups (in clinical trials)
Motivation: to address the inconsistency issue that arises when the boundary of the candidate subgroup does not have measure zero.
Method: add a shift to the estimated boundary so that it covers the underlying boundary with high probability. The proposed method achieves full efficiency whenever full efficiency is attainable.
Application: beyond the regular cases, in practice it also works well in nonregular cases where the data are discretized or where there is homogeneity in the target population. A randomized shift can also help with privacy protection.
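The shift idea can be illustrated with a toy Monte Carlo (a sketch of my own, not the paper's calibration; the threshold c0, the sample size, and the shift 1.96/sqrt(n) are all assumptions): a plain estimated boundary misses the true subgroup about half the time, while a slightly shifted boundary covers it with high probability.

```python
import numpy as np

rng = np.random.default_rng(0)
c0, n, reps = 0.5, 400, 2000    # true subgroup is {x : x > c0} (toy assumption)
covered_plain = covered_shift = 0
for _ in range(reps):
    c_hat = c0 + rng.normal(scale=1 / np.sqrt(n))   # root-n-consistent threshold estimate
    delta = 1.96 / np.sqrt(n)                        # shift calibrated for high coverage
    covered_plain += c_hat <= c0                     # plain set covers the truth iff c_hat <= c0
    covered_shift += c_hat - delta <= c0             # shifted set is slightly enlarged
print(covered_plain / reps, covered_shift / reps)
```

With these numbers the plain boundary covers in roughly half of the replications, while the shifted boundary covers in well over 90% of them.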
Post-hoc Identified Subgroups (in observational studies)
Motivation: when candidate subgroups are identified from the full data set, selection bias is unavoidable, and it becomes even worse in nonregular cases.
Method: (i) in the regular case, we propose not to perturb the estimated score; only the influence functions are perturbed; (ii) in nonregular cases, we propose a subsampling technique that keeps the estimated score fixed. An adaptive implementation handles the general case without knowing whether the problem is regular. In the regular case, our method performs as well as the oracle; in the nonregular case, it delivers valid conditional inference.
Application: when the sample size is large (as is usually assumed in observational studies), the proposed method works well and addresses the selection bias issue whether or not the problem is regular.
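A minimal sketch of subsampling with the estimated score held fixed (my own toy, not the paper's procedure; the data model, the subsample rate n^0.7, and the quantile rescaling are assumptions): the subgroup is estimated once from the full data, and only the effect estimate is recomputed on subsamples.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
X = rng.normal(size=n)
Y = 0.3 * (X > 0) + rng.normal(size=n)     # toy data: effect 0.3 in subgroup X > 0

# Step 1: estimate the score/subgroup ONCE from the full data, then keep it fixed
subgroup = X > 0
theta_hat = Y[subgroup].mean()             # full-data subgroup effect estimate

# Step 2: m-out-of-n subsampling, re-estimating only the effect (not the subgroup)
m = int(n ** 0.7)
reps = np.empty(1000)
for b in range(1000):
    idx = rng.choice(n, size=m, replace=False)
    reps[b] = Y[idx][subgroup[idx]].mean()

# Rescale the subsampling quantiles to approximate the full-sample distribution
lo, hi = np.quantile(reps - theta_hat, [0.025, 0.975])
ci = (theta_hat - np.sqrt(m / n) * hi, theta_hat - np.sqrt(m / n) * lo)
```

The sqrt(m/n) rescaling is the standard m-out-of-n correction: the spread of the subsample estimates is inflated by roughly sqrt(n/m) relative to the full-sample estimator.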
Inference of the Optimal Mean Outcome of Treatment Regimes with Possibly Non-unique Optimal Decision Rule (Under Exceptional Laws)
Motivation: when there is a subpopulation (of positive proportion) in which the individualized treatment effects are all zero, any estimated optimal decision rule is inconsistent, which induces bias. Many approaches have been proposed to address this bias while steadily improving statistical efficiency. We would like to know where this efficiency improvement ends.
Method: (i) we show that the current state-of-the-art estimator can be embedded in a more general class in which the decision rule is continuous; (ii) within this class, the asymptotic standard deviation is uniquely minimized; (iii) we propose an adaptive smoothing estimator that achieves this optimal efficiency; (iv) the achieved efficiency coincides with the semiparametric efficiency bound for the value of interest. In other words, the proposed estimator is asymptotically optimal, in the sense that NO regular estimator can improve on its efficiency.
Application: evaluating the efficacy of an optimal treatment regime (OTR); real data analysis also shows that our method yields the shortest confidence interval.
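The smoothing idea in (i)-(iii) can be sketched in a toy model (my own illustration; the CATE tau(x) = x, the sigmoid link, and the bandwidth rate n^(-1/4) are assumptions, not the paper's adaptive choice): the indicator rule 1{tau_hat(x) > 0} is replaced by a continuous rule sigmoid(tau_hat(x)/h).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000
X = rng.uniform(-1, 1, size=n)
tau = X                                       # toy CATE: tau(x) = x
A = rng.integers(0, 2, size=n)                # randomized treatment, P(A = 1) = 1/2
Y = A * tau + rng.normal(scale=0.2, size=n)
tau_hat = tau + rng.normal(scale=n ** -0.5, size=n)   # stand-in for an estimated CATE

def ipw_value(w):
    """IPW estimate of the mean outcome under the rule 'treat with probability w(x)'."""
    return np.mean(w * A * Y / 0.5 + (1 - w) * (1 - A) * Y / 0.5)

hard = ipw_value((tau_hat > 0).astype(float))       # plug-in indicator rule
h = n ** -0.25                                       # assumed smoothing bandwidth
soft = ipw_value(1 / (1 + np.exp(-tau_hat / h)))     # smoothed (continuous) decision rule
# Both target the optimal value E[max(tau(X), 0)] = 1/4 in this toy model
```

The smoothed rule is continuous in tau_hat, which is what restores regular asymptotic behavior near the exceptional set {tau = 0}.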
A Model-free Equivalence Rank Test by Sequential Bootstrap
Motivation: the asymptotic distribution of a two-armed bandit process can be obtained through nonlinear limit theory; its density is sharper or flatter than the normal distribution, depending on whether the outcome of the two-armed bandit is maximized or minimized in the stochastic process.
Method: (i) the classic bootstrap treats the series of bootstrap estimators as exchangeable, whereas the proposed sequential bootstrap breaks this structure and uses a two-armed bandit (TAB) process to strategically integrate the estimators; (ii) compared with classical normal-based test statistics, the distribution of our test statistic has a sharper density under the null and a flatter density under the alternative, thereby increasing the testing power.
Application: bioequivalence testing, i.e., testing whether a novel medicinal product can be recommended as equivalent to the original product within a narrow safety margin.
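The sequential integration in (i) can be sketched as follows (my own toy; the play-the-winner policy, the even/odd split, and all names are assumptions, and this sketch only shows the allocation mechanics, not the nonnormal limit theory):

```python
import numpy as np

rng = np.random.default_rng(3)

def tab_integrate(boot, maximize=True):
    """Sequentially integrate bootstrap estimators via a two-armed bandit:
    split the replicates into two arms, then at each step draw from the arm
    whose running mean is currently larger (or smaller, if minimizing)."""
    arms = [list(boot[0::2]), list(boot[1::2])]
    sums, counts, picked = [0.0, 0.0], [0, 0], []
    while arms[0] and arms[1]:
        means = [s / c if c else 0.0 for s, c in zip(sums, counts)]
        k = int(np.argmax(means)) if maximize else int(np.argmin(means))
        x = arms[k].pop()
        sums[k] += x
        counts[k] += 1
        picked.append(x)
    return float(np.mean(picked))

boot = rng.normal(size=400)      # stand-in for a series of bootstrap estimators
stat = tab_integrate(boot)
```

The key structural point is that the draw order depends on the history of the process, which is exactly the exchangeability-breaking step; the actual test statistic and its limit are developed in the paper.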
On the Spontaneous Elimination of Selection Bias under Selective Reporting
Motivation: I once ran a simulation on the empirical coverage of the two-sided 95% confidence interval for the maximum effect of two subgroups. To my surprise, even though I had NOT applied any technique addressing the winner's curse (a special case of selection bias), the empirical coverage was nearly 95% (with the same accuracy at other significance levels). Yet according to classical theory on selection bias, a test statistic selected based on the same data should suffer from selection bias. What caused this phenomenon?
Conclusion: by some calculus, we found that for two asymptotically normal estimated effects, concern about selection bias is unnecessary when (i) the difference between the two means dominates the variances; (ii) the difference between the two means is dominated by the variances; or (iii) the two variances are asymptotically equivalent.
Application: (i) it substantially alleviates concerns about existing findings (no need to redo the analyses); (ii) unlike most existing approaches to inference on the maximum, it delivers sharp inference without requiring ANY information about the covariance, which covers the case of censored data; (iii) it can help with kidney exchange (assessing the condition of the better kidney when only one of a patient's two kidneys is randomly checked).
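The motivating simulation is easy to reproduce (my own sketch of case (iii), equal variances; the effect size and sample size are illustrative): the naive 95% CI for the selected (larger) estimated effect still covers at close to the nominal rate, with no selection adjustment at all.

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 200, 5000
mu = np.array([0.3, 0.3])              # equal subgroup effects: case (iii)
sd = np.array([1.0, 1.0]) / np.sqrt(n)
covered = 0
for _ in range(reps):
    est = mu + rng.normal(size=2) * sd
    k = int(np.argmax(est))            # report the apparent winner
    half = 1.96 * sd[k]                # naive CI, no winner's-curse correction
    covered += est[k] - half <= mu[k] <= est[k] + half
print(covered / reps)                  # close to 0.95 despite data-based selection
```

Intuition for the equal-mean, equal-variance case: the coverage equals P(max <= mu + 1.96*sd) - P(max < mu - 1.96*sd) = 0.975^2 - 0.025^2, which is almost exactly 0.95.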
Integrative Analysis of Site-Specific Parameters with Nuisance Parameters on the Common Support
Motivation: in multi-site learning tasks in precision medicine, one tends to fit the same model across different sites. We consider how to consistently select this common support set.
Method: (i) we use a zero-norm penalty, leaving the treatment effect unpenalized; (ii) we define groups of features, each containing one feature across all sites; (iii) we adapt an existing polynomial-time zero-norm-based set-selection algorithm to our model.
Application: integrating data from different hospitals.
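A toy version of the group zero-norm idea (my own sketch: brute force over groups rather than the polynomial-time algorithm, and the unpenalized treatment effect is omitted for simplicity): each group ties one feature across all sites, and the common support is the group set minimizing the pooled residual sum of squares.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(5)
sites, n, p = 3, 100, 4
beta = np.zeros((sites, p))
beta[:, 0] = [1.0, 1.5, 0.8]       # shared support {0, 2}, site-specific coefficients
beta[:, 2] = [-1.2, -0.9, -1.1]
X = [rng.normal(size=(n, p)) for _ in range(sites)]
Y = [X[s] @ beta[s] + rng.normal(size=n) for s in range(sites)]

def pooled_rss(support):
    """Residual sum of squares when every site uses the same support."""
    cols = list(support)
    total = 0.0
    for s in range(sites):
        b, *_ = np.linalg.lstsq(X[s][:, cols], Y[s], rcond=None)
        total += float(np.sum((Y[s] - X[s][:, cols] @ b) ** 2))
    return total

k = 2                               # group sparsity level (assumed known in this toy)
best = min(combinations(range(p), k), key=pooled_rss)
print(best)                         # recovers the common support (0, 2)
```

Coefficients are site-specific but the support is forced to be common, which is exactly the grouping structure in (ii).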
Controlling energy consumption by controlling environmental factors (e.g., temperature and humidity)
Initial goal: design a feedback system that controls energy consumption by manipulating temperature and humidity.
Methods: (i) estimate energy consumption given temperature and humidity; (ii) perform inference on the maximum/minimum individualized treatment effect of a selected subgroup; (iii) use a parametric model to partition the covariate space, from which we can determine how the subgroup and its effect would change if certain covariates are adjusted.
Further action: collect more covariates.
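Step (i) can be sketched with a toy quadratic model (all data, the quadratic form, and the location of the optimum are made up for illustration): fit a parametric model of consumption in (temperature, humidity), then choose the setpoint minimizing predicted consumption.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 500
temp = rng.uniform(15, 30, size=n)
hum = rng.uniform(30, 70, size=n)
# toy ground truth: consumption is lowest near 21 C and 45% humidity
energy = (temp - 21) ** 2 + 0.05 * (hum - 45) ** 2 + rng.normal(scale=2, size=n)

# fit a quadratic parametric model of energy in (temperature, humidity)
F = np.column_stack([np.ones(n), temp, temp ** 2, hum, hum ** 2])
coef, *_ = np.linalg.lstsq(F, energy, rcond=None)

# pick the setpoint minimizing predicted consumption over a grid
tg, hg = np.meshgrid(np.linspace(15, 30, 61), np.linspace(30, 70, 81))
pred = coef[0] + coef[1] * tg + coef[2] * tg ** 2 + coef[3] * hg + coef[4] * hg ** 2
i = np.unravel_index(np.argmin(pred), pred.shape)
setpoint = (float(tg[i]), float(hg[i]))
```

In a real feedback system the fitted model would be refit as new sensor data arrive and the setpoint updated accordingly; the partition step (iii) would further split the covariate space into regions with different responses.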