Talk title: A fast and accurate estimator for large scale linear model via data averaging
Speaker: Prof. Xu Wangli (Renmin University of China)
Time: June 13, 2022, 9:00-11:00 a.m.
Venue: Tencent Meeting, ID: 396-918-659
Abstract: This work is concerned with estimation in the linear model when the sample size is extremely large and the data dimension can vary with the sample size. In this setting, the least squares estimator based on the full data is not feasible with limited computational resources. Many existing methods for this problem are based on sketching techniques. We derive fine-grained lower bounds on the conditional mean squared error of sketching methods. For sampling methods, our lower bound provides an attainable optimal convergence rate. We propose a new sketching method based on data averaging. The proposed method reduces the original data to a few averaged observations. These averaged observations still satisfy the linear model and are used to estimate the regression coefficients. The asymptotic behavior of the proposed estimation procedure is studied. Our theoretical results show that the proposed method can achieve a faster convergence rate than the optimal convergence rate for sampling methods. Theoretical and numerical results show that the proposed estimator has good statistical performance as well as low computational cost.
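The averaging idea in the abstract can be illustrated with a minimal sketch. It is not the speaker's procedure: the abstract does not say how observations are grouped, so the random, near-equal-sized groups below are an assumption, and the function name `averaged_ols` and its parameters are hypothetical. The sketch only shows why averaged observations still satisfy the linear model: if y_i = x_i^T beta + e_i, then averaging within any group gives mean(y) = mean(x)^T beta + mean(e).

```python
import numpy as np

def averaged_ols(X, y, n_groups, seed=0):
    """Hypothetical helper: reduce (X, y) to group averages, then fit least squares.

    Assumption: rows are assigned to groups uniformly at random; the talk's
    actual grouping scheme is not given in the abstract.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(X.shape[0])
    groups = np.array_split(perm, n_groups)  # near-equal group sizes

    # Each averaged observation still follows the same linear model.
    X_bar = np.stack([X[g].mean(axis=0) for g in groups])
    y_bar = np.array([y[g].mean() for g in groups])

    # An average over k rows has noise variance sigma^2 / k, so each
    # averaged row is weighted by sqrt(k) (weighted least squares).
    w = np.sqrt([len(g) for g in groups])
    beta_hat, *_ = np.linalg.lstsq(X_bar * w[:, None], y_bar * w, rcond=None)
    return beta_hat
```

The least squares step here runs on only `n_groups` rows instead of the full sample, which is where the computational saving comes from. How the groups are chosen affects the statistical accuracy, and this sketch does not address that choice.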
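About the speaker: Xu Wangli is Vice Dean of Mingli College (明理书院) at Renmin University of China, a professor of statistics, and a doctoral supervisor. Her recent research focuses on statistical inference for model goodness-of-fit testing, high-dimensional data analysis, data missing at random, two-phase sampling data, and medical analysis. She has led a number of research projects, including the Ministry of Education's Program for New Century Excellent Talents in University, four National Natural Science Foundation of China grants, a major project of the Ministry of Education's Key Research Bases for Humanities and Social Sciences, and a key project of the Beijing Natural Science Foundation. She has published more than 70 papers in leading international statistics journals (including top-tier journals), and with Science Press she has published the co-authored《非参数蒙特卡洛检验及其应用》(Nonparametric Monte Carlo Tests and Their Applications) and the sole-authored《缺失数据的模型检验及其应用》(Model Checking for Missing Data and Its Applications).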

