Acta Mathematica Scientia (数学物理学报), 2010, Vol. 30, Issue (5): 1364-1376.


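Nonstationarity Measure of Data Streams: In Commemoration of the 100th Anniversary of the Births of Academician Li Guoping and Professor Wu Xinmou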

DING Yi-Ming*, FAN Wen-Tao, TAN Qiu-Heng, WU Ke-Kun**, ZOU Yong-Jie

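  1. Wuhan Institute of Physics and Mathematics, Chinese Academy of Sciences, Wuhan 430071; Wuhan Institute of Physics and Mathematics, Chinese Academy of Sciences, Wuhan 430071 | Graduate School of the Chinese Academy of Sciences, Beijing 100049; Wuhan Institute of Physics and Mathematics, Chinese Academy of Sciences, Wuhan 430071 | Graduate School of the Chinese Academy of Sciences, Beijing 100049; School of Sciences, Wuhan University of Technology, Wuhan 430070
  • Received: 2010-10-08 Published: 2010-10-25 Released online: 2010-10-25
  • Corresponding author: **Present address: Department of Management Sciences, City University of Hong Kong, Hong Kong. E-mail: ding@wipm.ac.cn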
  • Funding:

    Supported by the National Natural Science Foundation of China (Nos. 70571079, 60534080)

Nonstationarity Measure of Data Stream

 DING Yi-Ming, FAN Wen-Tao, TAN Qiu-Heng, WU Ke-Kun, ZOU Yong-Jie   

  1. Wuhan Institute of Physics and Mathematics, The Chinese Academy of Sciences, Wuhan 430071 | Wuhan Institute of Physics and Mathematics, The Chinese Academy of Sciences, Wuhan 430071 | Graduate School of the Chinese Academy of Sciences, The Chinese Academy of Sciences, Beijing 100049 | School of Sciences, Wuhan University of Technology, Wuhan 430070
  • Received: 2010-10-08 Online: 2010-10-25 Published: 2010-10-25
  • Supported by:

    The National Natural Science Foundation of China (Nos. 70571079, 60534080)

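Abstract (translated from the Chinese original):

This paper studies the problem of measuring the nonstationarity of data streams by combining viewpoints from ergodic theory, coarse-graining methods, and information theory. The concept of a nonstationarity measure for data streams is introduced, and an effective approximation algorithm for it is given. The nonstationarity measure of a data stream is a real number between $0$ and $1$, and data streams with better stationarity have a smaller nonstationarity measure. The authors apply the nonstationarity measure to the model selection problem and propose a model selection criterion that minimizes the nonstationarity measure of the residual sequence. Numerical experiments are used to test the proposed approximation algorithm and its ability to serve as a model selection criterion. The results show that the nonstationarity measure is a reasonable index of the degree of nonstationarity of a data stream: it distinguishes well between trend-stationary and difference-stationary data, and between i.i.d. sequences, white noise sequences, and martingale difference sequences.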

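Key words (translated from the Chinese original): Data analysis, Nonstationarity measure, Model selection, Stable set, Shannon information entropy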

Abstract:

We study a nonstationarity measure for data streams by integrating ideas from ergodic theory, coarse graining, and information theory. We introduce the nonstationarity measure and design an effective approximation algorithm for computing it. The measure is a real number between 0 and 1, and it is smaller for a more stationary data stream. We apply the measure to model selection and propose a criterion that selects the model whose residual sequence has the least nonstationarity measure. Numerical experiments are performed to test the approximation algorithm and to validate the least-nonstationarity criterion for model selection. The numerical results indicate that the nonstationarity measure is a sound index for comparing the level of nonstationarity among data streams. By comparing nonstationarity measures, we can effectively distinguish trend-stationary processes from difference-stationary processes, and discern i.i.d. sequences, white noise sequences, and martingale difference sequences.
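
The abstract does not spell out the algorithm, so the following is only a toy sketch of the general idea it names (coarse-grain a real-valued stream into symbols, then compare symbol distributions using Shannon entropy). It is not the authors' stable-set-based measure. As a stand-in, it uses the base-2 Jensen-Shannon divergence between the first and second halves of the stream, which also lies between 0 and 1 and is small when the two halves look alike. The names `toy_nonstationarity`, `coarse_grain`, and `n_bins` are hypothetical.

```python
import math
import random
from collections import Counter


def coarse_grain(xs, edges):
    """Map each value to a symbol: the number of bin edges it exceeds."""
    return [sum(x > e for e in edges) for x in xs]


def shannon_entropy(counts, total):
    """Shannon entropy in bits of an empirical distribution given as counts."""
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def toy_nonstationarity(xs, n_bins=8):
    """Toy index in [0, 1]: base-2 Jensen-Shannon divergence between the
    coarse-grained symbol distributions of the two halves of xs.
    ASSUMPTION: this is an illustrative stand-in, not the paper's measure."""
    if len(xs) < 2:
        raise ValueError("need at least two observations")
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return 0.0  # a constant stream has identical halves
    width = (hi - lo) / n_bins
    edges = [lo + width * k for k in range(1, n_bins)]  # shared equal-width bins
    symbols = coarse_grain(xs, edges)
    half = len(symbols) // 2
    p = Counter(symbols[:half])
    q = Counter(symbols[half:2 * half])
    m = p + q  # equal-sized halves, so pooled counts give the mixture (P + Q) / 2
    jsd = shannon_entropy(m.values(), 2 * half) - 0.5 * (
        shannon_entropy(p.values(), half) + shannon_entropy(q.values(), half)
    )
    return min(1.0, max(0.0, jsd))  # clamp away floating-point round-off


if __name__ == "__main__":
    rng = random.Random(0)
    iid = [rng.gauss(0.0, 1.0) for _ in range(2000)]
    walk, level = [], 0.0
    for step in iid:  # random walk: cumulative sum of the i.i.d. steps
        level += step
        walk.append(level)
    print("i.i.d. noise:", toy_nonstationarity(iid))
    print("random walk :", toy_nonstationarity(walk))
```

Splitting into exactly two halves and using equal-width bins are arbitrary simplifications. A stream whose two halves occupy disjoint bins scores close to 1, while an i.i.d. stream scores close to 0.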

Key words: Data analysis, Nonstationarity measure, Model selection, Stable set, Shannon entropy

CLC Number:

  • 62M10