Computer Science ›› 2020, Vol. 47 ›› Issue (3): 73-78. doi: 10.11896/jsjkx.190500125

• Database & Big Data & Data Science •

Attribute Reduction Algorithm Based on Optimized Discernibility Matrix and Improving Discernibility Information Tree

XU Yi1,2, TANG Jing-xin2

  1. Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Anhui University, Hefei 230039, China
  2. College of Computer Science and Technology, Anhui University, Hefei 230601, China
  • Received: 2019-05-22 Online: 2020-03-15 Published: 2020-03-30
  • Corresponding author: XU Yi ([email protected])
  • About author: XU Yi, born in 1981, Ph.D., associate professor, is a member of the China Computer Federation. Her main research interests include intelligent information processing and rough sets.
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61402005), the Natural Science Foundation of Anhui Province (1308085QF114) and the Provincial Natural Science Foundation of Anhui Higher Education Institutions (KJ2013A015).

Abstract: A discernibility matrix records, in its entries, the information that distinguishes every pair of objects in an information system, which offers a useful basis for attribute reduction. However, the traditional discernibility matrix uses the core attributes to eliminate redundant entries only after construction is finished, ignoring the role the core attributes can play while the matrix is being built. To address this problem, this paper makes three contributions. First, the construction of the discernibility matrix is optimized: before the distinguishing information of two objects is computed, their values on the core attributes are compared, and if these values differ, the corresponding entry is recorded directly as ø and the remaining condition attributes are not examined. Second, the concept of attribute weighted importance is proposed, which combines the ratio of each condition attribute among the non-empty entries of the discernibility matrix (called macro importance) with the contribution of each attribute to distinguishing objects (called micro importance); an example illustrates the rationality of this measure. Third, because the optimized matrix still contains many redundant entries and empty sets, a discernibility information tree based on the optimized discernibility matrix and attribute weighted importance is proposed. All non-empty entries of the optimized discernibility matrix are sorted by attribute weighted importance so that attributes of high importance are shared by more nodes. During construction, entries that do not contain core attributes are each mapped to a path in the tree, while entries that contain core attributes are ignored. Finally, a reduction algorithm, HSDI-tree, based on the optimized discernibility matrix and the improved discernibility information tree is proposed. The reduction results and node counts of HSDI-tree are compared with those of the CDI-tree, DI-tree and IDI-tree algorithms on five UCI data sets. The experimental results show that HSDI-tree can effectively find the minimum attribute reduction and achieves better space compression.

Key words: Attribute importance, Attribute reduction, Discernibility information tree, Discernibility matrix, Rough set
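
The first two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy data, and the hand-supplied core are all hypothetical, and the macro importance formula is only one possible reading of the abstract (the share of non-empty entries that contain an attribute). Micro importance is not defined in the abstract, so it is left out.

```python
from itertools import combinations


def optimized_discernibility_matrix(objects, attributes, core):
    """Return {(i, j): frozenset of attributes} for every pair i < j.

    Assumption: an entry is empty when the pair already differs on a core
    attribute, so the remaining attributes are never compared.
    """
    matrix = {}
    for (i, x), (j, y) in combinations(enumerate(objects), 2):
        if any(x[a] != y[a] for a in core):
            matrix[(i, j)] = frozenset()  # the core already separates x and y
            continue
        matrix[(i, j)] = frozenset(a for a in attributes if x[a] != y[a])
    return matrix


def macro_importance(matrix, attributes):
    """Share of non-empty entries containing each attribute (assumed reading)."""
    non_empty = [entry for entry in matrix.values() if entry]
    if not non_empty:
        return {a: 0.0 for a in attributes}
    return {a: sum(a in e for e in non_empty) / len(non_empty) for a in attributes}


# Toy information system: four objects described by attributes a, b, c.
objects = [
    {"a": 0, "b": 0, "c": 0},
    {"a": 0, "b": 1, "c": 0},
    {"a": 1, "b": 0, "c": 1},
    {"a": 1, "b": 1, "c": 1},
]
attributes = ["a", "b", "c"]
core = ["b"]  # hypothetical: the core is assumed to be known in advance

matrix = optimized_discernibility_matrix(objects, attributes, core)
importance = macro_importance(matrix, attributes)
```

In this toy example, four of the six object pairs differ on the core attribute b and are recorded as empty without looking at a or c, which is the saving the optimized construction aims for. How the core is obtained beforehand is outside the scope of the sketch.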

CLC Number: TP181