Research into Data Cleaning Algorithm Based on Interval Fuzzy Matching Functions and Its Application to Questionnaire Data

Journal of Nanjing Normal University (Engineering and Technology Edition) [ISSN:1006-6977/CN:61-1281/TN]

Issue:
2017, Issue 03
Page:
70-
Research Field:
Computer Engineering
Publishing date:

Info

Title:
Research into Data Cleaning Algorithm Based on Interval Fuzzy Matching Functions and Its Application to Questionnaire Data
Author(s):
Mi Yunlong1, Li Jinhai2, Mi Chunqiao1,3, Liu Wenqi2, Liu Jia1, Wang Tian3
(1. School of Computer Science and Engineering, Huaihua University, Huaihua 418000, China) (2. Faculty of Science, Kunming University of Science and Technology, Kunming 650500, China) (3. Hunan Provincial Key Laboratory of Ecological Agriculture Intelligent Control Technology, Huaihua 418000, China)
Keywords:
data cleaning; matching function; interval-valued fuzzy set; interval-valued fuzzy matching function; questionnaire data
PACS:
TP311
DOI:
10.3969/j.issn.1672-1292.2017.03.011
Abstract:
Data cleaning is an important step in ensuring data quality. Real-world data, such as questionnaire data, often contain unreasonable or even erroneous values because human activity is subjective and emotional. Such data are difficult to clean because the unreasonable values are often uncertain, ambiguous, and hidden, and traditional data cleaning methods handle them poorly. Therefore, by combining the basic theories of interval-valued fuzzy sets and matching functions, we propose an interval fuzzy matching function method. Based on this method, we construct a new algorithm to clean data and improve data quality, and then apply it to questionnaire data. Experiments show that our algorithm achieves good precision and running efficiency, and that it is well suited to processing unreasonable data.
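
The abstract names two ingredients, interval-valued fuzzy sets and matching functions, without giving their definitions on this page. As a rough illustration only, the Python sketch below shows one way the two ideas can fit together: a membership degree is stored as an interval inside [0, 1], and a matching function merges two similar intervals into their hull, which makes it idempotent, commutative, and associative. The names `Interval`, `similar`, and `match` and the overlap-based similarity test are assumptions made for this example; this is not the algorithm proposed in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Hypothetical interval-valued membership degree [lo, hi] within [0, 1]."""
    lo: float
    hi: float

    def __post_init__(self):
        # Reject intervals that are reversed or leave the unit interval.
        if not (0.0 <= self.lo <= self.hi <= 1.0):
            raise ValueError("expected 0 <= lo <= hi <= 1")


def similar(a: Interval, b: Interval) -> bool:
    """Assumed similarity test: two intervals are similar if they overlap."""
    return a.lo <= b.hi and b.lo <= a.hi


def match(a: Interval, b: Interval) -> Interval:
    """Assumed matching function: merge two intervals into their hull."""
    return Interval(min(a.lo, b.lo), max(a.hi, b.hi))


# Two overlapping answers to the same questionnaire item are merged.
first = Interval(0.2, 0.5)
second = Interval(0.4, 0.7)
merged = match(first, second) if similar(first, second) else first
```

The hull is used here because it is the simplest merge that keeps both inputs inside the result; a real cleaning procedure would need its own similarity threshold and merge rule.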

Memo

Memo:
-
Last Update: 2017-09-30