Keynote Speech: June 29, 2013

Overcoming Big Data Challenges

Professor Taghi M. Khoshgoftaar
Dept. of Computer & Electrical Engineering & Computer Science
Florida Atlantic University, USA


Due to the influx of data across a wide variety of application domains, Big Data has become a central topic in data science research. Big Data provides many opportunities to learn key insights that can only be found in large collections of data, but it also poses unique challenges when practitioners face data characteristics that are much more difficult to address at large scale. For example, high dimensionality (having a large number of independent attributes or features) can occur at multiple scales of data mining, but extremely large datasets are not amenable to some traditional approaches. In addition, data imbalance (having many more instances in one class than in other classes) can be especially challenging when large datasets make oversampling infeasible. Finally, one oft-overlooked challenge -- datasets which are inherently difficult to learn from -- is even more difficult to handle with extremely large quantities of noise. In this paper, we will discuss these problems in the context of one important Big Data application domain -- bioinformatics -- and present our work on addressing these challenges through techniques designed to solve these problems while operating efficiently and returning meaningful results even when faced with Big Data.

About the Speaker:

Dr. Taghi M. Khoshgoftaar is a professor in the Department of Computer and Electrical Engineering and Computer Science, Florida Atlantic University, and the Director of the Data Mining and Machine Learning Laboratory and the Empirical Software Engineering Laboratory. His research interests are in big data analytics, data mining and machine learning, health informatics and bioinformatics, and software engineering. He has published more than 500 refereed journal and conference papers in these areas. He was the conference chair of the IEEE International Conference on Machine Learning and Applications (ICMLA 2012). He is the workshop chair of the IEEE IRI Health Informatics workshop (2013). He is the Editor-in-Chief of the Big Data journal. He has served on organizing and technical program committees of various international conferences, symposia, and workshops. He has also served as North American Editor of the Software Quality Journal and was on the editorial boards of the journals Multimedia Tools and Applications, Knowledge and Information Systems, and Empirical Software Engineering. He is on the editorial boards of the journals Software Quality, Software Engineering and Knowledge Engineering, Fuzzy Systems, and Social Network Analysis and Mining.