Description

Data Mining with Rattle teaches both the concepts of data mining and the hands-on use of a popular, contemporary data mining tool: “Data Miner,” also known as the Rattle package for R. Rattle is a GUI-based tool that runs on top of R. The course covers the life-cycle issues, processes, and tasks involved in supporting a ‘cradle-to-grave’ data mining project. These include: data exploration and visualization; testing data for random variable family characteristics and distributional assumptions; transforming data by scale or by data type; performing cluster analyses; creating, analyzing, and interpreting association rules; and creating and evaluating predictive models that may use regression, generalized linear models (GLMs), decision trees, recursive partitioning, random forests, boosting, and/or support vector machines (SVMs). The course is both conceptual and practical: it explains data mining and provides ample demonstrations of data mining tasks carried out with the Rattle package.

The course is ideal for undergraduate students who want to add in-demand analytical skills to offer a prospective employer. It is also suitable for graduate students who want to learn a variety of techniques for analyzing research data. Finally, it is useful for practicing quantitative analysis professionals who want to broaden their job skills and knowledge.

The course is organized into 10 distinct topics, each intended to be a participant's focus of study for a separate week.