Educational Data Mining and Learning Analytics

Authors: Mimi Recker, Andrew Krumm, Mingyu Feng, Shuchi Grover, Ken Koedinger

Overview


(See Learning Analytics for Assessment in the Community Report)

Educational data mining (EDM) is the use of multiple analytical techniques to better understand relationships, structure, patterns, and causal pathways within complex datasets. Learning Analytics (LA) is a closely related endeavor, with somewhat more emphasis on simultaneously investigating automatically collected data along with human observation of the teaching and learning context. Overall, cyberlearning emphasizes the integration of learning sciences theories with these techniques in order to improve the design of learning systems and to better understand how people learn within them.

Educational systems are increasingly engineered to capture and store data on users’ interactions with a system. These data (e.g., big data, system log data, trace data) can be analyzed using statistical, machine learning, and data mining techniques. The development of computational tools for data analysis, the standardization of data logging formats, and increased processing power are enabling learning scientists to investigate research questions using these data (Baker & Siemens, in press).
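As a rough illustration of what analyzing system log data can involve, the sketch below aggregates raw interaction events into per-student summary features. The log records, the field names (student, step, outcome), and the summarize function are all hypothetical and chosen for illustration; they do not reflect any particular system or logging standard.

```python
from collections import defaultdict

# Hypothetical interaction log: one record per student action.
# Field names and values are illustrative assumptions, not a real format.
LOG = [
    {"student": "s1", "step": "q1", "outcome": "correct"},
    {"student": "s1", "step": "q2", "outcome": "hint"},
    {"student": "s1", "step": "q2", "outcome": "incorrect"},
    {"student": "s1", "step": "q2", "outcome": "correct"},
    {"student": "s2", "step": "q1", "outcome": "incorrect"},
    {"student": "s2", "step": "q1", "outcome": "correct"},
]


def summarize(log):
    """Aggregate raw events into per-student counts and accuracy."""
    counts = defaultdict(lambda: {"attempts": 0, "correct": 0, "hints": 0})
    for event in log:
        row = counts[event["student"]]
        if event["outcome"] == "hint":
            # Hint requests are tallied separately from answer attempts.
            row["hints"] += 1
        else:
            row["attempts"] += 1
            if event["outcome"] == "correct":
                row["correct"] += 1

    summary = {}
    for student, row in counts.items():
        # Guard against students who only requested hints.
        accuracy = row["correct"] / row["attempts"] if row["attempts"] else 0.0
        summary[student] = {**row, "accuracy": accuracy}
    return summary


features = summarize(LOG)
for student, row in sorted(features.items()):
    print(student, row)
```

Features of this kind (attempts, hint requests, proportion correct) are one plausible starting point for the statistical and machine learning analyses mentioned above; real studies would typically draw on far richer logs.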

Research goals that EDM/LA can address include:

  1. Predicting students’ future learning by creating models that incorporate information such as students’ knowledge, behavior, motivation, and attitudes.
  2. Discovering or improving models that characterize the subject matter to be learned (e.g., math or science), identify fruitful pedagogical sequences, and suggest how these sequences might be adapted to students’ needs.
  3. Studying the effects of varied pedagogical enhancements on student learning.
  4. Advancing scientific knowledge about learning and learners through building models of learning processes that incorporate data about students, teachers, understanding of subject matter, pedagogies, and principles from learning sciences.
  5. Supporting learning for all students by adapting learning resources to fit the particular needs identified, including adaptations for individual students when warranted.

In addition, researchers are expanding EDM/LA to new frontiers, such as studying learning in constructionist research, where the lack of formal structure in learning environments (such as games and maker spaces) makes traditional assessments difficult to implement. Another new frontier for EDM/LA is understanding collaboration in formal and informal learning environments.

Large-scale use of learning management systems, games, virtual worlds, augmented reality, simulations, and constructionist spaces in learning, as well as the emergence of online open learning materials (such as Khan Academy) and courseware (including MOOCs), has fueled research in EDM/LA. The NSF-funded Pittsburgh Science of Learning Center (PSLC), or ‘LearnLab’, has spearheaded key research in this field in the past decade. The PSLC DataShop is an important resource, serving as a central repository to secure and store research data and providing a set of analysis and reporting tools. Early work in EDM/LA by the PSLC team (Koedinger, Corbett, and others) was conducted in the context of Intelligent Tutoring Systems. The cognitive models of learning they used and developed (drawing on earlier work by John Anderson) have contributed to understanding the design of adaptive, data-rich learning systems, especially in STEM subjects. Other noteworthy efforts include (among others) the development of tools and techniques for mining data and making inferences about non-cognitive aspects of learning (Ryan Baker and colleagues); growing an understanding of conversation analytics (Carolyn Rose’s group at CMU); analytics in games (Constance Steinkuehler and Kurt Squire; Taylor Martin and colleagues); LA to serve teacher needs (Mimi Recker et al.); studying collaborative processes and social learning analytics (Dan Suthers; Simon Buckingham Shum; and others); and multimodal learning analytics in constructionist spaces (Paulo Blikstein and colleagues).
