Speech Technologies and Learning

Authors: Cynthia D’Angelo and Chad Dorsey

Overview

The classroom learning environment is filled with speech in all forms—classroom discourse has notably been called “the language of learning.” However, learning mediated by speech remains complex and daunting to investigate at any meaningful scale. Characterizing and evaluating processes from collaboration to argumentation to engagement currently demands analysis of mountains of audio and video data.

Today, speech recognition and analysis have matured into a highly advanced, data-intensive technological and engineering field, and some fruits of this work are publicly visible in impressive tools such as Siri, Amazon Echo, and others. However, advances in speech technology remain underutilized by the education research community.

Speech activity detection (SAD) identifies when someone is speaking, and diarization identifies who is speaking; together, these techniques can be used to detect turn-taking, keywords, questions, total words spoken, and overlapping speech. See Webinar: Exploring the Promise of Speech Technology for Education Research.
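
As an illustration, the short Python sketch below shows how diarization output, assumed here to be (speaker, start, end) segments in seconds, could be reduced to simple turn-taking, talk-time, and overlap statistics. The segment format and example data are illustrative assumptions, not the output of any particular tool.

```python
# A minimal sketch: reduce diarized speech segments, assumed to be
# (speaker, start_sec, end_sec) tuples, to basic conversation metrics.
from itertools import combinations

def turn_stats(segments):
    """Compute simple turn-taking, talk-time, and overlap statistics."""
    segments = sorted(segments, key=lambda s: s[1])  # order by start time

    # A "turn" is counted whenever the speaker changes between adjacent segments.
    turns = sum(1 for prev, cur in zip(segments, segments[1:]) if prev[0] != cur[0])

    # Total speaking time per speaker.
    talk_time = {}
    for speaker, start, end in segments:
        talk_time[speaker] = talk_time.get(speaker, 0.0) + (end - start)

    # Overlapping speech: summed duration where segments from different speakers intersect.
    overlap = 0.0
    for (sa, a0, a1), (sb, b0, b1) in combinations(segments, 2):
        if sa != sb:
            overlap += max(0.0, min(a1, b1) - max(a0, b0))

    return {"turns": turns, "talk_time": talk_time, "overlap_sec": overlap}

# Example: a teacher and two students in a short exchange (times are invented).
example = [("teacher", 0.0, 4.2), ("student_A", 4.0, 7.5),
           ("teacher", 7.6, 9.0), ("student_B", 9.1, 12.0)]
print(turn_stats(example))
```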

Speech and language technologies (SLT) have reached the point where they offer untapped potential across a broad range of natural teaching and learning settings. These technologies, fueled largely by DARPA projects and other national efforts, have made incredible strides in recognizing and classifying a wide range of features of spoken language. Current techniques for capturing and characterizing human speech and dialogue are well developed and in active use across many fields and applications. Advanced microphone technologies make extended audio capture effective and reliable, established noise-reduction and processing algorithms aid in the automatic identification of speakers, and reliable algorithms enable automatic detection of everything from spoken questions and answers to emotion, sentiment, and use of specific content keywords.
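
For example, once an automatic transcript is available, counting uses of specific content keywords is straightforward. The sketch below assumes a plain-text transcript string is already in hand; the keyword list and sample sentences are invented for illustration.

```python
# A minimal sketch: count case-insensitive, whole-word occurrences of
# domain-specific keywords in an (assumed) ASR transcript string.
import re
from collections import Counter

def keyword_counts(transcript, keywords):
    """Return how many times each keyword appears in the transcript."""
    counts = Counter()
    for kw in keywords:
        pattern = r"\b" + re.escape(kw) + r"\b"
        counts[kw] = len(re.findall(pattern, transcript, flags=re.IGNORECASE))
    return counts

transcript = ("So if the force on the cart doubles, does the acceleration "
              "double too? I think the force and the mass both matter.")
print(keyword_counts(transcript, ["force", "mass", "acceleration", "velocity"]))
```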

Combining these capabilities with education research goals could unlock many possibilities. Targeted convergence research, in which speech technologies are applied to capture, analyze, process, and report on educational activities, could eventually encompass significant, wide-ranging applications. With support from an NSF-funded capacity-building grant, Chad Dorsey and Cynthia D’Angelo led focus group meetings with 24 education and speech researchers that resulted in the following visions for speech-enabled educational activities:

  • A “Fitbit for teaching” that would provide teachers with an overview of their dialogic activity within different classes, including quantifications of conversational turns, numbers of questions asked or initiated by students, and classifications of the teacher’s response and verbal guidance types (a minimal sketch of such metrics follows this list).
  • A process for establishing speech-based learning analytics for collaboration, using the speech of small groups of students to determine the quality of each group’s collaboration and report to both teachers and researchers about collaboration quality and group dynamics.
  • A process for automatically capturing and analyzing student discourse for argumentation indicators, including software that can identify and auto-extract a concentrated “highlight reel” of argumentation-rich instances across multiple small-group discussions.
  • A means of performing longitudinal analysis of students and classrooms at scale, capturing weeks’ worth of voice data and processing it to look for variables otherwise out of reach because of human analysis constraints, such as how the use of domain-specific keywords evolves over a unit of instruction or the presence and implications of long-term variables such as instructional styles and the average emotional character and sentiment of individuals and classrooms.
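
The sketch below illustrates the “Fitbit for teaching” idea in miniature, assuming a speaker-attributed transcript represented as (role, utterance) pairs. The end-of-sentence question-mark heuristic and the sample dialogue are illustrative assumptions; a real system would rely on automatic speech recognition plus far more robust question and guidance-type classification.

```python
# A hedged sketch of "Fitbit for teaching" style metrics from an assumed
# speaker-attributed transcript: questions by role and conversational turns.
def dialogue_summary(utterances):
    """Summarize who asks questions and how often the speaker changes."""
    summary = {"teacher_questions": 0, "student_questions": 0, "turns": 0}
    prev_role = None
    for role, text in utterances:
        if prev_role is not None and role != prev_role:
            summary["turns"] += 1          # speaker change = one conversational turn
        if text.strip().endswith("?"):     # crude question heuristic
            key = "teacher_questions" if role == "teacher" else "student_questions"
            summary[key] += 1
        prev_role = role
    return summary

# Invented sample dialogue for illustration.
sample = [("teacher", "What do you predict will happen?"),
          ("student", "I think it will speed up."),
          ("student", "Does the mass matter?"),
          ("teacher", "Good question. What do the rest of you think?")]
print(dialogue_summary(sample))
```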

These goals are lofty, and many of them are years or even decades away. They will remain even farther out of reach, however, if the right imaginations and community-building activities are not engaged across the fields of speech engineering and education research, and if the right data repositories and tools are not constructed.

A handful of projects and research groups are working on merging SLT and education research. These interdisciplinary partnerships have begun tapping into the potential of this genre of work, but there is much left to do.
