Situating Big Data across Heterogeneous Data Sets of Game Data Exhaust, Class Assessment Measures, and Student Talk

This project seeks to marry theories of situated cognition to the big data movement by connecting clickstream data, typically drawn from technologies in isolation, to key forms of multimodal data available from their contexts of use. Using a data corpus gathered from a five-day implementation of the STEM game Virulent (targeting cellular biology), we combine multiple analytic strategies commonly considered incommensurate, including educational data mining, qualitative coding, discourse analysis, natural language processing, and standard classroom assessments. In this paper, we review the project goals and preliminary findings and discuss the benefits and drawbacks of analysis across heterogeneous data sets. Our goal is to provide a more complete model for big data analysis, one that includes both talk and play data equally or, where that is not possible, to identify the model's limitations, so that future “data-rich” studies of learning might be better informed about the limitations of technology-rich but talk-poor data sets.

PDF Articles

/sites/default/files/articles/Proceedings%20Articles/GLS11/40.Situating%20Big%20Data%20Across%20Heterogeneous%20Data%20Sets%20of%20Game%20Data.pdf