Ghosts, goblins, gore, oh my!

Futuristic Graphics for decoration only.

Now that I have your attention, remember that Allhallowtide has passed for this year.

Ghost of Learning Analytics Yet to Come
Created by DALL-E via Bing at my request

Data Collection

Data collection presents its own set of challenges. In educational settings there is no lack of available data; rather, the problem can be too much data. What data is relevant to the goal? What data pertains to the task at hand? How is the data obtained?

While there is such a thing as “bad data,” it is my belief that most data is not so much bad as it is either “dirty” or irrelevant (gory?).

Dirty data (wrong format, in the wrong category/field/row/column, misspelled, etc.) can be cleaned up. While this data cleansing can be time-consuming, if the data is relevant then it is a worthwhile process to get the data standardized and/or normalized so that it can turn into information.
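To make the cleanup idea concrete, here is a minimal sketch in Python using only the standard library. The records, field names, and accepted date formats are all hypothetical, invented for illustration; a real dataset would need its own rules.

```python
from datetime import datetime

# Hypothetical raw records; field names and values are invented for illustration.
raw_records = [
    {"student_id": " 001 ", "major": "biology", "enrolled": "2023-09-01", "score": "88"},
    {"student_id": "002", "major": "Biology ", "enrolled": "09/01/2023", "score": " 92.5"},
    {"student_id": "003", "major": "BIOLOGY", "enrolled": "1 Sep 2023", "score": "n/a"},
]

# Assumed set of date formats seen in the raw data.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d %b %Y")


def parse_date(text):
    """Try each known date format; return an ISO date string or None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def parse_score(text):
    """Convert a score to float; return None when it is not numeric."""
    try:
        return float(text.strip())
    except ValueError:
        return None


def clean_record(record):
    """Standardize one record: trim whitespace, unify case, parse types."""
    return {
        "student_id": record["student_id"].strip(),
        "major": record["major"].strip().title(),
        "enrolled": parse_date(record["enrolled"]),
        "score": parse_score(record["score"]),
    }


cleaned = [clean_record(r) for r in raw_records]
```

Values that cannot be parsed become None rather than being guessed at, so they can be reviewed or dropped later instead of quietly skewing the analysis.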

Irrelevant data is problematic because it tends to gum up the works. It may even start as dirty data that consumes many resources to clean before it is found to be irrelevant. Irrelevant data can also introduce a large amount of error, which can easily take a project off course and lead to conclusions that make no sense. Identifying this data as early in the process as possible and discarding it is crucial to the success of the predictive model.

Cross-Validation

Cross-validation is used to check that your predictive model performs accurately on data it was not trained on, using only the available data set. No, it does not keep vampires away.

Holding back part of the existing data tests whether the model has learned patterns that generalize, rather than simply memorizing the rows it was trained on. Sampling from within the dataset means the entire dataset is not used to build and train the predictive model, so the remaining data can be used for verification and validation (V&V). In cross-validation, this split is repeated so that each portion of the data takes a turn as the held-out set. Once the model is trained and validated, it can be used with more confidence to analyze the data at hand.
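The train-then-validate loop can be sketched in a few lines of Python. This is a minimal k-fold example under stated assumptions, not the chapter's method: the data (weekly study hours and a pass/fail label) is made up, and the “model” is a deliberately simple threshold so the mechanics of the folds stay visible.

```python
import random


def k_fold_indices(n_rows, k, seed=0):
    """Shuffle row indices and split them into k roughly equal folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def train_threshold(rows):
    """Toy model: the midpoint between the mean hours of each class."""
    passed = [hours for hours, label in rows if label == 1]
    failed = [hours for hours, label in rows if label == 0]
    return (sum(passed) / len(passed) + sum(failed) / len(failed)) / 2


def accuracy(threshold, rows):
    """Share of rows where 'hours >= threshold' matches the pass label."""
    correct = sum((hours >= threshold) == bool(label) for hours, label in rows)
    return correct / len(rows)


def cross_validate(rows, k=5, seed=0):
    """Train on k-1 folds, score on the held-out fold, repeat for each fold."""
    scores = []
    for held_out in k_fold_indices(len(rows), k, seed):
        held = set(held_out)
        train = [rows[i] for i in range(len(rows)) if i not in held]
        test = [rows[i] for i in held_out]
        scores.append(accuracy(train_threshold(train), test))
    return scores


# Hypothetical data: (weekly study hours, passed the course? 1 = yes, 0 = no)
data = [(1, 0), (2, 0), (3, 0), (4, 0), (5, 0),
        (6, 1), (7, 1), (8, 1), (9, 1), (10, 1)]

scores = cross_validate(data, k=5)
mean_accuracy = sum(scores) / len(scores)
```

Each row is used for testing exactly once and for training in the other rounds, so the averaged score reflects performance on data the model did not see while it was being fit.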

The Ghost of Education Yet to Come

“Before I draw nearer to that stone to which you point,” said Scrooge, “answer me one question.
Are these the shadows of the things that Will be, or are they shadows of things that May be, only?” (Dickens)

“The Last of the Spirits” — The Pointing Finger
John Leech
1843

According to Andrew Ho’s comments within Brooks and Thompson’s chapter (Brooks & Thompson, 2022), predictions in education should hold true in the absence of intervention. Prediction is part one of the process; intervention to address a shortcoming or potential failure by the student is part two, and is the “purpose of education.” Brooks and Thompson take a slightly different approach. While they agree with Ho that education should involve both prediction and intervention, they note that the intervention could be used, by policy, to direct the limited resources of the educational institution toward their best use. That could mean giving further support to potentially successful students at the expense of those predicted to fail.

I would venture to think more along the lines of Scrooge. Is this destiny inevitable, or is it adaptable via the course corrections we make based on the prediction analysis? Ever the optimist, I believe the future is never guaranteed. It may be highly probable, but it is not yet set in stone.

Implications/Applications

Data science begets data engineering. One consequence of gathering all the data and attempting to run analysis on it is the myriad types, styles, amounts, and quality levels of that data. As such, there needs to be something or someone that can take the data, integrate it, and make it suitable for the analysis that will turn it into information. This process is sometimes referred to as “massaging” or “manipulating” the data to make it ready to use. It is more formally known as data engineering, as the data is being engineered into a more useful format. For those who are not computer engineers or programmers, and who know what they want but maybe not exactly how to present it to the data scientists, the availability of a data engineer would be ideal. The data engineer would not necessarily have to run the analysis, but they would be able to get the data into the “clean” or compatible format necessary to run it.
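As a small illustration of that integration step, the sketch below joins two exports that describe the same students in different shapes. Everything here is an assumption made for the example: the two sources (a learning management system and a registrar), their column names, their delimiters, and their ID styles are invented.

```python
import csv
import io

# Hypothetical exports from two systems; names and columns are invented.
LMS_CSV = """Student ID,Logins
S-001,14
S-002,3
"""

REGISTRAR_CSV = """id;final_grade
1;B+
2;D
3;A
"""


def normalize_id(raw):
    """Reduce IDs such as 'S-001' or '1' to a common integer key."""
    digits = "".join(ch for ch in raw if ch.isdigit())
    return int(digits)


def load_lms(text):
    """Read the comma-separated LMS export into {student_id: logins}."""
    reader = csv.DictReader(io.StringIO(text))
    return {normalize_id(row["Student ID"]): int(row["Logins"]) for row in reader}


def load_registrar(text):
    """Read the semicolon-separated registrar export into {student_id: grade}."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    return {normalize_id(row["id"]): row["final_grade"] for row in reader}


def integrate(lms, registrar):
    """Join on student ID; keep only students present in both sources."""
    return [
        {"student_id": sid, "logins": lms[sid], "final_grade": registrar[sid]}
        for sid in sorted(lms.keys() & registrar.keys())
    ]


analysis_table = integrate(load_lms(LMS_CSV), load_registrar(REGISTRAR_CSV))
```

Student 3 appears only in the registrar export, so this sketch leaves that student out of the joined table. Whether to drop or keep such partial records is a decision the data engineer and the analyst would need to make together.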

Surprise!

The authors call out the “who’s got better accuracies” one-upmanship in academic publications. This surprised me as I read through the chapter, and it struck me as a bit of a tangent (trick or treat?!). I can understand why they took the time to make the comment; I just thought it a bit misplaced here.

That’s new!

The authors brought up the Weka software platform (https://www.cs.waikato.ac.nz/~ml/weka/) and introduced it as an alternative to R and Python for data mining. After my recent introduction to RapidMiner and its commercial nature, it is good to see that this platform is free, open-source software released under the GNU General Public License.

Not this week, but maybe a review of Weka (the software, not the bird) is an option for a future post.

 

References

Brooks, C., & Thompson, C. (2022). Predictive Modelling in Teaching and Learning. In C. Lang, G. Siemens, A. F. Wise, D. Gašević, & A. Merceron (Eds.), Handbook of Learning Analytics (pp. 29-37). Society for Learning Analytics Research (SoLAR).

Dickens, C. (n.d.). A Christmas Carol.

 

 
