
Getting to the truth, the ground truth, and nothing but the ground truth.

Takeaways for learning from HCOMP 2019, Part 2

At HCOMP 2019, there was a lot of information about machine learning that I found relevant to building educational technology. To my surprise, I didn’t find other ed-tech companies or organizations at the Fairness, Accountability, and Transparency conference I attended last year in Atlanta, nor at HCOMP 2019. Maybe ed-tech organizations don’t have research groups that publish openly, and thus they don’t come to these academic conferences. Maybe readers of this blog will send me pointers to who I missed!

Mini machine learning terminology primer from a novice (skippable if you already know these): To train a machine learning algorithm that is going to decide or categorize something, you need to start out with a set of things for which you already know the correct decisions or categories. That set is the ‘ground truth’ you use to train the algorithm. You can think of the algorithm as a toddler. If you want the algorithm to recognize and distinguish dogs from cats, you need to show it a bunch of dogs and cats and tell it what they are. Mom and Dad say: “look, a kitty”; “see the puppy?” An algorithm can be ‘over-fitted’ to the ground truth you give it. The toddler version of overfitting is a toddler who knows the animals you showed them (that Fifi is a cat and Fido is a dog) but can’t correctly name new animals, like the neighbor’s pet cat. To add a further wrinkle, if you are creating a ground truth, it is always great if you have Mom and Dad to create the labels, but sometimes all you can get are toddlers (novices) labeling. Using novices to train is related to the idea of wisdom of the crowd, where the opinions of a collection of people are used rather than a single expert’s. You can also introduce bias into your algorithm by showing it only calico cats in the training phase, causing it to label only calicos as “cats” later on. Recent real-world examples of training bias come from facial recognition algorithms that were trained on light-skinned people and therefore have trouble recognizing black and brown faces.
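
To make the primer concrete, here is a minimal sketch, in plain Python, of a ‘toddler’ classifier: it labels a new animal by copying the label of the most similar animal in its ground truth. Everything in it (the animals, the measurements, the unlucky Chihuahua) is invented for illustration; the point is only that the trained classifier can be no better, and no less biased, than the ground truth it was shown.

```python
# Toy illustration only: a "toddler" classifier that copies the label of the
# most similar animal it has already been shown (1-nearest-neighbor).

# Ground truth: (weight_kg, nose_to_tail_cm) -> label. All values are made up.
training_set = {
    (4.0, 45.0): "cat",    # Fifi
    (4.5, 48.0): "cat",    # Mittens
    (30.0, 110.0): "dog",  # Fido
    (35.0, 120.0): "dog",  # Rex
}

def classify(animal):
    """Label a new animal with the label of the closest known animal."""
    best_label, best_distance = None, float("inf")
    for known, label in training_set.items():
        distance = sum((a - b) ** 2 for a, b in zip(animal, known)) ** 0.5
        if distance < best_distance:
            best_distance, best_label = distance, label
    return best_label

print(classify((4.2, 46.0)))  # the neighbor's cat -> "cat"
print(classify((2.5, 30.0)))  # a Chihuahua -> "cat", because the ground truth
                              # contained no small dogs (training-set bias)
```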

Creating ground truth: A whole chunk of the talks was about different ways of creating ‘ground truth’ using ‘wisdom of the crowd’ techniques. Ed-tech needs quite a bit of ground truth about the world to train algorithms to help students learn effectively. “How difficult is this task or problem?” “What concepts are needed to do this task/problem?” “What concepts are a part of this text/example/explanation/video?” “Is this solution to this task/problem correct, partially correct, displaying a misconception, or just plain wrong?”

Finding the best of the crowd: Several of the presentations were about finding and motivating the best of the crowd. If you can find and/or train ‘experts’ in the crowd, you can get to the ground truth at lower cost (in time or money). I am hoping that ed-tech can use these techniques to crowdsource effective practice exercises, examples, solutions, and explanations.

  1. Wisdom of the toddlers. Heinecke et al. (https://aaai.org/ojs/index.php/HCOMP/article/view/5279) described a three-step method for obtaining a ground truth from non-experts. First, they used a large number of people and expensive mathematical methods to obtain a small ground truth. (Sticking with the cats-and-dogs example from the primer above: you have a large number of toddlers tell you whether a few animals are cats or dogs, and use math to decide which animals ARE cats and ARE dogs using the wisdom of the toddlers.) Step 2 is to find the small subset of that large pool who were best at determining the ground truth, and use them to create more ground truth. (Find a group of toddlers who together labeled the cats and dogs correctly, and use them to label a whole bunch more cats and dogs.) Finally, you use the larger ground-truth set to train a machine learning algorithm. A rough code sketch of this pipeline appears after this list. I think this is very exciting for learning content, because we have students and faculty doing their day-to-day work, and we might be able to find sets of them that can help answer the questions above.
  2. Misconceptions of the herd: One complicating factor in educational technology ground truths is the prominent presence of misconceptions. The Best Paper winner at the conference, Simoiu et al. (https://aaai.org/ojs/index.php/HCOMP/article/view/5271), found an interesting, relevant, and in hindsight unsurprising result. This group did a systematic study of crowds answering 1000 questions from 50 different topical domains. They found that averaging the crowd’s answers almost always yields significantly better results than the average (50th-percentile) person. They also wanted to see the effects of social influence on the crowd. When they showed individuals the ‘consensus’ answer (the three currently most popular answers), the crowd could be swayed by early wrong answers and thus did NOT, on average, perform better than the average unswayed person (see the toy simulation after this list). Since misconceptions (wrong answers due to faulty understanding) are well-known phenomena in learning, and are particularly resistant to change (if you haven’t seen Derek Muller’s wonderful 6-minute TED talk about this, go see it now!), we need to be particularly careful not to aid their contagion when introducing social features.
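
As promised in item 1, here is a rough sketch of the three-step pipeline. This is my own toy code, not the method from the Heinecke et al. paper (their aggregation is much more sophisticated than a simple majority vote): build a small consensus ground truth from many novice labels, score each annotator by agreement with that consensus, and keep only the most reliable annotators to label the larger set.

```python
# Toy sketch of: (1) consensus ground truth from many novices,
# (2) scoring annotators against the consensus, (3) keeping the best of them.
from collections import Counter

def majority_vote(labels):
    """Consensus label for one item from a list of novice labels."""
    return Counter(labels).most_common(1)[0][0]

def consensus_ground_truth(annotations):
    """annotations: {item: {annotator: label}} -> {item: consensus label}."""
    return {item: majority_vote(list(votes.values()))
            for item, votes in annotations.items()}

def annotator_scores(annotations, consensus):
    """Fraction of items on which each annotator agreed with the consensus."""
    agreed, seen = {}, {}
    for item, votes in annotations.items():
        for annotator, label in votes.items():
            agreed[annotator] = agreed.get(annotator, 0) + (label == consensus[item])
            seen[annotator] = seen.get(annotator, 0) + 1
    return {a: agreed[a] / seen[a] for a in agreed}

def best_annotators(annotations, keep=2):
    consensus = consensus_ground_truth(annotations)
    scores = annotator_scores(annotations, consensus)
    return sorted(scores, key=scores.get, reverse=True)[:keep]

# Tiny made-up seed set: three "toddlers" label two animals.
seed = {
    "Fifi": {"toddler_a": "cat", "toddler_b": "cat", "toddler_c": "dog"},
    "Fido": {"toddler_a": "dog", "toddler_b": "dog", "toddler_c": "dog"},
}
print(consensus_ground_truth(seed))   # {'Fifi': 'cat', 'Fido': 'dog'}
print(best_annotators(seed, keep=2))  # ['toddler_a', 'toddler_b']
```

In an ed-tech setting, the ‘annotators’ could be students and faculty going about their day-to-day work, and the ‘items’ could be exercises being tagged with difficulty or concepts.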

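And here is the toy simulation mentioned in item 2. It is not data or code from the Simoiu et al. paper; it is a simplified two-answer version of the general effect, with made-up probabilities: independent answers aggregate well, but if each person has some chance of simply copying the current majority, early wrong answers can snowball.

```python
# Toy simulation: independent crowd vs. a crowd that can see the running majority.
import random

random.seed(0)
TRUE_ANSWER = "A"
P_CORRECT = 0.6   # each individual answers correctly 60% of the time (made up)
P_FOLLOW = 0.7    # chance a person just copies the current majority (made up)
N_PEOPLE = 101

def independent_crowd():
    return ["A" if random.random() < P_CORRECT else "B" for _ in range(N_PEOPLE)]

def swayed_crowd():
    votes = []
    for _ in range(N_PEOPLE):
        if votes and random.random() < P_FOLLOW:
            votes.append(max(set(votes), key=votes.count))  # copy current majority
        else:
            votes.append("A" if random.random() < P_CORRECT else "B")
    return votes

def crowd_is_right(votes):
    return votes.count(TRUE_ANSWER) > len(votes) / 2

trials = 2000
indep = sum(crowd_is_right(independent_crowd()) for _ in range(trials)) / trials
swayed = sum(crowd_is_right(swayed_crowd()) for _ in range(trials)) / trials
print(f"independent crowd majority is right {indep:.0%} of the time")
print(f"swayed crowd majority is right {swayed:.0%} of the time")
```
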
Are misconceptions like overfitting in machine learning? As an aside, my friend and colleague Sidney Burrus told an interesting story that sheds light on the persistence of misconceptions. During the transition from an earth-centered to a sun-centered model of the solar system, the earth-centered model was much better at predicting orbits, because people had spent a lot of time adding detail to it so that it correctly predicted known phenomena. The first sun-centered models, however, used circular orbits and did a poor job of prediction, even though they ultimately had more ‘truth’ in them. Those early earth-centered models were tightly ‘fitted’ to the known orbits. They would not have been good at predicting new orbits, just like an overfitted machine learning model will fail on new data.

Geoffrey Cohen’s talk at Rice on Inclusive Teaching

Inclusive Teaching

I went to an “Inclusive Teaching” workshop by Geoffrey Cohen, who works with Carol Dweck at Stanford. The workshop was sponsored by Rice’s Center for Teaching Excellence and was well attended by Rice faculty and staff. If you don’t know Carol Dweck, she pioneered a branch of research on the effects of mindset on performance in a wide variety of settings, concentrating on academic achievement. In this model, mindsets fall into two camps: a ‘fixed mindset’ is the belief that a particular trait, intelligence for instance, is fixed at birth and basically cannot be changed, while a ‘growth mindset’ is the belief that a particular trait is malleable and improves with practice and effort. Many, many experiments show that, regardless of initial measured ability, a growth mindset is associated with higher academic performance over time; this appears to be due to increased tenacity in the face of challenge, because failure is not seen as a measure of ability. Furthermore, particular interventions can shift a person’s mindset, and shifting that mindset results in increased performance. When these interventions work, the results are significant and can be long lasting, on the order of years.

Given the potential, I have been interested in how we might incorporate growth-mindset-inducing features into OpenStax products, and whenever someone with good ideas and research is around, I try to learn what I can from them. These are my notes from this talk.

Social belonging / Stereotype threat / White men can’t jump

The talk concentrated on social belonging. You may have seen some of the research on what is called ‘stereotype threat’. It seems counterintuitive, but it appears that if you think that people believe your group isn’t good at something, and your performance could confirm that negative stereotype, your performance suffers. That is a very causal way of explaining it, and, of course, these are really just correlations, but now I will just tell you about some of the weird and woolly experiments that have been done. All of these divide subjects as evenly as possible into two groups, one of which gets the ‘treatment’ (in this case a negative treatment) and the other of which doesn’t, and then average scores are compared.
Things that decrease performance:
  • If you ask people to list their gender before taking a math test, women’s scores drop significantly.
  • If you ask people to list their race before taking an academic test, black students’ scores drop significantly. (There is such a thing as ‘stereotype lift’ as well: white students’ scores increase a little if they are asked to list their race, but the increase is much smaller than the decrease for groups where a negative stereotype is part of the culture.)
  • If you tell people a test is a measure of intelligence, certain minorities and women do worse than if you give the same test and characterize it differently (as a measure of skills, for instance).
  • If a black researcher asks white men to jump as high as they can, they jump less high than if they are asked by a white researcher.

Digression: Unconscious bias in hiring

Cohen went into a significant digression about experiments that show unconscious bias in hiring. I think this was mainly to give examples of how interventions can fix things that are unconscious and hard to just ‘goodwill’ away.

Research demonstrating bias

Specifically, there is a set of experiments showing that when comparing two candidates, people adjust their expectations to favor the candidate who fits their stereotypes. For example, present two candidates for a police promotion, one male and one female, and give one more ‘on-the-job’ experience and the other more ‘book-learning’ experience. If you first show the candidates and then ask which is more important, ‘on-the-job’ or ‘book-learning’, the hiring manager picks whichever criterion the male candidate has.

Techniques that can decrease bias

But just as with mindset, there are interventions that can reduce or eliminate these biases.
  • If you ask people to come up with the criteria for the best candidate before they see the candidates, they stick with those criteria and, in the case of the police promotion, will (on average) pick a female candidate matching the stated criteria over a male candidate who does not.
  • When people hire a group into a position all at once (for instance, three managers or three developers), they are more likely to select a diverse group than if they hire three people in successive rounds.

Social Belonging interventions that increase student performance

Then he came back to listing a set of ‘interventions’ that have been shown to have positive effects for women in male-dominated fields, minorities in white-dominated achievement areas, first-generation college students, etc. These particular interventions did not show positive or negative effects for other groups, but studies that first measure attitudes do show benefits for all students who come in with particular mindsets and attitudes.
  • Having students read about ‘real’ students who felt they were not smart enough or did not belong, but then found that they did; or having them attend a panel of students discussing these feelings, especially if the audience identifies with the panelists (gender, race, economics, etc.).
  • Having students choose, from a long list, three statements that are important to them and then write a paragraph about each. Cohen called this ‘value affirmation’. The list covers a wide variety of values, including non-academic ones like ‘sense of humor’ and ‘relationship with family’. (This intervention roughly halved the F rate in one course, from 20% to 9%.)
  • For K-12 students, having a teacher attach the note ‘I am giving you these comments because I have high standards and I know that you can meet them’ to corrections and comments on an assignment. Teachers pre-wrote these notes, research assistants attached them to student work, and both teachers and researchers were blind to who got them and who didn’t.
  • For K-12 students, having a teacher initiate an exercise where students complete the sentence ‘I wish that my teacher knew that …’


This summary from Carol Dweck’s website, Academic Tenacity: Mindsets and Skills that Promote Long-Term Learning, has more about a lot of the research that Cohen described.