HCOMP 2019: Humans and machines as teams – takeaways for learning
The HCOMP (Human Computation) 2019 conference was about humans and machines working as teams and, in particular, combining ‘crowd workers’ (like those on Mechanical Turk and Figure Eight) and machine learning effectively to solve problems. I came to the conference to ‘map the field’ to learn about what people are researching and exploring in this area and to find relevant tools for building effective educational technology (ed-tech). I had an idea that this conference could be useful because ed-tech often combines the efforts of large numbers of educators and learners with machine learning recommendations and assistance. I wasn’t disappointed. The next few posts contain a few of the things that I took away from the conference.
Pay/Credit vs. Quality/Learning: finding the sweet spot. Ed-tech innovators and crowd-work researchers share a similar optimization problem: finding the sweet spot between fairness and accuracy. For crowd workers, the tension comes from the need to pay fairly for time worked without inadvertently incentivizing lower-quality work; the sweet spot is fair pay for repeatably high-quality work. We have an almost identical optimization problem in student learning, if you consider student “pay” to be credit for work and student “quality” to be learning outcomes. The good news is that while the two are often in tension, those sweet spots can be found. Two groups in particular reported interesting results in this area.
- Quality without rejection: One group investigating the repeatability of crowd work (Qarout et al.) found a quality gap of about 10% between work produced on Figure Eight and Amazon Mechanical Turk (AMT): AMT allows requesters to reject work they deem low quality, Figure Eight doesn’t, and the AMT workers completed tasks at roughly 10% higher quality. However, the AMT workers also reported higher stress. Students likewise report high levels of stress over graded work and fear of making mistakes, both of which can hinder learning, yet we have found that students on average put in less effort when work is graded for completion rather than correctness. Qarout et al. tried a simple equalizer: at the beginning of the job, on both platforms, they explicitly stated that no work would be rejected, but that quality work would earn a bonus. This adjustment brought both platforms up to the original AMT quality, and the modified AMT tasks were picked up faster than the original ones because the work was more appealing once rejection was off the table. It makes me think we should be spending serious research time on how to incentivize students to expend productive effort without overly relying on credit for correctness. If we can find an optimal incentive, we have a chance to both increase learning and decrease stress at the same time. Now that is a sweet spot.
- Paying fairly using the wisdom of the crowd: A second exploration with implications for learning is FairWork (Whiting et al.). This group at Stanford built a way for requesters who want to pay Amazon Mechanical Turk workers $15/hour to algorithmically ensure that workers are paid that rate on average. Figuring out how long a task takes on AMT is hard, much like figuring out how long a homework assignment takes, so the Stanford group asked workers to report how long the task took them, threw out outliers, and averaged the remaining reports. They then used Amazon’s bonusing mechanism to automatically bonus work up to $15/hour. The researchers also used integrated tools to time a sample of workers (with permission) to check whether the self-reported averages were accurate, and found that they were; they plan to keep studying how well this holds up over time. For student work, we want to know whether students are spending enough effort to learn, and we want them to get fair credit for that work. So it makes sense to try having students self-report their study time, and to use some form of bonusing for correctness to balance incentivizing effort without penalizing the normal failure that is part of trying and learning.
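To make the report-average-bonus loop concrete, here is a minimal sketch of that calculation. The 1.5×IQR outlier fence and the exact bonus formula are my own assumptions for illustration; the FairWork write-up only says that outlier times are discarded before averaging and that bonuses top pay up to the target wage.

```python
# Sketch of a FairWork-style auto-bonus calculation.
# Assumed details (not from the paper): the 1.5*IQR outlier fence
# and the exact top-up bonus formula.
from statistics import mean, quantiles

TARGET_HOURLY = 15.00  # target wage in $/hour


def trimmed_mean_minutes(reported_minutes):
    """Average self-reported task times after dropping IQR outliers."""
    q1, _, q3 = quantiles(reported_minutes, n=4)
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    kept = [m for m in reported_minutes if lo <= m <= hi]
    return mean(kept)


def auto_bonus(reported_minutes, base_pay):
    """Per-task bonus that tops average pay up to TARGET_HOURLY."""
    est_hours = trimmed_mean_minutes(reported_minutes) / 60
    owed = TARGET_HOURLY * est_hours
    return round(max(0.0, owed - base_pay), 2)


# Example: most workers report ~12 minutes; one 90-minute report is
# treated as an outlier and dropped, so a $1.00 task gets a $2.00 bonus
# to reach $15/hour for the estimated 12 minutes of work.
reports = [11, 12, 12, 13, 12, 90]
bonus = auto_bonus(reports, base_pay=1.00)
```

The same shape transfers to the classroom idea at the end of the post: replace reported task minutes with self-reported study time, and replace the dollar bonus with bonus credit for correctness.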