More lessons from HCOMP 2019
Crowd-sourcing content: Although publishers (I work for one) create high-quality content that is written, curated, and reviewed by subject matter experts (SMEs, pronounced ‘smeez’ in the industry), there are all sorts of reasons we always need more content. In the area of assessments, learners need many, many practice items to continually test and refine their understanding and skills. Faculty and practice tools also need a supply of new items to ‘test’ what students know and can do at a certain point in time. Students need really good examples that are clearly explained. (Khan Academy is a great example of a massive collection of clearly explained examples.) When students attempt homework problems, they also need feedback about what NOT to do and why those approaches aren’t correct. Therefore, we also need to know the core concepts each activity or practice depends on.
Faculty and students are already creating content! Because faculty and students are already producing lots of content themselves as part of their normal workflow, and faculty are assembling and relating content and learning activities, it would be great to figure out how to leverage the best of that content and relationship labeling for learning. A paper by Bates et al. looked at student-generated questions and solutions on the PeerWise platform (https://journals.aps.org/prper/pdf/10.1103/PhysRevSTPER.10.020105) and found that, with proper training, students generate high-quality questions and explanations.
So at HCOMP 2019 I was listening for ways to crowdsource examples and ground truth. In particular, it would be useful to see if machine learning algorithms could find faculty and student work that is highly effective at helping students improve their understanding. The two papers below address both finding content exemplars and training people to get better at producing effective content.
Some highly-rated examples are better than others. Doroudi et al. wanted to see what effect showing highly rated peer-generated examples to crowd workers would have on the quality of the work those workers then submitted. In this study, the workers were writing informative comparison reviews of products to help a consumer decide which product better fits their needs. The researchers started with a small curated ground truth of high-quality reviews. Workers who viewed highly rated reviews before writing their own ended up producing better (more highly rated) reviews. That isn’t surprising, but interestingly, even among equally highly rated reviews, some were much more effective than others at helping later workers improve! The researchers used machine learning to identify the most effective training examples. So, while viewing any highly rated examples will improve new contributors’ content, we can improve the training process even more by selecting and showing the examples with the best track record of improving workers’ content creations.
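To make that selection idea concrete, here is a minimal sketch of choosing training examples by their track record. This is my own simplification, not the paper’s actual model (all class and parameter names here are hypothetical): an epsilon-greedy routine that mostly shows the example whose viewers went on to earn the highest ratings, while occasionally trying other examples to keep gathering evidence.

```python
import random
from collections import defaultdict

class ExampleSelector:
    """Pick the training example with the best track record of preceding
    highly rated work (exploit), while occasionally trying others (explore).
    A simplified, hypothetical stand-in for the paper's ML-based selection."""

    def __init__(self, example_ids, epsilon=0.1):
        self.example_ids = list(example_ids)
        self.epsilon = epsilon            # exploration rate
        self.ratings = defaultdict(list)  # example_id -> ratings of work it preceded

    def choose(self):
        # Try any example we have no data for yet.
        untried = [e for e in self.example_ids if not self.ratings[e]]
        if untried:
            return random.choice(untried)
        # Occasionally explore a random example.
        if random.random() < self.epsilon:
            return random.choice(self.example_ids)
        # Otherwise exploit: highest mean rating of the work it preceded.
        return max(self.example_ids,
                   key=lambda e: sum(self.ratings[e]) / len(self.ratings[e]))

    def record(self, example_id, rating_of_new_work):
        """After a worker who saw `example_id` submits work, log its rating."""
        self.ratings[example_id].append(rating_of_new_work)
```

The paper used machine learning over richer signals; this bandit-style loop just illustrates the core idea of picking the example that historically improves subsequent work the most.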
Using training and leveling up: Madge et al. introduced a crowd-worker training ‘game’ with a concept of game levels. They showed the method was more effective than standard practice at producing highly effective crowd workers. Furthermore, they showed how machine learning algorithms could determine which tasks belonged in each game level by observing experimentally how difficult each task was.
Crowd workers are often used to generate descriptive labels for large datasets. Examples include tagging content in images (‘dog’), identifying topics in Twitter feeds (‘restaurant review’), and labeling the difficulty of a particular homework problem (‘easy’). In this particular study, workers were identifying noun phrases in sentences. The typical method of finding good crowd workers is to start by giving a new worker tasks with a known “right answer” and then picking the workers who do best on those to complete the new tasks you actually want done. The available tasks are then distributed to workers randomly, meaning a worker might get an easy or a difficult task at any time. These researchers showed that you can instead train new workers with a ‘game’, so that they improve over time and take on more and more difficult tasks (harder and harder levels of the game), and that the overall quality of labeling across the group of workers is better.
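As a rough illustration of the leveling idea (a simplification I am assuming, not Madge et al.’s actual algorithm), you could estimate each task’s difficulty from observed error rates on gold-answer tasks, bucket tasks into game levels, and promote a worker to the next level after a streak of accurate work:

```python
def assign_levels(error_rates, n_levels=3):
    """Bucket tasks into levels 1..n_levels, easiest (lowest error rate) first.
    `error_rates` maps task id -> observed fraction of wrong answers."""
    ordered = sorted(error_rates, key=error_rates.get)
    per_level = -(-len(ordered) // n_levels)  # ceiling division
    return {task: idx // per_level + 1 for idx, task in enumerate(ordered)}

class WorkerProgress:
    """Track a worker's game level; promote after sustained accuracy.
    Thresholds here are hypothetical, chosen only for illustration."""

    def __init__(self, promote_at=0.8, window=5):
        self.level = 1
        self.promote_at = promote_at  # accuracy needed to level up
        self.window = window          # number of recent tasks considered
        self.results = []             # True/False outcomes at the current level

    def record(self, correct):
        self.results.append(correct)
        recent = self.results[-self.window:]
        if len(recent) == self.window and sum(recent) / self.window >= self.promote_at:
            self.level += 1
            self.results = []  # start fresh at the new level
```

A task server would then hand each worker only tasks whose level matches the worker’s current level, rather than distributing tasks randomly.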
Better education content: Faculty and students could become more effective producers of education content with the help of these two techniques. Motivating, training, and selecting contributors by comparing their work with highly rated examples, and leveling them up to ‘harder’ or more complex content, would help contributors create high-quality learning content (example solutions, topic and difficulty labels, feedback). These techniques also sound really promising for training students to generate explanations for their peers, and potentially for training them to give more effective peer feedback.