HCOMP 2019 Part 1 – Motivation isn’t all about credit.

HCOMP 2019 Humans and machines as teams – takeaways for learning

    The HCOMP (Human Computation) 2019 conference was about humans and machines working as teams and, in particular, about effectively combining ‘crowd workers’ (like those on Mechanical Turk and Figure Eight) with machine learning to solve problems. I came to the conference to ‘map the field’: to learn what people are researching and exploring in this area and to find relevant tools for building effective educational technology (ed-tech). I suspected this conference would be useful because ed-tech often combines the efforts of large numbers of educators and learners with machine learning recommendations and assistance. I wasn’t disappointed. The next few posts describe some of the things I took away from the conference. 

    Pay/Credit vs. Quality/Learning. Finding the sweet spot. Ed-tech innovators and crowd-work researchers face a similar optimization problem: finding the sweet spot between fairness and accuracy. For crowd workers, the tension comes from the need to pay fairly for time worked without inadvertently incentivizing lower-quality work. The sweet spot is fair pay for repeatably high-quality work. We have an almost identical optimization problem with student learning, if you consider student “pay” to be credit for work and student “quality” to be learning outcomes. The good news is that, although the two are often in tension with each other, those sweet spots can be found. Two groups in particular found interesting results in this area.

    1. Quality without rejection: One group investigating the repeatability of crowd work (Qarout et al.) found a difference in quality of about 10% between work produced on Figure Eight and work produced on Amazon Turk (AT). Amazon Turk allows requesters to reject work they deem low quality, while Figure Eight doesn’t, and the AT workers completed tasks at about 10% higher quality. However, the AT workers also reported higher stress. Students also report high levels of stress over graded work and fear of making mistakes, and both can hurt learning. Yet we have found that, on average, students put in less effort when work is graded for completion rather than correctness. Qarout et al. tried a simple equalizer. At the beginning of the job, on both platforms, they stated explicitly that no work would be rejected but that quality work would earn a bonus. This adjustment brought both platforms up to the original AT quality, and these modified AT tasks were picked up faster than the original ones because the work was more appealing once rejection was off the table. It makes me think we should be spending a lot of research time on how to incentivize students to expend productive effort without relying too heavily on credit for correctness. If we can find an optimal incentive, we have a chance to increase learning and decrease stress at the same time. Now that is a sweet spot. 

    2. Paying fairly using the wisdom of the crowd: A second project with implications for learning is FairWork (Whiting et al.). This group at Stanford created a way for requesters who want to pay Amazon Turk workers $15/hour to make sure, algorithmically, that workers are paid an average of $15/hour. Figuring out how long a task takes on AT is hard, much like figuring out how long a homework assignment takes. The Stanford group therefore asked workers to report how long their task took, threw out outliers, and averaged the remaining times. They then used Amazon’s bonus mechanism to automatically bonus work up to $15/hour. The researchers used integrated tools to time a sample of workers (with permission) to check whether the self-reported averages were accurate, and found that they were. They plan to keep researching how well this works over time. For student work, we want to know whether students are spending enough effort to learn, and we want them to get fair credit for their work. So it makes sense to try having students self-report their study time and to use some form of bonus for correctness. That would balance the incentive for effort without penalizing the normal failure that is part of trying and learning.  
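    The FairWork arithmetic (average the self-reported times after dropping outliers, then bonus up to the target rate) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the FairWork code. The function names are hypothetical. The outlier rule (drop reports outside 1.5× the interquartile range) is also an assumption, because the description above only says that outliers were thrown out.

```python
from statistics import mean, quantiles
from typing import List

TARGET_HOURLY_RATE = 15.00  # dollars per hour, the target described above


def estimate_task_minutes(reported_minutes: List[float]) -> float:
    """Average the self-reported durations after dropping outliers.

    Assumption: an outlier is any report outside 1.5x the interquartile
    range (a common rule of thumb, not necessarily FairWork's rule).
    """
    if len(reported_minutes) < 4:  # too few reports to judge outliers
        return mean(reported_minutes)
    q1, _, q3 = quantiles(reported_minutes, n=4)
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    kept = [m for m in reported_minutes if low <= m <= high]
    return mean(kept)


def bonus_per_task(base_pay: float, reported_minutes: List[float]) -> float:
    """Bonus (in dollars) that tops base pay up to the target hourly rate."""
    minutes = estimate_task_minutes(reported_minutes)
    owed = TARGET_HOURLY_RATE * minutes / 60  # what the task is worth at $15/hour
    return round(max(0.0, owed - base_pay), 2)  # never a negative bonus
```

    Suppose workers report 6, 8, 7, 9, 8 and 45 minutes and the base payment is $1.00. The 45-minute report is dropped, the trimmed average is 7.6 minutes, and the bonus is $0.90, which brings the effective rate to $15/hour. If the base pay already exceeds the target, the bonus is zero.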

    Accessibility Sprint – Part 3: Giving non-visual feedback for learning from interacting with PhET simulations

    This is the third part of a series of blog posts about a coding sprint focused on creating interactive online learning that is usable by people with disabilities.

    The first post gives an overview of the coding sprint. Each of these subsequent posts describes the work of one team.

    Sims Team Goal

    Make the University of Colorado Boulder’s well-respected, freely available, open-source PhET simulations more accessible for students who cannot see the simulation. With just the right amount of aural feedback about what happens in the simulation after a learner takes an action, blind and low-vision students could interact with the simulation, hear the results, and try additional actions to understand the underlying physics principles.

    For example, PhET has been working on making their Balloons and Static Electricity simulation accessible. It includes scene descriptions that screen readers read aloud to orient learners who can’t simply look around to see what is controllable. The controls are all accessible via the keyboard. But when a learner takes an action, for instance removing a charged wall that is keeping the balloon steady, the resulting balloon movement must be described. It would overwhelm the listener if small changes were described repetitively, and it can be confusing if messages are read out of logical order. For instance, a message about the balloon’s movement might end up being read after a message saying that the balloon has reached an object and stopped.

    Balloon simulation on laptop screen with sweater, balloon, and wall showing. Scene description listed to the right.
    This image shows a balloon simulation of static electricity moving between a shirt and balloon. Beside the visual is information encoded in the DOM that is read when particular actions are taken. This is what will be read using assistive technology to help operate this sim and understand the results when actions are taken.

    This group decided to extend the messaging reported by the balloon sim so that it could report very dynamic events, such as the learner moving the balloon or the balloon moving on its own (attracted to the sweater), without overwhelming or overlapping messages. To do this, they designed a UtteranceQueue, which is a FIFO (first in, first out) message queue with certain rules. Each entry is an object containing four things:
    • an utterance
    • the object the utterance is associated with
    • an expected utterance time (how long to wait before the next utterance)
    • a callback that returns a boolean, which allows the utterance to be cancelled, rather than spoken, when it reaches the front of the queue

    This should let a simulation programmer design the set of messages a particular object reports. For example, the balloon would report being moved, its state of charge, and whether it is stuck to something. The callback would allow, for example, the balloon movement messages to cancel themselves if the balloon is in fact now stuck to the sweater or wall.
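    To make the queue rules concrete, here is a minimal Python sketch of a FIFO queue whose entries carry those four fields. It is not PhET’s implementation: PhET’s code is JavaScript, and its actual API may differ. The class, field, and function names are hypothetical. A simulated clock stands in for real speech timing so the example runs instantly. In this sketch the callback returns True to speak and False to cancel, which is an assumed convention.

```python
from collections import deque
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Utterance:
    text: str                         # what the screen reader should say
    owner: Any                        # object the message is about, e.g. the balloon
    duration_s: float                 # expected speaking time, i.e. delay before the next utterance
    should_speak: Callable[[], bool]  # checked at the front of the queue; False cancels


class UtteranceQueue:
    """First in, first out: utterances are considered in the order they were added."""

    def __init__(self, speak: Callable[[str], None]):
        self._queue = deque()
        self._speak = speak
        self.elapsed_s = 0.0  # simulated clock instead of real waiting

    def add(self, utterance: Utterance) -> None:
        self._queue.append(utterance)

    def step(self) -> None:
        """Take the utterance at the front; speak it unless its callback cancels it."""
        utterance = self._queue.popleft()
        if utterance.should_speak():
            self._speak(utterance.text)
            # Advance the simulated clock by the expected speaking time.
            self.elapsed_s += utterance.duration_s

    def run(self) -> None:
        while self._queue:
            self.step()


def demo() -> List[str]:
    spoken: List[str] = []
    balloon = {"stuck": False}
    still_moving = lambda: not balloon["stuck"]
    queue = UtteranceQueue(spoken.append)

    queue.add(Utterance("Balloon moving toward sweater.", balloon, 2.0, still_moving))
    queue.step()  # balloon is still moving, so this is spoken
    queue.add(Utterance("Balloon still moving.", balloon, 1.5, still_moving))
    balloon["stuck"] = True  # balloon reaches the sweater before the queue gets to it
    queue.add(Utterance("Balloon sticks to sweater.", balloon, 2.0, lambda: True))
    queue.run()
    return spoken


if __name__ == "__main__":
    print(demo())
```

    In the demo, the second movement message is still waiting in the queue when the balloon sticks to the sweater. Its callback therefore returns False and the message is silently dropped. Only the first movement message and the arrival message are spoken.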

    Laptop with balloon sim and code inspector
    This image shows the same balloon sim with the browser’s code inspector open, displaying the code that controls the simulation.

    Testing (of the earlier version)

    While this development was going on, one of the team members, Kelly, tested the feedback announcer in the existing version of the balloon sim (the one from before the code sprint) and gathered user feedback for the group. The person she tested with had used the sim before, but not with the new scene narration. He found the amount of narration to be just about right. He did, however, want a way to repeat some narration.


    At the end of the day, this group demonstrated the operation of the new UtteranceQueue when the wall is removed and the balloon starts drifting toward the sweater. The movement was described (and not overly repetitive) and when the balloon got to the sweater that event was narrated. No other messages followed.

    People who worked in this group: Jesse Greenberg, Darron Guinness, Ross Reedstrom, Kelly Lancaster




    Accessibility sprint – part 2: Creating a mobile-friendly and accessible Infobox for maps

    This is the second part of a series of blog posts about a coding sprint that took place the day before CSUN 17. The sprint was about creating interactive online learning that is usable by people with disabilities. This whole software area is called accessibility and is also known as inclusive design.

    The first post gives an overview of the coding sprint. Each of these subsequent posts describes the work of one team.

    Creating a mobile-friendly and accessible Infobox for maps

    Team Goal

    Create a widget for helping people who are blind or have low vision explore maps that display statistical information (think popular vote winners in the US). This type of map is called a choropleth.

    The existing infobox widget takes statistical data in a simple format and works with hot spots on an SVG map to bring up an info box as a user mouses over or tabs to different regions of the map. The current version, however, isn’t accessible for people with low vision, doesn’t work well with screen readers, and doesn’t work on mobile. The team worked on improving these aspects of the widget, which can be reused for any statistical map.

    Map of the 2016 US presidential election popular vote results by state, with blue, red, pink and light blue colors for each state. An info box is open for New Mexico with the winner and actual vote totals.
    United States 2016 presidential race: Popular vote by state. 

    Demo at the end of the day

    Doug Schepers demonstrated the improvements to the map tool for low-vision and screen reader access. For low vision, the team made four changes:
    • the state selection outline was thickened
    • the info box contrast was increased
    • the info box was made resizable
    • its placement was adjusted so that it does not cover the selected state

    Keyboard users can now tab from state to state. The tab order is currently alphabetical, and a better navigation scheme is still needed. He also showed that, with a screen reader running, selecting a state causes the screen reader to read that state’s info box aloud. The widget uses ARIA live regions to announce updates. The statistical data is formatted as simple name/value pairs.
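    The keyboard and screen reader behavior described above can be sketched as follows. The sketch assumes a hypothetical data shape in which each region maps to a list of name/value pairs. The region names and values are placeholders, not real election data, and the choice to move focus off the map after the last region is an assumption, not something the team described. In the browser, the announcement string would be written into an element marked with the `aria-live` attribute so that the screen reader reads it when it changes.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical data shape: region name -> list of (name, value) pairs.
Pairs = List[Tuple[str, str]]


def tab_order(data: Dict[str, Pairs]) -> List[str]:
    """Order in which Tab visits regions: alphabetical, as in the demo."""
    return sorted(data)


def next_region(data: Dict[str, Pairs], current: Optional[str]) -> Optional[str]:
    """Region that receives focus after `current`; None means focus leaves the map."""
    order = tab_order(data)
    if current is None:
        return order[0] if order else None
    i = order.index(current)
    return order[i + 1] if i + 1 < len(order) else None


def infobox_text(region: str, pairs: Pairs) -> str:
    """Text for the ARIA live region, e.g. 'Region A. Winner: Candidate Y.'"""
    parts = [region] + [f"{name}: {value}" for name, value in pairs]
    return ". ".join(parts) + "."


# Placeholder data for illustration only.
EXAMPLE = {
    "Region C": [("Winner", "Candidate X")],
    "Region A": [("Winner", "Candidate Y"), ("Votes", "120")],
    "Region B": [],
}
```

    Keeping the data as plain name/value pairs means the same announcement function works for any statistical map, which fits the team’s goal of a reusable widget.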

    The ultimate goal is to define a simple standard for describing statistical map data and provide an open-source, reusable, accessible widget for interacting with these maps.

    Doug Schepers and Derek Riemer worked together.

    The code is available here: https://github.com/benetech/Accessible-Interactives-Dev/tree/master/MapInteractives

    CSUN 17 Accessibility Coding Sprint for People with Disabilities (Making learning accessible) – Part 1

    Last week, my OpenStax colleagues Phil Schatz and Ross Reedstrom and I attended the 2nd annual pre-CSUN (but third overall) accessibility coding sprint, aimed at making learning materials usable by people with disabilities.

    Prior accessibility coding sprints

    The first took place in 2013. It was jointly sponsored by my Shuttleworth Foundation fellowship and Benetech and held at the offices of SRI. You can read more about that one in these earlier posts (2013-accessibility-post-1, post-2, post-3, post-4, and post-5). The second took place last year before the CSUN 2016 Assistive Technology Conference in sunny San Diego and was again sponsored by funds from my Shuttleworth Foundation fellowship and by Benetech. That one focused specifically on tools for creating accessible math. Read more in Benetech’s blog post, under “Sprinting towards accessible math”. Murray Sargent also wrote a follow-up post on accessible trees. Jamie Teh wrote a post about creating an open-source proof-of-concept extension of the math speech rules used by the NVDA screen reader to make them sound more natural.

    This year’s sprint

    Four tables with 10 participants conferring and working at laptops
    Participants at work

    This one again took place in not-quite-as-sunny San Diego (California has been getting lots of rain) before this year’s CSUN-17 conference. The focus was on making interactive learning content accessible. And the very cool thing, from my perspective, is that my fellowship had nothing to do with organizing this one: Benetech and Macmillan Learning sponsored and organized it. Attendance was the largest ever, with about 30 in-person participants and 5 or so attending remotely. Several of the developers both create accessible software and use assistive technology themselves.

    As in previous sprints, we initially spent time getting to know each other and brainstorming. Then we divided into teams ranging from a single person to five people. Each team worked to prototype, explore, or make progress on a particular accessibility feature. In upcoming posts, I will highlight each team’s goals and what they demonstrated at the end of the day.

    Upcoming posts (links will be added as subsequent posts appear)

    • Creating responsive (mobile-friendly) and accessible (screen-reader friendly) Infobox for maps
    • Giving non-visual feedback for learning from interacting with PhET simulations
    • Using alternatives to drag and drop for matching, ordering, and categorization tasks
    • Personalizing website interfaces for better accessibility (both sensory and cognitive)
    • Standardizing the display of math in publications
    • A Nemeth and UEB Braille symbols table
    • Using MathJax to produce Braille output from LaTeX math


    Geoffrey Cohen’s talk at Rice on Inclusive Teaching

    Inclusive Teaching

    I went to an “Inclusive Teaching” workshop by Geoffrey Cohen, who works with Carol Dweck at Stanford. The workshop was sponsored by Rice’s Center for Teaching Excellence and was well attended by Rice faculty and staff. If you don’t know Carol Dweck, she pioneered a branch of research on the effects of mindset on performance in a wide variety of settings, concentrating on academic achievement. In this model, mindsets fall into two camps. A ‘fixed mindset’ is a belief that a particular trait, like intelligence, is fixed at birth and basically cannot be changed. A ‘growth mindset’ is a belief that a particular trait is malleable and improves with practice and effort. Many different experiments show that, regardless of initial measured ability, a growth mindset is associated with higher academic performance over time. This appears to be due to increased tenacity in the face of challenge, because failure is not seen as a measure of ability. Furthermore, particular interventions can shift a person’s mindset, and shifting that mindset results in increased performance. When these interventions work, the results are significant and can last a long time, on the order of years.

    Given the potential, I have been interested in how we might incorporate growth-mindset inducing features into OpenStax products, and whenever someone with good ideas and research is around I try to learn what I can from them. These are my notes from this talk.

    Social belonging / Stereotype threat / White men can’t jump

    The talk concentrated on social belonging. You may have seen some of the research on what is called ‘stereotype threat’. It seems counterintuitive, but it appears that if you think people believe your group isn’t good at something, and your performance could confirm that negative stereotype, your performance suffers. That is a very causal way of explaining it, and, of course, these are really just correlations. Now I will just tell you about some of the weird and wooly experiments that have been done. Each one divides subjects as evenly as possible into two groups. One group gets the ‘treatment’ (in this case a negative treatment) and the other doesn’t, and then average scores are compared.
    Things that decrease performance:
    • If you ask people to list their gender before taking a math test, female scores drop significantly.
    • If you ask people to list their race before taking an academic test, black student scores drop significantly. (There is such a thing as ‘stereotype lift’ also. White scores increase a little if asked to list their race, but the increase is much less than the decrease for groups where a negative stereotype is part of the culture).
    • If you tell people a test is a measure of intelligence, certain minorities and females do worse than when the same test is characterized differently (skills …)
    • If a black researcher asks white men to jump as high as they can, they don’t jump as high as when a white researcher asks them.

    Digression: Unconscious bias in hiring

    Cohen went into a significant digression about experiments that show unconscious bias in hiring. I think this was mainly to give examples of how interventions can fix biases that are unconscious and hard to simply ‘goodwill’ away.

    Research demonstrating bias

    Specifically, a set of experiments shows that when people compare two candidates, they adjust their criteria to favor the candidate who fits their stereotypes. For example, two candidates for a police promotion are presented, one male and one female. One is given more ‘on-the-job’ experience and the other more ‘book-learning’ experience. If you show the candidates first and then ask which is more important, ‘on-the-job’ or ‘book-learning’, the hiring manager picks whichever criterion the male candidate has.

    Techniques that can decrease bias

    But just like with mindset, there are interventions that can eliminate or improve these biases.
    • If you ask people to come up with the criteria for the best candidate before they see the candidates, they stick with those criteria and, in the case of the police promotion will (on average) pick a female candidate matching the stated criteria, over a male candidate that does not.
    • When people hire several people into a position at once, for instance three managers or three developers, they are more likely to select a diverse group than if they hire three people in successive rounds.

    Social Belonging interventions that increase student performance

    Then he returned to a set of ‘interventions’ that have been shown to have positive effects for several groups: females in male-dominated fields, minorities in white-dominated achievement areas, first-generation college students, and others. These particular interventions did not show positive or negative effects for other groups. However, studies that measure attitudes first do show benefits for all students who arrive with particular mindsets and attitudes.
    • Having students read about ‘real’ students who felt they were not smart enough, or did not belong, but then found that they did. Alternatively, having students attend a panel of students discussing these feelings, which works especially well if the audience identifies with the panelists (gender, race, economics, etc.).  
    • Having students choose three values that are important to them from a long list and then write a paragraph about each. Cohen called this ‘value affirmation’. The list covers a wide variety of values, including non-academic ones like ‘sense of humor’ and ‘relationship with family’. (This intervention cut the rate of F’s in a course roughly in half, from 20% to 9%.)
    • For K-12 students, having a teacher write ‘I am giving you these comments because I have high standards and I know that you can meet them.’ to accompany corrections and comments on an assignment. Teachers pre-wrote these and research assistants attached them to student work. Teachers and researchers were blind to who got these and who didn’t.
    • For K-12 students, having a teacher initiate an exercise where students write the end of this sentence: ‘I wish that my teacher knew that …’

    This summary from Carol Dweck’s website, Academic Tenacity: Mindsets and Skills that Promote Long-Term Learning, covers much of the research Cohen described.