Category Archives: Open Education Resources (OER)

[Image: crowd "speaking" with labels "Level Up!", "5 stars", "A+"]

Leveling up crowd-sourced educational content — with a little help from machine learning and lessons from HCOMP19.

More lessons from HCOMP 2019

Crowd-sourcing content: Although publishers (I work for one) create high-quality content that is written, curated, and reviewed by subject matter experts (SMEs, pronounced ‘smeez’ in the industry), there are all sorts of reasons we always need more content. In the area of assessments, learners need many, many practice items to continually test and refine their understanding and skills. Faculty and practice tools also need a supply of new items to ‘test’ what students know and can do at a given point in time. Students need really good examples that are clearly explained (Khan Academy is a great example of a massive collection of clearly explained examples). When students attempt homework problems, they also need feedback about what NOT to do and why those approaches aren’t correct. For all of this, we need to know the core concepts each activity or practice item requires.

Faculty and students are already creating content! Because faculty and students are already producing lots of content as part of their normal workflow, and faculty are assembling and relating content and learning activities, it would be great to figure out how to leverage the best of that content and relationship labeling for learning. This paper by Bates et al. looked at student-generated questions and solutions in the PeerWise platform and found that, with proper training, students generate high-quality questions and explanations.

So at HCOMP 2019 I was listening for ways to crowdsource examples and ground truth. In particular, it would be useful to see if machine learning algorithms could find faculty and student work that is highly effective at helping students improve their understanding. The two papers below address both finding content exemplars and training people to get better at producing effective content. 

Some highly rated examples are better than others. Doroudi et al. wanted to see what effect showing highly rated peer-generated examples to crowd workers would have on the quality of the work they submitted. In this study, the workers were writing informative comparison reviews of products to help a consumer decide which product better fits their needs. The researchers started with a small curated ground truth of high-quality reviews. Workers who viewed highly rated reviews before writing their own ended up producing better (more highly rated) reviews. That isn’t surprising, but interestingly, even among equally highly rated reviews, some were much more effective at helping improve work! The researchers used machine learning to determine the most effective training examples. So, while viewing any highly rated examples will improve new contributors’ content, we can improve the training process even more by selecting and showing the examples with the best track record of improving workers’ creations.
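To make the selection step concrete, here is a minimal sketch in Python with made-up data: rank candidate training examples by the average rating of the work produced by workers who viewed them. (The paper used machine learning for this; a simple average stands in here, and the example IDs and ratings are hypothetical.)

```python
from statistics import mean

def rank_training_examples(outcomes):
    """Rank candidate training examples by the average rating of the
    work produced by workers who viewed them.

    outcomes: dict mapping example_id -> list of ratings (e.g. 1-5)
    of work submitted after viewing that example.
    Returns example_ids sorted from most to least effective.
    """
    return sorted(outcomes, key=lambda ex: mean(outcomes[ex]), reverse=True)

# Hypothetical data: two equally highly rated examples, but workers who
# saw "ex_a" went on to write better reviews than those who saw "ex_b".
outcomes = {
    "ex_a": [4.5, 4.8, 4.2],
    "ex_b": [3.1, 3.4, 3.0],
}
print(rank_training_examples(outcomes))  # "ex_a" first
```

In a real system the ratings would come from the same review process already used to score workers' submissions, so this ranking is nearly free to compute.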

Using training and leveling up: Madge et al. introduced a crowd-worker training ‘game’ with a concept of game levels. They showed the method was more effective than standard practice at producing highly effective crowd workers. Furthermore, they showed how machine learning algorithms could determine which tasks belonged in each game level by observing experimentally how difficult the tasks were.

Crowd workers are often used to generate descriptive labels for large datasets. Examples include tagging content in images (‘dog’), identifying topics in Twitter feeds (‘restaurant review’), and labeling the difficulty of a particular homework problem (‘easy’). In this particular study, workers were identifying noun phrases in sentences. The typical method of finding good crowd workers is to start out by giving a new worker tasks that have a known “right answer” and then picking the workers who are best at those tasks to do the new tasks you actually want completed. The available tasks are then distributed to workers randomly, meaning a worker might get an easy or difficult task at any time. These researchers showed that you could instead train new workers using a ‘game’, so that they improve over time and can take on more and more difficult tasks (harder and harder levels of the game), and the overall quality of labeling for the group of workers improves.
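A rough sketch of the leveling-up idea, with hypothetical task and score data: serve each worker the hardest task they are ready for instead of a random one, and promote them to the next "game level" once their recent accuracy clears a threshold. The threshold and scoring scheme here are illustrative, not the paper's actual method.

```python
def next_task(tasks, worker_level):
    """Serve the hardest available task the worker is ready for,
    instead of picking randomly. tasks: list of (task_id, difficulty)."""
    eligible = [t for t in tasks if t[1] <= worker_level]
    return max(eligible, key=lambda t: t[1]) if eligible else None

def maybe_level_up(worker_level, recent_scores, threshold=0.8):
    """Promote the worker to the next level once their recent accuracy
    on labeling tasks (1 = correct, 0 = incorrect) clears the threshold."""
    if recent_scores and sum(recent_scores) / len(recent_scores) >= threshold:
        return worker_level + 1
    return worker_level

tasks = [("t1", 1), ("t2", 2), ("t3", 3)]
print(next_task(tasks, worker_level=2))    # ('t2', 2)
print(maybe_level_up(2, [1, 1, 1, 0, 1]))  # 3 (accuracy 0.8)
```

The interesting part of the paper is that the difficulty numbers themselves were learned from observed worker performance rather than assigned by hand.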

Better education content: Faculty and students could become more effective producers of educational content with the help of these two techniques. Motivating, training, and selecting contributors via comparison with highly rated examples, and leveling them up to ‘harder’ or more complex content, would help them create high-quality learning content (example solutions, labeling topics and difficulty, giving feedback). These techniques also sound really promising for training students to generate explanations for their peers, and potentially for training them to give more effective peer feedback.

Linking to Objectives in the OERPUB editor (a prototype between MIT OEIT folks and OERPUB)

[Image: decorative, colorful concept map. Learning objectives, concept maps. By Sborcherding at en.wikibooks, public domain, via Wikimedia Commons]

The exploration: When creating textbooks and interactive learning activities, wouldn’t it be cool if authors (and eventually others) could easily link material to learning objectives? This is the second exploration that OERPUB, Lumen Learning, and MIT’s Office of Educational Innovation and Technology (OEIT) took on together in Salt Lake City. Linking materials (textbook, activities, videos, quizzes) to learning objectives makes them easier to find, and could also allow navigation by objective rather than by a single linear path through the material.

The Scenario: An author is writing a textbook or course in the OERPUB editor. Perhaps it is a physics course, and the course has a set of objectives that it teaches (or hopes to). The author is writing a section on lattices and the ways that x-rays scatter through crystalline structures. Since the physics department at MIT has defined this as a learning objective, it would be great if the author could easily specify that a reading teaches this objective.

The Components: MIT’s OEIT has a service for storing and looking up learning objectives, called MC3. MC3 has an API for returning learning objectives. Before we got together, Cole Shaw took the OERPUB editor and embedded it in a page that connects with the MC3 server. The screenshots below show his prototype. He added a new “widget” to the editor for adding an activity (by copying and modifying an existing widget) and wired it up to include an objectives drop-down. The choices in the drop-down come from MIT’s objectives server.
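The MC3 API isn't documented here, so the response shape below is hypothetical, but the wiring is roughly this: fetch a list of objectives from the service and turn it into (value, label) pairs for the editor's drop-down.

```python
import json

# Hypothetical shape of an objectives-service response; the real MC3
# API's endpoints and field names may differ.
SAMPLE_RESPONSE = json.dumps({
    "objectives": [
        {"id": "mc3.obj.1", "displayName": "X-ray scattering in crystal lattices"},
        {"id": "mc3.obj.2", "displayName": "Bragg's law"},
    ]
})

def dropdown_choices(response_text):
    """Turn an objectives-service JSON response into (value, label)
    pairs suitable for populating the editor's drop-down widget."""
    data = json.loads(response_text)
    return [(o["id"], o["displayName"]) for o in data["objectives"]]

print(dropdown_choices(SAMPLE_RESPONSE))
```

In the prototype the lookup happens live as the author edits, so the drop-down always reflects whatever objective set the toolbar points at.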

[Screenshot: the editor with a drop-down added to choose which server to get objectives from and which set of objectives to use. Cole added a top toolbar for choosing where objectives should be looked up.]

[Screenshot: the drop-down in an activity added to the document. The choices are looked up live; once one is chosen, it is added to the activity.]

And then when we all got together, Cole and Tom Woodward took Cole’s work and made it a widget that works in the github-bookeditor. That is shown below. Tom also showed Cole some of the ways to configure educational widgets within the editor. (That also tells us where we need to improve documentation for developers.)

[Screenshot: the same widget, but in the github-bookeditor. The server to query is hard-coded. This will live on a branch to show how such a thing can be done.]

Really making this kind of thing widely useful for general users of the editor requires more thought, time, and effort. MIT is hosting their own course objectives, and their software provides the store and lookup service, but these aren’t general purpose. The user interface would need to provide ways of configuring which objectives are relevant, etc.

If we did come up with a way to do something like this, I would love to see a way to make choosing an objective a standard option on all content sections and educational widgets. In other words, an author could attach an objective to essentially anything within the HTML, and the editor would provide an easy UI for doing that and a simple encoding as metadata to store in the document. I think that would probably be schema.org’s educationalAlignment.
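For illustration, here is roughly what such an encoding could look like using schema.org microdata. AlignmentObject, alignmentType, targetName, and targetUrl are real schema.org vocabulary; the section content and the objective URL are made up for this sketch.

```html
<!-- A sketch, not a finished encoding: the editor would generate markup
     like this when the author picks an objective from the drop-down. -->
<section itemscope itemtype="https://schema.org/CreativeWork">
  <div itemprop="educationalAlignment" itemscope
       itemtype="https://schema.org/AlignmentObject">
    <meta itemprop="alignmentType" content="teaches">
    <meta itemprop="targetName"
          content="Explain how x-rays scatter through crystalline structures">
    <link itemprop="targetUrl"
          href="https://example.mit.edu/objectives/xray-scattering">
  </div>
  <h2>Lattices and X-ray Scattering</h2>
  <p>…</p>
</section>
```

Because the metadata lives inline in the HTML, it survives remixing and travels with the content wherever the section is copied.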


Textbook writing sprint with K12 teachers in South Africa

Although I have much more to share about this sprint and what we learned, I wanted to let people know about an exciting first outing of OERPUB’s textbook editor.

[Screenshot of the editor (books stored in GitHub): table of contents and a book section from the sprint]

In August, Siyavula, OERPUB, and St. Johns College, a K-12 college preparatory school, collaborated on a textbook sprint to develop custom Physics and Chemistry textbooks for grades 11 and 12. Six teachers, three in physics and three in chemistry, participated. We started with source books from Siyavula and OpenStax College. The teachers also brought their own source materials. We used the brand-new (pre-alpha) version of the textbook editor, based on the github-book editor started by Phil Schatz of Connexions. We started with all the source books preloaded, and with a skeleton book loaded with curriculum guidelines.

Teachers used the editor to write from scratch, to copy modules (chapters and sections) from the source books, and to copy smaller parts like images or worked examples. We had the developer team present to fix bugs as they were encountered and to design features as needs arose. A fair number of issues were found (slow load times and problems with collaborative editing of the table of contents), which we are addressing now. Despite that, the group made significant progress on chapters in the books and, more importantly, was convinced that we have finally hit upon the right solution for authoring and remixing textbooks. The team is now putting bug fixes in place, and the authors will return to work on the books soon. They plan to use the Physics textbook in January. Siyavula will create PDFs for the books using a variation on their standard PDF generation.

Next Steps from the Accessibility Sprint

A lot of why we got folks together for a sprint on accessibility in creating and using web-based scholarly and educational materials was to make sure that the different participants got to know each other, got a good feel for the kinds of expertise and tools each organization (see list below) specializes in, and could put faces to names. I think we accomplished those goals, and we also made some concrete plans for next steps. We spent the third half-day looking at next steps for some realizable opportunities arising from the sprint, although some teams kept coding (see note below).

In case you missed my earlier posts on this sprint, here are some quick links to those, along with links to other posts about the sprint (including one from Adrian Garcia, UI intern with OERPUB):

A Service Using MathJax and ChromeVox to generate MathML, SVG, and text descriptions of math.

Benetech is eager to move forward with support for more accessible mathematics in a tangible way, because this fits into an existing project. So a group of us spent the last morning of the sprint determining which of our ideas and prototypes around accessible mathematics could be implemented relatively quickly and efficiently. The group working on server-side MathJax for taking MathML and producing images and descriptive text for voicing math had created a prototype quickly. Making it really work could be done fairly straightforwardly by building on the work of people at the sprint. It would need the following:

  • The prototype server-side code that builds on Phil Schatz’ code.
  • MathJax running server-side via PhantomJS.
  • ChromeVox’s mathematics description generation made into a separate service called by the code and running via PhantomJS.

Why building this server-side tool would be immediately useful

  • It could make EPUB books with mathematics accessible for learners using screen readers. EPUB3 calls for supporting MathML directly, but support for that is not available in most readers. Currently, publishers must produce images instead, which aren’t helpful for visually-impaired scholars and learners. With this server-side component, publishers can use MathML as the source, and deliver images with descriptions for reading the math aloud.
  • Pre-converting mathematics allows publishers more control over the generated mathematics and could make the reading experience faster for learners. Connexions, for instance, would like to ensure that their EPUBs and PDFs have mathematics that looks the same across devices. They would like to be able to generate both using MathJax.
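As a sketch of what the publisher-side substitution looks like, here is a small Python stand-in: it walks an XHTML fragment, replaces each MathML element with an img tag carrying the spoken description as its alt text, and leaves everything else intact. The render function here is a placeholder for the real server-side MathJax/ChromeVox service, and the sample markup is made up.

```python
import xml.etree.ElementTree as ET

def replace_math(html_fragment, render):
    """Replace each <math> element in an XHTML fragment with an <img>
    whose alt text is a spoken description of the math.

    render: callable mapping a <math> element to (image_url, description);
    a stand-in here for the real MathJax/ChromeVox conversion service.
    """
    root = ET.fromstring(html_fragment)
    # Collect every <math> element along with its parent and position,
    # so we can swap them in place without mutating during iteration.
    targets = [(parent, i, child)
               for parent in root.iter()
               for i, child in enumerate(parent)
               if child.tag == "math"]
    for parent, i, child in targets:
        src, alt = render(child)
        img = ET.Element("img", {"src": src, "alt": alt})
        img.tail = child.tail  # keep trailing text (e.g. a period)
        parent.remove(child)
        parent.insert(i, img)
    return ET.tostring(root, encoding="unicode")

def fake_render(math_element):
    # Placeholder for the real conversion: MathJax would produce the
    # SVG/image, ChromeVox the description.
    return ("eq1.svg", "x squared")

out = replace_math(
    "<p>Area grows as <math><msup><mi>x</mi><mn>2</mn></msup></math>.</p>",
    fake_render,
)
print(out)
```

The real service would run MathJax and ChromeVox's description generator server-side (via PhantomJS, per the component list above) to produce the image and alt text for each equation.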

Benetech, MathJax, and ChromeVox are working together to move this project forward. If you would like to help or keep up with the progress, please email Anh Bui, anhb at, to be added to the mailing list.

Aside about sprint lengths

A few of the teams building prototypes really wanted to continue their work and kept coding. I am sure they would have used at least one more full day of coding. My friend Adam Hyde always recommends a week-long sprint. He organizes book sprints where a group writes a book in a week. Last summer, my team participated in a coding sprint with Adam’s Booktype project and about five other organizations. That sprint lasted a week. It was fabulous. We picked the editor that we based ours on, determined what approach we would take for mathematics editing, and explored options for real-time collaboration. You can read about it in earlier blog posts on that sprint.

Participating organizations at the accessibility sprint

This sprint was supported by generous funding and in-kind support from