Dynamics and governance of crowd work markets: lessons from practice, open questions, and next steps M. Six Silberman I'm going to talk today about the dynamics and governance of crowd work markets. [Slide 2.] First I'll talk about Mechanical Turk, and workers' issues in Mechanical Turk. Then I'll explain what I mean by "dynamics and governance". Going forward in my work I want to think mainly about open crowd work markets. I'll explain why briefly. Next I'll talk about Turkopticon. Turkopticon is an employer reputation system for Mechanical Turk built by Lilly Irani and me in 2008. I'll describe its design and status. I'll also talk about what I think I've learned from helping to maintain it for five years. The rest of the talk will be future-oriented. I'll talk about work in economics that I think can help us understand crowd work markets and make better ones in the future. I'll list some open questions about the dynamics of crowd work markets. I'll conclude with possible concrete next steps. I once saw a talk from a famous computing researcher who spoke very fast. "This is how I always talk," she said. "You'll just have to think faster." In my talk -- and on this topic generally -- please take your time thinking. The ethical issues we face are hard. We need to think seriously about them. It's fine to change a design because the situation changes, or because we learn something new. But we should be embarrassed if we have to change a design because we didn't think. In my view the philosophy we aspire to should not be "move fast and break things" but "measure twice -- cut once." [Slide 3.] Mechanical Turk, as many of you probably know, is a web site run by Amazon. It is a market for small information tasks -- for example, "what is in this picture?", "are these two directory entries for a business the same?", "rewrite this sentence in your own words", and "transcribe this audio clip". Task prices range from 1 cent to a few dollars. 
Experienced workers can earn more, but most earn a few US dollars per hour. Workers can be paid in US dollars, Indian rupees, or Amazon gift card points. Amazon says there are between 250,000 and 500,000 workers on Mechanical Turk. This number is not too instructive, because workers can be more or less active. Some do one task once and never again; some do one task a month; some work on it 40 hours a week. Panos Ipeirotis estimates there are between 1000 and 10,000 workers on it at any time. In a survey run by Joel Ross from 2008 to 2009, a little more than half the workers were in the US, most of the rest were in India, and the remainder came from a wide range of countries. These numbers are out of date, as there have been significant changes in the market in the last three years. But I cite them to make the point that in open crowd work markets, workers with widely different costs and conditions of living compete directly for the same work. In the same survey, about 20% of respondents said they always or sometimes needed their income from Turking to meet basic needs. This number was closer to 30% for Indian Turkers and closer to 10% for US Turkers. The survey probably underestimates that number: Lilly and I were co-authors on the paper, and workers later told us, in the context of criticizing us and the paper, that the survey paid too little to attract "professional" crowd workers. These workers spend tens of hours a week Turking. They rely on their Turking income to meet basic needs. They have sophisticated strategies and tools for maximizing their earnings. These include sharing information about newly posted tasks in real time and using user scripts they have written themselves to avoid unnecessary interface actions. Respondents to the survey reported diverse motivations for working on Mechanical Turk. But most said their main motivation was money, not fun. In my work I tend to focus on the concerns of these respondents. 
Some people Turk in their spare time, as an alternative to spending time on social network sites or playing games. Some people even Turk at work. The motivations of these people affect the dynamics of the market and need to be considered. But I think the most important ethical issues arise when we consider people who relate to crowd work as a livelihood. I think most people have qualitatively different relationships to the conditions of their livelihood than to the conditions of an entertainment. And I think as crowd work grows, there will be more people making a livelihood of it -- or trying to. [Slide 4.] Many of workers' concerns with AMT focus on one feature: the ability of employers to "reject" work. Let us be exactly clear about what this means. As an employer, "I reject this work" means "I don't pay for this work". I think the idea behind this feature is that if work is unusable the employer should not have to pay. The option to reject unusable work should discourage workers from shirking -- from producing unusable work in an attempt to earn money for very little effort. To some extent, it does. But it also has side effects. Importantly, use and payment are separate decisions. Suppose I am an employer, and I write a script to check if work is usable or not. If work is unusable I don't use it and don't pay for it. The straightforward thing to do with work that is usable is to use it and pay for it. But I could also use it and pay for it with, say, probability one-half. In expectation I then pay only half the posted price for work I fully use. This strategy cuts costs and there is no built-in incentive for me not to do it. As an employer I don't have to give a reason for not paying. Workers can complain via email. But I don't have to respond to these emails, or even read them. My collaborator Lilly Irani interviewed one employer about this. They said they don't read the emails, but if they get a lot of emails they know they probably have a bug in their script or their task. 
They fix the bug but do not review work that has already been rejected. Amazon charges employers for *posting* tasks. So Amazon's cut is the same whether or not employers pay workers. They have no immediate financial incentive to ensure that employers pay workers for usable work. The design of the rejection mechanism also interacts with workers' ability to get work. AMT keeps track of the percentage of work a worker has been paid for. This is called the worker's approval rate. Until recently, a newly created task would by default screen out workers with approval rates below 95%. The approval rate has been replaced by the opaque "Masters Qualification" as the main universal worker reputation mechanism. But if you create a task and do not screen out non-Masters workers, the 95% approval rate threshold is the next default. So although it has been superseded, the approval rate has not been entirely abandoned as a proxy for worker quality. There is no reputation system for employers. If you are a worker choosing a task you might want to know how frequently the employer who posted the task rejects work. You might call this the employer's rejection rate. You might even want to see a list of all tasks from employers whose rejection rate is not above a certain threshold. But this is not currently possible. 1.2. Workers' issues in Mechanical Turk. So, what do workers complain about? [Slide 5.] This list is from a paper Lilly Irani, Joel Ross, Bill Tomlinson, and I prepared for the 2010 Human Computation Workshop. All of these except the time limits are enabled directly or indirectly by the rejection mechanism -- not just the fact that employers can reject work, but that they can do so without explanation, consequence, or a public record. This list is three years old. Significant changes have occurred in Mechanical Turk, but not on these topics. So unfortunately the list has held up well. [Slide 6.] 1.3. Framing the issues. These are ethical issues. 
But the distinction "ethical/unethical" is too subjective and contentious to guide design. They are also legal issues. But the technology has changed faster than the law, so existing law cannot guide design either. [Slide 7.] I think Herbert Simon's interpretation of "design" is useful here: "Everyone designs who devises courses of action aimed at changing existing situations into preferred ones." So, what is the existing situation and what are possible preferred situations? Preferred by whom? [Slide 8.] I use the word "dynamics" to point out that a market is a complex system. Variables of interest -- for example, the number of workers in the market and their skills, the number of tasks posted to the market and their prices -- interact with one another, and with the design of the system, in often unexpected ways. For example, letting employers reject work may reduce worker shirking and raise average work quality. However, it may also encourage employers to reject usable work, lowering effective wages. This may in turn cause some workers who produce high-quality work to leave the market -- lowering average work quality. (The economist George Akerlof called this phenomenon "the market for lemons" in a 1970 paper about the market for used cars. Panos Ipeirotis argues that this does in fact happen in Mechanical Turk.) In general, by "dynamics" I mean the orderly way in which outcome variables -- such as task prices, worker shirking, employer rejection, number of workers in the market, distribution of the gains from trade, workers' perceptions of crowd work as a viable livelihood, and work quality -- all interact with each other, and with the design of the system, over time. By "governance" I mean the rules of the market. I think it is more actionable to talk about "governance" than "ethics". In framing an issue in terms of ethics it is tempting to evaluate the situation against an external standard, by *my* ideas about what is ethical. 
But I am not a professional crowd worker. My ideas about what is ethical may be different from theirs. I can have my opinions, but what market participants think is ultimately more important. What outcomes do participants prefer? Are all participants able to express their preferences? If a participant objects to a situation but cannot withdraw for reasons external to the market -- for example, they need the money -- do they have the power to change the situation? In general, what are the rules of the system, and how are they made, changed, and enforced? Who gets to make them and change them? Another way to describe "dynamics and governance" is that it is about the relationship between design variables and outcome variables. [Slide 9.] By design variables I mean, basically, system features. Here are some. Some of these exist in current crowd work markets; others don't. [Slide 10.] By outcome variables I mean variables that describe the outcomes of interactions in the market. [Slide 11.] By "dynamics" I mean how design variables affect outcome variables -- [Slide 12.] -- and how outcome variables affect each other. [Slide 13.] By "governance" I mean how outcome variables affect design. What processes are in place that allow system operators to change the design of the market in response to observed outcomes based on participants' preferences? [Slide 14.] 1.4. I want to focus on open crowd work markets. By "open" I simply mean that participants are not screened before being allowed to participate. Mechanical Turk, which was my prototypical example for some time, is ironically no longer open, which points to the difficulties involved in managing an open market. But I think open crowd work markets, by virtue of their broad reach, hold the greatest potential for broad socioeconomic benefit -- if designed well. [Slide 15.] 2. Turkopticon. 2.1. Motivated by the workers' issues I listed, in 2008 Lilly Irani and I built Turkopticon. 2.2. 
Turkopticon is a third-party employer reputation system for Mechanical Turk. 2.2.1. By "third-party" I mean it is not associated with Amazon at all. We have no special access to their data or anything like that. 2.3. The original goals of Turkopticon were (a) to call attention to workers' problems in Mechanical Turk and (b) to pressure Amazon to build an employer reputation system into it. 2.4. The system has two parts: a browser add-on and a web database application. 2.4.1. The web application lets workers review employers. 2.4.2. The browser add-on adds these reviews to the Mechanical Turk interface. 2.5. Here are some pictures. [Slide 16.] 2.5.1. This is what Mechanical Turk looks like normally. [Slide 17.] 2.5.2. This is what it looks like when you have Turkopticon. [Slide 18.] 2.5.3. If you mouse over one of the arrows, you see this. 2.5.4. We have four scores for each employer: 2.5.4.1. "Communicativity": how well do they respond to worker communications? 2.5.4.2. "Generosity": how well do their tasks pay? 2.5.4.3. "Fairness": do they reject fairly, or do they reject without good reason? 2.5.4.4. "Promptness": how fast do they pay? Employers in Mechanical Turk have up to 30 days to pay. But workers prefer faster pay. [Slide 19.] 2.5.5. If you click on the link for the number of reviews, you can see the individual reviews. [Slide 20.] 2.5.6. And you can leave your own review. [Slide 21.] 2.6. Here are some numbers. 2.6.1. We have about 20,000 users. 2.6.2. We have almost 100,000 reviews, covering almost 22,000 employers. 2.6.3. These have been posted by about 8,000 workers. 2.6.4. Almost 16% of the reviews have been posted in the last three months. 2.6.5. We have about 12,000 daily visits to the different parts of the service. 2.6.6. And we have reviews for most of the employers on Mechanical Turk. [Slide 22.] 2.7. But did we achieve what we set out to do? 2.7.1. Not really. 2.7.2. We did call attention to workers' issues in Mechanical Turk. 2.7.3. 
But Amazon did not build an employer reputation system. 2.7.4. In fact, I was at an event where somebody asked an Amazon executive why there is no employer reputation system in Mechanical Turk. She said "the community handles the problems" with employer misbehavior. I want to talk about this response for a minute. For those of us in computing, who like things that are "self-organizing" or "user-generated", this is not surprising. But last week I was in Berlin at a meeting of the engineering and IT division of IG Metall. IG Metall is the largest German trade union. This meeting was hosted at Siemens, the huge German manufacturing company. They make really big things, like trains and wind turbines and factories for making cars. IG Metall is interested in crowd work now because management in the companies their members work for is interested in using it as a way to be more flexible and cut costs for information work. The union's job is to defend their members' quality of working conditions and the viability of their livelihoods. They are worried crowd work is about to make that job harder than it already is. I was trying to explain the model behind this answer -- "the community handles the problems" -- to the people at this meeting. So I said: Imagine if Siemens had a factory. And sometimes the machines in the factory would break down and a contractor had to be hired to fix them. But the company did not pay the contractor. Instead, the workers kept a pot of money, collected from their own paychecks, to pay the contractor. And somebody asked management, why doesn't the company pay for this out of their profits? And management said, well, the workers handle the problem. Everybody laughed. In Germany that would never happen, because the trade unions have the inclination and the power to stop management from offloading costs and responsibilities onto workers and keeping the benefits for themselves. 
But in Mechanical Turk management can offload costs and responsibilities onto workers because workers have very little power. In a way, Turkopticon has contributed to the persistence of this situation. Turkopticon, combined with other tools and forums made by workers, has helped make the situation less bad. Turkopticon is a free service to workers but it is also a free input for Amazon. We make the situation a little less bad, so it goes on longer like this without Amazon having to take responsibility. 2.7.5. In addition to the distributional issue of workers repairing the factory at their own expense, a built-in reputation system would be much better for practical reasons. A built-in system could show objective data. It could let workers search or screen by employer statistics, like how often employers reject work. With Turkopticon everything is manual, data is only collected from a fraction of the worker population, there is no verification that reviewers have actually worked on the tasks they are reviewing, and the attributes are fairly subjective. 2.7.6. There are other practical problems -- organizational and technical problems -- with Turkopticon. I am perfectly happy to talk about how bad Turkopticon is in the discussion, but the details are not that interesting. Mostly they have to do with the facts that (a) our day jobs are to do research, not maintain or improve Turkopticon, and (b) we are not professional programmers, tech support people, user experience researchers, participatory design facilitators, or online community moderators. 2.7.7. There is also one thing we could not fix even if we could work on Turkopticon full time. This is that people do not trust each other on Mechanical Turk. This bleeds over into Turkopticon. Workers have different experiences with the same employer or even the same task, and often argue. 
Because of the climate of distrust -- and the lack of a verification system -- it is easy for workers to imagine all kinds of dark possibilities about the reviewer they disagree with. This person must be the employer, reviewing herself. Or, the employer must have paid this person to leave a good review. Or, in the case of a bad review they disagree with: this person must be trying to keep all the good tasks to himself. [Slide 23.] 3. What do I think I have learned? 3.1. Lessons about "information infrastructure". 3.1.1. "User research" is essential, but complex methods are not. What is important is to listen, to respond to people's concerns, and to keep listening. I think Amazon gets away with not listening to workers' complaints about Mechanical Turk for four reasons. First, it is essentially unique. Second, workers who need the money need the money, and the issues are not so intolerable that workers quit in large numbers. Third, workers who don't need the money relate to the platform as an entertainment and aren't bothered that much. Fourth, the potential worker pool is very large. But if there were a competitor to Mechanical Turk that really addressed workers' issues, I think it could do well. 3.1.2. Good engineering is helpful, but not essential at first. 3.1.3. Maintenance is just as critical as the initial design. 3.1.4. Maintenance is both technical and social. Both parts are approximately equally important and approximately equally time consuming. 3.2. Lessons about markets, as institution-infrastructure systems. 3.2.1. Markets are not the universal entities neoclassical theory would have us believe they are. They are systems designed, built, and maintained by people. Like all systems, they can be "designed" by accident. 3.2.2. Most market participants are not the so-called rational -- that is, selfish -- short-term utility maximizers of neoclassical theory, who will do anything to maximize their own gain even at the cost of others' well-being. 
Rather, most participants want good outcomes for everyone. Most employers want to get work done at a reasonable price, and intend to pay the workers that produce usable work. Most workers take pride in producing work employers can use. They are happy to accept prices they consider "reasonable" given the circumstances. To put it briefly, most participants have what we might call good intentions. But not all of them. Some participants -- on both sides -- are out to scam everyone else -- to get free money, or free work. 3.2.3. The small fraction of selfish participants affects the market. You have to account for them in the design of the market or they will mess it up for everyone else. 3.2.4. No system can solve all problems. Human administrators are needed. This is an important point for programmers, because we love to automate things as much as possible. As the existence of the entire crowd work industry indicates, there is a limit to our abilities in this area. 3.2.5. To maintain trust, there should be a record of administrative judgments and explanations about why they were made. We have had bugs that have made people ask things like "Has Turkopticon sold out?" These were not even things that we did on purpose; they were accidents. And people got worried. We also did some things, early on, on purpose without talking to workers about it first, or explaining our motivations. We won't make that mistake again. 3.2.6. In a complex system, sometimes you cannot see the consequences of your actions. So even well-intentioned people can harm others by accident. Without appropriate channels for information and communication, this can happen over and over again. 3.2.7. As a special case of this, people need social cues if they are to treat each other like people. I was at an event in 2009 where someone asked an Amazon executive why Mechanical Turk does not have, for example, profiles for workers. "So requesters can't discriminate" was the answer. 
But they discriminate anyway -- by geolocating IP addresses or asking questions about cricket. I think the main result of the absence of social cues in Mechanical Turk is that it makes it easier for employers to forget that their workers are people, some of whom earn their livelihoods through crowd work. It makes it easier for employers and administrators to treat workers like the computers the system organizes them to imitate. 3.3. Lessons about Amazon. 3.3.1. Amazon is not all-powerful. In 2008 I really thought that after we made Turkopticon -- we made the first version in a weekend -- they would be so ashamed that two grad students could just throw this thing together that they would make a good one themselves. It would be a thousand times better than ours and have real data and workers would be able to screen and search by employers' rejection rates and pay speeds and all of this, all of which Amazon has, or could easily track... It didn't happen. None of it happened. Nothing has changed in terms of workers' ability to judge employers inside Mechanical Turk itself -- in five years. So, obviously they do not have infinite technical resources to work on this sort of thing, and it is not high on their to-do list. 3.3.2. Amazon may also be constrained by its existing business strategy and its shareholders' expectations. Christiane Benner of IG Metall has coined, or at least told me about, an interesting term: "Amazonization". This reminded me of an article I read in the Financial Times about a very large Amazon warehouse in the UK. I will read a paragraph from this article:

> The "pickers" [in the warehouse] push trolleys around and pick out
> customers' orders from the aisles. Amazon's software calculates the
> most efficient walking route to collect all the items to fill a
> trolley, and then simply directs the worker from one shelf space to
> the next via instructions on the screen of the handheld satnav
> device. 
Even with these efficient routes, there's a lot of walking.

> One of the new Rugeley "pickers" lost [three kilos] in his first
> three shifts. "You're sort of like a robot, but in human form," said
> the Amazon manager. "It's human automation, if you like."

For me, the term "Amazonization" is useful to denote the increased flexibility and financial cost savings associated with the strategy of treating workers like computers or robots. These benefits are achieved at the human cost of degraded working conditions. 3.3.3. But it may not be helpful, in the very practical effort to maintain tolerable working conditions and resist the erosion of viable livelihoods, to paint Amazon or any of their managers as bad, or even merely excessively rationally self-interested, individuals. In my work I draw heavily on systems theory. Among other things, systems theory teaches us about the limits of individual agency. It teaches us to try to understand the role of an individual's -- or organization's -- environment and history in shaping their, or its, preferences, tendencies, and apparent choices. To understand how "Amazonization" has become a successful business strategy, and why Mechanical Turk is not merely tolerated but celebrated in the computing community, we need to step back. We need to look into the intellectual histories of a wide range of disciplines, fields, and institutions, including computer science, artificial intelligence, neoclassical economics, business school curricula, and US corporate law. I won't actually do this here, although I have already alluded to the role of the simplistic models of human motivation developed in neoclassical economics. These models have come into computer science through artificial intelligence. But my practical point here is that trying to "pressure" Amazon may simply not work -- may not contribute to materially improving working conditions. 
When I was a naive and optimistic young graduate student way back in 2008, Amazon seemed to me to have all the money and power in the world. But I see now that they have constraints: limited developer hours and dollars, business strategies to which they have already committed, and high and unforgiving shareholder expectations. I do not expect Amazon to make significant changes to Mechanical Turk to address workers' issues. I say this not because I think they don't know or don't care, but because I think they have less room to maneuver than I once assumed. So to really address these issues, we may need to build a new market. [Slide 24.] 4. Lessons from contemporary economics. 4.1. Building a new market that successfully addresses workers' issues while staying commercially viable will take more sophisticated theory -- and more sophisticated practice -- than we currently have in mainstream crowd work research. Specifically, it will require more empirically grounded ideas about human motivation and markets. Many theoretical resources are available on these topics. But I think the most natural and applicable for computer scientists, programmers, and technology industry managers are those consistent with game theory. Economic research has come a long way since the rule of the selfish short-term utility maximizers of noncooperative game theory. I want to talk about three threads in contemporary economics that I think can help us in the task of building future crowd work markets. These are market design, experimental economics, and institutional analysis and development. I am not an economist, or a specialist in any of these fields. My aim here is to point to their existence and raise your interest. 4.2. This is the turning point in the talk. Everything I've talked about so far has already happened. The rest, at least from the perspective of crowd work research, is about the future. 4.3. Market design is a field concerned, as you might expect, with the design of markets. 
This field studies "market failures" -- instances in which real-world markets did not work as well as neoclassical theory would have suggested -- and how to fix them. 4.3.1. One of the leading theorists and practitioners of market design is Alvin Roth, who recently received the Nobel Prize in Economics for this work. In 2008 he published a paper called "What have we learned from market design?" The paper begins with a quotation from an older paper. [Slide 25.] "...the real test of our success [as game theorists] will be not merely how well we understand the general principles that govern economic interactions, but how well we can bring this knowledge to bear on practical questions of microeconomic engineering..." 17 years later, he writes this: [Slide 26.] "Since [1991], economists have gained significant experience in practical market design. One thing we learn from this experience is that transactions and institutions matter at a level of detail that economists have not often had to deal with, and, in this respect, all markets are different. But there are also general lessons." He summarizes these lessons as follows: [Slide 27.] "To work well, marketplaces have to provide thickness, i.e., they need to attract a large enough proportion of the potential participants in the market; they have to overcome the congestion that thickness can bring, by making it possible to consider enough alternative transactions to arrive at good ones; and they need to make it safe and sufficiently simple to participate [straightforwardly] in the market, as opposed to transacting outside of the market or having to engage in costly and risky strategic behavior." Builders of web services are no strangers to the need for thickness, although we usually call it the need for a "critical mass". In the context of virtual markets, I read the need to overcome congestion as the need for good interaction design. 
We have a long way to go on this front in open crowd work markets, but it is not a surprising idea. The ideas that a market should be "safe" and simple, however, and that strategic behavior is undesirable -- a sign of a problem with the design of the market -- are, I think, new, perhaps even surprising, for us. But the claim that strategic behavior among participants -- that is, deception among participants -- is a bad sign resonates with my experience. I have seen how distrust in Mechanical Turk and Turkopticon takes its toll. "Distrust" may sound like a fluffy word. But it can be defined in game theoretic terms. What are participants' estimates of each others' utility functions and strategies? Do participants imagine each other to be selfish utility maximizers who will defect after promising to cooperate? If I think you are lying when you say you will cooperate, I am less likely to cooperate. If the market is full of selfish, non-cooperating utility maximizers who refuse to communicate their true intentions because they do not think it safe to do so, a costly arms race ensues. Actors spend time and resources trying to out-strategize each other. On this topic, consider how much research has been done in the human computation community on "quality control" for employers. If the market had been designed differently -- I'll say it, designed *better* -- in the first place, maybe we wouldn't have needed it. If there were no strategic behavior in Mechanical Turk, we wouldn't have needed to build Turkopticon. [Slide 28.] 4.4. Experimental economics is an approach in contemporary economics in which economic concepts are investigated in laboratory and field experiments. Much of this work investigates social dilemmas, and is relevant to us. [Slide 29.] 4.4.1. One entry point into this literature is the 1999 paper "A theory of fairness, competition, and cooperation" by the experimental economists Ernst Fehr and Klaus Schmidt. [Slide 30.] 
They draw on the literature in social psychology and sociology, in which, they write, it is well established that "relative [and not only absolute] material payoffs affect people's well-being and behavior". More recent empirical and experimental economic research also supports this view. [Slide 31.] Fehr and Schmidt model fairness as "self-centered inequity aversion." "Inequity aversion means that people resist inequitable outcomes; that is, they are willing to give up some material payoff to move in the direction of more equitable outcomes. Inequity aversion is self-centered if people do not care per se about inequity that exists among other people but are only interested in the fairness of their own material payoff relative to the payoff of others." This model may be simpler than necessary. For example, it seems unlikely that *all* inequity aversion is self-centered. But with this interpretation Fehr and Schmidt are able to explain a broad range of experimental results that are inexplicable under what they call the standard economic model, which assumes no inequity aversion. More sophisticated models do exist and are likely to be of interest in future crowd work research. But I view even a purely self-centered model of inequity aversion as an advance over the current assumption, widespread if mostly implicit in human computation, of no inequity aversion at all. [Slide 32.] 4.4.2. Fehr and Schmidt consider economic games with n players, indexed by i, with monetary payoffs denoted by x_i. The utility function of selfish players is simply their monetary payoffs. More money for me equals more happiness for me. Other people's payoffs -- or happiness -- don't matter. If my payoff increases at the cost of someone else's I am happier. The utility function of inequity averse players is different. It is their payoff minus two terms. The first is the amount of happiness they lose from disadvantageous inequality -- that is, from other people having bigger payoffs than them. 
The second is the amount of happiness they lose from advantageous inequality -- from having more than other people. Advantageous inequality is assumed to be weighted less heavily than disadvantageous inequality. It could even be zero. [Slide 33.] If you want to actually calculate A and B, here they are. Alpha and beta reflect preferences about the different types of inequality. Beta is assumed to be less than or equal to alpha, at least zero, and less than one. 4.4.3. Fehr and Schmidt interpret fairness rather simply: as *equality of outcomes between participants*. This is fine for experimental games. But how does it apply to real markets, if at all? We could say that a fair interaction in a market game is one in which the surplus created by the interaction -- the "gains from trade" -- is allocated equally among the participants. Alternatively, we could say that a fair interaction is one in which the gains from trade are distributed according to the contributions of the participants. The contributions are not necessarily equal. This inequality may be due to participants' goals or strategies. Or it may be due to unequal endowments of capital or capabilities that predate their entering the market. These definitions have different ethical and political commitments and implications. If we use them as tools for thinking, together they can help us think deeply about ethics in crowd work markets. [Slide 34.] 4.5. Institutional analysis and development is not really a field in economics. It is a distinct, multi-disciplinary, multi-methodological approach to studying institutions, including but not limited to markets. It was developed over fifty years by Elinor Ostrom, based at the Workshop in Political Theory and Policy Analysis at Indiana University, and many collaborators. Elinor and Vincent Ostrom, co-founders and co-directors of the Workshop, were political theorists by training. But they published widely and prolifically across the social sciences.
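To make the model from section 4.4.2 concrete, here is a minimal sketch of the Fehr-Schmidt utility function in Python. The function name and the example payoffs are mine, not from the slides; the parameter names follow the paper (alpha weights disadvantageous inequality, beta weights advantageous inequality, with 0 <= beta <= alpha and beta < 1).

```python
def fehr_schmidt_utility(payoffs, i, alpha, beta):
    """Utility of player i given all players' monetary payoffs.

    Assumes 0 <= beta <= alpha and beta < 1, as in Fehr and Schmidt.
    """
    n = len(payoffs)
    x_i = payoffs[i]
    others = [x for j, x in enumerate(payoffs) if j != i]
    # A: average disadvantageous inequality (others earning more than i)
    disadvantage = sum(max(x_j - x_i, 0) for x_j in others) / (n - 1)
    # B: average advantageous inequality (i earning more than others)
    advantage = sum(max(x_i - x_j, 0) for x_j in others) / (n - 1)
    return x_i - alpha * disadvantage - beta * advantage

# A purely selfish player (alpha = beta = 0) cares only about x_i:
# fehr_schmidt_utility([10, 4], 0, 0.0, 0.0) is just 10.
```

With payoffs (10, 4) and positive alpha and beta, the disadvantaged player's utility drops below 4 and the advantaged player's below 10, capturing in a self-centered way the resistance to inequitable outcomes described above.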
In 2009 Elinor Ostrom was awarded the Nobel Prize in economics, despite being unknown to many economists at the time. (This situation has of course been rectified.) [Slide 35.] I point here to two of her books and one paper. The books are the now-classic _Governing the Commons_, published in 1990, and _Understanding Institutional Diversity_, published in 2005. The paper, "Beyond markets and states: polycentric governance in complex economic systems", is based on her Nobel acceptance speech. You may want to start with the paper, as it is both the most recent and the shortest. But I think all three texts -- and the IAD framework generally -- are of great potential benefit for crowd work research and practice. In her work, Ostrom has been especially interested in studying "diverse institutional arrangements for governing common-pool resources". To the extent that a market is built for the benefit of all participants, not just some, it can be considered a kind of "common-pool resource". 4.5.1. Importantly and usefully for crowd work researchers interested in market design and experimental economics, the IAD framework was developed from the early days to be consistent with classical game theory. I would suggest, in fact, that it *extends* game theory. Two empirical findings from IAD research may be of interest to researchers trained in noncooperative game theory. [Slide 36.] First, communication is valuable, even when participants cannot enforce informal agreements. So-called "cheap talk," Ostrom writes, "enables participants to reduce overharvesting [of common-pool resources] and increase joint payoffs contrary to [classical] game theoretical predictions". This means that not all participants immediately go back on their word when they see profit in doing so. This is, again, basically because not all participants are selfish short-term utility maximizers. 
But the value of so-called cheap talk is a finding with clear implications for designers of future crowd work markets. Workers should be able to talk to each other, and to employers. [Slide 37.] Second, "large studies of irrigation systems in Nepal and forests around the world challenge the presumption that governments always do a better job than users in organizing and protecting important resources." That is, to quote the subtitle of a paper she published in 1992, "self-governance is possible". Not all agreements require external enforcement. In fact, the ability of external enforcement to improve outcomes may be limited. [Slide 38.] 4.5.2. This paragraph gives a sense of the scope and spirit of IAD research. "Currently, many scholars are undertaking new theoretical efforts. A core effort is developing a more general theory of individual choice that recognizes the central role of trust in coping with social dilemmas. Over time, a clear set of findings from the microsituational level have emerged regarding structural factors affecting the likelihood of increased cooperation. Due to the complexity of broader field settings, one needs to develop more configural approaches to the study of factors that enhance or detract from the emergence and robustness of self-organized efforts within multilevel, polycentric systems. Further, the application of empirical studies to the policy world leads one to stress the importance of fitting institutional rules to a specific social-ecological setting. [Here by 'ecological' she means 'biophysical'.] 'One size fits all' policies are not effective. The frameworks and empirical work that many scholars have undertaken in recent decades provide a better foundation for policy analysis." 4.5.3. Like experimental economics, IAD developed partly in response to the inadequacy of the older rational-actor model of economic interaction.
IAD work has developed more sophisticated models of the individual and a multi-level framework for modeling institutions and systems of institutions in their biophysical contexts. Despite this expanded context, the flavor of the IAD approach should be familiar to anyone with experience with game theory. [Slide 39.] Ostrom offers these instructions for specifying a game -- or a market or institution -- and predicting its dynamics. (I won't read them all.) [Slide 40.] She also describes types of rules affecting an "action situation". I won't read these either. My point is that there is a huge and rich body of theory, grounded in decades of painstaking empirical work and conceptual synthesis, available to guide our practical work. [Slide 41.] 5. Open questions. I will keep the rest short. There are a lot of open questions about the dynamics and governance of crowd work. [Slide 42.] A few of these relationships have been explored, but not many. I will come back to my three categories of question. How do design variables affect outcomes? How do outcome variables affect each other? And how do governance mechanisms help systems change their designs in response to outcomes? [Slide 43.] What about the role of participants' preferences in shaping governance? [Slide 44.] Where do participants' preferences come from? How are they affected by what they perceive to be possible or "reasonable" given current outcomes? What exogenous factors shape them, and how? I have not enumerated the design and outcome variables again here. But you may recall that each list filled a slide. Well, you can mix and match them as you like to form open questions about the dynamics and governance of crowd work markets. Most of these have not been answered. Many have not even been asked. And we certainly have no coherent theoretical framework with which to make sense of all of this. 6. Next steps. Now what? [Slide 45.] Here is a slide about some ways we could improve Turkopticon.
But really, in the future I would like Turkopticon to become unnecessary. The only way I see that happening is if Mechanical Turk is replaced. It could be replaced by many small, specialized markets; by another general market; or by something even more general, like a crowd work market protocol. [Slide 46.] But none of these will happen overnight. One possible project I've been thinking about lately is making an alternative, stand-alone interface to Mechanical Turk. It could scrape Mechanical Turk and put task data into a database on a separate server. (As far as I know, this is not against Mechanical Turk's terms of service.) This data could be made available through an API. A new user interface could let workers search and screen tasks according to many different criteria. This new interface could also begin to support a transition to worker self-governance, by including functionality that allows stakeholders of all types to propose, discuss, vote on, and collectively prioritize new features and changes to the system. This system and these practices could lay the technical, social, and organizational groundwork for the construction of a future crowd work market managed by stakeholders. [Slide 47.] In the meantime, there are other ways to build theory. Agent-based models and participatory simulations can help us explore quite rigorously the relationships between design and outcome variables -- and even, to a certain extent, investigate different governance strategies. [Slide 48.] Participatory simulation, in which actual humans, not computational agents, play the roles of market actors, would be more appropriate for investigating governance strategies. But even recruiting through Mechanical Turk, the cost of running such simulations could be quite high. [Slide 49.] I would like to hope that five years from now Turkopticon will no longer be needed. Maybe in another five years Amazon will have added an employer reputation system to Mechanical Turk.
Or maybe Mechanical Turk will be outcompeted by many small, specialized markets. Or maybe we will build a new crowd work market under stakeholder governance. Maybe, as in the good German tradition of worker co-determination and the seemingly now-forgotten tradition of participatory design, workers, employers, administrators, and researchers will cooperate to build and maintain a market whose operation benefits all -- if not exactly equally, at least reasonably so under the circumstances. Or maybe the future holds something even stranger and more general, like a crowd work market protocol, allowing many different markets operated by many different administrators to communicate with one another, forming a vast but differentiated network of markets. In any case I hope we can build a community of practice to address these new challenges -- and opportunities -- together. [Slide 50.] There are certainly plenty of tasks to go around.
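As a concrete starting point for the agent-based modeling mentioned above, here is a minimal sketch of a crowd work market simulation in Python. Everything in it is an illustrative assumption of mine (the matching rule, the price and reservation-wage distributions, and all names), not a model of Mechanical Turk or any real market; the point is only that design variables (here, the task price distribution) and outcome variables (tasks completed, mean worker earnings) can be related through simulation.

```python
# Illustrative agent-based sketch; all rules and names are assumptions.
import random

def simulate(num_workers=50, num_tasks=200, rounds=10, seed=0):
    rng = random.Random(seed)
    # Design variable: the distribution of posted task prices (in cents).
    tasks = [rng.randint(1, 100) for _ in range(num_tasks)]
    # Workers differ in reservation wage: the lowest price they accept.
    reservation = [rng.randint(1, 50) for _ in range(num_workers)]
    earnings = [0] * num_workers
    for _ in range(rounds):
        tasks.sort(reverse=True)  # best-paying tasks are claimed first
        for w in range(num_workers):
            # Each worker claims at most one acceptable task per round.
            if tasks and tasks[0] >= reservation[w]:
                earnings[w] += tasks.pop(0)
    # Outcome variables: tasks completed and mean worker earnings.
    completed = num_tasks - len(tasks)
    return completed, sum(earnings) / num_workers

completed, mean_earnings = simulate()
```

Varying the price distribution or the matching rule and rerunning would be one cheap way to start asking the open questions listed in section 5, before committing to costlier participatory simulations.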