Wisdom of crowds

Author: Dr Simon Moss

Overview

Individuals often attempt to predict or forecast the future. They might, for example, attempt to predict which team will win a sporting contest. They might want to predict who will win an election, which movie will win the Academy Award, and so forth. Generally, when the forecasts of many individuals are averaged or aggregated, the predictions are more likely to be accurate (e.g., Hogarth, 1978; Johnson, Budescu, & Wallsten, 2001; Wolfers & Zitzewitz, 2004), which Surowiecki (2004) called the wisdom of crowds.

Surowiecki (2004), for example, alludes to an anecdote, recounted by Francis Galton, about a county fair. Individuals at the fair were asked to guess the weight of an ox. The conjectures themselves tended to be inaccurate, but the average of these estimates was very close to the actual weight. Indeed, the average of these estimates was more accurate than were the conjectures of experts.
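The arithmetic behind this anecdote can be illustrated with a minimal Python sketch. The guesses and the true weight below are hypothetical values chosen for illustration, not the data Galton reported, and the function name crowd_vs_individuals is likewise only illustrative. The sketch compares the error of the averaged guess with the typical error of a single guess.

```python
from statistics import mean


def crowd_vs_individuals(guesses, truth):
    """Return (error of the averaged guess, mean error of the individual guesses)."""
    crowd_error = abs(mean(guesses) - truth)
    individual_errors = [abs(g - truth) for g in guesses]
    return crowd_error, mean(individual_errors)


# Hypothetical guesses of an ox's weight in pounds (not Galton's data).
guesses = [950, 1400, 1100, 1320, 1050, 1280, 1230, 1150]
truth = 1200  # hypothetical true weight

crowd_error, typical_error = crowd_vs_individuals(guesses, truth)
print(crowd_error, typical_error)
```

With these illustrative numbers, the averaged guess misses by 15 pounds, whereas the individual guesses miss by 122.5 pounds on average. Because overestimates and underestimates partly cancel, the error of the average can never exceed the average individual error.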

In most of the examples presented by Surowiecki (2004), numerous estimates were aggregated to optimize forecasts. Some researchers, however, have shown that only a few (Hogarth, 1978; Johnson, Budescu, & Wallsten, 2001), or even two (Herzog & Hertwig, 2009) estimates need to be aggregated to improve forecasts or judgments.

Example: Aggregated first impressions

In general, the first impressions or immediate intuitions that individuals form about other people are inaccurate. The first impressions of recruiters during job interviews, for example, do not tend to correlate strongly with subsequent performance. Yet, the first impressions of many independent recruiters, when aggregated, do correlate strongly with performance, as shown by Eisenkraft (2013).

In this study, 41 individuals, recruited from Amazon Mechanical Turk, watched 34 silent videos, each depicting a different student. Participants were asked to predict the grade point average of these students. In general, for each participant, the correlation between the predicted and actual grade point averages was low. However, when the predictions of all participants were averaged for each student, the combined predictions were strongly related to the actual grade point average, r = .37.
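A small simulation can show why averaged predictions may correlate with an outcome more strongly than the predictions of any single judge. The sketch below uses simulated data rather than the data of Eisenkraft (2013). Only the sample sizes (41 judges, 34 targets) are taken from the description above. The noise level, the random seed, and all names are assumptions made for illustration, so the resulting correlations are not expected to match the reported r = .37.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)        # fixed seed so the sketch is repeatable
n_judges, n_targets = 41, 34  # sample sizes from the study described above
noise_sd = 3.0                # assumed: each judge's noise dwarfs the signal

# Simulated true scores, and each judge's independent, noisy prediction of them.
truth = [rng.gauss(0, 1) for _ in range(n_targets)]
predictions = [[t + rng.gauss(0, noise_sd) for t in truth] for _ in range(n_judges)]

# Correlation for each judge alone, then for the prediction averaged across judges.
individual_rs = [pearson(p, truth) for p in predictions]
averaged = [mean(column) for column in zip(*predictions)]
mean_individual_r = mean(individual_rs)
aggregated_r = pearson(averaged, truth)
print(round(mean_individual_r, 2), round(aggregated_r, 2))
```

Because the noise of each simulated judge is independent, much of it cancels in the average, while the shared signal remains. This is the same property that the independence condition discussed in the next section is intended to protect.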

Conditions that facilitate wisdom in crowds

Surowiecki (2004) also identified the conditions that are essential to ensure that crowds do indeed reach more accurate decisions than individuals. For example, each individual should be able to access private information--that is, knowledge or perspectives that are not shared by everyone else. Even a unique interpretation of some observation can represent private information. In other words, the predictions must be diverse. Similarly, individuals should be able to specialize, invoking knowledge on specific facets.

Other conditions also need to be satisfied. For example, each individual should form an independent estimate. That is, their own estimate or forecast should not be appreciably shaped by the predictions of anyone else. Finally, some mechanism is needed to aggregate these estimates together.

If these conditions are not fulfilled, crowds can generate unsuitable and inaccurate estimates. Indeed, Surowiecki (2004) characterizes various factors or contexts that preclude these conditions and thus undermine decisions in crowds. To illustrate, the need to belong can manifest as a herd mentality or peer pressure, which compromises the independence of estimates. Alternatively, some decisions are too centralized, which compromises the diversity of opinions.

Other factors can also obstruct these conditions. Sometimes, the estimates of a few individuals are too public, which biases the estimates of other individuals, compromising independence. Finally, sometimes various individuals or bodies do not collaborate sufficiently, and hence no mechanism is available to aggregate the estimates.

Conditions that overcome biases in individuals

One of the limitations of this principle that crowds are wise is that individual judgments are often biased systematically. For example, people tend to overestimate the likelihood of desirable events. If individuals are systematically biased, crowds might be systematically biased as well.

Nevertheless, proponents of this wisdom of crowds have proposed a variety of counterarguments (for a review of these arguments, see Simmons, Nelson, Galak, & Frederick, 2011). For example, they maintain that some of these biases nullify each other in diverse crowds. To illustrate, although individuals might overestimate the likelihood of desirable events, they tend to desire different events to one another. This bias, therefore, diminishes when the aggregated predictions of collectives are examined (Camerer, 1998).
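Both the concern and the counterargument can be made concrete with a minimal sketch. All values below are hypothetical: the true chance that a team wins, the size of the wishful bias, and the composition of the two crowds are assumptions chosen only to show when a bias cancels and when it survives averaging.

```python
from statistics import mean

TRUE_CHANCE = 50   # hypothetical chance, in percent, that team A wins
WISHFUL_BIAS = 15  # hypothetical inflation, in percentage points, toward the desired outcome


def forecast_team_a(supports_team_a):
    """Chance a forecaster assigns to team A winning, shifted toward what they desire."""
    if supports_team_a:
        return TRUE_CHANCE + WISHFUL_BIAS
    return TRUE_CHANCE - WISHFUL_BIAS


# Diverse crowd: half support each team, so the biases point in opposite directions.
diverse_crowd = [forecast_team_a(i % 2 == 0) for i in range(100)]
# Homogeneous crowd: everyone supports team A, so the bias is shared.
homogeneous_crowd = [forecast_team_a(True) for _ in range(100)]

diverse_error = abs(mean(diverse_crowd) - TRUE_CHANCE)
homogeneous_error = abs(mean(homogeneous_crowd) - TRUE_CHANCE)
print(diverse_error, homogeneous_error)
```

In this sketch the diverse crowd's average lands on the true chance, whereas the homogeneous crowd's average retains the full 15-point bias. Averaging removes only those errors that differ in direction across individuals.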

Evidence of systematic errors

As Simmons, Nelson, Galak, and Frederick (2011) showed, even when the conditions that enhance the wisdom of crowds are fulfilled, systematic errors may still emerge. This study examined the accuracy of gambling on sporting results--specifically, games in the National Football League (NFL).

The study attempted to fulfill all the conditions that facilitate the wisdom of crowds. First, all the participants were very knowledgeable: only people who maintained they followed the season very closely were included in the sample, curbing random error. Second, they were motivated to be accurate, because they were betting for real money. Third, the participants were independent of one another. They did not discuss the results with each other and, thus, were not as likely to be susceptible to the same misguided beliefs. Finally, the participants represented diverse backgrounds, increasing the likelihood their judgments were affected by independent sources.

Although all these conditions were fulfilled, Simmons, Nelson, Galak, and Frederick (2011) showed a systematic bias in the predictions of these participants. In particular, these researchers examined point spread betting. In this context, people do not merely bet on which team will win. They instead bet on whether the team that is considered stronger, perhaps the Baltimore Ravens, will defeat the team that is considered weaker, perhaps the Washington Redskins, by more than a specific margin, such as 10 points. Bookmakers set this margin with the intention that about 50% of people will predict the stronger team will exceed it.

When people predict these point spreads, however, they tend to choose the stronger team too often. On average, the stronger team exceeds this margin about 50% of the time, yet people bet on this stronger team around 90% of the time. The crowd is not wise but biased towards the stronger team.

Indeed, even when people were informed that this official margin was an overestimate of the likely margin, this bias persisted, almost unabated. That is, information intended to curb this bias was unhelpful. A slightly different method, however--in which participants needed to estimate the point margin--did not produce such a pronounced bias.

Even more surprisingly, this bias increased over the season. Conceivably, when the participants are correct, their intuition that stronger teams tend to exceed the margin is reinforced. They feel their intuitions are accurate. In contrast, when the participants are incorrect, they feel that some isolated event could explain this unexpected result. Their trust in their intuition, therefore, may remain unmitigated.

Wisdom of select crowds

Some researchers have uncovered an approach that overrides the problems that unfold when the average of a crowd is calculated (Soll, Larrick, & Mannes, 2014). This approach weighs the opinions of some individuals more than those of other individuals, depending on the accuracy of their judgments in the past. Indeed, in simulation studies, Soll, Larrick, and Mannes (2014) showed that the average of estimates from the top five judges, as measured by the accuracy of their last judgment, was more accurate than were averages of all judges or the estimate of the most knowledgeable judge. In addition, as these researchers showed, individuals tend to perceive this strategy as more palatable than alternative approaches.
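The selection rule described above (rank judges by the accuracy of their last judgment, then average the estimates of the top five) can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure or code used by Soll, Larrick, and Mannes (2014). The function name select_crowd_estimate, the judges, and all numbers are hypothetical.

```python
from statistics import mean


def select_crowd_estimate(last_errors, current_estimates, k=5):
    """Average the current estimates of the k judges with the smallest last error."""
    ranked = sorted(last_errors, key=lambda judge: last_errors[judge])
    selected = ranked[:k]
    return mean([current_estimates[judge] for judge in selected]), selected


# Hypothetical judges: absolute error on their most recent judgment ...
last_errors = {"A": 2, "B": 30, "C": 5, "D": 1, "E": 12, "F": 8, "G": 40, "H": 3}
# ... and their estimates for the current question.
current_estimates = {"A": 102, "B": 140, "C": 95, "D": 101,
                     "E": 110, "F": 97, "G": 60, "H": 105}

estimate, selected = select_crowd_estimate(last_errors, current_estimates)
whole_crowd = mean(list(current_estimates.values()))
print(selected, estimate, whole_crowd)
```

Here the select crowd consists of judges D, A, H, C, and F, whose estimates average 100, compared with 101.25 for the whole crowd. Averaging several of the better judges, rather than relying on the single best one, retains some of the error cancellation that averaging provides.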

Practical applications

The wisdom of crowds might underpin the benefits and merits of the Delphi method. To apply this method, a panel of independent experts is formed. The experts first answer a series of questions. A facilitator then summarizes these answers and the underlying reasons that were provided, maintaining confidentiality. Next, the experts revise their previous answers after hearing this summary. Over time, the answers converge to an optimal response.

Dialectical bootstrapping: The wisdom of many estimates in individuals

Empirical evidence

Herzog and Hertwig (2009) showed how individuals can, alone, forecast and aggregate multiple estimates to optimize judgments and decisions. In their study, participants were instructed to estimate the year in which various historical events unfolded, such as specific wars (for a similar method, see Soll & Klayman, 2004; Yaniv & Milyavsky, 2007).

Next, some participants were asked to reflect upon some of their assumptions or considerations that could have been incorrect and might have biased their estimate. These participants were then instructed to consider whether this initial estimate was too high or low--and then to present an updated estimate. This exercise mirrors the technique called consider the opposite (e.g., Lord, Lepper, & Preston, 1984; for similar methods, see Hirt & Markman, 1995; Koriat, Lichtenstein, & Fischhoff, 1980).

In contrast, the remaining participants did not undergo this exercise. They were merely instructed to present a second estimate. In other words, all participants expressed two estimates; only some participants, however, were encouraged to challenge the first estimate.

The results were very informative. If participants had been encouraged to challenge their first response, the average of their two estimates was, generally, closer to the correct answer than was any individual estimate. If participants had not been encouraged to challenge their first response, the average of their two estimates was, generally, no closer to the correct answer than was any individual estimate (Herzog & Hertwig, 2009).

These findings indicate that individuals should attempt to form multiple estimates or judgments, each derived from slightly different sources of knowledge or information, and then average these responses. This exercise will tend to improve their judgments. Nevertheless, Herzog and Hertwig (2009) did show this process, called dialectical bootstrapping, was not as effective as averaging two responses from two different individuals.

In an additional study, Herzog and Hertwig (2009) examined whether incentives affected the findings. That is, in the original study, participants who challenged their first estimate were told they would win a prize if either of their estimates was sufficiently accurate. Participants who did not challenge their first estimate were told they would win a prize if one of their estimates, randomly selected, was accurate. However, even when the incentive scheme was equated in the two conditions, the original pattern of results persisted.

Origin of the phrase dialectical bootstrapping

The phrase dialectical bootstrapping is derived from two sources (Herzog & Hertwig, 2009). Specifically, the term dialectical refers to the process of development, characterized by Hegel. This process involves a first estimate, called a thesis, a different or dialectical estimate, called an antithesis, and the aggregation of these estimates, called a synthesis.

Bootstrapping, a term often used in statistics, refers to Baron Munchhausen, who putatively escaped a swamp by pulling himself up by his own bootstraps. In other words, bootstrapping alludes to using one part of the self to help the self.

Rationale for dialectical bootstrapping

According to Herzog and Hertwig (2009), dialectical bootstrapping is effective because, at least sometimes, the two estimates surround the correct answer. That is, one estimate is too low and one estimate is too high. In these instances, the average of these estimates will obviously be closer to the correct response.

Admittedly, on other occasions, the two estimates will not surround the correct answer. Both estimates might be too low or both estimates might be too high. In these instances, the error of the averaged estimate equals the average of the two individual errors, so the aggregate is, in expectation, exactly as accurate as one of the two estimates chosen at random. In other words, in this context, dialectical bootstrapping does not, in general, affect the accuracy of estimates.

In practice, the two estimates will sometimes surround the correct response--in which case dialectical bootstrapping is helpful--and the two estimates will sometimes not surround the correct response--in which case dialectical bootstrapping is neither helpful nor unhelpful. Taken together, however, dialectical bootstrapping is, on average, helpful.
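The two cases can be checked with simple arithmetic. In the sketch below, the year 1800 and the four estimates are hypothetical values chosen for illustration, and the function name compare is likewise illustrative. The function returns the error of the averaged estimate alongside the average of the two individual errors.

```python
def compare(first, second, truth):
    """Return (error of the averaged estimate, average error of the two estimates)."""
    error_of_average = abs((first + second) / 2 - truth)
    average_error = (abs(first - truth) + abs(second - truth)) / 2
    return error_of_average, average_error


truth = 1800  # hypothetical year of a historical event

# Estimates on opposite sides of the truth: averaging cancels part of the error.
bracketing = compare(1780, 1830, truth)
# Estimates on the same side of the truth: averaging neither helps nor hurts.
same_side = compare(1820, 1840, truth)
print(bracketing, same_side)
```

When the estimates surround the truth, the average misses by 5 years, whereas the two estimates miss by 25 years on average. When both estimates are too high, both quantities equal 30 years. Averaging therefore helps in the first case and is neutral in the second, which is why it helps on average.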

Rather than consider the limitations of previous estimates, Herzog and Hertwig (2009) discussed some other approaches that can be applied to generate other estimates. Individuals, for example, could increase the delay between the two estimates--which Vul and Pashler (2008) showed also enhances the final decision when these estimates are averaged.

References

Ariely, D., Au, W. T., Bender, R. H., Budescu, D. V., Dietz, C., Gu, H., et al. (2000). The effects of averaging subjective probability estimates between and within judges. Journal of Experimental Psychology: Applied, 6, 130-147.

Armstrong, J. S. (2001). Combining forecasts. In J.S. Armstrong (Ed.), Principles of forecasting: A handbook for researchers and practitioners (pp. 417-439). Norwell, MA: Kluwer Academic.

Camerer, C. F. (1998). Can asset markets be manipulated? A field experiment with racetrack betting. Journal of Political Economy, 106, 457-482.

Clemen, R. T. (1989). Combining forecasts: A review and annotated bibliography. International Journal of Forecasting, 5, 559-583.

Einhorn, H. J., Hogarth, R. M., & Klempner, E. (1977). Quality of group judgment. Psychological Bulletin, 84, 158-172.

Eisenkraft, N. (2013). Accurate by way of aggregation. Journal of Experimental Social Psychology, 49, 277-279. doi:10.1016/j.jesp.2012.11.005

Gigone, D., & Hastie, R. (1997). Proper analysis of the accuracy of group judgments. Psychological Bulletin, 121, 149-167.

Herzog, S. M., & Hertwig, R. (2009). The wisdom of many in one mind: Improving individual judgments with dialectical bootstrapping. Psychological Science, 20, 231-237.

Hirt, E. R., & Markman, K. D. (1995). Multiple explanation: A consider-an-alternative strategy for debiasing judgments. Journal of Personality and Social Psychology, 69, 1069-1086.

Hogarth, R. M. (1978). A note on aggregating opinions. Organizational Behavior and Human Performance, 21, 40-46.

Johnson, T. R., Budescu, D. V., & Wallsten, T. S. (2001). Averaging probability judgments: Monte Carlo analyses of asymptotic diagnostic value. Journal of Behavioral Decision Making, 14, 123-140.

Koriat, A., Lichtenstein, S., & Fischhoff, B. (1980). Reasons for confidence. Journal of Experimental Psychology: Human Learning and Memory, 6, 107-118.

Larrick, R. P. (2004). Debiasing. In D. Koehler & N. Harvey (Eds.), Blackwell handbook of judgment and decision making (pp. 316-337). Oxford, England: Blackwell.

Larrick, R. P., & Soll, J. B. (2006). Intuitions about combining opinions: Misappreciation of the averaging principle. Management Science, 52, 111-127.

Lord, C. G., Lepper, M. R., & Preston, E. (1984). Considering the opposite: A corrective strategy for social judgment. Journal of Personality and Social Psychology, 47, 1231-1243.

Simmons, J. P., Nelson, L. D., Galak, J., & Frederick, S. (2011). Intuitive biases in choice versus estimation: Implications for the wisdom of crowds. Journal of Consumer Research, 38, 1-15. doi:10.1086/658070

Soll, J. B. (1999). Intuitive theories of information: Beliefs about the value of redundancy. Cognitive Psychology, 38, 317-346.

Soll, J. B., & Klayman, J. (2004). Overconfidence in interval estimates. Journal of Experimental Psychology: Learning, Memory, and Cognition, 30, 299-314.

Soll, J. B., Larrick, R. P., & Mannes, A. E. (2014). The wisdom of select crowds. Journal of Personality and Social Psychology, 107, 276-299. doi:10.1037/a0036677

Stewart, T. R. (2001). Improving reliability of judgmental forecasts. In J.S. Armstrong (Ed.), Principles of forecasting: A handbook for researchers and practitioners (pp. 81-106). Norwell, MA: Kluwer Academic.

Surowiecki, J. (2004). The wisdom of crowds. New York: Doubleday.

Timmermann, A. (2006). Forecast combinations. In G. Elliott, C. Granger, & A. Timmermann (Eds.), Handbook of economic forecasting (pp. 135-196). Amsterdam: North Holland.

Vul, E., & Pashler, H. (2008). Measuring the crowd within: Probabilistic representations within individuals. Psychological Science, 19, 645-647.

Wolfers, J., & Zitzewitz, E. (2004). Prediction markets. Journal of Economic Perspectives, 18, 107-126.

Yaniv, I. (2004). The benefit of additional opinions. Current Directions in Psychological Science, 13, 75-78.

Yaniv, I., & Milyavsky, M. (2007). Using advice from multiple sources to revise and improve judgments. Organizational Behavior and Human Decision Processes, 103, 104-120.
