3205 SH-DH
3620 Locust Walk
Philadelphia, PA 19104
Research Interests: Forecasting and decision processes
Links: CV, Personal Website, Department of Psychology
Ph.D. Yale University, 1979;
B.A. University of British Columbia, 1975;
2011-present Leonore Annenberg University Professor, School of Arts and Sciences (Psychology) and Wharton School (Management), University of Pennsylvania;
2002-2010 Mitchell Endowed Professorship, Haas School of Business, University of California, Berkeley;
2005-2006 Russell Sage Scholar;
1996-2001 Harold Burtt Professor of Psychology and Political Science, The Ohio State University;
1993-1994 Fellow, Center for Advanced Study in the Behavioral Sciences, Stanford;
1993-1995 Distinguished Professor, University of California, Berkeley;
1988-1995 Director, Institute of Personality and Social Research, University of California, Berkeley;
1987-1996 Professor, Department of Psychology, University of California, Berkeley;
1984-1987 Associate Professor, Department of Psychology, University of California, Berkeley;
1980-1995 Research Psychologist, Survey Research Center, University of California, Berkeley;
1979-1984 Assistant Professor, Department of Psychology, University of California, Berkeley;
Group Chair, Organizational Behavior and Industrial Relations, Haas School of Business, University of California, Berkeley, 2002-present;
Associate Dean for Academic Affairs, Haas School of Business, University of California, Berkeley, 2003-2004;
Director, Ph.D. programs, Haas School of Business, University of California, Berkeley;
Director, Institute of Personality Assessment and Research (renamed in 1992 as Institute of Personality and Social Research), University of California, Berkeley, 1988-1995.
Josh Rosenberg, Ezra Karger, Zach Jacobs, Molly Hickman, Avital Morris, Harrison Durland, Otto Kuusela, Philip Tetlock (2025), Belief updating in AI-risk debates: Exploring the limits of adversarial collaboration, Risk Analysis.
Abstract: We organized adversarial collaborations between subject-matter experts and expert forecasters with opposing views on whether recent advances in Artificial Intelligence (AI) pose an existential threat to humanity in the 21st century. Two studies incentivized participants to engage in respectful perspective-taking, to share their strongest arguments, and to propose early-warning indicator questions (cruxes) for the probability of an AI-related catastrophe by 2100. AI experts saw greater threats from AI than did expert forecasters, and neither group changed its long-term risk estimates, but they did preregister cruxes whose resolution by 2030 would sway their views on long-term risk. These persistent differences shrank as questioning moved across centuries, from 2100 to 2500 and beyond, by which time both groups put the risk of extreme negative outcomes from AI at 30%–40%. Future research should address the generalizability of these results beyond our sample to alternative samples of experts, and beyond the topic area of AI to other questions and time frames.
Ezra Karger, Josh Rosenberg, Zachary Jacobs, Molly Hickman, Philip Tetlock (2025), Subjective-probability forecasts of existential risk: Initial results from a hybrid persuasion-forecasting tournament, International Journal of Forecasting, 41 (2), pp. 499-516.
Abstract: A multi-stage persuasion-forecasting tournament asked specialists and generalists (“superforecasters”) to explain their probability judgments of short- and long-run existential threats to humanity. Specialists were more pessimistic, especially on long-run threats posed by artificial intelligence (AI). Despite incentives to share their best arguments during four months of discussion, neither side materially moved the other’s views. This would be puzzling if participants were Bayesian agents methodically sifting through elusive clues about distant futures, but it is less puzzling if participants were boundedly rational agents searching for confirmatory evidence as the risks of embarrassing accuracy feedback receded. Consistent with the latter mechanism, strong AI-risk proponents made particularly extreme long- but not short-range forecasts and over-estimated the long-range AI-risk forecasts of others. We stress the potential of these methods to inform high-stakes debates, but we acknowledge limits on what even skilled forecasters can achieve in anticipating rare or unprecedented events.
Gregory Mitchell and Philip Tetlock (2025), Psychological Elitism, Theory and Society.
Abstract: Elites can be differentiated from non-elites by their status-enhancing attributes: their accomplishments, expertise, and group memberships. Elitism is the belief that elites deserve epistemic deference because they better understand the workings of the world. Psychological elitism posits the existence of a class of elites who possess specialized knowledge of subconscious (motivational and cognitive) drivers of human judgment that is beyond the ken of non-elites. This article challenges whether psychological elites deserve deference. The central problem is the elusiveness of ground-truth standards for determining the true drivers of judgments. To warrant deference, psychological elites must demonstrate that their reasoning operates free of the same subconscious distortions ascribed to non-elites. Absent such demonstrations, it is fair game, under the very theories that psychological elites endorse, to question the competence of psychological elites to second-guess the true reasons underlying the views of non-elites.
Shauna Bowes, Cory Clark, Lucian Gideon Conway III, Thomas Costello, Danny Osborne, Philip Tetlock, Jan-Willem van Prooijen (2025), An adversarial collaboration on the rigidity-of-the-right, symmetry thesis, or rigidity-of-extremes: The answer depends on the question, Political Psychology.
Abstract: In an adversarial collaboration, two preregistered U.S.-based studies (total N = 6181) tested three hypotheses regarding the relationship between political ideology and belief rigidity (operationalized as less evidence-based belief updating): rigidity-of-the-right, symmetry, and rigidity-of-extremes. Across both studies, general and social conservatism were weakly associated with rigidity (|b| ~ .05), and conservatives were more rigid than liberals (Cohen's d ~ .05). Rigidity generally had null associations with economic conservatism, as well as social and economic political attitudes. Moreover, general extremism (but neither social nor economic extremism) predicted rigidity in Study 1, and all three extremism measures predicted rigidity in Study 2 (average |bs| ~ .07). Extreme rightists were more rigid than extreme leftists in 60% of the significant quadratic relationships. Given these very small and semi-consistent effects, broad claims about strong associations between ideology and belief updating are likely unwarranted. Rather, psychologists should turn their focus to examining the contexts where ideology strongly correlates with rigidity.
Cory Clark, Nicholas Kerry, Maja Graso, Philip Tetlock (2025), Morally offensive scientific findings activate cognitive chicanery, Annals of the New York Academy of Sciences, 15 (52), pp. 148-164.
Abstract: We document a mutually reinforcing set of belief-system defenses—cognitive chicanery—that transform “morally wrong” scientific claims into “empirically wrong” claims. Five experiments (four preregistered, N = 7040) show that when participants read identical abstracts that varied only in the sociomoral desirability of the conclusions, morally offended participants were likelier to (1) dismiss the writing as incomprehensible (motivated confusion); (2) deny the empirical status of the research question (motivated postmodernism); (3) endorse claims inspired by Schopenhauer's stratagems (The Art of Being Right) and the Central Intelligence Agency's (CIA's) strategies for citizen-saboteurs; and (4) endorse a set of contradictory complaints, including that sample sizes are too small and that anecdotes are more informative than data, that the researchers are both unintelligent and crafty manipulators, and that the findings are both preposterous and old news. These patterns are consistent with motivated cognition, in which individuals seize on easy strategies for neutralizing disturbing knowledge claims, minimizing the need to update beliefs. All strategies were activated at once, in a sort of belief-system “overkill” that ensures avoidance of unfortunate epistemic discoveries. Future research should expand on this set of strategies and explore how their deployment may undermine the pursuit of knowledge.
Calvin Isch, Cory Clark, Philip Tetlock (2025), Reflections on adversarial collaboration from the adversaries: Was it worth it?, Theory and Society.
Abstract: There is much enthusiasm, in principle, for adversarial collaborations (ACs), a scientific conflict resolution technique that encourages investigators with clashing models to collaborate in designing studies that test competing predictions. Adversarial collaborations offer the promise of breaking deadlocked debates, resolving disputes, and providing a deeper, more comprehensive understanding of a research domain. In practice, however, adversarial collaborations are more the exception than the rule, and there is almost no evidence on how scholars who have ventured into ACs assess the experience. To understand these perspectives, we surveyed and interviewed 29 scholars who participated in 13 AC projects. The data revealed that interpersonal conflicts were generally minor and that these projects required more upfront effort than typical collaborations but benefited from high-quality results and more thoughtful post-publication debates. Rather than producing a clear “winner,” the most common outcome was a deeper understanding of the problem space through the integration of opposing perspectives. Although the generalizability of these findings is limited by a sample consisting only of scholars who completed an AC, they nonetheless highlight the value of ACs as a tool for advancing scientific inquiry and offer practical guidance for scholars and journals exploring this approach.
Pavel Atanasov, Ezra Karger, Philip Tetlock (2025), Full Accuracy Scoring Accelerates the Discovery of Skilled Forecasters, Management Science.
Philip Tetlock, Philipp Schoenegger, Peter Park, Ezra Karger, Sean Trott (2025), AI-augmented predictions: LLM assistants improve human forecasting accuracy, ACM Transactions on Interactive Intelligent Systems, 15 (1), pp. 1-25.
Abstract: Large language models (LLMs) match and sometimes exceed human performance in many domains. This study explores the potential of LLMs to augment human judgment in a forecasting task. We evaluate the effect on human forecasters of two LLM assistants: one designed to provide high-quality (“superforecasting”) advice, and the other designed to be overconfident and base-rate neglecting, thus providing noisy forecasting advice. We compare participants using these assistants to a control group that received a less advanced model that did not provide numerical predictions or engage in explicit discussion of predictions. Participants (N = 991) answered a set of six forecasting questions and had the option to consult their assigned LLM assistant throughout. Our preregistered analyses show that interacting with each of our frontier LLM assistants significantly enhances prediction accuracy by between 24% and 28% compared to the control group. Exploratory analyses showed a pronounced outlier effect in one forecasting item, without which we find that the superforecasting assistant increased accuracy by 41%, compared with 29% for the noisy assistant. We further examine whether LLM forecasting augmentation disproportionately benefits less skilled forecasters, degrades the wisdom-of-the-crowd by reducing prediction diversity, or varies in effectiveness with question difficulty. Our data do not consistently support these hypotheses. Our results suggest that access to a frontier LLM assistant, even a noisy one, can be a helpful decision aid in cognitively demanding tasks compared to a less powerful model that does not provide specific forecasting advice. However, the effects of outliers suggest that further research into the robustness of this pattern is needed.
Ezra Karger, Houtan Bastani, Chen Yueh-Han, Zachary Jacobs, Danny Halawi, Fred Zhang, Philip Tetlock (2025), ForecastBench: A Dynamic Benchmark of AI Forecasting Capabilities, International Conference on Learning Representations, Singapore.
Abstract: Forecasts of future events are essential inputs into informed decision-making. Machine learning (ML) systems have the potential to deliver forecasts at scale, but there is no framework for evaluating the accuracy of ML systems on a standardized set of forecasting questions. To address this gap, we introduce ForecastBench: a dynamic benchmark that evaluates the accuracy of ML systems on an automatically generated and regularly updated set of 1,000 forecasting questions. To avoid any possibility of data leakage, ForecastBench consists solely of questions about future events that have no known answer at the time of submission. We quantify the capabilities of current ML systems by collecting forecasts from expert (human) forecasters, the general public, and LLMs on a random subset of questions from the benchmark (N = 200). While LLMs have achieved super-human performance on many benchmarks, they perform less well here: expert forecasters outperform the top-performing LLM (p-value < 0.001). We display system and human scores in a public leaderboard at www.forecastbench.org.
David James Gill, Marc Trachtenberg, Michael J Gill, Philip Tetlock, Thomas Robb, Michael Varnum, Cendri Hutcherson, Igor Grossmann, Zoe Trodd (2025), Predicting the Past: Testing Expert Historical Judgement, American Historical Review, 130 (4), pp. 1615-1630.
Abstract: Absences pervade the historical record. The loss or destruction of material, redaction of documents, silence of participants, data embargoes, and poor record keeping present inherent difficulties to any understanding of the past. Gaps in the historical record pose significant challenges, but they can also provide valuable opportunities. Instead of avoiding all gaps, historians can test the accuracy of some inferences about the past by carefully outlining their assumptions and explicitly predicting what they believe to have occurred in the absence of evidence. Subsequent discoveries or declassifications can then be used to assess the accuracy of these hypothesized explanations and, in turn, help us to evaluate the quality of historians’ thinking about the unknown past. Given enough examples, we can begin to learn more about how to make better predictions about the past, or what we term “retrodictions.”
This seminar-based course, built around active discussion and analysis, is required of all first-year doctoral students in Management and open to other Penn students with instructor permission. Its purpose is to examine the foundational theory and empirical research of micro-organizational behavior and to build an understanding of people's behavior within and across organizations. The course covers a blend of classic and contemporary literature so that students can appreciate the prevailing theories and findings in various areas of organizational behavior, with topics including influence and status, virtual teams, job design, organizational culture and socialization, identity in organizations, and an overall look at where the field of micro-organizational behavior is heading.
Mentored research involving data collection. Students do independent empirical work under the supervision of a faculty member, leading to a written paper. Normally taken in the junior or senior year.
The Honors Program has been developed to recognize excellence in psychology among Penn undergraduates and to enhance skills related to psychological research. The 4998 credit signifies an Honors Independent Study, completed as part of the Honors Program. The Honors Program involves: (a) completing a year-long empirical research project in the senior year under the supervision of a faculty member (for a letter grade), which earns 2 cu's; (b) completing a second term of statistics (for a letter grade) before graduation; (c) participating in the year-long Senior Honors seminar (for a letter grade), designed especially for Psychology Honors majors, which receives a total of 1 cu; (d) participating in the Undergraduate Psychology Research Fair in the Spring semester, at which honors students present a poster and give a 15-minute talk about their research; and (e) completing a total of 15 cu's in psychology. Students are selected for the Honors Program in the Spring of their junior year (see the application process online).
Individual Study and Research
New research from Wharton's Philip Tetlock finds that combining predictions from large language models can achieve accuracy on par with human forecasters.
Knowledge at Wharton - 1/14/2025
Our best pundits don’t have a solid track record. So how can the rest of us become better forecasters?
Wharton Magazine - 04/20/2016