by Nate Breznau, University of Bremen
Early in my studies, a supervisor recommended that I replicate a key publication in my research area on the relationship between public opinion and social welfare policy. Throughout my entire dissertation, I could not do it. This is how I arrived at the following conclusion:
Different researchers (or teams) who work with the same data and employ the same statistical models will not arrive at the same results.
My study was actually a reanalysis, not a replication, because I took the same data and methods as the original researchers. Of course, in true replication studies researchers do not expect to arrive at identical or even similar results: the subjective perceptions of the scientists and the unique observational contexts lead to variations in results. But with secondary data and reproduction of statistical models, how are different outcomes possible?! These secondary observer effects, as I label them, actually come about in many ways (Breznau 2015).
For one, secondary data sources are not stable over time. That’s right: the OECD social indicators data, for example, differ depending on the date they were acquired. Also, there is no single definition of many common statistical procedures we employ, such as ‘factor scoring’. Researchers’ own idiosyncratic methodological choices thus lead to variable outcomes, and this can be a big problem: in the small-N, macro-comparative research realm, findings may change from one reanalysis to the next. In my sample, secondary observer effects averaged roughly one-third of a standard deviation in effect sizes.
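To make the factor-scoring point concrete, here is a minimal, purely illustrative sketch (simulated data, hypothetical item names and choices, not taken from my study). Two defensible scoring rules for the same items – a unit-weighted mean and principal-component-weighted scores – produce scores that correlate highly yet are not identical, so any downstream regression using them will shift:

```python
import numpy as np

# Simulate 200 respondents on 4 attitude items driven by one latent
# factor, with deliberately unequal loadings (all values invented).
rng = np.random.default_rng(0)
latent = rng.normal(size=200)
loadings_true = np.array([0.9, 0.7, 0.5, 0.3])
items = np.outer(latent, loadings_true) + rng.normal(scale=0.5, size=(200, 4))

# Standardize the items first, as is common before scoring.
z = (items - items.mean(axis=0)) / items.std(axis=0)

# Scoring choice A: unit-weighted mean of the standardized items.
score_a = z.mean(axis=1)

# Scoring choice B: weight items by the leading eigenvector of their
# covariance matrix (a principal-component-style weighting).
cov = np.cov(z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
w = eigvecs[:, -1]                      # leading eigenvector
w = w * np.sign(w.sum())                # fix the arbitrary sign
score_b = z @ w / w.sum()               # normalized weighted mean

# The two 'factor scores' track each other closely but are not equal,
# which is exactly the room for idiosyncratic divergence in results.
r = np.corrcoef(score_a, score_b)[0, 1]
max_gap = float(np.max(np.abs(score_a - score_b)))
print(round(r, 3), max_gap > 0)
```

Neither choice is wrong; the point is that nothing in the phrase ‘factor scoring’ pins down which one a researcher used.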
What is to be done? Sensitivity analyses are always a good idea, but with different data vis-à-vis the original study, a problem remains. An answer came to me after reading a working paper on a seemingly unrelated topic, although one I am nonetheless enthusiastic about: football (a.k.a. soccer). In a study spearheaded by Rafael Silberzahn and colleagues, 29 teams of data analysts were given the same dataset and the same hypothesis – that referees are biased by players’ skin tone in their distribution of red cards (Silberzahn et al. 2015). The results of the 29 teams are a cornucopia of findings with little coherence but lots of utility, because the project coordinators interviewed the research teams and reviewed their methods to develop a larger picture of the topic and pave the way for superior future research.
Although the Silberzahn example deals only with idiosyncratic methodological decisions, I think this approach can also reconcile some secondary data inconsistencies. Macro-comparative research should employ multiple independent research teams – crowdsourcing the analysts, following the Silberzahn strategy. But more importantly here, I propose we should share our secondary data – crowdsourcing the data. We could crowdsource the hard drives of social scientists and collect all possible versions of secondary data. Most importantly, I argue that it should become standard practice to publish all secondary datasets alongside any journal publication that uses or compiled them. This would provide a range of plausible values for each country-time point, which we could use in sensitivity analyses and which would greatly improve the reliability of our country-level findings.
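The kind of sensitivity analysis I have in mind can be sketched in a few lines. This is purely hypothetical: the "versions" below are simulated perturbations standing in for different acquisition dates of one macro indicator, not real OECD figures. The same model is estimated on each version and the spread of the coefficient of interest is reported:

```python
import numpy as np

# Invented example: 20 countries, one opinion measure, one outcome.
rng = np.random.default_rng(1)
n = 20
support = rng.normal(size=n)                       # public opinion measure
base_spending = 5 + 0.4 * support + rng.normal(scale=0.3, size=n)

# Three hypothetical acquisition dates of the 'same' indicator,
# each slightly revised by the data provider.
versions = {
    "2010_download": base_spending,
    "2013_download": base_spending + rng.normal(scale=0.15, size=n),
    "2016_download": base_spending + rng.normal(scale=0.15, size=n),
}

# Fit the identical OLS model on every version and collect the slope.
slopes = {}
X = np.column_stack([np.ones(n), support])
for name, y in versions.items():
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    slopes[name] = float(beta[1])

# The spread across versions is a direct estimate of how much a
# finding depends on which copy of the data a researcher happens to hold.
spread = max(slopes.values()) - min(slopes.values())
print({k: round(v, 3) for k, v in slopes.items()}, round(spread, 3))
```

With published archives of every dataset version, this range would be computable for real country-time points rather than simulated ones.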
- Breznau, Nate. 2015. “Secondary Observer Effects: Idiosyncratic Errors in Small-N Secondary Data Analysis.” International Journal of Social Research Methodology, online first. http://www.tandfonline.com/doi/abs/10.1080/13645579.2014.1001221.
- Silberzahn, Rafael, Eric Uhlmann, Dan Martin, Brian Nosek, et al. 2015. “Crowdsourcing Data Analysis: Do Soccer Referees Give More Red Cards to Dark Skinned Players.” Open Science Framework. https://osf.io/j5v8f/.