PET activation studies are performed widely to study human brain function, yet the reproducibility, reliability, and comparability of their results have never been addressed on a large scale. Recently, 12 European PET centers performed the same cognitive activation experiment in a European Union-funded concerted action. The experiment used a standardized and validated cross-lingual experimental and control task involving verbal fluency. Each center contributed at least six subjects; in total there were 77 subjects and 247 scans in each of the two conditions, giving 494 scans overall. We analyzed each center's dataset and the pooled dataset using statistical parametric mapping. We present results addressing the consistency of these analyses, discuss the factors that influence their sensitivity, and comment on a number of related methodological issues. A MANOVA testing for center, condition, and center-by-condition effects showed strong condition and center effects and weaker interactions. The main factor determining reproducibility was the overall sensitivity of the experiment, to which the scanner and the number of scans contribute substantially, with a marked advantage for 3D scanners with a large field of view. An important conclusion is that data from different centers can be pooled to improve the reliability of results, which is of particular importance for studies of patients with rare conditions.
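As a simplified illustration of the factorial test described above: the study itself used a multivariate analysis (MANOVA) over image data, but the logic of testing for center, condition, and center-by-condition effects can be sketched as a univariate two-way ANOVA on synthetic per-subject activation scores. All numbers below (3 centers, 2 conditions, 6 subjects per cell, the injected effect sizes) are illustrative assumptions, not the study's data.

```python
import numpy as np

def two_way_anova(data):
    """F statistics for a balanced two-way design.

    data: array of shape (a, b, r) -- a centers, b conditions,
    r replicate subjects per cell. Returns F for factor A (center),
    factor B (condition), and the A x B interaction.
    """
    a, b, r = data.shape
    gm = data.mean()
    mean_a = data.mean(axis=(1, 2))   # per-center means
    mean_b = data.mean(axis=(0, 2))   # per-condition means
    cell = data.mean(axis=2)          # cell means

    # Sums of squares for main effects, interaction, and error
    ss_a = b * r * ((mean_a - gm) ** 2).sum()
    ss_b = a * r * ((mean_b - gm) ** 2).sum()
    ss_ab = r * ((cell - mean_a[:, None] - mean_b[None, :] + gm) ** 2).sum()
    ss_e = ((data - cell[:, :, None]) ** 2).sum()

    # Mean squares and F ratios against the within-cell error
    ms_e = ss_e / (a * b * (r - 1))
    f_a = (ss_a / (a - 1)) / ms_e
    f_b = (ss_b / (b - 1)) / ms_e
    f_ab = (ss_ab / ((a - 1) * (b - 1))) / ms_e
    return f_a, f_b, f_ab

rng = np.random.default_rng(0)
a, b, r = 3, 2, 6
center_shift = np.array([0.0, 0.5, 1.0])[:, None, None]   # center effect
condition_shift = np.array([0.0, 2.0])[None, :, None]     # activation effect
data = center_shift + condition_shift + rng.normal(0.0, 1.0, (a, b, r))

f_center, f_condition, f_interaction = two_way_anova(data)
print(f_center, f_condition, f_interaction)
```

With a strong injected condition (activation) effect and a moderate center effect, the condition F dominates, mirroring the pattern of strong condition and center effects and weaker interactions reported above.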