Information Retrieval, vol. 12, issue 3, pp. 380-399
The number of Web users whose first language is not English continues to grow, as does the amount of content provided in languages other than English. This poses new challenges for actors on the Web: in which language(s) content should be offered, how search tools should deal with monolingual and multilingual content, and how users can make the best use of navigation and search options suited to their individual linguistic skills. How should these challenges be dealt with? Technological approaches to non-English (or, more generally, cross-language) Web search have made considerable progress; however, translation remains a hard problem. This rules out low-cost, high-quality coverage of the whole Web in all languages. In this paper, we propose a user-centric approach to deciding where best to concentrate efforts and investments. Drawing on linguistic research, we describe data on the availability of content, and of access to it, in first and second languages across the Web. We then present three studies that investigated how the availability (or absence) of first-language content and access forms affects user behaviour and attitudes. The results indicate that non-English languages are under-represented on the Web and that this is partly due to content-creation, link-setting and link-following behaviour. They also show that user satisfaction is influenced both by the cognitive effort of searching and by the availability of alternative information in the user's language. These findings suggest that more cross-language tools are desirable. However, they also indicate that context (such as user groups' domain expertise or site type) should be considered when trade-offs between information quality and multilinguality must be made.