Do Photos Complement or Substitute for Text?

Journal of Marketing Research Scholarly Insights are produced in partnership with the AMA Doctoral Students SIG – a shared interest network for Marketing PhD students across the world.

A 2018 survey by the Pew Research Center found that an overwhelming majority (93%) of Americans often read customer reviews and ratings when buying a product or service for the first time. As customers increasingly rely on reviews to make decisions, it becomes essential to identify the characteristics of reviews that make them helpful. Do the reviewer’s credibility and expertise matter? Does writing style have an effect? Or can something as simple as adding photos to a review make it more helpful?

In a recently published Journal of Marketing Research article, Gizem Ceylan, Kristin Diehl, and Davide Proserpio explore when and why photos increase review helpfulness. The authors combine a machine-learning analysis of review text and photos from Yelp with five experiments to evaluate whether and why the similarity between the text and photos in reviews makes them more helpful.

How Text and Photos Combine to Improve Review Helpfulness

The authors show that, whereas the mere presence of a photo can increase a review’s helpfulness, greater similarity between photos and text heightens this effect. Using a dataset of 7.4 million reviews associated with 3.5 million photos from Yelp, the authors provide real-world evidence of a positive association between photo–text similarity and helpfulness. This study also provides preliminary evidence that the positive effect of text–photo similarity on review helpfulness is attenuated when processing ease is low (i.e., text is difficult to read, image quality is low).

Through controlled lab experiments, this research delves deeper into the underlying mechanism and finds that perceived ease of processing drives the effect of text–photo similarity on review helpfulness. It also establishes that text and photo fluency act as moderators such that similarity enhances helpfulness when the text and photo are difficult to process (i.e., less fluent to the reader).

These findings have important implications for marketers. They demonstrate that the interplay between visual and verbal content influences review helpfulness. By shedding light on the mechanism behind the effect of text–photo similarity on review helpfulness, this research also provides insights into how review sites can increase review helpfulness by nudging consumers to convey similar content in text and photos, rather than using photos to substitute for text.

We asked the authors several questions about the article, and they shared the following insights:

Q: You find that similarity between photos and text helps in improving helpfulness perceptions in the context of Yelp reviews. Do you think the impact of similarity would be the same across all platforms and all contexts, or is there any reason to expect a deviation?

A few factors come to mind that could potentially moderate the impact of photo–text similarity on perceptions of helpfulness across different platforms and contexts:

  • Centrality of visuals versus text – What I mean by that is that for certain categories (e.g., clothes) and certain platforms (e.g., Instagram), visual content is more central in the review experience, whereas in other categories (e.g., podcasts) and on other platforms (e.g., Reddit), text is more central. We would expect similarity to matter more in settings where visuals are more central to the experience versus those where text is more central.
  • Experience variability – What I mean by that is the consistency in photos between different users and on different occasions. For durable goods or even experiential purchases such as hotel rooms, where photos remain more or less the same across users and usage occasions, aligned photos may not play as critical a role as they do for restaurants.
  • Devices – As we show, similarity helps with fluency. This may be particularly important on mobile devices, because the smaller screen size (vs. laptops) may create greater feelings of difficulty, which similarity-induced fluency may help overcome.

Q: Given your findings that the similarity between text and photos heightens the helpfulness of the review, do you believe that the ratio of photos to text also plays a role in influencing helpfulness? For instance, is there a difference in impact between scenarios with less text but more photos compared to those with more text and fewer photos?

Great question! The simple answer is yes, but probably not the way you would expect. We conducted an experiment to examine how the number of topics in the review text and the number of photos included in the review impact the helpfulness of online reviews. We tested conditions with either one or two topics mentioned in the text, crossed with either one or two photos shown. The results demonstrated that when there was one topic in the text and one matching photo, the review was moderately helpful. That was our baseline. Simply adding a second photo to that review without adding an additional topic did not increase the helpfulness. However, adding a second topic without adding an additional matching photo increased helpfulness. Finally, reviews that included two topics in the text matched with two photos produced the highest helpfulness rating.

These findings suggest two key conclusions. First, people seem to focus relatively more on the text of reviews to obtain useful information compared to the photos. Simply adding more topics led to higher perceptions of informativeness, while adding more photos did not. Second, alignment between the number of topics covered in the text and the number of illustrative photos is important—the greater the match, the easier the review is to process, making it more informative and helpful overall.

Q: What are some challenges associated with multimethod research?

This research project proved to be a valuable learning experience for our team. The reviewers’ feedback challenged us to strengthen the connection between our large-scale data modeling and the experiments. In particular, they emphasized integrating the insights from the Yelp data more tightly into the experimental studies. In response, we worked hard on making the transition from the computational modeling in Study 1 to the experimental settings more seamless by using actual Yelp reviews as stimuli in Studies 2 and 3. Improving the connection between these different components was critical in the review process. As multimethod work is becoming more common to address external and internal validity concerns in the same paper, connecting different data sources and approaches is critical.

Q: More generally, it makes intuitive sense to think online content anywhere would be more helpful if it’s presented with both visual and verbal information. Is there a specific reason why you focus on reviews?

You raise an excellent point: the interaction between text and images we identified likely extends beyond online reviews into many communication contexts. We focused specifically on reviews in this paper for pragmatic reasons, given their importance in influencing consumer decisions and the ready availability of review data to study. However, the core finding that similarity between textual topics and corresponding images improves ease of processing and perceived informativeness has clear implications more broadly.

For example, in science communication, public health messaging, or education, ensuring topic–image congruence could enhance comprehension and engagement. When communicating about a vaccine, matching the text to accompanying visuals should boost understanding by facilitating cognitive processing. Overall, this text–image complementarity effect appears generalizable and can inform effective communication design across many domains, not just reviews. Examining this phenomenon in other settings is an exciting direction for future research that can build on the foundations here.

Q: Do you think the use of different media (smartphones vs. personal computers) could impact how important photos or text are? For example, individuals may be more likely to focus on photos in a low-attention context (smartphones) and more on text in a high-attention context.

For what we find (i.e., that greater photo–text similarity creates feelings of fluency and thus increases helpfulness), high- versus low-attention contexts could be a moderator. When readers don’t devote a lot of attention, the cognitive ease that greater photo–text similarity provides should have a bigger effect on helpfulness. Similarly, another important moderator could be the reader’s motivation to process the information (either chronically or situationally). When motivation is lower, the facilitating effect of greater photo–text similarity should be more impactful. On the other hand, when readers are highly motivated to process, the facilitating effect of greater photo–text similarity may be smaller.

Read the Full Study for Complete Details

Read the full article:

Gizem Ceylan, Kristin Diehl, and Davide Proserpio (2023), “Words Meet Photos: When and Why Photos Increase Review Helpfulness,” Journal of Marketing Research, 61 (1), 5–26. doi:10.1177/00222437231169711

Aadya Sanwal is a doctoral student in marketing, Pennsylvania State University, USA.

Sushma Kambagowni is a doctoral student in marketing, University of Pittsburgh, USA.