AVA Awards Recipients Report - Chisom Aniebo - 2023

Bradshaw-Eagle Undergraduate Research Scholarship

Uncovering Factors Driving the Between-Observers Similarity in Gaze Patterns to Images

Supervisor: Professor Isabelle Mareschal

School of Biological and Behavioural Science, Queen Mary University of London

Introduction

When observing the same scene, different people tend to exhibit similar gaze patterns. The degree of this similarity is referred to as interobserver consistency. So far, only a few studies have analysed the factors that determine it, so these factors remain largely unexplored. For example, while image characteristics are known to affect interobserver consistency, it is unknown which characteristics in particular matter, i.e., whether high-level or low-level image characteristics play the more important role. Additionally, it is not known whether interobserver consistency is task-dependent, e.g., whether it differs when observers free-view a scene compared to when they perform a visual search in it. The study at hand aims to fill this knowledge gap by asking three research questions:

How does the task (free-viewing vs. visual search) affect interobserver consistency?
What is the influence of low-level image complexity on interobserver consistency?
What is the influence of high-level image complexity on interobserver consistency?

To answer these questions, eye movements of participants viewing images of natural scenes will be recorded. These images will systematically vary in their high-level complexity (i.e., semantic content) and low-level complexity (i.e., low-level image features). Two independent groups of participants will either free-view the images or perform a visual search task on them. Participants from both task groups will view the same images, so for each image there will be two interobserver consistency values: one from the free-viewing group and one from the visual search group. The variables predicting interobserver consistency in this experiment are therefore high-level complexity, low-level complexity, and the task (free-viewing vs. visual search). Interobserver consistency will be measured by calculating how well the fixation heatmap of all observers but one predicts the fixation heatmap of the remaining (left-out) observer according to a heatmap similarity metric (e.g., correlation).
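The leave-one-out measure described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the grid-cell fixation format, the function names, and the choice to average the leave-one-out scores across observers are hypothetical and not taken from the project, and the heatmaps here are raw fixation counts without any smoothing.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant heatmap")
    return cov / math.sqrt(var_a * var_b)


def fixation_heatmap(fixations, n_rows, n_cols):
    """Count fixations per grid cell and return a flattened, row-major list.

    fixations: list of (row, col) cell indices (a hypothetical input format).
    """
    heatmap = [0.0] * (n_rows * n_cols)
    for row, col in fixations:
        heatmap[row * n_cols + col] += 1.0
    return heatmap


def interobserver_consistency(fixations_by_observer, n_rows, n_cols):
    """Mean leave-one-out heatmap correlation for a single image."""
    maps = [fixation_heatmap(f, n_rows, n_cols) for f in fixations_by_observer]
    scores = []
    for i, left_out in enumerate(maps):
        # Pool the heatmaps of every observer except the left-out one.
        others = [m for j, m in enumerate(maps) if j != i]
        pooled = [sum(cell) for cell in zip(*others)]
        scores.append(pearson(pooled, left_out))
    # Assumption: one value per image, obtained by averaging over observers.
    return sum(scores) / len(scores)
```

With this sketch, observers who fixate the same cells produce a value close to 1, while observers who fixate different cells produce a lower or negative value.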

Images for the experiment will be obtained from a publicly available dataset of labelled images. For each image in this set, both complexity measures will be computed. Then, a final set of images will be selected: it will span a wide range of complexity levels, whilst ensuring that low-level and high-level image complexities remain uncorrelated (so that their distinct contributions can be tested). The images will also need to meet other predefined criteria, e.g., pertaining to size and depicted content.

A linear mixed-effects model will be used to examine the relationships between our predictors (low-level complexity, high-level complexity, task) and the outcome variable (interobserver consistency).
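One possible form of such a model is written out below, purely as an illustration. The report does not specify the random-effects structure or any interaction terms, so the random intercept per image and the main-effects-only fixed part are assumptions; all symbol names are hypothetical. Here i indexes images and j indexes the two task groups.

```latex
\mathrm{IOC}_{ij} = \beta_0 + \beta_1\,\mathrm{Low}_i + \beta_2\,\mathrm{High}_i
                  + \beta_3\,\mathrm{Task}_j + u_i + \varepsilon_{ij},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2),
\qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

In this sketch, IOC is the interobserver consistency of image i under task j, Low and High are the two complexity measures, Task is a binary indicator (free-viewing vs. visual search), and u_i captures image-specific variation shared by the two task groups. Interaction terms between task and the complexity measures could be added to test whether the task effect depends on image complexity.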

It is hypothesised that interobserver consistency will be higher during visual search than during free-viewing, and that this effect will be more strongly influenced by high-level image characteristics. This is because, when conducting a visual search, participants are likely to rely on their shared schema of typical scene arrangements.

To summarise, the proposed project will investigate the degree to which commonalities in gaze patterns are attributable to the low-level and high-level characteristics of images, and whether this effect depends on the task being performed (free-viewing vs. visual search).

My role:

I took part in the design phase of the project. The aim of this internship was to be co-responsible for the initial research, design, creation, and preparation of the study. My role within the research team encompassed various responsibilities, including conducting extensive literature searches, performing statistical analyses, creating plots, learning and using the R programming language, summarising research papers, evaluating images, and making decisions regarding the usability of a substantial image dataset.

Descriptive data analysis

In the first part of the internship, I was required to use SPSS to perform a series of exploratory analyses on a dataset detailing the characteristics of 27,000 labelled images, to get an overall sense of what these data looked like. This was the image dataset from which we would select the final set of images for the study. The variables analysed were: number of object labels, the aspect ratio of the images, semantic complexity (high-level complexity), visual clutter (low-level complexity), and the size of the images in pixels.

I was required to calculate descriptive statistics such as the mode, mean, and range of these variables. I then illustrated these findings in appropriate graphs, including histograms and scatterplots of useful combinations of the variables, to show any trends and relationships. Next, I ran a Pearson's correlation test to check that low-level complexity and high-level complexity were uncorrelated. I created a summary document and presented these findings to the researchers in my team, which encouraged me to learn how to present data clearly. From this summary, we were able to gain an understanding of what the dataset looked like, so that we could start deciding how we would use the images in the study.

Literature searches

Over the course of the internship, I conducted multiple literature searches to inform the study design. I created a number of documents on the different aspects of the study that required thorough research. These documents included summaries of:

articles that used similar methods to the ones we were planning to use.
articles reporting studies using eye-tracking in visual search tasks.
articles describing memory tasks typically used in visual attention research.
articles that investigated interobserver consistency in eye movements.

I created summary documents for each literature search topic and presented these back to the researchers. My work was used to inform the next stages of the development of the study, so it was vital that the papers I presented matched the search requirements. This helped develop my literature review skills, including an appreciation for reviewing literature from credible sources (i.e., key journals/publications) and the ability to quickly obtain relevant information from dense literature.

R online course

Prior to this internship, I had limited experience with programming and no experience of using R. This internship gave me the opportunity to learn R, both with the assistance of my colleagues and by independently working through an online course, which helped me build my independence and initiative. I decided to learn R because it was a programming language the researchers were familiar with, allowing us to work with the same software. This skill has proved invaluable and enabled me to assist in further data analysis and presentation.

Creation of a detailed scatterplot in R

Having learned how to use R, I was given the opportunity to put my new skills to the test. I wrote R code to create the most detailed scatterplot I had made to date, which was a brand-new experience for me. The scatterplot illustrated the minimal correlation between visClutter (visual clutter, a low-level image characteristic) and semMean (a measure of semantic complexity, a high-level characteristic). To achieve a balanced representation of both high- and low-level complexity for our experiment, I developed a script that randomly selected 100 image datapoints and visualised these points in blue. I received advice on how to refine the scatterplot until it was considered usable, which developed my skills and confidence in R programming and data presentation.
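The original script was written in R and is not reproduced here. As a rough Python illustration of the random-selection step, the sketch below assigns a colour label to every datapoint, marking a random subset of 100 in blue. The function name, the fixed seed, and the grey colour for unselected points are all assumptions made for the example.

```python
import random


def highlight_random_subset(n_points, k=100, seed=42):
    """Return one colour label per datapoint.

    A random subset of k points is labelled 'blue'; all others are 'grey'.
    A fixed seed makes the selection reproducible.
    """
    rng = random.Random(seed)
    chosen = set(rng.sample(range(n_points), k))
    return ["blue" if i in chosen else "grey" for i in range(n_points)]


# Example with 27,000 datapoints, matching the size of the image dataset.
colours = highlight_random_subset(27000)
```

The resulting list can be handed to any plotting routine that accepts one colour per point, so that the selected images stand out against the rest of the dataset.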

Selection of stimuli for the experiment

In the final stage of the internship, we divided the image set into subsets characterised by systematically varying ranges of high- and low-level complexity values. This operation ensured that the final set of images used in the study spans the continuum of low-level and high-level complexity. Then, we automatically selected a number of images meeting pre-specified criteria (pertaining to aspect ratio and size) for potential use in the study. I was then required to manually inspect this set of 4,450 images to determine their suitability for the study, i.e., whether they were usable in terms of their quality, how blurry they were, and whether they contained distracting elements such as text, people, or vignetting. This developed my attention to detail.
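The subset-and-select procedure described above could look roughly like the Python sketch below. It is an illustration only: the equal-width bins, the dictionary keys, the thresholds, and the rule of keeping the first eligible images in each cell are assumptions, not the criteria actually used in the project.

```python
from collections import defaultdict


def bin_index(value, lo, hi, n_bins):
    """Map a value in [lo, hi] to an equal-width bin index in 0..n_bins-1."""
    if hi == lo:
        return 0
    idx = int((value - lo) / (hi - lo) * n_bins)
    return min(idx, n_bins - 1)  # the maximum value falls into the last bin


def select_candidates(images, n_bins, per_cell, min_pixels, aspect_range):
    """Group images into a low-by-high complexity grid and pick candidates.

    images: list of dicts with the hypothetical keys
            'id', 'low', 'high', 'width', 'height'.
    Returns a dict mapping (low_bin, high_bin) to a list of image ids.
    """
    if not images:
        return {}
    lows = [img["low"] for img in images]
    highs = [img["high"] for img in images]
    low_min, low_max = min(lows), max(lows)
    high_min, high_max = min(highs), max(highs)

    cells = defaultdict(list)
    for img in images:
        cell = (
            bin_index(img["low"], low_min, low_max, n_bins),
            bin_index(img["high"], high_min, high_max, n_bins),
        )
        aspect = img["width"] / img["height"]
        meets_criteria = (
            img["width"] * img["height"] >= min_pixels
            and aspect_range[0] <= aspect <= aspect_range[1]
        )
        # Keep at most per_cell eligible images in each complexity cell.
        if meets_criteria and len(cells[cell]) < per_cell:
            cells[cell].append(img["id"])
    return dict(cells)
```

Selecting candidates per cell, rather than from the dataset as a whole, is what keeps every combination of low- and high-level complexity represented before the manual inspection step.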

Other

Alongside the main part of the internship, I also had the opportunity to gain experience in other areas. This included learning how to use MATLAB for image labelling and programming. I also learned how to use Qualtrics for survey design and Prolific for recruiting participants and distributing the survey to them. Additionally, I had the privilege of taking part in discussions about the experimental design and the rationale behind the decisions made in conducting the study. The internship also provided valuable mentoring and career guidance, along with the chance to expand my knowledge of how the eye tracker operates.

Conclusion:

At the conclusion of my internship, the research project remained in the design phase, and the data collection had not yet started. The design we developed serves as the foundation for the study's future implementation, with the next steps involving the transition to data collection and analysis.

In conclusion, my internship experience at QMUL provided me with the unique opportunity to actively participate in the initial design of a computational vision research study. I acquired valuable skills, made significant contributions to the research project, and gained insights into the pivotal role that the design phase plays in shaping the direction of scientific inquiries.