4th AVA Natural Images Meeting at University of Bristol

17 Sep 2003

Programme

11.30 Registration and Coffee

12.00 Daniel Osorio: Why do animals see colour in so many ways? (Invited Lecture)

12.45 – 13.30 Lunch, poster, equipment demos, art demo

13.30 Anthony Hayes: Continuity of natural structure in space and scale

13.50 Steven Dakin: Natural image statistics and brightness “filling in”

14.10 David Tolhurst, Jan Lauritzen: Detecting Gabor patches in natural scenes

14.30 Yury Petrov, Li Zhaoping: Human luminance discrimination in natural images matches luminance correlations in natural images

14.50 Simon Schultz, Anthony Movshon: Transient timing structure in neuronal responses to visual texture motion stimuli

15.10 – 15.30 Tea/coffee, poster, demos

15.30 Kinjiro Amano, David Foster, Sergio Nascimento: Surface-colour judgements in natural scenes

15.50 Lewis Griffin: Learning to see with “Google Image”

16.10 Florian Ciurea, Brian Funt: Histograms as descriptors for illuminants in natural scenes

16.30 David Simmons, Maura Edwards, Lorna Macpherson, Kenneth Stephen, Robert McKerlie: How unsightly is dental fluorosis? A technique and preliminary results

16.50 Robert O'Callaghan, John Lewis, Stavri Nikolov, Nishan Canagarajah, Dave Bull: Segmentation of uni- and multi-modal natural images

17.10 Alexa Ruppertsberg, Bosco Tjan, Heinrich Bülthoff: Local features bootstrap gist perception of scenes

17.30 Tom Troscianko: OK, now we have natural images. What about natural tasks?

17.50 – 19.00 Drinks and buffet

Poster:

Alej Parraga, Tom Troscianko, David Tolhurst: Performing a naturalistic task when the spatial structure of colour in natural scenes is changed

Demo of equipment by Cambridge Research Systems

Art Demo by John Jupe

Meeting Abstracts

Why do animals see colour in so many ways?

Daniel Osorio.

Neuroscience, University of Sussex, Falmer, Brighton BN1 9QG UK

d.osorio@sussex.ac.uk

There are plentiful data on the spectral sensitivities of photoreceptors and photopigments, and on the evolutionary relationships of the pigment genes. Spectral sensitivity maxima range from below 350 nm to over 600 nm, and eyes have from one to about 16 different spectral types of pigment. Sometimes spectral sensitivities are substantially narrowed by coloured filters. To account for the adaptive significance of this diversity, we can collect information on spectral stimuli, model colour signals, and relate the models' predictions to visual ecology: what animals look at, what they do, and when or where they do it. A straightforward conclusion is that colour vision on land is a very different enterprise from colour vision under water. Beyond this there is no simple story to tell about adaptive optimisation, but perhaps we have learnt to think more carefully about visual ecology and the uses of colour.

References: Kelber A, Vorobyev M, Osorio D (2003) Animal colour vision – behavioural tests and physiological concepts. Biological Reviews 78, 81-118.

Continuity of natural structure in space and scale

Anthony Hayes.

Department of Psychology, The University of Hong Kong, Pokfulam Road, Hong Kong Special Administrative Region, People's Republic of China.

ahayes@hku.hk

In recent years there has been renewed interest in how the visual system represents natural image contours and continuity, a key area of interest of the early 20th-century Gestalt school of psychology. Researchers in anatomy, neurophysiology, computer science and visual psychophysics have combined their approaches to develop models of how natural contours are perceived. These models build on work showing that neurons in primary visual cortex make use of long-range lateral connections that allow integration of information from far beyond the classical receptive field, and the evidence suggests that these connections are involved in associating neurons that respond to the length and shape of a contour. I shall present psychophysical results on contour perception and show how these results converge with recent anatomical and physiological findings. I shall then show how continuity in scale in natural images can be considered analogous to continuity in space, and is perhaps as important to vision as continuity in space. Finally, I shall argue that partitioning spatial structure into different scales, which others have suggested is a property of visual processing, gives rise to an algorithm that allows neurons with relatively low dynamic ranges to code a wide range of image-intensity variation finely.

Natural image statistics mediate brightness “filling-in”

Steven Dakin.

Institute of Ophthalmology, University College London, 11-43 Bath Street, London EC1V 9EL, UK

s.dakin@ucl.ac.uk

Although the human visual system can accurately estimate the lightness/reflectance of surfaces under enormous variations in illumination, two equiluminant grey regions can be induced to appear quite different simply by placing a light-dark luminance transition between them. This illusion, the Craik–Cornsweet–O’Brien (CCOB) effect, has been taken as evidence for a low-level “filling-in” mechanism subserving lightness perception. Here we present evidence that the mechanism responsible for the CCOB effect operates not via propagation of a neural signal across space but by amplification of the low spatial-frequency structure of the image. We develop a simple computational model that relies on the statistics of natural scenes to actively reconstruct the image most likely to have caused an observed series of responses across spatial frequency channels. This principle is tested psychophysically by deriving classification images for subjects’ discrimination of the contrast polarity of CCOB stimuli masked with noise. Classification images resemble “filled-in” stimuli; i.e. observers rely on portions of the stimuli that contain no information per se but that correspond closely to the reported perceptual completion. As predicted by the model, the filling-in process is contingent on the presence of appropriate low spatial frequency structure.
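Classification images like those described above are commonly estimated by averaging the external noise separately for each response category and then subtracting one average from the other. The following Python sketch shows that generic calculation for a one-dimensional stimulus. The simulated observer (a linear template plus internal noise), the toy cusp-edge profile, the function names and all parameter values are illustrative assumptions. They are not the authors' model, stimuli or data.

```python
import numpy as np

def classification_image(noise, said_positive):
    """Response-conditioned noise average: mean noise on trials where the observer
    chose the positive polarity minus mean noise on all remaining trials."""
    noise = np.asarray(noise, dtype=float)
    said_positive = np.asarray(said_positive, dtype=bool)
    return noise[said_positive].mean(axis=0) - noise[~said_positive].mean(axis=0)

def simulate_observer(n_trials=4000, n_pixels=64, contrast=0.3, seed=0):
    """Hypothetical linear observer judging the polarity of a 1-D cusp edge in noise."""
    rng = np.random.default_rng(seed)
    x = np.linspace(-1.0, 1.0, n_pixels)
    edge = np.sign(x) * np.exp(-4.0 * np.abs(x))      # toy Cornsweet-like cusp profile
    template = edge.copy()                           # assumed observer template
    polarity = rng.choice([-1.0, 1.0], size=n_trials)
    noise = rng.standard_normal((n_trials, n_pixels))
    stimuli = polarity[:, None] * contrast * edge + noise
    internal = rng.standard_normal(n_trials) * np.linalg.norm(template)
    decision = stimuli @ template + internal          # linear decision variable
    return noise, decision > 0, template

noise, said_positive, template = simulate_observer()
ci = classification_image(noise, said_positive)
print(round(float(np.corrcoef(ci, template)[0, 1]), 3))  # similarity of CI to template
```

For a linear observer, the noise components orthogonal to the template carry no information about the response. The classification image therefore converges on a scaled copy of the template as the number of trials grows.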

Detecting Gabor patches in natural scenes

David Tolhurst and Jan Lauritzen.

Department of Physiology, University of Cambridge, Cambridge CB2 3EG, UK

djt12@cus.cam.ac.uk

We measured contrast thresholds for detecting the presence of a small Gabor patch embedded in the centre of digitised monochrome photographs of natural scenes, and embedded in a “surrogate scene” with the same second-order structure as real natural scenes: a random luminance pattern filtered to a 1/f amplitude spectrum. Compared with its threshold on a uniform screen, the threshold for the Gabor patch was markedly elevated by these complex patterns (i.e. the patch was masked). We also measured how masking depended on the contrast of sinusoidal gratings with the Gabor’s spatial frequency, orientation and phase. We calculated an “equivalent contrast of grating”: the grating contrast that should evoke the same response from a typical bandpass V1 neuron as the central part of each natural scene. The complex patterns gave substantially more masking than the calculated equivalent gratings. Our calculation of equivalence is subject to assumptions, but the discrepancy between grating and natural-scene masking seems too big to be explained by errors of calculation. Broadband complex scenes have substantial contrast energy outside the immediate passband of the test Gabor, and it is likely that this energy contributes to the masking. To test this, we filtered the natural scenes and the 1/f pattern to remove all contrast energy in a 1.5-2.1 octave band centred on the orientation and spatial frequency of the test Gabor. As expected, these “notch-filtered” images still caused masking but, surprisingly, they caused more masking than the original unfiltered images. We propose that these results are compatible with a “contrast normalisation” model.
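The stimulus manipulations above can be sketched in Python with FFT filtering. This is a minimal illustration under several assumptions. The "typical bandpass V1 neuron" is modelled as a single linear Gabor receptive field. The notch's orientation half-width (`half_angle`) is a free parameter, because the abstract specifies only the 1.5-2.1 octave frequency band. Square images are assumed, and every function name here is hypothetical.

```python
import numpy as np

def _freqs(n):
    """Radial frequency (cycles/pixel) and orientation (radians) of each FFT bin."""
    fy = np.fft.fftfreq(n)[:, None]
    fx = np.fft.fftfreq(n)[None, :]
    return np.hypot(fx, fy), np.arctan2(fy, fx)

def one_over_f_surrogate(n, rng):
    """White luminance noise filtered to a 1/f amplitude spectrum, unit RMS, zero mean."""
    f, _ = _freqs(n)
    gain = np.zeros_like(f)
    gain[f > 0] = 1.0 / f[f > 0]                      # DC term removed
    img = np.real(np.fft.ifft2(np.fft.fft2(rng.standard_normal((n, n))) * gain))
    return img / img.std()

def notch_filter(img, f0, octaves, theta0, half_angle):
    """Zero Fourier energy within +/- octaves/2 of f0 AND within +/- half_angle of theta0."""
    f, ang = _freqs(img.shape[0])
    dtheta = np.abs((ang - theta0 + np.pi / 2) % np.pi - np.pi / 2)  # folded to [0, pi/2]
    band = ((f >= f0 * 2 ** (-octaves / 2)) & (f <= f0 * 2 ** (octaves / 2))
            & (dtheta <= half_angle))
    spec = np.fft.fft2(img)
    spec[band] = 0                                    # band is symmetric, so output stays real
    return np.real(np.fft.ifft2(spec))

def grating(n, wavelength, theta, phase, contrast):
    """Sinusoidal grating; wavelength in pixels, theta = direction of modulation."""
    y, x = np.mgrid[-n // 2:n // 2, -n // 2:n // 2]
    return contrast * np.cos(2 * np.pi * (x * np.cos(theta) + y * np.sin(theta)) / wavelength + phase)

def gabor_rf(n, wavelength, theta, phase, sigma):
    """Hypothetical linear receptive field: Gaussian envelope times a unit grating."""
    y, x = np.mgrid[-n // 2:n // 2, -n // 2:n // 2]
    return np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2)) * grating(n, wavelength, theta, phase, 1.0)

def equivalent_grating_contrast(patch, wavelength, theta, phase, sigma):
    """Grating contrast giving the same linear Gabor response as `patch`."""
    n = patch.shape[0]
    rf = gabor_rf(n, wavelength, theta, phase, sigma)
    unit_response = np.sum(rf * grating(n, wavelength, theta, phase, 1.0))
    return abs(np.sum(rf * patch)) / abs(unit_response)
```

Because the model neuron is linear, the equivalent contrast is simply the ratio of its response to the patch and its response to a unit-contrast grating. A nonlinear (for example, energy) model would need a search over contrast instead.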

Human luminance discrimination in natural images matches luminance correlations in natural images

Yury Petrov and Li Zhaoping.

University College London, UK.

z.li@ucl.ac.uk

Humans can detect luminance variations as small as 1% of the background level. We ask whether this sensitivity is determined by the information content of natural images, captured by the relationships or correlations between image pixel values. A lack of correlation between pixel values, as in white-noise images, indicates no information content. In experiment 1, we measured human luminance discrimination in natural images. Subjects discriminated between original natural images, I, and their luminance sub-sampled (mean-luminance-matched) versions, I_b, in which pixel values are digitized at a pixel depth of b = 3-9 bits (8-512 grey-level gradations) per pixel. In experiment 2, we measured human detection of luminance correlation. Subjects discriminated between a white-noise image and a residue image I_residue = I − I_b, i.e. the difference between the original natural image and its luminance sub-sampled version, amplified in luminance/contrast on the display to match the noise image. The residue image contains the luminance correlations in natural scenes that are not captured by the sub-sampled image. Percentage-correct performance as a function of pixel depth b matched remarkably well between the two experiments, with 75% correct at around 6 bits/pixel and 50% correct at around 7 bits/pixel in both. This strongly suggests that human luminance discrimination sensitivity is optimised to match the information content of natural scenes. We support this hypothesis by showing that the performance curves can be fitted by signal detection theory, assuming that differences in the amount of mutual information between nearby image pixels are responsible for performance.
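The two stimulus types can be generated as below. This sketch assumes image values normalised to [0, 1] and re-quantisation to bin centres. The authors' exact digitisation and amplification rules are not given, so `luminance_subsample`, `amplified_residue` and the mean-matching step are illustrative assumptions.

```python
import numpy as np

def luminance_subsample(img, b):
    """I_b: re-digitise an image with values in [0, 1] to 2**b grey levels, mean matched."""
    img = np.asarray(img, dtype=float)
    levels = 2 ** b
    idx = np.clip(np.floor(img * levels), 0, levels - 1)   # which of the 2**b bins
    out = (idx + 0.5) / levels                            # bin centre as the new value
    return out + (img.mean() - out.mean())                # restore the original mean luminance

def amplified_residue(img, b, target_std):
    """I_residue = I - I_b, rescaled to a chosen RMS (e.g. that of the white-noise image)."""
    residue = np.asarray(img, dtype=float) - luminance_subsample(img, b)
    return residue / residue.std() * target_std
```

For example, `luminance_subsample(img, 6)` yields at most 64 distinct grey levels. The matching residue holds only the fine luminance structure lost by that quantisation.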

Transient spike timing structure in neuronal responses to visual texture motion stimuli

Simon R. Schultz* and J. Anthony Movshon.

Center for Neural Science, New York University.

* Now at Wolfson Institute for Biomedical Research, University College, London.

s.schultz@ucl.ac.uk

We negotiate everyday tasks in natural environments with many brief fixations, during which motion continues across receptive fields due to self and environmental movement. As a result, the motion signal analysed in the cortex is composed of many step changes, which give rise to transient as well as sustained neuronal responses in cortical areas such as V5/MT. We studied these transient responses by recording extracellularly from neurons in the anaesthetised macaque monkey during presentation of grating, plaid and texture motion stimuli punctuated by direction changes every 320 ms. The texture motion stimuli were constructed by filtering drifting white noise with the spatial frequency tuning curve of each cell, thus concentrating power optimally in the spatial frequency bands to which the cell was responsive. We found characteristic spike timing structure in the transient, but not the sustained, responses. Differences in onset dynamics between the motion stimuli were often striking: responses to the texture motion stimuli tended to begin significantly later and to have a less prominent transient for the same level of sustained response. Increasing the amount of spatial frequency or orientation content in the stimuli reduced the transient peak height. Responses to motion signals arising in natural scenes are therefore likely to involve slower dynamics than would be predicted from grating responses.
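The texture-motion construction (spatially filtered white noise, translated rigidly, with a direction change every 320 ms) can be sketched as follows. Several details are assumptions made for illustration: the log-Gaussian tuning curve and its parameters, the integer-pixel translation, and the frame rate implied by `frames_per_segment`. In the study itself, the filter was each cell's measured tuning curve.

```python
import numpy as np

def log_gaussian_tuning(f, f_peak, fwhm_octaves):
    """Hypothetical spatial-frequency tuning curve: Gaussian on a log2 frequency axis."""
    sigma = fwhm_octaves / (2 * np.sqrt(2 * np.log(2)))
    out = np.zeros_like(f, dtype=float)
    nz = f > 0                                        # zero response at DC
    out[nz] = np.exp(-np.log2(f[nz] / f_peak) ** 2 / (2 * sigma ** 2))
    return out

def filtered_noise(n, tuning, rng):
    """White noise whose amplitude spectrum is shaped by `tuning(f)`, f in cycles/pixel."""
    f = np.hypot(np.fft.fftfreq(n)[:, None], np.fft.fftfreq(n)[None, :])
    img = np.real(np.fft.ifft2(np.fft.fft2(rng.standard_normal((n, n))) * tuning(f)))
    return img / img.std()

def drift(img, speed, directions, frames_per_segment):
    """Translate the texture rigidly; direction (radians) changes once per segment."""
    frames, pos = [], np.zeros(2)
    for theta in directions:
        step = speed * np.array([np.sin(theta), np.cos(theta)])  # (rows, cols) per frame
        for _ in range(frames_per_segment):
            shift = tuple(int(s) for s in np.round(pos))
            frames.append(np.roll(img, shift, axis=(0, 1)))
            pos = pos + step
    return np.stack(frames)

rng = np.random.default_rng(1)
texture = filtered_noise(64, lambda f: log_gaussian_tuning(f, 0.1, 1.5), rng)
movie = drift(texture, 1.0, [0.0, np.pi], 32)         # rightward, then leftward
```

Filtering is linear and shift-invariant, so filtering once and then translating gives the same result as filtering each frame of drifting noise. At an assumed 100 Hz frame rate, `frames_per_segment=32` corresponds to the 320 ms segments described above.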

Surface-colour judgements in natural scenes

Kinjiro Amano¹, David Foster¹, Sergio Nascimento².

¹Visual & Computational Neuroscience Group, University of Manchester Institute of Science and Technology, Manchester M60 1QD, United Kingdom. ²Department of Physics, Gualtar Campus, University of Minho, 4710-057 Braga, Portugal.

kinjiro.amano@umist.ac.uk

The ability of observers to make judgements about surface colour has usually been measured with either abstract planar Mondrian-like displays or three-dimensional tableaux consisting of simple geometric objects. The aim of the present study was to measure how the spatial and spectral properties of natural scenes affect this ability. Reflectance functions of rural and urban scenes were obtained with a fast hyperspectral imaging system (see e.g. Nascimento et al., 2002, J. Opt. Soc. Am. A, 19, 1484-1490), which provided spectral data at approx. 10-nm intervals over 400-720 nm at each point in a high-resolution digital image. Simulations of these scenes under different illuminants were presented on a computer-controlled colour monitor with 10-bit resolution per gun. The images on the display subtended at least 9° × 10° at a viewing distance of 90 cm. Observers’ judgements of surface colour were measured in a simple detection task (Foster et al., 2001, Proc. Natl. Acad. Sci. USA, 98, 8151-8156), organised as follows. In each trial, two images were presented in sequence for 1 s each, in the same position and with no interval between them. The images were simulations of a scene under spatially uniform daylights with correlated colour temperatures of 25000 K and 6700 K. The surface reflectance of a test region in the second image was varied from trial to trial, and the change was quantified as an equivalent local change in daylight. Observers reported whether there was a surface-reflectance change.
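Rendering a hyperspectral scene under a given illuminant amounts to a per-wavelength product of reflectance and illuminant spectral power. The sketch below makes several assumptions. It uses 33 bands at 10-nm steps from 400 to 720 nm. It takes the daylight spectra and sensor sensitivities as inputs, since they would come from published tables that are not reproduced here. It also reads "an equivalent local change in daylight" as rendering the test region under a different daylight from the rest of the scene. That reading, and all function names, are assumptions.

```python
import numpy as np

WAVELENGTHS = np.arange(400, 721, 10)   # 400-720 nm in 10-nm steps: 33 bands

def render(reflectance, illuminant):
    """Radiance = reflectance x illuminant spectral power, per pixel and wavelength.
    reflectance: (H, W, n_bands); illuminant: (n_bands,)."""
    return reflectance * illuminant[None, None, :]

def render_with_local_change(reflectance, illuminant, local_illuminant, mask):
    """Scene under `illuminant`, except the masked test region, which is rendered under
    `local_illuminant` (one hypothetical reading of an equivalent local daylight change)."""
    radiance = render(reflectance, illuminant)
    radiance[mask] = reflectance[mask] * local_illuminant
    return radiance

def to_sensor(radiance, sensitivities):
    """Project radiance spectra onto sensor or cone sensitivities of shape (3, n_bands)."""
    return np.einsum('yxl,cl->yxc', radiance, sensitivities)
```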

The variance of observers’ judgements was less with these natural scenes than with abstract patterns, although the extent of their bias was little affected. Both the spatial and spectral characteristics of the natural scenes used here may contribute towards improved performance.

Supported by the EPSRC.

Learning to see with ‘Google Image’

Lewis D Griffin.

Imaging Sciences, Medical School, King’s College London, UK

lewis.griffin@kcl.ac.uk

Google Image (GI) is an online searchable index of 425 million images on the web. Searches are defined using text with the standard Google syntax. Results are based not on image content, but rather on: “the text on the page adjacent to the image, the image caption and dozens of other factors to determine the image content.” Search results are returned as pages of up to twenty items, consisting of a link to the image and the page containing it and also a thumbnail of the image, typically 100×100 pixels. Not every image returned by GI is as closely related to the search terms as would be the results from a manually indexed image database; but the results do include a rich variety of images, for example in response to ‘cucumber’ one obtains photographs, pictures (both realistic and stylised), line drawings, etc. In size and flexibility, GI is an unparalleled database of imagery linked to language. I suggest that it has potential as a novel resource for vision research. For an initial experiment, I have taken terms from the children’s book “First Thousand Words Sticker Book” (e.g. acrobats, baby, cabbage, dance). Using the software rafabot (a high-speed, multi‑threading, large scale web spidering robot) I have, in the space of a few hours, downloaded the first 100 thumbnails for each of these 1000 searches. I will report initial results of using this database to attempt to derive the partitioning of colour space into the eleven basic colour categories.

Histograms as Descriptors for Illuminants in Natural Scenes

Florian Ciurea and Brian Funt.

Simon Fraser University, Burnaby, BC, Canada, V5A 1S6

fciurea@cs.sfu.ca

Color histograms are important in image indexing and in several color constancy methods; however, there has been little systematic study of their properties. For reasons of simplicity, several recently proposed color constancy algorithms take an image histogram as input rather than the full image. In this study, we investigate the relationship between color histograms and the illuminants found in natural scenes. We have constructed a database of approximately 11,000 images in which the RGB color of the ambient illuminant in each scene is measured. To build such a large database, we used a novel setup consisting of a digital video camera with a neutral gray sphere attached so that the sphere always appears in the field of view. Using a gray sphere instead of the standard gray card facilitates measurement of the variation in illumination as a function of incident angle. Our results show a high correlation between the distribution of colors in the image histogram and the color of the illuminant in the scene. This suggests that it is possible to build a successful color constancy algorithm that does not use the spatial information in an image. We also present an analysis of the distribution of various illuminants in the natural scenes.
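A color histogram discards where each color occurs, which is why an estimator that takes only the histogram ignores spatial structure. The sketch below builds an rg-chromaticity histogram and derives a crude illuminant estimate from its mean chromaticity. This grey-world-style estimator is a swapped-in illustration, not the authors' algorithm. The function names and the 32-bin resolution are also assumptions.

```python
import numpy as np

def chromaticity_histogram(img, bins=32):
    """Normalised 2-D histogram of rg chromaticity, r = R/(R+G+B), g = G/(R+G+B)."""
    rgb = np.asarray(img, dtype=float).reshape(-1, 3)
    total = rgb.sum(axis=1)
    keep = total > 0                                  # black pixels have no chromaticity
    r = rgb[keep, 0] / total[keep]
    g = rgb[keep, 1] / total[keep]
    hist, _, _ = np.histogram2d(r, g, bins=bins, range=[[0, 1], [0, 1]])
    return hist / hist.sum()

def mean_chromaticity(hist):
    """Crude illuminant estimate: mean (r, g, b) chromaticity read off the histogram."""
    bins = hist.shape[0]
    centres = (np.arange(bins) + 0.5) / bins
    r = float(np.sum(hist.sum(axis=1) * centres))    # marginal distribution of r
    g = float(np.sum(hist.sum(axis=0) * centres))    # marginal distribution of g
    return r, g, 1.0 - r - g
```

Shuffling the pixels of an image leaves `chromaticity_histogram` unchanged, so any estimate computed from it is, by construction, blind to spatial layout.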

How unsightly is dental fluorosis? A technique and preliminary results

David R. Simmons*, Maura Edwards‡, Lorna M. D. Macpherson‡, Kenneth Stephen‡ & Robert A. McKerlie‡.

*Department of Psychology, University of Glasgow, 58 Hillhead Street, Glasgow G12 8QB, and ‡University of Glasgow Dental School, 378 Sauchiehall Street, Glasgow G2 3JZ

d.r.simmons@psy.gla.ac.uk

Dental fluorosis occurs when excess fluoride, from any source, is incorporated into the enamel of developing teeth, resulting in a mottled appearance when the teeth erupt. The effects can range from mild white striations to more severe staining and pitting. Standard clinical photographs of teeth are usually taken as close-up views with lip retractors in place, i.e. in a situation very different from natural viewing. Consequently, a new technique was used to simulate fluorosis. Using an image database of teeth with different levels of fluorosis, the associated mottled textures were superimposed onto the teeth of a volunteer. Appropriate colour and size adjustments made the simulation more realistic. The quality of the final result was judged by dental epidemiologists experienced in fluorosis assessment, who rated the teeth on a standard scale. The resulting images were then incorporated into a website and associated with a questionnaire. High-school students were asked to judge the acceptability of the teeth on a four-point scale. Acceptability was inversely correlated with the level of fluorosis at each simulated viewing distance, but this inverse correlation petered out as the viewing distance increased. At the largest simulated distance used (roughly conversational distance), perceived acceptability differed very little between fluorosis levels. This study has established a useful methodology for looking at dental cosmetics and provides the basis for future work on generating a more detailed and rigorous model of how fluorosis affects both tooth appearance and acceptability.

Segmentation of uni- and multi-modal natural images

Robert O'Callaghan, John Lewis, Stavri Nikolov, Nishan Canagarajah and Dave Bull.

Department of Electrical and Electronic Engineering, University of Bristol, Bristol, BS8 1UB, UK.

R.J.OCallaghan@bristol.ac.uk

Human segmentation of natural images, whether implicit or explicit, is usually guided by high-level semantic understanding of the scene. This fact alone makes truly "meaningful" automated segmentation of arbitrary natural images a challenging proposition. It is possible, nonetheless, to obtain a useful partitioning of an image by examining low-level features, without contextual information. These features are related to biological early visual processes, giving rise to a kind of "pre-attentive" segmentation. We have recently proposed an unsupervised image segmentation algorithm based around this principle. The first stage computes a perceptual gradient function, marking likely texture and intensity boundaries in the image. After deriving an initial over-segmentation from this function, the second stage applies a perceptual grouping strategy, taking into account the region statistics. The results on both natural and artificial images can be seen to approximate human perception of the scenes. An extension of the algorithm to multi-modal imagery has also shown interesting potential for a combination of segmentation and fusion tasks.
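The second, grouping stage can be illustrated with a simple greedy region merge. The sketch assumes that an initial over-segmentation is already available as an integer label image. The first-stage perceptual gradient and over-segmentation are not shown. The code repeatedly merges the most similar pair of adjacent regions until their mean-intensity difference reaches a threshold. This mean-difference criterion and the function names are a hypothetical stand-in for the authors' region statistics.

```python
import numpy as np

def region_means(img, labels):
    """Mean intensity of every labelled region."""
    return {lab: img[labels == lab].mean() for lab in np.unique(labels)}

def adjacent_pairs(labels):
    """Unordered pairs of region labels that touch horizontally or vertically."""
    pairs = set()
    for a, b in ((labels[:, :-1], labels[:, 1:]), (labels[:-1, :], labels[1:, :])):
        boundary = a != b
        for p, q in zip(a[boundary], b[boundary]):
            pairs.add((min(p, q), max(p, q)))
    return pairs

def merge_regions(img, labels, threshold):
    """Greedily merge the most similar adjacent regions while their means differ by < threshold."""
    labels = labels.copy()
    while True:
        means = region_means(img, labels)
        pairs = adjacent_pairs(labels)
        if not pairs:
            break
        p, q = min(pairs, key=lambda pq: abs(means[pq[0]] - means[pq[1]]))
        if abs(means[p] - means[q]) >= threshold:
            break
        labels[labels == q] = p                      # absorb region q into region p
    return labels
```

Recomputing all statistics after every merge costs time quadratic in the number of regions. That is acceptable for a sketch; a larger implementation would typically update a region adjacency graph incrementally.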

Local features bootstrap gist perception of scenes

Alexa I. Ruppertsberg*, Bosco S. Tjan# and Heinrich H. Buelthoff.

Max-Planck-Institute for Biological Cybernetics, Tuebingen, Germany, * Now at: Department of Optometry, University of Bradford, UK and # now at: University of Southern California, Los Angeles, USA

a.i.ruppertsberg@bradford.ac.uk

Natural scenes have a complex structure in terms of the variety of objects they contain and the spatial arrangements of these objects. Yet visual perception of scenes appears to be automatic and rapid (Biederman, 1972; Potter, 1975, 1976). We used a rapid priming paradigm to investigate whether local structures are used during the first milliseconds to bootstrap scene processing and obtain the gist of a briefly presented natural scene. We defined local structure as the visual information that survives image scrambling with intact units 1.4° in size. ‘Gist’ is the information that allows an observer to perform scene categorisation, defined here as choosing, among the response prompts, a target from the test scene rather than a distractor from a very different scene. In our experiments, a scrambled version of the test scene (42 ms) was presented before the onset of the intact scene (28 ms), followed by a mask.

Results: The scrambled frame significantly facilitates perception of the gist of a scene, but the facilitation is incomplete (Exp. 1). This facilitation is not due to luminance and colour distributions (Exp. 2), and significant facilitation occurs only when the scrambled frame is presented immediately before or after the intact frame (Exp. 3). Lastly, the local structure of one scene can facilitate perception of a similar scene, but the effect is significantly reduced. Taken together, our results suggest that local structures make a significant contribution to rapid scene perception, and that rapid scene perception relies on the integration of diverse sources of information available within a brief time frame.
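Scrambling with intact square units, as used for the priming frame, can be sketched as a random permutation of image tiles. The conversion from 1.4° to pixels depends on the display, so `pixels_per_degree` is an assumed parameter. The image is also cropped to a whole number of tiles. Both are choices made for this sketch rather than details from the abstract, and the function names are hypothetical.

```python
import numpy as np

def tile_size_px(unit_deg, pixels_per_degree):
    """Side length in pixels of one intact scrambling unit (display-dependent)."""
    return int(round(unit_deg * pixels_per_degree))

def scramble(img, tile, rng):
    """Randomly permute square tiles of `img` (H x W or H x W x C), keeping each tile intact."""
    ny, nx = img.shape[0] // tile, img.shape[1] // tile
    img = img[:ny * tile, :nx * tile]                # crop to whole tiles
    rest = img.shape[2:]                             # () for grey, (C,) for colour
    tiles = (img.reshape(ny, tile, nx, tile, *rest)
                .swapaxes(1, 2)
                .reshape(ny * nx, tile, tile, *rest))
    tiles = tiles[rng.permutation(ny * nx)]         # shuffle tile order
    return (tiles.reshape(ny, nx, tile, tile, *rest)
                 .swapaxes(1, 2)
                 .reshape(ny * tile, nx * tile, *rest))
```

For example, at an assumed 30 pixels per degree, `tile_size_px(1.4, 30)` gives 42-pixel units.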

OK. Now we have “natural images”. What about “natural tasks”?

Tom Troscianko.

Dept of Experimental Psychology, University of Bristol, 8 Woodland Rd, Bristol BS8 1TN, UK

tom.troscianko@bris.ac.uk

The very concept of a “natural images” area within vision science appears absurd, since it implies that these scenes form a minor subset of visual stimuli – whereas, in reality, most vision systems of most species spend most of their time encoding natural stimuli. Of course, it has been difficult until recently to present such stimuli in controlled experiments, so that is one reason for the scientific novelty of suddenly being able to do so. But there is, of course, another problem. Once we put up “natural images” on our screen, what do we do with them? What is the observer’s task? What gets measured?

In the research that I have been doing with colleagues, the paradigm has been to investigate which aspects of the stimulus are important for making responses such as object identification or discrimination. We felt that these tasks were representative of “natural tasks” that vision has to allow us to carry out – but is this a reasonable assumption? I will give examples of these tasks from our recent studies (these tasks are sometimes only implicit), and will suggest that one of the problems that this exposes is that, while “natural images” are novel within vision science, finding appropriate “natural tasks” is still not straightforward.

Performing a naturalistic visual task when the spatial structure of colour in natural scenes is changed.

C. Alejandro Párraga, Tom Troscianko, David Tolhurst*

Department of Experimental Psychology, University of Bristol, 8 Woodland Rd., Bristol, BS8 1TN, UK; * The Department of Physiology, University of Cambridge, Downing Street, Cambridge CB2 3EG, UK.

alej.parraga@bristol.ac.uk

A previous study (Párraga, Troscianko and Tolhurst, Current Biology 10, pp. 35-38, 2000) demonstrated psychophysically that the human visual system is optimised for processing the spatial information in natural achromatic images. Here we ask whether there is a similar optimisation for the chromatic properties of natural scenes. To do this, a calibrated, 24-bit digital colour morph sequence was produced in which the image of a lemon was transformed into the image of a red pepper in small (2.5%) steps on a fixed background of green leaves. Each pixel of the image was then converted to the triplet of L, M and S human cone responses and transformed into a luminance representation [lum = L+M] and two chromatic representations [(L-M)/lum and (lum-S)/lum]. The luminance plane and the (L-M)/lum chromatic plane were Fourier-transformed, and their amplitude slopes were independently modified in fixed steps, either increased (blurring) or decreased (whitening). Recombination of the luminance and chromatic representations produced 49 different morph sequences, each with its own characteristic luminance and L-M chromatic amplitude slope. Psychophysical experiments were conducted on each of the 49 sequences, measuring observers’ ability to discriminate between a morphed version of the fruit and the original one. A control condition used the same task with monochrome information only. We found that colour information appeared to “dominate” the results, except that performance was significantly impaired when the colour information in the images was high-pass filtered. This is in keeping with the idea that colour information is most useful at low spatial frequencies, as expected from the contrast sensitivity function for isoluminant gratings.
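The colour transform and its inverse follow directly from the definitions above. The slope manipulation can be sketched as multiplying each plane's Fourier amplitude by a power of spatial frequency while leaving phase untouched. The abstract does not say how the filtered planes were normalised. Restoring the original mean and RMS contrast after filtering, the parameter name `delta`, and the function names are therefore assumptions of this sketch.

```python
import numpy as np

def to_opponent(L, M, S):
    """Luminance lum = L + M and chromatic planes (L - M)/lum and (lum - S)/lum."""
    lum = L + M
    return lum, (L - M) / lum, (lum - S) / lum

def from_opponent(lum, rg, by):
    """Invert to_opponent: L + M = lum, L - M = rg * lum, S = lum * (1 - by)."""
    return lum * (1 + rg) / 2, lum * (1 - rg) / 2, lum * (1 - by)

def change_amplitude_slope(plane, delta):
    """Multiply the amplitude spectrum by f**(-delta), keeping phase.
    delta > 0 steepens the slope (blurs); delta < 0 flattens it (whitens).
    Mean and RMS contrast are restored afterwards (an assumption of this sketch)."""
    f = np.hypot(np.fft.fftfreq(plane.shape[0])[:, None], np.fft.fftfreq(plane.shape[1])[None, :])
    gain = np.ones_like(f)                            # DC gain 1 keeps the mean
    gain[f > 0] = f[f > 0] ** (-delta)
    out = np.real(np.fft.ifft2(np.fft.fft2(plane) * gain))
    ac = out - out.mean()
    return plane.mean() + ac * (plane.std() / ac.std())
```

A grid of seven slope steps for the luminance plane crossed with seven for the (L-M)/lum plane would give the 49 combinations mentioned above. Each sequence is then recombined with `from_opponent`, with the (lum-S)/lum plane left unmodified.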

Funded by the BBSRC-UK
