Appendix A2
From 'neural net' simulations of rival views
Introduction
In some of the essays in this volume I have illustrated how infants are capable of circular re-enactment of a model's movements in face-to-face situations, entailing a virtual mirror reversal of the model's movements as perceived. Differing from egocentric observation (such as in autism), such perceptual mirror reversal would have to be supported by some sort of altercentric system operative in the brain, perhaps even neurons sensitized to altercentric perception, perhaps in competition with systems or networks that subserve egocentric perception. In the last part of this appendix I make some preliminary notes on a project in progress to explore, by connectionist simulation, networks 'trained' to realize such mirror reversal, compared to networks 'trained' only for egocentric perception.
Proceeding from the assumption that competitive networks are involved, I shall first succinctly report on some neurocomputational explorations with competitive Neocognitron networks (Fukushima 1986) which we 'trained' from rival perspectives and exposed to ambiguous visual stimuli (Bråten & Espelid 1989).
Perception of Visual Ambiguity: Neurocomputational Explorations
Faced with figures inviting visual ambiguity such as (i) Necker's cube or (ii) Rubin's vase (Fig. A2.1), humans, when setting their minds to it, are able to shift between rival images evoked by the same form, for example alternating between seeing a vase and seeing two faces, or between seeing the cube facing left or facing right.
Neurocomputational approaches to visual ambiguity
Visual ambiguity in the form of the cube presented by L. A. Necker in 1832 (Fig. A2.1 (i)) has been subject to much discussion and to many attempts at connectionist modelling and neurocomputation in terms of parallel distributed processing (PDP). For example, Rumelhart et al. (1986) report on runs of a simulation of Necker cube processing by a connectionist network model in terms of the two rival views of the cube front facing left or the cube front facing right. They show that the system will (almost always) end up in a situation in which all the units in one subnetwork are fully activated and none of the units in the other subnet are activated. That is, the system settles in the stable state (or fixed point) of interpreting the Necker cube as either facing left or facing right. This occurs when the input values are low relative to the strength of the constraints among units. Under high input conditions, the implemented system occasionally yields the "impossible" interpretation that the cube has two front faces. While this has the merit of retaining both the left-hand and the right-hand perspective in operation, it makes for an "impossible perception" through fusing the two.
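To make the constraint-satisfaction idea concrete, the following minimal sketch implements a toy two-subnetwork model in the spirit of the account just summarized. It is not the Rumelhart et al. implementation: the two-times-eight unit layout, the weight values, the stochastic update rule and the input levels are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)

N_VERTICES = 8                     # vertices of the line drawing
N_UNITS = 2 * N_VERTICES           # one unit per vertex per interpretation:
                                   # units 0..7  -> "front face to the left"
                                   # units 8..15 -> "front face to the right"

# Symmetric weights: mutual support among units of the same interpretation,
# mutual inhibition between units of the two rival interpretations.
W = np.full((N_UNITS, N_UNITS), -0.5)          # default: cross-inhibition
for interp in (0, 1):
    idx = np.arange(N_VERTICES) + interp * N_VERTICES
    W[np.ix_(idx, idx)] = 0.4                  # excitation within an interpretation
np.fill_diagonal(W, 0.0)                       # no self-connections

def settle(external_input, steps=5000, temperature=0.5):
    """Asynchronous stochastic updates until the net relaxes into a stable state."""
    a = rng.random(N_UNITS)                    # random initial activations
    for _ in range(steps):
        u = rng.integers(N_UNITS)              # pick one unit at random
        net = W[u] @ a + external_input
        p_on = 1.0 / (1.0 + np.exp(-net / temperature))
        a[u] = 1.0 if rng.random() < p_on else 0.0
    return a

# Low external input relative to the constraints: the net settles on ONE
# interpretation, with one sub-net fully active and the other silent.
a = settle(external_input=0.2)
print("low input  -> left units on:", a[:8].sum(), " right units on:", a[8:].sum())

# High external input can overwhelm the rivalry so that both sub-nets saturate:
# the "impossible" cube with two front faces mentioned above.
a = settle(external_input=3.0)
print("high input -> left units on:", a[:8].sum(), " right units on:", a[8:].sum())
```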
In terms of his neural net model, Malsburg proposes that a coherent image comes about through temporal coherence of the firing of neurons, exhibiting synchrony in spite of their locations in distant columns. When there are two superposed figures, the cell assemblies coding for the different figures should be expected to be activated in alternation (Malsburg et al. 1978, 1986).
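Such alternation between two rival assemblies can be illustrated with a generic mutual-inhibition-plus-adaptation sketch. This is only a stand-in, not Malsburg's model: the firing-rate equations and all parameter values below are assumptions made for illustration.

```python
# Two "assemblies", A and B, each driven by the same ambiguous stimulus.
# Mutual inhibition lets only one dominate at a time; slow adaptation of the
# dominant assembly eventually hands dominance to the other, so the two
# alternate. All parameters below are illustrative assumptions.

dt, T = 0.01, 60.0                  # time step and total simulated time (seconds)
steps = int(T / dt)
tau_r, tau_adapt = 0.02, 2.0        # fast firing-rate dynamics, slow adaptation
drive = 1.0                         # identical stimulus drive to both assemblies
w_inh, w_adapt = 2.0, 2.0           # mutual inhibition and adaptation strength

def f(x):                           # threshold-linear firing-rate function
    return max(x, 0.0)

rA = rB = 0.0                       # firing rates of the two assemblies
sA = sB = 0.0                       # their adaptation variables
dominant = []

for _ in range(steps):
    inA = drive - w_inh * rB - w_adapt * sA
    inB = drive + 0.01 - w_inh * rA - w_adapt * sB   # tiny bias breaks the tie
    rA += dt / tau_r * (-rA + f(inA))
    rB += dt / tau_r * (-rB + f(inB))
    sA += dt / tau_adapt * (-sA + rA)
    sB += dt / tau_adapt * (-sB + rB)
    dominant.append("A" if rA > rB else "B")

# Several switches in dominance are expected over the run: the two assemblies
# take turns being active, as in the alternation described in the text.
switches = sum(dominant[i] != dominant[i - 1] for i in range(1, steps))
print("dominance switches during the run:", switches)
```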
Neurocomputational nets facing visual ambiguity without contextual clues
Fukushima (1986; 1988) has developed a "self-organizational" multi-layered network model, the Neocognitron, which allows for recognition of superimposed figures, such as Figure A2.2 (i). Having backward paths, it is capable of selective feature 'attention'. The Neocognitron can "recognize" different handwritten versions of characters and digits, and can also handle certain types of visual ambiguity. For example, when trained to recognize the individual patterns in the set (0, 1, 2, 3, 4), the Neocognitron will 'attend' to and recognize selectively three distinct patterns, in the order (4, 2, 1), in the stimulus shown in Fig. A2.2 (i).
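As a rough illustration of the kind of building blocks involved, the sketch below pairs a layer of feature-matching 'S-cells' with a layer of pooling 'C-cells'. This is only a schematic echo of the Neocognitron's S- and C-layers: the template, threshold, pooling size and the absence of any self-organizing learning are simplifying assumptions, not Fukushima's architecture or training procedure.

```python
import numpy as np

def s_layer(image, template, threshold=0.9):
    """S-cells: sliding normalized match of a local feature template."""
    h, w = template.shape
    H, W = image.shape
    out = np.zeros((H - h + 1, W - w + 1))
    t = template / np.linalg.norm(template)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + h, j:j + w].astype(float)
            norm = np.linalg.norm(patch)
            if norm > 0:
                out[i, j] = 1.0 if patch.ravel() @ t.ravel() / norm >= threshold else 0.0
    return out

def c_layer(s_map, pool=2):
    """C-cells: local max pooling gives tolerance to small positional shifts."""
    H, W = s_map.shape
    return np.array([[s_map[i:i + pool, j:j + pool].max()
                      for j in range(0, W - pool + 1, pool)]
                     for i in range(0, H - pool + 1, pool)])

# A tiny binary image containing a vertical stroke (as in a handwritten '1').
image = np.zeros((8, 8), dtype=int)
image[1:7, 3] = 1

vertical_bar = np.array([[0, 1, 0],
                         [0, 1, 0],
                         [0, 1, 0]])

s = s_layer(image, vertical_bar)
c = c_layer(s)
print("S-cell responses:\n", s)
print("C-cell (pooled) responses:\n", c)
```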
When faced, however, with the kind of ambiguous stimuli illustrated in Fig. A2.1, or in Fig. A2.2 (ii), calling upon rival perspectives, we found that such a single multilayer network of the Neocognitron kind could not cope.2
Compare Selfridge's well-known example of instant disambiguation by virtue of available context: in the display "THE CAT", the middle character of each word is drawn as one and the same ambiguous glyph, yet it is instantly read as "H" in the first word and as "A" in the second.
For example, the above label (Fig. A2.2 (ii)) may be read as 125, as IZS, as 12 S, as IZ 5, as I 25, etc. When a single Neocognitron network, 'trained' to recognize both digits and characters, was exposed to the above series, the implemented network failed to come out with significant results, at least in our trials.
The idea of competitive networks
When, however, one such network was trained to "recognize" digital forms, and another was trained to "recognize" letter forms, each came out with clear forms according to their respective "perspectives". Our basic idea is this. Define a perspective P as a related set of viewpoints, p1, p2, ..., pn, which evoke companion viewpoints, q1, q2, ..., qn, as members of a complementary set Q. Let these complementary preferences in viewing the world be imposed by the systems designer or "trainer". Train competing networks each to operate from a single perspective, for example from a perspective P that restricts the world to faces, while the other is 'trained' from the rival perspective, Q, limiting its viewpoints to vases or goblets. When exposed to visual ambiguity, without contextual clues, permit these P- and Q-nets to operate concurrently, each from their own (trained) perspective. Then, when faced with Rubin's vase, the P-net is expected to recognize the silhouetted faces, while the Q-net is expected to see the vase. When allowed to engage in 'dialogue' with their respective outputs at a higher-order level, they will complement each other, or one of them will prevail, given clues that come from other sources, perturbing or supporting the viewpoints in question.
When one of the networks, the P-net, trained to "recognize" digital forms, was exposed to the pattern (ii) in Figure A2.2, it generated forms conforming to the digits (1 2 5), while the Q-net, trained to "recognize" letter forms, generated forms conforming to the letters (I Z S). Each thus came out with clear forms according to its respective "perspective" when exposed to the partly deformed patterns of Fig. A2.2 (ii). Indeed, while a single network appears to defy "training to recognize" patterns of similar form that conform to elements of both the above series,3 the trained P- and Q-nets appeared capable of "recognizing" even highly distorted forms as conforming to viewpoints in terms of their respective perspectives.4
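The P-net/Q-net idea can be sketched with two small one-layer 'nets' (softmax classifiers), each 'trained' only on the prototypes of its own perspective and then exposed to the same slightly distorted, ambiguous pattern. The 5x5 prototypes, the network form and the training regime below are invented for illustration; the actual explorations used adapted Neocognitron networks (see note 3).

```python
import numpy as np

rng = np.random.default_rng(1)

# 5x5 binary prototypes; '1' and 'I', '2' and 'Z', '5' and 'S' deliberately
# resemble each other, so the same stroke pattern is ambiguous across perspectives.
PROTOS = {
    "1": "00100 01100 00100 00100 01110",
    "2": "11110 00010 01100 10000 11111",
    "5": "11111 10000 11110 00001 11110",
    "I": "01110 00100 00100 00100 01110",
    "Z": "11111 00010 00100 01000 11111",
    "S": "01111 10000 01110 00001 11110",
}

def vec(s):
    return np.array([int(c) for c in s.replace(" ", "")], dtype=float)

def train_net(labels, epochs=500, lr=0.5):
    """One-layer softmax 'net' trained only on the prototypes of one perspective."""
    X = np.stack([vec(PROTOS[l]) for l in labels])       # (n_classes, 25)
    Y = np.eye(len(labels))                              # one-hot targets
    W = rng.normal(0, 0.1, (25, len(labels)))
    b = np.zeros(len(labels))
    for _ in range(epochs):
        Z = X @ W + b
        P = np.exp(Z - Z.max(axis=1, keepdims=True))
        P /= P.sum(axis=1, keepdims=True)
        grad = P - Y                                     # softmax cross-entropy gradient
        W -= lr * X.T @ grad / len(labels)
        b -= lr * grad.mean(axis=0)
    return labels, W, b

def read(net, pattern):
    labels, W, b = net
    return labels[int(np.argmax(pattern @ W + b))]

p_net = train_net(["1", "2", "5"])      # perspective P: "the world contains digits"
q_net = train_net(["I", "Z", "S"])      # perspective Q: "the world contains letters"

# A slightly distorted ambiguous pattern (the 'Z'/'2'-like prototype with one
# pixel flipped); each net reads it from its own perspective.
ambiguous = vec(PROTOS["Z"])
ambiguous[7] = 1 - ambiguous[7]
print("P-net reads:", read(p_net, ambiguous))   # a digit from P's repertoire (typically '2')
print("Q-net reads:", read(q_net, ambiguous))   # a letter from Q's repertoire (typically 'Z')
```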
The fact that human perceivers -- in the absence of contextual clues -- can alternate between two such complementary viewpoints suggests that humans are able to house complementary viewpoints in parallel or near-parallel. The kind of perceptual alternation invited by Rubin's vase (Fig. A2.1), for example, appears to presuppose a structure capable of embodying complementary perspectives in concurrent or near-concurrent operation.
Altercentric (mirror) reversal in face-to-face learning situations
Structural prerequisites for the kind of perceptual alternation explored above may relate to the prerequisites for the kind of perceptual mirror reversal invited by the gestures and movements of a facing other. In the project now turned to, I proceed from the assumption that competitive systems or networks subserving, respectively, egocentric and altercentric perception somehow must be at play.
Again, as in the above use of neurocomputational simulations to study the behaviours of implemented networks 'trained' to process in terms of rival viewpoints, we study the behaviour of networks 'trained', respectively, to reproduce a copy of a given input pattern and to generate the reverse of that pattern. Nothing much is expected to come out of such crude connectionist simulations, providing, as they do, mostly an explorative playground. Yet it is worthwhile pursuing for this reason: the very process of implementing and studying the behaviours of such competitive networks compels one to consider possible operational prerequisites for such processes in real life.
When asked to do what the facing model is doing
As has been documented in this volume and elsewhere (chapter 5 in Bråten (ed.) 1998: 105-124), infant learners appear able to feel themselves to be moving with the other's movements, entailing a virtual mirror reversal of the facing other's movements as perceived. This I have termed 'altercentric participation', the very reverse of egocentric perception. For example, when an adult model raises her hands with palms outwards and asks a child facing her to do what she does, the ordinary child will imitate that gesture correctly with palms outwards (Figure A2.3 (left)). A child confined only to egocentric perception, however, will fail to execute such a mirror reversal. Identifying the inside of his own hands with the inside of the model's hands shown to him, such a child is expected to raise his arms with palms inwards (Figure A2.3 (right)). This has been predicted to apply to autism, from the assumption that the ordinary capacity for such mirror reversal has been impaired or blocked in children with autism, creating problems in face-to-face situations (Bråten 1994).
The illustration to the left in Figure A2.3 pictures what normal children do when invited to do what the adult is doing, entailing mirror reversal, while someone with an egocentric perspective, such as in autism, is expected to have problems: seeing the resemblance between the inside of the model's hands and the insides of his own, and being incapable of a virtual reversal of the model's movements as felt, the child with autism who understands the request to do as the adult does has been predicted to do what is seen from his own position, and will raise his hands with palms inwards. This has been confirmed (cf. inter alia Whiten & Brown in Bråten (ed.) 1998: 260-280).
On the neurosociological prediction and speculations about possible architecture
My neurosociological prediction (1997) that an altercentric (mirror) system would be found in humans (essay no. 16, this volume) followed from the postulate of a virtual other mechanism. Pondering the possible architecture by which the infant's bodily self and virtual other could complement each other, I realized that their relation would have to be chiral (from the Greek for handedness), like the way in which the left hand and the right hand can only become identical in form if one of them is reflected in a mirror.
The radical step, however, to the expectation that perhaps even neurons sensitized to altercentric perception might be found, was voiced upon reflecting on what I had learnt at a King's College workshop on perception of subjects and objects. Here John O'Keefe (1992) raised the issue of the relation between self-consciousness and allocentric maps, that is, maps of the landscape in front of you, represented in such a manner that you represent it independently of your gaze direction in relation to the landscape, transcending a view from an egocentric perspective. O'Keefe and Nadel (1978) suggest that the hippocampus implements a cognitive map and performs spatial computation. Rats are able to find their way in an environment even when novel trajectories are necessary, they hold, in virtue of hippocampal maps that hold information about allocentric space, as contrasted with egocentric space. In a study of a monkey moved to different places in a spatial environment, O'Keefe (1984) found evidence of place cells, dependent upon the place where the monkey was, as distinct from view cells, defined primarily by the view of the environment and not by the place where the monkey was. Feigenbaum and Rolls (1991) investigated whether the spatial views encoded by primate hippocampal neurons use egocentric or some form of allocentric coordinates (see Rolls 1995).
This inspired me, then, to venture the prediction that neurons sensitized to altercentric (mirror) reversal would be found in infant learners who re-enact a facing model's novel movements or gestures. If realized at the neurophysiological level, this would entail that there be neural cells responding to the gestural movements of others not just egocentrically (that is, view-dependently), not just allocentrically (that is, independently of your own position and perspective), but altercentrically, that is, from the other's position or perspective. If such neural cells should not be found, then infant learners' altercentric participation in the facing model's movements would have to be subserved by some higher-order system for mirror reversal.
Thus, in addition to networks operating upon egocentric and allocentric cells in the learner's system, altercentric cells or a higher-order mirror system is expected to exist and be activated in order that the learner be able to do what the facing other does, i.e. to re-enact from the learner's position what the learner has felt to be co-enacting with the model in virtue of alteroceptive reversal of the model's movements.
May such an altercentric (mirror) system be operative already in neonatal imitation?
One may speculate about whether or not such a mirror system is dependent upon being sensitized or "trained" in order to be operative. I expect that it is. While predicting it to be innate, I am reluctant to attribute such a mirror system as already in operation in neonatal imitation in the first hour of life. But this is a matter for further investigation. Heimann (1997, personal communication) suspects it to be at play already from the outset. There is, indeed, the possibility that it is already present at birth, while not observed in children who are later diagnosed as autistic (cf. Heimann's analyses in Bråten (ed.) 1998: 89-104).
That some neonates in Meltzoff & Moore's (1989) study exhibited reversed head rotation when imitating the facing model after a pause might indicate a neonatal capacity for mirror reversal. And yet, even if there may be such a capacity already at birth, I would expect it to be dependent upon interactional nurture in order to be sustained and strengthened. While capable of including actual others in their companion space in felt immediacy, young infants would depend, I suspect, on such face-to-face nurture in felt immediacy in order that such a mirror reversal system be sensitized and "trained" to respond to companions in an altercentric manner. The first months -- before infants engage with their companions in joint attention towards objects in the surround -- may turn out to be critical in that respect.
In 1973 Malsburg presented a nerve net model for the visual cortex of higher vertebrates. His point of departure was the findings of Hubel and Wiesel (1962, 1963). They found neurones selectively sensitive to the presentation of light bars and edges of a certain orientation to be organized in functional columns according to orientation, such that neighbouring columns on the cortical surface tend to respond to stimuli of similar orientation. Malsburg, then, models the orientation-sensitive cells in terms of these two mechanisms: development of pattern-sensitive cortical cells by a self-organizing process involving synaptic "learning", and the ordering of functional columns as a consequence of intracortical connections. A certain proportion of the model cells is orientation-sensitive already before the learning principle is applied. When exposed to only a restricted set of stimuli during the training period, the model's cortical neurones will specialize to these stimuli and become insensitive to other stimuli. This is consistent with what Wiesel (1987) reports about how newborn monkeys and kittens exhibit marked changes in the ocular dominance columns when influenced by visual training.
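The specialization effect described here -- cells exposed only to a restricted set of stimuli becoming responsive to those and remaining unresponsive to others -- can be mimicked with a crude competitive Hebbian sketch. It does not reproduce Malsburg's 1973 equations or the ordering into columns; the stimulus set, cell number, learning rate and winner-take-all rule below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

SIZE = 7  # each stimulus patch is SIZE x SIZE

def bar(orientation):
    """Binary bar through the centre at 0 (horizontal), 90 (vertical) or 45 degrees."""
    img = np.zeros((SIZE, SIZE))
    if orientation == 0:
        img[SIZE // 2, :] = 1
    elif orientation == 90:
        img[:, SIZE // 2] = 1
    elif orientation == 45:
        img[np.arange(SIZE), np.arange(SIZE)] = 1
    return img.ravel() / np.linalg.norm(img)

N_CELLS = 6
W = rng.random((N_CELLS, SIZE * SIZE))
W /= np.linalg.norm(W, axis=1, keepdims=True)     # keep total synaptic strength bounded

def train(orientations, sweeps=400, lr=0.2):
    """Competitive Hebbian learning: the best-matching cell moves toward the stimulus."""
    for _ in range(sweeps):
        x = bar(rng.choice(orientations))
        winner = int(np.argmax(W @ x))
        W[winner] += lr * x                       # Hebbian update of the winner only
        W[winner] /= np.linalg.norm(W[winner])    # synaptic normalization

# "Restricted" rearing: the model only ever sees horizontal and vertical bars.
train(orientations=[0, 90])

for ori in (0, 90, 45):
    print(f"max response to {ori:>2} deg bar: {np.max(W @ bar(ori)):.2f}")
# Bars at the trained orientations evoke strong responses; the unseen 45-degree
# bar evokes only weak, unspecific ones -- the cells have specialized to the
# restricted rearing set.
```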
This specialization effect, then, when applied to human infants, would lead one to expect that altercentric neuronal firing, complementing the egocentric neurons of the bodily self, would be dependent upon critical nurture during the first months in order to complement the kind of allocentric neurons stimulated and trained when infants join with companions in paying attention to objects in the common surroundings.
There is, however, another possibility. Instead of assuming the capacity for alteroceptive reversal to rest upon specific altercentric nerve cells, whether or not dependent upon training, that capacity may rest upon a more globally organized capacity which operates upon egocentric and allocentric, place-dependent and view-dependent neural responses, and carries out such translation and alteroceptive reversal in a supramodal manner. Hence, a more global mirror system for altercentric participation may be envisaged.
On connectionist simulation of some elements
The above are the kinds of questions in the background of the crude connectionist explorations in progress, albeit without any expectation of an illuminating reply. My limited objective is just to demonstrate the operational feasibility of 'training' different versions of implemented 'neural net' simulators to reproduce input patterns, corresponding to the Gestalt of a manual gesture, in a copying (egocentric) manner and in a reversed (altercentric) manner.
We first began by simulating, by way of connectionist networks, such different responses to the arm-raising Gestalt pictured in Figure A2.3. Although we did manage to train for reversal, it turned out to be too cumbersome to explore with our fairly simple networks, incapable as they were of responding to movements; we had to translate the movement into a sequence of pattern snapshots representing the move from lowered to raised hands.
While I hope to renew our efforts with more refined network models with reference to arm raising, I found in the meantime a simpler pattern which could be explored without interposing representations of movements. This is the modern hand sign for 'Jesus', represented by a gesture marking a stigma in the hand, illustrated to the left of Figure A2.4. At the top to the left is shown the hand sign as seen by a perceiver facing the enactor of the sign. Below is shown the reversed image corresponding to how the perceiver, upon re-enactment of the gesture, will see the sign from his own (egocentric) stance. For the facing perceiver, then, in order to re-enact the gesture as seen made by the other, the image has to be reversed. This inversion pattern has turned out to be easy to explore even with a simple three-layer recurrent network (of the family shown in Figure A2.4 (ii), but without contextual nodes), compared with recognition from an egocentric stance.
The three-layer architecture of simple recurrent networks (Elman 1990; Plunkett & Elman 1997) has been used as a first candidate in these explorations of the hand sign. It has been designed for connectionist simulation of perceptual detection of patterns displaced in time. Such an implemented network consists of one input layer (symbolized at the bottom in Figure A2.4 (ii)), one output layer (symbolized at the top), and an intermediate layer with 'hidden nodes'.5
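A schematic forward pass of such a simple recurrent network, with context nodes holding a copy of the previous hidden activations (cf. note 5), might look as follows. The layer sizes and random weights are illustrative; this is not the tlearn implementation itself.

```python
import numpy as np

rng = np.random.default_rng(3)

N_IN, N_HID, N_OUT = 64, 50, 64               # illustrative layer sizes

# Weight matrices of an Elman-style simple recurrent network.
W_ih = rng.normal(0, 0.1, (N_HID, N_IN))      # input   -> hidden
W_ch = rng.normal(0, 0.1, (N_HID, N_HID))     # context -> hidden
W_ho = rng.normal(0, 0.1, (N_OUT, N_HID))     # hidden  -> output

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def run_sequence(inputs):
    """Forward pass over a sequence: the context units hold a copy of the
    previous hidden activations and feed it back at the next time step."""
    context = np.zeros(N_HID)                 # context starts empty
    outputs = []
    for x in inputs:
        hidden = sigmoid(W_ih @ x + W_ch @ context)
        outputs.append(sigmoid(W_ho @ hidden))
        context = hidden.copy()               # copy hidden(t) for use at t+1
    return outputs

# Example: a short sequence of random binary "snapshots" of an 8x8 pattern.
sequence = [rng.integers(0, 2, N_IN).astype(float) for _ in range(3)]
outs = run_sequence(sequence)
print("output shape per step:", outs[0].shape)
```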
For example, in some of our attempts, using a low-resolution image of the hand sign, we have used two identical three-layer networks, an Ego-net and an Alter-net, each with an input layer of 838 nodes and an output layer of 838 nodes, and with 50 nodes in the intermediate (hidden) layer. While the Ego-net is trained to reproduce an image of the input pattern without any reversal, the Alter-net is trained to reproduce the reverse image (corresponding to the pattern at the bottom of Figure A2.4 (i)). Although it should have been expected, given the design of the tlearn programme, I was surprised to see that, given the same number of training sweeps6 (400 presentations of the input pattern), the Ego-net and the Alter-net did not differ much in terms of error or time. It made me realize that if there be distinct systems subserving respectively egocentric and altercentric perception in humans, then, upon being sensitized or trained, they may not differ much in terms of expediency and delay time. If, on the other hand, the altercentric system depends on egocentric input for its subsequent mirror reversal, then differences in delay time should be expected.
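The Ego-net/Alter-net comparison can be re-created in miniature with a plain three-layer feedforward network trained by backpropagation on a single pattern: one copy takes the pattern itself as target, the other its mirror image. The 8x8 stand-in pattern, the layer sizes and the learning rate below are illustrative assumptions (only the 400 sweeps follow the text); this is not the original tlearn run.

```python
import numpy as np

rng = np.random.default_rng(4)

# An 8x8 binary stand-in for the hand-sign image (illustrative, not the actual stimulus).
pattern = rng.integers(0, 2, (8, 8)).astype(float)
x = pattern.ravel()                        # input as seen from the facing other's side
ego_target = x.copy()                      # Ego-net: reproduce the pattern as given
alter_target = np.fliplr(pattern).ravel()  # Alter-net: reproduce its mirror reversal

N_IN, N_HID, N_OUT = 64, 50, 64

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(x, target, sweeps=400, lr=0.5):
    """Three-layer net trained by plain backpropagation on a single pattern."""
    W1 = rng.normal(0, 0.1, (N_HID, N_IN))
    W2 = rng.normal(0, 0.1, (N_OUT, N_HID))
    for _ in range(sweeps):
        h = sigmoid(W1 @ x)                # forward pass
        y = sigmoid(W2 @ h)
        err = y - target                   # output error
        delta_out = err * y * (1 - y)      # backpropagate through the sigmoids
        delta_hid = (W2.T @ delta_out) * h * (1 - h)
        W2 -= lr * np.outer(delta_out, h)  # weight updates
        W1 -= lr * np.outer(delta_hid, x)
    return float(np.mean((sigmoid(W2 @ sigmoid(W1 @ x)) - target) ** 2))

print("Ego-net   final error:", round(train(x, ego_target), 4))
print("Alter-net final error:", round(train(x, alter_target), 4))
# For this feedforward architecture the mirrored target is no harder than the
# copied one, so the two errors come out of the same order of magnitude --
# in line with the observation reported in the text.
```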
In another experimental run, the Ego-net was compared to an Alter-net that was instead trained to respond both in terms of the ego-input and in a reversed manner. Given the same number of training sweeps, the Ego-net learnt to reproduce its target more speedily and accurately than the Alter-net. Again, this is to be expected.
Preliminary conclusion
The above leads me to suspect that an evolved system sensitized to altercentric (mirror) perception may be as speedy in operation as a system operating from an egocentric perspective, incapable of reversal. At least, this is what our crude and preliminary explorations with simple recurrent networks trained from these rival perspectives suggest. This may be due to the architecture of the tlearn networks used: for that simple architecture it did not much matter whether the target pattern was reversed or not. For a network, however, that has to cope both with an egocentric target and an altercentric target, it certainly makes a difference.
Thus, explorations even with a crude and simple network architecture may have put us on the track of illuminating this question: does mirror reversal depend on altercentric neurons being discharged, or rather on a system at a higher operative level that also has to cope with egocentric input? If the latter is the case, then a time difference should be expected.
For those who would object to the kind of reductionism they see to be entailed by the above prediction, speculations, and crude connectionist simulations, I would like to point out by way of conclusion that they concern questions about how the very self-other connecting link between face-to-face participants may be subserved by, not reduced to, neuropsychological mechanisms. Bearing in mind the dynamic interpersonal companion systems level examined in many of the essays in this volume, I expect tenable replies to entail a transition from neuropsychology to neurosociology.
1 This appendix includes notes for a talk given November 19, 1996, at the weekly seminar of my Theory Forum group 1996-97 at The Centre for Advanced Study, Oslo. The first part refers to a project on visual ambiguity, carried out with Rune Espelid at Bergen Scientific Centre IBM (Bråten & Espelid 1989), in which Fredrik Manne and Petter Møller assisted with implementation in C. The second part contains preliminary notes for a project in progress, with Anders Nøklestad as my research assistant, to explore by crude connectionist simulations competing networks trained to process input patterns as presented and by mirror reversal. For the hand-raising pattern (Fig. A2.3) programmes have been implemented in Java 1.02, while for the 'Jesus figure' (Fig. A2.4) the tlearn programme (from the Oxford site, UK) is being used.
2 The question is relevant for the kind of visual ambiguity exhibited by technical document patterns for which the immediate neighbouring context provides no clues for correct interpretation by conventional automated means. Normally, we use the context to disambiguate, for example when the same distinct pattern in one instant may be seen as "H" and in the next instant may be seen as "A", as in the "THE CAT" display referred to in the text.
3 Two different adapted versions of the Neocognitron model have been implemented in C and run on an IBM 3090/200VF and on an IBM 6150 (RT/PC). Trained to recognize nine clearly distinguishable patterns conforming to the elements in the series (0, 1, 2, ..., 9), they are capable of recognizing partly deformed input patterns in the Gestalt of the respective numbers, and even of extracting superposed forms. However, when confronted with an ambiguous stimulus that may permit recognition both as characters and as numbers, we were unable -- even when a variety of inhibition values were tried out -- to train a particular network to recognize characters and numbers in such a manner that it could recognize a character pattern and a number pattern in the same distorted stimulus.
4 So-called back-propagation, i.e. regulatory feedback correcting a pattern based on some externally set criterion or target pattern, works within the limited perspective that each net has been "trained" for, but not across the networks, separately implemented and trained from different perspectives, implying different targets. Conversation in view of global or more distant contextual information from the past may be required for resolving their difference, for example, in the case of exposure to the series in Fig. A2.2 (ii), about how labels in this document tend to adhere to the rule (character, character, digit). Should a different solution later be required, there may be a reversal to the discarded alternatives by virtue of the parallel competitive net.
5 In addition, special context nodes (symbolized by circles) may serve to store a copy of the hidden node activation pattern at t, feeding it to the hidden nodes at t+1. The hidden nodes' activation pattern at t would thereby be afforded to the hidden nodes at t+1 in conjunction with the new input pattern at t+1.
6 During network training, error is assigned to the hidden units by back-propagating the error from the output, where the error reflects the deviance of the output pattern from the training target pattern. The error on the output and hidden units is used to change the weights on the intermediate and input layers. Simple recurrent network learning and backpropagation through time, of the kind implemented in tlearn, have been developed by Plunkett, Elman and others to explore processes in children's language acquisition, for example, learning of the English past tense.