Spike Processing Machines
The brain must be capable of forming object representations that are invariant to the many sources of variation in the retinal image. These include changes in object position, scale, pose and illumination, and the presence of clutter. What are some plausible computational or neural mechanisms by which invariance could be achieved in the spike domain? We pioneered the realization of identity-preserving transformations (IPTs) on visual stimuli in the spike domain. The stimuli are encoded with a population of spiking neurons; the resulting spikes are processed and finally decoded. A number of IPTs have been demonstrated, including faithful stimulus recovery as well as simple transformations of the original visual stimulus such as translations, rotations and zooming.
- Aurel A. Lazar, Eftychios A. Pnevmatikakis and Yiyin Zhou, The Power of Connectivity: Identity Preserving Transformations on Visual Streams in the Spike Domain, Neural Networks, Vol. 44, pp. 22-35, 2013.
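The encode, process, decode pipeline can be pictured with a toy linear model. The sketch below is a strong simplification and not the circuit of the cited paper: it assumes a 1-D circular stimulus, noiseless linear measurements standing in for spike times, and receptive fields that are circular shifts of one hypothetical kernel `g`. Under those assumptions, re-routing the measurements between encoder and decoder (a permutation of connections) makes the unchanged decoder recover a translated stimulus.

```python
import numpy as np

N = 32                                  # number of stimulus samples and of "neurons" (assumed)
rng = np.random.default_rng(0)
u = rng.standard_normal(N)              # toy 1-D circular stimulus

# Hypothetical receptive-field kernel; row i of Phi is g circularly shifted by i samples.
g = 0.5 ** np.arange(N)
Phi = np.stack([np.roll(g, i) for i in range(N)])

q = Phi @ u                             # linear measurements standing in for spike-time measurements


def decode(measurements):
    """Recover the stimulus by solving Phi @ u_hat = measurements."""
    return np.linalg.solve(Phi, measurements)


shift = 5
u_identity = decode(q)                  # faithful recovery (identity transformation)
u_shifted = decode(np.roll(q, shift))   # same decoder, inputs re-routed by a circular shift

print(np.allclose(u_identity, u))
print(np.allclose(u_shifted, np.roll(u, shift)))
```

Both checks should print `True`: the decoder itself never changes, only the wiring that feeds it. Rotations and zooming would need receptive fields related by those transformations, which this 1-D toy does not attempt to model.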
A visual illustration of Identity Preserving Transformations (IPTs) on a visual stream performed in the spike domain is shown below.
The original visual stimulus (far left, 256x256 pixels) is encoded with a Video Time Encoding Machine (Video TEM) realized with a neural circuit consisting of 18,605 Gabor receptive fields in cascade with an equally sized population of Integrate-and-Fire (IAF) neurons. The second video from the left demonstrates recovery from a zoom-in transformation performed in the spike domain. The area inside the dashed circle corresponds to the region inside the dashed circle in the original video. The third video from the left shows recovery from a zoom-out transformation. Finally, the rightmost video depicts recovery from a simultaneous zoom-out and rotation transformation.
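A single branch of such a circuit, one Gabor receptive field feeding one IAF neuron, can be sketched as follows. This is a minimal illustration under stated assumptions, not the configuration used in the demo: the kernel parameters, the toy 17x17 drifting-grating video, and the ideal IAF parameters `bias`, `kappa` and `delta` are all hypothetical, and the reset-by-subtraction rule is one common idealization.

```python
import numpy as np


def gabor_kernel(size, sigma, wavelength, theta):
    """Cosine-phase 2-D Gabor function on a size x size grid (size should be odd)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    x_rot = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    return envelope * np.cos(2.0 * np.pi * x_rot / wavelength)


def iaf_encode(v, dt, bias, kappa, delta):
    """Ideal IAF neuron: integrate (v + bias) / kappa; at threshold delta, spike and subtract delta."""
    membrane = 0.0
    spike_times = []
    for k, v_k in enumerate(v):
        membrane += dt * (v_k + bias) / kappa
        if membrane >= delta:
            spike_times.append((k + 1) * dt)
            membrane -= delta
    return np.array(spike_times)


# Toy video (assumed): 1 s sampled at 1 kHz, 17 x 17 pixels, a grating drifting along x.
dt = 1e-3
t = np.arange(0.0, 1.0, dt)
yy, xx = np.mgrid[-8:9, -8:9].astype(float)
video = np.cos(2.0 * np.pi * (xx[None, :, :] / 8.0 - 2.0 * t[:, None, None]))

kernel = gabor_kernel(size=17, sigma=4.0, wavelength=8.0, theta=0.0)
kernel /= np.abs(kernel).sum()          # keeps |v(t)| <= 1 so that v + bias stays positive

# Receptive-field output: inner product of the kernel with every frame.
v = np.tensordot(video, kernel, axes=([1, 2], [0, 1]))
spikes = iaf_encode(v, dt, bias=2.0, kappa=1.0, delta=0.05)
print(len(spikes), "spikes")
```

Between consecutive spikes the integral of `v + bias` equals `kappa * delta` up to one integration step, so each inter-spike interval is a measurement of the filtered stimulus. A full Video TEM would repeat this branch for every receptive field in the population.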
- Aurel A. Lazar and Yevgeniy B. Slutskiy, Multisensory Encoding, Decoding, and Identification, Advances in Neural Information Processing Systems 26 (NIPS*2013), edited by C.J.C. Burges, L. Bottou, M. Welling, Z. Ghahramani and K.Q. Weinberger, December 2013.
The perceptual advantages of combining multiple sensory streams that provide distinct measurements of the same physical event are compelling, as each sensory modality can inform the other in environmentally unfavorable circumstances. For example, combining visual and auditory stimuli corresponding to a person talking at a cocktail party can substantially enhance the accuracy of the auditory percept. We investigated a biophysically grounded spiking neural circuit and a tractable mathematical methodology that together allow one to study the problems of multisensory encoding, decoding, and identification within a unified theoretical framework.
A visual demonstration of decoding (demixing) a short audio and video stimulus encoded with a Multisensory Time Encoding Machine (mTEM) is shown below.
A multisensory TEM circuit consisting of 9,000 neurons was used to encode concurrent streams of natural audio and video into a common pool of spikes. Each neuron was modeled as an integrate-and-fire neuron with two receptive fields: a non-separable spatiotemporal receptive field for video stimuli and a temporal receptive field for audio stimuli. Spatiotemporal receptive fields were chosen randomly and had a bandwidth of 4 Hz in the temporal direction ‘t’ and 2 Hz in each of the spatial directions ‘x’ and ‘y’. Similarly, temporal receptive fields were chosen randomly from functions bandlimited to 4 kHz. Thus, two distinct stimuli having different dimensions (three for video, one for audio) and dynamics (2-4 cycles vs. 4,000 cycles in each direction) were multiplexed at the level of every spiking neuron and encoded into an unlabeled sequence of spikes. The mTEM produced a total of 360,000 spikes in response to a 6-second-long video of Albert Einstein explaining the mass-energy equivalence formula E = mc²: "... [a] very small amount of mass may be converted into a very large amount of energy". A Multisensory Time Decoding Machine (mTDM) was then used to reconstruct the video and audio stimuli from the produced set of spikes. The first and third columns in this demo show the original (top row) and recovered (middle row) video and audio, respectively, together with the absolute error between them (bottom row). The second and fourth columns show the corresponding amplitude spectra of all signals.
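The multiplexing performed by a single neuron of this kind can be sketched in a few lines. The example is a toy under loudly stated assumptions: both streams share one time grid, the stimuli and receptive fields are small random arrays rather than natural audio and video, and the ideal IAF parameters (`bias`, `kappa`, `delta`) are hypothetical. It is not the mTEM implementation from the cited paper. The last lines work out the average firing rate implied by the figures quoted above.

```python
import numpy as np

rng = np.random.default_rng(1)
dt = 1e-3                                   # common time step for both streams (toy assumption)
T = 2000                                    # 2 s of signal
t = np.arange(T) * dt

# Toy stimuli (hypothetical): a two-tone "audio" signal and a small random "video".
audio = 0.5 * np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 11 * t)
video = rng.uniform(-1.0, 1.0, size=(T, 8, 8))

# Random receptive fields, normalised so each filtered stream stays within [-1, 1].
L = 50                                      # temporal support in samples
h_audio = rng.standard_normal(L)
h_audio /= np.abs(h_audio).sum()
h_video = rng.standard_normal((L, 8, 8))    # non-separable in (t, x, y)
h_video /= np.abs(h_video).sum()

# Temporal filtering of the audio (causal convolution, truncated to T samples).
v_audio = np.convolve(audio, h_audio)[:T]

# Spatiotemporal filtering of the video: v_video[n] = sum over lag of <h_video[lag], video[n - lag]>.
proj = np.tensordot(video, h_video, axes=([1, 2], [1, 2]))   # shape (T, L)
v_video = np.zeros(T)
for lag in range(L):
    v_video[lag:] += proj[:T - lag, lag]


def iaf_encode(v, dt, bias, kappa, delta):
    """Ideal IAF neuron: integrate (v + bias) / kappa; at threshold delta, spike and subtract delta."""
    membrane = 0.0
    spike_times = []
    for k, v_k in enumerate(v):
        membrane += dt * (v_k + bias) / kappa
        if membrane >= delta:
            spike_times.append((k + 1) * dt)
            membrane -= delta
    return np.array(spike_times)


# Both filtered streams are summed inside the neuron, so its spikes carry no modality label.
spikes = iaf_encode(v_audio + v_video, dt, bias=3.0, kappa=1.0, delta=0.1)
print(len(spikes), "spikes from one multisensory neuron")

# Average rate implied by the figures in the text: 360,000 spikes, 9,000 neurons, 6 s.
spikes_per_neuron = 360_000 / 9_000
rate_per_neuron = spikes_per_neuron / 6.0
print(spikes_per_neuron, "spikes per neuron,", round(rate_per_neuron, 1), "spikes/s on average")
```

The quoted totals correspond to 40 spikes per neuron, or roughly 6.7 spikes per second per neuron on average, assuming the spikes are spread evenly across the population.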