NeuroInformation Processing Machines
We are currently creating, implementing and testing a collection of algorithms, open-source software tools and environments for phase and spike processing.
Spike Processing Machines
The brain must be capable of forming object representations that are invariant with respect to the large number of fluctuations occurring on the retina. These include object position, scale, pose and illumination, and the presence of clutter. What are some plausible computational or neural mechanisms by which invariance could be achieved in the spike domain? We pioneered the realization of identity preserving transformations (IPTs) on visual stimuli in the spike domain. The stimuli are encoded with a population of spiking neurons; the resulting spikes are processed and finally decoded. A number of IPTs have been demonstrated including faithful stimulus recovery, as well as simple transformations on the original visual stimulus such as translations, rotations and zooming.
- Aurel A. Lazar, Eftychios A. Pnevmatikakis and Yiyin Zhou, The Power of Connectivity: Identity Preserving Transformations on Visual Streams in the Spike Domain, Neural Networks, Vol. 44, pp. 22-35, 2013.
A visual illustration of Identity Preserving Transformations (IPTs) on a visual stream performed in the spike domain is shown below.
The original visual stimulus (far left, 256x256 pixels) is encoded with a Video Time Encoding Machine (Video TEM) realized with a neural circuit consisting of 18,605 Gabor receptive fields in cascade with an equally sized population of Integrate-and-Fire (IAF) neurons. The second video from the left demonstrates recovery from a zoom-in transformation performed in the spike domain. The area inside the dashed circle corresponds to the region inside the dashed circle in the original video. The third video from the left shows recovery from a zoom-out transformation. Finally, the rightmost video depicts recovery from a simultaneous zoom-out and rotation transformation.
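The encoding stage of such a circuit can be illustrated with a single neuron. The following is a minimal sketch, assuming an ideal IAF neuron with a constant bias and reset by subtraction. The function name, parameter names and numerical values are illustrative assumptions and are not taken from the paper; the Gabor receptive field that precedes each neuron is omitted, and the input is treated as the already-filtered signal.

```python
import math

def iaf_encode(signal, dt, bias, threshold, capacitance=1.0):
    """Encode a sampled signal into spike times with an ideal IAF neuron.

    The membrane voltage integrates (signal + bias) / capacitance; each time
    it reaches the threshold, a spike is recorded and the threshold is
    subtracted from the voltage. All parameters are illustrative assumptions.
    """
    spike_times = []
    voltage = 0.0
    for k, u in enumerate(signal):
        voltage += dt * (u + bias) / capacitance
        if voltage >= threshold:
            spike_times.append((k + 1) * dt)
            voltage -= threshold
    return spike_times

# One second of a 3 Hz sinusoid sampled at 10 kHz (hypothetical test input).
dt = 1e-4
signal = [0.5 * math.sin(2 * math.pi * 3 * k * dt) for k in range(10_000)]
spikes = iaf_encode(signal, dt, bias=1.0, threshold=0.02)
print(len(spikes), "spikes; first spike at", spikes[0], "s")
```

Because the bias exceeds the signal amplitude, the neuron fires continuously and the spike density follows the signal: spikes are closer together where the input is large and farther apart where it is small.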
- Aurel A. Lazar and Yevgeniy B. Slutskiy, Multisensory Encoding, Decoding, and Identification, Advances in Neural Information Processing Systems 26 (NIPS*2013), edited by C.J.C. Burges, L. Bottou, M. Welling, Z. Ghahramani and K.Q. Weinberger, December 2013.
The perceptual advantages of combining multiple sensory streams that provide distinct measurements of the same physical event are compelling, as each sensory modality can inform the other in environmentally unfavorable circumstances. For example, combining visual and auditory stimuli corresponding to a person talking at a cocktail party can substantially enhance the accuracy of the auditory percept. We investigated a biophysically-grounded spiking neural circuit and a tractable mathematical methodology that together allow one to study the problems of multisensory encoding, decoding, and identification within a unified theoretical framework.
A visual demonstration of decoding (demixing) a short audio and video stimulus encoded with a Multisensory Time Encoding Machine (mTEM) is shown below.
A multisensory TEM circuit consisting of 9,000 neurons was used to encode concurrent streams of natural audio and video into a common pool of spikes. Each neuron was modeled as an integrate-and-fire neuron with two receptive fields: a non-separable spatiotemporal receptive field for video stimuli and a temporal receptive field for audio stimuli. Spatiotemporal receptive fields were chosen randomly and had a bandwidth of 4 [Hz] in the temporal direction ‘t’ and 2 [Hz] in each spatial direction ‘x’ and ‘y’. Similarly, temporal receptive fields were chosen randomly from functions bandlimited to 4 [kHz]. Thus, two distinct stimuli having different dimensions (three for video, one for audio) and dynamics (2-4 cycles vs. 4,000 cycles in each direction) were multiplexed at the level of every spiking neuron and encoded into an unlabeled sequence of spikes. The mTEM produced a total of 360,000 spikes in response to a 6-second-long video of Albert Einstein explaining the mass-energy equivalence formula E=mc²: "... [a] very small amount of mass may be converted into a very large amount of energy". A Multisensory Time Decoding Machine (mTDM) was then used to reconstruct the video and audio stimuli from the produced set of spikes. The first and third columns in this demo show the original (top row) and recovered (middle row) video and audio, respectively, together with the absolute error between them (bottom row). The second and fourth columns show the corresponding amplitude spectra of all signals.
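The figures quoted above imply an average firing rate per neuron, which can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the variable names are illustrative, and the result is a population average that says nothing about how rates were distributed across individual neurons.

```python
# Figures taken from the description above.
num_neurons = 9_000
total_spikes = 360_000
duration_s = 6.0

# Average number of spikes per neuron over the whole stimulus.
spikes_per_neuron = total_spikes / num_neurons

# Average firing rate per neuron in spikes per second.
mean_rate_hz = spikes_per_neuron / duration_s

print(f"{spikes_per_neuron:.0f} spikes per neuron, {mean_rate_hz:.2f} spikes/s on average")
```

On average, each neuron therefore contributed 40 spikes, or roughly 6.7 spikes per second, to the common pool.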
Phase Processing Machines
Although images can be represented by their global phase alone, phase information has largely been ignored in the field of linear signal processing, and for good reason: phase-based information processing is intrinsically non-linear. Recent research, however, has shown that phase information can be employed effectively in speech processing and visual processing. For example, in the context of phase congruency, the spatial phase in an image is indicative of local features such as edges. We have devised a motion detection algorithm based on local phase information and constructed a fast, parallel algorithm for its real-time implementation. Our results suggest that local spatial phase information may provide an efficient alternative for performing many visual tasks, both in silico and in biological vision systems in vivo.
- Aurel A. Lazar, Nikul H. Ukani and Yiyin Zhou, A Motion Detection Algorithm Using Local Phase Information, Computational Intelligence and Neuroscience, Volume 2016, January 2016.
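The link between local phase and motion can be illustrated in one dimension: by the Fourier shift theorem, translating a pattern changes the phase of a windowed Fourier coefficient in proportion to the displacement. The following is a minimal sketch under stated assumptions, not the algorithm of the paper cited above. The Gaussian window, the single analysis frequency, the synthetic test pattern and all function names are illustrative choices.

```python
import cmath
import math

def local_phase(row, center, half_width, freq):
    """Phase of one Gaussian-windowed Fourier coefficient around `center`."""
    sigma = half_width / 3.0
    coeff = 0j
    for x in range(center - half_width, center + half_width + 1):
        window = math.exp(-((x - center) ** 2) / (2 * sigma ** 2))
        coeff += window * row[x] * cmath.exp(-1j * freq * (x - center))
    return cmath.phase(coeff)

def wrap(angle):
    """Wrap an angle to the interval [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi

def make_row(shift, n=128):
    """Synthetic 1-D pattern translated by `shift` pixels (hypothetical input)."""
    return [math.cos(0.4 * (x - shift)) + 0.5 * math.cos(0.9 * (x - shift) + 1.0)
            for x in range(n)]

freq = 0.4                      # analysis frequency in radians per pixel
frame0 = make_row(0.0)          # pattern at time t
frame1 = make_row(0.5)          # same pattern moved 0.5 pixels to the right

# Temporal change of the local phase between the two frames.
dphi = wrap(local_phase(frame1, 64, 24, freq) - local_phase(frame0, 64, 24, freq))

# Shift theorem: a displacement d changes the phase by -freq * d.
velocity = -dphi / freq
print(f"estimated displacement: {velocity:.3f} pixels per frame")
```

The estimate is close to the true displacement of 0.5 pixels per frame, with a small error caused by leakage from the other frequency component through the finite window.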
The following video shows motion detected in the "train station video" by the phase-based motion detection algorithm, and compares the result with that of the Reichardt motion detector and the Barlow-Levick motion detector.
The top row shows motion detected by the phase-based motion detection algorithm (left), the Reichardt motion detector (middle) and the Barlow-Levick motion detector (right). Red arrows indicate detected motion and its direction. In the bottom row, the contrast of the video was artificially reduced fivefold and the mean was reduced to 3/5 of the original. The red arrows are duplicated from the output of the motion detectors on the original video, as in the top row. Blue arrows are the result of motion detection with reduced contrast. If motion is detected both in the original video and in the video with reduced contrast, the corresponding arrow is shown in magenta (a mix of blue and red).
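The contrast manipulation and the arrow color coding described above can be sketched as follows. This is an illustration under assumptions: the mean is taken over a single row of pixels (the source does not say whether the mean was computed per frame or over the whole video), and the function names and test pattern are hypothetical. The sketch also shows that scaling the contrast and shifting the mean leave the phase of every non-DC Fourier coefficient unchanged, while its amplitude shrinks fivefold.

```python
import cmath
import math

def reduce_contrast(pixels, contrast_factor=5.0, mean_scale=0.6):
    """Divide deviations from the mean by `contrast_factor` and scale the mean."""
    mean = sum(pixels) / len(pixels)
    return [mean_scale * mean + (p - mean) / contrast_factor for p in pixels]

def fourier_coefficient(pixels, k):
    """k-th discrete Fourier coefficient of a 1-D row of pixels."""
    n = len(pixels)
    return sum(p * cmath.exp(-2j * math.pi * k * x / n) for x, p in enumerate(pixels))

def arrow_color(detected_original, detected_reduced):
    """Color coding used in the bottom row of the demo."""
    if detected_original and detected_reduced:
        return "magenta"
    if detected_original:
        return "red"
    if detected_reduced:
        return "blue"
    return None

# Hypothetical row of pixel intensities with mean 100.
row = [100 + 40 * math.cos(2 * math.pi * 3 * x / 64 + 0.7)
       + 10 * math.cos(2 * math.pi * 5 * x / 64) for x in range(64)]
dim = reduce_contrast(row)

c_orig = fourier_coefficient(row, 3)
c_dim = fourier_coefficient(dim, 3)
print("phase difference:", cmath.phase(c_dim) - cmath.phase(c_orig))
print("amplitude ratio:", abs(c_dim) / abs(c_orig))
print("arrow when both detect motion:", arrow_color(True, True))
```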