Spiking Neural Networks: Research Projects Or Commercial Products?
Opinions differ widely, but in this space that isn’t unusual.
MAY 18TH, 2020 - BY: BRYON MOYER
Spiking neural networks (SNNs) often are touted as a way to get close to the power efficiency of the brain, but there is widespread confusion about what exactly that means. In fact, there is disagreement about how the brain actually works.
Some SNN implementations are less brain-like than others. Depending on whom you talk to, SNNs are either a long way away or close to commercialization. The varying definitions of SNNs lead to differences in how the industry is seen.
“A few startups are doing their own SNNs,” said Ron Lowman, strategic marketing manager of IP at Synopsys. “It’s being driven by guys that have expertise in how to train, optimize, and write software for them.”
On the other hand, Flex Logix Inference Technical Marketing Manager Vinay Mehta said that, “SNNs are out further than reinforcement learning,” referring to a machine-learning concept that’s still largely in the research phase.
The entire notion of a “neural network” is motivated by attempts to model how the brain works. But current neural networks, like the convolutional neural networks (CNNs) that are so prevalent today, don’t follow the design of the brain. Instead, they rely on matrix multiplication for incorporating synaptic weights and gradient-descent algorithms for supervised training.
Those working on SNNs often refer to these as “classical” networks or “artificial” neural networks (ANNs). That said, Alexandre Valentian, head of advanced technologies and system-on-chip laboratory for CEA-Leti, noted that CNNs reflect more of an approach or type of application, while SNNs reflect an implementation. “CNNs can be implemented in spikes – it’s not CNN vs. SNN.”
Mimicking the brain
The notion of an SNN originates in the fact that the brain uses spikes to relay information. An important question, however, is how information is coded onto those spikes. Several ways are used in both research and development stages. This category of neural network is sometimes referred to as “neuromorphic,” in that it reflects the way the brain works. Classical networks are not neuromorphic, but some SNNs are more neuromorphic than others. As noted in a BrainChip paper, “… Today’s technology… is, at best, only loosely related to how the brain functions.”
Many of the SNN ideas are still in the exploration stage in academic institutions. Several papers at the 2019 IEDM conference dealt with implementations of SNNs with novel circuit techniques to achieve the goal of lower power. But there are also commercial companies working on SNNs. As identified at the recent Linley Spring Processor Conference, Intel has a serious research program going, while BrainChip and GrAI Matter Labs are readying commercial chips. This wide spread between early research and commercial viability reflects differing interpretations of how an SNN can be implemented.
Some of the projects underway involve literal spikes, which are an analog phenomenon. But others abstract the notion of a “spike” into that of an “event,” and they implement them digitally as packets traveling through a network from neuron to neuron. The high-level effect, then, is to move from measuring everything all the time, as in a classical CNN, to dealing only with events. The power savings expected from SNNs are often thought to relate to the spikes themselves, but part of the gain comes from dealing with events. In other words, work happens only when there’s an interesting event to work with. Otherwise, no work (or less work) is done, keeping power low.
“If you don’t achieve [a neuron’s] activation threshold, no event is generated,” said Roger Levinson, COO of BrainChip. This corresponds to a high level of sparsity, which is coveted in classical networks.
Another feature of SNNs is the fact that events can excite or suppress a neuron. Events then can compete with each other, with some having an excitatory effect while others have an inhibitory effect. With classical networks, negative weights can reduce the magnitude of the resulting activations, but that’s more of a static representation of a video frame (or other data set) being evaluated rather than events pushing and pulling on the outcomes.
Coding values in spikes
One of the major distinctions between SNN implementations relates to what is referred to as “coding” – how a value is transformed into a stream of spikes. While there are several ways to do this, two appear to dominate many of the discussions: rate coding and temporal coding.
Rate coding takes a value and transforms it into a constant spike frequency for the duration of that value. The benefit of this approach is that classical training techniques can be used, with the resulting values then being transcoded for an SNN inference engine. Classical networks use an enormous amount of multiplication, which is energy-intensive. Spikes, by contrast, are simply accumulated, with no multiplication necessary. That said, each spike results in a synaptic-weight lookup, which also burns power, prompting Valentian to caution that it’s not clear that this approach is lower in power.
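The rate-coding idea above can be illustrated with a minimal Python sketch. It is not taken from any vendor's implementation: the function names `rate_encode` and `accumulate`, the 10-step window, and the example weights are all assumptions chosen for illustration. It shows a value becoming an evenly spaced spike train, and a neuron summing looked-up weights with additions only.

```python
def rate_encode(value, max_value, n_steps):
    """Encode a value as 0/1 spikes at a constant rate over n_steps time steps."""
    if not 0 <= value <= max_value:
        raise ValueError("value must lie in [0, max_value]")
    n_spikes = round(value / max_value * n_steps)
    spikes = [0] * n_steps
    for k in range(n_spikes):
        # Spread the spikes evenly so the frequency stays constant.
        spikes[(k * n_steps) // n_spikes] = 1
    return spikes


def accumulate(spike_trains, weights):
    """Sum one weight per incoming spike: additions only, no multiplication."""
    total = 0.0
    lookups = 0
    for train, weight in zip(spike_trains, weights):
        for spike in train:
            if spike:
                total += weight  # one synaptic-weight lookup per spike
                lookups += 1
    return total, lookups


# Hypothetical inputs 0.5 and 0.2 (full scale 1.0) with weights 0.25 and -0.5.
trains = [rate_encode(0.5, 1.0, 10), rate_encode(0.2, 1.0, 10)]
total, lookups = accumulate(trains, [0.25, -0.5])
print(total, lookups)  # 0.25 7
```

The lookup counter makes Valentian's caution concrete: the multiplications disappear, but every spike still costs a weight fetch.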
Temporal coding is said by some to be closer to what happens in the brain, although opinions differ, with some saying that’s the case only for a small set of examples: “It’s actually not that common in the brain,” said Jonathan Tapson, GrAI Matter’s chief scientific officer. One example where it is used is in owls’ ears. “They use their hearing to hunt at night, so their directional sensitivity has to be very high.” Instead of representing a value by a frequency of spikes, the value is encoded as the delay between spikes. Spikes then represent events, and the goal is to identify meaningful patterns in a stream of spikes.
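As a rough illustration of encoding a value in the delay between spikes, here is a small Python sketch. The linear mapping, the direction of the mapping (larger values give shorter delays), and the names `isi_encode`, `isi_decode`, `t_min` and `t_max` are assumptions made for this example only. Real temporal codes may look quite different.

```python
def isi_encode(value, t_min=1.0, t_max=10.0):
    """Map a value in [0, 1] to a pair of spike times.

    Assumption: larger values produce a shorter inter-spike delay.
    """
    if not 0.0 <= value <= 1.0:
        raise ValueError("value must lie in [0, 1]")
    delay = t_max - value * (t_max - t_min)
    return (0.0, delay)


def isi_decode(spike_times, t_min=1.0, t_max=10.0):
    """Recover the value from the delay between two spikes."""
    delay = spike_times[1] - spike_times[0]
    return (t_max - delay) / (t_max - t_min)


print(isi_encode(1.0))  # (0.0, 1.0)
print(isi_encode(0.0))  # (0.0, 10.0)
```

Note that a receiver cannot know the value until the second spike arrives, which hints at why this style of coding can be slow in electronics.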
A major challenge, however, is training, because classical training results cannot be transcoded into this type of SNN. There is no easily obtained derivative of the spike train, making it impossible to use the gradient-descent approach to training. In general, Tapson said, “Temporal coding is horrible for electronics. It makes it hard to know if a calculation completes, and it is very slow.”
Temporally coded SNNs can be most effective when driven by sensors that generate temporal-coded data – that is, event-based sensors. Dynamic vision sensors (DVS) are examples. They don’t generate full frames of data on a frames-per-second basis. Instead, each pixel reports when its illumination changes by more than some threshold amount. This generates a “change” event, which then propagates through the network. Valentian said these also can be particularly useful in AR/VR applications for “visual odometry,” where inertial measurement units are too slow.
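The per-pixel behavior described above can be sketched in a few lines of Python. This is a simplified model, not a description of any actual sensor: it works on sampled frames rather than continuous time, uses raw brightness rather than whatever quantity a real DVS measures, and adds an up/down polarity to each event. The function name `dvs_events` and the sample data are hypothetical.

```python
def dvs_events(frames, threshold):
    """Return (time, pixel, polarity) events for pixels that changed enough.

    Each pixel is compared against its level at its own last event,
    not against the previous frame.
    """
    reference = list(frames[0])
    events = []
    for t, frame in enumerate(frames[1:], start=1):
        for pixel, level in enumerate(frame):
            diff = level - reference[pixel]
            if abs(diff) > threshold:
                events.append((t, pixel, 1 if diff > 0 else -1))
                reference[pixel] = level  # remember the new level
    return events


# Three tiny 3-pixel "frames"; only two pixels ever cross the threshold.
frames = [[10, 10, 10], [10, 14, 10], [10, 20, 3]]
events = dvs_events(frames, threshold=5)
print(events)  # [(2, 1, 1), (2, 2, -1)]
```

Nine pixel samples go in and two events come out, which is the kind of reduction that lets the downstream network stay idle most of the time.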
It’s possible that temporally-coded SNNs could work with shallower networks than the 50 to 100 (or more) layers we’re seeing with classical networks. “The visual cortex is only six layers deep, although that system isn’t purely feed-forward,” Valentian said. “There’s some feedback, as well.” Still, he noted that what’s lacking here is a killer application that will provide the energy and funding required to push temporal coding forward.
Meanwhile, BrainChip started with rate coding, but decided that wasn’t commercially viable. Instead, it uses rank coding (or rank-order coding), which uses the order of arrival of spikes (as opposed to literal timing) to a neuron as a code. This is a pattern-oriented approach, with arrivals in the prescribed order (along with synaptic weighting) stimulating the greatest response and arrivals in other orders providing less stimulation.
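BrainChip's algorithms are proprietary, so the following Python sketch shows only a generic form of rank-order coding as it appears in the research literature, not the company's method. The geometric `attenuation` factor, the function name `rank_order_response`, and the example weights are all assumptions. Each successive arrival contributes less, so the response is greatest when the strongest-weighted inputs arrive first.

```python
def rank_order_response(arrival_order, weights, attenuation=0.5):
    """Sum weights, scaling each by attenuation**rank of its arrival."""
    response = 0.0
    for rank, source in enumerate(arrival_order):
        response += weights[source] * attenuation ** rank
    return response


# Hypothetical neuron that "prefers" the order a, b, c.
weights = {"a": 3.0, "b": 2.0, "c": 1.0}
preferred = rank_order_response(["a", "b", "c"], weights)
reversed_order = rank_order_response(["c", "b", "a"], weights)
print(preferred, reversed_order)  # 4.25 2.75
```

Only the order matters here. The absolute spike times never enter the calculation, which matches the distinction the article draws between rank coding and literal timing.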
All of these coding approaches aside, GrAI Matter uses a more direct approach. “We encode values directly as numbers – 8- or 16-bit integers in GrAI One or Bfloat16 in our upcoming chip. This is a key departure from other neuromorphic architectures, which have to use rate or population or time or ensemble codes. We can use those, too, but they are not efficient,” said Tapson.
Neurons
SNN neurons typically are implemented in one of two ways. The approaches are motivated by analog implementations, although they can be abstracted into digital equivalents. Arteris IP fellow and chief architect Michael Frank refers to this as “emulation.” He points to several challenges for an analog implementation: “With analog, you would need to customize the model to the specific chip for inference. No two transistors are the same. And at 7 nm, you can’t do analog.”
Tapson concurs. “For a large circuit, you need to be digital,” he said.
The idea behind the two abstract neural approaches is that a neuron evaluates a signal by accumulating spikes. The simplest implementation is called “integrate-and-fire” (IF). Each spike is accumulated in the neuron until a threshold is reached, at which point the neuron fires an output spike – that is, it creates an event that propagates downstream in the network (at least for a feed-forward configuration). Many ongoing academic projects implement this as a literal analog circuit, and in operation it’s philosophically similar to sigma-delta modulation.
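A digital abstraction of an integrate-and-fire neuron can be sketched in Python as follows. The class name `IFNeuron`, the reset-to-zero behavior after firing, and the example numbers are assumptions for illustration. The article does not specify how any particular design resets.

```python
class IFNeuron:
    """Integrate-and-fire: accumulate weighted spikes, fire at a threshold."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.potential = 0.0

    def receive(self, weight):
        """Accumulate one incoming spike; return True if the neuron fires."""
        self.potential += weight
        if self.potential >= self.threshold:
            self.potential = 0.0  # assumed: reset to zero after firing
            return True
        return False


neuron = IFNeuron(threshold=1.0)
fired = [neuron.receive(0.4) for _ in range(5)]
print(fired)  # [False, False, True, False, False]
```

Nothing in this model ever forgets: the accumulated value persists indefinitely between spikes.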
The challenge here, especially for temporal coding, is that patterns may inadvertently appear over a long time period. Two events that are separated in time may be interpreted as a single pattern, since early accumulation remains in place as new spikes arrive.
In order to neutralize older “obsolete” results as newer ones arrive, a “leaky integrate-and-fire” (LIF) circuit can be used. This means that accumulations gradually dissipate over time so that, given enough time between events, accumulation restarts from a low level.
Another element that can reverse accumulation is an inhibitory event. Accumulation assumes excitatory events that add to the accumulation, but inhibitory events accumulate negative values, reducing the level of accumulation.
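Leakage and inhibition can both be illustrated with one short Python sketch. It assumes an exponential decay with time constant `tau`, which is a common textbook choice rather than something stated in the article. The class name `LIFNeuron` and the numbers are hypothetical.

```python
import math


class LIFNeuron:
    """Leaky integrate-and-fire with signed (excitatory/inhibitory) inputs."""

    def __init__(self, threshold, tau):
        self.threshold = threshold
        self.tau = tau          # leak time constant (assumed exponential)
        self.potential = 0.0
        self.last_time = 0.0

    def receive(self, t, weight):
        """Apply leak since the last event, add the weight, maybe fire."""
        elapsed = t - self.last_time
        self.potential *= math.exp(-elapsed / self.tau)
        self.last_time = t
        self.potential += weight  # negative weight = inhibitory event
        if self.potential >= self.threshold:
            self.potential = 0.0
            return True
        return False


close = LIFNeuron(threshold=1.0, tau=10.0)
close.receive(0.0, 0.6)
print(close.receive(1.0, 0.6))    # True: spikes close together add up

far = LIFNeuron(threshold=1.0, tau=10.0)
far.receive(0.0, 0.6)
print(far.receive(100.0, 0.6))    # False: the first spike has leaked away

inhibited = LIFNeuron(threshold=1.0, tau=10.0)
inhibited.receive(0.0, 0.6)
inhibited.receive(0.5, -0.5)      # inhibitory event pulls the level down
print(inhibited.receive(1.0, 0.6))  # False
```

The same two excitatory spikes fire the neuron in the first case but not in the other two, once because of elapsed time and once because of an intervening inhibitory event.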
A team from CEA-Leti discussed an analog SNN using RRAM in a paper presented at the 2019 IEDM conference. While RRAM has been used in classical networks as a way of implementing in-memory computation of multiply-accumulate functions, its usage here is different. Eight cells are used, four each for excitation and inhibition, with anywhere from 0 to 4 of the resistors being programmed in a low-resistance state. Low resistance means more current and, hence, a stronger weight. The more cells in a low-resistance state, the greater the overall synaptic current. The following image shows the Leti synapse design.
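One way to read the eight-cell arrangement is as a small signed weight, sketched below in Python. This is an interpretation, not the circuit from the Leti paper. It assumes each group of four cells can independently hold 0 to 4 cells in the low-resistance state, that high-resistance cells contribute negligible current, and that every low-resistance cell contributes one equal unit. The function name `synaptic_weight` is hypothetical.

```python
def synaptic_weight(n_excitatory_low, n_inhibitory_low, cells_per_side=4):
    """Net weight in unit currents: excitatory cells minus inhibitory cells."""
    for count in (n_excitatory_low, n_inhibitory_low):
        if not 0 <= count <= cells_per_side:
            raise ValueError("count must be between 0 and cells_per_side")
    return n_excitatory_low - n_inhibitory_low


print(synaptic_weight(4, 0))  # 4: strongest excitation
print(synaptic_weight(1, 3))  # -2: net inhibition

# Under these assumptions the synapse has nine distinct net levels, -4 to 4.
levels = {synaptic_weight(e, i) for e in range(5) for i in range(5)}
print(sorted(levels))
```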
Packets can be broadcast to the destination neurons with an identifying tag. Then receiving neurons will know which tag to pay attention to, giving the effect of multi-cast. In this way, spikes arrive at the intended neurons for processing, while other neurons ignore them. This gives the input side of the neuron a many-to-one relationship, while the output has a one-to-many relationship.
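The tag-filtering scheme can be sketched in Python as a broadcast in which every neuron sees every packet but keeps only the tags it subscribes to. The names `TaggedNeuron`, `on_packet` and `broadcast`, and the tag strings, are invented for this example and do not come from any vendor's design.

```python
class TaggedNeuron:
    """A neuron that accepts only packets carrying tags it listens for."""

    def __init__(self, name, listen_tags):
        self.name = name
        self.listen_tags = set(listen_tags)  # many sources in: many-to-one
        self.received = []

    def on_packet(self, tag, payload):
        if tag in self.listen_tags:
            self.received.append((tag, payload))
        # Packets with other tags are silently ignored.


def broadcast(neurons, tag, payload):
    """Deliver one packet to every neuron; filtering happens at the receiver."""
    for neuron in neurons:
        neuron.on_packet(tag, payload)


n1 = TaggedNeuron("n1", ["src_a", "src_b"])
n2 = TaggedNeuron("n2", ["src_a"])
n3 = TaggedNeuron("n3", ["src_c"])

broadcast([n1, n2, n3], "src_a", 1)  # one source out: one-to-many
broadcast([n1, n2, n3], "src_b", 1)

print(n1.received)  # [('src_a', 1), ('src_b', 1)]
print(n2.received)  # [('src_a', 1)]
print(n3.received)  # []
```

The sender never needs a destination list, which keeps the packet format simple at the cost of every receiver checking every tag.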
Frank indicated there should not be issues with collisions on the network. Sensor data is generated at a rate of around 500 samples per second, while the network is clocked at hundreds of megahertz. This leaves plenty of room for time-sharing data so that individual spike deliveries can appear to be concurrent. If there is any issue with collisions, Frank noted that the network can be divided into domains to reduce their impact.
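The headroom Frank describes is easy to quantify with a back-of-the-envelope calculation. The 200 MHz figure below is an assumed stand-in for "hundreds of megahertz", and the function name `cycles_per_sample` is hypothetical.

```python
def cycles_per_sample(clock_hz, sample_rate_hz):
    """Number of network clock cycles available between sensor samples."""
    return clock_hz // sample_rate_hz


# Assumed 200 MHz network clock against the ~500 samples/s cited above.
budget = cycles_per_sample(200_000_000, 500)
print(budget)  # 400000
```

With hundreds of thousands of cycles between samples, many spike deliveries can be serialized on a shared network and still appear simultaneous at the sensor's time scale.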
Timing also has a role here. Frank noted that Intel’s Loihi network is asynchronous. “If you use a synchronous approach, it’s probably too high power for a large network.”
A selection of projects
The range of approaches to SNNs is illustrated by reviewing several of the more prominent ones. There are many more projects underway at academic institutions and possibly at other commercial companies as well, so this list will by no means be exhaustive.
We’ve already seen some of what CEA-Leti has been working on. Their IEDM paper claims this is the first full network implementation using spikes, analog neurons, and RRAM synapses. It’s a single-layer, fully-connected network with 10 output neurons corresponding to the 10 classes used for MNIST image classification. Inference is considered complete when the difference between the highest-spiking output and the next-highest-spiking one exceeds a threshold. They’ve shown an equivalence between this and the classical tanh activation function.
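The stopping rule described above, where inference ends once the leading output is far enough ahead of the runner-up, can be sketched in Python. This is a software illustration of the criterion only, not the analog circuit in the paper. The function name `classify_by_margin`, the spike stream, and the margin value are assumptions.

```python
def classify_by_margin(output_spikes, n_classes, margin):
    """Count output spikes per class; stop when the lead exceeds the margin.

    output_spikes is a sequence of output-neuron indices, one per spike.
    Returns (winning_class, spikes_consumed), or (None, spikes_consumed)
    if the stream ends before any class leads by more than the margin.
    """
    counts = [0] * n_classes
    steps = 0
    for neuron in output_spikes:
        steps += 1
        counts[neuron] += 1
        ranked = sorted(counts, reverse=True)
        if ranked[0] - ranked[1] > margin:
            return counts.index(ranked[0]), steps
    return None, steps


# Hypothetical stream from 10 output neurons (one per MNIST digit).
stream = [3, 3, 7, 3, 3, 3]
result = classify_by_margin(stream, n_classes=10, margin=2)
print(result)  # (3, 5)
```

Easy inputs finish early and ambiguous ones take longer, so the latency and energy per inference vary with the input.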
BrainChip has an all-digital implementation, which allows it to be implemented on any CMOS process (unlike analog). A conceptual view of their architecture is shown in Figure 6.
While NPU details or images are not available, BrainChip did further explain that each NPU has digital logic and SRAM, providing something of a processing-in-memory capability, but not using an analog-memory approach. An NPU contains eight neural processing engines that implement the neurons and synapses. Each event is multiplied by a synaptic weight upon entering a neuron.
The company noted that its use of event-domain convolution allows it to use IF neurons rather than LIF, since this approach results in much simpler hardware. In order to deal with the issue of straggling spikes creating an inadvertent pattern, BrainChip frames the time so that, once that frame is completed, subsequent spikes will start afresh.
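BrainChip has not published how its framing works, so the following Python sketch is only one plausible reading of the idea: a non-leaky IF neuron whose accumulation is cleared at each frame boundary. The class name `FramedIFNeuron`, the fixed `frame_length`, and the example timings are assumptions.

```python
class FramedIFNeuron:
    """IF neuron whose accumulation restarts at every time-frame boundary."""

    def __init__(self, threshold, frame_length):
        self.threshold = threshold
        self.frame_length = frame_length
        self.potential = 0.0
        self.frame = 0

    def receive(self, t, weight):
        """Accumulate a spike arriving at time t; return True on firing."""
        frame = int(t // self.frame_length)
        if frame != self.frame:
            self.potential = 0.0  # new frame: discard stragglers
            self.frame = frame
        self.potential += weight
        if self.potential >= self.threshold:
            self.potential = 0.0
            return True
        return False


neuron = FramedIFNeuron(threshold=1.0, frame_length=10.0)
print(neuron.receive(8.0, 0.6))   # False: first spike, frame 0
print(neuron.receive(12.0, 0.6))  # False: frame 1 started afresh
print(neuron.receive(13.0, 0.6))  # True: two spikes within frame 1
```

The frame boundary does the job that leakage does in an LIF neuron, but with a single reset instead of a continuous decay calculation.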
Training is a topic the company does not talk much about. It refers to training as “semi-supervised.” BrainChip bases its proprietary learning algorithms on a training notion referred to as Spike Timing-Dependent Plasticity, or STDP, as well as some reinforcement learning concepts. It does the training with fully connected layers in a feed-forward manner that it says is orders of magnitude faster than what is typical with classical networks. The company also is working on unsupervised learning – that is, the ability to train a network without giving it pre-labeled samples – for its next generation architectures.
Unusually, BrainChip has the ability to do some further training in the field on a deployed device. It refers to this as “incremental training,” which leverages the existing training model but allows for the device to be further trained in the field. This is done by removing the last network layer (which does classification) and replacing it with a fully connected layer. The device can then “relearn” the existing classes (the last layer only, as prior layers remain unchanged) while adding new classes to the capabilities of the network. The company does this with labeled samples, but it can add new classes with a single image instead of hundreds or thousands of images.
GrAI Matter also is doing an all-digital implementation. It uses an on-chip packet-switched network to route the “spikes.” GrAI Matter’s overall architecture is shown below (the node implementation is shown above in Figure 5). The company trains its chip using classical techniques, converting the result to the GrAI Matter format for implementation.
Others expressed concern, as well. “[Research] papers are aimed at much simpler models [than what are implemented with classical networks],” said Geoff Tate, CEO of Flex Logix. “It’s far from commercialization.”
It’s also not necessarily an either-or situation: “You could have a network that’s partly classical and partly SNN. An example is sensor fusion, with video as classical and sound as SNN,” said Leti’s Valentian.
Arteris IP’s Frank sees a future for SNNs. “SNNs have their domain where they will outrun a standard network. Even a digital emulation of an SNN is better than a classical CNN,” he said.
The success of early commercial entrants, as well as Intel’s Loihi research project, will be indicators of whether SNNs eventually can bring their much-anticipated power savings into the market for good.