Short-Term Depression in VLSI Stochastic Synapse

Peng Xu, Timothy K. Horiuchi, and Pamela Abshire
Department of Electrical and Computer Engineering, Institute for Systems Research
University of Maryland, College Park, MD 20742
pxu,timmer,pabshire@umd.edu

Abstract

We report a compact realization of short-term depression (STD) in a VLSI stochastic synapse. The behavior of the circuit is based on a subtractive single release model of STD. Experimental results agree well with simulation and exhibit expected STD behavior: the transmitted spike train has negative autocorrelation and lower power spectral density at low frequencies which can remove redundancy in the input spike train, and the mean transmission probability is inversely proportional to the input spike rate which has been suggested as an automatic gain control mechanism in neural systems. The dynamic stochastic synapse could potentially be a powerful addition to existing deterministic VLSI spiking neural systems.

1 Introduction

Synapses are the primary locations in neural systems where information is processed and transmitted. Synaptic transmission is a stochastic process by nature, i.e. it has been observed that at central synapses transmission proceeds in an all-or-none fashion with a certain probability. The synaptic weight has been modeled as R = npq [1], where n is the number of quantal release sites, p is the probability of release per site, and q is some measure of the postsynaptic effect.

The synapse undergoes constant changes in order to learn from and adapt to the ever-changing outside world. Synaptic efficacy can increase or decrease within milliseconds after the onset of specific temporal patterns of activity, and the changes can last from milliseconds to days. The variety of synaptic plasticities differ in the triggering condition, time span, and involvement of pre- and postsynaptic activity.
Regulation of the vesicle release probability has been considered as the underlying mechanism for various synaptic plasticities [1-3].

Adaptive VLSI synapses have been extensively studied and developed as the central units of neurally inspired VLSI learning systems. Floating-gate single transistor synapses have been proposed where adaptation can be achieved locally in parallel and over long times [4]. Häfliger and Mahowald [5] developed a synapse with weight change depending on the temporal correlation of spikes. Silicon synapses with short-term depression (STD) have been developed and modeled, with the weight of the synapse implemented by a gate voltage which decreases after each presynaptic spike and recovers between the spikes [6]. Short-term facilitation and depression have also been implemented using a current mirror integrator [7]. Synapses with temporally-asymmetric Hebbian learning rules have been implemented [8] as well as synapses with spike timing dependent plasticity (STDP) [9]. All these adaptive synapses are deterministic VLSI synapses.

Alternatively, stochastic synapses transmit spikes according to a transmission probability. Stochastic synapses have been difficult to implement in VLSI because it is hard to properly harness the probabilistic behavior, normally provided by noise. Although stochastic behavior in integrated circuits has been investigated in the context of random number generators (RNGs) [10], these circuits either are too complicated to use for a stochastic synapse or suffer from poor randomness. Therefore other approaches were explored to bring randomness into the systems. Stochastic transmission was implemented in software using a lookup table and a pseudo random number generator [11]. Stochastic transition between potentiation and depression has been demonstrated in bistable synapses driven by stochastic spiking behavior at the network level for stochastic learning [12]. Previously we reported the first VLSI stochastic synapse.
The circuit is compact (~15 transistors) and the experimental results demonstrated true randomness as well as an adjustable transmission probability. We also proposed a method to implement plasticity and demonstrated the implementation of STD by modulating the probability of spike transmission. Like its deterministic counterpart, this stochastic synapse operates on individual spike train inputs; its stochastic character, however, creates the possibility of a broader range of computational primitives such as rate normalization of Poisson spike trains, probabilistic multiplication, or coincidence detection.

In this paper we extend the subtractive single release model of STD to the VLSI stochastic synapse. We present the simulation of the new model. We describe a novel compact VLSI implementation of a stochastic synapse with STD and demonstrate extensive experimental results showing the agreement with both simulation and theory over a range of conditions and biases.

2 VLSI Stochastic Synapse and Plasticity

Figure 1: Schematic of the stochastic synapse with STD.

Previously we demonstrated a compact stochastic synapse circuit exhibiting true randomness and consuming very little power (10-44 µW). The core of the structure is a clocked, cross-coupled differential pair comparator with input voltages Vi+ and Vi-, as shown in the dashed box in Fig. 1. It uses competition between two intrinsic circuit noise sources to generate random events. The differential design helps to reduce the influence from other noise sources. When a presynaptic spike arrives, Vpre goes low, and transistor M5 shuts off. Vo+ and Vo- are nearly equal and the circuit is in its metastable state.
When the two sides are closely matched, the imbalance between Vo+ and Vo- caused by current noise in M1-M4 eventually triggers positive feedback, which drives one output to Vc and the other close to ground. We use a dynamic buffer, shown in the dotted box in Fig. 1, to generate rail-to-rail transmitted spikes Vtran. Vtran either goes high (with probability p) or stays low (with probability 1 - p) during an input spike, emulating stochastic transmission.

Fabrication mismatch in an uncompensated stochastic synapse circuit would likely permanently bias the circuit to one solution. In this circuit, floating gate inputs to a pFET differential pair allow the mismatch to be compensated. By controlling the common-mode voltage of the floating gates, we operate the circuit such that hot-electron injection occurs only on the side where the output voltage is close to ground. Over multiple clock cycles hot-electron injection works in negative feedback to equalize the floating gate voltages, bringing the circuit into stochastic operation. The procedure can be halted to achieve a specific probability or allowed to reach equilibrium (50% transmission probability).

The transmission probability can be adjusted by changing the input offset or the floating gate charges. The higher Vg+ is, the lower p is. The probability tuning function is closely fitted by an error function,

$$f(v) = 0.5\left(1 + \mathrm{erf}\left(\frac{v - \mu}{\sqrt{2}\,\sigma}\right)\right)$$

where $\mu$ is the input offset voltage for p = 50%, $\sigma$ is the standard deviation characterizing the spread of the probability tuning, and v = Vi- - Vi+ is the input offset voltage.

Synaptic plasticity can be implemented by dynamically modulating the probability. Input offset modulation is suitable for short-term plasticity. Short-term depression is triggered by the transmitted input spikes Vtran to emulate the probability decrease because of vesicle depletion.
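As a brief aside on the probability tuning function introduced earlier in this section, the following minimal sketch evaluates it numerically. It assumes the fit has the Gaussian cumulative distribution form 0.5(1 + erf((v - µ)/(√2 σ))), takes σ = 2.16 mV from the fit quoted in Section 3, and sets µ = 0 for illustration. The function name `transmission_probability` is hypothetical and not part of the original work.

```python
import math


def transmission_probability(v_mv, mu_mv=0.0, sigma_mv=2.16):
    """Assumed tuning curve: p = 0.5 * (1 + erf((v - mu) / (sqrt(2) * sigma))).

    v_mv is the input offset voltage Vi- - Vi+ in millivolts.
    mu_mv = 0 is an illustrative assumption; sigma_mv = 2.16 is the fit
    value quoted in Section 3.
    """
    z = (v_mv - mu_mv) / (math.sqrt(2.0) * sigma_mv)
    return 0.5 * (1.0 + math.erf(z))


if __name__ == "__main__":
    # Sweep the offset over the roughly 10 mV range where tuning occurs.
    for v in (-5.0, -2.0, 0.0, 2.0, 5.0):
        print(f"v = {v:+.1f} mV -> p = {transmission_probability(v):.3f}")
```

Under these assumptions the curve passes through p = 0.5 at v = µ and approaches 1 near v = 5 mV, which is consistent with the initial condition used in the simulation of Section 3.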
Short-term facilitation is triggered by the input spikes Vpre to emulate the probability increase because of presynaptic Ca$^{2+}$ accumulation. Nonvolatile storage at the floating gate is suitable for long-term plasticity. STDP can be implemented by modulating the probability depending on the precise timing relation between the pre- and postsynaptic spikes.

3 Short-Term Depression: Model and Simulation

Although long-term plasticity has attracted much attention because of its apparent association with learning and memory, the functional role of short-term plasticity has only recently begun to be understood. Recent evidence suggests that short-term synaptic plasticity performs temporal filtering [13] and is involved in many functions such as gain control [14], phase shift [15], coincidence detection, and network reconfiguration [16]. From the perspective of information transmission, it has also been shown that depressing stochastic synapses can increase information transmission efficiency by filtering out redundancy in presynaptic spike trains [17].

Activity-dependent short-term changes in synaptic efficacy at the macroscopic level are determined by activity-dependent changes in vesicle release probability at the microscopic level. We will focus on STD here. STD during repetitive stimulation results from a decrease in released vesicles. Since there is a finite pool of vesicles, and released vesicles cannot be replenished immediately, a successful release triggered by one spike potentially reduces the probability of release triggered by the next spike.

We propose an STD model based on our VLSI stochastic synapse that closely emulates the simple subtractive single release model [17, 18]. A presynaptic spike that is transmitted reduces the input offset voltage $v$ at the VLSI stochastic synapse by $\Delta v$, so that the transmission probability $p(t)$ is reduced.
Between successful releases, $v$ relaxes back to its maximum value $v_{max}$ exponentially with a time constant $\tau_d$, so that $p(t)$ relaxes back to its maximum value $p_{max}$ as well. The model can be written as

$$v(t^+) = v(t^-) - \Delta v, \quad \text{successful transmission at } t \tag{1}$$

$$\tau_d \frac{dv(t)}{dt} = v_{max} - v(t) \tag{2}$$

$$p(t) = f(v(t)) \tag{3}$$

For an input spike train with Poisson arrivals, the model can be expressed as a stochastic differential equation

$$dv = \frac{v_{max} - v}{\tau_d}\,dt - \Delta v \cdot dN_{p \cdot r}(t) \tag{4}$$

where $dN_{p \cdot r}(t)$ is a Poisson counting process with rate $p \cdot r(t)$, and $r(t)$ is the input spike rate. By taking the expectation $E(\cdot)$ on both sides, we obtain a differential equation

$$\frac{dE(v)}{dt} = \frac{v_{max} - E(v)}{\tau_d} - \Delta v \cdot E(p)\,r(t) \tag{5}$$

When $v$ is reduced, the probability that it will be reduced again becomes smaller. $v$ is effectively constrained to a small range where we can approximate the function $f(v) = 0.5\left(1 + \mathrm{erf}\left(\frac{v - \mu}{\sqrt{2}\,\sigma}\right)\right)$ by a linear function $f(v) = av + 0.5$, where $\mu = 0$ for simplicity. Setting $dE(v)/dt = 0$ in (5) and substituting $E(p) = aE(v) + 0.5$, with $p_{max} = av_{max} + 0.5$, we can then solve for $E(p)$ at steady state:

$$p_{ss} \approx \frac{a v_{max} + 0.5}{1 + a \Delta v \tau_d r} \approx \frac{p_{max}}{a \Delta v \tau_d r} \propto \frac{1}{r} \tag{6}$$

Therefore the steady state mean probability is inversely proportional to the input spike rate when $a \Delta v \tau_d r \gg 1$. This is consistent with prior work that modeled STD at the macroscopic level [14].

We simulated the model (1)-(3). We use the function $f(v) = 0.5\left(1 + \mathrm{erf}\left(\frac{v}{\sqrt{2} \cdot 2.16}\right)\right)$, obtained from the best fit of the experimental data. Initially $v$ is set to 5 mV which sets $p_{max}$ close to 1. Although the transformation from $v$ to $p$ is nonlinear, both simulation and experimental data show that this implementation exhibits behavior similar to the model with the linear approximation and the biological data. Fig. 2(a) and 2(b) show that the mean probability is a linear function of the inverse of the input spike rate at various $\Delta v$ and $\tau_d$ for high input spike rates. Both $\Delta v$ and $\tau_d$ affect the slope of the linear relation, following the trend suggested by (6): the bigger the $\Delta v$ or the bigger the $\tau_d$, the smaller the slope is. Fig. 3 shows a simulation of the transient probability for a period of 200 ms. Fig. 4 shows that the output spike train exhibits negative autocorrelation at small time intervals and lower power spectral density (PSD) at low frequencies. This is a direct consequence of STD.

Figure 2: Mean probability as a function of input spike rate from simulation (p versus 1/r). (a) $\Delta v$ = 2, 4, 6 mV, $\tau_d$ = 100 ms. (b) $\tau_d$ = 100, 200, 300 ms, $\Delta v$ = 2 mV. Data were collected at input rates from 100 Hz to 1000 Hz at 100 Hz intervals. The solid lines show the least mean square fit for input rates from 400 Hz to 1000 Hz.

Figure 3: Simulated probability trajectory p(t) over a 200 ms period. r = 100 Hz, $\tau_d$ = 100 ms, $\Delta v$ = 2 mV.

Figure 4: Characterization of the output spike train from the simulation of the stochastic synapse with STD. (a) Autocorrelation versus intervals. (b) Power spectral density (dB) versus frequency (Hz). r = 100 Hz, $\tau_d$ = 200 ms, $\Delta v$ = 6 mV, $v_{max}$ = 5 mV.

4 VLSI Implementation of Short-Term Depression

We implemented this model using the stochastic synapse circuit described above (see Fig. 1). Both inputs are restored up to an equilibrium value Vicm by tunable resistors implemented by subthreshold pFETs operating in the ohmic region. To change the transmission probability we only need to modulate one side of the input, in this case Vi-. The resistor and capacitor provide for exponential recovery of the voltage to its equilibrium value. The input Vi- is modulated by transistors M6 and M7 based on the result of the previous spike transmission. Every time a spike is transmitted successfully, a pulse with height Vh and width Tp is generated at Vp. Tp is the same as the input spike pulse width. This pulse discharges the capacitor with a small current determined by Vw and reduces Vi- by a small amount, thus decreasing the transmission probability. The value of the tunable resistors is controlled by the gate voltage of the pFETs, Vr. When Vi- is reduced, the probability that it will be reduced again becomes smaller. Since the probability tuning only occurs in a small voltage range (~10 mV), the change in Vi- is limited to this small range as well. Under this special condition, the resistance implemented by the subthreshold pFET is linear and large (~GΩ). With capacitance as small as 100 fF, the exponential time constant is tens of milliseconds and is adjustable.

Similar control circuits can be applied to Vi+ to implement short-term facilitation. The update mechanism would then be driven by the presynaptic spike rather than the successfully transmitted spike. The extra components on the left provide for future implementation of short-term facilitation and also symmetrize the stochastic synapse, improving its randomness.

5 Experimental Results

The circuit has been fabricated in a commercially-available 0.5 µm CMOS process with 2 polysilicon layers and 3 metal layers. The layout size of the stochastic synapse is 151.9 µm × 91.7 µm and the layout size of the STD block is 35 µm × 32.2 µm. A 2-to-1 multiplexer with size 35 µm × 30 µm is used to enable or disable STD.

The circuit uses a nominal power supply of 5 V for normal operation. The differential pair comparator uses a separate power supply for hot-electron injection. Each floating-gate pFET has a tunnelling structure, which is a source-drain connected pFET with its gate connected to the floating node. A separate power supply provides the tunnelling voltage to the shorted source and drain (tunnelling node).
When the tunnelling voltage is high enough (14-15 V), electrons tunnel through the silicon dioxide, from the floating gate to the tunnelling node. We use this phenomenon to remove electrons from the floating gate only during initialization. Alternatively, ultraviolet (UV) activated conductances may be used to remove electrons from the gate to avoid the need for special power supplies.

To begin the test, we first remove residual charges on the floating gates in the stochastic synapse. We set Vicm = 2 V. We raise the power supply of the differential pair comparator to 5.3 V to facilitate the hot-electron injection. We use the negative feedback operation of hot-electron injection described above to automatically program the circuit into its stochastic regime. We halt the injection by lowering the power supply to 5 V. During this procedure, STD is disabled, so that the probability at this operating point is the synaptic transmission probability without any dynamics.

We then enable STD. We use a signal generator to generate pulse signals which serve as input spikes. Although spike trains are better modeled by Poisson arrivals, the averaging behavior should be similar for deterministic spike trains, which make testing easier. We use Ibias = 100 nA. The power consumption of the STD block is much smaller than that of the stochastic synapse. The total power consumption is about 10 µW.

We collect output spikes from the depressing stochastic synapse at an input spike rate of 100 Hz. We divide time into bins according to the input spike rate so that in each bin there is either 1 or 0 output spikes. In this way, we convert the output spike train into a bit sequence s(k). We then compute the normalized autocorrelation, defined as $A(n) = E(s(k)\,s(k+n)) - E^2(s(k))$, where n is the number of time intervals between two bits. A(0) gives the variance of the sequence.
For two bits with distance n > 0, A(n) = 0 if they are independent, indicating good randomness, and A(n) < 0 if they are anticorrelated, indicating the depressing effect of preceding spikes on the later spikes. Fig. 5 shows the autocorrelation of the output spike trains at two different Vr. There is significant negative correlation at small time intervals and little correlation at large time intervals, as expected from STD. Fig. 6 shows the PSD of the output spike trains from the same data shown in Fig. 5. Clearly, the PSD is reduced at low frequencies. The time constant of STD increases with Vr so that the larger Vr is, the longer the period of the negative autocorrelation is and the lower the frequencies where power is reduced. This agrees with simulation results. Notice that the autocorrelation and PSD for Vr = 1.59 V show very close similarity to the simulation results in Fig. 4.

Normally redundant information is represented by positive autocorrelation in the time domain, which is characterized by power at low frequencies. By reducing the low frequency component of the spike train, redundant information is suppressed and overall information transmission efficiency is improved. If the negative autocorrelation of the synaptic dynamics matches the positive autocorrelation in the input spike train, the redundancy is cancelled and the output is uncorrelated [17].

Figure 5: Autocorrelation (versus intervals) of output spike trains from the VLSI stochastic synapse with STD for an input spike rate of 100 Hz, shown for Vr = 1.56 V and Vr = 1.59 V. Autocorrelation at zero time represents the sequence variance, and negative autocorrelation at short time intervals indicates STD.
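The autocorrelation measure A(n) = E(s(k)s(k+n)) - E²(s(k)) used above can be sketched in a few lines. This is only an illustration of the definition on a binned 0/1 sequence, not the analysis code used for the measurements; the function name `autocorrelation` and the two example sequences are hypothetical.

```python
import random


def autocorrelation(bits, max_lag):
    """Return [A(0), ..., A(max_lag)] with A(n) = E[s(k) s(k+n)] - E[s(k)]^2.

    bits is a sequence of 0/1 values, one per input-spike time bin.
    The product term is averaged over the len(bits) - n available pairs.
    """
    total = len(bits)
    mean = sum(bits) / total
    values = []
    for lag in range(max_lag + 1):
        pairs = total - lag
        product_mean = sum(bits[k] * bits[k + lag] for k in range(pairs)) / pairs
        values.append(product_mean - mean * mean)
    return values


if __name__ == "__main__":
    rng = random.Random(1)
    # Independent Bernoulli bits: A(n) should be close to 0 for n > 0.
    independent = [1 if rng.random() < 0.3 else 0 for _ in range(10000)]
    print([round(a, 3) for a in autocorrelation(independent, 5)])
    # Strictly alternating bits: an extreme case of anticorrelation at lag 1.
    alternating = [k % 2 for k in range(10000)]
    print([round(a, 3) for a in autocorrelation(alternating, 5)])
```

For a 0/1 sequence, A(0) equals the sample variance m(1 - m), where m is the fraction of ones, so the zero-lag value can serve as a quick sanity check on recorded data.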
Figure 6: Power spectral density (dB, versus frequency in Hz) of output spike trains from the VLSI stochastic synapse with STD for an input spike rate of 100 Hz, shown for Vr = 1.56 V and Vr = 1.59 V. Lower PSD at low frequencies indicates STD.

We collect output spikes in response to $10^4$ input spikes at input spike rates from 100 Hz to 1000 Hz with 100 Hz intervals. Fig. 7(a) shows that the mean transmission probability is inversely proportional to the input spike rate for various pulse widths when the rate is high enough. This matches the theoretical prediction in (6) very well. By scaling the probability with the input spike rate, the synapse tends to normalize the DC component of input frequency and preserve the neuron dynamic range, thus avoiding saturation due to fast firing presynaptic neurons and retaining sensitivity to less frequently firing neurons [14]. The slope of mean probability decreases as the pulse width increases. Since the pulse width determines the discharging time of the capacitor at Vi-, the larger the pulse width, the larger the $\Delta v$ is and the smaller the slope is. Fig. 7(b) shows that $a \Delta v \tau_d$ scales linearly with the pulse width. The discharging current is approximately constant, thus $\Delta v$ is proportional to the pulse width.

Figure 7: Steady state behavior of VLSI stochastic synapse with STD for different pulse widths. (a) Mean probability as a function of input spike rate (p versus 1/r) for pulse width Tp = 10, 20, 30, 40, 50 µs. Data were collected at input rates from 100 Hz to 1000 Hz at 100 Hz intervals. The dotted lines show the least mean square fit from 200 Hz to 1000 Hz. (b) $a \Delta v \tau_d$ as a function of the pulse width (µs). The dotted line shows the least mean square fit, f(x) = 0.0008x + 0.0017.
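The inverse relation between mean transmission probability and input rate can be explored with a small simulation of the model (1)-(3). The sketch below makes several assumptions: periodic input spikes (as in the experiment, rather than Poisson arrivals), a tuning curve of the form 0.5(1 + erf(v/(√2 · 2.16))) with v in mV, a depression step of 2 mV, a recovery time constant of 100 ms, and a maximum offset of 5 mV, all taken from the simulation settings quoted in Section 3. The function names are hypothetical, and the chip's measured values are not reproduced here.

```python
import math
import random


def tuning(v_mv, sigma_mv=2.16):
    """Assumed tuning curve with mu = 0; v and sigma are in millivolts."""
    return 0.5 * (1.0 + math.erf(v_mv / (math.sqrt(2.0) * sigma_mv)))


def mean_probability(rate_hz, dv_mv=2.0, tau_s=0.1, vmax_mv=5.0,
                     n_spikes=20000, seed=0):
    """Fraction of transmitted spikes for a periodic input train at rate_hz."""
    rng = random.Random(seed)
    # Recovery factor over one inter-spike interval of length 1 / rate_hz.
    decay = math.exp(-1.0 / (rate_hz * tau_s))
    v = vmax_mv
    transmitted = 0
    for _ in range(n_spikes):
        if rng.random() < tuning(v):         # eq. (3): transmit with p = f(v)
            transmitted += 1
            v -= dv_mv                       # eq. (1): subtractive depression
        v = vmax_mv - (vmax_mv - v) * decay  # eq. (2): exponential recovery
    return transmitted / n_spikes


if __name__ == "__main__":
    for rate in range(100, 1100, 100):
        p = mean_probability(rate)
        print(f"r = {rate:4d} Hz  1/r = {1.0 / rate:.4f} s  mean p = {p:.3f}")
```

Plotting the printed mean p against 1/r should give an approximately straight line at the higher rates if the linearized prediction in (6) holds. The average includes the short initial transient from the undepressed state, which is negligible for the number of spikes used here.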
We perform the same experiments for different Vr and Vw. As Vr increases, the slope of mean transmission probability as a linear function of 1/r decreases. This is due to the increasing $\tau_d = RC$, where the equivalent resistance R from the pFET increases with Vr. Fig. 8(a) shows that $a \Delta v \tau_d$ is approximately an exponential function of Vr, indicating that the equivalent R of the pFET is approximately an exponential function of its gate voltage Vr. For Vw, the slope of mean transmission probability decreases as Vw increases. This is due to the increasing $\Delta v$ with Vw. Fig. 8(b) shows that $a \Delta v \tau_d$ is approximately an exponential function of Vw, indicating that the discharging current from the transistor M6 is approximately an exponential function of its gate voltage Vw. This matches the I-V characteristics of the MOSFET in subthreshold.

Figure 8: The effect of biases Vr and Vw on the depressing behavior. (a) $a \Delta v \tau_d$ as a function of Vr (V). The dotted line shows the least mean square fit, $f(x) = e^{(44.54x - 72.87)}$. (b) $a \Delta v \tau_d$ as a function of Vw (V). The dotted line shows the least mean square fit, $f(x) = e^{(15.47x - 9.854)}$.

6 Conclusion

We designed and tested a VLSI stochastic synapse with short-term depression. The behavior of the depressing synapse agrees with theoretical predictions and simulation. The strength and time duration of the depression can be tuned by the biases. The circuit is compact and consumes low power. It is a good candidate to bring randomness and rich dynamics into VLSI spiking neural systems, and thereby to increase the communication efficiency, energy efficiency, and computational power of these systems.

References

[1] C. Koch, Biophysics of Computation: Information Processing in Single Neurons. New York, NY: Oxford University Press, 1999.
[2] M. V. Tsodyks and H. Markram, "The neural code between neocortical pyramidal neurons depends on neurotransmitter release probability," Proc. Natl. Acad. Sci. USA, vol. 94, pp. 719-723, 1997.
[3] W. Senn, H. Markram, and M. Tsodyks, "An algorithm for modifying neurotransmitter release probability based on pre- and postsynaptic spike timing," Neural Computation, vol. 13, pp. 35-67, 2000.
[4] C. Diorio, P. Hasler, B. A. Minch, and C. Mead, "A single-transistor silicon synapse," IEEE Trans. Electron Devices, vol. 43, pp. 1972-1980, Nov. 1996.
[5] P. Häfliger and M. Mahowald, "Spike based normalizing Hebbian learning in an analog VLSI artificial neuron," Int. J. Analog Integr. Circuits Signal Process., vol. 18, no. 2-3, pp. 133-139, 1999.
[6] S.-C. Liu, "Analog VLSI circuits for short-term dynamic synapses," EURASIP Journal on Applied Signal Processing, vol. 2003, pp. 620-628, 2003.
[7] E. Chicca, G. Indiveri, and R. Douglas, "An adaptive silicon synapse," in Proc. IEEE Int. Symp. Circuits Systems, vol. 1, Bangkok, Thailand, May 2003, pp. 81-84.
[8] A. Bofill, A. F. Murray, and D. P. Thompson, "Circuits for VLSI implementation of temporally asymmetric Hebbian learning," in Advances in Neural Information Processing Systems, S. B. T. G. Dietterich and Z. Ghahramani, Eds. Cambridge, MA, USA: MIT Press, 2002.
[9] G. Indiveri, E. Chicca, and R. Douglas, "A VLSI array of low-power spiking neurons and bistable synapses with spike-timing dependent plasticity," IEEE Trans. Neural Networks, vol. 17, pp. 211-221, 2006.
[10] C. S. Petrie and J. A. Connelly, "A noise-based IC random number generator for applications in cryptography," IEEE Trans. Circuits Syst. I, vol. 47, no. 5, pp. 615-621, May 2000.
[11] D. H. Goldberg, G. Cauwenberghs, and A. G. Andreou, "Probabilistic synaptic weighting in a reconfigurable network of VLSI integrate-and-fire neurons," Neural Networks, vol. 14, pp. 781-793, 2001.
[12] S. Fusi, M. Annunziato, D. Badoni, A. Salamon, and D. J. Amit, "Spike driven synaptic plasticity: theory, simulation, VLSI implementation," Neural Computation, vol. 12, pp. 2227-2258, 2000.
[13] E. S. Fortune and G. J. Rose, "Short-term synaptic plasticity as a temporal filter," Trends Neurosci., vol. 24, pp. 381-385, July 2001.
[14] L. F. Abbott, J. A. Varela, K. Sen, and S. B. Nelson, "Synaptic depression and cortical gain control," Science, vol. 275, pp. 220-224, 1997.
[15] F. S. Chance, S. B. Nelson, and L. F. Abbott, "Synaptic depression and the temporal response characteristics of V1 cells," J. Neurosci., vol. 18, no. 12, pp. 4785-4799, 1998.
[16] F. Nadim and Y. Manor, "The role of short-term synaptic dynamics in motor control," Curr. Opin. Neurobiol., vol. 10, pp. 683-690, Dec. 2000.
[17] M. S. Goldman, P. Maldonado, and L. F. Abbott, "Redundancy reduction and sustained firing with stochastic depressing synapses," J. Neurosci., vol. 22, no. 2, pp. 584-591, 2002.
[18] R. S. Zucker, "Short-term synaptic plasticity," Ann. Rev. Neurosci., vol. 12, pp. 13-31, 1989.