Malte Wagenbach

The Third Wave: Photonic Neuromorphic Computing and the Physics of Intelligence

February 22, 2026

The history of computing is a history of running into physical limits and finding a way around them. Vacuum tubes hit size limits. Transistors replaced them. Transistors hit density limits. Parallel architectures emerged. GPUs hit energy and memory bandwidth limits at inference scale. The industry is still looking for the way around that one.

The answer is not a faster GPU. The answer is a different physics.

Two technologies are converging at the frontier right now that most people in AI have not yet registered as relevant to their work. Spiking Neural Networks represent a fundamentally different computational model than the transformers and convolutional networks that dominate the field. Photonic computing represents a fundamentally different physical substrate than silicon. Together, they point toward a third wave of AI hardware that does not just improve on the current paradigm but replaces it entirely.

Why Silicon AI Has a Physics Problem

To understand why this matters, you need to understand what is actually happening inside a GPU when it runs a neural network.

A transformer doing inference is primarily doing matrix multiplication. Large weight matrices getting multiplied against activation vectors, over and over, billions of times per second. The GPU is designed for exactly this. Thousands of CUDA cores operating in parallel, each performing multiply-accumulate operations on 16-bit or 8-bit numbers.

The problem is not the computation. The problem is the memory.

Every matrix multiplication requires reading weight values from memory, moving them to the compute units, performing the operation, and writing results back. Modern GPUs have hundreds of gigabytes of HBM connected by interfaces running at terabytes per second. That connection is still the bottleneck. Moving data costs more energy than computing with it. For a large language model, roughly 60-70% of the inference energy budget goes to data movement, not arithmetic.

This is Amdahl's Law applied to thermodynamics. You cannot GPU your way out of it. Making the compute faster does not help when the memory access is the constraint. You need a different architecture where weights and computation coexist without a bus between them, or where the computation happens in a physical medium that does not require moving electrons across a circuit.

Photonic computing addresses the second option. Neuromorphic computing addresses the first.
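
To make the data-movement claim concrete, here is a back-of-envelope sketch. The per-operation energy constants are rough assumptions, not measurements; real values vary widely with process node, memory type, and caching, but the imbalance they illustrate is the point.

```python
# Rough energy split for pushing one token through a single 4096x4096 FP16
# weight matrix when the weights have to come from off-chip memory.
# Both energy constants below are illustrative assumptions.
PJ_PER_MAC_FP16 = 1.0    # assumed energy per 16-bit multiply-accumulate, in pJ
PJ_PER_HBM_BIT = 5.0     # assumed energy per bit fetched from HBM, in pJ

def energy_split(num_macs: int, bits_moved: int) -> tuple[float, float]:
    """Return (compute_energy_pJ, data_movement_energy_pJ)."""
    return num_macs * PJ_PER_MAC_FP16, bits_moved * PJ_PER_HBM_BIT

macs = 4096 * 4096               # one multiply-accumulate per weight
bits = 4096 * 4096 * 16          # every FP16 weight fetched once
compute, movement = energy_split(macs, bits)
print(f"compute: {compute / 1e6:.1f} uJ, data movement: {movement / 1e6:.1f} uJ")
# Data movement exceeds compute by well over an order of magnitude here.
```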

What Photonic Computing Actually Is

Light does not experience resistance. Photons passing through an optical medium do not heat it the way electrons heat a wire. This has been understood for a century. The engineering challenge has been building computational systems that exploit it.

Photonic neural networks perform matrix multiplication in the optical domain. The key building block is the Mach-Zehnder interferometer: a device that splits a beam of light, applies a phase shift to one arm, and recombines the two beams. The interference between the arms makes the output fields a linear function of the input fields. String together a mesh of these interferometers and you have a programmable matrix multiplier, operating at the speed of light, consuming energy only to modulate and detect the signal, not to perform the arithmetic itself.
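
A small numerical sketch of that building block, using an idealized 50:50 coupler and arbitrary phase values. Real devices need calibration and suffer loss, both of which this ignores.

```python
import numpy as np

# An ideal Mach-Zehnder interferometer as a 2x2 unitary building block:
# 50:50 splitter, phase shift theta on one internal arm, a second 50:50
# splitter, plus an external phase phi on one input.
BS = np.array([[1, 1j], [1j, 1]]) / np.sqrt(2)      # ideal 50:50 coupler

def phase_shift(p):
    return np.diag([np.exp(1j * p), 1.0])

def mzi(theta, phi):
    """Transfer matrix of one MZI: a product of unitaries, hence unitary."""
    return BS @ phase_shift(theta) @ BS @ phase_shift(phi)

T = mzi(theta=0.7, phi=1.3)                          # arbitrary phase settings
assert np.allclose(T.conj().T @ T, np.eye(2))        # unitarity check

# The output field is a linear function of the input field. Triangular or
# rectangular meshes of these 2x2 blocks (Reck / Clements decompositions)
# compose into arbitrary NxN unitaries: a programmable matrix multiplier.
x = np.array([1.0 + 0j, 0.3 - 0.2j])                 # input field amplitudes
y = T @ x                                            # the "multiply", done by interference
print(np.abs(y) ** 2)                                # intensities at the photodetectors
```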

The energy advantage is significant. An optical matrix multiplication consumes energy proportional to the photodetection at the output, not to the computation itself. For large matrices, the asymptotic energy per operation approaches zero as matrix size grows, because the optical propagation is passive. The bottleneck becomes the analog-to-digital conversion at the boundary with the digital domain, not the multiply-accumulate operations in between.

The latency advantage is more striking. Light traverses an on-chip waveguide at a large fraction of its vacuum speed, set by the waveguide's group index. A chip-scale photonic circuit performs its matrix multiplication in picoseconds. The same operation in a GPU takes nanoseconds, orders of magnitude slower, because the electrons have to travel through transistors that each require a charging and discharging cycle.
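
A quick order-of-magnitude check of that claim. The group index and path length below are assumptions (a group index around 4 is typical for silicon strip waveguides), not figures from any particular device.

```python
C = 299_792_458.0      # speed of light in vacuum, m/s
N_GROUP = 4.0          # assumed group index of the on-chip waveguide
PATH = 0.01            # assumed 1 cm of optical path across the chip

transit = PATH * N_GROUP / C
print(f"optical transit time: {transit * 1e12:.0f} ps")   # ~133 ps

# Compare with an electronic pipeline clocked around 1-2 GHz: a single clock
# period is already ~0.5-1 ns, and a full matrix multiply takes many cycles.
```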

For inference, where the weight matrices are fixed and the bottleneck is throughput, photonic accelerators offer something the silicon roadmap cannot: computation that is fundamentally faster and more energy-efficient at the physics level, not just at the engineering level.

The challenge is that photonic computation is inherently analog. Phase shifts in an interferometer are real-valued, not quantized. Noise accumulates across a deep optical circuit. Programming a photonic mesh means setting phase shifters to precise values, which requires calibration and drifts over time. The integration with conventional digital systems requires careful management of the analog-digital boundary.

These are engineering problems. They are not fundamental limits.

Photonic vs Electronic Matrix Multiplication

Electronic (GPU):
- Substrate: silicon transistors
- Compute: multiply-accumulate units
- Bottleneck: memory bandwidth
- Energy: electrons pushed through resistance
- Latency: nanoseconds
- Scaling wall: heat and bandwidth, with roughly 60-70% of energy spent on data movement

Photonic:
- Substrate: silicon photonics and waveguides
- Compute: Mach-Zehnder interferometer meshes
- Bottleneck: analog-to-digital conversion at the digital boundary
- Energy: modulation and detection only
- Latency: picoseconds
- Scaling: favorable with matrix size; computation energy approaches zero

Same mathematical operation, completely different physics: the photonic matrix multiply is passive optical propagation, with no resistance and no heat from the compute itself.

Spiking Neural Networks: The Computational Model That Matches the Hardware

Current neural networks, including every large language model in production, are built on artificial neurons that compute a weighted sum of inputs, apply an activation function, and pass a real-valued output forward. This is a continuous, synchronous, dense operation. Every neuron computes on every forward pass. The network is essentially a very large matrix multiplication pipeline.

Spiking Neural Networks work differently. Artificial neurons accumulate charge over time. When the membrane potential crosses a threshold, the neuron fires a spike: a binary, discrete event propagated to downstream neurons. After firing, the membrane potential resets and the neuron enters a refractory period before it can fire again.
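
A minimal discrete-time sketch of those dynamics. The constants (threshold, time constant, refractory length) are illustrative, not tuned for any task.

```python
import numpy as np

def lif_step(v, i_in, refractory, *, v_rest=0.0, v_thresh=1.0, v_reset=0.0,
             tau=20.0, dt=1.0, refractory_steps=3):
    """One discrete-time step of a leaky integrate-and-fire neuron.

    Discretizes tau * dV/dt = (V_rest - V) + I(t), with the input resistance
    folded into i_in. Returns (new_v, spike, refractory_steps_remaining).
    All constants are illustrative."""
    if refractory > 0:                            # still recovering from last spike
        return v_reset, 0, refractory - 1
    v = v + (dt / tau) * ((v_rest - v) + i_in)    # leak toward rest, integrate input
    if v >= v_thresh:                             # threshold crossed: emit a spike
        return v_reset, 1, refractory_steps
    return v, 0, 0

# Drive one neuron with a noisy, supra-threshold input and record its spikes.
rng = np.random.default_rng(0)
v, refr, spikes = 0.0, 0, []
for _ in range(200):
    v, s, refr = lif_step(v, 1.3 + 0.2 * rng.standard_normal(), refr)
    spikes.append(s)
print("spikes in 200 steps:", sum(spikes))        # a handful: sparse, event-like output
```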

This model has three properties that matter for efficiency.

First, sparsity. At any given moment, only a small fraction of neurons in a spiking network are firing. The rest are silent and consume almost no power. In biological cortex, average firing rates are on the order of 1-5 Hz, with most neurons silent at any given moment. In an SNN with similar dynamics, the majority of neurons do nothing on any given timestep. Computation is proportional to activity, not to network size.

Second, temporal coding. Information in an SNN is represented not just in whether a neuron fires, but in when it fires relative to other neurons. This temporal dimension carries information that rate-coded networks cannot express. A pattern of spike timings can encode more information in fewer spikes than a vector of firing rates requires.

Third, event-driven computation. An SNN only computes when spikes arrive. On neuromorphic hardware, this means the chip draws power only when activity propagates through it. A sensor input that produces no change produces no spikes and draws no power. The computation is proportional to the information content of the input, not to the clock rate.

The Training Problem

If SNNs are so efficient, why does every production AI system use dense continuous networks instead?

The answer is the gradient.

Backpropagation, the algorithm that makes training neural networks practical, requires computing the gradient of the loss with respect to every weight in the network. This requires the activation function to be differentiable. In a standard network, the ReLU, sigmoid, and softmax functions are all differentiable almost everywhere. Gradients flow back through them cleanly.

The spiking neuron's activation is a Heaviside step function. It is zero until the membrane potential crosses the threshold, then one. Its derivative is a Dirac delta at the threshold: zero everywhere else, infinite at the crossing point. Backpropagation through a Heaviside function does not work. The gradient either vanishes or explodes. Standard training algorithms fail.
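
In symbols, with V the membrane potential and V_th the firing threshold:

```latex
S(V) = H(V - V_{\mathrm{th}}) =
\begin{cases}
  1, & V \ge V_{\mathrm{th}} \\
  0, & V < V_{\mathrm{th}}
\end{cases}
\qquad\qquad
\frac{dS}{dV} = \delta(V - V_{\mathrm{th}})
```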

This was the core barrier to spiking networks for two decades. The networks were computationally attractive. The training was intractable.

The solution that has emerged is surrogate gradient descent. The idea is conceptually simple even if the implementation requires care: during the forward pass, the network computes real spikes using the Heaviside function. During the backward pass, instead of computing the gradient of the Heaviside (which is pathological), the gradient of a well-behaved surrogate function is used in its place. Common choices include the derivative of a sigmoid, a piecewise-linear ramp, and a triangular function centered at the firing threshold.

The surrogate gradient is a lie told to the optimizer. The true gradient does not exist, so a plausible substitute is used in its place. In practice, this works surprisingly well. The network learns to fire spikes at the right times even though the gradient signal it receives during training is an approximation.
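
Here is a minimal sketch of how that trick is usually wired into an autograd framework. The sigmoid-derivative surrogate and the slope value are illustrative choices, one of several in common use.

```python
import torch

class SurrogateSpike(torch.autograd.Function):
    """Heaviside spike on the forward pass, smooth surrogate on the backward."""

    @staticmethod
    def forward(ctx, v_minus_thresh, slope=10.0):
        ctx.save_for_backward(v_minus_thresh)
        ctx.slope = slope
        return (v_minus_thresh >= 0).float()       # real, binary spikes

    @staticmethod
    def backward(ctx, grad_output):
        (v,) = ctx.saved_tensors
        sig = torch.sigmoid(ctx.slope * v)
        surrogate = ctx.slope * sig * (1.0 - sig)  # smooth bump peaked at the threshold
        return grad_output * surrogate, None       # no gradient for the slope argument

spike = SurrogateSpike.apply

# Toy check: gradients flow even though the forward output is a step function.
v = torch.randn(8, requires_grad=True)
loss = spike(v - 0.5).sum()
loss.backward()
print(v.grad)    # nonzero, shaped by the surrogate, despite the binary forward
```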

Surrogate gradient methods have been combined with several other training strategies. ANN-to-SNN conversion takes a trained continuous network and converts its activations to spike rates, treating the firing rate of each neuron as an approximation of the original neuron's activation value. This is computationally expensive and introduces latency, but it allows the large ecosystem of trained continuous models to be deployed on neuromorphic hardware without training from scratch. Direct training with surrogate gradients produces networks that are more efficient in terms of spike count but requires careful initialization and hyperparameter tuning.
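
The core of ANN-to-SNN conversion is rate coding. A crude sketch of the idea follows; real conversion pipelines also rescale weights and firing thresholds layer by layer, which this omits, and the numbers are illustrative.

```python
import numpy as np

def rate_encode(activation, timesteps=1000, rng=None):
    """Encode a nonnegative, normalized activation as a Bernoulli spike train
    whose per-step firing probability equals the activation value."""
    rng = rng if rng is not None else np.random.default_rng()
    p = float(np.clip(activation, 0.0, 1.0))
    return (rng.random(timesteps) < p).astype(np.int8)

a = 0.37                             # a normalized ReLU activation from the source ANN
train = rate_encode(a, rng=np.random.default_rng(1))
print(train.mean())                  # ~0.37: the firing rate approximates the activation
```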

A third approach, inspired by biological learning, uses Spike-Timing Dependent Plasticity. STDP is an unsupervised Hebbian learning rule: synapses strengthen when the presynaptic neuron fires before the postsynaptic neuron (causal ordering) and weaken in the opposite case. This requires no global gradient signal and is local to each synapse, making it implementable in hardware directly. The limitation is that pure STDP cannot optimize arbitrary loss functions. It is better suited to unsupervised feature extraction than to end-to-end supervised learning.
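
A minimal pair-based version of the rule, using exponential traces so that each update depends only on quantities local to the synapse. The time constants and learning rates are illustrative.

```python
import numpy as np

def stdp_update(w, pre_spike, post_spike, pre_trace, post_trace,
                a_plus=0.01, a_minus=0.012, tau=20.0, dt=1.0,
                w_min=0.0, w_max=1.0):
    """One step of pair-based STDP with exponential eligibility traces.

    Pre firing before post (causal) potentiates; the reverse depresses.
    Everything needed is local to the synapse: no global gradient signal."""
    pre_trace = pre_trace * np.exp(-dt / tau) + pre_spike     # decay, then bump on spikes
    post_trace = post_trace * np.exp(-dt / tau) + post_spike
    dw = a_plus * pre_trace * post_spike - a_minus * post_trace * pre_spike
    return float(np.clip(w + dw, w_min, w_max)), pre_trace, post_trace

# Widely spaced causal pairings (pre at t, post 5 steps later) strengthen the synapse.
w, pre_tr, post_tr = 0.5, 0.0, 0.0
for t in range(300):
    pre = 1 if t % 100 == 0 else 0
    post = 1 if t % 100 == 5 else 0
    w, pre_tr, post_tr = stdp_update(w, pre, post, pre_tr, post_tr)
print(w)    # ends above 0.5: causal ordering potentiated the weight
```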

SNN Training: The Surrogate Gradient Approach

Neuron model (leaky integrate-and-fire):
- tau_m dV/dt = (V_rest - V) + R * I(t)
- Integrate inputs over time; fire when V crosses the threshold
- Reset the membrane potential; a refractory period follows
- The output is a Heaviside step, which is non-differentiable
- Alternative neuron models: Izhikevich, adaptive exponential (AdEx)

Forward pass:
- Real spikes propagate; the Heaviside activation fires
- The spike train is recorded

Backward pass:
- The surrogate gradient is substituted for the true Heaviside gradient
- Weights are updated via stochastic gradient descent

Surrogate options:
- Sigmoid derivative: sigma(V) * (1 - sigma(V))
- Piecewise linear: max(0, 1 - |V - V_th|)
- Triangular / fast-sigmoid approximations
- Biologically plausible and empirically effective

Training happens on GPUs using surrogate gradients. Deployment runs on neuromorphic hardware as real spikes.

The Convergence: Photonic Spiking Networks

Here is where the two threads meet.

Photonic systems are exceptionally good at linear operations: matrix multiplication, convolution, Fourier transforms. They are less naturally suited to the nonlinear operations that give neural networks their expressive power. The spiking nonlinearity in an SNN, the threshold-and-fire, maps directly onto the nonlinear optical behavior of certain photonic components.

Optical spiking neuron implementations have been demonstrated using several physical mechanisms. Vertical-cavity surface-emitting lasers (VCSELs) exhibit excitable behavior: inject enough optical power and they emit a brief pulse before returning to rest, analogous to the action potential of a biological neuron. Microring resonators can be designed to exhibit bistability: two stable optical states separated by a threshold, switching between them in response to input optical power. Semiconductor optical amplifiers show saturable absorption that creates spike-like dynamics when driven near threshold.

The implication is that a photonic neuromorphic network could compute the linear integration of inputs in the optical domain (passively, with near-zero energy cost) and compute the nonlinear spiking threshold also in the optical domain (using the intrinsic physics of the photonic device rather than a digital circuit). The entire computational primitive of a spiking neuron, integrate and fire, could be implemented in silicon photonics without any conversion to the electronic domain.

A large network of such neurons connected by photonic waveguides would perform SNN inference at the speed of light, with the energy consumption of a passive optical system rather than an active electronic one. The combination of photonic linear algebra and neuromorphic nonlinearity resolves the two biggest problems simultaneously: the energy cost of matrix multiplication and the overhead of synchronous electronic spiking.

This is not yet a product. It is a research direction that several groups worldwide are working toward, and the foundational demonstrations have been made. The engineering challenges are substantial: integrating photonic and electronic components at scale, managing thermal drift in optical phase shifters, achieving the fabrication precision that silicon photonics requires. But none of these are fundamental limits. They are manufacturing and engineering problems.

What This Means for Edge AI

The near-term application is not replacing data center GPUs, where the economics are driven by factors beyond raw energy efficiency. The near-term application is edge inference: AI computation at the point of sensing, without a connection to a data center, under power budgets that silicon AI cannot meet.

An always-on visual system that processes a continuous event stream from a silicon retina, running an SNN that detects anomalies and classifies objects, drawing less power than a small LED. That system does not exist in production today because the software infrastructure for SNNs has not been accessible enough and the hardware has required too much expertise to program.

The gap between the hardware capability and the accessible software layer is exactly what Vantar is building toward. The Nuro SDK handles the SNN training and deployment abstractions. The neuromorphic and photonic hardware provides the physical substrate. The event camera provides the input that matches the temporal dynamics of the network. The three components fit together because they share the same fundamental computational model: sparse, asynchronous, event-driven computation that is proportional to information content rather than to a fixed clock rate.

The Longer Arc

Situate this against the current moment in AI and a pattern emerges.

The LLM era runs on transformer architectures trained on internet-scale datasets, requiring data center infrastructure and measured in GPU-hours and electricity bills. The applications this enables are transformative. The economics are also strained: inference cost, energy consumption, and hardware dependency are all growing faster than the value per query can justify at scale.

The generation of AI systems after this one will run closer to the edge. Robotics that perceive and act in real time. Sensors that compute rather than stream. Medical devices that diagnose continuously. Industrial systems that detect anomalies at the point of sensing rather than sending data to a cloud for analysis. These applications have power budgets in milliwatts, latency requirements in microseconds, and privacy requirements that preclude cloud connectivity.

None of them can run on a GPU. All of them could run on a neuromorphic system. And a photonic neuromorphic system would do it with energy consumption that approaches the physical minimum.

The third wave of AI hardware is not the next generation of GPU. It is a different physics and a different computational model. It is being built right now, in research labs and startups, at the intersection of photonics and neuroscience that most of the AI industry has not yet registered as the direction the field is heading.

The hardware has always been the constraint. The question is which hardware, and what it runs.