Lake Crest
a.k.a. Nervana Engine

Hardware optimized for deep learning
Lake Crest (coming in 2017) is an application-specific integrated circuit (ASIC) custom-designed and optimized for deep learning.

What is Lake Crest?

Lake Crest includes everything you need for deep learning and nothing you don't, resulting in a 10x increase in training speed and ensuring that Nervana Cloud remains the world's fastest deep learning platform for the foreseeable future. For more details on Lake Crest, see Nervana Engine delivers deep learning at ludicrous speed or VentureBeat's Intel will test Nervana's 'Lake Crest' silicon in first half of 2017.

Lake Crest features

Blazingly fast data access via high-bandwidth memory

Training deep learning networks involves moving a lot of data, and current memory technologies are simply not up to the task. The Nervana Engine uses a new memory technology called High Bandwidth Memory that is both high-capacity and high-speed, providing 32 GB of on-chip storage and 8 terabits per second of memory bandwidth.
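As a quick sanity check, those two figures together imply an effective bandwidth of one terabyte per second, enough to stream the entire on-chip memory in about 32 milliseconds. The short sketch below is pure unit arithmetic on the numbers quoted above, nothing more:

    # Unit-conversion check of the memory figures quoted above.
    BANDWIDTH_TBITS = 8      # 8 terabits per second, as quoted
    CAPACITY_GB = 32         # 32 GB of on-chip HBM, as quoted

    bytes_per_sec = BANDWIDTH_TBITS * 1e12 / 8       # 1e12 B/s = 1 TB/s
    sweep_s = CAPACITY_GB * 1e9 / bytes_per_sec      # time to stream all 32 GB once

    print(f"effective bandwidth: {bytes_per_sec / 1e12:.0f} TB/s")
    print(f"one full sweep of {CAPACITY_GB} GB: {sweep_s * 1e3:.0f} ms")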

Unprecedented computing power

The Nervana Engine design consists mostly of multipliers and local memory, omitting elements such as caches that are needed for graphics processing but not for deep learning. As a result, the Nervana Engine achieves unprecedented compute density and an order of magnitude more raw computing power than today's state-of-the-art GPUs.
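One way to see why trading caches for multipliers pays off is a roofline-style estimate: a workload can saturate the compute units only if its arithmetic intensity (operations per byte moved) exceeds the machine balance. The sketch below reuses the 1 TB/s bandwidth derived above; the peak compute rate is an illustrative assumption, since the text does not quote one:

    # Roofline-style estimate. PEAK_TOPS is an ASSUMPTION for illustration;
    # the memory bandwidth follows from the 8 Tb/s figure quoted earlier.
    PEAK_TOPS = 50.0   # ASSUMPTION: illustrative peak, in tera-ops/s
    MEM_TBPS = 1.0     # 8 Tb/s = 1 TB/s

    def matmul_intensity(m, n, k, bytes_per_elem=2):
        """Arithmetic intensity (ops/byte) of an (m x k) @ (k x n) fp16 multiply."""
        ops = 2 * m * n * k                                  # one multiply-add per term
        traffic = bytes_per_elem * (m * k + k * n + m * n)   # read A and B, write C
        return ops / traffic

    balance = PEAK_TOPS / MEM_TBPS           # ops/byte needed to saturate compute
    ai = matmul_intensity(1024, 1024, 1024)  # ~341 ops/byte
    print("compute-bound" if ai > balance else "memory-bound")

Deep learning's large matrix multiplies sit well above the balance point in this model, so utilization is limited by how many multipliers fit on the die, which is exactly what this design maximizes.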

Throughput near the theoretical limit

The Nervana Engine has separate pipelines for computation and data management, so new data is always available for computation. This pipeline isolation, combined with ample local memory, means that the Nervana Engine can run near its theoretical maximum throughput much of the time.
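The same idea can be sketched in software as double buffering: one thread stages the next batch of data while another computes on the current one. The Nervana Engine does this in dedicated hardware; the minimal host-side analogy below only illustrates the overlap:

    # A minimal software analogy for separate compute and data pipelines:
    # a prefetch thread keeps a small queue of batches full while the
    # main thread computes, so compute rarely waits on data.
    import threading, queue

    def prefetch(batches, q):
        for batch in batches:
            q.put(batch)      # blocks only if compute has fallen behind
        q.put(None)           # sentinel: no more data

    def train(batches, compute_fn, depth=2):
        q = queue.Queue(maxsize=depth)   # 'depth' buffers in flight
        threading.Thread(target=prefetch, args=(batches, q), daemon=True).start()
        while (batch := q.get()) is not None:
            compute_fn(batch)            # overlaps with staging of the next batch

    train(range(10), compute_fn=lambda b: print("processed batch", b))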

Built-in networking for unprecedented speed and scalability

The Nervana Engine includes 12 bidirectional high-bandwidth links, enabling ASICs to be interconnected so that data moves seamlessly between chips, and even between chassis. This lets users achieve linear speedup on their current models simply by assigning more compute to the task, or expand their models to unprecedented sizes without any decrease in speed. Competing systems use oversubscribed, low-bandwidth PCIe buses for all communication, which greatly limits their ability to improve performance by adding more hardware.
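A toy model makes the scaling difference concrete. Suppose each gradient exchange moves 1 GB per chip, dedicated links give every chip its own constant bandwidth, and a shared PCIe bus divides its bandwidth among all chips. The per-link figure below is an assumption for illustration; the text says only that the links are high-bandwidth:

    # Toy model: gradient-sync time with dedicated links vs. a shared bus.
    # LINK_GBS is an ASSUMPTION for illustration; PCIE_GBS is roughly
    # PCIe 3.0 x16. Real systems are more complicated than this.
    LINK_GBS = 100.0   # ASSUMPTION: per-chip dedicated bandwidth, GB/s
    PCIE_GBS = 16.0    # shared bus bandwidth, GB/s, divided among chips

    def sync_time(grad_gb, n_chips, dedicated):
        if dedicated:
            return grad_gb / LINK_GBS            # per-chip bandwidth stays constant
        return grad_gb * n_chips / PCIE_GBS      # all chips contend for one bus

    for n in (2, 8, 32):
        print(f"{n:2d} chips: dedicated {sync_time(1.0, n, True):.2f} s, "
              f"shared PCIe {sync_time(1.0, n, False):.2f} s")

Under this model, sync time stays flat as chips are added over dedicated links but grows linearly on a shared bus, which is why the interconnect determines how far a system can scale.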