Technology

Deep learning market and platforms

Training and inference demand massive compute resources, typically expensive and power-hungry GPUs. Consequently, deep learning today runs in the cloud or in large on-premises data centers. Training a new model takes days or weeks to complete, and inference queries suffer long latencies from the round trips to and from the cloud.

Yet the data that feeds cloud systems, for model training updates and inference queries alike, is generated mostly at the edge: in stores, factories, terminals, office buildings, hospitals, city streets, 5G cell sites, vehicles, farms, homes and hand-held mobile devices. Transporting this explosion of data to and from the cloud or data center consumes unsustainable network bandwidth, drives up cost, slows responsiveness, compromises data privacy and security, and reduces device autonomy and application reliability.

Deep learning needs to advance from a cloud-centric approach to an intelligent edge, where both training and inference are processed close to where the data is generated and used. While solutions for inference at the edge are emerging, training is still regarded as a cloud/data-center-only function, preventing AI solutions from quickly learning and adapting to new data and from becoming truly agile, scalable and responsive.

TO OVERCOME THESE LIMITATIONS, DEEP-AI HAS DEVELOPED A UNIQUE, INTEGRATED AND EFFICIENT TRAINING-AND-INFERENCE DEEP LEARNING SOLUTION FOR THE EDGE. WITH DEEP-AI, APPLICATION DEVELOPERS CAN DEPLOY AN INTEGRATED TRAINING-INFERENCE SOLUTION THAT RETRAINS THEIR MODEL IN REAL TIME WHILE SERVING ONLINE INFERENCE ON THE SAME DEVICE.

At the core of our technology is the ability to train at 8-bit fixed point and produce trained models that are highly sparse.

AS OPPOSED TO THE 32-BIT FLOATING POINT AND ZERO SPARSITY THAT ARE THE NORM WITH GPUS TODAY.
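
To illustrate what an 8-bit fixed-point representation means in practice, here is a minimal NumPy sketch of a symmetric quantize/dequantize round trip in a simple Q-format. The Q1.6 format, the scale choice and the helper names are illustrative assumptions; the exact number formats and scaling our training engine uses are not detailed in this overview.

```python
import numpy as np

def quantize_int8(x: np.ndarray, frac_bits: int) -> np.ndarray:
    """Map a float tensor to signed 8-bit fixed point with `frac_bits`
    fractional bits (a simple Q-format, chosen here for illustration)."""
    scale = 2.0 ** frac_bits
    q = np.round(x * scale)
    return np.clip(q, -128, 127).astype(np.int8)

def dequantize(q: np.ndarray, frac_bits: int) -> np.ndarray:
    """Recover an approximate float value from the 8-bit representation."""
    return q.astype(np.float32) / (2.0 ** frac_bits)

# A float weight tensor and its 8-bit fixed-point round trip.
w = np.random.randn(4, 4).astype(np.float32) * 0.5
w_q = quantize_int8(w, frac_bits=6)      # Q1.6: range ~[-2, 2), step 1/64
w_hat = dequantize(w_q, frac_bits=6)
print("max quantization error:", np.abs(w - w_hat).max())  # <= 1/128 unless clipped
```

Every 8-bit value occupies a quarter of the storage of a 32-bit float, and 8-bit arithmetic units are far smaller and cheaper in hardware, which is where the compute and memory savings come from.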

Innovative adaptive high-sparsity techniques further reduce the overall compute and data storage required, yielding additional performance and efficiency gains. Our adaptive algorithms compensate for the lower precision of 8-bit fixed point versus 32-bit floating point, and for the high sparsity levels, to minimize any loss of training accuracy. For edge applications, where the typical use case is retraining a pre-trained model on incremental data updates, training accuracy is maintained in most cases and reduced only minimally in the rest.
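
As a simplified illustration of how sparsity cuts compute and storage, the sketch below applies one-shot magnitude pruning to a weight matrix: the smallest-magnitude weights are zeroed so they can be skipped entirely. Our adaptive techniques select and adjust sparsity levels during training, which this toy example does not capture.

```python
import numpy as np

def magnitude_prune(w: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude fraction of weights.
    `sparsity` is the target fraction of zeros, e.g. 0.9 for 90% sparse."""
    k = int(sparsity * w.size)
    if k == 0:
        return w.copy()
    # k-th smallest absolute value becomes the pruning threshold.
    threshold = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    mask = np.abs(w) > threshold
    return w * mask

w = np.random.randn(256, 256).astype(np.float32)
w_sparse = magnitude_prune(w, sparsity=0.9)
print("fraction zeroed:", 1.0 - np.count_nonzero(w_sparse) / w_sparse.size)
```

At 90% sparsity, only one multiply-accumulate in ten needs to execute, and only the non-zero weights need to be stored, which compounds with the 4x saving from 8-bit weights.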

We deploy our technology on off-the-shelf FPGA (field-programmable gate array) PCIe cards, eliminating the need for GPUs and providing a ~10x gain in performance/power or performance/cost versus a GPU. FPGAs are widely deployed today in both data centers and edge devices, where they accelerate a wide variety of workloads. Recent advances in deep learning enable inference with 8-bit fixed-point formats, and because FPGAs feature many 8-bit MACs they are gaining momentum for inference. Because their data paths can be programmed and optimized per workload, FPGAs also deliver very low-latency inference.
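
The sketch below models, in scalar Python, the 8-bit multiply-accumulate (MAC) operation that FPGA DSP blocks perform in hardware: 8-bit operands are multiplied and summed into a wider accumulator so intermediate results do not overflow. This is only a behavioral model; a real FPGA design runs thousands of such MACs in parallel, pipelined, every clock cycle.

```python
import numpy as np

def int8_dot(a: np.ndarray, b: np.ndarray) -> np.int32:
    """Dot product of two int8 vectors with a 32-bit accumulator,
    mirroring how an FPGA DSP block multiplies 8-bit operands and
    accumulates into a wider register."""
    acc = np.int32(0)
    for x, y in zip(a, b):
        # Widen before multiplying: each product fits in 16 bits,
        # the running sum in 32 bits for vectors of this length.
        acc += np.int32(x) * np.int32(y)
    return acc

a = np.random.randint(-128, 128, size=64, dtype=np.int8)
b = np.random.randint(-128, 128, size=64, dtype=np.int8)
print(int8_dot(a, b))
```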

DEEP-AI’S BREAKTHROUGH TECHNOLOGY TAKES A HUGE STEP FORWARD BY ALSO ENABLING TRAINING ON FPGAS WITH 8-BIT FIXED-POINT NUMBER FORMATS, RUNNING BOTH TRAINING AND INFERENCE ON THE SAME FPGA PLATFORM.