Heterogeneous optimization

The computing power of each processor is fully utilized to improve overall computing performance.

Deep model customization

The characteristics of the model's algorithms are combined with those of the hardware to transform the model's data layout and algorithms accordingly.

How it works

Computing tasks are divided rationally and allocated according to the computing characteristics and computing power of the GPU and NPU, enabling each processor to perform efficient parallel computing.
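To make the idea concrete, here is a minimal, purely illustrative sketch of device-aware task partitioning. It is not Lenovo's implementation; the operator names, the `Op` type, and the dispatch rule are all hypothetical assumptions, standing in for a real cost model.

```python
# Hypothetical sketch: assign each operator in a model graph to the processor
# best suited to its compute characteristics. Op names and the dispatch rule
# are illustrative assumptions, not the actual engine's logic.

from dataclasses import dataclass

@dataclass
class Op:
    name: str
    kind: str        # e.g. "matmul", "layernorm", "sampling"
    flops: float     # rough estimate of the operator's compute workload

# Toy rule: dense tensor math goes to the NPU (high-throughput matrix units);
# irregular or control-heavy ops go to the GPU.
NPU_KINDS = {"matmul", "conv", "attention"}

def assign_device(op: Op) -> str:
    return "NPU" if op.kind in NPU_KINDS else "GPU"

def partition(graph: list[Op]) -> dict[str, list[str]]:
    """Split the operator list into per-device execution groups."""
    plan: dict[str, list[str]] = {"NPU": [], "GPU": []}
    for op in graph:
        plan[assign_device(op)].append(op.name)
    return plan

graph = [
    Op("qkv_proj", "matmul", 2e9),
    Op("attn", "attention", 4e9),
    Op("ln_1", "layernorm", 1e6),
    Op("sample", "sampling", 1e3),
]
print(partition(graph))
# → {'NPU': ['qkv_proj', 'attn'], 'GPU': ['ln_1', 'sample']}
```

A production scheduler would weigh measured per-device throughput, data-transfer cost, and operator dependencies rather than a fixed lookup table, but the goal is the same: each processor runs the work it handles best, in parallel.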

The computation graph is reconstructed according to the computing power and instruction-set characteristics of each hardware architecture (NPU/GPU), and operators are merged and split as appropriate. At the instruction level, techniques such as improving the cache hit rate, single-cycle multi-instruction parallel execution, and efficient instruction pipelining further improve computing efficiency.
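The operator-merging step above can be sketched with a toy greedy fusion pass. This is an illustrative assumption about how fusion works in general, not the engine's actual algorithm; the `FUSABLE` pairs and operator names are made up for the example.

```python
# Hypothetical sketch of operator fusion: merge adjacent operators in a chain
# so intermediate results stay in cache/registers instead of round-tripping
# through memory. The fusable pairs below are illustrative, not real rules.

FUSABLE = {("matmul", "add"), ("add", "relu"), ("matmul", "relu")}

def fuse(chain: list[str]) -> list[str]:
    """Greedily merge fusable neighbors into compound operators."""
    fused: list[str] = []
    for op in chain:
        if fused:
            prev_last = fused[-1].split("+")[-1]
            if (prev_last, op) in FUSABLE:
                fused[-1] = fused[-1] + "+" + op
                continue
        fused.append(op)
    return fused

print(fuse(["matmul", "add", "relu", "softmax"]))
# → ['matmul+add+relu', 'softmax']
```

Fusing a matmul with its bias add and activation means one kernel launch and one pass over the data instead of three, which is one of the main levers for raising cache hit rates on accelerator hardware.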

User perception

Users experience faster text generation when using apps powered by large models.
Current proof-of-concept (POC) gains, compared to QNN:

Power consumption: Over 20% reduction per token

Performance: 30% improvement in the decoding phase

*Data sourced from Lenovo Labs. Feature performance is for reference only and actual experience may vary.
*Power consumption and performance data sourced from Lenovo Labs. Testing was conducted on the YOGA Pad Pro AI YuanQi Edition using the Qwen2-7b model, with the same test items used for a comparative analysis of our proprietary inference engine versus the QNN-based engine in terms of power consumption and performance. Actual results may vary.