
Ultra-low-power Digital Signal Processing for Wireless Sensor Network Nodes (Ultra-laag-vermogen Digitale Signaalverwerking voor Draadloze Sensorknopen)

Publication date: 2012-10-17

Author:

Walravens, Cedric
Dehaene, Wim

Abstract:

The trend towards smaller, portable and more capable electronic devices gives rise to a number of significant design and implementation problems, of which the limited energy supply is the most decisive. All these issues are jointly present in the field of Wireless Sensor Networks (WSN), which is consequently a suitable context for research.

This work is part of the Pinballs project, which develops a smart WSN platform of integrated devices that excel in energy efficiency and flexible application support. Central is the platform's 3-tier hierarchy, which allows power consumption to be traded for complexity at each of the three levels. This trade-off is most explicit at the intermediate Pinballs level, which consists of WSN nodes.

The presented work focuses on the design and implementation of a novel energy-efficient data processing approach for sensor nodes. A data-driven, many-to-few and application-specific processing solution is created, called the folded tree.

During the design, abstraction and modelling are crucial to manage the complexity of such a modern digital system. SystemC has become a well-established standard to cover many abstractions in one single language. ActivaSC is created within this work as a flexible, fast and non-intrusive extension to SystemC for capturing model activity without any additional code changes. By using activity profiling, the designer can gain valuable insights into the system, which help to identify bottlenecks and fine-tune performance. Additionally, the activity information can be used by power macromodels to confidently estimate the system's energy consumption. Designers, however, must be aware of the assumptions made by the macromodel, as they have a significant impact on the accuracy of the energy estimations.
ActivaSC realises a speed-up of up to 75% for elaboration overhead and an average simulation speed-up of 20% compared to state-of-the-art profiling tools.

Based on derived WSN requirements, it is shown how parallel prefix operations are a useful concept for WSN applications. They can be used to describe many of the basic building blocks found in WSN data processing algorithms, which provides flexibility. Obvious examples of algorithms that cannot benefit from the parallel prefix approach include sequential and control algorithms, as well as data processing that does not necessarily follow a many-to-few pattern during computation, such as encryption algorithms. Further, the selection of the binary tree for calculating parallel prefix operations in hardware, followed by reuse and folding, brings significant energy savings. This leads to a fundamentally different architecture for the kind of data processing typically found in WSNs. The implementation of the newly introduced folded tree architecture is efficient in terms of both area and power. The simplicity of the identical processing elements (PEs) that constitute the folded tree network results in high integration, fast cycle time and lower power consumption. Combined with the flexibility to program the PEs using any combination of operators available in their data path, the folded tree has the freedom to run a variety of parallel prefix applications. A 130 nm silicon implementation of a 16-bit folded tree digital signal processor (DSP) with 8 PEs was made and measured. It consumes down to 8 pJ/cycle. Compared to existing commercial solutions, it consumes at least 10 times less overall energy and is 2 to 3 times faster for a range of relevant applications.

The key strengths of the design are simplicity, uniformity and reuse.
Energy is saved thanks to (1) limiting the data set by pre-processing with parallel prefix operations, (2) the reuse of the binary tree as a folded tree and (3) the combination of data flow and control flow elements to introduce a local distributed memory, which removes the memory bottleneck found in traditional single-CPU solutions while retaining sufficient flexibility.
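
As an illustration of the parallel prefix concept mentioned in the abstract, the sketch below computes an inclusive prefix (scan) over a binary tree in two passes: an up-sweep that combines pairs towards the root and a down-sweep that distributes partial results back towards the leaves. This is a generic textbook formulation in software, not the folded tree hardware described in the thesis. The function name `tree_prefix` and the power-of-two length restriction are assumptions made for this example.

```python
from operator import add


def tree_prefix(values, op=add):
    """Inclusive prefix (scan) of `values` under an associative `op`.

    Hypothetical helper for illustration: uses a binary-tree up-sweep and
    down-sweep, and assumes len(values) is a power of two.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")
    tree = list(values)

    # Up-sweep: each tree level combines neighbouring partial results,
    # so the last element ends up holding the reduction of all inputs.
    step = 1
    while step < n:
        for i in range(2 * step - 1, n, 2 * step):
            tree[i] = op(tree[i - step], tree[i])
        step *= 2

    # Down-sweep: completed prefixes are combined with the partial
    # results stored halfway to the next completed position.
    step = n // 2
    while step > 1:
        half = step // 2
        for i in range(step - 1, n - half, step):
            tree[i + half] = op(tree[i], tree[i + half])
        step //= 2
    return tree


if __name__ == "__main__":
    samples = [3, 1, 4, 1, 5, 9, 2, 6]
    print(tree_prefix(samples))       # running sum
    print(tree_prefix(samples, max))  # running maximum
```

The last element of the result is the reduction of the whole input (for example, the sum or the maximum of all samples), which matches the many-to-few character of the processing described above. Any associative operator can be passed as `op`. Within one level of either sweep, the loop iterations do not depend on each other, which is what allows a tree of processing elements to execute them in parallel.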