For example, consider two different applications: FFT and binSearch. Since these applications use different subsets of the functionalities provided by the processor, the parts of the processor that they do not exercise are different. However, a closer look reveals that while some of the differences correspond to coarse-grained software-visible functionalities, other differences are fine-grained, software-invisible, and cannot be determined through application analysis.
The two applications use exactly the same instructions; however, die graphs in Figs 2 and 3 show that the sets of unexercised gates for the applications are different. This is due to the fact that even the sequence of instructions executed by an application can influence which logic the application can exercise in a processor depending on the micro-architectural details. Such interactions cannot be determined simply through application analysis.
Given that the fraction of logic in a processor that is not used by a given application can be substantial, and many area- and power-constrained systems only execute one or few applications for their entire lifetime, it may be possible to significantly reduce area and power in such systems by removing logic from the processor that cannot be used by the applications. However, since different applications can exercise substantially different parts of a processor, and simply profiling or statically analysing an application cannot guarantee which parts of the processor can and cannot be used by an application, tailoring a processor to an application requires a technique that can identify all the logic in a processor that is guaranteed to never be used by the application and remove unusable logic in a way that leaves the functionality of the processor unchanged for the application.
In the next section, we describe a methodology that meets these requirements. We call general-purpose processors that have been tailored to an individual application ‘bespoke processors’—reminiscent of bespoke clothing, in which a generic clothing item is tailored for an individual person.
Tailoring a bespoke processor to a target application
Bespoke processor design tailors a general-purpose processor IP to a target application by removing all gates from the design that can never be used by the application. The bespoke processor, tailored to the target application, must be functionally equivalent to the original processor when executing the application. It should retain all the gates from the original processor design that might be needed to execute the application.
Any gate that could be toggled by the application and propagate its toggle to a state element or output port performs a necessary function and must be retained to maintain functional equivalence. Conversely, any gate that can never be toggled by the application can safely be removed, as long as each fan-out location for the gate is fed with the gate’s constant output value for the application.
Removing constant gates for an application could result in significant area and power savings without any performance degradation. In addition, gate removal can expose additional timing slack, which can be exploited to increase area and power savings or performance of a bespoke design. On an average, bespoke processor design reduces area and power consumption by 62 per cent and 50 per cent, while exploiting exposed timing slack improves average power savings to 65 per cent.
The first step of tailoring a bespoke processor—input-independent gate activity analysis—performs a type of symbolic simulation, where unknown input values are represented as X’s, and gate-level activity of the processor is characterised for all possible executions of the application, for any possible inputs to the application.
The second phase of bespoke processor design technique—gate cutting and stitching—uses gate-level activity information gathered during gate activity analysis to prune away unnecessary gates and reconnect the cut connections between gates to maintain functional equivalence to the original design for the target application.
Input-independent gate activity analysis
The set of gates that an application toggles during execution can vary depending on application inputs. This is because inputs can change the control flow of execution through the code as well as data paths exercised by the instructions.
Since exhaustive profiling for all possible inputs is infeasible, and limited profiling may not identify all exercisable gates in a processor, the analysis technique is based on symbolic simulation. This technique is able to characterise the gate-level activity of a processor executing an application for all possible inputs with a single gate-level simulation. During this simulation, inputs are represented as unknown logic values (X’s), which are treated as both 1’s and 0’s when recording possible toggled gates.
Symbolic simulation has been applied in circuits for logic and timing verification, as well as sequential test generation. More recently, it has been applied in determination of application-specific Vmin. Symbolic simulation has also been applied for software verification. However, no existing technique seems to have applied symbolic simulation to create bespoke processors tailored to an application.
Fig. 4: The process for tailoring a bespoke processor to a target application