INTRODUCTION:
In labs at Sun Microsystems, Intel and IBM, clockless chips have increased the
pace at which high-end processors do their work. In 1997, Intel developed an
asynchronous, Pentium-compatible test chip that ran three times as fast, on
half the power, as its synchronous equivalent.
Fant has focused on still another benefit of
asynchronous design. Because these chips give off no regularly timed signal,
the way clocked circuits do, they can perform encryption in a way that is
harder to identify and to crack. Improved encryption makes asynchronous
circuits an obvious choice for smart cards—the chip-endowed plastic cards
beginning to be used for such security-sensitive applications as storage of
medical records, electronic funds exchange and personal identification.
SMT offers chip designers a big performance
gain for a relatively small design change; asynchronous logic involves a far
more dramatic rethink. As its name suggests, it does away with the cardinal
rule of chip design: that everything marches to the beat of an oscillating
crystal “clock”. For a 1GHz chip, this clock ticks one billion times a second,
and all of the chip’s processing units co-ordinate their actions with these
ticks to ensure that they remain in step. Asynchronous, or “clockless”,
designs, in contrast, allow different bits of a chip to work at different
speeds, sending data to and from each other as and when appropriate.
In recent years, however, clockless designs have started to look more appealing. One reason is
that, as chips get bigger, faster and more complicated, distributing the clock
signal around the chip becomes harder. Another drawback with clocked designs is
that they waste a lot of energy, since even inactive parts of the chip have to
respond to every clock tick. Clocked chips also produce electromagnetic
emissions at their clock frequency, which can cause radio interference.
Asynchronous designs take up more room on a chip than conventional designs, and there are
far fewer design tools available to help create them. Even so, it is believed that such chips
will have twice the computing power of conventional designs, which will make them ideal for use in
high-performance computers. The most promising application for asynchronous
chips may be in mobile wireless devices and smartcards.
One of the advantages of clockless chips is that they give off very low levels of
electromagnetic noise. The faster the clock, the more difficult it is to
prevent a device from interfering with other devices; dispensing with the clock
all but eliminates this problem. The combination of low noise and low power
consumption makes asynchronous chips a natural choice for mobile devices.
"The low-hanging fruit for clockless chips will be in communications
devices," starting with cell phones.
Wireless devices
based on asynchronous chips would run for longer between recharges, and their
circuitry would cause less radio interference. Dr Furber is developing
asynchronous chips for such devices in conjunction with ARM, a British company
whose processors appear in many handheld computers and mobile phones. Philips,
a Dutch electronics firm, has already built a pager that uses asynchronous
logic, and Theseus Logic of Orlando, Florida, is also pursuing low-power
wireless applications.
CLOCKLESS CHIPS:
There are no purely asynchronous chips yet. Instead, today’s clockless processors are actually
clocked processors with asynchronous elements. Clockless elements use perfect
clock gating, in which circuits operate only when they have work to do, not
whenever a clock ticks. Instead of clock-based synchronization, local
handshaking controls the passing of data between logic modules. The
asynchronous processor places the location of the stored data it wants to read
onto the address bus and issues a request for the information. The memory reads
the address off the bus, finds the information, and places it on the data bus.
The memory then acknowledges that it has read the data. Finally, the processor
grabs the information from the data bus. Pipeline controls and FIFO sequencers
move data and instructions around and keep them in the right order. According
to Jorgenson, “Data arrives at any rate and leaves at any rate. When the
arrival rate exceeds the departure rate, the circuit stalls the input until the
output catches up.” The many handshakes themselves require more power than a
clock’s operations. However, clockless systems more than offset this because,
unlike synchronous chips, each circuit uses power only when it performs work.
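The read sequence described above can be sketched in a few lines of Python. This is only an illustrative model, not how any real chip is built: the Bus, Memory and Processor classes and all of their field names are assumptions made for this example, and real handshake circuits work with wires and signal levels rather than method calls.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Bus:
    """Hypothetical model of the wires shared by processor and memory."""
    address: Optional[int] = None
    data: Optional[int] = None
    request: bool = False
    acknowledge: bool = False
    log: List[str] = field(default_factory=list)


class Memory:
    def __init__(self, contents: Dict[int, int]) -> None:
        self.contents = contents

    def serve(self, bus: Bus) -> None:
        # The memory does work only when a request is pending,
        # not on every tick of a clock.
        if bus.request and not bus.acknowledge:
            bus.data = self.contents[bus.address]
            bus.log.append("memory: data placed on data bus")
            bus.acknowledge = True
            bus.log.append("memory: acknowledge raised")


class Processor:
    def read(self, bus: Bus, memory: Memory, address: int) -> int:
        bus.address = address
        bus.log.append("processor: address placed on address bus")
        bus.request = True
        bus.log.append("processor: request raised")

        memory.serve(bus)
        if not bus.acknowledge:
            raise RuntimeError("memory never acknowledged the request")

        value = bus.data
        bus.log.append("processor: data taken from data bus")

        # Release both signals so the next transfer starts from a clean state.
        bus.request = False
        bus.acknowledge = False
        return value
```

Calling Processor().read(bus, Memory({0x10: 42}), 0x10) on a fresh Bus returns 42 and leaves five entries in bus.log, one for each step of the exchange. No clock appears anywhere: each side moves only in response to the other's signal.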
CLOCKLESS ARM996HS
The compact, clockless ARM996HS processor is an ideal
solution for automotive, medical and deeply embedded control applications
because of its extremely low power consumption and low electromagnetic
interference (EMI). The ARM996HS processor is the industry’s first licensable
clockless processor and directly addresses the needs of design engineers for
technology optimized for robust and real-time chip designs.
The ARM996HS processor utilizes Handshake Solutions’
technology, a production-proven methodology for implementing self-timed
circuitry which has been used in hundreds of millions of chips for smart cards,
advanced pagers, in-vehicle network transceivers, and cordless handsets. By
leveraging this low-EMI technology, the ARM996HS processor can address the
market need for very low EMI to reduce packaging and shielding costs.
Additionally, the processor can be used in applications that require low
current consumption and an extended battery life while maintaining real-time
application responsiveness and a small footprint.
By removing the clock and associated architecture of
standard ICs, clockless designs offer significant reductions in power consumption
and EMI, enabling designers to squeeze more functionality into limited power
budgets. Handshake Solutions offers an extremely disciplined design methodology
that delivers the unique benefits of clockless design within a commercially
available design environment. The methodology is supported by easy-to-use and
thoroughly field-tested tools, design services, and standard IP blocks from
Handshake Solutions.
The new ARM996HS processor is
optimized for use in both synchronous (clocked) and asynchronous (clockless)
system-on-chip designs, enabling easy integration by ARM semiconductor
Partners. Engineers can easily design in the new processor using standard cell
libraries, such as the ARM Metro family for low power, and their existing EDA
tools, for lower costs and shorter design cycles. The key benefits of the
processor include low EMI,
reducing the probability of interfering with sensitive circuitry; low current
peaks, enabling easier integration with analog components; and low power
consumption, reducing system power requirements. Because clockless processors
consume zero dynamic power when there is no activity, they can significantly
extend battery life.
FUNCTIONING
Clockless logic
uses a system of local handshaking rather than a global clock to define
transaction timing. Handshakes are implemented by means of simple request and
acknowledge signals that mark the validity of the data. This approach means
that only those parts of a system actively involved in task execution draw
power, reducing active standby power consumption to zero and extending battery
lifetimes. Execution of individual functions need not wait for the next clock
pulse, enabling immediate response to exceptional events. Wake-up from
interrupt can even be performed
without a running clock. Handshake Solutions’ clockless designs support
integration with synchronous blocks and systems. This means that the clockless
ARM996HS processor can be easily integrated into a conventional, clocked SoC
design. Because they exhibit no ground bounce, Handshake Solutions functions
can be quickly and easily combined with clocked logic, analog, RF or memory
blocks to meet the exact system requirements.
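The power argument can be made concrete with a toy energy model. The sketch below is a rough illustration only: the three energy constants are invented numbers chosen for this example, not measurements, and the function names are hypothetical. It assumes a clocked block pays for every tick whether or not it has work, while a clockless block pays only per operation, plus a handshake cost that is deliberately set higher than the clock cost.

```python
from typing import List

# All three constants are made-up units for illustration only.
CLOCK_EVENT_ENERGY = 1.0  # assumed cost of clocking one block for one tick
OPERATION_ENERGY = 1.0    # assumed cost of one useful operation
HANDSHAKE_ENERGY = 1.5    # assumed cost of one request/acknowledge exchange


def clocked_energy(schedule: List[List[bool]]) -> float:
    """Every block pays for every tick, plus the work it actually does."""
    energy = 0.0
    for tick in schedule:
        energy += CLOCK_EVENT_ENERGY * len(tick)  # all blocks see the clock
        energy += OPERATION_ENERGY * sum(tick)    # only active blocks work
    return energy


def clockless_energy(schedule: List[List[bool]]) -> float:
    """Only active blocks draw energy, but each operation needs a handshake."""
    energy = 0.0
    for tick in schedule:
        energy += (OPERATION_ENERGY + HANDSHAKE_ENERGY) * sum(tick)
    return energy


# Each inner list marks which of four blocks has work in that time step.
idle = [[False, False, False, False] for _ in range(10)]
light = [[True, False, False, False] for _ in range(10)]

print(clocked_energy(idle), clockless_energy(idle))    # 40.0 0.0
print(clocked_energy(light), clockless_energy(light))  # 50.0 25.0
```

Under these assumed constants the idle system costs nothing in the clockless model, which matches the claim of zero standby consumption. The same constants also make a permanently busy system more expensive without a clock, so the benefit in this model depends on how much of the time the blocks are idle.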
A COMPLETE CLOCKLESS DESIGN SOLUTION
Although the
theoretical principles of clockless design are well-defined, implementation of
clockless logic within the context of a complex system design has not,
historically, been straightforward. Potential issues included the availability
of design flow support, the need for special clockless standard-cell
libraries, the area overhead arising from the control logic necessary to
implement the self-timing, and ensuring that clockless designs are testable
using industry-supported techniques.
Handshake Solutions has brought clockless technology to a level of maturity where the
technology has now been proven on many designs. With dozens of production
designs incorporating Handshake Technology successfully completed, over a
hundred million Handshake Technology-based ICs have been manufactured, tested
and sold. Developing a rigorous approach has solved the problems that have been
experienced in the practical application of asynchronous technologies in the
past.
Once the Verilog netlist has been generated, the back-end design to GDSII can
follow any industry-standard implementation flow. This means that licensees of
the ARM996HS who wish to integrate the processor into their SoC do not need to
adopt the full Handshake Solutions design flow; they can simply take the
Verilog netlist and incorporate it as an IP block. The methodology also
supports scan-test for complete design for test (DFT) solutions, ensuring that
design teams can make use of standard automatic test pattern generation (ATPG)
techniques.
The ARM996HS is a new 32-bit RISC processor core –
part of the ARM9E family of cores. The ARM9E family is based on the ARMv5TE
Instruction Set Architecture (ISA), which includes the 16-bit Thumb® and 32-bit
ARM instruction sets. The Harvard-architecture core is based on a 5-stage
integer pipeline with fast 32-bit multiply-accumulate (MAC) block. It has
tightly coupled memories (TCMs) for both instruction and data, each of which
can be up to 4MB. Dual AMBA™ AHB-Lite™ synchronous buses provide the
instruction and data interfaces. Specific security enhancements for the
ARM996HS core include a memory protection unit (MPU) and the provision of
non-maskable interrupts (NMI). A hardware divide co-processor is also provided.
CLOCKLESS PIPELINE
The pipeline within the ARM996HS core mirrors the
normal ARMv5TE pipeline, with the exception that it is implemented as a
clockless design. Instead of a global synchronous clock, dedicated control logic
ensures that each stage in the pipeline is enabled only when required. The
pipeline handshakes with the system controller to fetch instructions and to
load and store data.
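A self-timed pipeline of this kind can be sketched as a chain of stages in which an item moves forward only when the next stage is free. The model below is a simplified illustration, not the ARM996HS control logic: the Stage class, the advance function and the sink_ready flag are all hypothetical names, and the "next stage is empty" check stands in for the acknowledge signal.

```python
from typing import List, Optional, Tuple


class Stage:
    """One pipeline stage holding at most one item (hypothetical model)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.item: Optional[str] = None
        self.activations = 0  # how many times this stage was enabled


def advance(stages: List[Stage], new_item: Optional[str] = None,
            sink_ready: bool = True) -> Tuple[Optional[str], bool]:
    """Move items one stage forward wherever the next stage is free.

    Returns (item that left the pipeline, whether new_item was accepted).
    """
    emitted = None
    last = stages[-1]
    if last.item is not None and sink_ready:
        emitted, last.item = last.item, None

    # Walk from the output end so a freed slot is refilled in the same step.
    for i in range(len(stages) - 2, -1, -1):
        src, dst = stages[i], stages[i + 1]
        if src.item is not None and dst.item is None:
            dst.item, src.item = src.item, None
            dst.activations += 1  # stage enabled only because data arrived

    accepted = False
    first = stages[0]
    if new_item is not None and first.item is None:
        first.item = new_item
        first.activations += 1
        accepted = True
    return emitted, accepted
```

If the consumer stops taking results (sink_ready=False), the stages fill up one by one and the next call returns accepted as False: the input is stalled until the output catches up, and no stage is activated while nothing can move.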
ENHANCED MEMORY PROTECTION UNIT
In order to enable software re-use from several
different sources, it is vital that there is a mechanism to allow each module
to be isolated so that the risk of interference is minimized. The requirement
is to be able to isolate or ‘lock down’ many of these routines. This is
performed by the memory protection unit (MPU). Current MPUs
typically offer 4KByte code boundaries for isolating functions, which is
too large for systems with limited memory resources. Large code
boundaries make it impossible to segregate many small
software routines efficiently – often several tasks have to be included within the same
protection scheme.
To enable good segregation of individual tasks
executed within an operating system (OS) kernel, the ARM996HS MPU provides a
fine 32-byte granularity of memory region for each task. This enables more
effective use of the available memory resource for supporting multiple tasks.
With this level of granularity it is possible to separate user from system,
task from task, data from data and stack from stack. Shared access is possible
via overlapping regions.
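The effect of fine granularity can be illustrated with a small software model of region checks. This is a sketch under stated assumptions, not the ARM996HS register interface: the Region class, the access_allowed and padded_size helpers, and the idea of tagging regions with task names are all invented for this example. Only the 32-byte granule and the 4KByte comparison come from the text.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

GRANULE = 32  # bytes; the region granularity stated in the text


@dataclass(frozen=True)
class Region:
    """Hypothetical protection region aligned to the 32-byte granule."""
    base: int
    size: int
    tasks: FrozenSet[str]  # tasks permitted to access this region

    def __post_init__(self) -> None:
        if self.size <= 0 or self.base % GRANULE or self.size % GRANULE:
            raise ValueError("base and size must be multiples of 32 bytes")

    def contains(self, address: int) -> bool:
        return self.base <= address < self.base + self.size


def access_allowed(regions: List[Region], task: str, address: int) -> bool:
    """Allow the access if any region covering the address lists the task."""
    return any(r.contains(address) and task in r.tasks for r in regions)


def padded_size(needed: int, granule: int) -> int:
    """Bytes actually reserved when a request is rounded up to the granule."""
    return -(-needed // granule) * granule


regions = [
    Region(0x1000, 128, frozenset({"A"})),       # task A's private area
    Region(0x1080, 128, frozenset({"B"})),       # task B's private area
    Region(0x1060, 64, frozenset({"A", "B"})),   # overlap shared by both
]
print(access_allowed(regions, "B", 0x1000))  # False
print(access_allowed(regions, "B", 0x1070))  # True
print(padded_size(100, 32), padded_size(100, 4096))  # 128 4096
```

In this model a task that needs 100 bytes reserves 128 bytes with a 32-byte granule but 4096 bytes with a 4KByte boundary, which is the waste the finer granularity avoids. The third region shows shared access through overlap: it spans the end of A's area and the start of B's, and both tasks may use it.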
NON-MASKABLE INTERRUPTS
The non-maskable interrupt (NMI) feature enables fast interrupt
requests to be made non-maskable
by software. Where predictability of execution is
especially important, such as with control applications, NMI is a mandatory
requirement. NMI is particularly important when a watchdog is used in the
system that needs to be serviced at a particular time.
HARDWARE DIVIDE COPROCESSOR
For applications where there is a requirement to
scale values read from sensors, the hardware divide instruction enables data to
be manipulated very efficiently. The hardware divide coprocessor provides
efficient implementation of division operations in parallel to the main
processor pipeline. It supports unsigned and signed 32-bit division through
normal coprocessor instructions. The hardware divide coprocessor is easy for
developers to use with tools and library support provided.
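As a rough illustration of the sensor-scaling use case, the sketch below models 32-bit unsigned and signed division in Python. The text does not describe the coprocessor's exact behaviour, so several things here are assumptions: the function names are hypothetical, signed results are truncated toward zero as in C, and division by zero raises an error. The ADC figures in the example are invented.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(value: int) -> int:
    """Interpret the low 32 bits of value as a two's-complement integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def udiv32(numerator: int, denominator: int) -> int:
    """Unsigned 32-bit division (assumed semantics, not the real hardware)."""
    n, d = numerator & MASK32, denominator & MASK32
    if d == 0:
        raise ZeroDivisionError("divide-by-zero handling is an assumption here")
    return n // d


def sdiv32(numerator: int, denominator: int) -> int:
    """Signed 32-bit division, truncating toward zero (assumed semantics)."""
    n, d = to_signed32(numerator), to_signed32(denominator)
    if d == 0:
        raise ZeroDivisionError("divide-by-zero handling is an assumption here")
    quotient = abs(n) // abs(d)
    if (n < 0) != (d < 0):
        quotient = -quotient
    return to_signed32(quotient)  # wraps the single overflow case


# Scale a raw reading from an invented 10-bit ADC to millivolts (3300 mV span).
raw = 512
millivolts = udiv32(raw * 3300, 1023)
print(millivolts)       # 1651
print(sdiv32(-7, 2))    # -3
```

In Python the division is just the // operator; the point of the model is only to show the 32-bit wrap-around and sign handling a program would rely on when offloading such scaling to a divide unit.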
ADVANTAGES:
Difficulties associated with traditional synchronous design
have prompted many researchers to consider new alternatives. Three factors
drive current research and development efforts in the design and test of clockless
digital systems:
A) Low power
B) Performance
C) Design for reuse.
LOW POWER
The mainstream design style in use for today’s processors
is synchronous, that is, a clock regulates the internal timing. We measure how
fast a computer can execute instructions by the number of clock cycles per second.
Unfortunately, the clock consumes more power than most other components in the
chip. The most disturbing aspect of this characteristic is that the clock only serves
as a timer for computational tasks. It does not perform operations on data, it
simply orchestrates the many computational parts of a digital system (whether
chip, board, or computer). New problems are also evident for power consumption at large;
that is, throughout the system in general. As the number of transistors on a
chip increases, so does the power used by the clock. Therefore, in complicated chips,
power consumption becomes an even more crucial topic. For mobile and portable
electronics, chips must conserve power even more efficiently to maximize
battery life. Low-power design is important.