INTRODUCTION:

In labs at Sun Microsystems, Intel and IBM, clockless chips have increased the pace at which high-end processors do their work. In 1997, Intel developed an asynchronous, Pentium-compatible test chip that ran three times as fast, on half the power, as its synchronous equivalent.

Fant has focused on still another benefit of asynchronous design. Because these chips give off no regularly timed signal, the way clocked circuits do, they can perform encryption in a way that is harder to identify and to crack. Improved encryption makes asynchronous circuits an obvious choice for smart cards, the chip-endowed plastic cards beginning to be used for such security-sensitive applications as storage of medical records, electronic funds exchange and personal identification.

SMT (simultaneous multithreading) offers chip designers a big performance gain for a relatively small design change; asynchronous logic involves a far more dramatic rethink. As its name suggests, it does away with the cardinal rule of chip design: that everything marches to the beat of an oscillating crystal “clock”. For a 1GHz chip, this clock ticks one billion times a second, and all of the chip’s processing units coordinate their actions with these ticks to ensure that they remain in step. Asynchronous, or “clockless”, designs, in contrast, allow different parts of a chip to work at different speeds, sending data to and from each other as and when appropriate.

In recent years, however, clockless designs have started to look more appealing. One reason is that, as chips get bigger, faster and more complicated, distributing the clock signal around the chip becomes harder. Another drawback of clocked designs is that they waste a lot of energy, since even inactive parts of the chip have to respond to every clock tick. Clocked chips also produce electromagnetic emissions at their clock frequency, which can cause radio interference.

Asynchronous designs take up more room on a chip than conventional designs, and there are far fewer design tools available to help create them. Nevertheless, it is believed that such chips will have twice the performance of conventional designs, which would make them ideal for use in high-performance computers. But the most promising applications for asynchronous chips may be in mobile wireless devices and smartcards.

One of the advantages of clockless chips is that they give off very low levels of electromagnetic noise. The faster the clock, the more difficult it is to prevent a device from interfering with other devices; dispensing with the clock all but eliminates this problem. The combination of low noise and low power consumption makes asynchronous chips a natural choice for mobile devices. "The low-hanging fruit for clockless chips will be in communications devices," starting with cell phones.

Wireless devices based on asynchronous chips would run for longer between recharges, and their circuitry would cause less radio interference. Dr Furber is developing asynchronous chips for such devices in conjunction with ARM, a British company whose processors appear in many handheld computers and mobile phones. Philips, a Dutch electronics firm, has already built a pager that uses asynchronous logic, and Theseus Logic of Orlando, Florida, is also pursuing low-power wireless applications.

In the case of smartcards, Dr Furber suggests that asynchronous logic would offer better security than conventional chips. The encryption on existing smartcards can be cracked by analyzing the power consumption for each clock tick. This allows details of the chip’s inner workings to be deduced. Such an attack would be far more difficult on a smartcard based on asynchronous logic.        
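To make the per-tick attack concrete, here is a toy correlation power analysis. It assumes (this is not stated in the text) that the power drawn at one clock tick equals the Hamming weight of a key-dependent byte, `plaintext XOR key`, and that the attacker's samples line up exactly with that tick. `SECRET_KEY` and every function name are hypothetical. The attacker tries all 256 key guesses and keeps the one whose predicted leakage correlates best with the measured samples.

```python
from typing import Sequence

# Toy model: one power sample per operation = Hamming weight of (plaintext XOR key).
SECRET_KEY = 0x3C  # hypothetical key byte held by the "smartcard"


def hamming_weight(x: int) -> int:
    """Number of 1 bits in x."""
    return bin(x).count("1")


def leak(plaintext: int) -> int:
    """Power sample recorded at the clock tick where the XOR happens."""
    return hamming_weight(plaintext ^ SECRET_KEY)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def recover_key(plaintexts: Sequence[int], traces: Sequence[int]) -> int:
    """Return the key guess whose predicted leakage best matches the traces."""
    def score(guess: int) -> float:
        predicted = [hamming_weight(p ^ guess) for p in plaintexts]
        return pearson(predicted, traces)
    return max(range(256), key=score)


if __name__ == "__main__":
    plaintexts = list(range(256))
    traces = [leak(p) for p in plaintexts]  # one tick-aligned sample per operation
    print(hex(recover_key(plaintexts, traces)))  # 0x3c
```

The attack depends on knowing which sample belongs to the key-dependent operation. On a clocked chip that sample sits at a fixed tick. A clockless chip offers no such fixed timing grid, which is why the text expects this kind of attack to be far harder there.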

CLOCKLESS CHIPS:

There are no purely asynchronous chips yet. Instead, today’s clockless processors are actually clocked processors with asynchronous elements. Clockless elements use perfect clock gating, in which circuits operate only when they have work to do, not whenever a clock ticks. Instead of clock-based synchronization, local handshaking controls the passing of data between logic modules.

A memory read illustrates the handshake. The asynchronous processor places the address of the data it wants to read onto the address bus and issues a request. The memory reads the address off the bus, finds the information, and places it on the data bus. The memory then acknowledges that the data is available. Finally, the processor takes the information from the data bus.

Pipeline controls and FIFO sequencers move data and instructions around and keep them in the right order. According to Jorgenson, “Data arrives at any rate and leaves at any rate. When the arrival rate exceeds the departure rate, the circuit stalls the input until the output catches up.” The many handshakes themselves require more power than a clock’s operations. However, clockless systems more than offset this because, unlike synchronous chips, each circuit uses power only when it performs work.
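The processor-to-memory read described above can be sketched as a four-phase (return-to-zero) request/acknowledge handshake. This is one common protocol for self-timed circuits, but the text does not say which protocol these chips use, so that choice is an assumption. `Bus`, `Memory` and `read` are hypothetical names, and the Python loops stand in for gates that react to signal levels. Nothing in the model consults a clock.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Bus:
    """Shared wires between processor and memory (a toy model, not real RTL)."""
    address: Optional[int] = None
    data: Optional[int] = None
    req: bool = False  # request, driven by the processor
    ack: bool = False  # acknowledge, driven by the memory
    trace: List[str] = field(default_factory=list)


class Memory:
    def __init__(self, contents: Dict[int, int]) -> None:
        self.contents = dict(contents)

    def respond(self, bus: Bus) -> None:
        """React to the current req/ack levels; there is no clock input."""
        if bus.req and not bus.ack:
            # Read the address, drive the data bus, then signal that data is valid.
            bus.data = self.contents[bus.address]
            bus.ack = True
            bus.trace.append("memory: data valid, ack=1")
        elif not bus.req and bus.ack:
            # The processor has taken the data: return to the idle state.
            bus.ack = False
            bus.trace.append("memory: ack=0")


def read(bus: Bus, memory: Memory, address: int) -> int:
    """Four-phase read handshake, driven only by signal changes."""
    # Phase 1: put the address on the bus and raise the request.
    bus.address = address
    bus.req = True
    bus.trace.append(f"cpu: address={address:#x}, req=1")
    # Phase 2: wait until the memory acknowledges that data is on the bus.
    while not bus.ack:
        memory.respond(bus)
    value = bus.data
    # Phase 3: data latched, so drop the request.
    bus.req = False
    bus.trace.append("cpu: data latched, req=0")
    # Phase 4: wait for the memory to drop its acknowledge.
    while bus.ack:
        memory.respond(bus)
    return value
```

For example, `read(Bus(), Memory({0x40: 1234}), 0x40)` returns 1234 and leaves four entries in the bus trace, one per phase, with both `req` and `ack` back at 0.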

CLOCKLESS ARM996HS

The compact, clockless ARM996HS processor is an ideal solution for automotive, medical and deeply embedded control applications because of its extremely low power consumption and low electromagnetic interference (EMI). The ARM996HS processor is the industry’s first licensable clockless processor and directly addresses design engineers' need for technology optimized for robust, real-time chip designs.

The ARM996HS processor uses Handshake Solutions’ technology, a production-proven methodology for implementing self-timed circuitry that has been used in hundreds of millions of chips for smart cards, advanced pagers, in-vehicle network transceivers and cordless handsets. By leveraging this low-EME (electromagnetic emission) technology, the ARM996HS processor can address the market need for very low EMI, which reduces packaging and shielding costs. Additionally, the processor can be used in applications that require low current consumption and extended battery life while maintaining real-time responsiveness and a small footprint.

By removing the clock and the associated architecture of standard ICs, clockless designs offer significant reductions in power consumption and EMI, enabling designers to squeeze more functionality into limited power budgets. Handshake Solutions offers an extremely disciplined design methodology that delivers the unique benefits of clockless design within a commercially available design environment. The methodology is supported by easy-to-use, thoroughly field-tested tools, design services and standard IP blocks from Handshake Solutions.

The new ARM996HS processor is optimized for use in both synchronous (clocked) and asynchronous (clockless) system-on-chip designs, enabling easy integration by ARM semiconductor partners. Engineers can design in the new processor using standard cell libraries, such as the ARM Metro family for low power, together with their existing EDA tools, which lowers costs and shortens design cycles. The key benefits of the processor include low EMI, reducing the probability of interfering with sensitive circuitry; low current peaks, enabling easier integration with analog components; and low power consumption, reducing system power requirements. Because clockless processors consume zero dynamic power when there is no activity, they can significantly extend battery life.


FUNCTIONING

Clockless logic uses a system of local handshaking rather than a global clock to define transaction timing. Handshakes are implemented by means of simple request and acknowledge signals that mark the validity of the data. This approach means that only those parts of a system actively involved in task execution draw dynamic power, reducing dynamic power consumption in standby to zero and extending battery lifetimes. Execution of individual functions need not wait for the next clock pulse, enabling immediate response to exceptional events. Wake-up from an interrupt can even be performed without a running clock. Handshake Solutions’ clockless designs support integration with synchronous blocks and systems, which means that the clockless ARM996HS processor can be easily integrated into a conventional, clocked SoC design. Because they exhibit very little ground bounce, Handshake Solutions’ functional blocks can be quickly and easily combined with clocked logic, analog, RF or memory blocks to meet exact system requirements.
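The claim that only active parts draw power can be illustrated with a deliberately crude energy model. In this model a clocked design pays a clock cost for every block on every tick, and a clockless design pays a handshake cost only for operations that actually happen. All energy values are made-up, arbitrary units and all names are hypothetical. Leakage and real circuit detail are ignored, and the handshake cost is set higher than the per-tick clock cost, in line with the earlier remark that handshakes cost more than clock operations.

```python
from typing import Sequence

# Arbitrary, made-up energy units for illustration only.
E_CLOCK = 1.0      # energy per block per clock tick (clock tree + latches)
E_HANDSHAKE = 2.0  # energy per request/acknowledge exchange
E_OP = 5.0         # energy for the useful work of one operation


def count_ops(activity: Sequence[Sequence[bool]]) -> int:
    """activity[t][b] is True when block b does useful work at step t."""
    return sum(sum(1 for busy in row if busy) for row in activity)


def clocked_energy(activity: Sequence[Sequence[bool]]) -> float:
    # Every block sees every tick, busy or not.
    n_steps = len(activity)
    n_blocks = len(activity[0]) if activity else 0
    return n_steps * n_blocks * E_CLOCK + count_ops(activity) * E_OP


def clockless_energy(activity: Sequence[Sequence[bool]]) -> float:
    # Only blocks that actually perform an operation handshake and switch.
    return count_ops(activity) * (E_HANDSHAKE + E_OP)


# 4 blocks observed for 10 steps; only block 0 works, at steps 2 and 7.
MOSTLY_IDLE = [[b == 0 and t in (2, 7) for b in range(4)] for t in range(10)]
# Every block busy at every step.
FULLY_BUSY = [[True] * 4 for _ in range(10)]
```

With these numbers the mostly idle workload costs 50 units clocked versus 14 units clockless. If every block is busy on every step, the extra handshake cost makes the clockless version more expensive (280 versus 240). The savings therefore come from idle circuitry, which is the situation the paragraph describes.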

A COMPLETE CLOCKLESS DESIGN SOLUTION

Although the theoretical principles of clockless design are well defined, implementing clockless logic within a complex system design has not historically been straightforward. Potential issues included the availability of design-flow support, the need for special clockless standard-cell libraries, the area overhead of the control logic needed to implement self-timing, and ensuring that clockless designs are testable using industry-supported techniques.

Handshake Solutions has brought clockless technology to a level of maturity where it has now been proven on many designs. With dozens of production designs incorporating Handshake Technology successfully completed, over a hundred million Handshake Technology-based ICs have been manufactured, tested and sold. A rigorous design approach has solved the problems encountered in past practical applications of asynchronous technology.

Handshake Solutions’ offering includes a design flow based on standard cell libraries, with no dedicated asynchronous cells required. The flow is compatible with standard ‘synchronous’ tools for technology mapping, logic optimization, timing analysis and testing. A high-level design entry language called Haste (formerly Tangram), developed specifically for the specification and design of clockless circuits, provides the entry mechanism. Because Haste is similar to C and behavioral Verilog, it is easy to learn and use. The design description is synthesized into a Verilog netlist by Handshake Solutions’ silicon compiler.

Once the Verilog netlist has been generated, the back-end design to GDSII can follow any industry-standard implementation flow. This means that licensees of the ARM996HS who wish to integrate the processor into their SoC need not adopt the full Handshake Solutions design flow; they can simply take the Verilog netlist and incorporate it as an IP block. The methodology also supports scan test, providing a complete design-for-test (DFT) solution so that design teams can use standard automatic test pattern generation (ATPG) techniques.


The ARM996HS is a new 32-bit RISC processor core, part of the ARM9E family of cores. The ARM9E family is based on the ARMv5TE instruction set architecture (ISA), which includes the 16-bit Thumb® and 32-bit ARM instruction sets. The Harvard-architecture core is based on a 5-stage integer pipeline with a fast 32-bit multiply-accumulate (MAC) block. It has tightly coupled memories (TCMs) for both instructions and data, each of which can be up to 4MB. Dual AMBA™ AHB-Lite™ synchronous buses provide the instruction and data interfaces. Specific security enhancements for the ARM996HS core include a memory protection unit (MPU) and the provision of non-maskable interrupts (NMI). A hardware divide coprocessor is also provided.

CLOCK LESS PIPELINE

The pipeline within the ARM996HS core mirrors the normal ARMv5TE pipeline, except that it is implemented as a clockless design. Instead of a global synchronous clock, dedicated control logic ensures that each stage in the pipeline is enabled only when required. The pipeline handshakes with the system controller to fetch instructions and to load and store data.
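The stage-enable behavior above, together with the FIFO stalling Jorgenson describes earlier, can be modelled as a self-timed pipeline in which each stage passes its item on only when the next stage is empty. This is a token-level sketch with hypothetical names. It ignores data-dependent delays, it does not reflect the actual ARM996HS control logic, and its loop counter is simulation time, not a hardware clock.

```python
from collections import deque
from typing import Any, List, Optional, Sequence, Tuple


class SelfTimedPipeline:
    """Each stage holds at most one item and passes it on only when the
    next stage is empty (a local handshake), never on a global tick.
    Items must not be None, because None marks an empty stage."""

    def __init__(self, n_stages: int) -> None:
        self.stages: List[Optional[Any]] = [None] * n_stages

    def try_insert(self, item: Any) -> bool:
        if self.stages[0] is not None:
            return False  # input stalls: the first stage is still occupied
        self.stages[0] = item
        return True

    def ripple(self) -> None:
        # Walk from the output end so each item advances at most one stage.
        for i in range(len(self.stages) - 2, -1, -1):
            if self.stages[i] is not None and self.stages[i + 1] is None:
                self.stages[i + 1], self.stages[i] = self.stages[i], None

    def try_remove(self) -> Optional[Any]:
        item, self.stages[-1] = self.stages[-1], None
        return item

    def empty(self) -> bool:
        return all(s is None for s in self.stages)


def simulate(items: Sequence[Any], n_stages: int, consumer_period: int,
             max_steps: int = 10_000) -> Tuple[List[Any], int]:
    """Producer offers an item every step; the consumer accepts one every
    `consumer_period` steps. Returns (items in output order, stall count)."""
    pipe = SelfTimedPipeline(n_stages)
    pending = deque(items)
    out: List[Any] = []
    stalls = 0
    for step in range(max_steps):
        if step % consumer_period == 0:
            item = pipe.try_remove()
            if item is not None:
                out.append(item)
        pipe.ripple()
        if pending:
            if pipe.try_insert(pending[0]):
                pending.popleft()
            else:
                stalls += 1
        if not pending and pipe.empty():
            break
    return out, stalls
```

With `consumer_period=1` the output keeps up and the input never stalls. With `consumer_period=3` the pipeline fills, the input stalls until the output catches up, and items still leave in their original order.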


ENHANCED MEMORY PROTECTION UNIT

To enable software reuse from several different sources, there must be a mechanism that isolates each module so that the risk of interference is minimized. In particular, it must be possible to isolate, or ‘lock down’, many of these routines. This is the job of the memory protection unit (MPU). Current MPUs typically offer 4KB code boundaries for isolating functions, which is usually too coarse for systems with limited memory. Large boundaries make it impossible to segregate many small software routines efficiently; often several tasks have to share the same protection scheme.

To enable good segregation of individual tasks executed within an operating system (OS) kernel, the ARM996HS MPU provides a fine 32-byte granularity of memory region for each task. This enables more effective use of the available memory for supporting multiple tasks. With this level of granularity it is possible to separate user from system, task from task, data from data and stack from stack. Shared access is possible via overlapping regions.
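The region checks at 32-byte granularity can be sketched as follows. This sketch rests on several assumptions. Each task has its own region table, as if an OS reprogrammed the MPU on every context switch. An access is allowed if any region covering the address grants it. The addresses, sizes and task names are hypothetical. Real ARM MPUs resolve overlapping regions with fixed priority rules and have their own register layout, and neither is modelled here.

```python
from dataclasses import dataclass
from typing import Dict, List

GRANULE = 32  # bytes: region granularity quoted for the ARM996HS MPU


def round_up(size: int, granule: int) -> int:
    """Smallest multiple of `granule` that is >= size."""
    return -(-size // granule) * granule


@dataclass(frozen=True)
class Region:
    base: int
    size: int
    writable: bool

    def __post_init__(self) -> None:
        if self.size <= 0 or self.base % GRANULE or self.size % GRANULE:
            raise ValueError("base and size must be multiples of 32 bytes")

    def contains(self, addr: int) -> bool:
        return self.base <= addr < self.base + self.size


def allowed(regions: List[Region], addr: int, write: bool) -> bool:
    """Permit the access if any region in the task's table grants it."""
    return any(r.contains(addr) and (r.writable or not write) for r in regions)


# Hypothetical layout: a buffer shared by both tasks via overlapping regions.
SHARED = Region(0x2000_0100, 32, writable=True)
TASKS: Dict[str, List[Region]] = {
    # 70-byte stack rounded up to 96 bytes
    "task_a": [Region(0x2000_0000, round_up(70, GRANULE), True), SHARED],
    # 40-byte stack rounded up to 64 bytes, plus a read-only code region
    "task_b": [Region(0x2000_0060, round_up(40, GRANULE), True),
               Region(0x0000_1000, 64, False), SHARED],
}
```

At 32-byte granularity the 70-byte stack of `task_a` occupies 96 bytes and sits directly next to `task_b`'s stack. At 4KB granularity each stack would consume a whole 4096-byte region (`round_up(70, 4096) == 4096`), which is the waste the paragraph describes.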

NON-MASKABLE INTERRUPTS

Non-maskable interrupts (NMI) allow fast interrupt requests (FIQs) to be made non-maskable by software. Where predictability of execution is especially important, as in control applications, NMI is a mandatory requirement. NMI is particularly important when the system uses a watchdog that must be serviced at a particular time.
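The effect of making FIQ non-maskable can be shown with a toy model. A driver masks fast interrupts during a critical section, and the watchdog's deadline arrives in the meantime. `ToyCore` and its fields are hypothetical and do not reflect the ARM996HS programmer's model or how the NMI option is actually configured.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyCore:
    """Hypothetical model; field names are illustrative, not ARM registers."""
    fiq_disabled: bool = False     # software has masked fast interrupts
    fiq_nonmaskable: bool = False  # NMI option: FIQ ignores the mask
    serviced: List[str] = field(default_factory=list)

    def raise_fiq(self, source: str) -> bool:
        """Return True if the FIQ handler runs for this request."""
        if self.fiq_disabled and not self.fiq_nonmaskable:
            return False  # masked: in this toy model the request is simply not serviced
        self.serviced.append(source)
        return True


def watchdog_serviced_during_critical_section(nmi: bool) -> bool:
    core = ToyCore(fiq_nonmaskable=nmi)
    core.fiq_disabled = True           # e.g. a driver masks interrupts
    return core.raise_fiq("watchdog")  # the watchdog deadline arrives meanwhile
```

Without NMI the watchdog request is lost while software holds the mask. With NMI it is serviced on time regardless of what the software has masked, which gives control applications the predictability the paragraph calls for.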

HARDWARE DIVIDE COPROCESSOR

For applications that need to scale values read from sensors, the hardware divide instruction enables data to be manipulated very efficiently. The hardware divide coprocessor performs division in parallel with the main processor pipeline. It supports unsigned and signed 32-bit division through normal coprocessor instructions, and development tools and libraries support it, making it easy for developers to use.
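The results such a unit computes can be sketched in Python, with a sensor-scaling use case. The text does not specify rounding, overflow or divide-by-zero behavior, so the following are assumptions: signed quotients round toward zero as in C, results wrap to 32 bits (so INT32_MIN / -1 gives INT32_MIN), and division by zero raises an exception. `adc_to_millivolts`, the 12-bit ADC and the 3.3V reference are hypothetical, and no coprocessor instruction encodings are shown.

```python
MASK32 = 0xFFFF_FFFF


def to_signed32(x: int) -> int:
    """Interpret the low 32 bits of x as a two's-complement value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x8000_0000 else x


def udiv32(a: int, b: int) -> int:
    """Unsigned 32-bit division."""
    a &= MASK32
    b &= MASK32
    if b == 0:
        raise ZeroDivisionError("udiv32 by zero")  # hardware behavior not specified here
    return a // b


def sdiv32(a: int, b: int) -> int:
    """Signed 32-bit division, quotient rounded toward zero (C semantics).
    Python's // floors, so divide magnitudes and reapply the sign."""
    a, b = to_signed32(a), to_signed32(b)
    if b == 0:
        raise ZeroDivisionError("sdiv32 by zero")
    q = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        q = -q
    return to_signed32(q)  # wraps INT32_MIN / -1 back to INT32_MIN


def adc_to_millivolts(raw: int, vref_mv: int = 3300, full_scale: int = 4095) -> int:
    """Scale a hypothetical 12-bit ADC reading to millivolts."""
    return udiv32(raw * vref_mv, full_scale)
```

A mid-scale reading of 2048 scales to 1650 mV and full scale (4095) to 3300 mV. Note that `sdiv32(-7, 2)` gives -3, not the -4 that Python's floor division would give.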

ADVANTAGES:

Difficulties associated with traditional synchronous design have prompted many researchers to consider new alternatives. Three factors drive current research and development efforts in the design and test of clock less digital systems:
A) Low power
B) Performance
C) Design for reuse.

LOW POWER

The mainstream design style for today’s processors is synchronous; that is, a clock regulates the internal timing. We commonly measure how fast a computer can execute instructions by its number of clock cycles per second. Unfortunately, the clock consumes more power than most other components in the chip. The most disturbing aspect of this is that the clock only serves as a timer for computational tasks: it does not operate on data; it simply orchestrates the many computational parts of a digital system (whether chip, board or computer). Power consumption is also becoming a problem at large, that is, throughout the system in general. As the number of transistors on a chip increases, so does the power used by the clock, so in complex chips power consumption becomes an even more crucial topic. For mobile and portable electronics, chips must conserve power even more efficiently to maximize battery life. Low-power design is therefore important.
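For context, the standard first-order model of CMOS dynamic (switching) power shows why the clock is so costly; this model is not part of the original text. Here α is the activity factor (the fraction of cycles in which a node makes a 0→1 transition), C the switched capacitance, V_DD the supply voltage and f the clock frequency:

```latex
P_{\mathrm{dyn}} = \alpha \, C \, V_{DD}^{2} \, f,
\qquad
\alpha_{\mathrm{clk}} = 1 \;\Rightarrow\; P_{\mathrm{clk}} = C_{\mathrm{clk}} \, V_{DD}^{2} \, f
```

The clock net switches every cycle whether or not the logic it drives does useful work, while nodes in idle logic have α close to 0. Every added clocked element increases C_clk, so P_clk grows with transistor count as well as with frequency. Some texts fold an extra factor of 1/2 into α; the argument is unchanged.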
