Choosing Between an 8-bit or 32-bit MCU - Part 2

by Wade_Gillham on 10-16-2015 02:13 PM


 

Introduction – Part 2

 

This blog series compares use cases for 8-bit and 32-bit MCUs and serves as a guide on how to choose between the two MCU architectures. Most 32-bit examples focus on ARM Cortex-M devices, which behave very similarly across MCU vendor portfolios.

 

There is a lot more architectural variation on the 8-bit MCU side, so it’s harder to apply apples-to-apples comparisons among 8-bit vendors. For the sake of comparison, we use the widely used, well-understood 8051 8-bit architecture, which remains popular among embedded developers.

 

Part 2 – Architecture Specifics and Conclusion: A More Nuanced View of Applications

 

Part 1 of this blog series painted the basic picture for the 8-bit and 32-bit trade-offs.

 

Now it's time to look at a more detailed analysis of applications where each architecture excels and where our general guidelines in Part 1 break down.

 

To compare these MCUs, you need to measure them. There are a lot of tools to choose from. I’ve selected scenarios I believe provide the fairest comparison and are most representative of real-world developer experiences. The ARM numbers below were generated with GCC + nanoCLibrary and -O3 optimization.

 

I made no attempt to optimize the code for either device. I simply implemented the most obvious “normal” code that 90 percent of developers would come up with.

 

It is much more interesting to see what the average developer will see than what can be achieved under ideal circumstances.

 

Latency

 

There is a noticeable difference in interrupt and function-call latency between the two architectures, with 8051 being faster than an ARM Cortex-M core. In addition, having peripherals on the Advanced Peripheral Bus (APB) can also impact latency since data must flow across the bridge between the APB and the AMBA High-Performance Bus (AHB). Finally, many Cortex-M-based MCUs require the APB clock to be divided when high-frequency core clocks are used, which increases peripheral latency.

 

I created a simple experiment where an interrupt was triggered by an I/O pin. The interrupt does some signaling on pins and updates a flag based on which pin triggered the interrupt. I then measured several parameters shown in the following table. The 32-bit implementation is listed here.

 

Figure 2 - IO Interrupt Experiment.png

 

The 8051 core shows an advantage in Interrupt Service Routine (ISR) entry and exit times. However, as the ISR gets bigger and its execution time increases, those delays will become insignificant.

 

In keeping with the established theme, the larger the system gets, the less the 8051 advantage matters. In addition, the advantage in ISR execution time will swing to the ARM core if the ISR involves a significant amount of data movement or math on integers wider than 8 bits. For example, an ADC ISR that updates a 16- or 32-bit rolling average with a new sample would probably execute faster on the ARM device.
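To make the ADC example concrete, here is a minimal sketch of the kind of ISR body I have in mind. The function name, the accumulator variable and the 1/16 filter weight are all assumptions for illustration and are not taken from the measured code.

```c
#include <stdint.h>

/* Assumed filter weight: each new sample contributes 1/16 of the average. */
#define AVG_SHIFT 4u

/* 32-bit running sum, kept at 16x the average. On an 8051 every update
 * of this variable is a multi-byte operation. */
static uint32_t avg_accum;

/* Body of a hypothetical ADC ISR: fold one 16-bit sample into the average
 * and return the updated average. */
uint16_t adc_update_average(uint16_t sample)
{
    /* accum = accum - accum/16 + sample (exponential rolling average) */
    avg_accum -= avg_accum >> AVG_SHIFT;
    avg_accum += sample;
    return (uint16_t)(avg_accum >> AVG_SHIFT);
}
```

On a Cortex-M core the subtract, shift and add each map onto single 32-bit instructions, while an 8-bit core has to work through the accumulator one byte at a time.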

 

Control vs. Processing

 

The fundamental competency of an 8051 core is control code, where the accesses to variables are spread around and a lot of control logic is used (if, case, etc.). The 8051 core is also very efficient at processing 8-bit data while an ARM Cortex-M core excels at data processing and 32-bit math. In addition, the 32-bit data path enables efficient copying of large chunks of data since an ARM MCU can move 4 bytes at a time while the 8051 has to move it 1 byte at a time.

 

As a result, applications that primarily stream data from one place to another (UART to CRC or to USB) are better-suited to ARM processor-based systems.
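The difference in copy width can be sketched with two loops. This is only an illustration: the function names are made up, and the word-wise version assumes both buffers are 4-byte aligned and that the length is given in 32-bit words.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte-wise copy: one load and one store per byte, which is the only
 * width an 8-bit data path can move in a single step. */
void copy_bytes(uint8_t *dst, const uint8_t *src, size_t n)
{
    while (n--) {
        *dst++ = *src++;
    }
}

/* Word-wise copy: one load and one store per 4 bytes. Assumes aligned
 * buffers and a length expressed in 32-bit words. */
void copy_words(uint32_t *dst, const uint32_t *src, size_t n_words)
{
    while (n_words--) {
        *dst++ = *src++;
    }
}
```

For the same buffer, the word-wise loop runs a quarter as many iterations, which is where the ARM core gets its advantage in streaming applications.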

 

Consider this simple experiment. I compiled the function below on both architectures for variable sizes of uint8_t, uint16_t and uint32_t.

 

Figure 3 - Data Size Experiment.png

 

As the data size increases, the 8051 core requires more and more code to do the job, eventually surpassing the size of the ARM function. The 16-bit case is pretty much a wash in terms of code size, and slightly favors the 32-bit core in execution speed, since code of equal size generally executes in fewer cycles on the ARM core. It’s also important to note that this comparison is only valid when compiling the ARM code with optimization. Un-optimized code is several times larger.
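If you want to try this kind of comparison yourself, one way is to stamp out the same function body for each data width and compare the generated code. The sketch below is a hypothetical stand-in, not the function from Figure 3; the macro and function names are assumptions.

```c
#include <stdint.h>

/* Hypothetical test function: sum a small buffer. DEFINE_SUM generates
 * the same body for whichever element type is passed in. */
#define DEFINE_SUM(T, name)                      \
    T name(const T *buf, uint8_t len)            \
    {                                            \
        T sum = 0;                               \
        while (len--) {                          \
            sum = (T)(sum + *buf++);             \
        }                                        \
        return sum;                              \
    }

DEFINE_SUM(uint8_t,  sum_u8)   /* single-byte adds on an 8051 */
DEFINE_SUM(uint16_t, sum_u16)  /* add plus add-with-carry per element on an 8051 */
DEFINE_SUM(uint32_t, sum_u32)  /* four byte-wide adds per element on an 8051 */
```

Compiling this for both targets and looking at the size of each function in the map file should show the same general trend, although the exact numbers will depend on the compiler and its settings.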

 

This doesn't mean applications with a lot of data movement or 32-bit math shouldn't be done on an 8051 core.

 

In many cases, other considerations will outweigh the efficiency advantage of the ARM core, or that advantage will be irrelevant. Consider the implementation of a UART-to-SPI bridge. This application spends most of its time copying data between the peripherals, a task the ARM core will do much more efficiently. However, it's also a very small application, probably small enough to fit into a 2 KB part. Even though an 8051 core is less efficient, it still has plenty of processing power to handle high data rates in that application. The extra cycles available to the ARM device are probably going to be spent sitting in an idle loop or a “WFI” (wait for interrupt), waiting for the next piece of data to come in.

 

In this case, the 8051 core still makes the most sense, since the extra CPU cycles are worthless while the smaller flash footprint yields cost savings.

 

If we had something useful to do with the extra cycles, then the extra efficiency would be important, and the scales may tip in favor of the ARM core.

 

Pointers

 

8051 devices do not have a unified memory map like ARM devices, and instead have different instructions for accessing code (flash), IDATA (internal RAM) and XDATA (external RAM).

 

To enable efficient code generation, a pointer in 8051 code will declare what space it's pointing to. However, in some cases, we use a generic pointer that can point to any space, and this style of pointer is inefficient to access.

 

For example, consider a function that takes a pointer to a buffer and sends that buffer out the UART. If the pointer is an XDATA pointer, then an XDATA array can be sent out the UART, but an array in code space would first need to be copied into XDATA. A generic pointer would be able to point to both code and XDATA space, but is slower and requires more code to access.

 

Segment-specific pointers work in most cases, but generic pointers can come in handy when writing reusable code where the use case isn't well known. If this happens often in the application, then the 8051 starts to lose its efficiency advantage.
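Here is a rough sketch of the UART example with both pointer styles. It assumes the Keil C51 toolchain, where `xdata` and `code` are memory-space keywords and `__C51__` is predefined; the `XDATA`/`CODE` macros, `uart_putc` and `uart_log` are hypothetical helpers that let the same file build on a host compiler.

```c
#include <stddef.h>

/* Memory-space qualifiers: Keil C51 keywords on an 8051 build, empty
 * elsewhere so the sketch still compiles with a host compiler. */
#ifdef __C51__
#define XDATA xdata
#define CODE  code
#else
#define XDATA
#define CODE
#endif

/* Hypothetical UART hook: records bytes instead of writing a real
 * transmit register. */
static unsigned char uart_log[32];
static size_t uart_len;

static void uart_putc(unsigned char c)
{
    if (uart_len < sizeof uart_log) {
        uart_log[uart_len++] = c;
    }
}

/* Segment-specific: accepts only buffers in XDATA, so the compiler can
 * use the XDATA access instructions directly. */
void uart_send_xdata(const unsigned char XDATA *buf, size_t n)
{
    while (n--) {
        uart_putc(*buf++);
    }
}

/* Generic: accepts a buffer in any space, but the memory space has to
 * be resolved at run time on every access. */
void uart_send_generic(const unsigned char *buf, size_t n)
{
    while (n--) {
        uart_putc(*buf++);
    }
}
```

With this split, a string declared with `CODE` can be passed straight to `uart_send_generic`, while `uart_send_xdata` would need it copied into XDATA first. On a host build the two functions are identical; the size and speed difference only shows up when compiling for the 8051.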

 

Identifying the “Core” Strengths

 

I've noted several times that math leans towards ARM, and control leans towards 8051, but no application focuses solely on math or control. How can we characterize an application in broad terms and figure out where it lies on the spectrum?

 

Let’s consider a hypothetical application composed of 10% 32-bit math, 25% control code and 65% general code that doesn’t clearly fall into an 8-bit or 32-bit category. The application also values code space over execution speed, since it does not need all the available MIPS and must be optimized for cost.

 

The fact that cost is more important than application speed will give the 8051 core a slight advantage in the general code. In addition, the 8051 core has moderate advantages in the control code. The ARM core has the upper hand in 32-bit math, but that’s only 10% in the example. Taking all these variables into consideration, this particular application is a better fit for an 8051 core.

 

Figure 4 - Application Code Breakout Percentages.png

 

If we make a change to our example and say that the 32-bit math is 30% and general code only 45%, then the ARM core becomes a much more competitive player.

 

Obviously, there is a lot of estimation in this process, but the technique of deconstructing the application and then evaluating each component will help identify cases where there is a significant advantage to be had for one architecture over the other.
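One way to turn this estimate into arithmetic is a simple weighted score. The bias values below are purely illustrative assumptions that mirror the "slight", "moderate" and "upper hand" wording above; they are not measurements, and the function name is made up.

```c
/* Illustrative per-category bias, NOT measured data: positive favors the
 * 8051 core, negative favors the ARM core. The magnitudes are assumptions. */
#define GENERAL_BIAS   1    /* slight 8051 edge when code size matters */
#define CONTROL_BIAS   2    /* moderate 8051 edge in control code */
#define MATH32_BIAS  (-3)   /* ARM has the upper hand in 32-bit math */

/* Weighted score for an application mix given in percent. A positive
 * result leans towards the 8051, a negative one towards ARM. */
int architecture_score(int pct_math32, int pct_control, int pct_general)
{
    return pct_general * GENERAL_BIAS
         + pct_control * CONTROL_BIAS
         + pct_math32  * MATH32_BIAS;
}
```

With these assumed weights, the 10/25/65 mix scores 85, clearly on the 8051 side, while the 30/25/45 mix scores 5, which is close to a tie. Different weights will move the crossover point, so treat the output as a way to structure the discussion, not as an answer.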

 

Power Consumption

 

When looking at data sheets, it's easy to come to the conclusion that one MCU edges out the other for power consumption. While it's true that the sleep mode and active mode currents will favor certain types of MCUs, that assessment can be extremely misleading.

 

Duty cycle (how much time is spent in each power mode) will always dominate energy consumption.

 

Consider a system where the device wakes up, adds a 16-bit ADC sample to a rolling average and goes back to sleep until the next sample. That task involves a significant amount of 16-bit and 32-bit math. The ARM device is going to be able to make the calculations and go back to sleep faster than an 8051 device. In this case, illustrated below, the ARM core may have higher sleep currents, but results in a lower power system.

 

Figure 5 - MCU Duty Cycle Impacts Power.png
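The duty-cycle effect is easy to estimate with a few lines of host-side C. This is a back-of-the-envelope sketch: the function name is an assumption, and it ignores wake-up time and any current drawn during mode transitions.

```c
/* Average current in microamps for a device that is active for
 * active_us out of every period_us and asleep for the rest. */
double average_current_ua(double active_ua, double sleep_ua,
                          double active_us, double period_us)
{
    double duty = active_us / period_us;   /* fraction of time awake */
    return active_ua * duty + sleep_ua * (1.0 - duty);
}
```

As a made-up example, take a 10 ms sample period and 4000 µA active current for both devices. A device that needs 100 µs per sample and sleeps at 1 µA averages about 41 µA, while one that finishes in 20 µs but sleeps at 2 µA averages about 10 µA. The numbers are hypothetical, but they show how a shorter active time can outweigh a higher sleep current.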

 

Peripheral features can also skew power consumption one way or the other. For example, most of Silicon Labs’ EFM32 32-bit MCUs have a low-energy UART (LEUART) that can receive data while in low power mode, while only two of the EFM8 MCUs offer this feature. This peripheral affects the power duty cycle and heavily favors the EFM32 MCUs over EFM8 devices without LEUART.

 

8-bit or 32-bit? I still can't decide!

 

What happens if, after considering all of these variables, it's still not clear which MCU architecture is the best choice? Congratulations! That means they are both good options, and it doesn't really matter which architecture you use.

 

Rely on your past experience and personal preferences if there is no clear technical advantage.

 

This is also a great time to look at future projects. If most future projects are going to be well-suited to ARM devices, then go with ARM, and if future projects are more focused on driving down cost and size, then go with 8051.

 

What does it all mean?

 

8-bit MCUs still have a lot to offer embedded developers and their ever-growing focus on the Internet of Things. Whenever a developer begins a design, it's important to make sure that the right tool is coming out of the toolbox.

 

The difficult truth is that choosing an MCU architecture can't be distilled into one or two bullet points on a Marketing PowerPoint presentation.

 

However, making the best decision isn't hard once you have the right information and are willing to spend a little time applying it.

 

<-- PART 1

Comments
by Marc Roggemans
on 10-28-2015 01:33 PM

Hello,

 

After reading the blog I can only agree. But I would like to add some more advantages in favour of the 8051 core.

I teach students how microcontrollers operate on a hardware level, including the relation between software and hardware (duration of ISRs, manipulation and usage of memory, access to peripherals, stack operation, etc.). This is impossible without using assembly.

Understanding start-up code is not that easy when you have only learned how to program in C. Learning Cortex assembly without prior knowledge of how a CPU operates (pointers, PC-relative addressing, etc.) is not a walk in the park.

You did not mention the bit addressing possibilities of the 8051, making it super fast for control applications.

And last but not least, the 8051 has been in use for 35 years. The Cortex is relatively young. Will it be around in 30 years?