bfloat16 – how it improves AI chip designs

Floating-point calculations are slow for computers (specifically, for CPUs) – possibly representing the same struggle many humans have with them. 🙂

I remember a time when an FPU (floating-point unit) was an upgrade and one had to pay extra to get one. Very useful when you needed that extra precision in computing – and in my head, it always seemed like the Turbo button. 🙂

For most #ML workloads and computations, precision isn’t the most important criterion; with ever-increasing data and parameters (looking at you GPT-3, with 45 TB of data and 175 billion parameters!), what most ML needs today is speed and dynamic range.

This is where bfloat16 (Brain floating-point format, 16 bits) – a newer floating-point format – comes in handy; in the context of #AI it improves on IEEE 754, the current floating-point arithmetic standard.

As per IEEE 754, a single-precision floating-point number always takes up 32 bits (see Figure 1 below) – irrespective of the size of the number: 1 sign bit, 8 exponent bits, and 23 fraction bits. The exponent (8 bits) tells us how many places we shift the point (left or right). The fraction (23 bits), also called the mantissa, holds the significant digits – i.e. the data.

Figure 1 – IEEE 754 Floating point representation
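To make the layout concrete, here is a minimal sketch (pure Python, standard library only) that unpacks a single-precision float into its three fields. The function name is my own, not from any standard library:

```python
import struct

def float32_fields(x: float):
    """Return the (sign, exponent, fraction) bit fields of x as a 32-bit float."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))  # raw 32-bit pattern
    sign = bits >> 31                  # 1 sign bit
    exponent = (bits >> 23) & 0xFF     # 8 exponent bits (biased by 127)
    fraction = bits & 0x7FFFFF         # 23 fraction (mantissa) bits
    return sign, exponent, fraction

# -6.25 = -1.5625 * 2^2 -> sign = 1, exponent = 129 (127 + 2), fraction = 0b1001 << 19
print(float32_fields(-6.25))
```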

bfloat16 cuts the data size in half – 32 bits down to 16 (see Figure 2) – with the fraction truncated from 23 bits to 7. This of course means bfloat16 isn’t as precise. However, because bfloat16 keeps the same 8 exponent bits as IEEE 754 single precision, it can represent a similar dynamic range (small to large) and, more importantly, is easy to convert to and from IEEE 754.

Figure 2 – bfloat16 representation
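That conversion really is just keeping (or dropping) the top 16 bits – sign, the 8 exponent bits, and the first 7 fraction bits. Below is a rough sketch of both directions; the function names are mine, and real hardware typically uses round-to-nearest-even rather than the plain truncation shown here:

```python
import struct

def float32_to_bfloat16_bits(x: float) -> int:
    """Keep only the top 16 bits of the float32 pattern (a bfloat16 bit pattern)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits >> 16

def bfloat16_bits_to_float32(b: int) -> float:
    """Widen a bfloat16 bit pattern back to float32 by padding the low 16 bits with zeros."""
    (x,) = struct.unpack(">f", struct.pack(">I", b << 16))
    return x

x = 3.14159265
b = float32_to_bfloat16_bits(x)
print(hex(b), bfloat16_bits_to_float32(b))  # e.g. 0x4049 and ~3.140625
```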

Less precision doesn’t impact matrix multiplication much, so in the context of ML training and inference, chips using bfloat16 are more efficient at scale – not only are they faster, they also use less power and memory bandwidth.
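A quick way to see this for yourself is to compare a bfloat16 matrix multiply against the float32 result. This is a rough sketch (assuming a PyTorch build with CPU bfloat16 support); the shapes and the error check are illustrative, not from any benchmark:

```python
import torch

torch.manual_seed(0)
a = torch.randn(256, 256)
b = torch.randn(256, 256)

ref = a @ b                                                      # float32 reference
bf = (a.to(torch.bfloat16) @ b.to(torch.bfloat16)).to(torch.float32)

rel_err = (ref - bf).abs().max() / ref.abs().max()
print(f"max relative error: {rel_err:.4%}")                      # typically well under 1%
```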

What is interesting is that in some neural nets, such as DNNs, the lower-precision bfloat16 can end up effectively as precise as – or more precise than – IEEE 754! This is because the regularization and quantization of the weights cannot exploit the finer precision IEEE 754 represents, and adapt better to bfloat16. 🙂

Finally, bfloat16 is not a universal standard (yet), but most AI chips already support it. ARM, Intel, and AMD have started adding support for it in their chipsets.
