We’ve just pushed updates to each of our U-Boot source trees to address a pretty critical issue. The memory configuration settings for our SABRE Lite-based design, along with the Nitrogen6X and Nitrogen6X-SOM products, can fail under certain corner conditions.
We recommend that all users of the SABRE Lite or Nitrogen6X update their U-Boot as soon as convenient, and we’ll try to make that as easy as possible.
The patches
All of our primary U-Boot branches have been updated, including:
As previously discussed in this post, our primary focus is on main-line-based U-Boot, but we added a minimal set of patches to the U-Boot 2009.08 trees above that allow use of those trees on Nitrogen6X.
The staging branch is needed for customers with Solo or Dual-Lite processors, or those with 2GB of RAM on the Nitrogen6X.
Upgrading
If you don’t want to re-compile U-Boot yourself, we’ve placed images on-line:
Binary | Rename to | Description
u-boot-nitrogen6x.imx | u-boot.imx | Main-line production binary for SABRE Lite or Nitrogen6X boards
u-boot-nitrogen6q.imx | u-boot.imx | Main-line staging binary for SABRE Lite or Nitrogen6X boards
u-boot-non-android.bin | u-boot.bin | U-Boot 2009.08 binary for SABRE Lite or Nitrogen6X boards based on Freescale’s L3.0.35_imx_1.1.0 release
u-boot-android.bin | u-boot.bin | U-Boot 2009.08 binary for SABRE Lite or Nitrogen6X boards based on Freescale’s imx_android_r13.4.1-ga
The files listed as Rename to u-boot.imx can be programmed using our 6x_upgrade script. Those with the u-boot.bin tag can be upgraded with the 6q_upgrade script. If you have no idea what that means, please check out this post describing the name change or, further back, this post describing the boot process.
If you don’t want to mess with the scripts, you can program a u-boot.imx like so:
U-Boot > mmc dev 0
U-Boot > fatload mmc 0 10800000 u-boot.imx
313816 bytes read in 132 ms (2.3 MiB/s)
U-Boot > sf probe
SF: Detected SST25VF016B with page size 4 KiB, total 2 MiB
U-Boot > sf erase 0 0xc0000
U-Boot > sf write 10800000 0x400 $filesize
U-Boot > reset
Or a u-boot.bin like this:
MX6Q SABRELITE U-Boot > mmc dev 0
mmc0 is current device
MX6Q SABRELITE U-Boot > ext2load mmc 0 10800000 u-boot.bin
Loading file "u-boot.bin" from mmc device 0:1 (xxa1)
179008 bytes read
MX6Q SABRELITE U-Boot > sf probe 1
JEDEC ID: 0xbf:0x25:0x41
2048 KiB SST25VF016B - 2MB at 0:1 is now current device
MX6Q SABRELITE U-Boot > sf erase 0 0xc0000
Erasing SPI NOR flash 0x0 [0xc0000 bytes]
................................................................................................................................................................................................SUCCESS
MX6Q SABRELITE U-Boot > sf write 10800000 0 $filesize
Writing SPI NOR flash 0x0 [0x2bb40 bytes]
MX6Q SABRELITE U-Boot > reset
Besides the more verbose prompt and messages, the syntax for sf probe is different and the u-boot.bin file gets written to address zero instead of 0x400.
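Both transcripts erase 0xc0000 bytes starting at zero, so the image plus its write offset has to fit inside that window. The following is a minimal sketch of that check for a host shell, assuming only the erase length and offsets shown above; the function name fits_in_erased_region and the ERASE_LEN variable are made up for this example and are not part of our upgrade scripts.

```shell
# Sketch: check that an image fits in the region cleared by "sf erase 0 0xc0000".
# ERASE_LEN mirrors the erase length used in the transcripts above.
# fits_in_erased_region is a hypothetical helper name.
ERASE_LEN=$((0xc0000))

fits_in_erased_region() {
    # $1 = image size in bytes, $2 = flash offset the image is written to
    size=$(($1))
    offset=$(($2))
    [ $((offset + size)) -le "$ERASE_LEN" ]
}

# Sizes taken from the two transcripts above.
fits_in_erased_region 313816 0x400 && echo "u-boot.imx fits"
fits_in_erased_region 179008 0 && echo "u-boot.bin fits"
```

If you build your own image and it grows past that window, the erase length in the manual steps would need to grow with it.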
Compiling
There’s a blog post here describing how to get and compile our latest U-Boot from the production branch of our latest main-line-based (u-boot-imx6) tree. If you have no special requirements, we recommend that, since it contains file-system enhancements and support for auto-detection of displays.
If you need one of the more advanced features supported by Freescale, especially support for secure boot, you’ll need to stay with the Freescale-derived source tree.
You can get and compile the non-Android branch like so:
~/$ git clone git://github.com/boundarydevices/u-boot-2009-08.git
~/$ cd u-boot-2009-08
~/u-boot-2009-08$ git checkout origin/boundary-imx_3.0.35_1.1.0 \
-b boundary-imx_3.0.35_1.1.0
Checking out files: 100% (6608/6608), done.
Branch boundary-imx_3.0.35_1.1.0 set up to track remote branch boundary-imx_3.0.35_1.1.0 from origin.
Switched to a new branch 'boundary-imx_3.0.35_1.1.0'
~/u-boot-2009-08$ export PATH=/opt/freescale/usr/local/gcc-4.6.2-glibc-2.13-linaro-multilib-2011.12/fsl-linaro-toolchain/bin/:$PATH
~/u-boot-2009-08$ export ARCH=arm
~/u-boot-2009-08$ export CROSS_COMPILE=arm-none-linux-gnueabi-
~/u-boot-2009-08$ make mx6q_sabrelite_config
Configuring for mx6q_sabrelite board...
~/u-boot-2009-08$ make all
...
arm-none-linux-gnueabi-objcopy -O srec u-boot u-boot.srec
arm-none-linux-gnueabi-objcopy --gap-fill=0xff -O binary u-boot u-boot.bin
~/u-boot-2009-08$ ls -l u-boot.bin
-rwxrwxr-x 1 user user 179080 Feb 11 15:37 u-boot.bin
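The build above depends on the PATH and CROSS_COMPILE exports pointing at a real compiler, and a typo in either tends to show up as a confusing make error. A minimal sketch of a pre-flight check follows; check_toolchain is a hypothetical helper written for this post, not something in the U-Boot tree.

```shell
# Sketch: confirm the cross-compiler named by CROSS_COMPILE is reachable on PATH.
# check_toolchain is a hypothetical helper, not part of the U-Boot tree.
check_toolchain() {
    # $1 = toolchain prefix; defaults to the CROSS_COMPILE environment variable
    prefix=${1:-$CROSS_COMPILE}
    if command -v "${prefix}gcc" >/dev/null 2>&1; then
        echo "found ${prefix}gcc"
    else
        echo "missing ${prefix}gcc: check PATH and CROSS_COMPILE" >&2
        return 1
    fi
}
```

With the exports shown above in place, running check_toolchain arm-none-linux-gnueabi- before make mx6q_sabrelite_config should report the compiler as found.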
Send us a note if you have any questions.
The details
This took a long time to find because the bug was hiding behind what appeared to be a CPU frequency scaling problem.
When we received our first shipment of non-lidded CPUs, none of them would boot reliably at 1GHz, and we started suspecting hardware issues.
After much investigation, we found that the new devices were fused to configure the ramp rate for the on-chip LDOs to the slowest rate (250µs full-scale), and that the CPU frequency switching code only waited 50µs after bumping the LDOs on a raise in frequency.
In the process of testing for that, we put together some shell script-lets to switch frequencies rapidly between 400MHz, 800MHz, and 1GHz as shown in the following profile snippet:
setspeed(){ echo $1 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed; }
temp(){ t=`cat /sys/class/thermal/thermal_zone0/temp` ; echo "---------------- temp $t" ; }
speed(){ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq; }
vddpu(){ if [ 0 -eq $# ]; then echo -n "vddpu: " ; cat /sys/class/regulator/regulator.1/microvolts ; else echo $* > /sys/class/regulator/regulator.1/microvolts ; fi }
vddcpu(){ if [ 0 -eq $# ]; then echo -n "vddcpu: " ; cat /sys/class/regulator/regulator.2/microvolts ; else echo $* > /sys/class/regulator/regulator.2/microvolts ; fi }
vddsoc(){ if [ 0 -eq $# ]; then echo -n "vddsoc: " ; cat /sys/class/regulator/regulator.3/microvolts ; else echo $* > /sys/class/regulator/regulator.3/microvolts ; fi }
s1g400(){ for n in `seq 1 1000` ; do for s in 996000 396000 ; do setspeed $s ; done ; done ; echo '==================> 1G->400 worked'; }
s1g800(){ for n in `seq 1 1000` ; do for s in 996000 792000 ; do setspeed $s ; done ; done ; echo '==================> 1G->800 worked' ; }
testit(){ for it in `seq 1 100` ; do echo "-----iteration $it" ; s1g800 ; s1g400 ; temp ; done ; echo '100 iterations worked'; }
These commands make use of the userspace CPU frequency governor by reading and writing files under /sys/devices/system/cpu/cpu0/cpufreq/.
For example, to switch to 800MHz, you can issue the setspeed command (frequencies in kHz):
root@boundary ~$ setspeed 792000
And to see the current speed, you can issue the speed command:
root@boundary ~$ speed
792000
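Writes to scaling_setspeed only take effect while the userspace governor is selected. The sketch below shows one way to select it and to list the frequencies the driver offers, assuming the standard Linux cpufreq sysfs files scaling_governor, scaling_available_governors and scaling_available_frequencies are present on your kernel; CPUFREQ_DIR, use_userspace_governor and available_speeds are names invented for this example.

```shell
# Sketch: select the userspace governor so that scaling_setspeed accepts writes.
# CPUFREQ_DIR, use_userspace_governor and available_speeds are example names.
CPUFREQ_DIR=${CPUFREQ_DIR:-/sys/devices/system/cpu/cpu0/cpufreq}

use_userspace_governor() {
    # Refuse to continue if the kernel was built without the userspace governor.
    if ! grep -qw userspace "$CPUFREQ_DIR/scaling_available_governors"; then
        echo "userspace governor not available" >&2
        return 1
    fi
    echo userspace > "$CPUFREQ_DIR/scaling_governor"
}

# Frequencies the driver reports as selectable, in kHz.
available_speeds() { cat "$CPUFREQ_DIR/scaling_available_frequencies"; }
```

Calling use_userspace_governor once before setspeed or testit should be enough; the governor stays selected until something changes it back.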
Each iteration of the testit command switches between 1GHz and 800MHz 1000 times, followed by 1000 switches between 1GHz and 400MHz. As written above, testit runs through that sequence 100 times, for a total of 400,000 frequency changes.
root@boundary ~$ testit
==================> 1G->400 worked
==================> 1G->800 worked
What we found was that some boards would fail this test when other load is present on the system. Running four simultaneous memory tests (one per core) was sufficient to cause a failure.
Needless to say, this is a totally artificial test, but it uncovered something pretty critical. Memory timings that were slightly off caused failures on some boards with this set of conditions. With the memory timing updates, these errors don’t occur.
We’ve done extensive testing across lots of boards, and we’re very confident in the results.
It’s likely that there are other types of load that will trigger a memory failure of the same sort, which is why we encourage all users of our i.MX6-based boards to upgrade as soon as is convenient.
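To reproduce the "one memory test per core" load described above, something along these lines can start several copies of a command in the background and wait for them. This is only a sketch: run_per_core is a hypothetical helper, and the memory test program itself is left for you to supply, since any tester that exercises DDR should do.

```shell
# Sketch: start N copies of a command in the background and wait for all of them.
# run_per_core is a hypothetical helper; the load command is whatever you supply.
run_per_core() {
    # $1 = number of copies, remaining arguments = command to run
    count=$1
    shift
    pids=""
    i=0
    while [ "$i" -lt "$count" ]; do
        "$@" &
        pids="$pids $!"
        i=$((i + 1))
    done
    # Report failure if any copy exits with a non-zero status.
    status=0
    for pid in $pids; do
        wait "$pid" || status=1
    done
    return "$status"
}
```

Running run_per_core 4 followed by your memory test command in one terminal, with testit running in another, approximates the conditions under which we saw failures.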
Parting thoughts
During the process of finding this bug, we’ve been a bit unresponsive to some of your support requests. Once we determined that the failure also occurred on other batches of CPUs, we realized that previous shipments were also suspect and it was critical to find the root cause.
We apologize for that, and will work to catch up as quickly as possible.
Besides that, we have a few other updates forthcoming, including updates to our kernel trees to correct the LDO ramp times, and a new release of the L3.0.35_1.1.0 branch.