Hardware bringup is the process of validating a new PCB produced by the team and preparing it to be integrated into the system. It requires electrical and firmware members to work very closely together to debug issues that often span hardware and software. To do this effectively, everyone involved must be thoroughly familiar with the system’s design and the low-level behavior of the firmware. When bringup is complete, the PCB is ready to be a fully fledged component of the bike or a regular tool for future development.
For a presentable format of this information, refer to this presentation.
The first step in bringing up a new PCB is assembling and electrically validating it. This is primarily a task for the electrical team, so this section only gives a brief overview of what firmware engineers need to know about the process.
When a PCB arrives, the entire PCB should first be checked for any continuity errors/short circuits introduced during manufacturing. As unlikely as it may be, discovering this now will save a lot of time later on.
Next, the board's functionality should be verified one subsystem at a time. Initially, only the power supplies should be populated and tested for operation. A well-designed board should provide easy access to test points for the output of each power supply.
Individual subsystems should then be populated and verified to have the required voltages and interconnects. This should be done iteratively: when possible, only a single subsystem should be validated at a time. This step is complete when the board is fully populated and all ICs are being powered properly.
Once the electrical engineers are confident in their initial validation, they go to the firmware team to start working with software. The first communication protocol that needs to be validated is SWD because without it, we can’t do any programming. Until the microcontroller (uC) can be programmed, the firmware team can't help with the bringup process.
SWD functionality can be checked quickly by powering the board, connecting the ST-Link to the JTAG connector, and connecting to the board in STM32CubeProgrammer. If this connection is successful, SWD is working. If not, there are a number of possible causes, ranging from incorrect pinouts to faulty power circuitry. The electrical team must debug this before bringup can continue.
This step is not necessary if the PCB does not have an STM32 or similar uC. This would be the case for something like a custom shield, which does not have an onboard uC but still has other components to bring up.
Although not strictly necessary for a board to function, UART makes debugging much easier. The difference is large enough that, if UART is not working, it is often not worth continuing bringup until it has been repaired. If the other circuitry works on the first try, not having UART is fine, but debugging faulty hardware or software without UART is a time-consuming and frustrating task.
If you are using an ST-Link v3-mini, UART is built into the connector, so you don't need any external cables. If you're using an older ST-Link v2, UART is not on the connector, so you will have to use an external UART cable, like the one pictured to the right. If the 6-pin header isn’t provided, use jumpers to connect TX, RX, and GND.
Once you have a physical UART connection, you should be able to open a virtual serial port just like you would for a Nucleo. If this doesn't work right away, confirm that the baud rate set in your terminal matches the one set in the firmware. If it does, the best next step for debugging is checking TX and RX with a logic analyzer.
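If a logic analyzer capture of TX is available, the baud rate actually on the wire can be estimated from the shortest pulse, which is normally one bit wide. The following is a minimal host-side sketch in Python, not part of our firmware; the function names, the list of common baud rates, and the 3% tolerance are assumptions chosen for illustration.

```python
# Common UART baud rates to compare against (illustrative, not exhaustive).
COMMON_BAUD_RATES = [9600, 19200, 38400, 57600, 115200, 230400, 460800, 921600]


def estimate_baud(shortest_pulse_s):
    """Estimate the baud rate from the shortest pulse width, in seconds.

    The shortest pulse in a UART capture is normally a single bit, so the
    baud rate is approximately its reciprocal.
    """
    if shortest_pulse_s <= 0:
        raise ValueError("pulse width must be positive")
    return 1.0 / shortest_pulse_s


def nearest_common_baud(measured_baud):
    """Return the common baud rate closest to the measured value."""
    return min(COMMON_BAUD_RATES, key=lambda rate: abs(rate - measured_baud))


def baud_matches(measured_baud, configured_baud, tolerance=0.03):
    """Check whether the measured rate is within a relative tolerance.

    The 3% default is an assumed rule of thumb, not a hard specification.
    """
    return abs(measured_baud - configured_baud) / configured_baud <= tolerance


if __name__ == "__main__":
    # Hypothetical measurement: the shortest pulse in the capture is 8.68 us.
    measured = estimate_baud(8.68e-6)
    print("measured baud:", round(measured))
    print("nearest common rate:", nearest_common_baud(measured))
    print("matches a terminal set to 9600:", baud_matches(measured, 9600))
```

If the nearest common rate differs from the terminal setting, the mismatch is in configuration rather than in the wiring.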
Onboard communication is used to control all the devices on the PCB itself. This includes communication over protocols like I2C and SPI but also basic control via GPIOs or PWM signals.
While the specific methods of debugging each protocol vary, the high-level ideas are the same. For this step, you should send the simplest possible request to each device on the PCB and confirm that it can receive and respond to messages from the STM. This simplest request can vary across devices, but it will often be something like requesting a version number, a hardware ID, or the first byte of available data.
The goal here is only to validate the communication connections, not the actual behavior of the device. Device behavior is validated later in the process.
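The simplest-request check described above can be sketched as a loop that reads an identification register from each device and compares it with the expected value. This is a minimal illustration in Python rather than firmware: the `FakeBus` class, the device names, addresses, register numbers, and ID values are all hypothetical stand-ins for the real driver and the devices on a given board.

```python
class FakeBus:
    """Hypothetical stand-in for an I2C driver, used only for illustration.

    Maps (device address, register) to a byte value. Reading an address
    that is not in the map raises IOError, much like a missing ACK would.
    """

    def __init__(self, registers):
        self._registers = dict(registers)

    def read_register(self, address, register):
        try:
            return self._registers[(address, register)]
        except KeyError:
            raise IOError(f"no response from device 0x{address:02X}")


def probe_devices(bus, devices):
    """Send the simplest request to each device and report the outcome.

    `devices` maps a name to (address, id_register, expected_id).
    Returns a dict of name -> "ok", "no response", or "unexpected id".
    """
    results = {}
    for name, (address, id_register, expected_id) in devices.items():
        try:
            value = bus.read_register(address, id_register)
        except IOError:
            results[name] = "no response"
            continue
        results[name] = "ok" if value == expected_id else "unexpected id"
    return results


if __name__ == "__main__":
    # Hypothetical board: one healthy device, one returning a wrong ID,
    # and one that does not answer at all.
    bus = FakeBus({(0x48, 0x0F): 0xA1, (0x1D, 0x00): 0x00})
    devices = {
        "temp_sensor": (0x48, 0x0F, 0xA1),
        "accelerometer": (0x1D, 0x00, 0xE5),
        "eeprom": (0x50, 0x00, 0x42),
    }
    for name, status in probe_devices(bus, devices).items():
        print(f"{name}: {status}")
```

Separating "no response" from "unexpected id" is a deliberate choice in this sketch: the first usually points at wiring, power, or addressing, while the second suggests the bus works but the device or register is not the one expected.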
Inter-board communication is the communication that is exposed to the rest of the bike. PWM and GPIOs are the simplest forms of this communication and usually the easiest to validate, so they can be checked first. Use a logic analyzer if the signal voltage is 3.3V or below; otherwise, use an oscilloscope.
CAN is the other major form of inter-board communication we use. CAN can be difficult to debug because problems on either side of the transceiver can prevent it from working. There are many examples in EVT-core that can be used to test CAN to confirm it is working correctly. Be sure to confirm that messages can be both sent and received. "back_and_forth" is a good sample to use to test this.
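When checking a PWM output from a capture, the two numbers to compare against the firmware configuration are frequency and duty cycle. The sketch below computes both from three edge timestamps read off a logic analyzer or oscilloscope; the function name and the sample values are hypothetical.

```python
def pwm_stats(rise_s, fall_s, next_rise_s):
    """Return (frequency_hz, duty_cycle) for one PWM period.

    Arguments are timestamps in seconds of a rising edge, the falling
    edge that follows it, and the next rising edge.
    """
    if not rise_s < fall_s < next_rise_s:
        raise ValueError("edges must be in order: rise, fall, next rise")
    period = next_rise_s - rise_s
    return 1.0 / period, (fall_s - rise_s) / period


if __name__ == "__main__":
    # Hypothetical capture: high for 250 us out of a 1 ms period.
    frequency, duty = pwm_stats(0.0, 250e-6, 1e-3)
    print(f"frequency: {frequency:.1f} Hz, duty cycle: {duty:.1%}")
```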
Once all communication has been validated, you can begin focusing on functionality. Before validation, there should have been some development of drivers for devices on the PCB. The samples for these drivers should be modified and run on the PCB to demonstrate that the driver works as intended. This is the point of the process at which software issues are most likely to arise, so be sure to have the necessary tools available to you to debug the driver you're working with.
Each device should be considered validated once there is a driver for it that exposes all the necessary functionality, and it works on the PCB. This should be done for all devices the board works with, on and off the PCB.
Once all the devices have been validated, the PCB can be integrated into the bike’s electrical system. At this point, the hardware should be more-or-less completely validated, so the focus should be on testing software written to run the PCB while on the bike. As much testing as possible should be done with the PCB and software in an isolated system before introducing it to the bike.
Once a board has been fully tested and integrated into the electrical system, the bringup process is complete.
This section provides a brief overview of tools that are useful for debugging during hardware bringup. Some of these tools have further description on the Device Documentation pages.
Multimeters provide a number of important functions needed for validation, such as:
Checking the voltage on the power pin of a malfunctioning IC
Checking the state of a GPIO
Confirming continuity between two points on a board or in a wire harness
Measuring impedance on communication buses, which is important for CAN
Despite the advantages of this tool, it can also be dangerous. The simplicity and familiarity of the multimeter can often lull people into a false sense of security when debugging. Remember, live probing is one of the most common causes of board damage, so be very careful when checking voltages on the PCB. Also, whenever possible, turn the circuit off while you're moving probes to minimize the risk of a probe slipping. Some of the best engineers on our team have wasted hours of their time because a probe touched something unintentionally.
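As a worked example of the CAN impedance check listed above: high-speed CAN is normally terminated with a 120 Ω resistor at each end of the bus, so a multimeter across CAN high and CAN low on an unpowered bus should read about 60 Ω. The sketch below works through that arithmetic in Python; the function names and the 10% tolerance are assumptions for illustration.

```python
def parallel_resistance(*resistors_ohm):
    """Return the equivalent resistance of resistors in parallel, in ohms."""
    if not resistors_ohm or any(r <= 0 for r in resistors_ohm):
        raise ValueError("provide one or more positive resistances")
    return 1.0 / sum(1.0 / r for r in resistors_ohm)


def diagnose_termination(measured_ohm, terminator_ohm=120.0, tolerance=0.10):
    """Guess the number of terminators from a CAN high to CAN low reading.

    The tolerance is an assumed allowance for resistor and meter error.
    """

    def close_to(expected):
        return abs(measured_ohm - expected) <= tolerance * expected

    if close_to(parallel_resistance(terminator_ohm, terminator_ohm)):
        return "two terminators (expected)"
    if close_to(terminator_ohm):
        return "one terminator (one is missing)"
    if close_to(parallel_resistance(*[terminator_ohm] * 3)):
        return "three terminators (one too many)"
    return "unexpected reading, inspect the bus wiring"


if __name__ == "__main__":
    # Hypothetical multimeter readings, in ohms.
    for reading in (60.4, 119.2, 40.3, 5.0):
        print(f"{reading} ohm: {diagnose_termination(reading)}")
```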
The logic analyzer is one of the most valuable tools we have available. It can be used to observe the behavior of any pin header or any pin large enough to be grabbed with the hook probes. The Saleae software then displays the logical voltage level of each pin over time. It also allows the user to specify which pins are used for different communication protocols and decodes the signals for those protocols. It can analyze PWM, UART, I2C, SPI, and CAN (TX/RX).
On top of all these advantages, it is also very quick to set up and use. The interface is very basic, and it provides all the functionality you need without digging through menus.
While this is very convenient, you should keep one thing in mind. The Saleae does have some influence on the electrical behavior of the system. This means that it can cause errors that wouldn't normally happen and that it can hide errors that happen when it isn't plugged in. If you don't remember this, you can waste hours chasing ghosts. If at any point you suspect the Saleae is causing issues with the circuit, you should switch over to using the oscilloscope.
In most cases, though, the Saleae is an invaluable tool for the team and should be at the ready whenever you’re bringing up new hardware, if possible.
Compared to the Saleae, the oscilloscope is a much more versatile tool. It is capable of operating over a much wider voltage range, and it has a minimal effect on the electrical behavior of the system. It also has much higher resolution in both time and voltage. In general, the Saleae is very effective for the most common use cases, but the oscilloscope can be used in almost every use case the team will have.
While it is quite versatile, the oscilloscope has some disadvantages. The biggest issue is that it takes some time to set up and is cumbersome to work with. It also defaults to only show analog voltages, not digital values. It is capable of much of the signal analysis that the Saleae does, but setting this up takes even longer, especially because of the complexity of the interface. The Saleae is so much faster that it can be up and decoding signals before the oscilloscope is even finished booting.
In our experience, the oscilloscope has been most useful in debugging electrical problems with CAN high and low. This signal lies outside the voltage range that is safe for the Saleae, so the oscilloscope is the only option for this case.
This is the best tool for debugging logical problems with CAN messages. It decodes and records all messages on a CAN bus and can send messages at a given interval.
It does require a DE9 connector to connect to the bus, but this has been standard for the team for long enough that it likely won't be an issue, especially when you have a FUN-E SNAIL available. It also doesn't natively support CANopen commands, but it can be set up to send messages that follow the CANopen standard.
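As an illustration of hand-building messages that follow CANopen, the sketch below constructs the identifier and data bytes for an NMT command and for an expedited SDO read request, using the default CANopen identifier allocation (NMT on 0x000, SDO requests on 0x600 + node ID). It is written in Python for brevity; the function names and the example node ID are hypothetical, and the resulting bytes would still need to be entered into whatever CAN tool is in use.

```python
import struct

# NMT command specifiers defined by CANopen (CiA 301).
NMT_COMMANDS = {
    "start": 0x01,
    "stop": 0x02,
    "pre_operational": 0x80,
    "reset_node": 0x81,
    "reset_communication": 0x82,
}


def check_node_id(node_id):
    """Raise ValueError unless the node ID is in the range 1..127."""
    if not 1 <= node_id <= 127:
        raise ValueError("node ID must be in the range 1..127")


def nmt_frame(command, node_id):
    """Return (can_id, data) for an NMT command addressed to one node."""
    check_node_id(node_id)
    return 0x000, bytes([NMT_COMMANDS[command], node_id])


def sdo_read_request(node_id, index, subindex):
    """Return (can_id, data) for an SDO upload (read) request.

    Data layout: command byte 0x40, 16-bit index (little-endian),
    8-bit subindex, then four unused bytes.
    """
    check_node_id(node_id)
    data = struct.pack("<BHB4x", 0x40, index, subindex)
    return 0x600 + node_id, data


if __name__ == "__main__":
    # Hypothetical node 5: start it, then read object 0x1018 subindex 1.
    for can_id, data in (nmt_frame("start", 5), sdo_read_request(5, 0x1018, 1)):
        print(f"ID 0x{can_id:03X}  data {data.hex()}")
```

If the node answers the SDO request, its reply arrives on 0x580 + node ID, which is a quick way to confirm both directions of the bus.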
When debugging communication with a device, it can be difficult to tell if the problem is the device or the uC. In this case, a Nucleo can be used to isolate the device being tested by jumpering the Nucleo pins to the appropriate pins on the device. You should be careful when doing this on a PCB so you can avoid backfeeding a power rail or causing other similar electrical issues.
A Nucleo can also be used to run the board code and compare with the onboard uC’s behavior to see if there are any differences. This is especially important when something electrical has happened to bring the functionality of the uC into question.
Arduinos allow simple scripts to simulate complex PCBs in initial integration testing of a PCB that needs to communicate with other PCBs. Using an Arduino, we can test the communication with other PCBs without actually having to risk the PCBs damaging each other. With the CAN shield, Arduinos can simulate a full PCB.
Arduinos can also be useful for conducting longer tests or serving utility purposes because they can be scripted in C and Python. However, because the code running on them is ours, they can provide an extra source of error, so they should be used thoughtfully. Arduino code is less complex than STM32 code, but because we use them less often, they can be even more error-prone than our STM uCs.
UART can be used for print debugging or logging information, but during bringup, UART can also allow you to develop a simple testing interface. This interface would act like a command line, exposing a number of commands to perform different tasks on the PCB.
This will be most useful when there is a device or set of devices with complex functionality on a PCB. You could manually adjust code and reprogram the board repeatedly to adjust parameters or send requests, but an interface will save time in the long run.
For a good example of this, look at the "bq_interface" target in the BMS repository.
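The core of such an interface is a small dispatcher that splits each received line into a command and its arguments, then calls the matching handler. The sketch below shows that structure in Python for readability; it is not the "bq_interface" implementation, and the command names, handlers, and the `state` dictionary standing in for device registers are hypothetical. On the board, the same idea would read lines from UART and call into the device driver.

```python
def cmd_read(state, args):
    """Return the value stored at a register as a hex string."""
    register = int(args[0], 0)
    return f"0x{state.get(register, 0):02X}"


def cmd_write(state, args):
    """Store an 8-bit value at a register."""
    register, value = int(args[0], 0), int(args[1], 0)
    state[register] = value & 0xFF
    return "ok"


# Command name -> (handler, usage string). New commands are added here.
COMMANDS = {
    "read": (cmd_read, "read <register>"),
    "write": (cmd_write, "write <register> <value>"),
}


def handle_line(state, line):
    """Parse one line of input and dispatch it to a command handler."""
    parts = line.split()
    if not parts:
        return ""
    name, args = parts[0].lower(), parts[1:]
    if name == "help":
        return "commands: " + ", ".join(sorted(COMMANDS) + ["help"])
    if name not in COMMANDS:
        return f"unknown command: {name}"
    handler, usage = COMMANDS[name]
    try:
        return handler(state, args)
    except (IndexError, ValueError):
        # Missing or non-numeric arguments: show how to call the command.
        return f"usage: {usage}"


if __name__ == "__main__":
    # Simulated session; on hardware these lines would arrive over UART.
    registers = {}
    for line in ("help", "write 0x10 0x2A", "read 0x10", "read", "reset"):
        print(f"> {line}")
        print(handle_line(registers, line))
```

Keeping the command table separate from the parsing logic means a new test command only needs a handler and one table entry.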
OpenOCD allows the developer to step through code line-by-line to see how issues arise. This is useful for normal software debugging but also for having precise control over when events occur to assist with hardware testing.
Although it has full debugging features, it does have some hardware limitations because it is debugging on an external device. Specifically, it can sometimes change the behavior of the code and hardware depending on where breakpoints are placed. This change in behavior should be avoided whenever possible to ensure test behavior matches the behavior a PCB will exhibit while on the bike.
When possible, OpenOCD should be used throughout the bringup process to help with debugging. Using breakpoints in CLion will always be faster than adding print statements, recompiling, and flashing. That said, we have recently had issues with OpenOCD when using the ST-Link v3-mini, so this tool may not be an option.
Bringup is far from the end of the work that needs to be done for a PCB. Once a board has been integrated into the bike, it faces its first real testing, running on a mobile platform. This can introduce many problems with boards that would not be found any other way.
Even if there are no issues, boards can be further improved with software updates to flesh out functionality or enhance performance. Also, every board itself can be improved with a revision, to add functionality or streamline the circuitry. PCBs should constantly be re-evaluated to gain an understanding of how they might be improved in the future.
System development at the scale of EVT is a process of continuous improvement. Recognizing poor designs and iterating on them keeps our team moving forward.