Payment & Shipping Terms:
|Module Size:||160x160 Or 320x160 Or 256x256 Or 192x192 Or 256x128||Resolution:||32x32 Pixels 64x32 Pixels|
|Brightness:||More Than 2500nits||Input Voltage:||DC5V|
|Scan Type:||1/2 Or 1/4 Or 1/8 Or 1/16 Or 1/32Scan||Fresh Rate:||≥ 2400Hz|
|LED Type:||SMD2121 SMD3528 SMD3535 SMD2727 SMD5050||MTTF:||>100,000 Hours|
My latest project uses a BeagleBone Black and a Xilinx Spartan 6 LX9 FPGA to drive a 32×32 RGB LED matrix.
This project lets me display cool and interesting patterns on a matrix of 32×32 RGB LEDs. That’s 1024 RGB LEDs or 3072 individual LED chips that need to be controlled! Rather than attempt to control all the LEDs in software only or using one of the BBB’s programmable real-time units (PRU), I decided to use the CPU to generate the patterns and use the FPGA to handle the heavy duty task of refreshing the LEDs.
Using the FPGA to refresh the LEDs leaves me with nearly 100% of the BBB CPU available to generate patterns and lets me implement 12-bit color at a refresh rate of 200Hz. A 200Hz refresh rate has no perceptible flicker and prevents scan lines from showing when photographing or filming the panel. A typical 12-bit color software implementation using the PRU to refresh the panels only refreshes at 50 to 60Hz.
Having the CPU free to compute patterns rather than refresh the display lets me generate some rather complicated patterns that otherwise might not be possible. Right now the most complicated and interesting pattern is seamlessly looping Perlin noise but other various abstract patterns, animated GIFs, text, etc., can be displayed.
Seamlessly looping Perlin noise running at 50Hz on 1024 pixels requires 102,400 3D Perlin noise calculations per second. Using floating point math with no compiler optimization, this quickly burns through the BBB CPU cycles. Had I attempted to refresh the panel using a small embedded processor such as an Arduino, I wouldn’t have had the CPU bandwidth available to both calculate this complicated pattern and refresh the display.
To build this project, I used a stock BeagleBone Black SOC board, a ValentFX LogiBone FPGA board with a Xilinx Spartan 6 LX9 FPGA, a 32×32 RGB LED panel from SparkFun, and some jumper wires from Pololu Robotics. The LogiBone FPGA board was a beta unit acquired through their Kickstarter campaign. To build and simulate the FPGA, I used the free Xilinx WebPack tools. Being free, the Xilinx WebPack tools permit hobbyists (or anyone building small designs) to simulate, synthesize, map, and place and route code for a select set of Xilinx’s devices.
Speaking of simulation, do it! The very first bit file I loaded into the FPGA board worked the very first time I loaded it except that I had a mirror image on the display from feeding the RGB data into the display right-to-left instead of left-to-right. I reversed the order, ran another sim, and built another part. Bingo. Second try. Perfection.
After building everything and getting everything to work, I made a short video demonstration of the project, wrote a complete tutorial on how anyone with a BeagleBone Black, a LogiBone FPGA board, and an LED panel can replicate results, then uploaded all the required code and files to github. Below are links to the code, tutorial, and video.
In this project, we interface a SparkFun or Adafruit 32x32 RGB LED panel to a BeagleBone Black board using the Xilinx Spartan 6 LX9 FPGA on the LogiBone FPGA board. The hardware for this project is relatively easy to construct—just 16 data signals connect the LED panel to the LogiBone FPGA board. The complexity of this project lies mostly in the RTL and software.
Figure 1. RGB LED panel with a random twinkling pattern connected to the LogiBone FPGA board and some other sample panel images.
The following hardware items are required:
SparkFun or Adafruit 32x32 RGB LED panel
This panel contains 1024 RGB LEDs arranged in a 32x32 matrix. The columns are driven using multiple sets of shift registers and the rows are driven, two rows at a time, using a 4-bit address decoder. The panel is driven at 1/16th duty cycle and must be continuously refreshed to display an image.
BeagleBone Black CPU board w/ USB or +5VDC power supply
You’ll need a BeagleBone Black CPU board and a +5VDC power supply for it. You can either use a USB cable to power the board from your computer or a USB power adapter or use a separate +5VDC, 2.1mm I.D., center-positive AC adapter.
LogiBone FPGA board
The FPGA board contains a Xilinx Spartan 6 LX9 FPGA. The FPGA contains 32 18kbit block RAMs. We’ll use two of the block RAMs as frame buffers to hold the RGB pixel values to be displayed on the panel. The two Digilent PMOD-compatible connectors will be used to connect to the LED panel.
Jumper wires or PMOD-to-display adapter board to connect the FPGA to the display
Initially, I used male-to-female jumper wires to connect the panel. This allowed me to connect the LogiBone FPGA board directly to the LED display panel without using the ribbon cable included with the display. If you only have male-to-male jumper wires, you’ll need to use the 16-position ribbon cable included with the display as an adapter to connect to the male pins on the display end of the jumper wires.
A much cleaner, long-term solution is to use this board and the 16-position ribbon cable included with the LED panel to make the connection from the LogiBone FPGA board to the display’s input connector. I also used precrimped terminal wires and housings to connect the FPGA and panel together. I didn’t like this solution because the precrimped terminal wires, when installed in a 2x8 housing connector, required too much force to insert onto and remove from the display’s data connector.
+3.3V power supply, 2.0A nominal, 4.0A peak
During normal operation, the display will draw at most about 2A of current. If you “stall” the refresh with an all-white pattern displayed, the two rows that are lit will draw about 3.8A. A small 3.3V, 3.0A desktop power supply such as this one from Mouser will be sufficient during normal operation. You will need to supply your ownIEC60320 C13 power cord to use with this adapter.
These panels can also be run from +5V instead of 3.3V. You will get brighter greens, brighter blues, and less-red whites if driven from +5V instead of +3.3V. You'll also pull about 15% more current and use about 65% more power at +5V instead of +3.3V. If you use a +5V supply, be extra careful not to accidently connect the LogiBone FPGA board to the display’s output connector.
Female DC barrel jack adapter (optional)
A female DC barrel jack adapter will make connecting the panel to the power supply much easier. If you don’t have an adapter, you can always cut, splice, solder, and heat shrink the connections between the power supply and led panel.
This system has three major components: the LED panel, the FPGA code, and the C++ code. Let’s examine each of these three major components in detail.
The LED panel contains 1024 RGB LEDs arranged in a matrix of 32 rows and 32 columns. Each RGB LED contains separate red, green, and blue LED chips assembled together in a single package. The display is subdivided horizontally into two halves. The top half consists of 32 columns and 16 rows. The bottom half also consists of 32 columns and 16 rows.
The display’s columns are driven by one set of drivers and the display’s rows are driven by another set of drivers. To illuminate an LED, the drivers for both the column and the row for that LED must be turned on. To change the color of an LED, the red, green, and blue chips in each LED package are controlled individually and have their own column drivers. Figure 2 below is a schematic representation of the display’s column and row driver organization.
Figure 2. RGB LED panel column and row driver organization.
The panel contains six sets of column drivers; three for the top half of the display and three for the bottom. Each driver has 32 outputs. The three drivers for the top of the display drive the red, green, and blue chips in each of the 32 columns of LEDs in rows 0 to 15 of the panel. The three drivers for the bottom of the display drive the red, green, and blue chips in each of the 32 columns of LEDs in rows 16 to 31 of the panel.
Each of the drivers has a serial data input, a blanking input, a shift register, and a parallel output register as shown below in Figure 3. The data present on the serial data input is shifted into the shift register using the SCLK signal. After an entire row of data has been shifted in to the shift register, the LATCH signal is used to transfer the row of pixel data from the shift register into the parallel output register. If a bit in the output register is a ‘1’ and the blanking input is deasserted, the driver for that column will be enabled; otherwise, the driver will be turned off. Data is shifted from the right edge of the display to the left edge of the display. In other words, the first bit shifted in will be displayed on the left edge of the display and the last bit shifted in will be displayed on the right.
Figure 3. Column driver operation for the R0 data input and top-half red columns outputs. There are two more of these shift registers at the top of the display for the top-half green and blue columns and three more at the bottom for the bottom half red, green, and blue columns.
The red, green, and blue column drivers for the top half of the display are attached respectively to the R0, G0, and B0 data inputs. The red, green, and blue column drivers for the bottom half of the display are attached respectively to the R1, G1, and B1 data inputs. All six of the 32-bit drivers share common SCLK, LATCH, and BLANK signals.
The rows are driven using four address bits and an address decoder. The four-bit address input to the row drivers is decoded and the two row drivers corresponding to that address will be turned on. When A[3:0] is 0, rows 0 and 16 of the display are turned on. When A[3:0] is 1, rows 1 and 17 of the display are turned on. This pattern continues until A[3:0] is 15 and rows 15 and 31 are turned on.
In addition to the row and column logic and drivers, the display has a blanking input. This input is most likely connected to the column drivers. When the blanking signal is asserted, all of the pixels are turned off and the display will be black. When the blanking signal is deasserted, the addressed rows and columns will be driven and the corresponding pixels illuminated. To display an image without flickering and ghosting, all of these signals must be used and properly sequenced when driving the panel.
The display is multiplexed and has a 1/16th duty cycle. This means that no more than one row out of the 16 in the top half of the display and one row out of the 16 in the bottom half of the display are ever illuminated at once. Furthermore, an LED can only be on or off. If both the row and column for an LED are turned on, the LED will be illuminated; otherwise, the LED will be off.
To display an image, the entire LED panel must be scanned fast enough so that it appears to display a continuous image without flickering. To display different colors and different brightness levels, the brightness of the red, green, and blue LED chips within each LED package must be adjusted by varying the amount of time that each LED chip is on or off within a single refresh cycle.
The basic process used to refresh the display when using three bits-per-pixel color (one bit for red; one bit for green; and one bit for blue) is the following:
The above process uses one bit per LED color. This will give you eight possible colors: black; the primary colors red, green, and blue; the secondary colors cyan, magenta, and yellow; and white.
To display more colors and brightness levels the above technique is modified to use binary coded modulation. In binary coded modulation, each pixel is controlled using more than a single bit per color per pixel. The amount of time each red, green, and blue LED chip is on is then varied proportionally to the pixel’s red, green, and blue values.
In binary coded modulation, the following process is performed to refresh the display:
Note that in actual implementations, the process of shifting the pixel data into the shift registers in Step 1 is usually done during the wait time in Step 6.
Global display dimming can be performed by varying the amount of time the blanking signal is asserted or deasserted within the wait time period, N. For example, asserting the blanking signal 25% early will result in a display with a brightness of 75% instead of 100%. Note that during global dimming, the wait time itself is not shortened or lengthened; only the blanking signal is modified to be asserted earlier than it normally would be.
The FPGA interfaces the C++ pattern generation software running on the BeagleBone Black CPU to the LED panel. The FPGA does the heavy lifting required to refresh the entire LED panel about 200 times per second. This leaves the BeagleBone Black CPU free to generate the patterns and perform other tasks.
Figure 4. Block diagram of the system including a block diagram of the FPGA’s major functional blocks.
As shown in Figure 4 above, software running on the BeagleBone Black generates patterns. These patterns are fed to the FPGA on the LogiBone board using the TI SOC’s GPMC bus. These patterns are written to a dual-port memory that serves as a display buffer. Finally a display controller reads the patterns out of the dual port memory, shifts the data into the display, and enables the row drivers as needed to display the image. The entire process is repeated about 200 times per second and generates a 32 x 32 RGB image with 12-bit color without any interaction from the BeagleBone Blacks’ CPU.
The TI SOC has a programmable memory interface called the general-purpose memory controller (GPMC). This interface is extremely flexible. It can operate in both synchronous and asynchronous modes and the bus timing is programmable in 10ns increments. The GPMC bus will be used to transfer pixel data from the software on the BeagleBone Black to the FPGA on the LogiBone board.
In our system, the GPMC is configured to operate in its asynchronous, multiplexed address/data mode. In this mode, both the address and data buses are 16 bits wide. This permits an entire 12-bit pixel to be transferred from the CPU on the BBB to the FPGA on the LogiBone board in a single write operation. For more information on the GPMC’s asynchronous, multiplexed mode of operation, see sections 126.96.36.199.10.1.1 of the AM335x ARM® Cortex™-A8 Microprocessors Technical Reference Manual.
I’m using a slightly different circuit in the FPGA to interface to the GPMC bus than the stock LogiBone projects. It’s a bit slower than the stock VHDL circuit, but guarantees that each write from the CPU over the GPMC bus creates exactly one write strobe pulse to the register interface inside the FPGA. Because it’s slightly slower than the stock circuit, it requires modified bus timing and thus a custom device tree setup file. Figure 5 below shows the bus timing using the modified GPMC interface to perform a write to the FPGA. Figure 6 below shows the bus timing using the modified GPMC interface to perform a read from the FPGA.
Figure 5. Simulation of a write to the GPMC target using the modified bus timings.
Figure 6. Simulation of a read from the GPMC target using the modified bus timings.
The read or write address is latched into a temporary holding register on the rising edge of the GPMC_ADVN signal and the write data is latached into its own temporary holding register on the falling edge of the GPMC_WEN signal. This requires using the GPMC_ADVN and an inverted version of the GPMC_WEN data signals as clocks. Technically, using data signals as clocks is gross. It’s actually so gross, the Xilinx tools will generate an error for this condition. But you can set an exception in the UCF file for the affected nets and force synthesis to continue. It would be much better to use the GPMC in its synchronous mode, but this technique is good enough for an FPGA until I have time to build a synchronous version of the interface, a synchronous GPMC bus model for simulation, and learn how to modify the device tree further.
In addition to latching the address and write data values into holding registers, the GPMC_CSN, GPMC_WEN, and GPMC_OEN controls signals are registered and brought into the FPGA’s 100MHz clock domain. Once in the FPGA’s clock domain, the WEN and OEN signals are gated with the CSN signal and edge detected to detect writes to the GPCM target and reads from the GPMC target. When a read or write is detected, the contents of the address and write data holding registers are captured into registers in the FPGA’s 100MHz clock domain.
The primary reason to slow down the GPMC bus versus the stock device tree setup file was to stretch the time that each of these control signals is low or high to at least 30ns to guarantee that the edges of the signals could be detected in the FPGA’s 100MHz clock domain. This also guaranteed that the address and data would be stable in their own holding registers before moving the contents of those registers into the address and data registers that are clocked in the FPGA’s 100MHz clock domain.
The outputs of the GPMC target are a bus that I’m calling the slow bus. The slow bus connects the GPMC target to the FPGA’s register interface. Figure 7 shows an example slow bus write operation. Figure 8 shows an example slow bus read operation.
Figure 7. Simulation of a slow bus write.
sb_addr, sb_wr, and sb_wr_data will be valid for exactly a single 100MHz clock pulse every time a write occurs on the GPMC bus. When the register interface sees sb_wr asserted, it writes sb_wr_data into the register at sb_addr.
Figure 8. Simulation of a slow bus read.
sb_addr and sb_rd will be valid for exactly a single 100MHz clock pulse every time a read occurs on the GPMC bus. The register interface sees sb_rd asserted then must return the value of the register at the address sb_addr on the sb_rd_data bus on the very next clock cycle.
The register interface is implemented in the top level of the FPGA Verilog. The register interface defines the view the software has of the FPGA. Table 1 below lists the registers in the FPGA.
|FPGA Address||BBB SOC Address||Name||Description|
|0x0000||0x0000||R/W Test Reg 1||Read / write test register. Write any value to this register. Reads return previously written value.|
|0x0001||0x0002||R/W Test Reg 2||Read / write test register. Write any value to this register. Reads return previously written value.|
|0x0002||0x0004||R/W Test Reg 3||Read / write test register. Write any value to this register. Reads return previously written value.|
|0x0003||0x0006||R/W Test Reg 4||Read / write test register. Write any value to this register. Reads return previously written value.|
|0x0004||0x0008||Read-Only Test Reg 1||Read-only test registers. Reads return hard-coded values. See RTL for returned values.|
|0x0005||0x000a||Read-Only Test Reg 2||Read-only test registers. Reads return hard-coded values. See RTL for returned values.|
|0x0006||0x000c||Read-Only Test Reg 3||Read-only test registers. Reads return hard-coded values. See RTL for returned values.|
|0x0007||0x000e||Read-Only Test Reg 4||Read-only test registers. Reads return hard-coded values. See RTL for returned values.|
|0x0008||0x0010||Display Buffer Address Register||Writes to this register set the display buffer address pointer. The display buffer address pointer points to the location in the display buffer memory that will be modified when a pixel value is written to the display buffer data register. See the display buffer section of this document for the arrangement of pixels in memory.|
|0x0009||0x0012||Display Buffer Data Register||Writing a pixel value to this register writes the pixel value to the display buffer at the address pointed to by the display buffer address pointer. After each write, the display buffer address pointer is incremented by one to point at the next pixel in the display buffer.|
|0x000a||0x0014||Display Buffer Select Register||0 selects buffer 0 for display; 1 selects buffer 1 for display; Reads return which buffer is currently being displayed.|
Table 1. FPGA registers.
The display buffers are implemented usinx Xilinx Block RAMs configured as dual-port memories with asynchronous read and write ports. The first RAM contains display buffers 0 and 1 for the top half of the display. The second RAM contains display buffers 0 and 1 for the bottom half of the display. Structuring the memories to contain half the display each permits the pixels in rows 0 to 15 to be read from memory on the exact same clock that the pixels in rows 16 to 31 are read from memory.
Display buffer 0 is located at address 0x0000. Display buffer 1 is located at address 0x0400. Each display buffer contains 1024 12-bit RGB values arranged as 32 rows of 32 columns. Within each display buffer, the top-left pixel is stored at offset 0, the bottom-right pixel is stored at offset 0x3ff. Bits 4 to 0 of the pixel offset are 0x00 for pixels in the leftmost column on the display; bits 4 to 0 of the pixel offset are 0x1F for pixels in the rightmost column.
Pixels are stored in the memory as 12-bit RGB values. These values are stored right-justiified. Bits 11 to 8 are the red pixel level, bits 7 to 4 are the green level, and bits 3 to 0 are the blue level.
The display driver reads pixel values from memory, shifts those values to the display, and cycles through the rows of the display as required to implement binary coded modulation as described in the theory of operation section of this document. The display driver is implemented as a state machine. Each state implements a step in the refresh process. When that step is complete, the state machine moves to the next step in the process.
Figure 9 below shows simulation waveforms for the control and data outputs for three rows worth of display data. The basic process is to blank the display, latch in the previously shifted data, update the row selects, unblank the display, shift in the next set of pixel data, and then wait for an update timer to expire. This is repeated four times for each row. If you examine the blanking output, you’ll notice that its low period doubles three times within the output period for each display row. This is the result of using binary coded modulation to vary the intensity of each pixel.
Figure 9. Simulation waveforms for the display data output connections.
The demonstration software uses the /dev/logibone_mem device to communicate with the FPGA. The driver for this device is part of the stock LogiBone Ubuntu image and its loadable kernel module is installed by the modified device tree setup shell script that’s included in the GitHub repository for the LED panel. (More on this subject in a later section.) This driver maps the registers in the FPGA to a portion of the BBB CPU’s address space using the GPMC. The GPMC normally maps memory into the CPU’s address space. Because our FPGA looks like a memory to the GPMC bus, its registers can be mapped into the CPU address space too. Pretty cool. No SPI, I2C, etc.; just fast parallel accesses between the CPU and the FPGA. This memory-mapped space can then be accessed by opening the /dev/logbone_mem device using the C library open function call and reads and writes to a register in the FPGA can be performed using the pread and pwrite C library function calls.
Figure 10 below is a block diagram of the demonstration software stack. In the demonstration software, main opens the /dev/logibone_mem device, fills the global buffer memory, gLevels, with all black, and then calls WriteLevels to write the global buffer to the display and clear the display. Once the display is cleared, the main function instantiates a pattern / animation subclass such as a radiating circle, perlin noise, or colorwash subclass. This subclass is derived from a generic pattern base class.
The generic pattern base class uses a constructor to set the height and width of the pattern to generate. Derived classes may add their own arguments to their own constructors. The base class also has two pure virtual member functions, init and next, that any derived classes must implement. The init function prepares a pattern to be displayed for the first time. It typically resets any state information back to the start of the pattern. The next function calculates the next frame of the pattern and writes that frame to the global gLevels buffer.
After main has instantiated the pattern subclass, it calls the subclass’s init funciton. Main then installs a timer that executes at 50Hz and goes to sleep. When the timer expires, a timer handler function is called. The timer handler function calls WriteLevels to write the previously computed frame in gLevels to the next available display buffer in the FPGA and makes that display buffer active. The writes to the FPGA display buffers are performed using the registers documented in the Register Interfacesection of this document.
After WriteLevels has completed, the timer handler function calls the pattern’s next member function. The next function generates the next frame in the animation, writes that frame to gLevels, and returns—without calling WriteLevels. The timer handler then sleeps until the next time the timer expires. By calling WriteLevels before callingnext, the amount of time between displayed frames will not vary even if the amount of time that next takes to execute varies between frames.
In order for animations to run smoothly, the timer handler function must complete execution before the timer expires next. This means that each frame in the animation must take less than roughly 20ms to compute.
Figure 10. Block diagram of the demonstration software stack.
The display requires only a data connection to the LogiBone FPGA board and a power connection to a +3.3V power supply to operate. These connections are detailed in the sections below.
Figure 11 below lists the connections between the PMOD connectors and the display’s data input connector. You will need to make 16 connections total between the LogiBone board and the display panel. Thirteen of these are data connections; three of these are grounds. You can either use jumper wires or the PMOD-to-display adapter board. If you use jumper wires, the wiring will look something like Figure 12. With the adapter board, it will look something like Figure 13. Note that the PMOD connectors’ pins are numbered differently than double row headers are normally numbered.
Figure 11. PMOD connector pin outs, connections between the PMOD connectors and the display input connector, and the display connector pin out.
Figure 12. LogiBone FPGA board connected to RGB LED panel using jumper wires.
Figure 13. LogiBone FPGA board connected to RGB LED panel using the PMOD-to-display adapter board.
Once the data signals have been connected, make the power supply connection to the display. Figure 14 below shows the basics. Using the DC barrel jack adapter, connect the positive terminal of the power supply to the red wire of the wire harness and connect the negative terminal of the power supply to the black wire of the wire harness. Before connecting the wire harness to the display, use a volt meter to verify the polarity of the connections. Once you’ve verified the polarity, disconnect the power and plug the wire harness into the display.
I left the spade lugs on the wire harness because I plan on using the display in a bigger project and don’t want to remove them until I’m sure I don’t need them in the bigger project. If you leave the spade lugs on too, be careful they don’t accidentally short to any other electronics. You might want to wrap them with electrical tape just to be sure. If you don’t need or want the spade connectors, feel free to cut them off, strip a bit of insulation off the wires, and connect them directly to the DC barrel jack adapter.
Figure 14. Connecting the power supply to the RGB LED panel using a female DC barrel jack adapter.