The STM32F2xx has a great SD Card interface

The STM32F2xx has a great SD Card interface. It’s a true 4-bit parallel
interface, and in general it works pretty well. I have come across a few,
fairly minor but still significant, considerations when using the interface
that I thought I’d pass on.
Initialisation Sequence
Proper initialization of the SD card is important, because SD cards have
no reset line and a card is not going to behave if its internal state machine
wanders off to where you don’t expect it. It’s a good idea to have some
way of removing power from the card (a p-channel MOSFET for example)
so you can reset it if it goes crazy on you.
At power-up it seems to be helpful to have all signals in the idle state,
which is high. This can be done by first configuring the SDIO pins as
GPIO outputs and driving them high. Then you can switch those pins to
their alternate function for the SDIO port.
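A minimal sketch of that two-step pin setup follows. The `sd_gpio_t` struct is a simplified stand-in for the GPIO register block so the code builds on a host (it is not the real register map), the function names are hypothetical, and the alternate-function number 12 is an assumption to check against the datasheet.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for the GPIO registers (NOT the real register map).
 * On the target you would use the GPIO_TypeDef from the device header. */
typedef struct {
    volatile uint32_t MODER;   /* 2 bits per pin: 01 = output, 10 = alternate function */
    volatile uint32_t ODR;     /* output data, 1 bit per pin */
    volatile uint32_t AFR[2];  /* 4 bits per pin: AFR[0] = pins 0-7, AFR[1] = pins 8-15 */
} sd_gpio_t;

#define SD_AF_SDIO 12u  /* assumed alternate-function number for SDIO */

/* Step 1: drive the pin high as a plain GPIO output (the idle level). */
static void sd_pin_idle_high(sd_gpio_t *port, unsigned pin)
{
    assert(port != 0 && pin < 16u);
    port->ODR |= 1u << pin;  /* set the level before enabling the output driver */
    port->MODER = (port->MODER & ~(3u << (pin * 2u))) | (1u << (pin * 2u));
}

/* Step 2: hand the pin over to the SDIO peripheral. */
static void sd_pin_to_sdio(sd_gpio_t *port, unsigned pin)
{
    assert(port != 0 && pin < 16u);
    unsigned shift = (pin % 8u) * 4u;
    port->AFR[pin / 8u] = (port->AFR[pin / 8u] & ~(0xFu << shift))
                        | (SD_AF_SDIO << shift);
    port->MODER = (port->MODER & ~(3u << (pin * 2u))) | (2u << (pin * 2u));
}
```

In use, you would call `sd_pin_idle_high()` for every SDIO pin, apply power to the card, and only then call `sd_pin_to_sdio()` for each pin.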
Don’t Stop The Clock
The SD Card specification allows for the stopping of the clock. This can be
helpful at times. The STM32F2xx allows for this, but when I first tried it, it
didn’t work – I wasn’t able to reestablish communications with the card
afterwards. Soon afterwards ST published an erratum saying this feature
doesn’t work. I don’t know if they plan to fix this, and I personally don’t
care too much – I don’t plan on using it anyway. But if your application
needs this, you should probably check with ST what their intentions are.
Voltage Level Translators & SD Card Timing
The STM32F2xx can be run at 1.8V, but the SD Card is a 3.3V device
(most of them anyway). Ideally the processor would have a Vcc pin
specifically for the SDIO pins (some processors do have this) but the
STM32F2xx does not, so in this example its SDIO pins will be at 1.8V.
This is fine for reading from the SD card, because the STM32F2xx is 3.3V
tolerant on its inputs. But it’s no good for writing to the card, because the
SD card won’t recognize 1.8V as being a logic “high”. A level translator is
required.
Some processors provide a “direction” pin as part of their SDIO interface;
this can be used to drive an external low-cost bidirectional buffer. The
STM32F2xx does not provide this pin so an automatic switching
bidirectional buffer is required. ST has one, but the most commonly
used buffer appears to be the Texas Instruments TXS0108. There are
several others.
Using an external buffer substantially affects the timing for SD card reads.
Some processors provide a “clock input” pin as part of their SDIO
interface, which is the clock used for read cycles. The STM32F2xx does
not implement this either – the SDIO clock output by the STM32F2xx is
the clock used for both writes and reads.
What this means is that for an SD read cycle, the clock “arrives” at the
processor much earlier than the data does. Consider the read cycle for a
moment. The processor outputs a rising edge of the clock, and the
processor then expects to clock in data on the next falling edge of that
clock. Once the SD card sees the rising edge there will be a delay within
the card before it outputs the data. So the sequence looks like this:
Rising edge output from processor -> clock delayed through level
translator en route to card -> delay due to SD card response time and
then data is placed on bus -> data delayed through level translator en
route to processor -> processor reads data on falling edge of clock
With a 25 MHz SD clock, assuming a perfect 50% duty cycle, there’s only
20 ns between the rising and the falling edge of the clock. In those 20 ns
we need 2 trips through the level translator (clock going out and data
coming in) plus the delay due to the SD card, plus any setup time for the
processor read (which is zero thankfully). 20 ns is not sufficient – the
level translator simply isn’t fast enough, nor necessarily is the card.
You need to do the timing analysis yourself for your exact components
and system, but I think you’ll find that running at 25 MHz with a level
translator on the STM32F2xx simply isn’t possible. Somewhere between
15 – 20 MHz for the SDIO is probably where you’ll end up.
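The timing budget described above can be turned into a small calculation. This is only a sketch: the function name is hypothetical, and the delay figures in the usage note are illustrative assumptions, not values from any datasheet. Take the real numbers from your level translator and card documentation.

```c
#include <assert.h>
#include <stdint.h>

/* All delays are in picoseconds. Returns the highest SD clock (in Hz) at
 * which read data still arrives before the falling edge, assuming a 50%
 * duty cycle. */
static uint64_t sd_max_read_clock_hz(uint32_t translator_ps,
                                     uint32_t card_output_ps,
                                     uint32_t host_setup_ps)
{
    /* clock out through the translator, card response time,
     * data back through the translator, then host setup time */
    uint64_t half_period_ps = 2ull * translator_ps + card_output_ps + host_setup_ps;
    assert(half_period_ps > 0);
    return 1000000000000ull / (2ull * half_period_ps);
}
```

For example, with an assumed 6 ns per pass through the translator, an assumed 14 ns card output delay and zero setup time, the half period must be at least 26 ns, so `sd_max_read_clock_hz(6000, 14000, 0)` gives roughly 19.2 MHz.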
Busy Signalling and Data Transfer
The STM32F2xx SDIO port contains hardware support for the card to
signal busy. If the card cannot accept data it indicates this by pulling its
Data0 line low. Once it’s able to accept data again, it sets Data0 high and
data transfer can continue. For the most part the STM32F2xx SD interface
handles this pretty seamlessly, pausing when the card is busy and
continuing when it’s able. It’s quite transparent to the programmer. One
intermittent exception I’ve found is when the processor is about to start
sending data to the card. If the card is signalling ‘busy’ at the time the
processor wants to commence sending data, sometimes (not always) the
processor attempts the data transmit, stops (when it realises the card is
busy), and then generates a CRC error (SDIO_STA register bit 1). This
error halts the entire SD transfer. This behaviour is intermittent – usually
the processor handles a ‘busy’ at the beginning of a transfer normally,
however sometimes it results in an SDIO port CRC error.
The solution to this problem, quite obviously, is to hold off initiating a
data transfer until the card is no longer busy. There are a couple of ways of
accomplishing this. One is to tie the SDIO Data0 line to a free GPIO pin,
which you can then read to ensure the pin is high before kicking off a data
transfer. Another is to poll the card (this is what I do). Probably the
easiest thing to do is to send the card a CMD13 “SEND_STATUS”
command. The response to this command is the 32-bit “R1” response,
which is the card’s “card status”. Bit 8 will be high if the card is ready for
data, low if the card is busy. Just sit in a loop, sending the card CMD13
commands and checking bit 8 of the response until it’s high.
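A sketch of that polling loop is below. The `sd_send_status_fn` hook is hypothetical (it stands for whatever routine in your driver sends CMD13 and returns the R1 card status), and the poll limit is an addition so the loop cannot spin forever. `fake_send_status()` is a host-side fake card, included only to exercise the loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SD_R1_READY_FOR_DATA (1u << 8)  /* card status bit 8 */

/* Hypothetical hook: sends CMD13 and stores the R1 card status.
 * Returns false if the command itself failed. */
typedef bool (*sd_send_status_fn)(void *ctx, uint32_t *card_status);

/* Poll CMD13 until the card reports ready, or max_polls is used up. */
static bool sd_wait_ready(sd_send_status_fn send_status, void *ctx,
                          uint32_t max_polls)
{
    assert(send_status != 0);
    for (uint32_t i = 0; i < max_polls; i++) {
        uint32_t status = 0;
        if (!send_status(ctx, &status)) {
            return false;               /* command error: let the caller recover */
        }
        if (status & SD_R1_READY_FOR_DATA) {
            return true;                /* safe to start the data transfer */
        }
    }
    return false;                       /* still busy: treat as a timeout */
}

/* Host-side fake card: reports busy for *ctx polls, then ready. */
static bool fake_send_status(void *ctx, uint32_t *card_status)
{
    uint32_t *busy_polls = ctx;
    if (*busy_polls > 0) {
        (*busy_polls)--;
        *card_status = 0;
    } else {
        *card_status = SD_R1_READY_FOR_DATA;
    }
    return true;
}
```

Bounding the loop is a design choice rather than something the card requires; a card that never leaves busy is exactly the situation where you would want to fall back to cycling its power.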
SDIO_STA register TXACT bit
Be careful interpreting the TXACT bit (bit 12) in the STM32F2xx
SDIO_STA register. The documentation says:
Bit 12 TXACT: Data transmit in progress
This would imply the bit is set while a data transfer is in progress, and
clear when it’s not. You might think you can look at this bit to determine
if the SDIO port has finished transmitting data to the card, so you know
when you can start transmitting your next chunk of data to the card.
I’ve found this bit only behaves that way for single-block writes (CMD24
commands). For multi-block writes (CMD25 commands) I’ve seen this bit
remain set even after the SDIO has sent all the data to the card and the
SDIO_DCOUNT register is zero. If the card is still in its receive-data state
(state 6) at the completion of the data transfer, the TXACT bit may still be
set.
There are other ways to know if the SDIO port has finished transmitting
its data. If you’re using DMA (and you should be) then you can check
your DMA NDTR register to confirm it’s zero. The SDIO_STA register
DATAEND bit (bit 8) will have been set (and generated an interrupt if you
have it enabled) at the completion of the data transfer. And of course the
SDIO_DCOUNT register will be zero. You don’t need to rely on the TXACT
bit, and I suggest you don’t because it can be a bit misleading, at least
the way it’s currently documented.
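The checks listed above could be combined into one helper along these lines. It is a sketch: the snapshot struct and function name are hypothetical, and the caller is assumed to have read SDIO_STA, SDIO_DCOUNT and the DMA NDTR register into it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SDIO_STA_DATAEND_BIT (1u << 8)   /* DATAEND, bit 8 of SDIO_STA */
#define SDIO_STA_TXACT_BIT   (1u << 12)  /* TXACT: deliberately not consulted */

/* Register values captured by the caller at one point in time. */
typedef struct {
    uint32_t sta;       /* SDIO_STA */
    uint32_t dcount;    /* SDIO_DCOUNT */
    uint32_t dma_ndtr;  /* remaining DMA transfer count */
} sdio_tx_snapshot_t;

/* True once the SDIO has sent all of its data, regardless of TXACT. */
static bool sdio_tx_finished(const sdio_tx_snapshot_t *s)
{
    assert(s != 0);
    return (s->sta & SDIO_STA_DATAEND_BIT) != 0
        && s->dcount == 0
        && s->dma_ndtr == 0;
}
```

Note that this only says the SDIO has finished sending; the card may still be busy programming, which is a separate check.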
CRC Error with CMD5
The SDIO peripheral checks a CRC on every response, regardless of whether
a CRC is actually present or not. This results in the SDIO hardware
generating CRC errors for commands whose responses don’t contain a CRC.
Be aware that when you send CMD5 to the card, the return data does not
contain a CRC. The SDIO hardware will generate a CRC error in this case:
the CCRCFAIL bit in the SDIO_STA register will be set and may generate an
interrupt if you have the interrupt enabled in the SDIO_MASK register.
This reported CRC error is wrong – make sure your software is prepared
to accommodate this “special case” for CMD5.
Update Dec 2011: This is now mentioned in the STM32F4xx errata,
however the STM32F2xx errata still does not mention this.
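One way to handle the special case is to filter the flag by command index before treating it as an error. This is a sketch with a hypothetical function name; the CCRCFAIL bit position used here (bit 0) is an assumption to verify against the reference manual.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SDIO_STA_CCRCFAIL_BIT   (1u << 0)  /* assumed bit position of CCRCFAIL */
#define SD_CMD5_IO_SEND_OP_COND 5u

/* Returns true only when a command CRC failure should be treated as real. */
static bool sd_cmd_crc_failed(uint32_t cmd_index, uint32_t sta)
{
    assert(cmd_index < 64u);  /* SD command indices are 6 bits wide */
    if ((sta & SDIO_STA_CCRCFAIL_BIT) == 0) {
        return false;
    }
    /* The CMD5 response carries no CRC, so the flag is spurious here. */
    return cmd_index != SD_CMD5_IO_SEND_OP_COND;
}
```

Even when the failure is ignored, the flag itself still needs to be cleared before the next command, otherwise it will be seen again.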
Standard Peripherals Library SD Card Software
The Standard Peripherals Library for the STM32F2xx is a set of example
software routines that can be downloaded from the ST website. If you’re
using this processor the library is very valuable. Aside from providing a
bunch of examples for using different peripherals and features of the chip,
it also provides a standard set of definitions, some example start-up code,
and more.
This doesn’t mean you should blindly use this code for your production
product however. ST tries to make it clear this code is “example” code,
and in many cases that’s all it is. Certainly this statement is true for the
SD Card examples. You need to go through the code carefully and make
sure it meets your requirements, or modify it to suit your needs if it
doesn’t. It’s a great starting point, but don’t assume it’s anything more
than just a starting point.
With regards to the SD Card example code in there, I’ve come across a
few things worth noting.
Timeouts
The SD Card specification suggests timeout values for various operations,
and the STM32F2xx SDIO peripheral contains a hardware timer you can
use to implement this. It’s a simple clock counter. Alternatively you can
use one of the many general-purpose timer/counters the processor
provides.
The example SDIO code sometimes uses the SDIO “clock counter”
timeout; when it does it sets it to its maximum value. That’s not very
useful – yes it will eventually timeout, but not for a really long time.
Usually when the example code implements a timer, it uses a simple loop
counter, for example:
static SD_Error CmdError(void)
{
    SD_Error errorstatus = SD_OK;
    uint32_t timeout;
    timeout = SDIO_CMD0TIMEOUT; /*!< 10000 */
    while ((timeout > 0) && (SDIO_GetFlagStatus(SDIO_FLAG_CMDSENT) == RESET))
    {
        timeout--;
    }
The problem with this is you’ve no idea what the value of the timeout is.
A compiler can potentially optimize it away to nothing, or it could take a
long time. In practice I’ve found these timeouts expiring very
prematurely, resulting in the functions returning errors before the SDIO
transaction has had a chance to complete. There are also many places in
the example code where there’s no timeout implemented at all, meaning
the code can potentially hang-up in those locations.
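A timeout tied to a real time base avoids the guesswork. The sketch below assumes you have some free-running millisecond counter (for example one driven by a timer interrupt); both hooks are hypothetical names, and the two `fake_` functions exist only so the loop can be exercised on a host.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t (*tick_ms_fn)(void);  /* hypothetical: free-running ms counter */
typedef bool (*flag_set_fn)(void);     /* hypothetical: true once the flag is set */

/* Wait for a flag, giving up after timeout_ms of real time. */
static bool wait_flag_ms(flag_set_fn flag_set, tick_ms_fn now_ms,
                         uint32_t timeout_ms)
{
    assert(flag_set != 0 && now_ms != 0);
    uint32_t start = now_ms();
    /* Unsigned subtraction stays correct when the counter wraps around. */
    while ((uint32_t)(now_ms() - start) < timeout_ms) {
        if (flag_set()) {
            return true;
        }
    }
    return flag_set();  /* one last look in case it was set at the deadline */
}

/* Host-side fakes: the clock advances 1 ms per read, and the flag reads
 * as clear for fake_flag_polls_left calls before it reads as set. */
static uint32_t fake_now;
static uint32_t fake_flag_polls_left;

static uint32_t fake_tick(void)
{
    return fake_now++;
}

static bool fake_flag(void)
{
    if (fake_flag_polls_left == 0) {
        return true;
    }
    fake_flag_polls_left--;
    return false;
}
```

Unlike a bare loop counter, the length of this timeout does not change with compiler optimisation level or CPU clock speed.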
4 GB Maximum Card Size
The SDIO example code uses an unsigned 32-bit variable (a uint32_t) for
the card address. For example:
SD_ReadBlock (uint8_t *readbuff, uint32_t ReadAddr, uint16_t BlockSize)
A little math: 2^32 = 4 GB. Beyond that this address variable overflows.
SD Cards can be up to 2 TB in size (2^32 x 512 bytes). Whether this
limitation is a problem for you depends upon what kind of cards you
intend to support. You may want to consider changing the following 5
functions to use a “sector” parameter instead of an “address” parameter.
Given that modern large cards all use 512-byte sectors (or blocks) this
allows the code to match up with how the card behaves.
SD_ReadBlock (uint8_t *readbuff, uint32_t ReadAddr, uint16_t BlockSize)
SD_ReadMultiBlocks (uint8_t *readbuff, uint32_t ReadAddr, uint16_t BlockSize, uint32_t NumberOfBlocks)
SD_WriteBlock (uint8_t *writebuff, uint32_t WriteAddr, uint16_t BlockSize)
SD_WriteMultiBlocks (uint8_t *writebuff, uint32_t WriteAddr, uint16_t BlockSize, uint32_t NumberOfBlocks)
SD_Erase (uint32_t startaddr, uint32_t endaddr)
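If those functions take a sector number, the conversion to the command argument can live in one place. The sketch below uses a hypothetical helper name and relies on the fact that high-capacity cards are block-addressed while standard-capacity cards are byte-addressed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SD_SECTOR_SIZE 512u

/* Convert a 512-byte sector number into the 32-bit argument for a read or
 * write command. Returns false if the sector cannot be addressed. */
static bool sd_sector_to_arg(uint32_t sector, bool high_capacity, uint32_t *arg)
{
    assert(arg != 0);
    if (high_capacity) {
        *arg = sector;               /* block-addressed: pass the sector number */
        return true;
    }
    if (sector > UINT32_MAX / SD_SECTOR_SIZE) {
        return false;                /* byte address would overflow 32 bits */
    }
    *arg = sector * SD_SECTOR_SIZE;  /* byte-addressed: pass a byte offset */
    return true;
}
```

With this in place the public API never sees a byte address, so the 4 GB ceiling only applies to the cards that genuinely have it.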
SD Card Initialisation
This was mentioned earlier, but just to reiterate. Card initialization seems
to be more reliable if the SDIO pins are placed in a high “idle” state
before the pins are switched to the SDIO peripheral “alternate function”.
The example code does not do this.
Blocksize
I haven’t personally experienced this, but it’s been reported on the
forums that some smaller cards (e.g. 2 GB) have problems because the
example read and write functions do not issue a CMD16 blocksize
command to the card before performing the transaction. SDHC cards have
a fixed blocksize of 512 bytes and do not require the CMD16 command,
however non-SDHC cards do need that command.
SDIO_SetPowerState() function
Many thanks to Brad & Andrew over at the STM32 forum for finding this
one. The issue is that during SDIO port power-up, which is part of the
SDIO initialisation routines, the power-up may not always succeed. If you
find the function SDIO_SetPowerState() contains this:
SDIO->POWER &= PWR_PWRCTRL_MASK;
SDIO->POWER |= SDIO_PowerState;
then try changing those two lines to this:
if (SDIO_PowerState == SDIO_PowerState_ON)
    SDIO->POWER |= SDIO_PowerState;
else
    SDIO->POWER &= PWR_PWRCTRL_MASK;
I believe the problem with the original code is described in the
documentation for the SDIO_POWER register:
Note: At least seven HCLK clock periods are needed between two write
accesses to this register.
Note: After a data write, data cannot be written to this register for three
SDIOCLK (48 MHz) clock periods plus two PCLK2 clock periods.
You can see the original code does two writes to the register in quick
succession. That would be bad. This code change helped things for me.
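Another way to avoid back-to-back writes, offered here only as an untested alternative to the change above, is to build the new value in a local variable and write the register once. The helper name is hypothetical, and the sketch assumes the power control field occupies bits [1:0] of SDIO_POWER.

```c
#include <assert.h>
#include <stdint.h>

#define PWRCTRL_FIELD 0x3u  /* assumed: PWRCTRL occupies bits [1:0] */

/* Build the new SDIO_POWER value so the register is written only once. */
static uint32_t sdio_power_value(uint32_t current, uint32_t power_state)
{
    assert((power_state & ~PWRCTRL_FIELD) == 0);
    return (current & ~PWRCTRL_FIELD) | power_state;
}
```

On the target this would be used as a single assignment, for example `SDIO->POWER = sdio_power_value(SDIO->POWER, SDIO_PowerState);`.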
SD_SendSDStatus() function
Updated 7 Dec 2011. This was a tough one to reliably reproduce and
hence to find. In the STM32F2xx SD code, the function SD_Init() calls the
function SD_GetCardStatus() which in turn calls the function
SD_SendSDStatus() passing it a pointer to a buffer, like so:
errorstatus = SD_SendSDStatus((uint32_t *)SDSTATUS_Tab);
The purpose of SD_SendSDStatus() is not immediately obvious from the
code (read the comment block for SD_GetCardStatus() if you need a good
laugh), but what it does is send an ACMD13 command to the card. It then
retrieves the 512-bit status that the card sends in reply and writes it into
the buffer. The problem? This:
static uint8_t SDSTATUS_Tab[16];
By my math, 16 bytes = 128 bits. So what happens is that
SD_SendSDStatus() writes 64 bytes of data into a 16 byte buffer,
resulting in a big buffer overrun and a bunch of innocent SRAM locations
being stomped on. Which creates all manner of flaky problems. The
simple and obvious fix is to increase the size of SDSTATUS_Tab, although
a more robust solution would include a rewrite of SD_SendSDStatus().
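As a sketch of the simple fix, the buffer can be sized from the 512-bit status length rather than a magic number. The names below are hypothetical, and the size check illustrates the kind of guard a rewritten SD_SendSDStatus() might apply.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SD_STATUS_BYTES 64u  /* the 512-bit SD status */

/* Declared as 32-bit words, since that is how the data is written into it. */
static uint32_t sd_status_buf[SD_STATUS_BYTES / sizeof(uint32_t)];

/* Hypothetical guard: refuse any destination that cannot hold the whole
 * status block. Returns non-zero when the buffer is large enough. */
static int sd_status_buffer_ok(const void *buf, size_t buf_bytes)
{
    assert(buf != NULL);
    return buf_bytes >= SD_STATUS_BYTES;
}
```

Declaring the buffer as words also removes the need for the `(uint32_t *)` cast at the call site.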
Summary
It should be clear by now that the standard peripheral library SDIO code
cannot reliably be used as-is. It contains far too many limitations, ranging
from a serious lack of error-handling (and is sometimes error-generating)
to outright functional restrictions. It’s a good starting point to show how
things can work, but it’s far from being production-ready. For any real
product you have no choice except to grab a copy of the SD Card
specification and get busy. With that said, I’ve found the SD Card
interface on the STM32F2xx to perform pretty well and the library code to
be a big time-saver. Just don’t expect it to be production-ready code.
More information can be found in the posting: SDIO Interface Part 2.

8 RESPONSES
1. Memphis
15|Nov|2011
The biggest problem is that the reference manual has too little information
about exactly how to do the read and write operations with DMA support.
The given examples are really only a starting point, and the production code
must be totally rewritten by yourself (that means the Utilities directory too!).
Only the library can be used as it is, and sometimes it needs a little fix too.
I am also a little bit surprised about the SDIO; I expected the hardware to
do more of the work by itself, rather than needing a bunch of code just for
simple read and write operations. I compared the SPI library for SD cards
with the SDIO one, and it looks like the simple SPI version is written with
less code, which doesn’t make sense :-/
2. Edward Keyes
03|Jan|2012
Just wanted to say thanks a million for documenting these issues with the
ST library code. I was tearing my hair out about the block-size problem:
“Why am I only getting the first 8 bytes of each sector?!” Just the mere
mention of that issue saved me a ton of debugging time…
3. Décio
20|Feb|2012
Have you found any issues with the order of CPSM/DPSM/DMA
initialization in the SD_ReadBlock() and SD_WriteBlock() functions?
What the sample SD_ReadBlock() does is (in summary):
1. SDIO_DataConfig(…);
2. SDIO_SendCommand(…);
3. Enable DMA
It’s possible (particularly in interrupt heavy code) for it to take a long
time between steps 2 and 3. In my code I changed the order to 1, 3, 2
and I don’t recall ever having read errors again.
As for SD_WriteBlock(), what it does is (again in summary):
1. SDIO_SendCommand(…);
2. SDIO_DataConfig(…);
3. Enable DMA
Now I don’t see why this should be a problem, even with many interrupts
firing, unless there’s some kind of timeout that the SD card enforces
which I missed from reading the spec. Yet, from time to time, some of my
cards — particularly some old 1 GB and 2 GB SD (not SDHC) cards — will
timeout after attempting a write, and inspecting the DMA registers reveals
that DMA_CNDTR (i.e. the remaining number of words to send) is
something like 125 out of the original 128 (128 words x 4 bytes/word =
512 bytes).
The impression I get is that the transmission started before enabling
DMA, and by the time it was enabled, it was near the end of the transfer
and it only managed to send a few words before the transfer completed. I
doubt this explanation is right, though, seeing as the FIFO should be
empty before enabling the DMA, and so I don’t see how the DPSM would
go from the Wait_S to the Send state.
4. frank
20|Feb|2012
For both a read and a write, it’s important to setup the DMA before
initiating the data transfer. For the read, you’ve already worked that out.
For the write, you need the DMA to be enabled before starting the write.
Remember that when you setup the SDIO DMA, the DMA is configured as
“peripheral controlled”. The SDIO is the only peripheral with this ability.
This means that even though you’ve enabled the DMA, the DMA is not
actually doing anything – it’s just sitting there, waiting for the peripheral
(the SDIO) to tell it to run. There’s no harm in having the DMA enabled
beforehand.
But for the write it is important. Because once the SDIO commences the
write to the card, it pretty much immediately requires data to give to the
card. If it doesn’t get it pronto, you’ll end up with an SDIO error (usually
a FIFO error because the transmit FIFO underran).
5. Décio
21|Feb|2012
Thanks for the tip. So I guess there are two possibilities for the order of
the write operation: 3, 1, 2 or 3, 2, 1.
I believe 3, 1, 2 may work, but again, if an interrupt fires between 1 and
2, and if the card enforces some sort of timeout which I don’t know
about, a write may be dead before it starts. But if there’s no such
timeout, then this should work.
As for 3, 2, 1, it might be completely safe, since when the command is
sent the DPSM and the DMA are already configured, so there’s no risk of
underrun. But this also depends on the SDIO peripheral not starting the
data transfer immediately after doing 2 (SDIO_DataConfig(…)). Here’s
what the reference manual says:
“Depending on the transfer direction (send or receive), the data path
state machine (DPSM) moves to the Wait_S or Wait_R state when it is
enabled:
- Send: the DPSM moves to the Wait_S state. If there is data in the
transmit FIFO, the DPSM moves to the Send state, and the data path
subunit starts sending data to a card.”
So my understanding is, if DMA’s already configured, then the moment I
call SDIO_DataConfig(), it’ll start the transfer, even before the command
is sent. So I guess this order is out. Do you agree?
Also, if you’ll allow me to contribute a tidbit that wasn’t mentioned in your
post: SD_FindSCR() from ST’s SDIO driver is also buggy. When reading
the response to SD_CMD_SD_APP_SEND_SCR, the code sometimes gets
stuck in the loop after reading the full 64 bits of the response — somehow
it doesn’t realize the transfer has ended. You might add some code to
break from the loop after reading 64 bits, but it feels like a kludge. One of
my attempts to fix it was using DMA for the transfer. At that time I didn’t
know about the ordering of DMA and command transmission, so I did it in
the wrong order and still had problems from time to time. Which is weird,
since the SDIO peripheral’s FIFO is deep enough to hold the 64 bits of the
response to that command, but still, I had problems. Maybe if I tried
changing the order, it might always work.
But before trying that I realized that there’s no point in calling this
function. It’s only used at one point in the driver code, where it queries
the SCR to see if the card supports 4-bit mode before enabling it. But the
SD simplified physical layer spec section 5.6 says quite clearly: “Since the
SD Memory Card shall support at least the two bus modes 1-bit or 4-bit
width, then any SD Card shall set at least bits 0 and 2
(SD_BUS_WIDTH=“0101”).” Hence, at least for this one use, there’s no
need to query the SCR since we know the answer beforehand. If you have
no other uses for the SCR (and I didn’t in my FATFS port) then you can
just do away with SD_FindSCR().
Hope this is useful for you or someone else.
6. Décio
21|Feb|2012
I’ve found some guidance from ST’s own reference manual (for the
STM32F103xx series) regarding the order of operations of a write. From
section 22.3.2 of the latest manual:
***
SDIO/DMA interface: procedure for data transfers between the SDIO and
memory
In the example shown, the transfer is from the SDIO host controller to an
MMC (512 bytes using CMD24 (WRITE_BLOCK). The SDIO FIFO is filled by
data stored in a memory using the DMA controller.
1. Do the card identification process
2. Increase the SDIO_CK frequency
3. Select the card by sending CMD7
4. Configure the DMA2 as follows:
a) Enable DMA2 controller and clear any pending interrupts
b) Program the DMA2_Channel4 source address register with the memory
location’s base address and DMA2_Channel4 destination address register
with the SDIO_FIFO register address
c) Program DMA2_Channel4 control register (memory increment, not
peripheral increment, peripheral and source width is word size)
d) Enable DMA2_Channel4
5. Send CMD24 (WRITE_BLOCK) as follows:
a) Program the SDIO data length register (SDIO data timer register
should be already programmed before the card identification process)
b) Program the SDIO argument register with the address location of the
card where data is to be transferred
c) Program the SDIO command register: CmdIndex with 24
(WRITE_BLOCK); WaitResp with ‘1’ (SDIO card host waits for a
response); CPSMEN with ‘1’ (SDIO card host enabled to send a
command). Other fields are at their reset value.
d) Wait for SDIO_STA[6] = CMDREND interrupt, then program the SDIO
data control register: DTEN with ‘1’ (SDIO card host enabled to send
data); DTDIR with ‘0’ (from controller to card); DTMODE with ‘0’ (block
data transfer); DMAEN with ‘1’ (DMA enabled); DBLOCKSIZE with 0x9
(512 bytes). Other fields are don’t care.
e) Wait for SDIO_STA[10] = DBCKEND
6. Check that no channels are still enabled by polling the DMA Enabled
Channel Status register.
***
So I guess the right order is 3, 1, 2 indeed. I’ll try it again and report on
my findings.
7. Andy
02|Aug|2012
I appreciate the creation of this blog.
I’ve ported ST’s mass storage example to an STM32F103VET-based dev/eval
board from China.
It took a while, so here are a few caveats for those who would like to
repeat the exercise:
- Unzip recent and well-paired versions (distributed as one zip file) of
CMSIS, STM32F10x_StdPeriph_Driver, and STM32_USB-FS-Device_Driver
from ST’s webpage.
- Fix a few trivial C compilation bugs to make it compile with gcc.
- My board has a built-in SD slot connected to the SDIO. That’s why I’ve put
-DUSE_STM3210E_EVAL -DSTM32F10X_HD -DUSE_STDPERIPH_DRIVER
into my project’s main makefile.
- Disable the card detect pin: get rid of the code around SD_DETECT_PIN and
make uint8_t SD_Detect(void) always return SD_PRESENT.
- Check that interrupts are still working: for example, get SysTick running
and place a non-zero size ISR stack outside the main stack in the linker
script (stm32f10x_flash_hd.ld in my case).
- Connect the interrupt service routines into your startup code – these
names were defined in the startup_stm32f10x_hd.S file in my case.
- Remove the code for LM75 / keys / LEDs / joy.
- Remove the code around FSMC and MAL_Init(1) (no such device on my
board, just one SD card connected to the SDIO, so MAL_Init(0) = one USB
device suffices).
- I implemented many robustness fixes from this thread, but it still did not
really work.
- The block size reported/read from the 2 GB SD card as 1024 B (and the time
spent trying to correct it) was not really causing an issue (I finally went
back to the original max of 512 B).
It was kind of working (USB device found under Windows), but no
operations were possible. The external disk was showing up, but generally
neither the label nor read operations worked. I saw the disk label
recognized only 2-3 times in 2 days. Solution: my CPU was running at the
maximum allowed speed of 72 MHz. After making the change below, I can see
the USB device/disk, and file operations work nicely :)
/**
* @brief SDIO Data Transfer Frequency (25MHz max)
*/
//#define SDIO_TRANSFER_CLK_DIV ((uint8_t)0x01)
#define SDIO_TRANSFER_CLK_DIV ((uint8_t)0x04)
Thanks again for creating the lib and the post (giving good feedback, so
fixes can be pulled into recent versions of the lib by its authors).
8. Root
03|Oct|2012
Hi. Thanks for the informative post.
Concerning the TXACT flag, I believe I’ve discovered that this flag actually
seems to indicate when the chip is done writing its data and is ready to
receive more. Have you or anyone else seen this or been able to confirm?
I am trying to find a way to trigger an interrupt when the DAT0 line
comes up after a write, but have not found a way yet. Catching the
interrupt off of the TXACT flag seems to only generate an IRQ when
TXACT is active, and not when it becomes inactive, which is what I want.
The idea of tying DAT0 to another pin to use as an input could, however,
solve this problem. Has anyone tried this that you know of?
Thanks again.
A very significant limitation with the STM32F4xx family (STM32F405 / 407 / 415 / 417) is that fully a
third of its internal RAM is inaccessible to the DMA controller. Of the 192 kB of available RAM, only 128
kB can be accessed by the DMA. The other 64 kB, known as the CCM, cannot be read or written by DMA.
For a Cortex-M4 processor that is promoted using DSP type benchmarks (filters and FFTs etc), this is a
glaring oversight. DSP type operations are all about reading data in, processing the data, and writing the
resultant data out. Two of those three tasks require the DMA if they’re to be performed efficiently, and
on the STM32F4xx family the DMA is unusable for a third of its RAM. For me personally, coming from a
long DSP background, this stilted memory architecture is crazy beyond words.
Still, it’s not the first time the hardware designers have made life tough for the software folks, and it
won’t be the last. We just have to deal with it as best we can. I’ve been attempting to get the SDIO SD
Card interface working under interrupt, so that the additional 64 kB of RAM we’re paying for can be
accessed by the SDIO. This post will share a few things I’ve learned.
ST SD Card Interrupt Examples
As far as I can find, there aren’t any. I’ve looked through both the STM32F2xx and STM32F4xx software
examples, and it all uses DMA exclusively for the data handling. If you come across any ST example
code doing SD card data handling via interrupt, please let me know.
Double-Handling the Data
This is an option, and I have considered it. The idea would be (as an example):
1. DMA data from SD card into the 128 kB of RAM
2. Software copy the data into the 64 kB of RAM
3. Process the data
4. Software copy the results from the 64 kB of RAM into the 128 kB of RAM
5. DMA the results from the 128 kB of RAM to the SD card
Obviously what I’ve listed is worst-case and ugly as sin. You really wouldn’t want to do it. Still, if you
did, an efficient software-copy routine would be essential. This Stellaris forum posting contains details
for a fast assembler Cortex-M3/M4 memory copy routine. I’ve played with it and it works well.
SDIO Requests More Data Than It Needs
If you’re using the STM32F2xx / STM32F4xx SDIO to transmit data to an SD Card, under interrupt you’ll
probably be using the “transmit FIFO half empty” TXFIFOHE interrupt flag. When this triggers, you know
your interrupt handler software needs to write 8 words (32 bytes) to the SDIO FIFO.
The problem is that the SDIO will request more data than what it actually requires, which could, if you’re
not careful, result in you reading past the end of your data buffer, possibly generating some kind of a
bus fault or hard fault. To explain, take a look at this example code snippet from within an SDIO
interrupt handler:
if (SDIO->STA & SDIO_FLAG_TXFIFOHE) {
    ptr = source_addr;          // address of source data to Tx to card
    while (SDIO->STA & SDIO_FLAG_TXFIFOHE) {
        BUTTON_OUT_HIGH
        SDIO->FIFO = *ptr++;    // write first word (32 bits = 4 bytes) to the FIFO
        SDIO->FIFO = *ptr++;
        SDIO->FIFO = *ptr++;
        SDIO->FIFO = *ptr++;
        SDIO->FIFO = *ptr++;
        SDIO->FIFO = *ptr++;
        SDIO->FIFO = *ptr++;
        SDIO->FIFO = *ptr++;    // 8th word of data written to the SDIO FIFO
        BUTTON_OUT_LOW
    }
    source_addr = ptr;          // remember data position for next time
}
You can see it’s checking to see if the Tx FIFO Half-Empty flag is set, and if so, it writes 8 words (32
bytes) of data to the FIFO, updates its data pointer, and that’s it. We’ve made it slightly more efficient
by wrapping it in the while() loop, so it does it repeatedly until the Tx FIFO is no longer needing more
data – this allows it to more quickly fill the FIFO at startup when the FIFO is empty.
The BUTTON_OUT_xxxx sets a GPIO pin so we can see on the oscilloscope what’s happening.
When writing a single block / sector to the SD card, which is 512 bytes, we would expect to see 512 / 32
= 16 writes (of 32 bytes) to the FIFO. Let’s look at the scope:
There are a few things of great interest to be seen here.
At the start of the scope plot, on the left, we can see 4 writes in very quick succession. This is thanks to
the while() loop in the code. The SDIO Tx FIFO is 32 words deep, so the TXFIFOHE remains set until the
FIFO is full, which requires 4 sets of 8 words to be written. This is good – we’re getting the Tx FIFO filled
very quickly.
If we count the total number of writes on the scope plot, we see 19. Huh? We expected to see 16; what
gives? 19 means we’ve read 608 bytes from our data buffer (actually: right past the end of our data
buffer) and given it to the SDIO; that’s too much for a 512 byte write. The reason is the title of this
section: the SDIO requests more data than it needs. It appears the designers of the SDIO block did not
give it the intelligence to compare its FIFO level with its DCOUNT register. If the FIFO contains sufficient
empty space to accept another 8 words, it will set its TXFIFOHE flag to request more data, EVEN
THOUGH IT DOES NOT NEED IT TO COMPLETE THE CURRENT TRANSFER. Be aware of this.
Changing our SDIO IRQ handler slightly to consider the DCOUNT register, for example like this:
if ((SDIO->STA & SDIO_FLAG_TXFIFOHE) && (SDIO->DCOUNT >= 32)) {
does not help, because we cannot know the amount of data currently held in the FIFO.
To deal with this, you need to keep your own “data remaining count” variable, which you can count
down as you give data to the SDIO FIFO. Then when your count variable reaches zero, you should turn
off the TXFIFOHE interrupt (by clearing its bit in the SDIO->MASK register).
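A sketch of that bookkeeping is shown below. The state struct and function names are hypothetical; `fifo` stands for the address of the SDIO FIFO register, and `bursts_for_one_block()` is a host-side check that substitutes a plain variable for that register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TX_BURST_WORDS 8u  /* words written per TXFIFOHE event */

typedef struct {
    const uint32_t *next;  /* next word to send */
    uint32_t words_left;   /* our own "data remaining" count */
} sdio_tx_state_t;

/* Feed one burst into the FIFO register. Returns true when nothing is
 * left, which is the cue to clear the TXFIFOHE bit in the MASK register. */
static bool sdio_tx_feed(volatile uint32_t *fifo, sdio_tx_state_t *st)
{
    assert(fifo != 0 && st != 0);
    uint32_t n = st->words_left < TX_BURST_WORDS ? st->words_left
                                                 : TX_BURST_WORDS;
    for (uint32_t i = 0; i < n; i++) {
        *fifo = *st->next++;  /* never reads past the end of the buffer */
    }
    st->words_left -= n;
    return st->words_left == 0;
}

/* Host-side check: count the bursts needed for one 512-byte block. */
static uint32_t bursts_for_one_block(void)
{
    static const uint32_t block[128] = {0};
    volatile uint32_t fifo = 0;  /* stands in for the SDIO FIFO register */
    sdio_tx_state_t st = { block, 128u };
    uint32_t bursts = 0;
    bool done = false;
    while (!done) {
        done = sdio_tx_feed(&fifo, &st);
        bursts++;
    }
    return bursts;
}
```

In the interrupt handler you would call `sdio_tx_feed()` while TXFIFOHE is set and the function has not yet returned true, then mask the interrupt.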
Something else to note from this scope capture is the interrupt rate and CPU utilisation. In this example
the SDIO clock is 20 MHz, meaning we can write data to the card at 10 MB/s. Given that we’re writing
32 bytes at a time (except at the very beginning where we write 4 times that), we calculate we’re
writing data every 3.2 microseconds. The scope shot bears this out. This corresponds to an interrupt
rate of 312.5 kHz! This is a very high rate for a small processor, and the CPU utilisation should be
expected to be high. From the scope shot we can estimate we’re spending about 12% – 15% of our 120
MHz processor doing nothing except servicing these SDIO interrupts. It’s a steep price to pay for making
so much RAM inaccessible to the DMA.
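The arithmetic behind those figures can be checked with a few lines of integer maths. This is only a sketch: the helper names are made up for illustration, and it assumes the 4-bit bus, the 32 byte burst and the 120 MHz core clock described above.

```c
#include <assert.h>
#include <stdint.h>

/* Raw bus throughput: one bit per data line per SDIO clock. */
uint64_t sdio_bytes_per_second(uint64_t sdio_clock_hz, unsigned bus_width_bits)
{
    return sdio_clock_hz * bus_width_bits / 8u;
}

/* One TXFIFOHE interrupt is needed for every burst handed to the FIFO. */
uint64_t txfifohe_irq_rate_hz(uint64_t bytes_per_second, unsigned bytes_per_irq)
{
    assert(bytes_per_irq != 0u);
    return bytes_per_second / bytes_per_irq;
}

/* CPU cycles available between two consecutive interrupts. */
uint64_t cpu_cycles_per_irq_period(uint64_t cpu_clock_hz, uint64_t irq_rate_hz)
{
    assert(irq_rate_hz != 0u);
    return cpu_clock_hz / irq_rate_hz;
}
```

With a 20 MHz SDIO clock on 4 data lines this gives 10,000,000 bytes per second, 312,500 interrupts per second and 384 CPU cycles between interrupts at 120 MHz. The 12% to 15% estimated from the scope shot then works out to very roughly 46 to 58 cycles spent in each interrupt.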
Tx FIFO Underrun
Getting data transmit (sending data to the card) to start up properly on the STM32F4xx / 2xx can be very
tricky. Here’s my understanding.
When you enable the SDIO (via the DTEN bit in the SDIO_DCTRL register) the FIFO is empty. So the
TXFIFOHE interrupt will trigger immediately, and at the same time the SDIO peripheral will start
attempting to write data to the SD card. Hence data must appear in the Tx FIFO extremely quickly,
otherwise a Tx FIFO underrun will occur and the SDIO peripheral will shut down.
It is not possible to pre-load the FIFO before enabling the SDIO. I’ve tried and it doesn’t work. I believe
the FIFO is hardware-cleared until the SDIO is enabled, or something similar to that.
What this means is that at the moment of SDIO turn-on (when the DTEN bit is set), that TXFIFOHE
interrupt must trigger. At that point in time it must be the highest priority interrupt in the system, or be
the only interrupt. If it’s delayed for any reason, for example because another interrupt occurs at that
time, then a Tx FIFO underrun will very quickly follow. Think very carefully about your enabled
interrupts at that critical SDIO transmit start-up point. You may want to consider using the NVIC to
make the SDIO be the highest priority interrupt, permitted to preempt all other interrupts. Or, come up
with some other scheme to ensure that first TXFIFOHE interrupt can execute immediately.
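One way to set that up with the CMSIS NVIC helpers is sketched below. The function name sdio_irq_make_most_urgent() and the SDIO_SKETCH_ON_TARGET switch are made up for illustration, and the off-target stand-ins (including the IRQ number) are assumptions that exist only so the sketch compiles on a PC. On real hardware the ST device header supplies NVIC_SetPriority(), NVIC_EnableIRQ() and SDIO_IRQn.

```c
#include <stdint.h>

#ifdef SDIO_SKETCH_ON_TARGET
#include "stm32f2xx.h"   /* real CMSIS definitions from the ST device header */
#else
/* Hypothetical stand-ins so the sketch builds off-target. */
typedef enum { SDIO_IRQn = 49 } IRQn_Type;   /* assumed IRQ number */
static uint32_t fake_nvic_priority[64];
static uint8_t  fake_nvic_enabled[64];

static void NVIC_SetPriority(IRQn_Type irq, uint32_t priority)
{
    fake_nvic_priority[irq] = priority;
}

static void NVIC_EnableIRQ(IRQn_Type irq)
{
    fake_nvic_enabled[irq] = 1u;
}
#endif

/* Give the SDIO interrupt the most urgent priority before DTEN is set.
   On Cortex-M, numerically lower values are more urgent, so 0 is the top. */
void sdio_irq_make_most_urgent(void)
{
    NVIC_SetPriority(SDIO_IRQn, 0u);
    NVIC_EnableIRQ(SDIO_IRQn);
}
```

This only helps if every other enabled interrupt has a numerically higher preemption priority. Whether the SDIO handler can actually preempt them also depends on how the priority grouping splits the bits between preemption priority and sub-priority.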
SDIOIT Status Bit
The SDIO_STA status register contains the SDIOIT bit with a very vague description. I’ve seen this bit
being set from time to time but I’ve never worked out what it means. If you understand what it actually
represents, please let me know.

6 RESPONSES
1. Stefan
09|Jun|2012
Very useful, explained in great detail. Thanks a lot! Out of curiosity: Did you try to operate the SD
interface at more than 25MHz? 48MHz should be possible? What was the best write speed you could
manage?
2. frank
11|Jun|2012
I’ve not personally run faster than 20 MHz, due to the fact that my hardware has a level converter (1.8V
– 3.3V) between the STM32 and the SD Card. Given that, the fastest write speed I’ve achieved has been
9 MB/s, which actually isn’t too bad (I’ll admit it, I was happy). I’ve no doubt faster speeds are possible,
but you’d certainly need to run the STM32 at 3.3V so you can avoid a level converter to make those
faster SD Card clock speeds possible.
3. Tobias
15|Jun|2012
Remember to align your SD card (if you use a FS) and write only data blocks that are a multiple of the SD erase sector size. You will then easily get a faster write speed.
4. Christopher James Huff
18|Sep|2012
“For a Cortex-M4 processor that is promoted using DSP type benchmarks (filters and FFTs etc), this is a
glaring oversight.”
It’s not an oversight, it’s a feature, one that’s actually oriented toward DMA-heavy DSP applications. The
CCM can be accessed by the core without competing with peripherals for the AHB bus. You can DMA to a
buffer in main memory, use CCM for intermediate values, and store the result in main memory for DMA
output, with the core only hitting main memory when it needs to load new data or store the results of
computations.
In fact, the main memory is split into 112 and 16 kB blocks with separate connections to the AHB
matrix, so if you arrange things carefully, you can have two simultaneous DMA operations going while
the core is happily crunching data, without any contention for memory accesses.
Not having to wait for the AHB to become available can also be important for some hard-realtime tasks.
The CCM is always immediately available.
5. frank
18|Sep|2012
Your comment is very valid and I’m glad you posted it. I personally believe (based on my experience)
that the size of the CCM (as a percentage of the total memory of the processor) is far too large for
“intermediate values”. Others may disagree with me, and I certainly hope they do.
6. Christopher James Huff
22|Sep|2012
Well, if you’re devoting the bulk of your memory to one input buffer, one output buffer, and one working
buffer, all of equal size, then it’s pretty close to ideal. CCM can also be a good place to put stacks, task
state, application variables, etc. I’ve been using it as main memory, with the system memory set aside
for DMA buffers. Some more flexibility would be nice, but it seems a reasonable compromise.
If you can execute code from CCM, that’d be another possible application for hard-realtime tasks, due to
more deterministic execution speed…no waiting for code to load from flash or for a peripheral to finish
up with the system SRAM. I’m not sure if this can actually be done, though…the AHB diagram seems to
indicate that the core’s instruction bus can only be connected to the flash, the 112 kB block of system
SRAM, or the FSMC. (An aside: the 16 kB block of system RAM appears only accessible by the core via
the system bus, which might make it a particularly bad location for a stack…which happens to be a setup
I’ve seen often in online examples.)