Monday, May 10, 2004
LAN91C111 Driver for XScale
Our XScale-based platform uses two LAN91C111 chips for Ethernet. We got the LAN91C111 driver from SMSC, but it doesn't work very well: throughput is low, and sometimes it locks up while transferring data from a network share.
After debugging, we found that the driver gets lots of RCV_OVERRUN_INTs (receive overrun interrupts) while transferring data. That means data is not being moved from the LAN91C111 chip to system memory fast enough.
Some developers have mentioned that using an XScale DMA channel to move data from the LAN91C111 chip to system memory solves this problem, so we tried using DMA for the transfer.
We have now added DMA transfer code to the driver, but so far it does not work. Transferring data from one memory location to another via DMA works fine, but when we transfer data from the LAN91C111 data port to system memory, the data is incorrect. I'll try to figure out why.
Comments:
I had made two mistakes:
1. When setting the DMA source address, I used the starting address of the LAN91C111 chip. The correct value is the chip address plus an offset of 0x300; the LAN91C111 I/O ports are located at offset 0x300 from the chip base.
2. The DMA transfer length was wrong. The original (non-DMA) code is:
NdisRawReadPortBufferUlong(DataPort, (ULONG *)ReadBuffer, iLength);
Here 'iLength' counts ULONGs, so the DMA length in bytes should be 4*iLength.
The DMA now works, but the performance is even worse. I need to do more work on this.
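The two fixes above can be sketched as a pair of small helpers. This is only an illustration: the helper names are made up, and the sketch covers just the 0x300 offset and the ULONG-to-byte conversion described in the post, not the actual DMA descriptor setup.

```c
#include <assert.h>
#include <stdint.h>

/* From the post: the LAN91C111 I/O ports start 0x300 above the chip base. */
#define LAN91C111_IO_OFFSET 0x300u

/* Hypothetical helper: address to use as the start of the I/O window
 * when programming the DMA source. */
static uint32_t lan91c111_dma_source(uint32_t chip_base)
{
    assert(chip_base <= UINT32_MAX - LAN91C111_IO_OFFSET); /* no wrap-around */
    return chip_base + LAN91C111_IO_OFFSET;
}

/* Hypothetical helper: NdisRawReadPortBufferUlong takes a count of
 * 32-bit ULONGs, while the DMA length is expressed in bytes. */
static uint32_t dma_length_bytes(uint32_t ulong_count)
{
    assert(ulong_count <= UINT32_MAX / 4u); /* result must fit in 32 bits */
    return ulong_count * 4u;
}
```

For example, with a made-up chip base of 0x08000000, lan91c111_dma_source() gives 0x08000300, and dma_length_bytes(10) gives 40.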
To measure the I/O performance of the LAN91C111 chip, I wrapped the NdisRawReadPortBufferUlong and NdisRawWritePortBufferUlong calls with GPIO toggles. My code looks like this:
For Reading:
while(1)
{
    v_pGPIOReg->GPSR_x |= GPIO_28;   // set GPIO 28
    NdisRawReadPortBufferUlong(DataPort, (ULONG *)ReadBuffer, 10);   // read 10 ULONGs (40 bytes)
    v_pGPIOReg->GPCR_x |= GPIO_28;   // clear GPIO 28
    NdisRawReadPortBufferUlong(DataPort, (ULONG *)ReadBuffer, 10);
}
For Writing:
while(1)
{
    v_pGPIOReg->GPSR_x |= GPIO_28;   // set GPIO 28
    NdisRawWritePortBufferUlong(IOBase + BANK2_DATA1, Adapter->TxBuffer, 10);   // write 10 ULONGs (40 bytes)
    v_pGPIOReg->GPCR_x |= GPIO_28;   // clear GPIO 28
    NdisRawWritePortBufferUlong(IOBase + BANK2_DATA1, Adapter->TxBuffer, 10);
}
Here are the test results:
Function call ----- Data transferred ----- Time used (microseconds)
NdisRawReadPortBufferUlong ----- 40 bytes ----- 9.7
NdisRawWritePortBufferUlong ----- 40 bytes ----- 3.5
From these results, my calculation is:
Maximum read speed = 3.92 MB/s
Maximum write speed = 10.89 MB/s
The read/write speed looks fast enough, so the problem is not the I/O access speed. It is more likely related to the driver structure or to a chip limitation (someone mentioned that the LAN91C111 internal buffer can hold at most 4 packets, which may be the main hardware reason for the receive overruns).
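The speed figures can be reproduced from the measured times with a short calculation. The sketch below assumes that "MB" in the post means 1,048,576 bytes, which is my guess from the numbers rather than something the post states; the function name is made up.

```c
#include <assert.h>

/* Throughput in MB/s (1 MB = 1024 * 1024 bytes, an assumption)
 * for `bytes` bytes moved in `microseconds` microseconds. */
static double throughput_mb_per_s(double bytes, double microseconds)
{
    assert(microseconds > 0.0);                 /* avoid division by zero */
    double bytes_per_s = bytes / (microseconds * 1e-6);
    return bytes_per_s / (1024.0 * 1024.0);
}
```

With the measured values, throughput_mb_per_s(40, 9.7) comes out at roughly 3.93 and throughput_mb_per_s(40, 3.5) at roughly 10.90, which is close to the figures quoted above.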
The original driver developer from SMSC offered the following suggestions regarding the performance issue:
Make sure that the CPU/system has enough performance even under busy(!) conditions. When a bigger SW stack is involved (using network shared folders is another SW layer over TCP/IP), the performance goes down dramatically. For testing the pure driver performance the system should be as lean as possible, not a full-blown one (e.g. Firewall, Bridging, IPv6, ICS, several servers... removed). Especially when it comes to networking features, only TCP/IP should be running, with at most the FTP server. Is the system interrupt latency o.k., or do interrupts have to wait too long until serviced? Then RCV_OVRN interrupts occur as well. These are overall system performance related issues.
What about the media used (coax/twisted-pair) and half duplex/full duplex? Assuming twisted-pair with full duplex: was the initialization done correctly? When using full duplex with twisted-pair, the SWFDPLX bit in TCR must be set. When playing with the registry settings this might go wrong. Make sure the registry settings are all default as described in the readme.pdf of the driver. The register settings after initialization should be checked anyway. You can use the DumpRegister function included in the driver.
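Checking the full-duplex bit in a dumped TCR value could look like the sketch below. The bit position (bit 15, 0x8000) is my reading of the LAN91C111 datasheet, where the bit is called SWFDUP, and should be verified against the driver headers; the macro and function names here are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed position of the switched-full-duplex bit in TCR (bit 15).
 * Verify against the datasheet and the driver's own definitions. */
#define TCR_SWFDUP 0x8000u

/* Returns nonzero if the dumped TCR value has full duplex enabled. */
static int tcr_switched_full_duplex(uint16_t tcr)
{
    return (tcr & TCR_SWFDUP) != 0;
}

/* Example check on made-up register values. */
static void demo_tcr_check(void)
{
    assert(tcr_switched_full_duplex(0x8001u));   /* bit 15 set */
    assert(!tcr_switched_full_duplex(0x0001u));  /* bit 15 clear */
}
```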
The pure 91c111 HW chip access (~4/11 MB/s for read/write) doesn't seem too bad. Actually this is pretty good. But don't forget, this is just when one interrupt is serviced, and only for 10 32bit-accesses.
SW related.
This all relates to driver version 2.0 - the latest, available on the Web site.
If the Auto-Release feature is used, try to switch it off. Not sure if this has ever been tested sufficiently. The feature is controlled by the SMSC_AUTO_RELEASE #define.
Furthermore a few source code changes are recommended due to improvements or bugs:
file LAN91C111_Intr.C
In function MiniportHandleInterrupt the order of the processed interrupts can be changed. If the TX_Interrupt is moved to the beginning of the interrupt handler list (before the RCV_Interrupt_Handler, which is first now), TX interrupts get a higher priority than RCV interrupts, and therefore memory from the (already) sent packets is released faster and becomes available for the receive process again.
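The effect of the reordering can be illustrated with a toy dispatcher. This is not the driver's code: the bit values, type names and the log array are hypothetical, and the sketch only shows that TX is serviced before RCV when both are pending.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status bits, for illustration only. */
enum { INT_RCV = 0x01, INT_TX = 0x02 };

typedef enum { SERVICED_TX, SERVICED_RCV } Serviced;

/* Records the order in which pending interrupts would be serviced.
 * TX is checked first so that pages of already-sent packets are
 * released before the receive path runs. Returns the entry count. */
static size_t dispatch_interrupts(unsigned status, Serviced *log, size_t cap)
{
    size_t n = 0;
    assert(log != NULL);
    if ((status & INT_TX) && n < cap)
        log[n++] = SERVICED_TX;
    if ((status & INT_RCV) && n < cap)
        log[n++] = SERVICED_RCV;
    return n;
}
```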
file LAN91C111_Intr.C
In function AllocIntEnabler:
OldIntReg |= INT_ALLOC << 8; //Mask bits in high byte
Add '<<8' to this line. If the bit is not shifted to the high byte, the ALLOC_INT interrupt is never enabled.
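Why the shift matters can be shown with a small stand-alone sketch. It assumes the 16-bit interrupt register keeps status bits in the low byte and mask (enable) bits in the high byte, as the comment in the fix says; the value 0x08 for INT_ALLOC and the function names are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define INT_ALLOC 0x08u  /* assumed bit value; check the driver header */

/* Fixed version: sets the ALLOC bit in the mask (high) byte. */
static uint16_t enable_alloc_int(uint16_t old_int_reg)
{
    return (uint16_t)(old_int_reg | (INT_ALLOC << 8));
}

/* Version without the shift: only touches the low byte. */
static uint16_t enable_alloc_int_unshifted(uint16_t old_int_reg)
{
    return (uint16_t)(old_int_reg | INT_ALLOC);
}

/* Nonzero if the ALLOC interrupt is enabled in the mask byte. */
static int alloc_int_enabled(uint16_t int_reg)
{
    return ((int_reg >> 8) & INT_ALLOC) != 0;
}

/* Example: only the shifted version enables the interrupt. */
static void demo_alloc_mask(void)
{
    assert(alloc_int_enabled(enable_alloc_int(0)));
    assert(!alloc_int_enabled(enable_alloc_int_unshifted(0)));
}
```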
file LAN91C111_Miniport.C
In MiniportReset:
NdisRawReadPortUshort(IOBase + BANK2_MMU_CMD, (PUSHORT) &TempWord);
The address operator is missing for the TempWord variable in the while loop. It must be the same as two lines before, where the address operator is not missing.
The last one changes the behaviour of the MiniportCheckforHang function and affects several files (Adapter.h, Miniport.c, Init.c, Intr.c). This should avoid a few unnecessary driver resets.
1) The function now uses an additional global variable Adapter.TXCounter. This is to determine better when the send process is currently not using any memory pages. (test for Adapter->TXCounter == 0)
The variable TXCounter needs to be defined in the header file, initialized, and maintained (incremented/decremented).
Definition:
In file Adapter.h, global structure MINIPORT_ADAPTER.
USHORT TXCounter; //Counter for packets currently being processed
Initialization:
In file Init.c, function AdapterReset, and in Miniport.c, function MiniportReset. It is best placed after the queues are initialized, at the same location in both files.
//Initialize Queues.
ClearPacketQue(Adapter->AckPending);
ClearPacketQue(Adapter->AllocPending);
Adapter->AllocIntPending = FALSE;
Adapter->TXCounter = 0;
Maintenance:
Incremented
In file Miniport.c function AdapterAllocBuffer right after the MMU allocation command.
NdisRawWritePortUshort(IOBase + BANK2_MMU_CMD,(USHORT)(CMD_ALLOC));
Adapter->TXCounter++; //Adjust number of TX packets being processed
Decremented
In file Intr.c, in functions ALLOC_Interrupt_Handler, TX_Interrupt_Handler and EPH_Interrupt_Handler (3 occurrences), always right after the MMU release command.
NdisRawWritePortUshort(IOBase + BANK2_MMU_CMD,(USHORT) CMD_REL_SPEC);
Adapter->TXCounter--; //Adjust number of TX packets being processed
The receive process is checked as before. (test for FIFO & 0x8000)
2) When the send and receive process are currently not using any memory pages, all memory must still be available. (no memory leakage) This is tested with the MIR register. But now, MIR is only checked for less than MEM_PAGES minus 1 instead of less than MEM_PAGES. 'Minus 1', because one packet might already be on the way of being received and a page already allocated by the MMU. MEM_PAGES must still be defined in the Header file with '#define MEM_PAGES 4'.
Now the CheckforHang function should be changed to this one:
BOOLEAN LAN91C111_MiniportCheckforHang (NDIS_HANDLE AdapterContext) {
    MINIPORT_ADAPTER *Adapter = (MINIPORT_ADAPTER*)AdapterContext;
    USHORT FIFO, MIR;
    BOOLEAN RetVal = FALSE;
    NdisRawWritePortUshort(Adapter->IOBase + BANK_SELECT, 0);
    NdisRawReadPortUshort(Adapter->IOBase + BANK0_MIR, &MIR);
    NdisRawWritePortUshort(Adapter->IOBase + BANK_SELECT, 2);
    NdisRawReadPortUshort(Adapter->IOBase + BANK2_FIFOS, &FIFO);
    //check memory allocation for TX and RX process
    if ((Adapter->TXCounter == 0) && (FIFO & 0x8000))
    {
        //If Nothing in TX and RX then all memory (or -1 page) should be available!
        //(-1, because one packet might already be on the way of being received and a page already allocated)
        if ((MIR >> 8) < MEM_PAGES-1)
            RetVal = TRUE; //Reset will occur!
    }
    return RetVal;
}
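The decision logic of the new CheckforHang, together with the TXCounter bookkeeping, can be tried out without hardware using a sketch like the one below. The struct and function names are made up, the register values are passed in instead of being read from the chip, and the underflow guard on the decrement is my addition, not part of the SMSC suggestion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MEM_PAGES     4        /* as given in the SMSC note */
#define FIFO_RX_EMPTY 0x8000u  /* the "FIFO & 0x8000" test from the note */

typedef struct { uint16_t tx_counter; } TxState;  /* stands in for Adapter->TXCounter */

/* Call right after the MMU allocation command. */
static void on_tx_alloc(TxState *s) { s->tx_counter++; }

/* Call right after the MMU release command. */
static void on_tx_release(TxState *s)
{
    assert(s->tx_counter > 0);  /* added guard against underflow */
    s->tx_counter--;
}

/* True if a reset should be requested: TX and RX are both idle, yet
 * fewer than MEM_PAGES - 1 pages are free (high byte of MIR). */
static bool should_reset(const TxState *s, uint16_t fifo, uint16_t mir)
{
    bool idle = (s->tx_counter == 0) && ((fifo & FIFO_RX_EMPTY) != 0);
    return idle && ((mir >> 8) < MEM_PAGES - 1);
}
```

With a packet still in flight (counter at 1) the check never requests a reset; once the counter is back at 0 and the RX FIFO is empty, a MIR free-page count of 2 triggers a reset while 3 or 4 does not.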
At first, I applied all of the SW-related changes suggested by SMSC except the SMSC_AUTO_RELEASE definition.
Testing showed no difference after those changes.
Later, I turned on the Auto-Release feature (by default, SMSC_AUTO_RELEASE is not defined in the driver).
That seems to help a lot. With Auto-Release turned on, I seldom see the adapter reset while receiving data, though I still get lots of receive overruns.
Now the transfer speed is 614/350 Bps (sending/receiving) when transferring a 30MB file to/from a network share folder.
That is slightly better than our StrongARM+LAN91C96 system, whose speed is 593/296 Bps (sending/receiving) for the same test.