I recently wrote an algorithm that extracts (or at least tries to extract) a given number of bits, X, from a buffer, but for some reason it doesn't work reliably. For example, it extracts the bits correctly the first time I run it, but not on later calls. I suspect the problem is in the way I store the bits in my buffer.

For reference, here are my type definitions:

```c
#define S8  signed char
#define S16 signed short
#define S32 signed int
#define U8  unsigned char
#define U16 unsigned short
#define U32 unsigned int
```
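As a side note, `unsigned int` is only guaranteed to be at least 16 bits wide, while shifts such as `32 - m_U32BitBufferSize` assume `U32` is exactly 32 bits. A minimal sketch of the same aliases declared with `typedef` and the exact-width types from `<stdint.h>` might look like this. It assumes a C11 compiler (for `_Static_assert`) and a platform that provides the exact-width types, which the standard makes optional.

```c
#include <stdint.h>

/* Same names as the #defines above, but with guaranteed widths. */
typedef int8_t   S8;
typedef int16_t  S16;
typedef int32_t  S32;
typedef uint8_t  U8;
typedef uint16_t U16;
typedef uint32_t U32;

/* Compile-time check that the all-ones U32 value has exactly 32 bits. */
_Static_assert((U32)-1 == 0xFFFFFFFFu, "U32 must be 32 bits wide");
```

Using `typedef` instead of `#define` also makes a declaration such as `U8 *a, *b;` behave like any other type name and keeps the aliases visible to the debugger.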
Here is my algorithm, expressed in C:
```c
U32 U32SomeFunction(U16 U16Bits)
{
    U32 U32Return;

    S32 S32BitsNeeded = U16Bits - m_U32BitBufferSize;

    if(S32BitsNeeded > 0) /* Need to append more bits to buffer */
    {
        m_U32BitBufferSize += 8; /* Increase buffer size */
        m_U32BitBuffer <<= 8; /* Shift buffer over one byte */
        m_U32BitBuffer |= U8GetByte(); /* append byte to buffer */
    }
    else /* Have enough bits in buffer, return bits */
    {
        U32Return = m_U32BitBuffer >> (m_U32BitBufferSize - U16Bits); /* Save bits for return */
        m_U32BitBufferSize -= U16Bits; /* Save new buffer size */
        m_U32BitBuffer << (32 - m_U32BitBufferSize); /* Get rid of old bits */
        m_U32BitBuffer >> (32 - m_U32BitBufferSize); /* Shift back right, padding 0's to the left */
        return U32Return;
    }
}
```
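Two details in the code above match the "works the first time only" symptom. First, `m_U32BitBuffer << (32 - m_U32BitBufferSize);` and the `>>` line after it compute a shifted value and throw it away. Only a compound assignment (`<<=`, `>>=`) changes the variable, so the bits already returned are never cleared and leak into the next result. Second, the refill branch falls off the end of the function without returning a value, which is undefined behavior if the caller uses the result. Separately, when the buffer becomes empty, `32 - m_U32BitBufferSize` is 32, and shifting a 32-bit value by 32 is also undefined in C. Below is a minimal sketch of a reader that avoids all three, assuming most-significant-bit-first order and requests of 1 to 25 bits. `BitReader`, `next_byte`, and `read_bits` are hypothetical names: `next_byte` stands in for `U8GetByte()`, and the struct fields stand in for `m_U32BitBuffer` and `m_U32BitBufferSize`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reader state: stands in for m_U32BitBuffer,
   m_U32BitBufferSize and the byte source behind U8GetByte(). */
typedef struct {
    const uint8_t *data; /* input bytes, consumed most significant bit first */
    size_t len;          /* number of bytes in data */
    size_t pos;          /* index of the next unread byte */
    uint32_t buffer;     /* unread bits live in the low `count` bits; higher bits are 0 */
    unsigned count;      /* number of unread bits in buffer */
} BitReader;

/* Hypothetical replacement for U8GetByte(): returns 0 once the input is exhausted. */
static uint8_t next_byte(BitReader *r)
{
    return (r->pos < r->len) ? r->data[r->pos++] : 0;
}

/* Returns the next `bits` bits (1..25) as an unsigned value, MSB first. */
static uint32_t read_bits(BitReader *r, unsigned bits)
{
    uint32_t result;

    assert(bits >= 1 && bits <= 25);

    /* Refill one byte at a time until enough bits are buffered.
       count < bits <= 25 before each shift, so at most 32 bits are ever held. */
    while (r->count < bits) {
        r->buffer = (r->buffer << 8) | next_byte(r);
        r->count += 8;
    }

    /* The requested bits are the top `bits` of the `count` unread bits. */
    r->count -= bits;
    result = r->buffer >> r->count;

    /* Clear the bits just returned. Note the compound assignment: a bare
       `r->buffer << n;` would compute a value and discard it.
       count <= 31 here, so the mask shift is well defined. */
    r->buffer &= ((uint32_t)1 << r->count) - 1u;

    return result;
}
```

For example, reading 4, 8, and then 12 bits from the bytes `0xAB 0xCD 0xEF` yields `0xA`, `0xBC`, and `0xDEF`. The refill loop runs inside the same call as the extraction, so every call returns a value; the 25-bit limit is what keeps all pending bits inside a 32-bit buffer, and a `uint64_t` buffer would allow larger requests.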
Any help would be great.