repeated subtraction can be exponentiol in hte number of bits used; example:
bigNumber_N_bits / 1
needs 2 exp N subtractions. Each subtraction is again linear in N
So the result will be O(N*expN) (less than perfect)
shifting will be log in each output bit
with potential N output bits you get
O(NlogN) (better and likely optimal in O() notation)
As for the constants (practical programming), doing a full subtraction for each output bit is tough.
You might want to consider working only on a limited window of bits and keeping track of the operations you missed on the data outside of the window.
I don't know if it works at all (just an idea). It certainly won't spare you any of the O() operations,
but might localize data access a bit more and thus run faster. Window size should grow with log of bitnumber N, else the number of operations stored will be linear in N and you don't gain anything.
I hope this helps