metamerist

Wednesday, September 28, 2005

Parallel isdigit( )

This is a C/C++ topic, dear reader, so if that's not your thing, you may want to ditch now. (I really need to move to Movable Type or TypePad, or Blogger needs to get category support.)

I've been playing around with some parsing in my free time. I'm not sure what's going on, but I've found Microsoft's string-to-float conversions (atof, strtod, fscanf, etc.) abysmally slow, at least on my system with my version of their C++ compiler.

In the process of investigation, the thought crossed my mind of creating my own implementation of isdigit( ). I decided to do a little more work than isdigit( ) with a function I titled CharToDec( ). It returns the values 0-9 for ASCII chars '0'-'9' and 0xFF otherwise.

The numeric value for ASCII '0' is 0x30, '1' is 0x31, '2' is 0x32 and so on.

Here's my approach...

1. Check (char AND 0xF0) for a match with 0x30. If no match, return 0xFF.
2. AND char with 0x0F. At this point, the possible values are 0-15.
3. Check ((char+6) AND 0xF0) to see if the sum is greater than 15, which happens exactly when char is greater than 9. If so, return 0xFF.
4. Return char.

int CharToDec(int ch)
{
if ((ch & 0xf0)!=0x30) return 0xff;

ch &= 0x0f;

if ((ch+6) & 0xf0) return 0xff;

return ch;
}
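One way to gain some confidence in the nibble trick is to compare it against a plain range check for every possible byte value. The sketch below does that; RangeCheckToDec( ) and CheckAllBytes( ) are hypothetical names made up for this illustration, and the character constants assume an ASCII-compatible character set.

```c
#include <assert.h>

/* The function from the post, unchanged apart from indentation. */
int CharToDec(int ch)
{
    if ((ch & 0xf0) != 0x30) return 0xff;

    ch &= 0x0f;

    if ((ch + 6) & 0xf0) return 0xff;

    return ch;
}

/* Hypothetical reference implementation: a plain range check. */
int RangeCheckToDec(int ch)
{
    return (ch >= '0' && ch <= '9') ? ch - '0' : 0xff;
}

/* Asserts that both functions agree on every byte value 0-255. */
void CheckAllBytes(void)
{
    int ch;
    for (ch = 0; ch < 256; ++ch)
        assert(CharToDec(ch) == RangeCheckToDec(ch));
}
```

Calling CheckAllBytes( ) once from a test program is enough, since the input space is only 256 values. Inputs outside 0-255 are not covered, because CharToDec( ) only inspects the low byte.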

CharToDec( ) should seem like a roundabout way of doing it, since all one really needs to do is check whether the value is in the range 0x30 - 0x39 and subtract 0x30. The thing is, this approach is easily parallelized to do multiple chars at once. For example, here's a version that does 8 chars at a time and should work nicely on 64-bit systems. It returns all ones if any of the 8 chars is not a digit. A 32-bit version should be straightforward. I'm not sure if I'll find much use for it, but it could come in handy parsing strictly formatted text files.

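unsigned __int64 CharToDec8X(const char* sz)
{
// Load 8 chars at once. Assumes sz points to at least 8 readable bytes
// and that the platform tolerates an unaligned 64-bit read (x86 does).
unsigned __int64 u = *((const unsigned __int64 *) sz);

// Every high nibble must be 3, i.e. every char is in 0x30-0x3F.
if ((u & 0xf0f0f0f0f0f0f0f0ULL) != 0x3030303030303030ULL) return 0xffffffffffffffffULL;

// Keep the low nibbles: each byte is now 0-15.
u &= 0x0f0f0f0f0f0f0f0fULL;

// Adding 6 carries into the high nibble of any byte greater than 9.
if ((u + 0x0606060606060606ULL) & 0xf0f0f0f0f0f0f0f0ULL) return 0xffffffffffffffffULL;

// One digit value (0-9) per byte, in the same order as in memory.
return u;
}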
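For what it's worth, here is a rough sketch of what the 32-bit version might look like. CharToDec4X( ) and DigitAt( ) are hypothetical names, not something from a library. The sketch uses memcpy instead of a pointer cast to sidestep alignment and aliasing concerns, and it assumes the caller passes at least 4 readable bytes.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 4-char variant: returns one digit value per byte,
   or 0xffffffff if any of the 4 chars is not an ASCII digit. */
uint32_t CharToDec4X(const char* sz)
{
    uint32_t u;
    memcpy(&u, sz, sizeof u);   /* reads exactly 4 bytes from sz */

    /* Every high nibble must be 3 (chars in 0x30-0x3F). */
    if ((u & 0xf0f0f0f0u) != 0x30303030u) return 0xffffffffu;

    u &= 0x0f0f0f0fu;           /* each byte is now 0-15 */

    /* Adding 6 carries into the high nibble of any byte above 9. */
    if ((u + 0x06060606u) & 0xf0f0f0f0u) return 0xffffffffu;

    return u;
}

/* Hypothetical helper: digit value of the i-th input char (i in 0-3),
   read back in memory order so it does not depend on endianness. */
int DigitAt(uint32_t packed, int i)
{
    unsigned char bytes[4];
    memcpy(bytes, &packed, sizeof bytes);
    return bytes[i];
}
```

Since each byte is at most 15 after masking, adding 6 never carries from one byte into the next, which is what lets the per-char test run on all bytes at once. Reading the digits back through DigitAt( ) keeps the caller from having to care whether the first char landed in the low or the high byte.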

Standard disclaimers apply. No guarantees this code is good for anything. Also, I don't normally format my code as listed, but I am having trouble getting Blogger to indent properly.
