Well... obviously detecting 'is an ascii string' is easy, but beyond that it gets a bit curly, doesn't it? If you knew the underlying encoding you could work out whether the text could be represented as ASCII (like what we did for the unicode props in the properties), but without knowing that, how do we go about it? I think I asked virtually the same question a few weeks back about the future unicode plans.
My suggestion above was a little terse (I didn't have much time to post this weekend).
At the moment, all strings in the engine are 'native strings' (sequences of bytes interpreted as text in the native encoding). Because the native encodings map each char to exactly one byte, every string in the engine is also a 'binary string'. This duality works very well, until you want to manipulate text in an encoding larger than the native ones (i.e. one that needs more than one byte per char).
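To illustrate the char/byte duality, here is a small Python sketch. It assumes Latin-1 (ISO-8859-1) stands in for the native encoding, because every byte value 0-255 decodes to exactly one Latin-1 char; the real native encoding varies by platform, and the helper names `as_binary`/`as_text` are hypothetical, not engine code.

```python
NATIVE = "latin-1"  # assumption: Latin-1 stands in for the native encoding


def as_binary(text: str) -> bytes:
    """View native text as raw bytes (one byte per char)."""
    return text.encode(NATIVE)


def as_text(data: bytes) -> str:
    """View raw bytes as native text (every byte is a valid char)."""
    return data.decode(NATIVE)


word = "caf\u00e9"                 # "café"
raw = as_binary(word)
print(len(word), len(raw))         # 4 chars, 4 bytes
print(as_text(raw) == word)        # lossless round trip: True

# Any byte sequence is also valid native text, so binary data survives too.
blob = bytes(range(256))
print(as_binary(as_text(blob)) == blob)  # True

# A wider encoding breaks the 1-1 mapping: UTF-16 uses 2 bytes per char here.
print(len(word.encode("utf-16-le")))     # 8
```

The last line shows the point where the duality stops holding: the same four chars no longer occupy four bytes.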
So, right now, 'is a native string' will always return true for any value that converts to a string, since at the moment every string is native.
Going forward, all strings in the engine will be replaced by an MCStringRef abstraction. This opaque type will be able to hold either a native/binary string or a unicode string. More abstractly, an MCStringRef represents a sequence of characters; from the outside, there's no need to be concerned with its internal representation (or encoding).
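A minimal Python sketch of the idea of an opaque string type might look like this. `StringRef` is a hypothetical stand-in, not the engine's MCStringRef API: it stores native bytes when the text fits the (assumed Latin-1) native encoding, falls back to Unicode otherwise, and hands callers only a sequence of chars.

```python
class StringRef:
    """Hypothetical opaque string type (not the engine's MCStringRef API)."""

    NATIVE = "latin-1"  # assumption: Latin-1 stands in for the native encoding

    def __init__(self, text: str):
        try:
            # Use the compact one-byte-per-char form when it is lossless.
            self._native = text.encode(self.NATIVE)
            self._unicode = None
        except UnicodeEncodeError:
            # Otherwise keep the full Unicode text.
            self._native = None
            self._unicode = text

    def uses_native_storage(self) -> bool:
        # Exposed only so the demo can show which form was chosen.
        return self._native is not None

    def chars(self) -> str:
        # Callers always get a sequence of chars, whatever the storage.
        if self._native is not None:
            return self._native.decode(self.NATIVE)
        return self._unicode

    def __len__(self) -> int:
        return len(self.chars())


word = StringRef("caf\u00e9")      # "café" fits the native encoding
cjk = StringRef("\u65e5\u672c")    # "日本" does not
print(word.uses_native_storage(), len(word))  # True 4
print(cjk.uses_native_storage(), len(cjk))    # False 2
```

Both objects report their length in chars; only the (normally hidden) storage choice differs.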
At that point, 'is a native string' will return false if the text contained in the string cannot be converted losslessly to the native encoding.
In fact, in the future a whole family of 'is a string'-type operators would be useful:
- is a binary string - returns true if the value converts to a string and that string can be converted to binary (i.e. it is natively encoded)
- is a native string - returns true if the value converts to a string and that string can be encoded as native
- is a simple unicode string - returns true if the value converts to a string and that string can be encoded as UTF-16 with no surrogate pairs (i.e. every char is in the Basic Multilingual Plane)
- is a unicode string - returns true if the value converts to a string (any string can be represented as unicode)
- is a string - returns true if the value converts to a string
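The operator family can be sketched in Python as follows. Everything here is hypothetical: the function names, the `to_string` rule (strings and numbers convert; anything else, such as a dict standing in for an array, does not), and the choice of Latin-1 as the native encoding. An `is_ascii_string` check is included for comparison with the ASCII question at the top.

```python
NATIVE = "latin-1"  # assumption: Latin-1 stands in for the native encoding


def to_string(value):
    """Hypothetical conversion rule: strings and numbers convert to a string;
    anything else (e.g. a dict standing in for an array) returns None."""
    if isinstance(value, str):
        return value
    if isinstance(value, (int, float)):
        return str(value)
    return None


def is_string(value) -> bool:
    return to_string(value) is not None


def is_unicode_string(value) -> bool:
    # Any string can be represented in Unicode, so this matches is_string.
    return is_string(value)


def is_simple_unicode_string(value) -> bool:
    # UTF-16 needs a surrogate pair only for code points above U+FFFF.
    s = to_string(value)
    return s is not None and all(ord(ch) <= 0xFFFF for ch in s)


def is_native_string(value) -> bool:
    s = to_string(value)
    if s is None:
        return False
    try:
        s.encode(NATIVE)  # strict mode raises if any char has no native form
    except UnicodeEncodeError:
        return False
    return True


def is_binary_string(value) -> bool:
    # In this simplified model, "natively encoded" and "can be encoded as
    # native" are the same test.
    return is_native_string(value)


def is_ascii_string(value) -> bool:
    s = to_string(value)
    return s is not None and s.isascii()  # str.isascii needs Python 3.7+
```

For example, "café" passes every test except `is_ascii_string`, "日本" fails only the native/binary (and ASCII) tests, and an emoji such as U+1F600 also fails `is_simple_unicode_string`. Note that `is_unicode_string` and `is_string` can only fail when the value does not convert to a string at all.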
So the above will probably raise more questions than it answers, but at least it's a start.