UUIDs.. you know you want them

mwieder · Post by **mwieder** » Sun May 05, 2013 6:04 am

I have several XP vms around for various purposes. We keep one last Windows machine running because we still need it, but my main Windows box died about a month ago and I'm not inclined to resurrect it. We do need to support XP, but I don't know how far back beyond that makes sense.

LCMark · Post by **LCMark** » Tue May 07, 2013 11:18 am

Just to chip in here (Monte and I have already had some discussions about this, and I'm sure we'll have more at the conference): I think it's very important to separate two things here, namely features that are required to make VCS work, and features that are required to support inter-stack object references.

I do feel that the latter (inter-stack references) would be better served by allowing any property that currently takes an object id (e.g. icon, behaviors, pixmaps etc.) to take a name instead, and then (in the future) perhaps a partial object reference, e.g. button "foo" of card "bar", or button "foo" of stack "bar". If names were used in these instances rather than ids, the scripter would be free to use whatever naming scheme he/she wishes and would never have to deal with ids at all.

As for changing the id property, is this really needed? I see the value in using UUIDs within a VCS (it solves numerous issues that have come up and is an elegant solution); however, if this is only for VCS, then surely making it metadata (perhaps engine-supported in some way) that is essentially an implementation detail of the VCS would be a better way to go? After all, ideally, VCS would be as unobtrusive as possible, as it is with text-file-based languages.

Also, from a usability point of view, we have ids at the moment and (while I have no particular love of them; see below) they are quite convenient and easy to remember when poking around in things (debugging and such). There's no way anyone (well, any person with an average memory at least!) could ever remember UUIDs in this instance; indeed, even comparing two UUIDs visually as strings is not something you can do at a glance.

[ Btw, I should also point out that I do have half an eye on the future here. With the refactoring comes the potential to have *proper* control references that aren't strings, i.e. an 'opaque type' that the engine can manage appropriately internally and that will 'do the right thing' when saved and loaded (a Mac file alias is the closest analogy I can think of at the moment). This would provide a much better solution than UUIDs for things such as custom controls needing to manage lists of controls, as well as being much more efficient. ]

Just my n cents

monte · Post by **monte** » Tue May 07, 2013 12:03 pm

> LCMark wrote:
> I do feel that the latter (inter-stack references) would be better served by allowing any property that currently takes an object id (e.g. icon, behaviors, pixmaps etc.) to take a name instead, and then (in the future) perhaps a partial object reference, e.g. button "foo" of card "bar", or button "foo" of stack "bar". If names were used in these instances rather than ids, the scripter would be free to use whatever naming scheme he/she wishes and would never have to deal with ids at all.

I agree that this would be a good move. My main question is: if someone sets the icon to an integer, would you search for a name first and then an id, or would the id take precedence? It also makes me wonder whether any incomplete object reference should resolve this way... refer to button "X" and it might be on the same card, but it could also be on a substack or another stack in memory... hmm... what fun we will have with spelling mistakes then... delete button "X"... whoops... that just deleted some object on a stack in the IDE...
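To make the precedence question concrete, here is a minimal Python sketch of one possible rule: a purely numeric value is tried as an id first, anything else is matched against names, and an ambiguous name is treated as an error rather than silently picking one object. The `Control` class, the `resolve_reference` function and the ambiguity rule are all hypothetical illustrations, not how the engine actually resolves references.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Control:
    # Hypothetical stand-in for an engine object: an integer id plus a user-chosen name.
    id: int
    name: str

def resolve_reference(value: str, controls: List[Control]) -> Control:
    """Resolve a property value (e.g. what the icon is set to) to a control.

    Assumed rule: an all-digit value is tried as an id first; if no id matches,
    or the value isn't numeric, it is matched against names. More than one
    name match is reported as an error instead of silently picking one.
    """
    if value.isdecimal():
        for control in controls:
            if control.id == int(value):
                return control
    matches = [c for c in controls if c.name == value]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise LookupError(f"no control with id or name {value!r}")
    raise LookupError(f"name {value!r} is ambiguous ({len(matches)} matches)")

controls = [Control(1005, "foo"), Control(1006, "1005"), Control(1007, "bar"), Control(1008, "bar")]
print(resolve_reference("1005", controls).name)  # foo (the id wins over the control named "1005")
```

Under this rule a control literally named "1005" is shadowed by the object with id 1005, which is exactly the kind of surprise the precedence question is about; refusing ambiguous names is one way to avoid the `delete button "X"` accident.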

> As for changing the id property, is this really needed? I see the value in using UUIDs within a VCS (it solves numerous issues that have come up and is an elegant solution); however, if this is only for VCS, then surely making it metadata (perhaps engine-supported in some way) that is essentially an implementation detail of the VCS would be a better way to go? After all, ideally, VCS would be as unobtrusive as possible, as it is with text-file-based languages.

Unfortunately, I think we need a relatively immutable object reference in our scripts, in particular for library scripts that manage and use custom properties that are object references. Users might not have set a name for the control, or might change the name later. ID is the only robust reference for these scripts to use. If the ID scheme can't handle a merge, then it's the burden of the VCS export/import script to resolve the references. However, VCS export/import must be as fast as possible, otherwise it will reduce developers' productivity rather than increase it.
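What "resolving the references" on import involves can be sketched in Python. This is a hypothetical model, not lcVCS code: objects are dicts keyed by integer id, a made-up `"refs"` list stands in for custom properties that hold object ids, and `incoming` is assumed to contain only objects created on the other branch since the common ancestor. Clashing ids are renumbered, and every stored reference to a renumbered object has to be rewritten as well.

```python
from typing import Dict

def merge_new_objects(base: Dict[int, dict], incoming: Dict[int, dict]) -> Dict[int, dict]:
    """Merge objects created on another branch into `base`, renumbering clashes.

    Assumptions: `incoming` holds only objects created on the other branch since
    the common ancestor, and each object's hypothetical "refs" list stands in for
    custom properties that store object ids.
    """
    merged = {oid: dict(obj) for oid, obj in base.items()}
    # Start above every id on either side so a new id can't clash later in the loop.
    next_id = max(list(base) + list(incoming), default=0) + 1
    remap: Dict[int, int] = {}
    for oid in incoming:
        if oid in merged:  # both branches handed out the same sequential id
            remap[oid] = next_id
            next_id += 1
    for oid, obj in incoming.items():
        new_obj = dict(obj)
        # References to renumbered objects must be rewritten too; ids of objects
        # from the common ancestor are not in `remap` and stay as they are.
        new_obj["refs"] = [remap.get(ref, ref) for ref in obj.get("refs", [])]
        merged[remap.get(oid, oid)] = new_obj
    return merged

base = {1001: {"name": "card 1", "refs": []}, 1005: {"name": "ours", "refs": [1001]}}
incoming = {1005: {"name": "theirs", "refs": [1001]}, 1006: {"name": "theirs2", "refs": [1005]}}
merged = merge_new_objects(base, incoming)
print(sorted(merged))          # [1001, 1005, 1006, 1007]
print(merged[1006]["refs"])    # [1007]
```

Even in this toy form every incoming object and every stored reference has to be visited, which hints at why export/import speed becomes a concern once real stacks and real custom properties are involved.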

> Also, from a usability point of view, we have ids at the moment and (while I have no particular love of them; see below) they are quite convenient and easy to remember when poking around in things (debugging and such). There's no way anyone (well, any person with an average memory at least!) could ever remember UUIDs in this instance; indeed, even comparing two UUIDs visually as strings is not something you can do at a glance.

Hmm... maybe this holds for the low IDs you get when you first start working on a stack, but throw a few datagrids on there and see how quickly the numbers climb... pretty soon you're dealing with numbers that aren't easy to diff visually.

> [ Btw, I should also point out that I do have half an eye on the future here. With the refactoring comes the potential to have *proper* control references that aren't strings, i.e. an 'opaque type' that the engine can manage appropriately internally and that will 'do the right thing' when saved and loaded (a Mac file alias is the closest analogy I can think of at the moment). This would provide a much better solution than UUIDs for things such as custom controls needing to manage lists of controls, as well as being much more efficient. ]

I guess I'd need to learn more about what you're planning to do. Right now this sounds like it would actually break any VCS export because a binary reference to a control will be hard to deal with in that context. Why not make your control references UUIDs and kill two birds... create a table mapping uuid to MCObject...
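The "table mapping uuid to MCObject" idea can be sketched in a few lines of Python. This is only an illustration under assumptions: `ObjectRegistry` and its methods are hypothetical, and a plain Python object stands in for the engine's `MCObject`. The point is that scripts and saved files would hold only the UUID text, which serializes and diffs cleanly, while the engine resolves it to the live object.

```python
import uuid
from typing import Dict, Optional

class ObjectRegistry:
    """Hypothetical engine-side table mapping UUIDs to live objects."""

    def __init__(self) -> None:
        self._by_uuid: Dict[uuid.UUID, object] = {}

    def register(self, obj: object, existing: Optional[str] = None) -> str:
        # Reuse a UUID loaded from a file, or mint a fresh random (version 4) one.
        key = uuid.UUID(existing) if existing else uuid.uuid4()
        if key in self._by_uuid:
            raise ValueError(f"duplicate UUID {key}")
        self._by_uuid[key] = obj
        return str(key)

    def resolve(self, ref: str) -> Optional[object]:
        # Returns None for a reference whose object isn't loaded (or was deleted).
        return self._by_uuid.get(uuid.UUID(ref))

    def unregister(self, ref: str) -> None:
        self._by_uuid.pop(uuid.UUID(ref), None)

registry = ObjectRegistry()
button = {"name": "foo"}
ref = registry.register(button)
print(registry.resolve(ref) is button)  # True
```

A dictionary lookup keeps resolution cheap, and because the stored form is plain text, a reference saved in a custom property stays meaningful across export and import.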

mwieder · Post by **mwieder** » Tue May 07, 2013 5:15 pm

I'm with Monte on this. There are other situations besides implementing a vcs where we can easily run into id conflicts, and making the transition to uuids neatly solves these problems. Examples are icon id conflicts (namespaces would be nice, but I realize that's a ways off yet), the ever-present "a stack by that name is already in memory" problem, etc. I think using the short name form for an object is a nice shorthand description when it works, but can be prone to hard-to-debug problems when it doesn't.
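The kind of conflict described here can be shown with a toy Python model (hypothetical, not engine code): two copies of the same stack each hand out new ids from the same counter, so every new id collides, whereas independently drawn version 4 UUIDs do not collide in practice.

```python
import uuid
from typing import List

def sequential_ids(last_id: int, count: int) -> List[int]:
    # Models a per-stack counter: each new object takes the next integer.
    return list(range(last_id + 1, last_id + 1 + count))

# Two copies of the same stack (last id 1002) each get three new icons independently.
icons_a = sequential_ids(1002, 3)
icons_b = sequential_ids(1002, 3)
print(sorted(set(icons_a) & set(icons_b)))  # [1003, 1004, 1005]: every new id clashes

# The same scenario with random version 4 UUIDs.
uuids_a = {uuid.uuid4() for _ in range(3)}
uuids_b = {uuid.uuid4() for _ in range(3)}
print(uuids_a & uuids_b)  # expected to be empty; a clash is astronomically unlikely
```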

But yes, those areas that currently *require* an id (the icon of a button, etc.) would be well served by allowing a more flexible means of addressing an object.

SirWobbyTheFirst · Post by **SirWobbyTheFirst** » Wed May 08, 2013 9:37 pm

I think Mark Smith has a UUID generation library on his website (it's at the top of the table there); could you not use this, or is this meant for the main C++ engine? I think changing to UUIDs would be fantastic. Additionally, having the ID locked to that specific stack, so that it couldn't be changed to something else, would benefit me in particular, because I could use it for my Xenon project, where a main stack is linked to a process.

Currently I use a combination of a local array in the kernel stack along with the NameChanged message and whilst it works, I'm always open to other ways to tie a stack to a process object in my project and I think a UUID attached to a stack would be hella useful.

But yeah, if you want, here is Mark's website; his library LibUUID is listed at the top, if it's useful.

http://marksmith.on-rev.com/revstuff/

EDIT: Forgot the URL. Oi vey.

mwieder · Post by **mwieder** » Wed May 08, 2013 10:27 pm

@Mike - yes, Mark Smith's library works fine at a higher level; this is for building in at the engine level, both for speed and so that the engine can use uuids natively as object IDs. The uuids do need to be modifiable, but except in very specialized cases (possibly in a version control system, probably when a stack is cloned) I don't see uuids changing once they're assigned.

SirWobbyTheFirst · Post by **SirWobbyTheFirst** » Wed May 08, 2013 10:30 pm

Hmm, I thought it would be at the engine level; I only mentioned it because Mark's libraries are usually the first thing to be brought up when talking about this kind of stuff.

geoffcanyon · Post by **geoffcanyon** » Tue Sep 10, 2013 6:19 pm

I found http://preshing.com/20110504/hash-collision-probabilities, which simplifies the math regarding collisions to:
probability of collision ~= (number of values chosen)^2 / (2 * number of possible values)

2 * 2^128 ~= 6.8 * 10^38

Given the severity of the potential issues associated with an id collision, let's aim for a 1 in a billion probability, which is less than the likelihood you'll die by shark, but some people do die that way...

Multiplying by 10^-9 leaves (number of values chosen)^2 ~= 6.8 * 10^29

The number of ids chosen is squared, so taking the square root gives about 8.2 * 10^14 ids

BUT the situation is actually better than that. The collision calculation assumes all previous ids are maintained -- no one is going to have a stack with 10^15 controls.

So really we're talking more along the lines of, given that a stack might have as many as 100,000 controls (purposefully going big) at any one point in time, how many times do we have to delete one control id and add another to make a collision likely?

Taking some really bad shortcuts, I'm estimating that we're up in the range 10^21..10^24. In other words, never never never going to happen.
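The arithmetic in this post can be checked with a short Python sketch. `collision_probability` and `max_ids` are just the preshing.com approximation and its inverse. `max_replacements` is a hypothetical steady-state model, an assumption of this sketch rather than something from the article, for the delete-one-add-one question: with `live` ids present at any time, each fresh random id clashes with a live one with probability live / N, so m replacements give a total probability of roughly m * live / N.

```python
import math

def collision_probability(k: float, n: float) -> float:
    """Birthday approximation p ~= k^2 / (2N) for k values drawn from N.
    Only meaningful while the result is small (p << 1)."""
    return k * k / (2 * n)

def max_ids(p: float, n: float) -> float:
    """Invert the approximation: how many ids k keep the probability near p."""
    return math.sqrt(2 * n * p)

def max_replacements(p: float, live: int, n: float) -> float:
    """Rough steady-state model (an assumption): m ~= p * N / live replacements
    before the cumulative collision probability reaches p."""
    return p * n / live

N = 2.0 ** 128
print(f"{max_ids(1e-9, N):.2e}")                    # 8.25e+14
print(f"{max_replacements(1e-9, 100_000, N):.2e}")  # 3.40e+24
```

Under this simplified model, a stack holding 100,000 live controls can go through about 3.4 * 10^24 replacements before reaching a one-in-a-billion chance of a clash, at the top end of the 10^21..10^24 estimate.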

monte · Post by **monte** » Tue Sep 10, 2013 10:25 pm

Yeah... I think there's a uuid for every grain of sand on the planet or something... there aren't 128 random bits in a uuid, though... off the top of my head it's 122, but that's still a really big number. It's worthwhile applying your math to a 64-bit random number too, because I've come to the conclusion that would be sufficient.
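The 122 figure can be confirmed with Python's standard `uuid` module, and the same birthday approximation can be applied to a 64-bit random id. In this sketch, building a version 4 UUID from all-zero and all-one inputs and XOR-ing the results exposes which bits the version (4 bits) and variant (2 bits) fields pin down. `max_ids` is a hypothetical helper that re-implements the preshing.com approximation, and the one-in-a-billion target is the one chosen in the previous post.

```python
import math
import secrets
import uuid

def max_ids(p: float, bits: int) -> float:
    # Birthday approximation inverted: k ~= sqrt(2 * N * p), with N = 2**bits.
    return math.sqrt(2 * 2.0 ** bits * p)

# Version 4 UUIDs fix 4 version bits and 2 variant bits. Building one from
# all-zero and all-one inputs and XOR-ing the results reveals which bits are free.
zeros = uuid.UUID(int=0, version=4)
ones = uuid.UUID(int=(1 << 128) - 1, version=4)
random_bits = bin(zeros.int ^ ones.int).count("1")
print(random_bits)  # 122

# A plain 64-bit random id, the alternative suggested here.
example_id = secrets.randbits(64)

for bits in (random_bits, 64):
    print(bits, f"{max_ids(1e-9, bits):.2e}")
# 122 1.03e+14
# 64 1.92e+05
```

At the same one-in-a-billion target the 64-bit budget is far smaller, so whether it is sufficient depends on how many ids a stack (or a whole project) ever allocates and on what level of risk is acceptable.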

The max ID issue could probably be solved more easily with some kind of recycling of unused IDs once you reach the max stack id. The engine would start back at 1 and use the first unused ID.
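A minimal Python sketch of that recycling rule, under assumptions: `next_object_id` is a hypothetical function, `stack_id` plays the role of the stack's running counter, and `max_id` is a parameter because the real engine ceiling isn't stated here.

```python
from typing import Set

def next_object_id(stack_id: int, used_ids: Set[int], max_id: int) -> int:
    """Return the id for a new object.

    While the counter is below `max_id`, just take the next integer. Once the
    ceiling is reached, start back at 1 and take the first id no object uses.
    """
    if stack_id < max_id:
        return stack_id + 1
    for candidate in range(1, max_id + 1):
        if candidate not in used_ids:
            return candidate
    raise RuntimeError("every id up to max_id is in use")

print(next_object_id(1005, {1003, 1004, 1005}, 10_000))  # 1006
print(next_object_id(10, {1, 2, 4, 10}, 10))             # 3
```

Once the ceiling has been hit, every allocation scans the existing ids again, so the engine needs an up-to-date set of used ids for the stack.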

It's the fact that a sequential ID makes parallel development and merging extremely complicated. While I've got lcVCS to the point where it can be used on quite complicated projects (e.g. Clarify), we have to follow some fairly strict rules that some developers might find unreasonable. A couple of examples:
- you can't set the icon to an id in script
- every custom property set that stores an object reference must have a lcVCS plugin to deal with it
- no circular object references between stack files (stack A can't have an object reference to an object in stack B if stack B has an object reference to an object in stack A)
- all object references must be found in the repo or otherwise loaded into every IDE before import/export

There are more rules... all because of this ID issue!
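The rule against circular object references between stack files is the kind of thing an export script could check mechanically before import/export. This is a hypothetical Python sketch, not how lcVCS does it: `refs` is an assumed summary mapping each stack file to the stack files its object references point into, and a depth-first search reports one cycle if there is any.

```python
from typing import Dict, List, Optional, Set

def find_cycle(refs: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Return one cycle of stack files that reference each other, or None."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited / on the current DFS path / finished
    color = {name: WHITE for name in refs}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GREY
        path.append(node)
        for dep in sorted(refs.get(node, ())):
            state = color.get(dep, WHITE)
            if state == GREY:
                # dep is already on the current path: the cycle runs from dep back to dep.
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for name in sorted(refs):
        if color[name] == WHITE:
            found = visit(name)
            if found:
                return found
    return None

print(find_cycle({"A.livecode": {"B.livecode"}, "B.livecode": {"A.livecode"}}))
# ['A.livecode', 'B.livecode', 'A.livecode']
```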

mwieder · Post by **mwieder** » Wed Sep 11, 2013 12:06 am

I really don't like the idea of sequential ids. Recycling them means the engine has to keep track of the recycled ones in a separate pool. There's nothing magic about having ids be sequential: after object creation, as far as the engine is concerned, ids are just opaque unique numbers. If we decoupled the creation of object ids from the idea of being sequential, then we could solve id conflicts once and for all. And uuids give us a big sandbox to play in.
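Decoupling id creation from sequence can be as simple as the following Python sketch (hypothetical, not engine code): draw a random number, and in the astronomically rare case it is already taken in this stack, draw again. There is no counter and no recycled pool. Treating 0 as reserved is an extra assumption of the sketch, and the function adds the new id to `used_ids` as a side effect.

```python
import secrets
from typing import Set

def new_random_id(used_ids: Set[int], bits: int = 64) -> int:
    """Draw an id with no relation to creation order and record it in `used_ids`."""
    while True:
        candidate = secrets.randbits(bits)
        # 0 is treated as reserved here (an assumption); retry on the rare clash.
        if candidate != 0 and candidate not in used_ids:
            used_ids.add(candidate)
            return candidate

used: Set[int] = set()
ids = [new_random_id(used) for _ in range(1000)]
print(len(set(ids)))  # 1000
```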

monte · Post by **monte** » Wed Sep 11, 2013 1:54 am

Yeah, I was just saying that the max ID issue probably has a much easier solution, so if there's no hope for large random IDs, then gracefully handling maxed-out IDs seems like a good idea. It's just a case of checking whether the stack ID is at the max and, if it is, parsing the objects in the stack to find an unused number. Certainly not my preferred solution, but not complicated to implement.

@runrevmark's concerns about the visibility of the change to users aside (I can't really understand why that's a huge issue), there are two potential issues I can think of with UUIDs:
- saving in a legacy file format: maybe a stack needs a useUUIDs property, and once it's set to true, setting it back and saving in the legacy format isn't supported
- slower object reference parsing: not sure how much of an issue this would be, but it's obviously faster to parse an integer than a uuid.

I gather support for version control is not seen by RunRev as part of their core business, so the current situation is not understood to be a serious problem. I disagree on both points. I suspect there are people dismissing LiveCode as an option every day because of this issue.

LiveCode Forums
