the number of items

mwieder · Post by **mwieder** » Sun Aug 11, 2013 1:26 am

@Monte- I checked and "countlines" does indeed do the same thing, and the same fix applies.
Counting words does something similar but a bit different, and doesn't look like it needs fixing.

So I need a good name for the property (and it's a good point that it should be a stack property):
ignoreEmptyLastItem won't work since this would apply to lines as well as items.
ignoreLastDelimiter doesn't quite make it because it's only if the last delimiter is the last character in a string or list.
I'm fond of everyDelimiterCounts, where the default is false or empty for the current behavior.

malte · Post by **malte** » Sun Aug 11, 2013 8:26 am

legacyChunkBehavior ?

monte · Post by **monte** » Sun Aug 11, 2013 8:36 am

Hmm... That makes me think of hcaddressing. What about hcchunkbehavior?

malte · Post by **malte** » Sun Aug 11, 2013 9:14 am

Sounds good to me. And thinking about it, it can default to false (make the new behaviour the default), as long as this is documented as an important change in the release notes. Thus newcomers will learn the (IMHO correct) behavior, while old hands can just continue using what they are used to.

Klaus · Post by **Klaus** » Sun Aug 11, 2013 12:15 pm

Hi friends,

I have an old PDF from 2007 with some "drafts" and "thoughts" for (a possible) Rev Next Generation Core

Here I found this interesting paragraph; it looks like this problem was discussed a TAD earlier:
...
11?!? I’m sure I only had 10 items yesterday...
Introducing sequences and real lists
...
–To solve this problem we introduce ‘real lists’...
–a value that represents a sequence of values – but not as a string!
–i.e. you can do tValue1, tValue2, tValue3, ... with impunity – never having to worry about itemDelimiters ever again!
...

Sounds very good indeed, and would fit exactly the problems discussed here!
Maybe this is something worth reconsidering?
But maybe this idea was abandoned long ago?

Before you ask: no, there is no description of how this should be done in the engine or anything, just the drafts!
This is "CONFIDENTIAL MATERIAL", so I can not put it on my server for you to download before the mothership agrees.

Best

Klaus

jacque · Post by **jacque** » Sun Aug 11, 2013 7:25 pm

Putting on my wrathful hat here:

I'd hardly call backward compatibility "superstitious". In 26 years I have never needed to adjust for trailing delimiters and so most of my scripts don't bother. Sometimes they get data from places I have no control over, and which include trailing delimiters (like the one I'm working on now.) In my current project I get data back from a server that includes a trailing delimiter. My script needs to branch depending on the number of items in the string. Right now it works. I don't want to have to fix it, and I don't want to have to ask the server person to change their scripts as well. Sometimes you're using libraries where you can't even see what's in there. It can be anything. The point is not that people shouldn't write scripts that way, the point is that there are already lots of cases where there ARE scripts like that and we have to work with them.

Consider tabstops in a display field. If there's a trailing delimiter, it isn't an issue right now:

set the tabstops of fld 1 to "20,100, 200,"

I'd get 3 columns and the last would repeat. If the behavior changes, the grid is truncated at 3 visible columns and one hidden column.

Consider a script that gets data from some other source and does math operations on it:

repeat with x = 1 to the number of items in "1,2,3,4,"
put 100/x
end repeat

"Divide by zero" error.

There are countless examples of what will break if the current behavior changes. I'm pretty sure that the HC team chose the current behavior in order to reflect the behavior of word processors of the day, where the carriage return is attached to the paragraph it defines. If you triple-click a paragraph, the trailing return is also selected. Styled text info for a paragraph is stored in its paragraph mark, and copying the paragraph marker allows you to paste the style into a different paragraph. If there was any sin in their choice, it was perhaps in naming the property "delimiter" rather than "terminator". I suspect if it were named "itemTerminator" no one would ever have been confused.

I do not think we should blithely assume that since some people always remove trailing delimiters, that everyone does. We're talking years of expected behavior here. I'd suggest leaving everything as it is now, and introducing a new toggle switch for different behavior. The default should not change; anyone who wants empty items at the end of strings can set the new property.

I'd also suggest making the scope of the new property act like itemDelimiter does now -- local to the running handler -- since it's a parallel concept. But I don't care so much about this part. Just don't break 26 years of scripts.

</wrath>

mwieder · Post by **mwieder** » Sun Aug 11, 2013 8:03 pm

@Jacque- thanks. That's just the sort of feedback I've been looking for. Your concerns are good points, but let me point out that

set the tabstops of fld 1 to "20,100, 200,"

Nope. That still works fine.

repeat with x = 1 to the number of items in "1,2,3,4,"
put 100/x
end repeat

"Divide by zero" error.

Um... actually, no. I'm guessing you meant to type something else there, but I don't know what.
Update: ah... I think you meant: put 100/item x of "1,2,3,4,"

The "superstitious" part is not backward compatibility but believing *something* will break without any hard evidence. You're falling into that as well with "libraries where you can't even see what's in there"... that doesn't confirm that the libraries will break.

But your point about branching on the number of items is a good one. I agree that the current behavior should be the default for existing stacks, and I have to reluctantly agree that it should be that way for new stacks as well, because otherwise the documentation gets into an awkward situation. But with a new stack property, which could be applied to the templateStack on creating a new stack, I think there's no conflict.

LCMark · Post by **LCMark** » Mon Aug 12, 2013 8:34 pm

Ah - this issue

A lot of people seem to think of the current behavior as a 'quirk' of the language, and indeed I did too up until a few months ago when I (finally?) had some time to review it (as a result of a bug report, although I think there have been others before it - http://quality.runrev.com/show_bug.cgi?id=10727).

First of all, it is important to bound what we are talking about. The 'ignoring last empty element' of a string list issue only comes up when dealing with string lists that could contain an empty element - it has no effect (for obvious reasons) if the lists you are manipulating never contain empty elements... So, in what follows, when I say 'nullable string list' I mean 'string list which could contain empty elements'.

It is my present opinion that the current behavior of the engine when manipulating (nullable) string lists is neither a bug nor a quirk; it is simply how things have to be if one wants a logical and consistent semantic for working with them. Indeed, I really do think the HyperCard engineers thought very carefully about this over 26 years ago, and it is why they chose the semantic they did.

The issue comes, very simply, down to this. If you want to be able to represent a nullable string list of any number of empty items from 0 upwards then the trailing delimiter has to be ignored. i.e.

Code:

   put the number of items of "" -- is 0
   put the number of items of "," -- is 1
   put the number of items of ",," -- is 2

If you do not have the 'ignore the trailing delimiter' semantic then you have two choices: either you can represent lists of empty items of length 1 and upwards (option A):

Code:

  put the number of items of "" -- is 1
  put the number of items of "," -- is 2
  put the number of items of ",," -- is 3

Or you can't represent lists of empty items of length 1 (option B):

Code:

  put the number of items of "" -- is 0 (as empty should be empty and mean the empty list which means no elements)
  put the number of items of "," -- is 2
  put the number of items of ",," -- is 3

So, what's the problem with these two options...

Well, option (A) breaks the universal idea that 'empty' means nothing and thus auto-converts to the appropriate 'nothing' for the given data type that is requested (i.e. as a number it becomes 0, as a string it becomes "", and in the refactored engine - in a 'strictMode' - as an array it will become the array with no elements). Having this universal idea of empty (rather than a typed one) is, to my mind at least, an exceptionally useful part of the language and works very well with the typeless/contextually-typed nature of LiveCode and is something that I personally feel is well worth preserving.

Option (B) is, as far as I can see, unworkable - not being able to represent the list of one empty element creates a huge discontinuity that would require explicit script to work-around at every point you might want to deal with such a thing.

So, whilst it might be very well to propose ditching (or making optional) the 'ignore the trailing delimiter' semantic, doing so has significant side effects which, based on my analysis, result in a worse situation code-wise than the one we have at the moment.

When it comes down to it, with the current semantics (trailing empty elements being ignored) it essentially means that everyone and everything must observe two very simple rules:

1. If you are processing a string list that could contain empty elements then you must preserve the trailing delimiter.
2. If you are producing a string list which may contain empty elements then you must make sure you have a trailing delimiter.
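
As a minimal sketch of the second rule (the handler name and the values are hypothetical, chosen only for illustration), a producer can append the delimiter after every element, including the last, so that trailing empty elements are still counted:

```livecode
-- Hypothetical demo handler; names and values are illustrative only.
on demoTrailingDelimiter
   local tList, tCount
   -- Append the delimiter after every element, including the last one.
   put "a" & comma after tList
   put empty & comma after tList
   put empty & comma after tList
   -- tList is now "a,,,"
   put the number of items of tList into tCount   -- 3: "a", empty, empty
   answer tCount
end demoTrailingDelimiter
```

Without the final comma the string would be "a,,", which counts as 2 items under the current rule, so the last empty element would be lost.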

Therefore I think it is far more likely that the issue / quirkiness people perceive with the current rule is entirely down to either: these two rules not being observed correctly in some places in the engine and externals; or it not being articulated in the documentation etc. that these two rules are important to observe (if you are dealing with string lists that could contain empty elements) and thus meaning there is code that people have written that does not observe them.

Whilst I'm generally always for 'giving people choice', in this scenario I think it would be a very unwise move - it really is something that has to be a global semantic. Making the rule optional will create a significant interoperability problem - libraries and code that are written to accept (nullable) string lists with the current rule will require explicit checks and extra code to make them work with libraries and code that are written without the current rule. To give an example...

Let's say you have the following function in a library stack written with the current rule:

Code:

  function mapList pList, pFunc
    local tMappedList
    repeat for each item tItem in pList
      dispatch function pFunc to me with tItem
      put the result & comma after tMappedList
    end repeat
    -- Mirror the input: drop the trailing delimiter only if pList had none
    if char -1 of pList is not comma then
      delete the last char of tMappedList
    end if
    return tMappedList
  end mapList

This function is simply a 'map' primitive, it applies pFunc to each item in the list, returning the result. It doesn't know whether pList could contain empty elements, so by rule (1) above it ensures it preserves the trailing delimiter.

Now, let's say there is a function in an object script elsewhere which is written without the current rule (here I'm assuming it's a context-local property):

Code:

    function typeItem pValue
      if pValue is empty then
        return "empty"
      else if pValue is an integer then
        return "integer"
      else if pValue is a number then
        return "real"
      end if
      return "string"
    end typeItem

    on testMapList
      set ignoreTrailingDelimiter to false
      put "1,2.5,foo," into tList
      answer mapList(tList, "typeItem")
    end testMapList

So, if I were writing 'testMapList', I'd expect an answer dialog to pop up and give me "integer,real,string,empty". However, what I will get is "integer,real,string,". Why? Well, with the 'ignoreTrailingDelimiter' set to false, I'm giving it a list of 4 items the last of which is empty; however, because I'm passing the list to a function that ignores trailing delimiters then it only sees three items.

Of course it would be possible to argue that the above example suggests that 'ignoreTrailingDelimiter' should be passed down the callee chain, and in this simple case that would make things work. However, if 'mapList' were a function which were doing something a great deal more complicated such as making and manipulating its own (nullable) string lists or calling other functions that do then it (or the functions it calls) would have to explicitly check for and conform to the current setting of ignoreTrailingDelimiter (or turn it off and on as appropriate). Therefore, ultimately, allowing a choice means that a heavy burden is put on anyone wanting to share their code (whether in libraries, snippets, custom controls etc.) - they'd have to take into account the context in which it is being called.

[ You could also argue that the engine 'should know' that tList is a list built under ignoreTrailingDelimiter == false and so should be able to adjust as appropriate, but it can't know because lists are just strings. ]

Given that not having the ignoreTrailingDelimiter rule means either one of option (A) or option (B) above (which have significant flaws of their own), I have to ask - is it really worth it? If everyone and everything follows the rules required by ignoring trailing delimiters then (as far as I can see) everything works well, you don't have to juggle things when passing lists between code that is written to two different standards, and you don't have to deal with the fact that either 'the number of items of empty is 1' (A), or that you can't represent a list of 1 empty item (B).

Ideally of course, one would not have to worry about the two rules above at all - nor the flaws (A) or (B) of choosing a different way to handle trailing delimiters - however, I do believe it is unavoidable if one's lists are always strings. Thus, the real solution (beyond keeping the status quo but ensuring complete consistency of application of the two rules above) is having real lists - i.e. ones that are a structured data type. These will be added to the refactored engine (although perhaps not in the initial release) and will slide naturally into the current semantics we have. When we have these, I don't think anyone will ever really have need to consider any delimiter issues - as it won't be a relevant concern. [ Note that I'm not proposing changing existing functionality by introducing this new idea of 'real lists', you will be able to use them if you wish, or just carry on how you do things now. ]

Now, of course, all that being said the above is entirely my analysis of the situation and I'm more than happy for someone to prove me wrong in any part of it. More specifically, as much as someone above referred to defaulting to 'superstition' and such to defend the status quo, it seems to me that the suggestion of changing the existing behaviour hasn't really been backed up by any demonstrations of how it would improve the ease of coding beyond 'that rule doesn't make sense, it should change'. So, what we really need is real-world code examples which would support the proposed introduction of an option to switch the current semantic on and off which we can then analyse - evaluate it in the two contexts and try to determine absolutely whether there is an advantage to implementing the proposed addition. However, I must say that it is my gut feeling that any such example will either demonstrate an area where one of the two rules (required by the current 'ignore trailing delimiter' rule) is not being observed (whether that be in the engine/externals, or in the code itself), or a situation in which string lists will never be sufficient to perform the intended purpose of the code.

mwieder · Post by **mwieder** » Mon Aug 12, 2013 9:14 pm

@runrevmark- (coming up for air after the long read) Thanks. Much food for thought.
Given that you expect the number of items of "" to be zero, what about the number of items of "hello"? I would counter that there should be consistency between these two, and also that the answer in both cases should be 1, with the first being an empty item.

And after the lengthy discussion on the use-list, here's what I implemented:

1. *NO* change to the current functionality of itemDelimiter and lineDelimiter
2. New properties itemSeparator and lineSeparator that work as follows:

the number of items of "1,2,3" is 3
the number of items of "1,2,3," is 4
the number of items of "1,2,3, " is 4

i.e., consistency between whitespace / no whitespace as a final character.
This requires no stack or global property to set, just the replacement of itemDelimiter with itemSeparator.

Code:

  set the itemDelimiter to comma
  put the number of items in "1,2,3," into field 1 -- gives 3
  -- while
  set the itemSeparator to comma
  put the number of items in "1,2,3," into field 1 -- gives 4

monte · Post by **monte** » Mon Aug 12, 2013 11:20 pm

@mwieder How do you know which of itemDelimiter and itemSeparator is counting the items?

@runrevmark Thanks for some perspective on the topic. I wonder if it would be worthwhile re-examining the bug reports on this from the perspective of resolving inconsistencies with the current implementation rather than changing the current implementation. The original post about this was something to do with the revdb I think.

mwieder · Post by **mwieder** » Tue Aug 13, 2013 12:28 am

Setting the itemSeparator also sets the itemDelimiter.
Setting the itemDelimiter sets the itemSeparator to null.

The pseudocode for counting items is as follows (counting lines is similar)

-- edited for formatting --

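Code:

if tChar is itemDelimiter then
  if tChar is itemSeparator then
    -- separator semantics: every delimiter counts
    count++
  else
    -- delimiter semantics: a delimiter at the very end is ignored
    if tChar is not the end of the string then
      count++
    end if
  end if
end if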

and both itemDelimiter and itemSeparator have the same scope, i.e., local to the current handler.
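
For anyone who wants to experiment without an engine build, here is a rough script-level model of that loop. It is only a sketch and not the engine code: the function name and its parameters are hypothetical, and it assumes that empty input has no items in either mode and that a non-empty string starts with a count of 1, neither of which the pseudocode above spells out.

```livecode
-- Hypothetical script-level model of the counting loop; not the engine's code.
-- pSeparatorMode true  -> every delimiter counts (itemSeparator-style)
-- pSeparatorMode false -> a delimiter that is the last char is ignored (current rule)
function countItemsModel pString, pDelim, pSeparatorMode
   local tCount, tLength, tIndex
   put the number of chars of pString into tLength
   -- Assumption: empty input has no items in either mode.
   if tLength is 0 then return 0
   -- Assumption: a non-empty string has one more item than it has counted delimiters.
   put 1 into tCount
   repeat with tIndex = 1 to tLength
      if char tIndex of pString is pDelim then
         if (pSeparatorMode is true) or (tIndex < tLength) then
            add 1 to tCount
         end if
      end if
   end repeat
   return tCount
end countItemsModel
```

With this model, countItemsModel("1,2,3,", comma, false) gives 3 and countItemsModel("1,2,3,", comma, true) gives 4, matching the itemDelimiter and itemSeparator results listed earlier.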

monte · Post by **monte** » Tue Aug 13, 2013 12:55 am

Hmm... It occurs to me that to use this solution you would need to be aware of the item counting issue and if you are aware of the item counting issue then I'd be very surprised if it weren't possible to handle the situation in your script. One simple solution is if there is code that you know might return an empty last item or line with no trailing delimiter then you just append a delimiter... you don't even need to care if the last item is empty. You just need to know that the function doesn't return a trailing delimiter... it occurs to me now that my practice of stripping trailing delimiters is a bad one although because I avoid lists like the plague I can't recall any code that returned empty items...
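
A minimal sketch of that approach, assuming the data comes from code known not to add a trailing delimiter (the handler and variable names are hypothetical):

```livecode
-- Hypothetical demo handler; names are illustrative only.
on demoAppendDelimiter
   local tRaw, tBefore, tAfter
   -- tRaw comes from code known NOT to add a trailing delimiter,
   -- so "a,b," here means three items, the last one empty.
   put "a,b," into tRaw
   put the number of items of tRaw into tBefore   -- 2 under the current rule
   -- Append unconditionally; no need to check whether the last item is empty.
   put comma after tRaw
   put the number of items of tRaw into tAfter    -- 3: the empty last item is counted
   answer tBefore && tAfter
end demoAppendDelimiter
```

The append is unconditional on purpose: testing whether the last char is already a comma would misread an empty last item as a trailing delimiter.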

mwieder · Post by **mwieder** » Tue Aug 13, 2013 7:17 am

@Monte- if one is aware of the counting issue then it's certainly possible to deal with this in a script.

That's not the point - it's more that the situation with counting items and lines is confusing, especially to newcomers to the problem. But not only to newbies, because it requires thinking about the issue each time it comes up. And while I still consider the current behavior wrong, we're faced with almost 30 years of legacy code that has dealt with it. I'd love to change the way itemDelimiters work and then create a new property for the current situation, but I know that's not gonna fly.

I think what I've come up with here is a way to have your cake and eat it too -- nothing in current stacks will break, and you can keep on using itemDelimiters in exactly that way if you want the item count to behave the way it now does. But using itemSeparators instead will do something different (and IMO more correct) with counting, and you have that option as well.

The only possible conflict here is with using different itemDelimiters and itemSeparators in the same handler, and I can't think of a scenario where that even begins to make sense.

monte · Post by **monte** » Tue Aug 13, 2013 7:41 am

Hmm... having potentially two ways to count items in the same script seems more confusing to me. It might mean we end up having to scan long scripts to work out which way it's being counted. I liked the stack property idea but I'm leaning towards the concept that code that can't deal with empty last items being the broken thing rather than the counting... we should perhaps start ensuring trailing delimiters rather than the reverse. I'd rather that than every repeat for each item running the code on empty strings.

LCMark · Post by **LCMark** » Tue Aug 13, 2013 9:49 am

Given that you expect the number of items of "" to be zero, what about the number of items of "hello"? I would counter that there should be consistency between these two, and also that the answer in both cases should be 1, with the first being an empty item.

I do see where you are coming from to a certain degree, but the reality is because we are talking about strings representing everything in this context there is a limit to what can be represented (and choices to be made in doing so). In a scenario like this, I'd say it was better that the choices made should enable as much as possible to be represented which is the current situation.

Given that we are used to thinking of empty as nothing, it doesn't make sense to me personally that 'the number of items of empty' should be 1 - as a list of length 1 is *something* (even if that item is empty).

If you interpret 'the number of items of empty' as being 1 then you take away something you can represent now - the empty list (one of no items). This interpretation isn't even limited to string lists that could contain empty items either, it affects all string lists. It means that there is no way to tell a function which takes a list of items that you actually mean no items.

That's not the point - it's more that the situation with counting items and lines is confusing, especially to newcomers to the problem. But not only to newbies, because it requires thinking about the issue each time it comes up. And while I still consider the current behavior wrong, we're faced with almost 30 years of legacy code that has dealt with it. I'd love to change the way itemDelimiters work and then create a new property for the current situation, but I know that's not gonna fly.

I must confess that I have never had the same experience with having to 'think about the issue every time it comes up', but then I've always implicitly (without knowing about it explicitly) followed the rules above with my script that deals with nullable string lists. So this more suggests to me that it is easy to start thinking about manipulating string lists in LiveCode in the wrong way (or using primitives in the language / externals which aren't following the rules correctly) and thus getting tied up in knots about an issue which goes away if one changes point of view slightly. Thus this falls into a bugfix / documentation / education problem and not a semantic issue.

I think what I've come up with here is a way to have your cake and eat it too -- nothing in current stacks will break, and you can keep on using itemDelimiters in exactly that way if you want the item count to behave the way it now does. But using itemSeparators instead will do something different (and IMO more correct) with counting, and you have that option as well.

This doesn't allow people to have their cake and eat it - it still has the interoperability problem. Lists processed with one semantic are incompatible with lists processed with the other so when you write code that calls, or is to be called by, code you can't control it would be another thing you have to take into account that will require more code, more thought and more complexity.

You assert that not ignoring trailing delimiters is 'more correct' - but how so? As I said above, there was a choice to be made, and the choice made was the one that makes the most out of the limited representational ability of strings. Making a different choice only results in having to write code to handle the cases which that choice can't handle, and thus is unlikely to make things any easier; it just rearranges the dust.

Since it's so easy to fix, the reason it hasn't been done is one of superstition: there's possibly something out there that might break if we change the code, but nobody's actually seen one.

You are starting out from the position that something is broken - when in reality nothing is broken. The point we are discussing comes down to a choice that had to be made in what (essentially) 'empty' means when being considered as a string list. The choice that was made was that 'empty' should mean the empty list (the list containing no items) and thus far, I see no evidence that making a different choice actually makes things any better in any way; thus those proposing the change are also guilty of superstition - the superstition of 'the grass will be greener on the other side'.

I think it is quite clear that a lot of code will break if we changed the semantic wholesale - jacque's divide by zero code illustrates that succinctly. All code that relies on the count of a number of items in a list will potentially break on input where there is a trailing delimiter (which, as pointed out above, is required if you are dealing with string lists which could contain empty items). With this in mind the only other option is to allow choice; however I think this would actually make things worse - code written to two different delimiter standards will not be interoperable without code to bridge the gap. The latter, to me at least, seems like an exceptionally bad idea; it makes sharing code, libraries and components a great deal harder. (I'm pretty sure the example I gave above illustrates the interoperability problem quite well).

If you really think something is 'broken' and 'needs fixing' in some way then articulate why - present your thinking, logic, code and examples where the current situation is not tenable and demonstrates that something needs to be changed or added. If we are to add something such as you propose to the very heart of the language then there needs to be a substantial amount of evidence that it is needed, particularly as, as I've said above, it runs the risk of bifurcating the community and, in doing so, making everyone's lives more difficult.

LiveCode Forums
