the number of items

mwieder · Post by **mwieder** » Tue Aug 13, 2013 5:46 pm

...leaving aside the issue of strings vs lists for the moment, because I think once that happens this discourse will change completely, but at the moment we don't have that to deal with...

LCMark wrote:
Given that we are used to thinking of empty as nothing, it doesn't make sense to me personally that 'the number of items of empty' should be 1 - as a list of length 1 is *something* (even if that item is empty).

If you interpret 'the number of items of empty' as being 1 then you take away something you can represent now - the empty list (one of no items). This interpretation isn't even limited to string lists that could contain empty items either, it affects all string lists. It means that there is no way to tell a function which takes a list of items that you actually mean no items.

How many items are in "1,,,2,,,3"? Aren't there empty items in that list?

And having just tested this again, the number of items of "" is 0 in either case, so you're not losing that.
The difference in the two is the number of items in "," where using the itemDelimiter you get 1 and using the itemSeparator you get 2.
In either case you end up with at least one empty item.
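The two counts described above can be compared on a stock engine without the proposed property. The following is a minimal sketch, not the implementation in the number_of_items branch: `countWithSeparator` is a hypothetical helper name, and it assumes a single-character delimiter.

```livecode
-- Hypothetical helper (not an engine function): counts items the way the
-- proposed itemSeparator is described as counting them, using only the
-- stock itemDelimiter. Assumes pDelim is a single character.
function countWithSeparator pList, pDelim
   local tCount
   if pList is empty then
      return 0
   end if
   set the itemDelimiter to pDelim
   put the number of items of pList into tCount
   -- The stock count ignores one trailing delimiter; the proposed count
   -- includes the empty item that follows it.
   if the last char of pList is pDelim then
      add 1 to tCount
   end if
   return tCount
end countWithSeparator
```

Under these assumptions, `countWithSeparator(empty, comma)` gives 0 and `countWithSeparator(",", comma)` gives 2, whereas `the number of items of ","` gives 1 on a stock engine.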

Whether or not that's confusing is subjective, especially given the item list example above. My opinion (and this is just mine) is that it's actually less confusing to explicitly test "if tItem is empty" rather than relying on an implicit assumption about the repeat loop. And do notice that if you don't set the itemSeparator in your handler that processes the loop there's no change in current behavior.

LCMark wrote:
With this in mind the only other option is to allow choice; however I think this would actually make things worse - code written to two different delimiter standards will not be interoperable without code to bridge the gap.

Sorry - I don't understand this at all. If this were the case, my local copy of the engine/IDE combination would be falling flat on its face, no?
Both the itemDelimiter and itemSeparator have the scope of the local handler and revert to their default states afterwards.
Thus, even though I have written handlers that rely on counting items using an itemSeparator of comma and can tell the difference between empty and non-empty items by code like

Code:

if item x of tList is empty --

they have no problem coexisting.

Returning to the strings vs lists thing for a moment: are you implying that when we get real lists you intend to keep the current paradigm, i.e., the last item or line in a list can't be empty?

LCMark · Post by **LCMark** » Tue Aug 13, 2013 7:22 pm

@mwieder: Shall we do battle sir

mwieder wrote:
And having just tested this again, the number of items of "" is 0 in either case, so you're not losing that.
The difference in the two is the number of items in "," where using the itemDelimiter you get 1 and using the itemSeparator you get 2.
In either case you end up with at least one empty item.

Okay, so I perhaps misunderstood the itemSeparator proposal slightly - it is essentially equivalent to option (B) in one of my previous posts. If the itemSeparator is set to the itemDelimiter, then:

You can represent the empty list, but cannot represent the list of 1 empty item
Lists with a trailing delimiter have one more item than they would with the itemSeparator set to empty

The first of these represents quite a big logical discontinuity as far as I can see - what if you have a function that at some point might need to manipulate string lists of length 0, 1 and 2 which could contain empty elements (in the context of using the itemSeparator)?

As I said above, it seems to me that option (B) is worse than option (A) from the point of logical consistency... (Perhaps that is just the lapsed mathematician in me - or I've missed something obvious somewhere...)

mwieder wrote:
My opinion (and this is just mine) is that it's actually less confusing to explicitly test "if tItem is empty" rather than relying on an implicit assumption about the repeat loop.

I'm not sure I understand what you mean by 'an implicit assumption about the repeat loop' - could you elaborate?

mwieder wrote:
And do notice that if you don't set the itemSeparator in your handler that processes the loop there's no change in current behavior.

I realize that - the direct effect of using the itemSeparator goes away outside of scripts that use it... However, the indirect effects do not...

mwieder wrote:
Sorry - I don't understand this at all. If this were the case, my local copy of the engine/IDE combination would be falling flat on its face, no?

I think you misunderstand where I'm coming from... With an addition of syntax that only acts in a local handler context then it will 'coexist' with scripts that do not use it but only directly within that handler. The potential issue I perceive here is not with the direct effects that itemSeparator (or variant thereof) has on code executing within a handler (which you control), but on interpretation of values that such handlers take in and return to scripts that you do not control.

Let's say you write a library that uses the itemSeparator (and takes in or returns values that are string lists that could contain empty items). It is then necessary to declare this as part of your library's API - why? Because the number of items it sees in those lists will differ from the number seen by scripts that do not use the itemSeparator.

Take my example above - let's say someone writes a general handler for mapping lists. The person in question knows nothing of 'itemSeparator', so their handler is this:

Code:

function mapList pList, pFunc
  local tMappedList
  repeat for each item tItem in pList
    dispatch function pFunc to me with tItem
    put the result & comma after tMappedList
  end repeat
  if char -1 of pList is not comma then
    delete the last char of tMappedList
  end if
  return tMappedList
end mapList

Here this code is ignoring trailing delimiters so passing in "" will result in empty, "," will result in one item being returned and ",," will result in two.

Now, as part of a larger handler someone else uses the mapList primitive in the library:

Code:

function typeItem pValue
  if pValue is empty then
    return "empty"
  else if pValue is an integer then
    return "integer"
  else if pValue is a number then
    return "real"
  end if
  return "string"
end typeItem

command doSomethingWithLists
  set the itemSeparator to ","

  local tInputList
  -- lots of code that manipulates things and generates tInputList
  -- for the sake of argument, it turns out like this
  put "1,2.5,foo," into tInputList

  local tMappedInputList
  put mapList(tInputList, "typeItem") into tMappedInputList

  -- do things to tMappedInputList based on its length which is assumed to be 4
  if the number of items of tMappedInputList is not 4 then
    answer "something odd happened"
  end if
end doSomethingWithLists

In this case the code would fail because the interpretation of the number of items in doSomethingWithLists is different from that in mapList (mapList sees three items, doSomethingWithLists sees four). This is the interoperability problem I have been talking about. So, puzzled why their handler doesn't work, the author of doSomethingWithLists looks at the docs for mapList and notices there is nothing about itemSeparator - so it makes them think that maybe that's the issue. Sure enough they go and change their code to fix the problem:

Code:

   ...
   put comma after tInputList -- adjust input for a handler that doesn't use itemSeparator
   
   local tMappedInputList
   put mapList(tInputList, "typeItem") into tMappedInputList
   
   if the last char of tMappedInputList is comma then -- adjust output for a handler that doesn't use itemSeparator
     delete the last char of tMappedInputList
   end if
   ...

And find it now works.

This has required more code, and more thought to get to work, than it would if there was a universal and immutable definition of how many items are in lists.

Of course, if mapList() checked the context of the caller and took the value of itemSeparator then that would potentially mitigate the above issue (there's a part of me that says tertiary effects would still be possible - but I haven't come up with a cogent example yet) but that is then essentially imposing a tax on those writing scripts that they want to share (whether it be libraries, custom controls or whatever).

Essentially, as far as I can see, adding syntax to make it so that you can change the interpretation of the number of items of a string list would make everybody's lives harder given the huge benefit everyone gains from being able to share code easily.

Now, I realize that interoperability is more than just at the script level - certain things require policies and conventions that the engine could not enforce - however, I do feel that as a language, LiveCode shouldn't actively encourage (through provision of core syntax) something that makes it harder.

mwieder wrote:
Returning to the strings vs lists thing for a moment: are you implying that when we get real lists you intend to keep the current paradigm, i.e., the last item or line in a list can't be empty?

No - 'real lists' will essentially be equivalent to (and automatically convert between) one-dimensional numerically indexed arrays with lower index 1 and all keys up to the upper index - i.e. a vector. They'd be accessed with an 'element' chunk (e.g. element 1 of tList), and would be structured data types like arrays, so you can put lists in elements, arrays in elements and so forth. There would also be syntax to insert and manipulate lists. For example (note that none of this is final, just expression of the general idea):

Code:

push tValue onto back of tList
pop tValue from front of tList into tTopItemOfList
insert tValue after element 3 of tList

Additionally, we could consider automatic conversion to strings (perhaps with an 'elementDelimiter' local property), but in a strict mode it would be an error to convert a list that has no representation as a string (perhaps because one element is an array).

mwieder wrote:
i.e., the last item or line in a list can't be empty

Addendum: This point of view does propagate a fallacy in some ways: the last item or line in a list can be empty at the moment - if you are dealing with string lists that could contain empty items then they must always have the trailing delimiter, you can't leave it off. That's the point of the rules I was talking about above - it is the price you pay for being able to represent lists as strings with any number of empty elements from 0 upwards.
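The trailing-delimiter rule described above can be sketched with stock syntax. This is only an illustration under assumptions: `encodeItems` is a hypothetical helper name, and its input is assumed to be an array keyed 1 to n with no gaps.

```livecode
-- Hypothetical helper: encodes a numerically indexed array (keys 1..n)
-- as a comma-delimited string list, always writing the trailing delimiter
-- so that empty elements survive the round trip.
function encodeItems pElements
   local tList, tIndex, tCount
   -- the keys of a non-array value is empty, so tCount is 0 in that case
   put the number of lines of the keys of pElements into tCount
   repeat with tIndex = 1 to tCount
      put pElements[tIndex] & comma after tList
   end repeat
   return tList
end encodeItems
```

With this encoding, arrays of zero, one and two empty elements become "", "," and ",,", which the current itemDelimiter rules count as 0, 1 and 2 items respectively.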

EDIT: I made a mistake in the 'typeItem' handler above - corrected.
EDIT: I made a mistake in the 'mapList' handler above - corrected.

mwieder · Post by **mwieder** » Tue Aug 13, 2013 8:14 pm

@runrevmark-

Look, I follow what you're saying, but it doesn't work that way. I'm not just coming up with straw man arguments here, I'm testing this as we go along. The itemSeparator only has a *local* scope - it doesn't get passed on, so there are no secondary effects.

Code:

on mouseUp
    local tList
    local x

    set the itemSeparator to comma
    put "1,2,3," into tList -- no trailing space, last char is comma
    put the number of items in tList & cr after msg  -- returns 4
    put test(tList) into x
    put x after msg  -- returns 3 -- itemSeparator didn't get passed to function
end mouseUp

function test pList
    return the number of items in pList
end test

My changes are pushed into my number_of_items branch. I'm all ears for arguments as to why this is a Bad Idea, but so far your list of conflicts doesn't pass the real-world tests. It works as expected, and there are no repercussions that I can see. Note that Jacque's repeat loop example, which is a good example of why the current behavior shouldn't be changed, will also fail currently if changed to

Code:

put "1,2,,3,4" into tList
repeat with x = 1 to the number of items in tList
  put 100/ item x of tList  -- fail on the third item
end repeat

and so the same test for an empty item would need to be done to avoid the divide-by-zero error.
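A minimal sketch of that explicit test, using only the stock itemDelimiter (the default comma is assumed); `safeQuotients` is a hypothetical helper name, not part of any library mentioned in this thread:

```livecode
-- Hypothetical helper: returns 100 / item for each non-empty item of a
-- comma-delimited list, one result per line. Empty items are skipped.
function safeQuotients pList
   local tItem, tResult
   repeat for each item tItem in pList
      if tItem is empty then
         next repeat
      end if
      put 100 / tItem & cr after tResult
   end repeat
   delete the last char of tResult
   return tResult
end safeQuotients
```

Called with "1,2,,3,4" this returns four lines rather than failing on the third item. An item that is literally 0 would still raise an error; only empty items are guarded here.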

mwieder · Post by **mwieder** » Tue Aug 13, 2013 8:17 pm

LCMark wrote:
push tValue onto back of tList
pop tValue from front of tList into tTopItemOfList
insert tValue after element 3 of tList

Ah! So we can have quacks! (combo queue + stack, FIFO and LIFO in the same object)

[-hh] · Post by **[-hh]** » Wed Aug 14, 2013 12:35 am

..........

mwieder · Post by **mwieder** » Wed Aug 14, 2013 3:32 am

Well, it's an interesting discussion, but I want to point out (and not being a mathematician I may be on shaky ground here)...

Remark 1 seems rather more a corollary of Def 6b than a remark, i.e., it naturally falls out if Def 6b is accepted as true.

In Def 4, a DELIMITER in LiveCode is currently a string of 1 character only (or 0 if you want to consider the empty delimiter case).
Thus, Example 3 for Proposition 1 as written is invalid, since it deals with a two-character delimiter string.

Def 5a describes the behavior of the proposed itemSeparator. In order to describe the current itemDelimiter behavior, it becomes:
== Def 5a. If D is a non-empty delimiter of S, then a (contiguous) substring T of S is called a 'TERM of S with respect to the delimiter D', if (D is not in T) and ( (T is S) or (T is in S directly following an occurrence of D) or ((T is in S directly followed by an occurrence of D) and (T is not the last character in S)) ).

Otherwise, thank you. This is a well-thought-out syntax description. Of course, natural language may or may not follow logical syntax, and a cognitive approach to the syntax may not conform to mathematical proof.

[-hh] · Post by **[-hh]** » Wed Aug 14, 2013 12:44 pm

..........

mwieder · Post by **mwieder** » Wed Aug 14, 2013 5:21 pm

[-hh] wrote:
[This must be a typo 'T is not last char of S', you mean 'S ends not with (T&D)'?]

Yes, thanks for catching that.

[-hh] wrote:
Why not? Let's have both chunk definitions, a new/different one and don't change the old one. This saves the life of billions of (hopefully) functional scripts ...

LOL.

jacque · Post by **jacque** » Wed Aug 14, 2013 6:49 pm

mwieder wrote:@runrevmark-

Look, I follow what you're saying, but it doesn't work that way. I'm not just coming up with straw man arguments here, I'm testing this as we go along. The itemSeparator only has a *local* scope - it doesn't get passed on, so there are no secondary effects.

Let's say you distribute a library that does some math on a list I send it:

Code:

function incrementAndAverage pList
  set the itemSeparator to comma
  repeat with x = 1 to the number of items in pList
    add 10 to item x of pList
  end repeat
  return average(pList)
end incrementAndAverage

and I call it from my script:

Code:

on mouseUp
  put "1,2,3,4," into tList
  get incrementAndAverage(tList)
  put it
end mouseUp

This is what runrevmark means by secondary effects. The result is that either the stack author has to figure out what's wrong and change their script, or your script has to figure out what system the list is using and branch. Either way, it makes things harder for one side or the other. In this case, I'd say it should be the library script that needs to adjust.

mwieder · Post by **mwieder** » Wed Aug 14, 2013 7:30 pm

@Jacque- fair point, but I think that's a matter of expectations.
What should happen if you pass an empty item in your script?
What should happen if the library routine receives an empty item?
Should the library routine average or ignore empty values, i.e., do they count as missing data or as zeros?

This is no different from the situation today if your code reads

Code:

    function incrementAndAverage pList
      set the itemDelimiter to comma
      repeat with x = 1 to the number of items in pList
        add 10 to item x of pList
      end repeat
      return average(pList)
    end incrementAndAverage

    on mouseUp
      put "1,2,3,,4" into tList
      get incrementAndAverage(tList)
      put it
    end mouseUp

You are going to end up with different results depending on your expectations. I would agree that the library routine should be written to handle empty values because empty values happen and have always happened. But what the library should do with the empty values is up to the developer of the library and the library api should be documented to remove the ambiguities.
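The two expectations can be made explicit in the handler's signature. The sketch below is illustrative only: `averageOfItems` and its `pSkipEmpty` flag are hypothetical names, and it uses the stock itemDelimiter (default comma), so a single trailing delimiter is ignored in both modes.

```livecode
-- Hypothetical helper: averages a comma-delimited list of numbers.
-- pSkipEmpty true  -> empty items are treated as missing data and ignored
-- pSkipEmpty false -> empty items are treated as zero
function averageOfItems pList, pSkipEmpty
   local tItem, tSum, tCount
   put 0 into tSum
   put 0 into tCount
   repeat for each item tItem in pList
      if tItem is empty then
         if pSkipEmpty then
            next repeat
         end if
         -- counted as a zero: contributes nothing to the sum
         add 1 to tCount
      else
         add tItem to tSum
         add 1 to tCount
      end if
   end repeat
   if tCount is 0 then
      return empty
   end if
   return tSum / tCount
end averageOfItems
```

For "1,2,3,,4" the sum is 10 in both modes; with `pSkipEmpty` true it is divided by 4 items, and with `pSkipEmpty` false by 5, which is exactly the ambiguity the API documentation would need to settle.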

LCMark · Post by **LCMark** » Wed Aug 14, 2013 8:37 pm

@mwieder: Right, so it's about documenting what the handler does so it can be called correctly. So, if the incrementAndAverage handler Jacque defines above came with documentation then it would probably read something along the lines of:

Code:

  -- Adds 10 to every item in the list and takes the average.
  -- (empty items are treated as zero).
  function incrementAndAverage pList

So, if someone else then uses the handler as Jacque does, they get a result which they weren't expecting (because incrementAndAverage() interprets the input list as having 5 items, and not 4 which is what the calling handler sees). Thus, this means the documentation of all handlers has to start to declare how lists are interpreted:

Code:

  -- Adds 10 to every item in the list and takes the average.
  -- (empty items are treated as zero)
  -- (the trailing delimiter is not ignored)
  function incrementAndAverage pList

Thus the code in the mouseUp handler has to be adjusted to make it work:

Code:

on mouseUp
  put "1,2,3,4," into tList
  if the last char of tList is the itemDelimiter then -- adjust because calling a handler that doesn't ignore the trailing delimiter
    delete the last char of tList
  end if
  get incrementAndAverage(tList)
  put it
end mouseUp

This adds another thing that a developer has to remember and be aware of - i.e. that if they are using other people's code which is manipulating lists then their handlers might interpret the number of items in a string differently from others.

The thing I am currently finding it difficult to see is why not ignoring the trailing delimiter is actually a better situation than we have now* (some examples would help here). i.e. where using the itemSeparator == itemDelimiter produces better (e.g. cleaner, more readable, more succinct) code than not using it. (Particularly as by using itemSeparator you can actually represent less than you could before - you can't represent the list of one empty item).

* Btw, by 'better situation than we have now', I really mean - the current situation assuming that all places in the engine and externals that manipulate lists follow the two rules I proposed as being necessary to ensure correct functionality with the present semantic of ignoring a final delimiter (I've not had a chance yet to check where any bugs in that regard might be lurking).

mwieder · Post by **mwieder** » Wed Aug 14, 2013 9:15 pm

OK - good point about needing to know that the trailing delimiter is/isn't ignored.

Code:

  if the last char of tList is the itemDelimiter then
    delete the last char of tList
  end if

I do this as a matter of course anyway, but I realize that other developers may have different styles.

LCMark wrote:
The thing I am currently finding it difficult to see is why not ignoring the trailing delimiter is actually a better situation than we have now* (some examples would help here). i.e. where using the itemSeparator == itemDelimiter produces better (e.g. cleaner, more readable, more succinct) code than not using it. (Particularly as by using itemSeparator you can actually represent less than you could before - you can't represent the list of one empty item).

I don't suggest that it's a "better" solution (well, yes, I do, but only in hindsight if I could go back 26 years and have itemDelimiter work differently). I do suggest that it's an alternative form that fits some solutions better, has more cognitive resonance, and doesn't have to be rehashed on the use-list once a year. I actually see itemDelimiter as the edge case and itemSeparator as the more common usage and would encourage this to be the case for future scripting, relegating itemDelimiter to those instances where you really do want to ignore trailing empty items.

Here's a (somewhat contrived) example... let's say I have a database-agnostic function that returns a list of field headers.

Code:

function dbQuery pDbSQL
  put revDBDataFromQuery(dbID, pDbSQL) into tList
  repeat for each item tField in tList
    if tField is empty then
      put "empty" into tField
    end if
    put tField & tab after tHeaders
  end repeat
  delete the last char of tHeaders
  return tHeaders
end dbQuery

Assuming that I'm returned a list "h1,h2,,h4,h5," from the database, I will miss the fact that the last (empty) item in the list should be returned.
If I set the itemSeparator then I get the last item.

LCMark · Post by **LCMark** » Wed Aug 14, 2013 9:38 pm

@mwieder:

mwieder wrote:
I actually see itemDelimiter as the edge case and itemSeparator as the more common usage and would encourage this to be the case for future scripting, relegating itemDelimiter to those instances where you really do want to ignore trailing empty items.

Even though you can't represent the list of one empty item with itemSeparator as posed? (I know I'm pressing this point quite strongly, but it really does seem quite important - it introduces a rather unpleasant discontinuity, to my eyes at least).

mwieder wrote:
Here's a (somewhat contrived) example... let's say I have a database-agnostic function that returns a list of field headers.

Right, so you've given a query to the database in which you are expecting 6 fields (i.e. SELECT has 6 columns named) and it's returned "h1,h2,,h4,h5," - if it does this then revDBDataFromQuery is buggy. That function is failing to respect the rule about a trailing delimiter being required if the list could contain empty items - the revDBDataFromQuery function *should* be returning "h1,h2,,h4,h5,,". If it does return what it should be doing then the handler works correctly - you get "h1,h2,empty,h4,h5,empty".

So, as far as I can see, in this instance the reason you'd use the itemSeparator is to work around a bug in the revDB external. Thus doesn't this just demonstrate something that needs to be fixed there, rather than a justification for adding itemSeparator?

jacque · Post by **jacque** » Wed Aug 14, 2013 9:55 pm

LCMark wrote:
So, as far as I can see, in this instance the reason you'd use the itemSeparator is to work around a bug in the revDB external. Thus doesn't this just demonstrate something that needs to be fixed there, rather than a justification for adding itemSeparator?

It was, in fact, a problem with databases that caused the issue to come up again this time on the list. Maybe we could squelch this thing for good by fixing those bugs.

mwieder · Post by **mwieder** » Wed Aug 14, 2013 10:03 pm

@Jacque, runrevmark-

Jacque, as you pointed out earlier

jacque wrote:
In my current project I get data back from a server that includes a trailing delimiter. My script needs to branch depending on the number of items in the string.

So it's not just a thing that can be fixed in the db functions. Data can come from anywhere.

and runrevmark, I *did* say it was a contrived example. Ignore the internal functionality of revdbdatafromquery and substitute foo().

LiveCode Forums
