Sort multidimensional array


bogs
Posts: 5435
Joined: Sat Feb 25, 2017 10:45 pm

Re: Sort multidimensional array

Post by bogs » Wed Mar 03, 2021 11:48 am

Thierry wrote:
Wed Mar 03, 2021 11:43 am
bogs wrote:
Wed Mar 03, 2021 11:25 am
I'm beginning to think that Craig is right, "DO" is the answer to everything ;)
DO NOTHING is the answer to everything ;)
I DO NOTHING way too much already :D

jacque
VIP Livecode Opensource Backer
Posts: 7237
Joined: Sat Apr 08, 2006 8:31 pm
Location: Minneapolis MN
Contact:

Re: Sort multidimensional array

Post by jacque » Wed Mar 03, 2021 7:05 pm

the sort command IS NOT expecting a string
I meant the merge function expects a string. The dictionary shows examples in quotes.

But you're right that "do" is concise and workable in this case.
Jacqueline Landman Gay | jacque at hyperactivesw dot com
HyperActive Software | http://www.hyperactivesw.com

stam
Posts: 2682
Joined: Sun Jun 04, 2006 9:39 pm
Location: London, UK

Re: Sort multidimensional array

Post by stam » Thu Mar 04, 2021 4:02 pm

Further to my previous conversation with Thierry, i implemented his newer sortIndex method and compared it with sortArray. I don't think there is a significant benefit in using the sortIndex approach.

For smaller arrays it makes no difference - both methods completed in 0 ms.
I tested with a much larger array (> 20,000 records, corresponding to a 33 MB TSV file). SortArray took about 200 ms to complete. SortIndex performed slightly better, but only about 60 ms faster (140 ms on average) -- both really fast, so I suspect it would take a much more massive array to make a palpable difference. Plus sortIndex doesn't actually return a sorted array, which is frequently needed.

However in my testing i came across a new problem: It sorts fine by text and numerically but sorts incorrectly on dateTime -- i can't seem to get this to work as expected at all. Since dates tested were in UK format (dd/mm/yyyy) i set the useSystemDate to true but that made no difference. Neither did ensuring all dates are after 1970, as i believe that may sometimes be an issue.

To highlight the issue I've put both methods in a stack with some sample data, and you can set the parameters visually for ease. In the stack, sortArray refers to the 'do merge()' method above; sortArrayOld refers to the same algorithm without do merge(), using conditionals; and sortIndex refers to Thierry's method.

Sorting is perfect on text/numeric, but incorrect on dateTime.
The sorting algorithms are already posted above so i've not repeated them here - they're in the stack script of the attached stack...

Grateful for help on date sorting...
Stam


-------- edited to include a screenshot actually showing the dateTime sort issue ----------------
Attachments
dateTime sorting issue.jpg
array sortring test.livecode.zip
Last edited by stam on Fri Mar 05, 2021 12:51 am, edited 1 time in total.

stam
Posts: 2682
Joined: Sun Jun 04, 2006 9:39 pm
Location: London, UK

Re: Sort multidimensional array

Post by stam » Fri Mar 05, 2021 12:45 am

I should mention that this doesn't seem to be an issue with the algorithms/handlers above.

Even when using the original post by MaxV here (specific dateTime sort for multidimensional arrays) the same issue occurs - most of the dates are sorted correctly but there is at least one that is in the wrong position.

I'm thinking this may possibly be a bug?

FourthWorld
VIP Livecode Opensource Backer
Posts: 9837
Joined: Sat Apr 08, 2006 7:05 am
Location: Los Angeles
Contact:

Re: Sort multidimensional array

Post by FourthWorld » Fri Mar 05, 2021 1:09 am

Associative arrays have no sense of order by their nature.

The most common way to access them in a specific sequence is to obtain and sort the keys.

What is the goal here?
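
A minimal sketch of that keys-first approach. The handler name and the three sample values are made up for illustration and are not from the thread: the array is filled out of order, its keys are fetched and sorted, and the elements are then read back in key order.

```livecode
-- Hypothetical demo handler: read an array in sorted-key order.
on showInKeyOrder
   local tArray, tKeys, tKey, tOut
   
   -- Fill the array in no particular order.
   put "pear" into tArray[3]
   put "apple" into tArray[1]
   put "fig" into tArray[2]
   
   -- The array itself has no order, so sort a plain list of its keys.
   put the keys of tArray into tKeys
   sort lines of tKeys ascending numeric
   
   -- Walk the sorted key list and look each element up by key.
   repeat for each line tKey in tKeys
      put tKey & ":" && tArray[tKey] & return after tOut
   end repeat
   
   put tOut -- with no target, this goes to the message box
end showInKeyOrder
```

The array is never reordered here; only the list of keys is, which is the point being made above.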
Richard Gaskin
LiveCode development, training, and consulting services: Fourth World Systems
LiveCode Group on Facebook
LiveCode Group on LinkedIn

stam
Posts: 2682
Joined: Sun Jun 04, 2006 9:39 pm
Location: London, UK

Re: Sort multidimensional array

Post by stam » Fri Mar 05, 2021 1:22 am

FourthWorld wrote:
Fri Mar 05, 2021 1:09 am
Associative arrays have no sense of order by their nature.

The most common way to access them in a specific sequence is to obtain and sort the keys.

What is the goal here?
Hi Richard -- i realise this is a loooooong thread to read through, so for convenience please see the handlers below.

My handler (returns a sorted array, either 1-dimensional or multidimensional):

Code: Select all

function sortArray @pArray, pDirection, pSortType, pKey
   local tNextIndex, tSortedArray, tSortText
   
   get the keys of pArray
   put "sort lines of it [[pDirection]] [[pSortType]] by pArray[each]" into tSortText //1-dimensional array
   if pKey is not empty then put "[pKey]" after tSortText //multidimensional array
   do merge(tSortText)
   repeat for each line tKey in it
      add 1 to tNextIndex
      put pArray[tKey] into tSortedArray[tNextIndex]
   end repeat
   return tSortedArray
end sortArray
Thierry's handler that returns a sorted index to reference the elements of the original array in a new sort order:

Code: Select all

function sortIndex @pArray, pDirection, pSortType, pKey
   local tNextIndex, tSortText, tSortedIndex
   get the keys of pArray
   put "sort lines of IT [[pDirection]] [[pSortType]] by pArray[each]" into tSortText //1-dimensional array
   if pKey is not empty then put "[pKey]" after tSortText //multidimensional array
   do merge(tSortText)
   repeat for each line tKey in IT
      add 1 to tNextIndex
      put tKey into tSortedIndex[tNextIndex] //store the key only, not the element
   end repeat
   return tSortedIndex
end sortIndex
both work fine when passing numeric or text as the sort type, and with very large arrays (> 20,000 rows) there is a slight speed advantage with Thierry's approach.

However: when passing dateTime as a parameter (or even using a simplified version of this, hard-coded for dateTime sorting), i get the wrong sequence back -- please see the screenshot in my post 2 entries above.

I can't explain this :-/

FourthWorld
VIP Livecode Opensource Backer
Posts: 9837
Joined: Sat Apr 08, 2006 7:05 am
Location: Los Angeles
Contact:

Re: Sort multidimensional array

Post by FourthWorld » Fri Mar 05, 2021 5:32 am

I took the liberty of modifying the stack script from the example you posted earlier, with two goals: run multiple iterations, and view the keys directly to discern the array's true order.

Most isolated tasks like sorting are pretty fast in LC, often much faster than normal variance attributable to unrelated factors (CPU load from other processes, etc.). Here I set the number of iterations to 1000, which is where I usually start benchmarking non-GUI tasks.

At the end I display first the elapsed time, then the keys obtained directly from the array.

Here's what I get on my intentionally-aging system (custom Linux box I built some years ago and promised I wouldn't upgrade until I complete a performance-critical project; most of you will see shorter times):

Code: Select all

sortArray: 36 ms for 1000 iterations
8
6
2
3
5
1
4
7

Code: Select all

sortIndex: 36 ms for 1000 iterations
8
6
2
3
5
1
4
7
Unless I misunderstood something or just plain screwed up (it's the end of the day, yeah, that'll be my excuse), it would appear both routines take about the same amount of time and ultimately produce the same array, all the way down to the element order as expressed in the keys.

This is what I would expect, given that LC's associative arrays are the common hash type, and have no internal sense of order.

The key to all this (all puns intended) is considering how we try to understand array contents.

The dataGrid will sometimes sort by default, if the keys lend themselves to that. That may explain why the display of the array appeared in order, except for the non-numeric key.

Using the tree widget is my favorite way to view arrays (so simple to set up, and handles n-depth), but IIRC it always sorts keys for display, giving us only a choice of how to sort it, but not over whether it's sorted at all.

So to view the array order as closely to how the engine sees it as we can from the scripting interface, the keys are the most raw, unaltered representation.

Thierry's been doing this a while. When he recommends focusing on keys and not worrying about attempting to re-order the array, that's guidance worth heeding.

If you review the Wikipedia link I included above that'll help make it clear how associative arrays are implemented with no sense of order.

The implication of all this is that attempting to order the array internally doesn't matter anyway. What matters is how we display the array in our interfaces. DataGrid and tree widget will sort for you, and you have a wide variety of ways to sort keys however you like in your own scripts. Once you have the keys in the order you want, the same hashing mechanism that keeps the array's in-memory representation from having any sense of order also makes it super-fast to access, so you'll usually find iterating over an ordered list of keys quite satisfying performance-wise.

If you find yourself doing a lot of benchmarking, I have some tips on that here:
http://livecodejournal.com/tutorials/be ... vtalk.html

Here's the modified stack script:

Code: Select all

constant kIterations = 1000

global gDirection, gType, gMethod

on preOpenCard
   set the hilited of button "text" to true
   set the hilited of button "ascending" to true
   set the hilited of button "sortArray" to false
   set the hilited of button "sortArrayOld" to false
   set the hilited of button "sortIndex" to false
end preOpenCard


command doSort pMethod
   local tArray, t0, tTime, tType, tDirection, tSource, tIndexA, x
   
   put the milliseconds into t0
   put empty into field "sortedArray"
   set the useSystemDate to the hilited of button "useSystemDate"
   put the dgData of group "data" into tSource
   
   repeat kIterations
      
      switch pMethod
         case "sortArray"
            put sortArray(tSource, gDirection, gType, "data") into tArray
            # put "5th sorted element: " & tArray[5]["data"] & ", elapsed: " & the milliseconds - t0 && "ms" into field "metrics"
            break
            
         case "sortArrayOld"
            put sortArrayOld(tSource, gDirection, gType, "data") into tArray
            # put "5th sorted element: " & tArray[5]["data"] & ", elapsed: " & the milliseconds - t0 && "ms" into field "metrics"
            break
            
         case "sortIndex"
            put sortIndex(tSource, gDirection, gType, "data") into tIndexA
            # put "5th sorted element: " & tSource[tIndexA[5]]["data"] & ", elapsed: " & the milliseconds - t0 && "ms" into field "metrics"
            --            repeat for each element tElement in tIndexA
            --               add 1 to x
            --               put tSource[tIndexA[x]] into tArray[x]
            --            end repeat
            break
      end switch
      
   end repeat --iterations
   
   put the millisecs - t0 into tTime
   
   if pMethod = "sortIndex" then
      repeat for each element tElement in tIndexA
         add 1 to x
         put tSource[tIndexA[x]] into tArray[x]
      end repeat
   end if
   put pMethod&": "& tTime & " ms for "& kIterations&" iterations" &cr& the keys of tArray
   
   set the dgData of group "dataSorted" to tArray
   set the dgHilitedLines of group "dataSorted" to 5
   repeat for each element tElement in tArray
      put tElement["data"] & return after field "sortedArray"
   end repeat
end doSort

stam
Posts: 2682
Joined: Sun Jun 04, 2006 9:39 pm
Location: London, UK

Re: Sort multidimensional array

Post by stam » Fri Mar 05, 2021 9:19 am

Thanks Richard -- while i completely agree, that also misses the point of my question a bit (my fault - too much text!)

The question is about dateTime sorting -- this does not produce a correct result, only partially sorting the dates. The stack i posted was just a sampled example to illustrate this, rather than benchmark.


With respect to benchmarking -- i didn't post my benchmarking stack... i tested this by loading a 33 MB TSV file with > 20,000 records, as smaller arrays don't show a difference. I suspect the sortIndex approach will be very helpful if dealing with a large array of BLOBs for example, but in most 'normal' use cases there isn't much of a difference and it's sometimes simpler to just return a 'normal' array, but this will depend on the situation.
FourthWorld wrote:
Fri Mar 05, 2021 5:32 am
Thierry's been doing this a while. When he recommends focusing on keys and not worrying about attempting to re-order the array, that's guidance worth heeding.
Absolutely! And i'm grateful that both you and Thierry have taken time to respond to this :)

Thierry
VIP Livecode Opensource Backer
Posts: 875
Joined: Wed Nov 22, 2006 3:42 pm

Re: Sort multidimensional array

Post by Thierry » Fri Mar 05, 2021 6:07 pm

stam wrote:
Fri Mar 05, 2021 9:19 am
I suspect the sortIndex approach will be very helpful if dealing with a large array of BLOBs for example,
but in most 'normal' use cases there isn't much of a difference
True, absolutely true !

Here is a benchmark for an array of 4609 entries,
full size of the json input data > 3 Mbytes.

The first number at top left is the average time per iteration,
followed by the time for each iteration.

22.1 against 19.5!
So a gain of more than 10% in favor of sortIndex.

screenshot 2021-03-05 à 17.56.39.jpg
benchmark
Have a nice week-end,

Thierry
!
SUNNY-TDZ.COM doesn't belong to me since 2021.
To contact me, use the Private messages. Merci.
!

FourthWorld
VIP Livecode Opensource Backer
Posts: 9837
Joined: Sat Apr 08, 2006 7:05 am
Location: Los Angeles
Contact:

Re: Sort multidimensional array

Post by FourthWorld » Fri Mar 05, 2021 6:28 pm

stam wrote:
Fri Mar 05, 2021 9:19 am
Thanks Richard -- while i completely agree, that also misses the point of my question a bit
Did it?

Let's see:
The question is about dateTime sorting -- this does not produce a correct result, only partially sorting the dates.
I took the time to review array fundamentals, with particular emphasis on arrays having no sense of order and not even being viewable, in hopes of raising awareness that whatever we use to display an array will also be the thing that gives it the appearance of order.

While this thread has grown in many interesting ways, the simplest answer was in the first post the whole time: option 2 there uses the DataGrid's SortDataByKey command. The user simply had a syntax error; had he included his code there this thread might be only two posts, his question and one answer.

When the DG is the thing we're using to display the data, we can see if its API anticipates our display needs, such as flexible sorting options, and we find that it does. This is the link included in the first post in this thread, to a DG Lesson on this specific subject:
https://lessons.livecode.com/m/datagrid ... y-s-values

While it's possible to replicate the DG's code to create and manage the outer index level of the array ourselves, some edge cases may remind us why using the built-in options for this can be useful beyond saving time. Sorting by date is one of them.

The useSystemDate is a local property, so while it's useful for handling non-US dates in Thierry's method which orders the index, it has no effect on anything outside the handler it's used in, such as a separate custom sorting function. For that we have the "international" sort option.

You could modify your sample stack to support that, but even easier would be to just let the DG handle it with its SortDataByKey command. Trevor does thorough work, and the completeness of the DG's API satisfies this and a great many other needs robustly and efficiently.
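
A rough sketch of what that call might look like. Every name here is an assumption: the group name "dataSorted" and the row key "data" are borrowed from the sample stack, the handler name is made up, and the argument order (key, sort type, direction, case sensitivity) follows my reading of the DG lesson linked above. Whether "dateTime" is the right sort type for dd/mm/yyyy values is not settled in this thread.

```livecode
-- Hypothetical handler: ask the data grid to sort its own data by one key.
on sortGridByDate
   -- Arguments as assumed from the DG lesson:
   -- key to sort on, sort type, direction, case sensitivity.
   dispatch "SortDataByKey" to group "dataSorted" with "data", "dateTime", "ascending", "false"
end sortGridByDate
```

The appeal is that the DG maintains its own outer index, so there is nothing to rebuild by hand after the sort.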

While it never hurts to learn things, and this thread has grown into a treasure chest of useful tips, in production it's often best to build the vehicles of our dreams on wheels already written and debugged where we can, rather than reinvent each wheel from scratch ourselves.

When we remain aware that arrays have no sense of order and cannot be displayed, our attention is nudged to focus on the method we use to display the array's elements. And when the display method is a DataGrid, we'll usually be quite happy with the range of its functionality and the thoroughness of its implementation.

Bonus tip for DGs: using dgText rather than dgData is in many cases much simpler to work with, bypassing the need to create and order the outer-level index array, with the extra benefit of being a form that can be easily viewed in any field if we find that useful during debugging. And delimited chunks are also easily sorted. ;)
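
A small sketch of that dgText route, under stated assumptions: the group is named "data" as in the sample stack, the handler name is made up, and the second column is assumed to hold numbers. dgText is tab-delimited with one row per line, so ordinary chunk sorting applies.

```livecode
-- Hypothetical handler: sort a data grid's rows as delimited text.
on sortGridText
   local tText
   
   -- One row per line, columns separated by tabs.
   put the dgText of group "data" into tText
   
   -- Sort the rows by their second column (assumed numeric here).
   set the itemDelimiter to tab
   sort lines of tText ascending numeric by item 2 of each
   
   -- Hand the reordered text back to the grid.
   set the dgText of group "data" to tText
end sortGridText
```

Because tText is plain text, it can also be dropped into any field to inspect it while debugging.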

Klaus
Posts: 13829
Joined: Sat Apr 08, 2006 8:41 am
Location: Germany
Contact:

Re: Sort multidimensional array

Post by Klaus » Fri Mar 05, 2021 6:36 pm

A little, but not too much "offtopic"...

Unfortunately "system date" etc. is not supported on ANDROID!
No idea why this seems so difficult to implement.
In any case we cannot sort a datagrid -> System Datetime

And I have to explain this embarrassing fact to users of the german LC forum at least once a week!
I already reported this back in 2014: https://quality.livecode.com/show_bug.cgi?id=11726

FourthWorld
VIP Livecode Opensource Backer
Posts: 9837
Joined: Sat Apr 08, 2006 7:05 am
Location: Los Angeles
Contact:

Re: Sort multidimensional array

Post by FourthWorld » Fri Mar 05, 2021 6:52 pm

Klaus wrote:
Fri Mar 05, 2021 6:36 pm
A little, but not too much "offtopic"...

Unfortunately "system date" etc. is not supported on ANDROID!
No idea why this seems so difficult to implement.
In any case we cannot sort a datagrid -> System Datetime

And I have to explain this embarrassing fact to users of the german LC forum at least once a week!
I already reported this back in 2014: https://quality.livecode.com/show_bug.cgi?id=11726
Do you know offhand if that also prevents the "international" sort option from working?

Curious that it's taken so long. We can guess it's not because Google has no support for internationalization; might be some difficulty translating the Java-based APIs to LC's C interface.

I suspect this is one of the areas where the company's work on LCFM will come home to LC: Filemaker has excellent internationalization support, and without this in place so much of LCFM's value will be unrealizable. Just a guess, but I'd wager that gets fixed within the next LC release or two.

Klaus
Posts: 13829
Joined: Sat Apr 08, 2006 8:41 am
Location: Germany
Contact:

Re: Sort multidimensional array

Post by Klaus » Fri Mar 05, 2021 7:02 pm

Do you know offhand if that also prevents the "international" sort option from working?
No, sorry, I created two datagrids with one column and entered some text with and without umlauts.
Sort type in DG 1 = INTERNATIONAL
Sort type in DG 2 = TEXT
and both gave me the same result, so I'm not sure what INTERNATIONAL is actually supposed to do.
Curious that it's taken so long. We can guess it's not because Google has no support for internationalization; might be some difficulty translating the Java-based APIs to LC's C interface.
I suspect this is one of the areas where the company's work on LCFM will come home to LC: Filemaker has excellent internationalization support, and without this in place so much of LCFM's value will be unrealizable. Just a guess, but I'd wager that gets fixed within the next LC release or two.
Hm, after more than seven years I won't hold my breath, Richard! 8)

FourthWorld
VIP Livecode Opensource Backer
Posts: 9837
Joined: Sat Apr 08, 2006 7:05 am
Location: Los Angeles
Contact:

Re: Sort multidimensional array

Post by FourthWorld » Fri Mar 05, 2021 7:07 pm

Klaus wrote:
Fri Mar 05, 2021 7:02 pm
Hm, after more than seven years I won't hold my breath, Richard! 8)
Often true with many things, but seven years ago the company wasn't heavily invested in delivering a premier Android user experience to the large Filemaker audience accustomed to taking internationalization for granted.

stam
Posts: 2682
Joined: Sun Jun 04, 2006 9:39 pm
Location: London, UK

Re: Sort multidimensional array

Post by stam » Fri Mar 05, 2021 7:26 pm

Hi Richard,
FourthWorld wrote:
Fri Mar 05, 2021 6:28 pm
stam wrote:
Fri Mar 05, 2021 9:19 am
Thanks Richard -- while i completely agree, that also misses the point of my question a bit
Did it?

Let's see:
Alas, that was not my question.
I fear the speaking of 'arrays' has clouded the issue. What doesn't work is sorting a textual list of dates.

The issue is that

Code: Select all

sort [{lines | items} of] <container> [<direction>] [<sortType>] [by <sortKey> ]
seems to fail on sortType of dateTime

In a simple field, entering a list of dates and using the command

Code: Select all

sort lines of field "field 1" ascending dateTime 
fails to sort correctly. Before and after sorting ascending:

Before          After
01/01/2021      16/04/1977
03/08/2017      25/09/1999
07/11/2011      15/04/1988
16/04/1977      07/11/2011
25/09/1999      03/08/2017
15/04/1988      01/01/2021

As you can see, this just doesn't work -- nothing to do with 'arrays'; but this is what's stopping me from re-ordering arrays by date.

I understand the arrays in LC are dictionaries with no inherent sense of order; but neither does LC provide an ordered structure, so I'm just trying to impose order on chaos ;)
I'm fully in love with the data grid and appreciate its functions - sort does work for dateTime there. And while the data grid is pretty amazing, there is a speed penalty (i noticed this when tinkering with a bullseye for display of cardiac function - when associating with a data grid to store/display numeric values there was a visibly long delay in updating all the segments of the bullseye, but switching to a simple table field made this near-instantaneous).

Consider scenarios where sorting (not just displaying) may be important:
I may want to process an ordered list until I reach a certain endpoint - this could be a date for example. Or perhaps a comparison of 2 arrays may be needed.
Without an ordered list, I'd have to process the entire dictionary (or 'associative array') searching for any date earlier than the endpoint.
Ordering the list in the desired order makes that quicker/easier.

But while this approach works just fine for numeric and text sort types, it fails for dateTime type because the sorting of lines in a list (not the array) fails.

So i'll have to disagree, it didn't address my question - which is why sorting a container specifically by dateTime fails, when it works fine for numeric/text, and if this is a bug?
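
One observation on the before/after list above, offered as a possible explanation and not confirmed anywhere in this thread: the three dates that end up first (16/04/1977, 25/09/1999, 15/04/1988) are exactly the ones whose first number is above 12, and they keep their original relative order, while the remaining three are in correct order by year. That is consistent with the values being read month-first, so that those three are not recognised as dates at all. A workaround that avoids date parsing altogether is to sort numerically on a derived yyyymmdd key. The sketch below assumes every line is dd/mm/yyyy; the function and handler names are hypothetical, and the field name is taken from the command above.

```livecode
-- Hypothetical helper: turn "dd/mm/yyyy" into the number yyyymmdd.
function ukDateKey pDate
   set the itemDelimiter to "/"
   -- Anything that is not three slash-separated items gets key 0 and sorts first.
   if the number of items of pDate is not 3 then return 0
   -- Year first, then zero-padded month and day, e.g. 16/04/1977 -> 19770416.
   return item 3 of pDate & format("%02d%02d", item 2 of pDate, item 1 of pDate)
end ukDateKey

-- Hypothetical handler: sort the field's lines by the derived key.
on sortUkDates
   local tDates
   put field "field 1" into tDates
   sort lines of tDates ascending numeric by ukDateKey(each)
   put tDates into field "field 1"
end sortUkDates
```

The same "by ukDateKey(...)" expression could be used when sorting array keys, since the sort key only has to be an expression evaluated for each line.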
