Arrays vs Lists ?

Got a LiveCode personal license? Are you a beginner, hobbyist or educator that's new to LiveCode? This forum is the place to go for help getting started. Welcome!

Moderators: FourthWorld, heatherlaine, Klaus, kevinmiller

Arrays vs Lists ?

Postby Hans-Helmut » Sun Mar 19, 2017 8:12 pm

Currently, I am working on a little demo app trying to figure out the advantages or disadvantages of uses lists or array. WIth "lists" I mean simple structures blocks of text written to memory and accisable using simple chunk expressions ("item x of line z of myVariable"). Here I use a return-tab delimited list for storing text data.

I assume, in 90% of all cases, we are just working with text or numbers which are text but look like text.

I tried to make a test to compare small and increasingly larger arrays/lists
I am working with LiveCode latest version 9.0 (dp 6) on Windows 10, 64 bit.

Here is the testing script:

Code: Select all
global gTableArray, gListArray
local sTestResult

on mouseUp
   lock screen
   set cursor to busy
   testArrayVizList
end mouseUp

command testArrayVizList
   put empty into msg
   put empty into sTestResult
   set the itemdelimiter to tab
   put "50,500,5000,50000,500000,1000000,2000000" into tTestIterations
   put replaceText (tTestIterations , "," , tab ) into tTestIterations
   repeat with i = 1 to the number of items of tTestIterations
      put item i of tTestIterations into tIteration
      put tIteration &tab after sTestResult
      put buildArray ( tIteration ) &tab after sTestResult
      put buildList ( tIteration ) after sTestResult
      put cr after sTestResult
   end repeat
   delete last char of sTestResult
   put sTestResult into msg
end testArrayVizList

function buildList pIteration
   put the milliseconds into startSec
   put empty into gListArray
   put "A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z" into tCol
   put the number of items of tCol*1 into tNum
   put 0+0 into i /* Init local variable i */
   repeat pIteration
      add 1 to i
      put item  random ( tNum ) of tCol into tChar
      put i &","& tChar &CR after gListArray
   end repeat
   delete last char of gListArray
   return ( the milliseconds - startSec ) / 1000
end buildList

function buildArray pIteration
   put the milliseconds into startSec
   put empty into gTableArray
   put "A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z" into tCol
   put the number of items of tCol*1 into tNum
   put 0+0 into i
   repeat  pIteration
      add 1 to i
      put item  random ( tNum ) of tCol into tChar
      put tChar into gTableArray [ i ] [ "col" ]
   end repeat
   return ( the milliseconds - startSec ) / 1000
end buildArray


I used globals for testing reasons since I can access the global variables or the global array from anywhere, also from the message box. (Usually I would use script variables.)

Interesting enough, when smaller amounts of data are used, the list performs better than the array. With large amounts of data, the situation changes. In my case, 500,000 repeats are the border line. From there building a list takes more and more time, while the array starts to perform better.

Here are the results in seconds (in fractions of seconds as milliseconds)

Iterations Arrays and Lists.JPG
Iterations Arrays and Lists.JPG (18.2 KiB) Viewed 252 times


There is a problem: Building lists or arrays with something like 3-5 million repeats in this specific test case, LiveCode becomes non-responsive. On Windows it will show a turning icon. I thought it would regain focus, having the application run for several hours. But then I had to force-quit it. I found this situation in many cases when large amounts of data are loaded to memory. Is this a bug? Or is this something one should know beforehand?

Livecode has a pretty heavy system load. It uses 500 MB of memory, compared to Microsoft Excel with 24 MB, and Google Chrome with 130 MB. I was also wondering about this.

Questions:

Assumed that only text data is used (no images, no binary files to be stored), would simple lists not be better than using arrays? Or is this a matter of personal preference (style/choice) ?
Hans-Helmut
 
Posts: 35
Joined: Sat Jan 14, 2017 6:44 pm

Re: Arrays vs Lists ?

Postby SparkOut » Sun Mar 19, 2017 10:06 pm

I can't give you any advice about the array versus list choices- it is usually a matter of preference and expediency what to use. In the situation of a non-responsive repeat loop through, within the loop include a line
Code: Select all
wait 0 milliseconds with messages
This lets the engine grab a moment of as near zero duration as possible to allow it to react to gui events, redraw the screen and perform general housekeeping.
SparkOut
 
Posts: 1563
Joined: Sun Sep 23, 2007 4:58 pm

Re: Arrays vs Lists ?

Postby monki » Mon Mar 20, 2017 1:59 am

I asked a very similar question a while back. Got about 3 pages of replies. You might want to take a look.
viewtopic.php?f=7&t=28176

One of the key take aways was the fact that arrays can create "dynamic variables". So, you could do something like:

Code: Select all
function alphabetizeFld aString
repeat for each trueWord tWord in aString
put tWord after sWordListArray[getFirstCharacterOf(tWord)]
end repeat
end alphabetizeFld


...and create an alphabetical list of all the words in a text field. You could create a bunch of variables to do this. But you might not need all of them, and it would create a lot of script variables to deal with weather they are used or not. This away is cleaner, is created when needed, an it doesn't really matter which letters will be used because the containers will be created as needed.

You can expand this idea to any number of ideas.
monki
 
Posts: 41
Joined: Tue Dec 13, 2011 10:56 pm

Re: Arrays vs Lists ?

Postby Hans-Helmut » Mon Mar 20, 2017 11:05 pm

Yes, interesting discussion you pointed to from last year. I have to expand the idea myself more. Arrays look more elegant somehow, but are not that obvious as lists.

Thanks a lot!
Hans-Helmut
 
Posts: 35
Joined: Sat Jan 14, 2017 6:44 pm

Re: Arrays vs Lists ?

Postby monki » Tue Mar 21, 2017 12:00 am

Keys were a difficult thing to get my head around at first. But it's really simple. The array is a dataPair name/data, and the key is the name. So, when you "get the keys" of an array, your basically doing is...

put the name of dataPair & cr after tDataNames.

...only much more streamlined. :)

At that point you can do anything to tDataNames that you could do to any other container of lines: search, sort, filter, etc.

After that just run it back through the array, for example:


put keys of myArray into tDataNames
put filter tDataNames without tThingIDontWant into tDataNames
put myArray[tDataNames] into field "myFld"


or send an array to the message box so you can get a look at the contents
Code: Select all
command arrayToMessage aArray
   put the keys of aArray into sVar
   put "-- New Output --" into message box
   repeat for each line tKey in sVar
      put cr & tKey && ":" && aArray[tKey] after message box
   end repeat
end arrayToMessage

Neat :D
monki
 
Posts: 41
Joined: Tue Dec 13, 2011 10:56 pm

Re: Arrays vs Lists ?

Postby FourthWorld » Tue Mar 21, 2017 10:15 pm

Hans-Helmut wrote:There is a problem: Building lists or arrays with something like 3-5 million repeats in this specific test case, LiveCode becomes non-responsive. On Windows it will show a turning icon. I thought it would regain focus, having the application run for several hours. But then I had to force-quit it. I found this situation in many cases when large amounts of data are loaded to memory. Is this a bug? Or is this something one should know beforehand?

LiveCode will generally do what you ask of it: if you ask it to run in a tight loop, it'll run in a tight loop. It should not, however, trigger the Windows warning about being unresponsive. IMO that would be a bug. If you have a recipe for reproducing that you may consider filing a bug report:
http://quality.livecode.com/

Livecode has a pretty heavy system load. It uses 500 MB of memory, compared to Microsoft Excel with 24 MB, and Google Chrome with 130 MB. I was also wondering about this.

The core LiveCode engine takes very little RAM. For example, when run as a faceless process it can take as little as 6 MB. But as we add objects and data into its memory space, it can be larger roughly in proportion to what we're asking it to work on.

The standalones we make with LC often require far less memory than the LC IDE, because the IDE has a lot of stacks and data to maintain.

The IDE itself though, large as it is, doesn't account for such a large amount of RAM. 500 MBs is a lot; I've only seen that when working on unusually large amounts of data. On my Windows box the full LC IDE with just a couple modest stacks open takes less than 70 MB, and on Linux around 60 MB.

What sort of data were you working on at the time? And which version of LC and of Windows?

Assumed that only text data is used (no images, no binary files to be stored), would simple lists not be better than using arrays? Or is this a matter of personal preference (style/choice) ?

Both are very useful. Like a screwdriver and a wrench, they each have a place in your toolbox.

It may be helpful to think of "lists" as delimited strings, as it can help anticipate where they're efficient and where they're not.

When we use chunk expressions (the generic term for expressions that allow graceful traversal of strings, like "get word 2 of item 3 of line 4"), the engine needs to examine each character to find the delimiter in question, keeping count as it goes.

Chunk expressions make for wonderfully readable code, and are often quite efficient, certainly more so than counting delimiters and maintaining pointers on your own as we'd need to do in lower-level languages.

And for aggregate operations across an entire data set, they can often be faster. Using "repeat for each..." as you've done above lets the engine parse only one chunk each time through the loop, keeping track of where it left off, and often those are pretty darn fast.

Random access of specific elements within a data set, however, is where arrays can shine, sometimes offering an order-of-magnitude performance gain if the data set is large enough.

Arrays are essentially a collection of pointers to specific memory locations, in which each key is run through a simple hash to arrive at the address where the data resides. The hash used is a very simple one, so it tends to take few clock cycles, with the benefit being that once finished the engine knows exactly where to look for the contents associated with a given key.

Consider this chunk expression:
Code: Select all
get line 10000 of tSomeData

There the engine needs to traverse the entire data set, counting each CR as it goes, until it comes to the 10,000th one, and then it returns the last line found.

The consider this array expression:
Code: Select all
get tSomeDataA[10000]

There the key "10000" is hashed to obtain the address where the data associated with it is found, and it just grabs that and returns it.

So for any task that requires access to specific elements within the data, you'll find arrays are usually much faster. The amount of time needed to get a chunk from a delimited list is proportionate to the amount of the data preceding it, whereas time to access an array element is nearly constant regardless of how many elements are contained within it or the size of the data in those elements.

Which one is best will often depend on the nature of the task, and it considerations aren't limited to performance.

For example, array elements can hold any data, including data with tabs and returns, and even binary data. Strings, of course, need to contain only text, and you must remain mindful that a given segment doesn't contain any characters used to delimit it.

Strings are easy to conceptualize and work with - they can be easily displayed in a field any time you want to see what's in them. Since arrays are a collection of addresses, they have no single contiguous representation that allows us to see them as a whole in one readable form.

Strings are great for reading and writing to disk, easily interoperable with nearly any other program. If you use tab and CR, you can open your text in LibreCalc or Microsoft Excel as easily as dropping it into a LiveCode field.

Arrays can also be stored to disk, but first they need to be serialized, converted from the collection of memory addresses to a contiguous bytestream. In LC we do this with arrayEncode, and we can read the resulting binary data back into an array by converting it with arrayDecode. LC's encoded arrays use a unique format useful only in LC; they're structurally similar to MongoDB's BSON format, but not compatible with it. Great for exchanging data between LC processes, but not for exchanging data with other processes (though there are JSON translators available).

Both delimited strings and arrays are very powerful and useful. As you spend more time with LC, chances are you'll use both liberally.
Richard Gaskin
Community volunteer LiveCode Community Liaison

LiveCode development, training, and consulting services: Fourth World Systems: http://FourthWorld.com
LiveCode User Group on Facebook : http://FaceBook.com/groups/LiveCodeUsers/
FourthWorld
VIP Livecode Opensource Backer
VIP Livecode Opensource Backer
 
Posts: 5024
Joined: Sat Apr 08, 2006 7:05 am
Location: Los Angeles

Re: Arrays vs Lists ?

Postby AxWald » Wed Mar 22, 2017 1:16 pm

Hi,

arrays vs. lists - room for hour long discussions ;-)

I usually use lists - they are quite fast & easy to maintain. When there's a lot of data, I think of using arrays - as you showed there's a point when they become faster.

Special cases:
  • When having lists with many items where:
    1. a lot of different items need to be changed while processing,
    2. or when lines need to be added/ deleted,
    3. or when I'm not sure if the last item of a line might not be empty,
    I use arrays.
  • If I'd need to loop using "repeat with" (and have more than a few lines) I switch to arrays,
    and "repeat for each".
  • If I have a lot of code to write for the data, I switch to arrays: "myArray[23][42]" is less fingertip abrasive compared to "item 42 of line 23 of myLine". ;-))
  • In any case I often keep the original data untouched, write the changed data into a new variable, and replace it when done.

Extra special case:
Playing with really big data (in this thread) I found out that sometimes it's faster to build the replacement variable using a continuously appended file, while processing the data! Imagine - writing to a file can be faster than appending to a variable!

Btw., I had much fun re-reading the thread Monki linked above - there I learned to wrap my brain 'round arrays ;-) Since then I'm using them quite often, especially the form "myArray[23][42]" became a welcome tool.

Have fun!
Livecode programming until the cat hits the fan ...
AxWald
 
Posts: 250
Joined: Thu Mar 06, 2014 2:57 pm

Re: Arrays vs Lists ?

Postby monki » Wed Mar 22, 2017 9:39 pm

FourthWorld wrote:Random access of specific elements within a data set, however, is where arrays can shine, sometimes offering an order-of-magnitude performance gain if the data set is large enough.

Arrays are essentially a collection of pointers to specific memory locations, in which each key is run through a simple hash to arrive at the address where the data resides. The hash used is a very simple one, so it tends to take few clock cycles, with the benefit being that once finished the engine knows exactly where to look for the contents associated with a given key.


Really informative post, thanks. Something sort of related to this topic, I've been wondering about, is switch vs if then...else. I was wondering if the above is how the switch structure is able to jump directly to the matching case? I'm a bit late moving to the switch structure, but seems much faster when it can be used.
monki
 
Posts: 41
Joined: Tue Dec 13, 2011 10:56 pm

Re: Arrays vs Lists ?

Postby FourthWorld » Thu Mar 23, 2017 12:28 am

monki wrote:Really informative post, thanks.
Glad that was useful. Thanks for the feedback.

...is switch vs if then...else. I was wondering if the above is how the switch structure is able to jump directly to the matching case? I'm a bit late moving to the switch structure, but seems much faster when it can be used.

Conditionals are so quick to execute it can be difficult to measure differences between forms unless you take the time to carefully craft a good test case. IMO it may not even be worth the effort, as any difference would likely be minimal compared to the developer time using the one that best fits the logic you want to express.

That said, there is one thing about both forms of evaluation that may be helpful here: LC uses "lazy" evaluation, in that it will only evaluate as much as it needs until it determines that it doesn't need to do any more.

For example, if we have this:
Code: Select all
if ( "A" = "B"  AND "A" = "A") then
   DoSomething
else
  DoSomethingElse
end if

...only the first comparison ("A" = "B") will be evaluated. Once that's known to be false, it won't bother testing the second comparison because it can't make a difference in the outcome at that point.

Switch expressions are similar, evaluated in the order they're written in, and if using evaluation compound expressions in a case only as much of the evaluation as is true will be performed, otherwise it'll move on as soon as it can determine it'll no longer matter.

This can be helpful when deciding the order to use when arranging comparisons. If you put the least likely comparison first, it won't bother with anything beyond that so you'll save a few clock cycles.
Richard Gaskin
Community volunteer LiveCode Community Liaison

LiveCode development, training, and consulting services: Fourth World Systems: http://FourthWorld.com
LiveCode User Group on Facebook : http://FaceBook.com/groups/LiveCodeUsers/
FourthWorld
VIP Livecode Opensource Backer
VIP Livecode Opensource Backer
 
Posts: 5024
Joined: Sat Apr 08, 2006 7:05 am
Location: Los Angeles

Re: Arrays vs Lists ?

Postby monki » Thu Mar 23, 2017 1:38 am

FourthWorld wrote:This can be helpful when deciding the order to use when arranging comparisons. If you put the least likely comparison first, it won't bother with anything beyond that so you'll save a few clock cycles.


Ah, good to know. Thanks again :D
monki
 
Posts: 41
Joined: Tue Dec 13, 2011 10:56 pm


Return to Getting Started with LiveCode - Complete Beginners

Who is online

Users browsing this forum: No registered users and 3 guests