Q: Writing Data to Files

Moderators: FourthWorld, heatherlaine, Klaus, kevinmiller, robinmiller

Locked
dhurtt
Posts: 42
Joined: Sat Nov 09, 2013 7:37 pm
Location: Huachuca City, AZ

Post by dhurtt » Sat Jan 11, 2014 11:40 pm

My application has several arrays of data, and I would like to write them out to a file and read them back in when I reopen the application. I understand the basics of encoding arrays with arrayEncode(), reading and writing binary files, and putting up dialogs to prompt the user to pick a file or provide a file name. The problem is more about the file's structure for reading and writing: I want to write more than one array to a single file, then read the data back into the respective arrays.

Conceptually the idea is something like (not real Livecode):

Code:

put arrayEncode(array1) into binarray1
write binarray1 to file outputFile
put arrayEncode(array2) into binarray2
write binarray2 to file outputFile at ... [end? some byte position?]

It seems like there needs to be a separator between the first encoded array and the second, or else I need to know where each array begins and ends in the file. I see three possible approaches:

1. Get the size in bytes of each array and structure the file as something like:

[integer: number of arrays in file][long integer: length of first encoded array][long integer: length of second encoded array]...[binary blob: start of first encoded array][binary blob: start of second encoded array]

So the writing handler would need to encode each array, calculate the byte length of each encoded array, and then write to the file: the number of arrays, the byte size of each array, then each array. The reading handler would then need to read in the same order, extracting out each blob and decoding it.

If I use this method how do I:

A. Ensure that I write a number out as an integer?
B. Determine the exact size of an encoded array?

2. Write each encoded array followed by a marker, which in turn can be found when reading. This seems problematic as any character you choose as a marker could theoretically be in the encoding of the array.

3. Put all of the arrays into a single array before encoding.

Code:

put array1 into megaArray[1]
put array2 into megaArray[2]
...
put arrayEncode(megaArray) into megaBinArray

The issue I (think I) see with this is that when the arrays get big enough, copying the data from each array to the mega-array takes time and memory. If it does not do a copy, however, this might be the cleanest way.

So, my question for the Wed Q&A is to see an example of something like this or have a discussion about the concept. I figure that there is some function, command, or keyword I am unaware of that makes this tedious exercise in other languages trivial in Livecode. :)

Regards,

Dale

Simon
VIP Livecode Opensource Backer
Posts: 3901
Joined: Sat Mar 24, 2007 2:54 am
Location: Palo Alto

Post by Simon » Sun Jan 12, 2014 12:57 am

Hi Dale,
I'm not sure how big your megaArray is but I did find

Code:

put array1 into megaArray[1]
put array2 into megaArray[2]
put arrayEncode(megaArray) into url("binfile:megaArray.dat")

to work just fine.

Code:

put arrayEncode(array1) into url("binfile:array1.dat")

Works just as well, but of course you end up with multiple files.
Just to finish this off:

Code:

put url("binfile:array1.dat") into myArray
put arrayDecode(myArray) into array1

Simon
I used to be a newbie but then I learned how to spell teh correctly and now I'm a noob!

SparkOut
Posts: 2839
Joined: Sun Sep 23, 2007 4:58 pm

Post by SparkOut » Sun Jan 12, 2014 4:01 pm

Not to detract from the interesting consideration of the task as described, but you may also store the arrays as properties of a stack which you then save. The file size and convenience of being able to read and write from and to your storage stack mean that it is a very useful way of storing data between application runs.

FourthWorld
VIP Livecode Opensource Backer
Posts: 9802
Joined: Sat Apr 08, 2006 7:05 am
Location: Los Angeles

Post by FourthWorld » Sun Jan 12, 2014 6:50 pm

dhurtt wrote:The issue I (think I) see with this is when the arrays get big enough copying the data from each array to the mega-array takes time and memory.
When reading an encoded array from disk, the time and memory required to run it through arrayDecode are similar to what it would take to reconstruct the array using other means, like parsing chunks into specific array keys. It's faster than those chunk expressions of course, but not magically so, as the final output array will be a memory-specific structure that requires parsing the contents of the encoded array and putting the appropriate values into hashed memory slots.

As long as the array is still small enough to be used in memory, the one-time hit of reading it off disk to begin a session and again storing it to disk at the end of the session is usually quick enough that arrayEncode/arrayDecode is a good option.

And as SparkOut noted, many times you can benefit from the robust file structure of LiveCode stack files, where it can in some cases be slightly faster to obtain a single array from a custom property than having everything in one master array.

But since any stackfile is read into memory in its entirety, the savings is only a modest one in terms of performance; the memory requirement will remain the same, or perhaps be slightly larger due to copying.

If the array is too large to be used in memory, there are methods for storing key-value pairs on disk and retrieving them in ways that are faster for that one task than even SQLite, and far more memory efficient. But they're very complex to write, and even the simplest of them will likely require far more disk space than alternatives.

So SQLite is a good option: if the data is too large to be handled in memory, using an existing database can make short work of that task. SQLite is reasonably efficient, and is public domain so there are no licensing restrictions of any kind for any use. SQLite is a great solution for storing large data sets, and easy enough to handle with LiveCode's included support for it.

There's a tutorial on using SQLite included in the Resources stack in the LiveCode IDE.
Richard Gaskin
LiveCode development, training, and consulting services: Fourth World Systems
LiveCode Group on Facebook
LiveCode Group on LinkedIn

jacque
VIP Livecode Opensource Backer
Posts: 7215
Joined: Sat Apr 08, 2006 8:31 pm
Location: Minneapolis MN

Post by jacque » Sun Jan 12, 2014 11:41 pm

You can also write each array to a file separately if you use the more manual way of opening and writing to a file instead of the url method. This would allow you to write to the file without needing enough memory for both arrays at once.

Code:

open file "storage" for append
write arrayEncode(array1) to file "storage"
write null to file "storage"
write arrayEncode(array2) to file "storage"
close file "storage"

I'm guessing that nulls aren't ever included in arrayEncode, and are a reliable marker. To read it back, you can grab the entire content and separate it again (which would require enough RAM to accommodate both arrays) or you can use the read command to find the null and just get the half you need. To get the first half:

Code:

open file "storage"
read from file "storage" until null -- puts the first half into the "it" variable
close file "storage"

If you need the second half, just do a repeat read. The second read will automatically start at the byte where the first one left off. The content of the second read will replace the first content in the "it" variable:

Code:

open file "storage"
read from file "storage" until null -- puts the first half into the "it" variable
read from file "storage" until eof -- puts the second half into "it"
close file "storage"

Jacqueline Landman Gay | jacque at hyperactivesw dot com
HyperActive Software | http://www.hyperactivesw.com

Post by FourthWorld » Mon Jan 13, 2014 1:12 am

jacque wrote:I'm guessing that nulls aren't ever included in arrayEncode, and are a reliable marker.
Unfortunately that's not the case. In the values returned from arrayEncode, NULLs are used as a delimiter between the key name and the value associated with that key. Also, if a key's value contains binary data then any single character risks being present there, making delimiter characters problematic in general.

You could instead store the length of the encoded array in a header for the file. If using decimal you'd want to pad it to 10 chars to make it a fixed length; alternatively you could use binaryEncode to create a four-byte expression of that number. That would give you the length of the data for the first array, and you could do the same after that for the second, third, etc.:

<length1>
<array1Data>
<length2>
<array2Data>
...
<lengthN>
<arrayNData>

But having done that sort of thing before, it's a lot of work to keep straight. :) SQLite is much easier for larger-than-memory stores.
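
As a rough sketch of the length-prefixed layout described above: each record is a four-byte length from binaryEncode followed by the arrayEncode data. The handler names appendArrayToFile and readArraysFromFile are made up for illustration, and the four-byte header assumes each encoded array is smaller than 4 GB. It also covers the two open questions from the original post, since binaryEncode("N", ...) forces a fixed-size integer and the length of the encoded string gives its size in bytes.

```livecode
-- Hypothetical helper: appends one array as <4-byte length><encoded data>.
-- The array is passed by reference (@) so the handler does not need its own copy.
command appendArrayToFile pFilePath, @pArray
   local tEncoded
   put arrayEncode(pArray) into tEncoded
   open file pFilePath for binary append
   -- "N" packs an unsigned 32-bit integer in network (big-endian) byte order
   write binaryEncode("N", the length of tEncoded) to file pFilePath
   write tEncoded to file pFilePath
   close file pFilePath
end appendArrayToFile

-- Hypothetical helper: returns a numerically keyed array (1..N),
-- one element per array found in the file, in the order they were written.
function readArraysFromFile pFilePath
   local tLength, tCount, tResult
   put 0 into tCount
   open file pFilePath for binary read
   repeat forever
      read from file pFilePath for 4
      if the length of it < 4 then exit repeat -- end of file or truncated header
      if binaryDecode("N", it, tLength) is not 1 then exit repeat
      read from file pFilePath for tLength
      if the length of it is not tLength then exit repeat -- truncated record
      add 1 to tCount
      put arrayDecode(it) into tResult[tCount]
   end repeat
   close file pFilePath
   return tResult
end readArraysFromFile
```

Because each array is encoded and written on its own, only one encoded array needs to be held in memory at a time while saving. To start a fresh file, delete the old one before the first call to appendArrayToFile.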

Post by jacque » Mon Jan 13, 2014 4:12 am

Ah. Okay. So much for that trick then. :-)

Post by dhurtt » Mon Jan 13, 2014 4:18 pm

SparkOut wrote:Not to detract from the interesting consideration of the task as described, but you may also store the arrays as properties of a stack which you then save. The file size and convenience of being able to read and write from and to your storage stack mean that it is a very useful way of storing data between application runs.
Except when dealing with a standalone app; you cannot "save a stack". You have to write out all properties to data files, and hence this question.

Which brings up an interesting point: if you deal in the world of standalone applications, you really need to approach the application design differently.

Post by jacque » Mon Jan 13, 2014 5:09 pm

Right, but you can save a separate storage stack. I do that to save preferences frequently, for example. The more I think about it, the better I think SparkOut's idea is.
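
A minimal sketch of the separate storage stack idea, under some stated assumptions: a stack file already exists at the path returned by storageStackPath (for example, copied to a writable folder on first launch), and the names storageStackPath, saveArrayToStorage, loadArrayFromStorage, "appData.livecode" and the custom property set cSavedArrays are all invented for this example. Each array is stored as arrayEncode data in a custom property of that stack, and the stack file is then saved.

```livecode
-- Hypothetical location of the storage stack; it must be a separate,
-- writable stack file, not the standalone itself.
function storageStackPath
   return specialFolderPath("documents") & "/appData.livecode"
end storageStackPath

-- Stores one array under pName in the custom property set "cSavedArrays".
command saveArrayToStorage pName, @pArray
   local tPath
   put storageStackPath() into tPath
   set the cSavedArrays[pName] of stack tPath to arrayEncode(pArray)
   save stack tPath
end saveArrayToStorage

-- Returns the stored array, or empty if nothing was saved under pName.
function loadArrayFromStorage pName
   local tPath, tData
   put storageStackPath() into tPath
   put the cSavedArrays[pName] of stack tPath into tData
   if tData is empty then return empty
   return arrayDecode(tData)
end loadArrayFromStorage
```

Usage would look something like `saveArrayToStorage "array1", array1` when quitting and `put loadArrayFromStorage("array1") into array1` at startup.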

Post by dhurtt » Mon Jan 13, 2014 5:22 pm

It turns out I did post this to the wrong forum; there is a hidden forum I was supposed to use for the idea2app program, not this one.

That said, this has been some good discussion, even if my original questions were not all answered. If I find the answers on the other forum, I will post them here. By the way, the original questions, buried in my message, were:

1. If I use the method of writing an array's length as an integer and then writing the array, how do I:

A. Ensure that I write a number out as an integer?
B. Determine the exact size of an encoded array?

2. Is there any reliable marker to use to make this method work? (This was answered in that \00 is not it and an encoded binary stream could theoretically have any character, so probably not.)

3. Do the Livecode statements in the original post copy data from array1 and array2 to megaArray, or are they simply pointers? (I had already tested that the method works, but I suppose I could test the theory by updating array1 and seeing if the data in megaArray is also updated, but that would lead to an implied answer. I was hoping for a definitive one.)

Thanks for the responses.

Post by dhurtt » Mon Jan 13, 2014 5:29 pm

jacque wrote:Right, but you can save a separate storage stack. I do that to save preferences frequently, for example. The more I think about it, the better I think Sparkout's idea is.
That brings up an interesting point, but unfortunately reveals my bias. I started in HyperCard, dropped out of that world (and the Macintosh world in general) in the mid-1990s, and have only recently come back to this post-HyperCard successor world. :) The one bad experience I had with HyperCard was corrupted stacks. Now I realize that encoding arrays into binary format results in a file not as readable as writing out the data as text, but it seems less susceptible to corruption than the old HyperCard stacks were. I have read references to stack corruption on these forums, so I am concerned that SparkOut's suggestion is less "safe". Is that a real concern, or do I need to let my old HyperCard bias go? (It was a great stack, you would have been angry too!)

Post by FourthWorld » Mon Jan 13, 2014 7:21 pm

File corruption is possible with any binary format, even SQLite files and LiveCode stacks.

But corruption of LiveCode stack files is extremely rare, esp. compared to HC. This isn't to suggest that HC was inherently faulty, but just used a very different scheme, one much more complex:

HC paged card records from disk periodically, and for anything but very small stacks many cards would remain on disk. This paging in and out of memory is complex stuff, requiring a lot of very sophisticated code to keep cards intact. Sometimes a modified card record would be written to its old location, and if there wasn't enough space it would be appended to the file, all the while updating the pointers that let HC know where to find it.

In contrast, LC uses a much simpler method: everything in a stack file is loaded into memory whenever a stack is accessed. When a stack is saved, the entire stack is written fresh to disk each time.

Like all things in computing, this involves a trade-off: while LC is inherently more robust because it does complete writes, it also requires more memory. Thankfully modern systems have enough RAM that it's usually only a problem for very large stack files (>~100MB).

In fact, most of the claims of stack file corruption I've seen with LC aren't true file format corruption. The stack is reported to have problems opening, but these are usually because of either attempting to open a newer format in an older version of LC, or attempting to open a password-protected stack in the Community edition, or a simple script error in a preOpenCard or preOpenStack handler.

Over the years I've been working with LC, I've only seen a handful of examples of true file corruption. It's possible, but very rare.

One thing to keep in mind when a file is reported as corrupted: following Unix traditions, before a stack file is saved the old copy is first renamed - same name, but preceded with "~". Only after the newly-saved copy has been confirmed as successfully saved is that temporary older copy deleted. So if anything happens during save, check that directory for a "~" copy - if it's there you'll have everything all the way up to your last successful save.

Post by jacque » Mon Jan 13, 2014 8:12 pm

In all the years I've been working with this engine I have never once had a corrupted stack. I have also never seen a real corrupted stack from anyone else. Even though sometimes people report them here, it's almost always for the reasons Richard explains. The file isn't bad, only the technique used to open it.

One reason you may see corruption reports here is because when a stack can't be opened by the engine for any reason, the dialog says "stack is corrupted". Only it virtually never is; that really just means the engine couldn't open it, which is most frequently due to incompatibilities in the file format when opening a new stack with an old engine. So yeah, HC experiences don't apply here.

dhurtt wrote:1. If I use the method of writing an array's length as an integer and then writing the array, how do I:

A. Ensure that I write a number out as an integer?
B. Determine the exact size of an encoded array?
The engine is forgiving about whether a character is an integer or text and it's very rare that you need to worry about it. In this case, if you are writing it to a text file, then it's going to be text anyway. A script can add 10 to an integer or a text representation without any trouble, it will know what you mean. But if you really, really want to force an integer just add 0 to the number.

The size of an encoded array will be its length: put len(tEncodedArray)

dhurtt wrote:2. Is there any reliable marker to use to make this method work? (This was answered in that \00 is not it and an encoded binary stream could theoretically have any character, so probably not.)
I have just learned that the answer is probably "no". And here I thought I was on to something...

dhurtt wrote:3. Do the Livecode statements in the original post copy data from array1 and array2 to megaArray, or are they simply pointers? (I had already tested that the method works, but I suppose I could test the theory by updating array1 and seeing if the data in megaArray is also updated, but that would lead to an implied answer. I was hoping for a definitive one.)
MegaArray will contain a copy. You can pass parameters to handlers as pointers, but placing content into a new variable will copy it.
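
The copy behaviour can be checked with a small experiment along the lines the question suggests. This is only an illustrative sketch; the variable names and the mouseUp handler are arbitrary, and it could live in any button script.

```livecode
on mouseUp
   local tArray1, tMegaArray
   put "original" into tArray1["name"]
   put tArray1 into tMegaArray[1]
   -- Change the source array after it has been placed into the mega-array
   put "changed" into tArray1["name"]
   -- tMegaArray[1] keeps the value it had when it was assigned,
   -- so this shows "original", not "changed"
   answer tMegaArray[1]["name"]
end mouseUp
```

From the script's point of view the two variables are independent once the assignment is done, which matches the answer above.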

Post by Simon » Tue Jan 14, 2014 12:42 am

jacque wrote: ...the dialog says "stack is corrupted".
Yeah, how many times has someone opened a stack in a text editor then saved it?? :shock:
Too often.

Simon
