Why DB's and external files?

dunbarx · Post by **dunbarx** » Thu Nov 17, 2022 3:20 pm

I am fairly narrow minded.

Very simply, what are the advantages of saving data to either an external file or a database, as opposed to storing that data inside LC itself, perhaps as a custom property or even in one or more fields?

Certainly if databases offer additional functionality in their own right, I get it. But since there is no size limit to the amount or type of data that LC can hold, why ever use an external file? Security of some sort? Speed? Some advantage in simply separating the two?

There must be very simple answers to this question that I just never appreciated.

Craig

RCozens · Post by **RCozens** » Thu Nov 17, 2022 7:05 pm

Hi Craig,

The short answer to your question is external databases (a) allow different applications (and/or users) to access shared data and (b) free programmers from the need to script common database functions such as storing/retrieving individual records, record locking, enforcing access privileges based on authority level, defining a common record structure (including field names, data types, & editing criteria) and (for relational DBs) navigating the data via multiple indexes.

HyperCard was essentially the first database to include logic for capturing/displaying data in the same file as the data itself. Any LC stack that consists of multiple iterations of similar data on a common background is a database: each card is a record and the card name is the key. If one creates a library of handlers to perform the functions listed in (a) and (b) above, multiple stacks can share the data without scripting stack-specific logic.

Shameless plug: years ago I created SDB (Serendipity Database-Binary) in HyperCard, and I still use the LC version today. It's a library and associated stack format that supports all of the functions I listed above except for multiple index paths.

Cheers!

Rob

dunbarx · Post by **dunbarx** » Thu Nov 17, 2022 7:16 pm

Thanks, Rob.

I assumed that DB's, which seem like grown-up sorts of gadgets, had that native functionality I thought they might.

But what about external files? They are stupid, no? But I suppose they also can be shared, so is that their main advantage?

As for HyperCard, just about the only HC stack I still support is one on our network. A dozen users access it, read and write. I get around record locking by accessing a local "mirror" card (a copy of the main) on each client. People read from the "main" card but write to their local mirror. The HC server stack sees who is trying to write at any moment, and prevents collisions. Each mirror will write to the main card on the server as soon as the coast is clear.

A kludge and a half.

Craig

FourthWorld · Post by **FourthWorld** » Thu Nov 17, 2022 7:38 pm

For me, the journey to externalized data storage began in HyperCard, the moment a client wanted to make a second version of his product.

In the first version we did what HyperCard uniquely encourages developers to do: just leave it in the UI. Seemed simple enough.

But V2 incorporated a great many changes to the UI - where the data was stored. So replacing the UI meant replacing the data.

So we did what HC folk did in those days, at least the subset of HC devs with products successful enough to merit a major upgrade:

We encumbered the user with two extra steps. Before upgrading they were required to export the data, and then after upgrading they were required to re-import it to restore it where they'd left off.

As my interest in development grew, I found I wasn't alone. Experts had been writing about software development for many years. I read them, as many as I could make time for.

I found a common theme among many professional developers: a desire to maintain a separation of concerns. Code, UI, and data can be managed separately, and when done well it means changes to any one of them have minimal impact on the other two.

Consider the humble word processor: you can change the app, which may change the entire presentation of the data in amazing new ways, and the data just goes along for the ride smoothly.

Whether the separation of the application from user data is explicitly part of the workflow (as with a word processor) or automated (as with an iPhoto index), that separation is always present, allowing one to be changed without affecting the other.

Similarly, separating core business logic from an app allows it to be separate from the UI.

I once wrote an app with an embedded search engine. Though initially delivered on CD-ROM, eventually it became a web product. And because the core logic was written with an awareness of separation of concerns, we just picked up the search engine part, moved it to the server, and added an HTML wrapper for the output. The rest of it just went along for the ride.

Code, UI, and data: wherever practical, separation yields flexibility for maintenance and enhancement.

--

Later in my journey I encountered data sets large enough to impose memory constraints. This was in the Mac Classic days, before macOS gave us NeXT's UNIX underpinnings, with all the superior memory management UNIX offers. "Out of Memory" issues were a thing in those days.

We had to separate the data because even though the addressing within the engine allowed up to 4 GB in a field or other container, with everything else going on in the machine we simply didn't have 4 GB available.

So I began reading what I've come to learn is what most CS literature is historically about: the tradeoffs between disk and RAM, and the smoothest ways to move data between the two.

External data can use as much space as storage allows. RAM is for the subset we care about in the moment.

You can have any number of word processor documents on disk, but right now the one that matters is the novel you're writing at the cafe.

--

Then we needed to find stuff, to query collections to get the subset we're interested in. And that brought me to indexes.

Indexing is a vast topic in itself, but for here consider the speed difference between iterating through an entire collection of addresses to find those with a given zip code, and having an index that's already sorted by zip code so you can get all those records in just one step.
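A minimal sketch of that difference, in Python rather than xTalk. The sample addresses and function names are made up for illustration, and the index here is a plain dictionary keyed by zip code rather than the sorted index described above; the point is only that the lookup no longer touches every record.

```python
from collections import defaultdict

# Hypothetical sample data: (name, zip code) pairs.
addresses = [
    ("Ada", "90210"),
    ("Bob", "10001"),
    ("Cy", "90210"),
    ("Di", "60601"),
]

def scan_by_zip(records, zip_code):
    """Full scan: examine every record on every query."""
    return [name for name, z in records if z == zip_code]

def build_zip_index(records):
    """One pass up front: map each zip code to the names that have it."""
    index = defaultdict(list)
    for name, z in records:
        index[z].append(name)
    return index

zip_index = build_zip_index(addresses)

scanned = scan_by_zip(addresses, "90210")   # walks all four records
looked_up = zip_index.get("90210", [])      # one dictionary lookup
print(scanned, looked_up)
```

The cost is that the index has to be built once and kept up to date whenever the data changes, which is part of what a database engine handles for you.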

Indexing is a fascinating world of reading, and I can't recommend it strongly enough for geeks who enjoy discovering inventive solutions.

And it's not just performance, though it's hard to beat compiled object code purpose-built for a task. It's also flexibility, for finding, extracting, and presenting the found set.

A good query language lets you specify what criteria you want records to meet, along with which fields you want to display once they're found.

Sure, we can write these things in xTalk, and I have (I wrote a nifty lib years ago for working with tab-delimited files; fun to do and useful for what I was doing).

But with a database engine you don't need to write that. They've already done it. And they've done it in highly optimized compiled object code, so it's much faster than anything any scripting language can do. And by using a popular DB engine you get the work of many hundreds of specialists, so the code is generally more robust than anything a single individual would be able to design, test, and debug working alone.
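As a small illustration of a query that names both the criteria and the fields to return, here is a sketch using SQLite through Python's standard `sqlite3` module. The `contacts` table, its columns, and the sample rows are assumptions invented for the example, not anything from a real project.

```python
import sqlite3

# In-memory database so the sketch leaves nothing on disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (name TEXT, city TEXT, zip TEXT)")
conn.executemany(
    "INSERT INTO contacts VALUES (?, ?, ?)",
    [
        ("Ada", "Beverly Hills", "90210"),
        ("Bob", "New York", "10001"),
        ("Cy", "Beverly Hills", "90210"),
    ],
)

# The engine maintains the index; we only declare that we want one.
conn.execute("CREATE INDEX idx_contacts_zip ON contacts (zip)")

# Criteria (zip = ?) and the fields to show (name, city) in one statement.
rows = conn.execute(
    "SELECT name, city FROM contacts WHERE zip = ? ORDER BY name",
    ("90210",),
).fetchall()
print(rows)
conn.close()
```

Nothing in the script loops over records or decides how to use the index; that work is left to the engine.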

--

And then the Internet happened.

Before that, applications were generally designed for a single user to run on a single computer.

The introduction of laptops introduced something we'd never had to think about before: keeping data in synch between multiple computers.

And with networking the opportunity was even bigger: collaboration between multiple users.

Those who'd already cultivated habits centered around maintaining a separation of concerns found that it makes relatively little difference to the application whether it pulls data from a local storage device or a remote server. Once it's loaded it's all the same.

But oh what a difference it makes for workflows.

I started this reply on my phone, and when it got verbose (pardon the length; you know how I can get when telling stories) I finished it on my workstation. I don't even know exactly where this text lived while I was working on it, where the server is physically located. I don't need to. All I know is I can work on it anywhere, from any computing device I happen to have in my hands at the moment.

Separation of concerns has many, many benefits.

--

All that said...

...deep at the heart of every Linux system is a database so critical that if munged there's a good chance your machine won't boot. And though we may hold in our hands computers with other OSes, we use them to work on remote machines which are usually Linux (iCloud is a Linux farm, for just one example, and these forums are run on Linux as another), so this bit of trivia affects all of us even if we don't identify as "Linux users":

fstab is the File System Table, listing storage devices and mount points.

And it's a simple whitespace-delimited text file.

Sometimes even the most critical data can benefit from simplicity.
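To show how little it takes to read such a file, here is a hedged sketch in Python. The sample entries are invented, the field names in `FIELDS` are my own labels for the six standard fstab columns, and a real parser would need to handle more edge cases than this one does.

```python
# Invented sample in fstab layout: six whitespace-separated fields per line.
SAMPLE_FSTAB = """\
# <device>  <mount point>  <type>  <options>  <dump>  <pass>
UUID=1234-ABCD  /      ext4  defaults  0  1
/dev/sdb1       /data  ext4  noatime   0  2
"""

# Labels chosen for this example only.
FIELDS = ("device", "mount_point", "fs_type", "options", "dump", "fsck_pass")

def parse_fstab(text):
    """Return one dict per entry, skipping blank lines and # comments."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # split() with no argument splits on any run of spaces or tabs.
        entries.append(dict(zip(FIELDS, line.split())))
    return entries

entries = parse_fstab(SAMPLE_FSTAB)
for entry in entries:
    print(entry["device"], "->", entry["mount_point"])
```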

So for all the talk about size, performance, and complexity, there remains a solid and rather wide range of use cases favoring simple data storage.

Flat files have a place. Collections of text files drive many powerful CMSes, and delimited text is a wonderful option for things that lend themselves to memory-bound work that fits well with a row-and-column format.
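For row-and-column data, reading tab-delimited text takes only a few lines. This sketch uses Python's standard `csv` module on an invented three-column sample held in a string; with a real file you would pass an open file object instead of `io.StringIO`.

```python
import csv
import io

# Invented sample: a header row, then one record per line, tab-separated.
TSV_TEXT = (
    "name\tcity\tzip\n"
    "Ada\tBeverly Hills\t90210\n"
    "Bob\tNew York\t10001\n"
)

# DictReader uses the first row as field names.
rows = list(csv.DictReader(io.StringIO(TSV_TEXT), delimiter="\t"))
cities = [row["city"] for row in rows]
print(cities)
```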

Even better: you can open text files with a wide range of programs. If you've been in the biz long enough to see a favorite app with a proprietary format reach end of life, data longevity becomes important. And even before then, flexibility with data editing is rarely a bad thing.

Data need not be fancy to even enjoy the benefits of the cloud: file synching like Dropbox, Google Drive, iCloud, Nextcloud, and others can allow you to work anywhere without having to craft a custom synch solution yourself.

And for simple things for personal use, there's little penalty for storing data in the same stack file where it's displayed. The convenience is hard to beat. I do it all the time.

But for professional works delivered to others, I just do what the rest of the world does: maintain a separation of concerns that supports maintenance and enhancement.

I've been bitten by choosing otherwise, and have enjoyed many benefits from following the guidance of professionals on this.

dunbarx · Post by **dunbarx** » Thu Nov 17, 2022 7:45 pm

Richard.

Can you be more responsive, please?

Craig

dunbarx · Post by **dunbarx** » Thu Nov 17, 2022 7:46 pm

I know how to use LC, but clearly I work in a much smaller space than others here.

I did mention narrow mindedness...

Craig

FourthWorld · Post by **FourthWorld** » Thu Nov 17, 2022 7:48 pm

dunbarx wrote: ↑
Thu Nov 17, 2022 7:45 pm
Richard.

Can you be more responsive, please?

Craig

That's another thread.

https://forums.livecode.com/viewtopic.p ... 58#p219475

FourthWorld · Post by **FourthWorld** » Thu Nov 17, 2022 7:52 pm

dunbarx wrote: ↑
Thu Nov 17, 2022 7:46 pm
I know how to use LC, but clearly I work in a much smaller space than others here.

I did mention narrow mindedness...

The sum of human knowledge is vast. No matter how much we devote our time to learning something, that's time we haven't spent learning something else.

Everyone is narrow minded in many ways.

I took up board game design during the pandemic. Oh boy did I learn just how little I know.

RCozens · Post by **RCozens** » Fri Nov 18, 2022 6:23 pm

dunbarx wrote: ↑
Thu Nov 17, 2022 7:16 pm
But what about external files? They are stupid, no? But I suppose they also can be shared, so is that their main advantage?

Yes, external files can also be shared; but more importantly they can be modified and replaced without recompiling the applications that use them.

If, for example, icon images are referenced rather than imported, I can change one by replacing the GIF, JPEG, or PNG file and the compiled app will reflect the change without recompiling. Or I can change them all and create a new look to my app without requiring existing users to replace existing code. I could also create an app that displays help messages from an external text file and build multiple versions of the file in different languages for different installations.

The downside is the app won't work as expected if any external files are missing from a particular installation.
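A rough sketch of the help-file idea, including one way to soften the missing-file problem. It is written in Python, and the file naming scheme (`help_<language>.txt`), the tab-separated layout, and the `load_help` function are all assumptions made up for this example.

```python
import tempfile
from pathlib import Path

def load_help(folder, language, fallback="en"):
    """Read topic<TAB>message lines from help_<language>.txt.

    Falls back to the fallback language if that file is missing.
    """
    path = Path(folder) / f"help_{language}.txt"
    if not path.exists():
        path = Path(folder) / f"help_{fallback}.txt"
    messages = {}
    for line in path.read_text(encoding="utf-8").splitlines():
        if "\t" in line:
            topic, text = line.split("\t", 1)
            messages[topic] = text
    return messages

# Demo with temporary files standing in for an installation folder.
with tempfile.TemporaryDirectory() as folder:
    Path(folder, "help_en.txt").write_text(
        "save\tSaves your work.\n", encoding="utf-8")
    Path(folder, "help_fr.txt").write_text(
        "save\tEnregistre votre travail.\n", encoding="utf-8")
    french = load_help(folder, "fr")
    german = load_help(folder, "de")  # no German file, so English is used

print(french["save"])
print(german["save"])
```

Translating the app then means shipping a different text file, with no change to the code.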

Rob

dunbarx · Post by **dunbarx** » Sat Nov 19, 2022 1:41 am

Thank you both.

My entire LC world is always about a stack, perhaps including substacks.

I do save data to external files, because my engineers use LC to create worksheets and LC-generated drawings, and often have to recover and rebuild certain past work at a later time. But that is not using external files to hold data particularly, though it touches on it, since a single folder holding all their collective work is accessible to all.

I doubt I will ever move out of that space.

Craig

stam · Post by **stam** » Sat Nov 19, 2022 8:24 pm

dunbarx wrote: ↑
Thu Nov 17, 2022 7:16 pm
But what about external files? They are stupid, no? But I suppose they also can be shared, so is that their main advantage?

They are not stupid, no. It depends on what you’re building though. Not sure sharing even enters into it… if you need that it should be trivial to write an export function…

As Richard said it depends on your use case. If you are writing a small app that won’t need many queries and is only for you, there’s probably not much benefit to a database (whatever form that may take).

In general you will have hugely more flexibility separating data from UI. So that’s a good reason.

Many of the external formats (couchDB, LiveCloud, SQL, for example) will offer highly optimised / speedy ways to query the data in complex ways. If you need this functionality then rolling your own won’t ever be as good. Probably.

You may need to use the same data source live in other apps - many of the options for file storage can be used by any platform (that’s not the same as “sharing” as the data source can still be the same for any number of apps).

Then there’s security. Some options like LiveCloud will fully encrypt your data and secure authentication will be needed to access.

So it’s not one-size-fits-all and you’ll choose what suits your needs.

But in general (as anyone who has authored FileMaker Pro solutions for distribution will tell you!) separating data from interface makes things a lot easier down the line… even if that’s just to another LiveCode stack used for data storage….

S.

stam · Post by **stam** » Mon Nov 21, 2022 9:46 pm

Just as an extra comment on this, I rolled my own snippet manager based on SQLite as an external data store. Should I wish to write a whole different app in the future based on the same data, it's just a matter of using the same db file. There are many ways to do this but it was good practice to brush up on those rusty SQLite skills (LiveCloud has made me lazy...)

Have a look here: https://forums.livecode.com/viewtopic.p ... 42#p219642
