Reading and writing text to / from a file

stam · Post by **stam** » Sun Mar 10, 2024 2:07 am

That's impressive Bernd! Very clever

If I understand your algorithm correctly, this requires knowing approximately how many chars per line there are, and that all lines have a more or less similar number of chars.
In the context of what is being asked your solution works extraordinarily well.

Wild variation in the number of chars per line probably would not occur in this context, but what if it did?

bn · Post by bn » Sun Mar 10, 2024 9:06 am

stam wrote: ↑
Sun Mar 10, 2024 2:07 am
If I understand your algorithm correctly, this requires knowing approximately how many chars per line there are, and that all lines have a more or less similar number of chars.
In the context of what is being asked your solution works extraordinarily well.
Wild variation in the number of chars per line probably would not occur in this context, but what if it did?

I am surprised by the results for "open file" myself. Even if the number of chars per line would vary wildly you could repeat the search with varying numbers of assumed size and should be able to have a hit for the last 200 lines pretty soon at the speed of "open file".
Although I have read a book of about 150 pages that consisted of ONE sentence. But that may be an outlier...

Kind regards
Bernd

FourthWorld · Post by **FourthWorld** » Sun Mar 10, 2024 4:43 pm

stam wrote: ↑
Sat Mar 09, 2024 11:15 pm
Ultimately however I still think an sqlite database would be a) safer, b) quicker with the number of records discussed here.

a) Safer in what way?

b) Quicker only for certain operations, and slower for others.

There's a reason log files are commonly flat files written with append. RDBMSs are optimized for indexed queries, at the cost of complexity. Review the b-tree structures of RDBMSs to appreciate the large number of steps needed to write and read. Meanwhile flat-file append is optimized all the way through the chain from app to OS to storage device driver to be as close to a single operation as can happen in a modern file system. Being the fastest write mode is what makes it the logging go-to.

I once wrote a storage system in LiveCode Script that could read and write whole records across a large text file using a fixed-width index about 15% faster than those two operations in SQLite.

But of course SQLite blows the pants off it for any more complex operations, which is why RDBMSs remain popular for general purpose storage that needs to support multi-ctiteria searching.

There's a reason the computing world has so many different ways to solve problems. Each use case has a different mix of requirements and there is no one-size-fits-all answer.

This is why I often ask for details from an OP here before rushing to an answer. I'm grateful bbalmerTotalFluency was among the few who responded with more details. Knowing the requirements well helps us identify truly useful solutions.

@bbalmerTotalFluency: how frequently is new data added, and how much on average is added in an update? And at what point does the file roll over to a new one?

stam · Post by **stam** » Sun Mar 10, 2024 4:57 pm

FourthWorld wrote: ↑
Sun Mar 10, 2024 4:43 pm

stam wrote: ↑
Sat Mar 09, 2024 11:15 pm
Ultimately however I still think an sqlite database would be a) safer, b) quicker with the number of records discussed here.
a) Safer in what way?

b) Quicker only for certain operations, and slower for others.

Safer because it’s less prone to random errors. Deleting 1 char in a text file (a column or record delimiter) can alter the contents of the entire file but the same in a database at most can alter 1 record. Plus SQLite is highly compressed and is effective file based storage.

Slower for others? Possibly there are circumstances where cleverness with a text file can produce slightly quicker results, but I would postulate that the majority of times SQLite is much quicker. But like everything there are code optimisations that can be performed in SQL as well. In your example wouldn’t to ensure indexing was adequate for example.

FourthWorld · Post by **FourthWorld** » Sun Mar 10, 2024 5:31 pm

stam wrote: ↑
Sun Mar 10, 2024 4:57 pm

FourthWorld wrote: ↑
Sun Mar 10, 2024 4:43 pm

stam wrote: ↑
Sat Mar 09, 2024 11:15 pm
Ultimately however I still think an sqlite database would be a) safer, b) quicker with the number of records discussed here.
a) Safer in what way?

b) Quicker only for certain operations, and slower for others.
Safer because it’s less prone to random errors. Deleting 1 char in a text file (a column or record delimiter) can alter the contents of the entire file but the same in a database at most can alter 1 record. Plus SQLite is highly compressed and is effective file based storage.

In both methods, "safe" depends on which bytes iare deleted.

Try randomly deleting bytes from both file types and see which one breaks faster.

The structural housekeeping of RDBMSs is not trivial.

Slower for others? Possibly there are circumstances where cleverness with a text file can produce slightly quicker results, but I would postulate that the majority of times SQLite is much quicker. But like everything there are code optimisations that can be performed in SQL as well. In your example wouldn’t to ensure indexing was adequate for example.

Of course, it's all about the use case.

When you need a relational system, use a relational DB. When storage needs involve more discrete whole records with minimal relationality, consider MongoDB or CouchDB. Where associations are more important than lookups, consider a graph DB. And for a wide range of other uses, the humble flat file has stood the test of time for many good reasons.

Why rush to a stent when some patients only need a cleaner diet and some exercise? Of course the answer will depend on the patient.

stam · Post by **stam** » Sun Mar 10, 2024 6:50 pm

FourthWorld wrote: ↑
Sun Mar 10, 2024 5:31 pm
Why rush to a stent when some patients only need a cleaner diet and some exercise? Of course the answer will depend on the patient.

Poor example I'm afraid - one option is treatment and the other prevention - you can't prevent with treatment and you can't treat with prevention - you can only prevent. But I appreciate the sentiment

I wasn't suggesting deleting random bytes of a file - which file format can withstand that and in what context (outside of hard drive failure) would this mythical event happen?
I was suggesting changing the content even by 1 char. A prime example of the pain this can cause is trying to a TSV file where the field contents contain carriage returns. Records become misaligned etc. It doesn't take much at all to do this accidentally do this with a simple text file, but much more difficult to do with an SQLite file.

And to be clear I'm not comparing text files with RDMS's - I'm comparing text file storage with SQLite file storage (not an RDMS). Ease of coding for both is similar. Both can be used interchangeably in many situations - but many LC users simply don't realise how blazingly fast SQLite can be... and it's an excellent storage medium.

SQLite can be used as a substitute for ZIP archives or Tarballs. An archive of files stored in SQLite is only very slightly larger, and in some cases actually smaller, than the equivalent ZIP archive. Think what that means if your text file uncompressed is > 1 Gb. And an SQLite archive features incremental and atomic updating and the ability to store much richer metadata.

So yeah so people prefer text files and tableFields, others prefer SQLite files and datagrids. You can use both, but in both cases, truly large amounts of data favour the latter - in my mind anyway.

SparkOut · Post by **SparkOut** » Sun Mar 10, 2024 10:21 pm

The point is also that there needs to be a pertinent assessment of the data and its handling.
If there are reasons why a database is not being used then those reasins can be respected. However, I really don't understand why only one single massive textfile that is holding millions of records for historic archival purposes should also be the working data file.
Work on a daily file. If need be, you can append that to the monolithic archive file - although I am not sure that would be needed.
If there is an argument along the lines of "but we have other applications which look into this data with a hard coded filename" then it's one thing to accept that "because of ... we have to..." but another thing completely to blame LiveCode or any other system for inefficiency or inelegance. What you will find (eg Bernd's example) is that LiveCode can offer some very clever and elegant ways of dealing with "problem scenarios" but requires some inventiveness that would be unnecessary if the data structure was thought out in a more appropriate manner.
My thoughts anyway.

FourthWorld · Post by **FourthWorld** » Mon Mar 11, 2024 12:53 am

stam wrote: ↑
Sun Mar 10, 2024 6:50 pm

FourthWorld wrote: ↑
Sun Mar 10, 2024 5:31 pm
Why rush to a stent when some patients only need a cleaner diet and some exercise? Of course the answer will depend on the patient.
Poor example I'm afraid - one option is treatment and the other prevention - you can't prevent with treatment and you can't treat with prevention - you can only prevent. But I appreciate the sentiment

My layperson's imprecision with that metaphor is useful: I know you're an accomplished specialist, and would never presume that my limited knowledge in an area only peripherally related to my vocation could come anywhere close to the dedicated study and practice of a specialty over a good many years.

I wasn't suggesting deleting random bytes of a file - which file format can withstand that and in what context (outside of hard drive failure) would this mythical event happen?

You might be surprised what some folks do with their files. I've seen people attempt all sorts of things, and then express surprise that they broke the file. But even when used as intended:

I was suggesting changing the content even by 1 char. A prime example of the pain this can cause is trying to a TSV file where the field contents contain carriage returns. Records become misaligned etc. It doesn't take much at all to do this accidentally do this with a simple text file, but much more difficult to do with an SQLite file.

Any structured data is best read from and written to through a provided interface. Sanitizing inputs is a useful practice with most file formats (mind those semicolons).

And to be clear I'm not comparing text files with RDMS's - I'm comparing text file storage with SQLite file storage (not an RDMS).

SQLite is a relational database system.

Ease of coding for both is similar. Both can be used interchangeably in many situations - but many LC users simply don't realise how blazingly fast SQLite can be... and it's an excellent storage medium.

There are many storage options, each with their strengths and weaknesses. There is no one-size-fits-all solution. SQLite is quite capable and well suited for many things, but I can't recommend strongly enough the value of studying its internal structures and algorithms before choosing it for a given task, doubly so if tempted to recommended it to others for an undefined range of tasks.

SQLite can be used as a substitute for ZIP archives or Tarballs.

I would not make that recommendation. And given that both Zip and SQLite have grown in popularity over the years without either replacing the other, it seems a good many systems and application engineers see very different benefits in each.

So yeah so people prefer text files and tableFields, others prefer SQLite files and datagrids.

Flat files can be displayed in datagrids, and SQL query results can be displayed in columnar fields. I use both storage methods, and both display methods, each for different reasons.

You can use both, but in both cases, truly large amounts of data favour the latter - in my mind anyway.

At the heart of finding optimal solutions for a given task is understanding the system as a whole. Most applications can be seen as boxes where stuff comes in and other stuff comes out, and knowing what's happening on either end of that process chain will inform the choices within the box we're writing.

Logging is most commonly done with append, because it's the most efficient method commonly available (which is also why CouchDB chose an append-only model). Tail reads are so commonly needed most shells include a "tail" command for it. And it's not surprising that a system as flexible as LC includes scripting interfaces for most common file I/O operations ("seek", "read at", etc), in addition to higher-level conveniences like "put url".

With bbalmerTotalFluency's system, I've asked him for details, and SparkOut further reinforced the value of understanding the whole system.

It may be that rewriting all file I/O to use a relational database could be a good fit. It may be that the relationality isn't needed. It may even be that these are log files arriving from other sources over which he has no control.

I'm disinclined to suggest he rewrite anything until I know more about it. In the meantime, tailing is what he asked for, and with Bernd's example we see LC is once again up to the task.

stam · Post by **stam** » Mon Mar 11, 2024 3:51 am

FourthWorld wrote: ↑
Mon Mar 11, 2024 12:53 am
SQLite is a relational database system.

No one said it wasn't a relational database; it's not an RDMBS. It's a file based system that manages 1 database. As opposed to RDBMS's that manage a whole lot more and are not file file based. That is the distinction I draw, rather than it uses SQL syntax, as this makes it comparable to a text file (the only difference being what commands you use to access the contents). And you seem to be sticking on 'relationality', even though 'relationality' doesn't even come into it. The storage would be as flat as a text file - but it's the data access that changes.

FourthWorld wrote: ↑
Mon Mar 11, 2024 12:53 am

SQLite can be used as a substitute for ZIP archives or Tarballs.
I would not make that recommendation.

Well that is of course your prerogative.

I literally lifted that verbatim from SQLite home page on use-cases for SQLite: https://www.sqlite.org/whentouse.html.
Look down to the section File archive and/or data container to see this from the horse's mouth. I'll be honest, I have faith in the team that created this wonderful thing and would trust them explicitly. If you're interested, they also give scenarios where SQLite should not be used and again I'd trust them on that as well.

But that's besides the point. I wasn't suggesting we all drop zip and start using sqlite like you seem to be suggesting. I really shouldn't have to explain this, but it seems it have to:
What I'm saying is that there is inherent compression and efficiency with the file format - the 1 Gb text file may become < 200 Mb. That was the point I was making.

FourthWorld wrote: ↑
Mon Mar 11, 2024 12:53 am
In the meantime, tailing is what he asked for, and with Bernd's example we see LC is once again up to the task.

No doubt - and that's why we all use it. And Bernd's code is magic.
But that's a creative solution that had to be invented tailored to the exact scenario at hand. And would need to be modified palpably to match other use-cases (eg if 1 line has 3 chars and 1 line has 300 chars).

An SQLite approach would be more generic because no such inventiveness is needed. It just works and works fast. It would be interesting to see how the read would perform in SQLite - not because it can perform better (hard to beat 1 ms!) but if comparable it just goes to show it can be applied to any approach, without the need to invent clever solutions every time a new situation arises.

And to be clear I'm not suggesting and do not in the remotest expect the OP drops everything he's doing and switch to this - this extrapolates to a generic discussion because ideally any clever solution should be re-usable in many scenarios.
Granted, in this special case Bernd got remarkable results, but that approach is specific to the problem being solved and can't be applied freely to other contexts.

And there is a need for bringing up stuff that is not considrerd inherently "LiveCode" because many good options are ignored at best, discouraged at worst, because of non-familiarity. A prime example would be regex - it's always fun to see people inventing complex nested loops when a 1-liner in regex would do.

Anyway this is pointless discussion until I or someone else can actually do a direct comparison between methods.

bbalmerTotalFluency · Post by **bbalmerTotalFluency** » Mon Mar 11, 2024 4:53 am

OP here:

A lively and useful discussion. Thanks.

I don't disagree with anyone saying use a db - not an option right now in this particular case. Otherwise, yes, that is the simple solution.

Bernd's test address the heart of my issue. put URL =. 190ms is fine. And in my case, the files are smaller and I can load the whole think in 72ms - which is fit for purpose. However, 190ms vs 1ms is an interesting difference and I was interested that since the problem must be standard that there is not a one line solution. I thought maybe there was and I was unaware of it.

The comment "the clever thing is not to have that many records in one file" is spot on. And as I said - I can do clever. I was attempting to do elegant and simple and use the tools we have been given rather than creating a solution that may not need to exist if the feature was somewhere in the OS already.

I'm sorry, I'm not a big user of forums and no good at it. I want to post in-line replies but they always just appear at the bottom. If anyone can tell me how to, I'll reply to posts individually.

BERND - brilliant solution and super clearly demonstrated. I hope it is of use to many other people. Thanks. The magic (summarised) is this line "read from file tfilePath at - tCharsNeeded until eof". I had not understood that such a thing could work without traversing the file and thus not really having a time advantage. I see now why the feature I imagine should exist does not exist; it is not required if you use "tail".

RICHARD GASKIN - asked about the info, how often added etc. In my case, I'm happy with a very inelegant but simple solution for now. It's quick enough and the code is extremely simple and maintainable. I was asking mostly because I could see a general case.

RICHARD GASKIN - made an extremely good point about RDBMS. In my life I try not to develop prejudices that blind me to more subtle perceptions. His point is that RDBMS are awesome (why they exist) but they are not ALWAYS the right answer (why alternatives exist). My situation is closer to dealing with a log file than the sort of problems for which RDBMS were created. A lot of writing, very little reading, a max file size of 500,000 rows. No need to connect to more than one other file.

RICHARD GASKIN "I'm disinclined to suggest he rewrite anything until I know more about it. In the meantime, tailing is what he asked for, and with Bernd's example we see LC is once again up to the task." - YUP! You got it!! 100%

BTW - in case it is of interest to anyone for perspectives sake. I am not personally against SQLITE and RDBMS - in fact, they are one of the loves of my life. Their elegance stuns me. Their speed amazes me. Their combination of clarity, speed and elegance makes a form of art (in my view). And I've always loved great art.

In short - "horses for courses" is a very correct Uk Idiom (has it travelled over the pond?). In my current case "tailing" is what I needed and what I had not quite understood in terms of elegance.

Reversing my data 500K records takes only 26ms. So I could simply put URL, reverse, and skim the first n records that I need. It all happens in under one tenth of a second on a very much bottom of the line M1 mac. Fast enough for my requirements. However, tailing means that I don't have to worry if my files do, for some strange reason, start to become larger.

I'm avoiding splitting the files only to keep time writing and maintaining code - to a minimum. If it should become a problem then file splitting is the first step I could take and it would be fine. In a deeply irrational way - I feel guilty about making computers work hard when they don't have to. Madness because they don't care but I accept that to be human is to flirt daily with the edge of insanity.

But given the MASSIVE difference for large files - which Bernd so clearly demonstrated, it seems a feature (one liner) worth having and I'm surprised it doesn't exist. I had also wondered if the solution of reading backwards from the end of the file really started at the end of the file or would it have to traverse the file first from the start, thus creating no speed advantage. I am now one day older and considerably wiser.

I'd also say that when I started dealing with these text files I thought a db was ALWAYS the right way to go. I've developed (just lately) an appreciation for text files for simple problems. There are things I can do more efficiently by running full page scans and taking out multiple disparate pieces of info in a single pass. So I am less dismissive of text files than I once was. I wouldn't have been had circumstances not forced my hand.

Perhaps I can share a relevant joke as a thank you. I'll give you the short version.

A rabbi in Moscow walked to his synagogue every day at 9am. A KGB agent having watched this for sometime, intercepted the Rabbi and said "where are you going?" The Rabbi said "I don't know." The agent said "don't give me that. We both know where you are going. Tell me where are you going." The Rabbi said "I don't know". The agent said "Screw this, you are coming to the police station with me" and the Rabbi said "you see, I really didn't know."

Thanks to everyone who put their 10 cents' worth in

Bruce

FourthWorld · Post by **FourthWorld** » Mon Mar 11, 2024 5:33 am

stam wrote: ↑
Mon Mar 11, 2024 3:51 am

FourthWorld wrote: ↑
Mon Mar 11, 2024 12:53 am
SQLite is a relational database system.
No one said it wasn't a relational database; it's not an RDMBS.

You're fine with "RDBS" but not "RDBMS"? What did "M" do to raise your ire?

FourthWorld wrote: ↑
Mon Mar 11, 2024 12:53 am

SQLite can be used as a substitute for ZIP archives or Tarballs.
I would not make that recommendation.
Well that is of course your prerogative.

I don't run the world, I just report what I see in my travels across it.

I literally lifted that verbatim from SQLite home page on use-cases for SQLite: https://www.sqlite.org/whentouse.html.

LC says LC makes a great web authoring solution. Technology providers are proud of their work.

FourthWorld wrote: ↑
Mon Mar 11, 2024 12:53 am
In the meantime, tailing is what he asked for, and with Bernd's example we see LC is once again up to the task.
No doubt - and that's why we all use it. And Bernd's code is magic.
But that's a creative solution that had to be invented tailored to the exact scenario at hand.

Bruce was looking for a tail operation, and was delighted to find LC's file I/O is complete enough to provide that along with seek, append, and the other file system directives scripters of many languages enjoy.

He's content with that, and I'm content to continue enjoying SQLite along with Zip, InnoDB, BSON, LSON, JSON, XML, flat files, and the rest of the stuff the world uses. May you find your contentment as well.

stam · Post by **stam** » Mon Mar 11, 2024 2:21 pm

FourthWorld wrote: ↑
Mon Mar 11, 2024 5:33 am
You're fine with "RDBS" but not "RDBMS"? What did "M" do to raise your ire?

Didn't think I'd need to further clarify, but since you explicitly ask:
SQLite is file storage and while accessed with similar commands to RDBMS's, itself is not an RDBMS. Neither would it be "relational" in this context, it would be a pure, 1 table/1 field flat file alternative for the context described. At least no more 'relational' than a text file.

And may I remind you, it was you that it was you that conflated this by bringing this up as an argument against SQLite, not me:

FourthWorld wrote: ↑
Sun Mar 10, 2024 5:31 pm
The structural housekeeping of RDBMSs is not trivial.

It's a non-argument you chose to argue - SQLite has no "structural housekeeping" requirement - at least no more than a spreadsheet or a text file. Because it's not an RDBMS.

SQLite does however confer an inherent compression (approx 80% compression of ascii text without having to zip/unzip) - the fact you are obviously unaware of this is not an argument.
And lets not conflate the technical specifications from SQLite's website with LiveCode's marketing blurb. It's not a "boast" from SQLite.org, and contrary to LC's marketing blurb, it also provides a long and detailed listing where SQLite should not be used.

Bruce's requirement is an edge case and thanks to Bernd, there is an exceptional solution.
But it's not transferable without modification.

SQLite is a more re-usable solution (obviously not for Bruce, as he has a text file requirement).
Generic approach and re-usability are always my starting points - It's not about oneupmanship, neither is it about "my contentment" (please do try to keep the patronising to a minimum).

In case you are wondering, retrieval of the last 200 records in a >127,000 record, very complex medical imaging database with BLOB image data took an average 93 milliseconds on 10 reads. I suspect that the read time will be of this order of magnitude no matter what the size of the database, and possibly faster if actually optimised indexes etc.
So as a generic solution this stands.

The SQL I used is simple and probably not the fastest:

Code: Select all

SELECT * FROM (
  SELECT * FROM <table> ORDER BY <column> DESC LIMIT 200
  )
ORDER BY <column> ASC;

My point is that without having to invent a creative solution, there is an immediate, fast generic solution, for anyone else that may need this, even if it's not the fastest for a specific solution (obviously this does not apply to the OP's requirement of a text file).

I am usually quite content, thank you for asking; I hope you are too.

LiveCode Forums

Reading and writing text to / from a file

Re: Reading and writing text to / from a file

Re: Reading and writing text to / from a file

Re: Reading and writing text to / from a file

Re: Reading and writing text to / from a file

Re: Reading and writing text to / from a file

Re: Reading and writing text to / from a file

Re: Reading and writing text to / from a file

Re: Reading and writing text to / from a file

Re: Reading and writing text to / from a file

Re: Reading and writing text to / from a file

Re: Reading and writing text to / from a file

Re: Reading and writing text to / from a file