Given the way XML bounds data with tags, parsing a file too large to fit practically in RAM will be very slow. If the data is read-only you could traverse it once and build a list of pointers into the data, but then the trick is accessing the pointer you need - lineoffset works well with thousands of items, but can get slow with tens of thousands, and at that point it reintroduces the RAM concern.
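To make the pointer-list idea concrete, here's a minimal sketch in Python (standing in for LC's seek-and-read file handling, since the post itself includes no code). The record format - one record per line - is an assumption for illustration only:

```python
# Scan a large file once, recording the byte offset of each record,
# then seek directly to any record later without holding the data in RAM.

def build_offset_index(path):
    """Return a list of byte offsets, one per line in the file."""
    offsets = []
    with open(path, "rb") as f:
        pos = 0
        for line in f:
            offsets.append(pos)
            pos += len(line)
    return offsets

def read_record(path, offsets, n):
    """Fetch record n (0-based) by seeking to its stored offset."""
    with open(path, "rb") as f:
        f.seek(offsets[n])
        return f.readline().rstrip(b"\n")
```

The index itself still lives in RAM, which is exactly the limitation the post describes: with tens of thousands of records the pointer list becomes its own memory problem.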
On the dev list a few years ago I floated the idea of LC providing something like a disk-based array, a simple key-value store that would be efficient to get and put values by a given unique key, but unlike arrays it would leave most of the data on disk.
The suggestion from nearly everyone in that thread was to use SQLite, and given SQLite's ever-growing popularity and its rich feature set (now even richer in LC since the SQLite component was updated recently), it's a very good choice indeed.
But even the collective wisdom of the community never stopped me from foolish experimentation.
Curious about how b-trees work (the underlying mechanism of nearly every large data store, from the various SQL implementations to most of the NoSQL DBs like CouchDB and MongoDB), I set about exploring whether I could make an efficient b-tree in LiveCode.
Not surprisingly, I can't. B-trees are complex, so much so that relatively few ever write their own; most DBs are based on a handful of b-tree implementations that have been floating around for years, reused in project after project, occasionally revised but rarely attempted from scratch. And even if I could, it would be slow in LC compared to the highly-optimized C-based versions commonly in use.
But I did stumble across something else, a sort of hash table that more or less splits the difference between what LiveCode does well and what it doesn't do so well, arriving at the beginnings of a library I call libDiskArray.
The goal was to provide a simple key-value store optimized for CGI use, probably the most demanding environment we run LC in, since everything starts cold with each HTTP request, and since most folks use LC Server on shared hosts, both CPU time and RAM are critical considerations. Since it can work on files up to 4GB (and with a modest revision could accommodate several terabytes), it may be useful in some desktop contexts as well.
The good news is that if the only thing you need to do is store or retrieve a given value by key, it outperforms even SQLite, and it's written in native LC in just a few hundred lines of code, so it's easily embeddable with minimal overhead. In fact, aside from the value of a key itself, the largest read it does is only 8k, and it touches the file only 4 to 5 times per lookup, so it's reasonably light on system resources.
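This is not the actual libDiskArray code, but the general technique it describes - hash the key to a small on-disk bucket table, then follow a short chain of entries with a handful of seeks and small reads - can be sketched in a few dozen lines. Python here stands in for LC; the file layout (bucket count, slot sizes, entry header) is entirely my own assumption for illustration:

```python
import os, struct, zlib

N_BUCKETS = 1024
HEADER = 8  # bytes reserved at the start of the file
SLOT = 8    # each bucket slot holds an 8-byte file offset

def _bucket_pos(key):
    # Hash the key to a fixed slot in the bucket table
    return HEADER + (zlib.crc32(key) % N_BUCKETS) * SLOT

def create(path):
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", N_BUCKETS))
        f.write(b"\x00" * (N_BUCKETS * SLOT))  # empty bucket table

def put(path, key, value):
    with open(path, "r+b") as f:
        pos = _bucket_pos(key)
        f.seek(pos)
        head = struct.unpack("<Q", f.read(8))[0]
        f.seek(0, os.SEEK_END)
        entry_off = f.tell()
        # Entry: next-pointer, key length, value length, key, value.
        # New entries go at the head of the chain, so the newest
        # value for a key is always found first.
        f.write(struct.pack("<QHI", head, len(key), len(value)) + key + value)
        f.seek(pos)
        f.write(struct.pack("<Q", entry_off))

def get(path, key):
    with open(path, "rb") as f:
        f.seek(_bucket_pos(key))
        off = struct.unpack("<Q", f.read(8))[0]
        while off:  # walk the chain for this bucket
            f.seek(off)
            nxt, klen, vlen = struct.unpack("<QHI", f.read(14))
            if f.read(klen) == key:
                return f.read(vlen)
            off = nxt
    return None
```

Note the lookup cost: one seek into the bucket table plus one or two seeks along the chain, each reading only a few bytes - which matches the "touches the file only 4 to 5 times" character described above.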
But the bad news is that it only does that one thing, setting and getting a value by key. If you need to do any aggregate operations across the data set, like queries, you're left with iterating across the data set, as indexing is a level of complexity I won't be exploring with this.
Since it's extremely rare that I need only that one feature, I've set libDiskArray aside to focus more on SQLite. It really is a better choice. The performance was gratifying but ultimately modest (only about 20% faster than an equivalent lookup in SQLite), and with only one feature compared to SQLite's richness it was a good learning experience but not suitable for most practical uses.
Pardon the long post, but in short:
Working with large collections of data on disk is inherently complex, and most useful solutions use b-trees at their core, which are too complex to be worth attempting in LC. The engine could do this, but since RunRev has already provided access to SQLite there's almost no reason to: SQLite offers efficient store-and-retrieve along with a wider range of features than anything the team could make time to build themselves.
If you absolutely need only a key-value store, have fewer than 1 million keys, and all of your data plus the keys totals less than 4GB, I might be motivated to consider finishing libDiskArray.
But first I would encourage you to consider SQLite. It's a great toolkit, and relatively easy to learn given the power and flexibility it provides.
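For anyone who hasn't tried it, the same get/put-by-key pattern discussed above takes only a few lines with SQLite. This sketch uses Python's stdlib sqlite3 module in place of LC's revdb calls; the table and column names are arbitrary:

```python
import sqlite3

db = sqlite3.connect(":memory:")  # use a file path for an on-disk store
db.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v BLOB)")

def put(key, value):
    # INSERT OR REPLACE gives array-like "set by key" semantics:
    # a second put with the same key overwrites the old value.
    db.execute("INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value))
    db.commit()

def get(key):
    row = db.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
    return row[0] if row else None
```

And unlike a bare key-value store, the moment you need a query across the whole data set it's one more SELECT away.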
I found this book enormously helpful in getting started with SQLite:
Using SQLite
Small. Fast. Reliable. Choose Any Three.
By Jay A. Kreibich
http://shop.oreilly.com/product/9780596521196.do