Ways to Sort and Get Data, Two Ideas I have

Creating desktop or client-server database solutions?

Moderators: Klaus, FourthWorld, heatherlaine, robinmiller, kevinmiller

Post Reply
trailboss
Posts: 112
Joined: Sat Dec 13, 2008 4:55 pm

Ways to Sort and Get Data, Two Ideas I have

Post by trailboss » Sat Dec 13, 2008 6:18 pm

I've been writing in the runrevlist, but I thought I'd try this out because my email is so hard to navigate. Mark Schonewille and others answered some questions about this. I have 17757 records for birds with items Birdname,Date,Place. That look like this:

Common Snipe,10/20/2008,Lazy River
Common Snipe,1/20/2007,Zamora Mexico
Common Snipe,11/20/1999,Albuquerque
Common Moorhen,11/26/1990,Lazy River


I was advised to make a card for each different bird (500 or so) and to use the applicationstack to draw data from the invisible substack to make reports and things so I don't have to destroy my front end if I ever switch to SQL or something.

A card per bird would be fine, but, you know, I also could draw data from the field with 17757 lines.

HERE ARE MY QUESTIONS ABOUT TWO IDEAS I HAVE:

To manipulate the data in a 17757-line field, I have two really fast tools:

1: In Rev, sorting by date or item in the field is super smokin' fast even with 17757 lines.
2: In Rev, the FIND string feature is super fast too. I type in "find string Lazy River" and It finds the first instance of Lazy River immediately though it may be in line 10,000 of the field. It puts a little box around it.

IDEA ONE

Thus, all I need is one more thing to be able to create any report fast! It is this:

I need to know the the field LINE NUMBER of the word that Rev finds. Then any calculations no longer involve 17757 lines, but far fewer because I know where to start sorting in the field and I've only got twenty records for Lazy River.

THUS: Is there a way that I can find out what line number a found string of text is in upon Rev's finding the string in a field?

OR HERE'S ANOTHER WAY:

I will also be invincible If I can expand my sorting and it remains fast.

Let's say I sort by date and then by place.

Albuquerque is on the top of the field and Lazy River is somewhere in the middle -- say, line 8000. Can I have the sort put Lazy River at the top of the field? You know begin the sorting by L or Lazy? I would then know that the date of Lazy River's first record would be in line number ONE of the field. Then I could manipulate data like mad.

Please tell me there's a way to do either of these things! :lol:

Thanks very much!

Trailboss Tom

Mark
Livecode Opensource Backer
Livecode Opensource Backer
Posts: 5138
Joined: Thu Feb 23, 2006 9:24 pm
Contact:

Post by Mark » Sat Dec 13, 2008 11:19 pm

Hi Tom,

To find something quickly in a bunch of data, regardless of whether it is in a field, a custom property or in a variable, you can use the offset function.

Code: Select all

put lineoffset("Lazy River",fld "Bunch of Data") into myLineNr
You can also start searching later in the data.

Code: Select all

put lineoffset("Lazy River",fld "Bunch",myStartLine) into myNewLine
add myStartLine to myNewLine -- the line containing the string
Sorting might be slightly slower, in this particular case. You will probably need a function like this.

Code: Select all

on specialSort1 theString
  set the itemDel to comma
  put fld "Bunch of Data" into myData
  sort lines of myData by item 3 of each
  put lineoffset(theString,myData) into myFoundLine
  put line myFoundLine to -1 of myData & cr & line 1 to myFoundLine - 1 of myData into myFoundData
  return myFoundData
end specialSort
This function returns the data with the requested string in the first line and the remaining lines sorted alphabetically.

Best regards,

Mark
The biggest LiveCode group on Facebook: https://www.facebook.com/groups/livecode.developers
The book "Programming LiveCode for the Real Beginner"! Get it here! http://tinyurl.com/book-livecode

trailboss
Posts: 112
Joined: Sat Dec 13, 2008 4:55 pm

Fun Big Time

Post by trailboss » Mon Dec 15, 2008 1:11 pm

Thanks! The offset works great. I have a front end now that does some great junk. With 17757 lines in my field, I can do the small sorts really fast, but the sorting that has to extract hundreds of records from thousands of lines requires that I put the records in individual fields in the datastack. For example, I'll need to make 500 bird cards to help speed things up. I take it that a relational database would go faster, but this will have tons of redundancy. For example, I will have all the records in one field and also the same info on bird cards and even special cards for places, so I can sort them and see what I saw in a place. Whenever I enter a bird, the bird name, date, and place will go to several fields on cards to make things easier to sort out. Is this what people do, or am I a light weight? Also, in the RevListServ I was told that I should keep the datastack separate, so I can keep the front end should I later connect to an SQL databse. But I would have the same funky sorting scripts so I don't know what the SQL database does to help. Shows what I know, but I'm sure having fun with this.

teks
Posts: 7
Joined: Thu Dec 18, 2008 1:31 am

Post by teks » Thu Dec 18, 2008 2:16 am

@trailboss: perhaps a relational database backend would be better suited for storing, searching and retrieving this data for you. There are free, open source alternatives which you can use with Revolution, and if your dataset is reasonably large, and your searches reasonably complex, then this may be a better way to go.

You may have to give us a more detailed explanation of what information you want to store, and what kind of searches you will need to perform, before you get more precise advice. In order to be able to give you an example, let's assume that:

1) I want to store information about BIRDS - such as scientific name, common name, picture, etc.
2) I want to store information about LOCATIONS - such as name, address, rating, notes, etc.
3) I need to store information about BIRDSPOTTING - this will include a BIRD, a LOCATION, as well as other info such as date when bird was spotted, notes about the spotting, etc.

This would, indeed, be very simple to setup in a database. All you would need is 3 tables: 'birds', 'locations', 'spottings'. Each table might have the following fields:

Code: Select all

-------------------------------------
birds
-------------------------------------
id (integer, primary key, autoincrement)
scientific_name (text)
common_name (text)
picture (blob)
call (blob)


-------------------------------------
locations
-------------------------------------
id (integer, primary key, autoincrement)
name (text)
address (text)
rating (integer)
notes (text)


-------------------------------------
spottings
-------------------------------------
id (integer, primary key, autoincrement)
location_id (integer)
bird_id (integer)
date (date)
notes (text)
The fields "location_id" and "bird_id" in the 'spottings' table actually point to an existing record in the "locations" and "birds" table respectively. As you can see, there would be no duplication of data. Accessing, searching, and retrieving custom lists from this data (using SQL) would be very fast, and quite easy to do. I have just tested a sample database (SQLite) with 50,000 randomly generated names, and in my (not-so-new) computer it takes Revolution about 20 miliseconds to search the entire list, and place the results in a field. That means, that I would have to have about 5,000,000 records before the search would take 0.5 second.

The other advantage, is that by moving leaving your data storage into a database, your Revolution-based interface will be much simpler. Instead of having hundreds of cards - ie., one card for each bird - you will just need to have one single card, which displays the currently selected bird record from the database. You will probably need just a 'Birds' card, a 'Locations' card, a 'Spottings' card, and a 'Search' card. All in all, this type of design might be easier to implement and maintain, in the long run.

I hope this helps.

trailboss
Posts: 112
Joined: Sat Dec 13, 2008 4:55 pm

Post by trailboss » Mon Dec 22, 2008 5:09 pm

Dear Teks:

Thanks for your help on this. Yes. I think it is time for me to do the database right.

You are about right on with your tables. My HyperCard Stack has a card for each bird and each location. There are some 500 bird cards and perhaps 100 locations. There are 2830 cards because everytime I go somewhere there is a fresh card for the location.

Each bird card has a list with sighting, date, and notes on that particular bird that I saw that day. If on a bird card, I click on any of the sightings of that bird, it takes me to a location card which has all the birds seen there on that date and has a journal entry. This is a tremendous amount of fun because I can reminisce. This is the main job of the stack.

However, I also sort to make all kinds of reports. I make reports on locations or, say, how many black-throated sparrows I have seen at Pima Canyon, or on what day I recorded the greatest number of species of bird or how many trips I made in 1998 and where I went and what I saw. I have an "Explore Time" report in which I get a list of all the trips I took in a year and when I click on any of the dates, I'm taken to that day with its birds and journal entry.

In many reports, the info comes slowly because I have to lock the screen and have the script go to each bird card or location card one at a time to check for the data. I have made more than 700 trips to a favorite street corner, and so sorts for that favorite location are too slow to be practical, though they work.

I have a graphing feature, too, which graphs all sightings of a bird by year or place or whatever. I can see its migratory pattern that way. I know I can add this to the new db later.

Well, as you can see, there's nothing too complicated about my sorts. I'll I have a button for what we call "Life List" that lists all the species I have ever seen by the first date I saw them, but this is slow with my set-up, and listing by taxonomy is a pain too. The simple tables you describe solve all of this.

This database is one of my real joys in life and if I could get a simple sample of a db in Rev linked to a free SQL db, I could proceed. I'm quite good at writing script for Rev and hope to be able to use the skill to manipulate the data. Can I?

How do I get a free SQL backend?

Is there a sample db attached to SQL that I could alter to make my new db?

Thanks a million!

Tom

bangkok
VIP Livecode Opensource Backer
VIP Livecode Opensource Backer
Posts: 834
Joined: Fri Aug 15, 2008 7:15 am

Post by bangkok » Mon Dec 22, 2008 5:40 pm

trailboss wrote:Dear Teks:
How do I get a free SQL backend?
SQLite

http://www.sqlite.org/

I think it would be the perfect choice for you.

But you need a good software to create, manage your SQlite database.

This one is nice :
http://sqliteadmin.orbmu2k.de/

You can easily create your DB, your tables, and the columns in your tables.

The advantage of SQlite is that there aren't several files to deal with, nor a server to set up.

It's just one file. One database = one file (with everything inside, design of your DB, and the datas).

You want to give your DB to someone ? Just give the file. :)

marksmithhfx
VIP Livecode Opensource Backer
VIP Livecode Opensource Backer
Posts: 487
Joined: Thu Nov 13, 2008 6:48 am
Location: Winnipeg, Manitoba

Post by marksmithhfx » Sun Apr 05, 2009 4:16 pm

Bangkok, I read elsewhere on the net that Runrev had purchased altSqlLite (I think I have the spelling correct). Do you know if it was ever integrated with runrev?

Janschenkel
VIP Livecode Opensource Backer
VIP Livecode Opensource Backer
Posts: 977
Joined: Sat Apr 08, 2006 7:47 am
Location: Aalst, Belgium
Contact:

Post by Janschenkel » Sun Apr 05, 2009 7:26 pm

The sqlite database driver is indeed integrated with Revolution Studio/Enterprise since version 2.8.1

Jan Schenkel.
Quartam Reports & PDF Library for LiveCode
www.quartam.com

Bernard
Posts: 184
Joined: Sat Apr 08, 2006 10:14 pm
Location: London, England

Post by Bernard » Fri Apr 10, 2009 2:57 pm

Well, I'm all for technology, but then again I'm all for simplicity too :)

I think the fundamental thing here is that for large (or even largish) amounts of data, storing stuff on cards may end up being unacceptably slow.

However, by putting stuff in a relational db, all you are fundamentally doing is separating your interface from your data, and using card(s) to present the data that you've hived off into the database.

You can go that way, but you will end up having to learn the vagaries of some database system and learn SQL. If you want to learn those things, go ahead. SQLite is just about the simplest form of 'backend' database.

If you have experience with Hypercard, then you could just increase your knowledge of Rev, and hive off your data from your interface. But instead of saving the data in the database, you could just save it in Rev arrays (or dictionaries as they are known in other languages).

Effectively a Rev array is a table, where things are stored by unique key. So taking the example of the relational tables someone suggested earlier by teks, you could store each item of your bird data in 3 rev arrays. In fact, given the new developments in Rev (multi-dimensional arrays) there are ways to store the data in just 1 array.

The only time when I would say that a relational database is the best way to do things is a) when the amount of data is so huge it would exhaust the memory of your computer (open stacks are all memory-resident), or b) when you need transactions or need to be able to rollback, or c) when the data needs to be accessed simultaneously by many users, or d) you need some other feature offered by an advanced database (full-text indexing, replication, etc).

In one test I did I had a query that had to traverse the table structure of a relational database repeatedly. No matter which database engine I used, no matter how I worked the SQL, no matter which special features of the database I used, the results were always returned unacceptably slowly. When I performed the same test using Rev arrays as 'tables' and performed the same recursive queries, it was 100 times faster than searching any database.

So it may be that by learning more about Rev, and thinking about how you can keep your data and your interface separate, you might be able to do the whole thing completely satisfactorily inside Rev.

marksmithhfx
VIP Livecode Opensource Backer
VIP Livecode Opensource Backer
Posts: 487
Joined: Thu Nov 13, 2008 6:48 am
Location: Winnipeg, Manitoba

keeping it simple, but secure

Post by marksmithhfx » Mon Apr 20, 2009 5:41 pm

Hi Bernard, in one application I am thinking about we would like to eventually deploy over the net. I understand 4.0 is going to provide this extensibility (ie. a fourth platform). However, it is health related data and there is strong concern about privacy. Hence, we don;t want to have the whole underlying data table flying all over the internet. Ideally we would like just the result of some process to be presented on the clients computer. I am wondering if in this client/server scenario a rev table would be loaded entirely into the clients computer? (ie. do rev arrays as you describe them get loaded into memory at the beginnnig of each session, or are there specific commands for opening/accessing/closing the data stores? As you can probably tell I am not a rev programer and I'm just exploring rev as one possible implementation option). It might be that for privacy reasons it would be better to store in a database?

Any thoughts you might have would be helpful. -- Mark

PS Jan, thanks for you feedback on sqlite. That was what I thought we might need until I read Bernards message. Now I'm not so sure.

Bernard
Posts: 184
Joined: Sat Apr 08, 2006 10:14 pm
Location: London, England

Post by Bernard » Tue Apr 21, 2009 7:49 am

hi Mark,

You don't have to wait until Rev 4.0 until you deploy a client-server Rev application.

From your description I can see two options that come immediately to mind.

1) Keep as much as possible of the data on the server, and only present the necessary data to the client. You could consider methods such as XML-RPC which Rev supports.

2) If you need to have local data storage on the client (even if only for a subset of the data e.g. for when there is no network connection), you can store that information in a Rev stack outside your application. However with this method you end up with issues to do with reconciling updates (e.g. data updated offline and updated on the server by someone else).

The enterprise version of Rev has SSL included, which would mean that you could use it to encrypt communications between the client and the server in scenarios 1) and 2). Furthermore, you can use SSL to encrypt the local data stored in 2) so that even if the user lost their computer, the data could not be read because it was encrypted (provided of course that the password used to decrypt the data is not stored along with the data).

I would think for this kind of application SSL is a must. You could also use SQLite for the local data storage, but once again I would think the db needs to be decrypted on use. By default SQLite databases are not only not encrypted but they are not even password protected.

(BTW in case it is relevant, the installation/configuration of SSL on Rev for Linux is broken - http://quality.runrev.com/qacenter/show_bug.cgi?id=7149 ).

marksmithhfx
VIP Livecode Opensource Backer
VIP Livecode Opensource Backer
Posts: 487
Joined: Thu Nov 13, 2008 6:48 am
Location: Winnipeg, Manitoba

Post by marksmithhfx » Mon Apr 27, 2009 12:47 am

Hi Bernard, thanks for this. I didn't know the client/servr scenario over the internet was possible in runrev (reading there material a lot seems left unstated?) and I didn't know that SSL encryption was available... lots to learn yet :-) But this does sound like the way to go. At the back end we'll have millions of rows but the user query might result in a bar graph with 6 bars, or a pie with 5 slices, or a table with (6 or 5 respectively) items they can graph with another product if the prefer. In other words, the "raw" data is protected, but the user gets to "look at it" in very aggregate/summary form. Thanks for your suggestion. It is going to be fun (and I suspect a steep journey) learning the ins and outs of rev programming.

-- Mark

trailboss
Posts: 112
Joined: Sat Dec 13, 2008 4:55 pm

I'm Back

Post by trailboss » Thu May 14, 2009 7:20 pm

Gosh,

I didn't know everyone was still in here talking about databases. I've been away, but I'd still like to make that bird database with sql, or, as someone suggested, by staying in rev.

I just have bird,place,date,birdnotes,placenotes With these five fields, I can do great things, but I want to use my Rev skills to make nice, pretty cards on which the sorts of the 17000 bird sightings and notes are done. I'm still confused about how to access a db and manipulate the data using my familiar rev scripting language.

Is there a sample stack hooked up to a database that I can just alter? Or could I really stay in rev?

Thanks,
Trailboss

Post Reply

Return to “Databases”