Various ways of matching a line

MichaelBluejay
Posts: 245
Joined: Thu Jul 01, 2010 11:50 am

Various ways of matching a line

Post by MichaelBluejay » Thu Oct 10, 2019 5:57 pm

It's easy to accidentally match the wrong line. Here I'll offer some solutions, and invite alternatives. (Note: I'm adding good alternatives that were suggested in subsequent posts to this initial post, for one-stop shopping, since the whole point is to make this post a resource for those trying to solve this problem.)

Let's say we have a variable "tData" that contains some lines, with each line being a tab-delimited list, with the first column being an ID.
(The "t" that starts the variable name is standard LiveCode convention to identify temporary variables. You don't have to follow that convention, but you'll see it a lot on the forum and in the documentation.)

Code: Select all

1  heather founder
42 mary    engineer
3  john    intern
2  pat     CEO
5  bob     accountant
If we want to retrieve the line of data for id #5, this would work: put line lineoffset(5, tData) of tData into theLine -- returns line 5, "5 bob accountant"

But if we try to get the line for id #2, we get the wrong line: put line lineoffset(2, tData) of tData into theLine -- returns line 2, "42 mary engineer", because that is the first line containing a 2

Similarly, if any of the other columns contain the string we're searching for, we'll accidentally match the first of those lines as well.

There are various solutions to this problem.

(1) Prepend zeroes. When putting the data in the variable, prepend zeroes to all IDs less than a certain number, e.g. 001, 002, 003. Then match with: lineoffset("002" & tab, tData)

This won't accidentally match 42, and it won't match a line that merely contains 002 somewhere, unless that 002 happens to end a column, because we're searching only for 002 followed by a tab. It also assumes no ID has more digits than the padded width (1002 followed by a tab would still match).
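A minimal sketch of the padding step, assuming the IDs are non-negative integers below 1000 and that the data is tab-delimited as described above. The handler name padIDs and its parameter are made up for illustration.

```livecode
function padIDs pData
   local tOut
   set the itemDelimiter to tab
   repeat for each line tLine in pData
      -- format("%03d", ...) left-pads the ID with zeroes to three digits
      put format("%03d", item 1 of tLine) into item 1 of tLine
      put tLine & cr after tOut
   end repeat
   delete the last char of tOut -- drop the trailing return
   return tOut
end padIDs
```

With that in place, something like put lineOffset("002" & tab, padIDs(tData)) into tLineNumber should find the line for ID 2.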

(2) Match on the cr and a tab. The return character in LC is referenced by either "cr" or "return". So we could try: lineoffset(cr & 2 & tab, cr & tData). Starting the search string with cr ensures that we don't match the line that starts with 42. By including the tab in our search, we avoid matching lines that start with 21, 25, 28, etc. To ensure that we can match the first line, which doesn't have a return character before it, we prepend "cr &" to the text we're searching. That also solves the problem where the line number returned would be the line *before* the data we want (the match begins at the return character, which belongs to the previous line): by adding the "cr" to the text we're searching, we've increased the number of lines in the search target by 1. (Thanks to Klaus for the "cr &" trick.)
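The cr-and-tab approach can be wrapped in a small function. This is only a sketch under the assumptions above (tab-delimited data, ID in the first column); the name lineOfID and its parameters are hypothetical.

```livecode
function lineOfID pID, pData
   -- The leading cr anchors the match to the start of a line;
   -- the trailing tab anchors it to the end of the ID column.
   -- Prepending cr to the data lets the first line match and
   -- makes the returned number line up with the original data.
   return lineOffset(cr & pID & tab, cr & pData)
end lineOfID
```

A call might look like put line lineOfID(2, tData) of tData into theLine. Since lineOffset returns 0 when nothing matches, it is probably worth checking for 0 before using the result.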

(3) Use LC's "filter" command. Filter returns only the stuff from a container that you want. If you don't put the result into a new container, it destroys data in the original container.

Code: Select all

filter lines of fld "data" matching "cat*" -- the field will be modified, with non-matching lines erased
filter items of tData matching "cat" into results -- original variable not modified; item must match "cat" exactly since we didn't use *
We can use filter to get the data for our original problem:

Code: Select all

filter lines of tdata matching ("2" &tab& "*") into tResults
or

Code: Select all

filter lines of tData with regex pattern "^2\t" into tResults -- "^" anchors the match to the start of the line, so 42 is not matched
(Thanks to bwmilby for pointing out the filter command.)
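Here is a hedged sketch of how the filter approach could be turned into a lookup that returns one column. It assumes tab-delimited data and an ID made only of digits (the ID is spliced into the regex, so other characters could change the pattern's meaning). The name nameForID is made up for this example.

```livecode
function nameForID pID, pData
   local tResults
   -- "^" anchors the pattern to the start of each line, "\t" ends the ID column
   filter lines of pData with regex pattern ("^" & pID & "\t") into tResults
   set the itemDelimiter to tab
   -- empty if no line matched
   return item 2 of line 1 of tResults
end nameForID
```

Because the result goes "into tResults", the data passed in is left untouched.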

(4) Loop through the lines. By looping through the lines, we can test for an exact match:

Code: Select all

put tExampleStringAbove into tContainer
put findLineOfId(2,tContainer) into lineNumber

function findLineOfId idNum,container
   repeat with loopLineNum = 1 to the number of lines in container
      if line loopLineNum of container begins with idNum & tab then
         return loopLineNum
      end if
   end repeat
   return 0 -- no line begins with this ID; mirrors lineOffset, which returns 0 when nothing matches
end findLineOfId
For a function that will match *any* field (not just the first one), see AxWald's post below.

(5) Put the data into an associative array. An associative array is an array where the keys are strings rather than integers. For example:

put "blue" into myArray["color"]

We could write a function to loop the data, storing it into an associative array for future use.

Code: Select all

set the itemDelimiter to tab -- default is comma
repeat for each line theLine in tData
   put item 2 of theLine into myArray[item 1 of theLine]["name"]
   put item 3 of theLine into myArray[item 1 of theLine]["job"]
end repeat
Given an ID, we can then access the data like so:

put myArray[5]["name"] into tName
put myArray[42]["job"] into tJob
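Asking the array for an ID that isn't there just gives back empty, so a lookup may want to test for that first. A small sketch, assuming the array was built as in the loop above; jobForID is a hypothetical name.

```livecode
function jobForID pID, pArray
   -- a missing ID has no sub-array under it, so test before reading
   if pArray[pID] is not an array then
      return "no such ID: " & pID
   end if
   return pArray[pID]["job"]
end jobForID
```

For example, jobForID(5, myArray) would be expected to give back the job stored for ID 5, while an ID that was never loaded gives the "no such ID" message.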

There are probably other ways to do this.
Last edited by MichaelBluejay on Sat Oct 12, 2019 5:03 pm, edited 14 times in total.

Klaus
Posts: 14249
Joined: Sat Apr 08, 2006 8:41 am
Contact:

Re: Various ways of matching a line

Post by Klaus » Thu Oct 10, 2019 6:14 pm

(2) Match on the cr also. This method isn't ideal because it comes with more problems. The return character in LC is referenced by either "cr" or "return". So we could try: lineoffset(cr & 2, tData). That ensures that we don't match the line that starts with 42. However, we would accidentally match lines that started with 21, 25, 28, etc. Also, we could never match the first line, because there's no <cr> before it. We'd have to have a blank line at the start. Finally, when matching (cr & something), the line number returned is the line *before* the one that has the data we want, so we'd have to add 1 to the result.
The trick to avoid the addition is:

Code: Select all

...
put lineoffset(cr & 2, CR & tData) into tLineNumber
...

bn
VIP Livecode Opensource Backer
Posts: 4184
Joined: Sun Jan 07, 2007 9:12 pm

Re: Various ways of matching a line

Post by bn » Thu Oct 10, 2019 11:35 pm

Hi Michael,

here is an example to turn your data into an array.
It supposes that your data is in a field. I take it as is, as words separated by spaces.

Code: Select all

1  heather founder
42 mary    engineer
3  john    intern
2  pat     CEO
5  bob     accountant

Code: Select all

on mouseUp
   local tData, tArray
   put field 1 into tData
   
   put tData into tArray
   
   split tArray by return and space
   
   repeat for each line aLine in tData
      put word 2 of aLine into tArray[word 1 of aLine]["name"]
      put word 3 of aLine into tArray[word 1 of aLine]["job"]
   end repeat
   
   answer tArray[2]["name"] & cr & tArray[2]["job"]
end mouseUp
Kind regards
Bernd

bwmilby
Posts: 463
Joined: Wed Jun 07, 2017 5:37 am
Contact:

Re: Various ways of matching a line

Post by bwmilby » Fri Oct 11, 2019 2:36 am

I would also check out the filter command. Converting to an array would be good if you needed to repeatedly go into your list and pull out values. If it is more of a one shot type of thing, filter may be faster. You can use a regex to specify a number at the beginning of a line. It can either be destructive (change the initial variable) or put the result into a new container.

Thierry
VIP Livecode Opensource Backer
Posts: 875
Joined: Wed Nov 22, 2006 3:42 pm

Re: Various ways of matching a line

Post by Thierry » Fri Oct 11, 2019 9:01 am

Hi Michael,

Here is a naive but functional example,
using the matchText() function without transforming your input data
and putting the column values in their respective variables...

Code: Select all

/*
1  heather founder
42 mary    engineer
3  john    intern
2  pat     CEO
5  bob     accountant
*/

on test_CSV_MichaelBluejay
   local T
   put line 2 to 6 of the script of me into T
   answer search( T, "id", 2)
   answer search( T, "name", "mary")
   answer search( T, "function", "accountant")
end test_CSV_MichaelBluejay

function search @T, labelColumn, whichOne
   local C1,C2,C3 -- columns
   switch labelColumn
      case "id"
         if matchText( T, "(?m)^" & whichOne & "\s+([^\s]+)\s+(.*?)$", C2, C3) then \
               return format("for id %d get:\nname: %s\nfunction: %s",whichOne, C2,C3)
         break
      case "name"
         if matchText( T, "(?m)^(\d+)\s+" & whichOne & "\s+(.*?)$", C1, C3) then \
               return format("for name %s get:\nid: %d\nfunction: %s",whichOne, C1,C3)
         break
      case "function"
         if matchText( T, "(?m)^(\d+)\s+([^\s]+)\s+" & whichOne & "$", C1, C2) then \
               return format("for function %s get:\nid: %d\nname: %s",whichOne, C1,C2)
         break
      default
         return "Don't understand: " & labelColumn
   end switch
   return "for " &labelColumn &": "& whichOne & " found nothing!"
end search
Regards,

Thierry

AxWald
Posts: 578
Joined: Thu Mar 06, 2014 2:57 pm

Re: Various ways of matching a line

Post by AxWald » Fri Oct 11, 2019 10:32 am

Hi,
MichaelBluejay wrote:
Thu Oct 10, 2019 5:57 pm

Code: Select all

put line lineoffset(5, tData) into theLine -- returns 5
Nope. Not at all. Didn't you test this? ;-))
  1. First, "put line [aNumber] into theLine" will throw "compilation error (Chunk: missing chunk)" - line x of what?
  2. Second, "line lineoffset(5, tData) of tData" would return "5 bob accountant" - not "5".
MichaelBluejay wrote:
Thu Oct 10, 2019 5:57 pm
There are probably other ways to do this.
Yep. When working with tabular data for some weeks you'll soon realize that "lineOffset()" has only limited use.
As you found out, it returns the number of the first line containing your search string anywhere in it - which is quite useless with short & ubiquitous search strings.
What you'll rather want is a function that finds you a complete match in a certain "field" of your data, like this:

Code: Select all

function findLine what, theStrg, theField, theDelim
   if theField is empty then put 1 into theField     -- field number, default: 1
   if theDelim is empty then put tab into theDelim   -- itemdelimiter, default: tab
   set itemdel to theDelim
   put 0 into myCnt
   repeat for each line myLine in theStrg            -- find first occurrence
      add 1 to myCnt
      if (item theField of myLine) = what then
         return myCnt
      end if                                         -- and return the line number
   end repeat
   return empty                                      -- or report "Nada!"
end findLine
Have fun!
All code published by me here was created with Community Editions of LC (thus is GPLv3).
If you use it in closed source projects, or for the Apple AppStore, or with XCode
you'll violate some license terms - read your relevant EULAs & Licenses!

dunbarx
VIP Livecode Opensource Backer
Posts: 10386
Joined: Wed May 06, 2009 2:28 pm

Re: Various ways of matching a line

Post by dunbarx » Fri Oct 11, 2019 4:05 pm

Michael.

This is not the first time you have posted thoughtful musings on the innards of LC. Keep it up!

Craig

MichaelBluejay
Posts: 245
Joined: Thu Jul 01, 2010 11:50 am

Re: Various ways of matching a line

Post by MichaelBluejay » Fri Oct 11, 2019 5:22 pm

Yes, AxWald, I absolutely should have tested the code before I posted it. I usually do, but I forgot in this case. My bad; I'll try to remember in the future.

Thank you everyone for the thoughtful ideas. I incorporated them into the original post, to make it more like an article.

MichaelBluejay
Posts: 245
Joined: Thu Jul 01, 2010 11:50 am

Re: Various ways of matching a line

Post by MichaelBluejay » Wed Jan 13, 2021 8:37 am

I'm facing a problem which reminded me of this thread. I'm unsure whether to use arrays or lineOffset, and unfortunately the documentation for multidimensional associative arrays is pretty meagre.

Say I have a table of data (a string):

Code: Select all

1 apple 3
2 banana 5
3 cherry 4
4 date 2
The columns are ID, name, and quality.

To get the ID from the name, or the name from the ID, I could use associative arrays, looping through each line of the string:

Code: Select all

   put "1 apple 3" &return& "2 banana 5" &return& "3 cherry 4" &return& "4 date 2" into data
   repeat for each line theLine in data
      put word 1 of theLine into myArray[word 2 of theLine]
      put word 2 of theLine into myArray[word 1 of theLine]
   end repeat
   answer myArray[2] -- returns banana
   answer myArray["cherry"] -- returns 3
The trick is, how do I get the quality (last column) from the ID? If I try to put that into the associative array with this extra line in the loop:

Code: Select all

put word 3 of theLine into myArray[word 1 of theLine][word 2 of theLine]
...then that kills the ability to get the name by using myArray[2], it returns empty.

I could set up the associative array so that I always have to specify two fields in order to select any third field, as in my last code example above. That seems a little cumbersome, though. When I'm coding and I want to grab the quality field given an ID #, I won't know the name off the top of my head, and likewise, when I want to grab the quality field based on the name, I won't know the ID # off the top of my head.

Seems like the best solution is to make some functions (accountNameToID, accountIDtoName, accountIDtoQuality, accountNameToQuality), and have the function use lineoffset to extract the data and return it.

How would you all handle a problem like this?

bn
VIP Livecode Opensource Backer
Posts: 4184
Joined: Sun Jan 07, 2007 9:12 pm

Re: Various ways of matching a line

Post by bn » Wed Jan 13, 2021 2:35 pm

Hi Michael,
The trick is, how do I get the quality (last column) from the ID? If I try to put that into the associative array with this extra line in the loop:

Code: Select all

put word 3 of theLine into myArray[word 1 of theLine][word 2 of theLine]
...then that kills the ability to get the name by using myArray[2], it returns empty.
Here myArray[2] is a subarray. Now "banana" is the key to 5.

Below is a way to get from index to quality and from fruit to quality.

Code: Select all

on mouseUp
   put "1 apple 3" &return& "2 banana 5" &return& "3 cherry 4" &return& "4 date 2" into data
   repeat for each line theLine in data
      put word 1 of theLine into myArray[word 2 of theLine]
      put word 3 of theLine into myArray[word 1 of theLine][word 2 of theLine] -- changed
   end repeat
   
   -- from index to quality
   put 2 into tIndex
   answer the keys of myArray[tIndex] -- returns banana since "banana" is the key to the subarray
   put the keys of myArray[tIndex] into tFruit  -- tFruit contains banana
   answer myArray[tIndex][tFruit] -- returns 5
   
   answer myArray[2] -- returns nothing since it is a subarray now
   answer myArray["cherry"] -- returns 3
   
   -- from fruit via index to quality
   put "banana" into tFruit
   put myArray[tFruit] into tIndex -- search via fruit name -> index
   answer myArray[tIndex][tFruit] -- returns 5
end mouseUp
Kind regards
Bernd

MichaelBluejay
Posts: 245
Joined: Thu Jul 01, 2010 11:50 am

Re: Various ways of matching a line

Post by MichaelBluejay » Wed Jan 13, 2021 2:55 pm

Thank you very much. I can see that it works, but I'm having a hard time wrapping my head around it. Before you posted, I went ahead and made the functions with lineOffset, so at least that works too.

dunbarx
VIP Livecode Opensource Backer
Posts: 10386
Joined: Wed May 06, 2009 2:28 pm

Re: Various ways of matching a line

Post by dunbarx » Wed Jan 13, 2021 3:02 pm

Michael.

I would never have even thought of using an array. I am sure that your dataSet is rather larger than the four line example you gave, but with this:
1 apple 3
2 banana 5
3 cherry 4
4 date 2
already in a field 1, I would have just:

Code: Select all

on mouseUp
   put fld 1 into data
   answer "Get which quality?" with "1" or "2" or "3" or "4"
   -- anchor the ID to the start of a line so that "3" cannot match "1 apple 3"
   answer word 3 of line lineOffset(cr & it & space, cr & data) of data
end mouseUp
You will, of course, not use an answer dialog, but rather concoct a finding gadget to select your ID from a larger list. But after you have the ID, the process of getting the quality from the dataSet is the same. You have to make sure that your IDs are unique.

Craig

bn
VIP Livecode Opensource Backer
Posts: 4184
Joined: Sun Jan 07, 2007 9:12 pm

Re: Various ways of matching a line

Post by bn » Wed Jan 13, 2021 3:11 pm

MichaelBluejay wrote:
Wed Jan 13, 2021 2:55 pm
Thank you very much. I can see that it works, but I'm having a hard time wrapping my head around it. Before you posted, I went ahead and made the functions with lineOffset, so at least that works too.
Michael,
whichever way you solve the problem, I would take some moments to get the logic of the array approach. It may not be the best way to do what you want, but it shows the structure of a multi-level array.
When I started to use multi-level arrays it took me quite some time to get a grip on them. But once you get the hang of it you have yet another option to choose from.

Kind regards
Bernd

bn
VIP Livecode Opensource Backer
Posts: 4184
Joined: Sun Jan 07, 2007 9:12 pm

Re: Various ways of matching a line

Post by bn » Wed Jan 13, 2021 3:30 pm

Michael,

I'll try to break this down a bit more, hoping to make it clearer.

When you code

Code: Select all

put word 3 of theLine into myArray[word 1 of theLine][word 2 of theLine]
you create an array with 2 keys and 1 value.
The keys are myArray[index][fruit], the value is quality: myArray[index][fruit]quality

If you ask: put myArray[index] into tVariable then you get an array in tVariable. It looks like this
tVariable[fruit]quality
That is the reason why the "keys" of tVariable in this case is "fruit" and you use this key to get to the value of "quality".

Some people can explain better than I do but I hope to give you an idea.

Kind regards
Bernd

MichaelBluejay
Posts: 245
Joined: Thu Jul 01, 2010 11:50 am

Re: Various ways of matching a line

Post by MichaelBluejay » Wed Jan 13, 2021 5:04 pm

It seems like the advantage of using an associative array, if there is one, is to make the code more readable. Referring to the cells this way makes sense:

Code: Select all

fruit[id] >> returns name, where "id" is a variable
fruit[fName] >> returns id, where "fname" is a variable
fruit[id]["quality"] >> returns quality, where "id" is a variable and "quality" is a literal, quoted string
fruit[id]["otherAttribute1"] >> same as previous
fruit[id]["otherAttribute2"] >> same as previous
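As Bernd explained above, a key can hold either a plain value or a sub-array, not both, so fruit[id] cannot return the name while fruit[id]["quality"] also exists. One possible layout that stays readable is sketched below: every attribute lives under the ID, and a second array maps names back to IDs. The names tFruit, tNameToID and buildFruitArrays are made up for this example.

```livecode
on buildFruitArrays
   local tData, tFruit, tNameToID
   put "1 apple 3" & cr & "2 banana 5" & cr & "3 cherry 4" & cr & "4 date 2" into tData
   repeat for each line tLine in tData
      put word 2 of tLine into tFruit[word 1 of tLine]["name"]
      put word 3 of tLine into tFruit[word 1 of tLine]["quality"]
      put word 1 of tLine into tNameToID[word 2 of tLine] -- reverse lookup
   end repeat
   answer tFruit[2]["name"] -- banana
   answer tFruit[tNameToID["cherry"]]["quality"] -- 4
end buildFruitArrays
```

Getting from a name to any attribute then takes two steps, name to ID and ID to attribute, but neither step needs the other field to be known in advance.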
Then again, these function calls are also easy to grok:

Code: Select all

getNameFromID(id)
getIDfromName(fName)
getQuality(id)
Or, use a single function instead of several, by passing in two arguments, like:

Code: Select all

getAttribute(id, "fName")
getAttribute(id, "quality")
getAttribute(id, "otherAttribute1")
So, it seems like the choices are:

(1) Write a script to populate the array from the data string, or
(2) Make one or more functions to access the data from the string.

I'm leaning towards the latter.
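For what it's worth, a single accessor along those lines might look roughly like this. It is only a sketch: it assumes the space-separated fruit table from earlier in the thread, takes the data as a third parameter (pData, which the calls above don't have), and uses the cr trick from the first post so that an ID can only match at the start of a line.

```livecode
function getAttribute pID, pAttribute, pData
   local tLineNum, tLine
   -- cr & ID & space matches the ID only in the first column
   put lineOffset(cr & pID & space, cr & pData) into tLineNum
   if tLineNum = 0 then return empty -- no such ID
   put line tLineNum of pData into tLine
   switch pAttribute
      case "fName"
         return word 2 of tLine
      case "quality"
         return word 3 of tLine
   end switch
   return empty -- unknown attribute name
end getAttribute
```

With the four-line fruit table in tData, getAttribute(3, "quality", tData) should come back with 4.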
