Offset and wildcards

Post by **GregWills** » Sat Aug 03, 2019 6:27 am

Hi All

Perhaps a simple question, but can you use wildcards with an Offset?

I want to search for variations of a word.

For example: offset(charsToFind, stringToSearch)
charsToFind - Good-Smith
stringToSearch - a database of last names

I am searching the client data line by line, looking for possible Good-Smith combinations, such as: Good, Goods, Smith, Good Smith, Good-Smiths, Goodsmith, etc.

Cheers

Greg

Post by **[-hh]** » Sat Aug 03, 2019 9:52 am

You could use matchChunk or matchText.
Moreover, the new filter "where" clause may apply.
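offset() looks for a literal string, so a `*` in charsToFind has no special meaning. A minimal sketch of the matchText route, assuming the last name sits in item 3 of each line (as in the script posted later in this thread); the function names and the regular expression are illustrative, not from the thread:

```livecode
-- Hypothetical helper: true if pName looks like a "Good-Smith" variant.
-- matchText() takes a PCRE regular expression, so the "wildcards" are
-- written as regex: "s?" = optional s, "[\s-]*" = any spaces or hyphens,
-- "\b" = word boundary, "(?i)" = ignore case.
function isGoodSmithVariant pName
   return matchText(pName, "(?i)\b(good[\s-]*smiths?|goods?|smiths?)\b")
end isGoodSmithVariant

-- Collect the lines whose last name (assumed to be item 3) is a variant.
function goodSmithLines pData
   local tHits
   repeat for each line tLine in pData
      if isGoodSmithVariant(item 3 of tLine) then
         put tLine & return after tHits
      end if
   end repeat
   if tHits is not empty then delete the last char of tHits -- trailing return
   return tHits
end goodSmithLines
```

Unlike a plain "is in" test, this pattern does not pick up "Goodman" or "Blacksmith"; widening or narrowing the net is a matter of editing the pattern.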

Post by **dunbarx** » Sat Aug 03, 2019 2:20 pm

Why not something like (pseudo):

repeat for each line tLine in yourData
   if "Good" is in tLine or "Smith" is in tLine then put tLine & return after linesOfInterest
end repeat

This will find "basic" strings, and any other appended chars will tag along.

Craig

Post by **GregWills** » Sun Aug 04, 2019 5:58 am

Thank you for your thoughts. Does this mean that wildcards can’t be used with Offset?

This is the script that I am currently using. It does the job - mostly - but does miss variations of some names. I think the script is clunky and thought there would be a more elegant way of looking for as many variations as possible in a large database.

Code: Select all

put item 3 of tLine2 into tIDSearch -- last name of attendee

repeat for each line tLine in tBigHolder -- database
   put item 3 of tLine into tDBName -- last name in this database record
   if tDBName is empty then next repeat -- skip records with no last name
   -- no trailing spaces after "\": it must end the line to continue it
   if tIDSearch begins with tDBName or tIDSearch ends with tDBName or \
         tDBName begins with tIDSearch or tDBName ends with tIDSearch or \
         word 2 of tDBName contains tIDSearch or \
         word 2 of tIDSearch contains tDBName then
      put tLine & return after tSum
   end if
end repeat

Is there a better way to do this type of search?

Cheers

Greg

Post by **dunbarx** » Sun Aug 04, 2019 2:37 pm

Why, yes there is.

Use the simple handler I mentioned above. No wildcards needed, because LC's very own "is in" does all the work for you. Assuming your dataSet is in a variable called "yourData":

Code: Select all

on mouseUp
   repeat for each line tLine in yourData
      if "good" is in tLine or "smith" is in tLine then put tLine & return after accum
   end repeat
   answer accum
end mouseUp

Craig

Post by **[-hh]** » Sun Aug 04, 2019 3:28 pm

It is quite unclear what the possible "Good-Smith combinations" are.

GregWills wrote "Good, Goods, Smith, Good Smith, Good-Smiths, Goodsmith, etc."

Now, for example, which of the following are possible (specify "etc.")?

(*) Blacksmith, Goodman, Gold, Glod, God, Simth, Smitt.

Maybe, more generally, a proximity search?
For X in "Good,Smith,Good Smith,Good-Smith,Goodsmith":
(1) 1 char of X changed
(2) 1 char of X omitted
(3) 1 char inserted into X
(4) Any two adjacent chars of X interchanged

This would include (*) except the first two ...
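Rules (1) to (4) amount to a "one edit away" test (a restricted Damerau-style edit distance of at most 1). A minimal sketch in LiveCode; isNearMatch and nearAnyVariant are hypothetical names, not built-ins:

```livecode
-- Hypothetical sketch of the proximity test above: true if pA and pB are
-- equal or differ by one changed, omitted or inserted char, or by one
-- swap of adjacent chars. Uses LiveCode's "=", which ignores case while
-- the caseSensitive property is false (the default).
function isNearMatch pA, pB
   local tLenA, tLenB, i
   put the number of chars of pA into tLenA
   put the number of chars of pB into tLenB
   if abs(tLenA - tLenB) > 1 then return false
   if pA = pB then return true
   -- work with pA as the shorter (or equal-length) string
   if tLenA > tLenB then return isNearMatch(pB, pA)
   -- skip the common prefix; i ends at the first differing position
   put 1 into i
   repeat while (i <= tLenA) and (char i of pA = char i of pB)
      add 1 to i
   end repeat
   if tLenA = tLenB then
      -- (1) one char changed: everything after position i must agree
      if (char (i + 1) to tLenA of pA) = (char (i + 1) to tLenB of pB) then return true
      -- (4) chars i and i + 1 interchanged
      if (char i of pA = char (i + 1) of pB) and \
            (char (i + 1) of pA = char i of pB) and \
            ((char (i + 2) to tLenA of pA) = (char (i + 2) to tLenB of pB)) then
         return true
      end if
      return false
   end if
   -- (2)/(3) lengths differ by one: skip char i of the longer string pB
   return (char i to tLenA of pA) = (char (i + 1) to tLenB of pB)
end isNearMatch

-- True if pName is near any comma-separated variant in pVariants
function nearAnyVariant pName, pVariants
   repeat for each item tVariant in pVariants
      if isNearMatch(pName, tVariant) then return true
   end repeat
   return false
end nearAnyVariant
```

Called as `nearAnyVariant(item 3 of tLine, "Good,Smith,Good Smith,Good-Smith,Goodsmith")` inside a `repeat for each line` loop, this accepts Gold, Glod, God, Simth and Smitt but not Blacksmith or Goodman, as stated above.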

Post by **dunbarx** » Mon Aug 05, 2019 1:27 am

Hermann.

Unless I misunderstand, the OP is not trying to find a "wildcard" solution as per:

Code: Select all

<wildcard shenanigans> for  "smith" would find "smitx"

I am taking the apparent intent of the original post, trying to make the point that using the "is in" operator cuts through a great deal of sweat and code. It finds the strings "good" and "smith" however they may be encumbered with conjoined, concatenated, appended or prepended strings. This operator seems, to me at least, to do all the OP wants, without having to jump through any hoops at all.

So, Greg, am I right? Because if so, please test my very short offering and report back. (Watch it, Lagi)

Craig

Post by **GregWills** » Tue Aug 06, 2019 12:26 am

Thanks Craig and Hermann for the suggestions. Always great to get other perspectives.

I had tried the “is in” code in the script in an earlier version, and it sort of works. The problem came about when there was a student in the database with the last name “Good”. However, on the Attendance sheet (that is being imported) their name was “Good-Smith” (who knows why!), but it was the same person, judging from the address and email data.

I wanted to cast the net as wide as I could without getting every record in the database! The operator would then select which record to import. It is surprising how many variations people use on Attendance sheets: extra spaces, hyphenated, not hyphenated, shortened name.
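One way to absorb the spacing and hyphen variations described here is to normalise both names before comparing them. A minimal sketch; normalizeName and namesOverlap are hypothetical helpers, and treating containment in either direction as a match is an assumption that deliberately casts a wide net:

```livecode
-- Hypothetical helper (not from the thread): lower-case a name and strip
-- spaces, tabs and hyphens, so "Good-Smith", "Good Smith" and "Goodsmith"
-- all become "goodsmith".
function normalizeName pName
   replace "-" with empty in pName
   replace space with empty in pName
   replace tab with empty in pName
   return toLower(pName)
end normalizeName

-- True if either normalized name contains the other, e.g. "Good" vs
-- "Good-Smith" or "Smiths" vs "Smith". Empty names never match.
function namesOverlap pA, pB
   local tA, tB
   put normalizeName(pA) into tA
   put normalizeName(pB) into tB
   if tA is empty or tB is empty then return false
   return (tA is in tB) or (tB is in tA)
end namesOverlap
```

This also pairs "Good" with "Goodman", so it still relies on the operator to choose the right record.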

The reason why I asked the question was because I thought there might be a way of using a wildcard in the search. I don’t necessarily need to know what the word is, as I will be selecting the whole line that the match is in.

Perhaps filter will be more suited (now that I better understand the variable/wildcard syntax).

filter tDBSearch with ("*" & tLastName & "*") into tMatchingName
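With `regex pattern`, `*` is not a wildcard: it means "zero or more of the previous character", so `Good*` matches "Goo", "Good", "Goodd" and so on. In a plain (wildcard) filter, `*` matches any run of characters, but the pattern has to match the whole line. If a regular expression is wanted instead, here is a minimal sketch; the function name is hypothetical, it assumes the name contains no regex metacharacters such as `.` or `(`, and it relies on my understanding that a regex filter keeps a line when the pattern is found anywhere in it:

```livecode
-- Hypothetical helper: keep the lines of pData that contain pName as a
-- whole word. "(?i)" = case-insensitive, "\b" = word boundary, so
-- "Good" matches "Good-Smith" and "Good Smith" but not "Goodman".
-- Assumes pName has no regex metacharacters (. * + ? ( ) [ ] etc.).
function linesWithName pData, pName
   local tResult
   filter lines of pData with regex pattern ("(?i)\b" & pName & "\b") into tResult
   return tResult
end linesWithName
```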

I'll see how that goes.

Cheers

Greg

Post by **[-hh]** » Tue Aug 06, 2019 12:29 pm

You could also try:

Code: Select all

repeat for each line tLine in tBigHolder
  if check(item 3 of tLine,tIDSearch)
  then put tLine & return after tSum
end repeat

With 9.5 you can do the same thing with filter (which may be a little faster):

Code: Select all

filter tBigHolder where \
      check(item 3 of each,tIDSearch) into tSum

Both use:

Code: Select all

function check x,y
  repeat for each trueword w in x
    if w is among the truewords of y then return true
  end repeat
  return false
end check

With that you find, besides the exact search string, for example also
Good *, * Good, Good-*, *-Good, Good/*, */Good (where * is a trueword)
but not Goodman, Feelgood and Twogoodies
when searching for Good.

Post by **dunbarx** » Tue Aug 06, 2019 5:06 pm

Hi.

So are you saying that the "is in" method is too inclusive?

So when it found this guy "Good" it also found, in another portion of the dataSet, "Good-Smith", and this was redundant?

I don't know how, broadly, any method could determine that these are the same person, and accept only one entry.

Craig

Post by **GregWills** » Fri Aug 09, 2019 3:42 am

Sorry for the slow reply, I have been using the program to update the database. It is all working pretty nicely … as I look for glitches.

After a day of data input (needed to be done), I ran a version with your script, -hh. Beautiful. It gathers all the entries in the database that are close to the name.

The first version I ran was a bit generous with inclusions, as it searched all the text in the line, so I got some interesting last names included. With a slight tweak, it now runs very nicely.

Thank you ALL for your interest and time with this, it is much appreciated.

Cheers

Greg

LiveCode Forums
