Processing name, address, phone & URLs - text (pattern) manipulation libraries for Livecode?

rodneyt
Posts: 128
Joined: Wed Oct 17, 2018 7:32 am

Post by rodneyt » Sat Jul 24, 2021 3:18 am

Hi everyone,

I'm wondering if knowledgeable people here are aware of any good text processing libraries for LiveCode.

The first problem I'm interested in is processing address information: taking name, address and URL information that a user has copied and breaking it up (as reliably as possible) into its constituent elements.
Examples include recognising cities, countries and phone numbers, rules for breaking up name strings, etc. I'd also be interested in the ability to identify and extract URIs from a text (which might be in HTML format or plain text): website URLs, email addresses, Twitter handles - that sort of thing.

A lot of this is string pattern matching, and I can think of lots of ways of doing it, but it occurs to me that it's a pretty standard problem, so it's likely there is an existing solution.

Perhaps there is a more general text processing library that allows one to specify a set of rules and actions (e.g. processing a set of rules and building up results into a property array).

I can think of ways of doing all of this, but before I start rolling my own solution I thought it worth checking.

~ Rodney

MaxV
Posts: 1579
Joined: Tue May 28, 2013 2:20 pm
Location: Italy

Post by MaxV » Thu Aug 12, 2021 9:54 am

Hello,
this code extracts all email addresses from a field:

Code:

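on mouseUp
   -- collect every email-like string found in field 1, one per line
   put field 1 into testo
   put empty into listaEmail
   repeat forever
      -- matchChunk returns the start and end character positions of the match
      if matchChunk(testo, "((\w|\.)+@(\w|\.)+)", inizio, fine) then
         put char inizio to fine of testo & return after listaEmail
         -- drop everything up to and including this match, then search again
         delete char 1 to fine of testo
      else
         exit repeat
      end if
   end repeat
   put listaEmail
end mouseUp

Livecode Wiki: http://livecode.wikia.com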
My blog: https://livecode-blogger.blogspot.com
To post code use this: http://tinyurl.com/ogp6d5w

richmond62
Livecode Opensource Backer
Posts: 9249
Joined: Fri Feb 19, 2010 10:17 am
Location: Bulgaria

Post by richmond62 » Thu Aug 12, 2021 10:37 am

I don't think you need any libraries, after all:

1. email addresses always have an ampersand (@) in them.

2. URLs always have "www." in them.

3. Telephone numbers usually contain multiple digit numbers.

4. Addresses almost always contain "street"/"avenue"/"boulevard"/"square"/"plaza"/"place"
or their abbreviations.

So . . . if you have, say, comma-delimited text strings containing these things in random order
running each line through a SWITCH statement and then reordering those items in a list field should not be
unduly difficult.
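
A minimal sketch of the SWITCH approach described above, assuming one record per line with comma-separated items. The function name guessItemType and the exact patterns are illustrative assumptions, not part of any existing library, and the rules are deliberately naive: they follow the four heuristics listed above and will misclassify plenty of real-world data.

```livecode
-- Hypothetical helper: guess what kind of data a single item holds.
function guessItemType pItem
   -- trim leading and trailing whitespace
   put word 1 to -1 of pItem into pItem
   switch
      case pItem contains "@"
         return "email"
      case pItem contains "www."
         return "url"
      case matchText(pItem, "(?i)\b(?:street|st|avenue|ave|boulevard|blvd|square|sq|plaza|place|pl)\b")
         return "address"
      case matchText(pItem, "[0-9]{3,}")
         -- a run of three or more digits is treated as a phone number
         return "phone"
      default
         return "unknown"
   end switch
end guessItemType

-- Example use: label each item of the first line of field 1
on mouseUp
   local tItem, tResult
   repeat for each item tItem in line 1 of field 1
      put guessItemType(tItem) & tab & (word 1 to -1 of tItem) & return after tResult
   end repeat
   put tResult
end mouseUp
```

The address rule is tested before the phone rule so that an item such as "1234 Main Street" is not labelled as a phone number just because it contains a long run of digits.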

stam
Posts: 2599
Joined: Sun Jun 04, 2006 9:39 pm
Location: London, UK

Post by stam » Thu Aug 12, 2021 2:06 pm

richmond62 wrote:
Thu Aug 12, 2021 10:37 am
I don't think you need any libraries, after all:

1. email addresses always have an ampersand (@) in them.

2. URLs always have "www." in them.

3. Telephone numbers usually contain multiple digit numbers.

4. Addresses almost always contain "street"/"avenue"/"boulevard"/"square"/"plaza"/"place"
or their abbreviations.

So . . . if you have, say, comma-delimited text strings containing these things in random order
running each line through a SWITCH statement and then reordering those items in a list field should not be
unduly difficult.

erm... ampersand = "&", not "@" ;)

Re: emails - all emails contain the '@' but not all '@' signify an email.
You probably not only want to detect the "@" but also assess the validity of the email format (for example stam@gmail is not a valid email - or sometimes people will address someone with an @ handle, for example @Richmond - not a valid email ;)). I've 'borrowed' the algorithm generously provided with the liveCloud starter solutions which works well:

Code:

function isValidEmailFormat pEmail
    # PURPOSE : returns a boolean describing the validity of the email provided
    return matchText(pEmail,"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$")
end isValidEmailFormat

URLs do not always have 'www' in them. The venerable mothership's URL is livecode.com.

And addresses can vary significantly and not include any of the keywords you mention, so that's not reliable either (for example my street address only consists of the name of a hill in London with no other qualifiers).
Not so straightforward once you dig into the detail...

And besides, I think the OP was asking whether there was a ready-made solution or whether he should roll his own...
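
To make the points above concrete, here is a rough sketch that pulls URLs (with or without 'www') and @ handles out of plain text using matchChunk. The function name allMatches and both patterns are assumptions made for illustration, not part of any library. They are heuristics only: they will miss some valid URLs and accept some strings that are not URLs at all (a file name such as notes.txt, for example).

```livecode
-- Hypothetical helper: return every match of pPattern in pText, one per line.
-- pPattern must enclose the wanted text in its first pair of capturing parentheses.
function allMatches pText, pPattern
   local tStart, tEnd, tList
   repeat while matchChunk(pText, pPattern, tStart, tEnd)
      -- stop on an empty match so the loop cannot run forever
      if tEnd < tStart then exit repeat
      put char tStart to tEnd of pText & return after tList
      -- discard everything up to the end of this match, then search again
      delete char 1 to tEnd of pText
   end repeat
   delete last char of tList -- drop the trailing return
   return tList
end allMatches

on mouseUp
   local tText, tUrlPattern, tHandlePattern, tOut
   put field 1 into tText
   -- optional scheme, dotted host name, optional path; the two lookarounds
   -- skip text that belongs to an email address
   put "(?<![\w@.-])(?![\w.+-]*@)((?:https?://)?(?:[a-zA-Z0-9-]+\.)+[a-zA-Z]{2,}(?:/[^\s<>]*)?)" into tUrlPattern
   -- a handle must start the text or follow whitespace,
   -- so the @ inside an email address is ignored
   put "(?:^|\s)(@\w{1,15})" into tHandlePattern
   put "URLs:" & return & allMatches(tText, tUrlPattern) & return into tOut
   put "Handles:" & return & allMatches(tText, tHandlePattern) after tOut
   put tOut
end mouseUp
```

Because matchChunk has no "start searching from here" parameter, the helper deletes the text it has already scanned from its own copy of pText; the caller's text is left untouched.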

richmond62
Livecode Opensource Backer
Posts: 9249
Joined: Fri Feb 19, 2010 10:17 am
Location: Bulgaria

Post by richmond62 » Thu Aug 12, 2021 3:10 pm

erm... ampersand = "&", not "@"

Erm, Yes: the 'at' sign; at least in Bulgarian it has a name: кломба.

stam
Posts: 2599
Joined: Sun Jun 04, 2006 9:39 pm
Location: London, UK

Post by stam » Thu Aug 12, 2021 4:25 pm

That's all Hungarian to me...

There's a word for it in Greek as well: Παπάκι, which means duckling - don't ask me why it's the name of the 'at' sign...
