itemDelimiter misbehaving

Post by Klaus » Wed Oct 23, 2019 6:32 pm

I wrote:
sentence 1 of the_above_mentioned_example_text = If I might offer any apology for so exaggerated a fiction as the
the_above_mentioned_example_text:
-----------------------------------------------------------------
If I might offer any apology for so exaggerated a fiction as the
Barnacles and the Circumlocution Office, I would seek it in the
common experience of an Englishman, without presuming to mention the
unimportant fact of my having done that violence to good manners, in the
days of a Russian war, and of a Court of Inquiry at Chelsea. If I might
make so bold as to defend that extravagant conception, Mr Merdle, I
would hint that it originated after the Railroad-share epoch, in the
times of a certain Irish bank, and of one or two other equally
laudable enterprises. If I were to plead anything in mitigation of the
preposterous fancy that a bad design will sometimes claim to be a good
and an expressly religious design, it would be the curious coincidence
that it has been brought to its climax in these pages, in the days of
the public examination of late Directors of a Royal British Bank. But,
I submit myself to suffer judgment to go by default on all these counts,
if need be, and to accept the assurance (on good authority) that nothing
like them was ever known in this land.
-------------------------------------------------------------------
So:

Code: Select all

...
put sentence 1 of the_above_mentioned_example_text 
...
Results in:
If I might offer any apology for so exaggerated a fiction as the

And not:
If I might offer any apology for so exaggerated a fiction as the
Barnacles and the Circumlocution Office, I would seek it in the
common experience of an Englishman, without presuming to mention the
unimportant fact of my having done that violence to good manners, in the
days of a Russian war, and of a Court of Inquiry at Chelsea.

That second result is what Richmond needs here!

Post by richmond62 » Wed Oct 23, 2019 6:39 pm

Results in:
If I might offer any apology for so exaggerated a fiction as the

And not:
That was not a huge problem: as I indicated above, I stripped out EOL chars as well as CR ones,
and the text sample ended up as one great, long "sentence" broken up with periods,
which was exactly what I required for the next step.

Post by richmond62 » Wed Oct 23, 2019 6:47 pm

I still don't understand how it makes a case for writing new scripts
Well, let's think:

1. I don't know what sort of texts ICU works with: possibly ICU has not been tested out on stuff written in Indian scripts (which tend not
to show word boundaries, and indulge in Sandhi [ https://en.wikipedia.org/wiki/Sandhi ]).

2. At present I am working something out for my wife who wants to do some terribly complicated
analysis of prefixes and suffixes in Anglo-Saxon (Old English) and compare them with a similar
analysis of Old Church Slavonic (Old Bulgarian).

3. I'm bloody-minded insofar as I know that if I code software to do what I want, even if it does
no better than other software coded by someone else I have half a chance to understand what
is going on so I can, should I wish, modify things as I go along.

4. One of my feet is a bit smaller than the other, so, now that I earn a bit more money than I did previously,
I get my shoes made for me by a soutar, rather than buying ones that are either too small for one of my feet,
or too big for the other one.

5. By putting myself through this exercise I might be a better teacher, especially when teaching what we might like to
term LCSP (LiveCode for Specific Purposes).

Oh, and by-the-by re #5, LiveCode lends itself extremely well to the needs of people who want to make some sort of tool
to serve their needs without having to take a 3-year course in computer programming: something that LiveCode (the company)
would do well to take tent of and make a 'thing' of it in their advertising.

Post by FourthWorld » Wed Oct 23, 2019 7:00 pm

Thanks, Klaus. Now I understand the example.

The problem there isn't in parsing natural language, but repairing text formatting unrelated to natural language, specifically hard-returns that have been inserted through mechanical means.

Richmond seems aware of that, and notes that he's already done a replacement of CRs with space to accommodate that.

Once the formatting is repaired, all other tasks with parsing natural language do appear well supported by LC's ICU-based chunk expressions.

And it's a much simpler and faster approach: it's easier to replace one character (CR) than many (punctuation and other patterns defining sentence boundaries).
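
A minimal sketch of that repair step, for illustration only. The field names "IN" and "out" are assumptions borrowed from the example later in this thread, and the handler is not taken from anyone's actual stack:

```livecode
on mouseUp
   -- assumption: the hard-wrapped sample text sits in a field named "IN"
   put fld "IN" into tText
   -- repair the formatting first: turn every hard return into a space
   replace cr with space in tText
   -- the ICU-based sentence chunk now works on unwrapped text
   put sentence 1 of tText into fld "out"
end mouseUp
```

With the returns gone, sentence 1 should run through to "...Chelsea." rather than stopping at the end of the first wrapped line.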
Richard Gaskin
LiveCode development, training, and consulting services: Fourth World Systems
LiveCode Group on Facebook
LiveCode Group on LinkedIn

Post by Klaus » Wed Oct 23, 2019 7:08 pm

Richard,

we already have been there! 8)

Post by FourthWorld » Wed Oct 23, 2019 7:09 pm

richmond62 wrote:
Wed Oct 23, 2019 6:47 pm
Oh, and by-the-by re #5, LiveCode lends itself extremely well to the needs of people who want to make some sort of tool
to serve their needs without having to take a 3-year course in computer programming: something that LiveCode (the company)
would do well to take tent of and make a 'thing' of it in their advertising.
Ever visit livecode.com, or read their marketing materials? The ease-of-learning and lower cognitive load are central themes throughout their external communications.

Indeed, the front page of the site leads with:
Develop Apps Yourself.
We’ve developed LiveCode so you can develop software. Yourself.
...with details supporting that theme provided throughout product description pages.

They've even made more than a few references to this study of cognitive load in learning programming, in which LC was seen as measurably favorable over even more widely-used alternatives:
https://www.researchgate.net/publicatio ... evelopment

Post by FourthWorld » Wed Oct 23, 2019 7:09 pm

Klaus wrote:
Wed Oct 23, 2019 7:08 pm
Richard,

we already have been there! 8)
Then it would seem we're all on the same page.

Post by richmond62 » Wed Oct 23, 2019 7:41 pm

Then it would seem we're all on the same page.
Possibly.

Post by [-hh] » Wed Oct 23, 2019 11:34 pm

@Richmond.

You could try the following.
  • Use the lineDelimiter instead of the itemDelimiter. Then you can use the itemDelimiter for substrings of your lines.
  • Since LC 7, delimiters can be multiple chars.
  • To use MULTIPLE LINEDELIMITERS (each of them can be multiple chars), use the following technique.
    1. Choose a uniqueDelimiter that replaces all your multiple delimiters, e.g.
      the comma-delimited list ".,?,!,:" as multiple delimiters and "••" as the unique delimiter.
    2. Use the attached helper function multiRegex() to convert to the unique delimiter in your inputText.
      Use "comma" for comma, "quote" for quote, and "cr" for return in the multi-delimiter lists.
  • To use MULTIPLE ITEMDELIMITERS (each of them can be multiple chars), use the same function/technique.
  • Work on the lines (delimited by uniqueLineDelimiter) and their items (delimited by uniqueItemDelimiter).
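
As a quick illustration of the multi-char delimiter point alone, here is a minimal sketch for LC 7 or later. The sample string and the variable name tSample are made up for this illustration:

```livecode
on mouseUp
   -- hypothetical sample text using "••" as a two-char delimiter
   put "first chunk••second chunk••third chunk" into tSample
   -- since LC 7 the itemDelimiter may be longer than one char
   set the itemDelimiter to "••"
   -- item 2 is the text between the first and second "••"
   answer item 2 of tSample
end mouseUp
```

The dialog should show "second chunk". The helper function further down is only needed when several different delimiters have to be folded into one.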
Example

Input text:

Code: Select all

If I might offer any meaningful apology for so exaggerated a fiction as the
Barnacles and the Circumlocution Office I would  seek it hopefully in the
commonful  experience of an Englishman, without presuming to be mentionful: the
unimportant fact of my having done grateful that violence to good manners,
in the days of a Russian war, and of a Court of Inquiry at Chelsea hopefully.
The following takes LINES as substrings ending with one of the items of ".,?,!,:", followed by one or more of space, tab, return, newline, vertical tab, formfeed.
And it takes ITEMS as substrings of those lines ending with one of "ful,fully", followed by one or more of space, tab, return, newline, vertical tab, formfeed.

The inputText will be converted to (=tText2 of 'mouseUp' below):

Code: Select all

If I might offer any meaning∆_apology for so exaggerated a fiction as the
Barnacles and the Circumlocution Office I would  seek it hope∆_in the
common∆_experience of an Englishman, without presuming to be mention∆_••the
unimportant fact of my having done grate∆_that violence to good manners,
in the days of a Russian war, and of a Court of Inquiry at Chelsea hope∆_••
Now using "••" as lineDelimiter and "∆_" as itemDelimiter the expected output for
word -1 of item 1 of line 2 of tText2 is "grate":

Code: Select all

on mouseUp
  put ".,?,!,:" into multiLineD; put "••" into uniLineD
  put "ful,fully" into multiItemD; put "∆_" into uniItemD
  put multiRegex(fld "IN",multiLineD,uniLineD) into tText
  set lineDelimiter to uniLineD
  repeat for each line L in tText
    put multiRegex(L,multiItemD,uniItemD)&uniLineD after tText2
  end repeat
  set itemDelimiter to uniItemD
  -- use lines of tText2 (delimited by uniLineD) and
  -- items in those lines (delimited by uniItemD)
  put word -1 of item 1 of line 2 of tText2 into fld "out"
end mouseUp

#-- The regex helper function:
function multiRegex pText,multiDel,uniqueDel
  put ("cr" is among the items of multiDel) into hasCR
  replace comma with numToChar(0) in multiDel
  repeat for each char c in "\^$.?|*+()[{"
    replace c with "\"&c in multiDel
  end repeat
  replace "quote" with quote in multiDel
  replace "comma" with comma in multiDel
  set itemDel to numToChar(0)
  repeat for each item I in multiDel
    put "|"&I after s0
  end repeat
  put "(" & char 2 to -1 of s0 & ")(\s+|.*$)" into rgx
  if hasCR then
    replace "|cr" with empty in rgx
    put "(" & rgx & ")|(\r\n|\r|\n)" into rgx
  end if
  return replaceText(pText,rgx,uniqueDel)
end multiRegex
shiftLock happens

Post by richmond62 » Thu Oct 24, 2019 7:32 am

jacque wrote:
Wed Oct 23, 2019 5:13 pm
Handy hints:

You can use "begins with" and "ends with"
Oh, the pain, the pain . . .

You may have shortened my life by several years!

That's just marvellous. :D
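
A small sketch of how that hint might apply to suffix hunting. The function name wordsEndingWith and its parameters are hypothetical, made up for this illustration:

```livecode
-- hypothetical helper: collect the words of pText that end with pSuffix,
-- one per line
function wordsEndingWith pText, pSuffix
   repeat for each word tWord in pText
      if tWord ends with pSuffix then
         put tWord & cr after tFound
      end if
   end repeat
   -- drop the trailing return (harmless if nothing was found)
   delete char -1 of tFound
   return tFound
end wordsEndingWith
```

For example, wordsEndingWith("a grateful and hopeful nation", "ful") should return "grateful" and "hopeful" on separate lines. Note that the plain word chunk keeps attached punctuation, so "hopeful." would not match; iterating over trueWord chunks instead (LC 7 and later) may be a better fit for running text.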

Post by Thierry » Thu Oct 24, 2019 1:43 pm

Hi Hermann,

I find your approach interesting, and it pushed me into
doing the same exercise, but using my sunnYrex library.
Here is what I've done in 15 minutes:


[Attached image: Screenshot 2019-10-24 at 14.38.29.png]

and of course, the result of getword(..) is "grate"

Regards,

Thierry
!
SUNNY-TDZ.COM doesn't belong to me since 2021.
To contact me, use the Private messages. Merci.
!

Post by [-hh] » Thu Oct 24, 2019 7:25 pm

Hi Thierry,

if everybody bought and used sunnYrex, it would be much easier to give solutions to such "dreams" as multiDelimiters.
And some people wouldn't be afraid of regex any more...
