Not using quotes for words

andrewferguson
VIP Livecode Opensource Backer
Posts: 184
Joined: Wed Apr 10, 2013 5:09 pm

Not using quotes for words

Post by andrewferguson » Fri Oct 17, 2014 12:56 pm

Hi,

I am trying to use the "word" keyword, but I have run into a problem. I discovered that

Code:

answer the number of words of (quote & "hello world")

would actually answer 1, and not 2 like I expected. I looked in the dictionary and realised that this is intentional. This was a problem for me as I needed quotes to be ignored while processing "words".

I then tried changing my code to use "the number of items of", with the itemDelimiter set to " ". However, this presented another problem, as

Code:

answer the number of items of "LiveCode         Forums"

answers 10, and not 2 like I need. Additionally, using space as the itemDelimiter fails when two words are separated by a line break instead of a space.

So, I was wondering if there was a way to "disable" the quotes part of the "word" keyword, or maybe have more than one itemDelimiter at the same time?

Or is there another way I can do this?

Andrew

Thierry
VIP Livecode Opensource Backer
Posts: 875
Joined: Wed Nov 22, 2006 3:42 pm

Re: Not using quotes for words

Post by Thierry » Fri Oct 17, 2014 1:18 pm

andrewferguson wrote: I was wondering if there was a way to "disable" the quotes part of the "word" keyword, or maybe have more than one itemDelimiter at the same time?
Or is there another way I can do this?

Andrew,

without the exact context, it's a bit hard to know the *best* thing to do for you,
but what about copying your original text and doing a series of replaces to drop all the disturbing chars?
E.g.:

Code:

replace quote with space in yourText
replace return with space in yourText
..
HTH,

Thierry
!
SUNNY-TDZ.COM doesn't belong to me since 2021.
To contact me, use the Private messages. Merci.
!

FourthWorld
VIP Livecode Opensource Backer
Posts: 10058
Joined: Sat Apr 08, 2006 7:05 am
Contact:

Re: Not using quotes for words

Post by FourthWorld » Fri Oct 17, 2014 3:19 pm

Although it's still in testing, it may be helpful to note that v7.0 introduces the new "trueWord" chunk type, which uses Unicode rules for determining words, independent not only of white space but also of punctuation.
Richard Gaskin
LiveCode development, training, and consulting services: Fourth World Systems
LiveCode Group on Facebook
LiveCode Group on LinkedIn
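
For anyone on v7 or later, the "trueWord" chunk mentioned above can be compared with "word" along these lines. This is only a minimal sketch: the handler name and the sample string are made up for illustration, and the exact counts depend on the Unicode word-breaking rules of the engine version in use.

```livecode
on compareWordCounts
   local tText
   put quote & "hello world" into tText
   -- "word" treats everything after an unmatched quote as one quoted word
   answer the number of words of tText
   -- "trueWord" follows Unicode word boundaries, so the stray quote
   -- should not glue "hello" and "world" together
   answer the number of trueWords of tText
end compareWordCounts
```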

[-hh]
VIP Livecode Opensource Backer
Posts: 2262
Joined: Thu Feb 28, 2013 11:52 pm

Re: Not using quotes for words

Post by [-hh] » Fri Oct 17, 2014 3:52 pm

Hi Andrew and Thierry,

I think what Andrew wants is not a replacement but a correct counting (yes, Andrew?)

Here: the number of chunks that are delimited by whitespace (the regex "\s"),
where the number of such chunks = 1 + the number of occurrences of contiguous whitespace.

I would try to walk through the start string s0 (for each char) and add 1 to a counter N for
every char that is not a whitespace char. Then return length(s0)-N+1.

What do you think about this? Not this easy, perhaps use of regex gives a better solution?
Hermann

@Craig: Think about enlarging your 'itemdelimiter' feature request to enable: set itemdelimiter to whitespace?
Then, for example, a contiguous mix of 4 spaces, 3 tabs and 42 newlines would count as one delimiter.

@FourthWorld: Saw your post late, after editing mine. Wouldn't it be better to have influence on the set of chars that is delimiting, say a "word-breaking" set, than to adjust over several new versions after learning what is currently in (or not in) the 'word-separator' set?
shiftLock happens
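
A sketch of the character-walking idea, for illustration only. It counts the starts of non-whitespace runs rather than individual characters, so a contiguous mix of spaces, tabs and line breaks acts as a single delimiter and quotes get no special treatment. The function name countChunks and the choice of space, tab and return as "whitespace" are assumptions, not anything taken from the dictionary.

```livecode
function countChunks pText
   local tCount, tInChunk
   put 0 into tCount
   put false into tInChunk
   repeat for each char tChar in pText
      if tChar is in (space & tab & return) then
         -- whitespace ends the current chunk (if any)
         put false into tInChunk
      else
         -- first non-whitespace char after whitespace starts a new chunk
         if not tInChunk then add 1 to tCount
         put true into tInChunk
      end if
   end repeat
   return tCount
end countChunks
```

With this, countChunks(quote & "hello world") and countChunks("LiveCode         Forums") both return 2, and leading or trailing whitespace does not add to the count.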

FourthWorld
VIP Livecode Opensource Backer
Posts: 10058
Joined: Sat Apr 08, 2006 7:05 am
Contact:

Re: Not using quotes for words

Post by FourthWorld » Fri Oct 17, 2014 3:59 pm

Another option would be to just remove the quotes when counting, e.g.:

Code:

on mouseUp
   put BetterWordCount(fld 1)
end mouseUp

function BetterWordCount s
   -- turn quotes into plain separators so they no longer group words
   replace quote with space in s
   return the number of words of s
end BetterWordCount
Richard Gaskin

Thierry
VIP Livecode Opensource Backer
Posts: 875
Joined: Wed Nov 22, 2006 3:42 pm

Re: Not using quotes for words

Post by Thierry » Fri Oct 17, 2014 4:02 pm

[-hh] wrote: Hi Andrew and Thierry,
I think what Andrew wants is not a replacement but a correct counting (yes, Andrew?)

Well, I think so, and that's why
I suggested erasing all the terrorist chars, so he can count true words after that :)

[-hh] wrote: What do you think about this? Not this easy, perhaps use of regex gives a better solution?

Sure, regex will be helpful here.

Regards,

Thierry
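
One way the regex route might look, as a hedged sketch rather than a definitive answer. It uses replaceText to collapse every run of whitespace into a single space and then counts space-delimited items, so quotes are treated like any other character. The function name regexWordCount is hypothetical.

```livecode
function regexWordCount pText
   -- collapse each run of spaces, tabs and line breaks into one space
   put replaceText(pText, "\s+", space) into pText
   -- drop the single space that may remain at either end
   put replaceText(pText, "^ | $", empty) into pText
   if pText is empty then return 0
   set the itemDelimiter to space
   return the number of items of pText
end regexWordCount
```

Trimming before counting matters because a leading space would otherwise produce an empty first item.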

[-hh]
VIP Livecode Opensource Backer
Posts: 2262
Joined: Thu Feb 28, 2013 11:52 pm

Re: Not using quotes for words

Post by [-hh] » Fri Oct 17, 2014 4:33 pm

@FourthWorld: BetterWordCount("A" & quote & "B") = 2?

andrewferguson
VIP Livecode Opensource Backer
Posts: 184
Joined: Wed Apr 10, 2013 5:09 pm

Re: Not using quotes for words

Post by andrewferguson » Fri Oct 17, 2014 4:39 pm

Hi everyone,

Thanks for the replies.
Initially I was against replacing the quotes, but when I thought about it again I realised that using the replace command to replace all the double quotes (") with single quotes (') would allow me to go back to using "the number of words of", and it wouldn't affect the text too much. (For various reasons I cannot replace all the single quotes back to double quotes at the end.)

Andrew
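
The approach Andrew describes could be sketched as below; the function name is hypothetical. Note that it behaves differently from replacing quotes with spaces: a quote in the middle of a string, as in "A" & quote & "B", becomes an apostrophe inside a single word rather than a separator.

```livecode
function wordCountWithApostrophes pText
   -- pText is a copy of the caller's text, so the original is untouched
   replace quote with "'" in pText
   -- with no double quotes left, "word" splits on whitespace only
   return the number of words of pText
end wordCountWithApostrophes
```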

FourthWorld
VIP Livecode Opensource Backer
Posts: 10058
Joined: Sat Apr 08, 2006 7:05 am
Contact:

Re: Not using quotes for words

Post by FourthWorld » Fri Oct 17, 2014 7:55 pm

[-hh] wrote: @FourthWorld: betterWordCount("A" &quote & "B")=2?

Should two words separated by a quote be considered one word?
Richard Gaskin

[-hh]
VIP Livecode Opensource Backer
Posts: 2262
Joined: Thu Feb 28, 2013 11:52 pm

Re: Not using quotes for words

Post by [-hh] » Fri Oct 17, 2014 8:34 pm

Independent of what I think about the current word separators:
Yes, the string "A"&quote&"B" *is* one word, following the dictionary's definition of a word:
Docs wrote:... or if enclosed by quotes.
The emphasis here is on "enclosed". For me this means:
Everything that is between *a pair* of quotes is one word. Inside the quotes, on the next "level", there may again be several words.

"Incorrect" answers for the number of words are not due to the definition of "word" but to the definition of "number of" (as we already know from items): the engine puts a closing quote after a string that has a single double quote at its start, to avoid an open group at level 1 (it's looking for 'pairs' of quotes).
The logic for that is as good or as bad (depending on your point of view) as the current definition of "the number of items".

FourthWorld
VIP Livecode Opensource Backer
Posts: 10058
Joined: Sat Apr 08, 2006 7:05 am
Contact:

Re: Not using quotes for words

Post by FourthWorld » Fri Oct 17, 2014 9:41 pm

[-hh] wrote:Independent of what I'm thinking about current word separators:
Yes, the string "A"&quote&"B" *is* one word. Following the dictionary for the word definition.
True, but Andrew's request was for something different from the Dictionary's definition of the "word" chunk type, something closer to natural language.

Even then, the simple function I provided won't account for everything. For example, "This.And.That" would still be counted as a single word.

Most common indexing methods strip all punctuation and other special characters. That could be done in script, but if that level of effort is needed it may be useful to consider getting started with v7 to take advantage of its Unicode support for such things.
Richard Gaskin
