Get Number of Words Within Quotation Marks

tonymac
Posts: 23
Joined: Thu Jan 05, 2012 9:17 pm

Get Number of Words Within Quotation Marks

Post by tonymac » Thu Oct 27, 2016 7:46 pm

I appeal to the forum experience and intellect...

Task: I need to download a .txt file from my website, put it in a variable, and then show the text "one WORD at a time". The text has hundreds and hundreds of quotes with quotation marks in it. Livecode treats all text between quotation marks as a single word. This does not work for me.

If Livecode comes across this text in a variable, "This is a quoted comment", it sees all the words between the quotes as a single word. I need to figure out how to get Livecode to see this as five separate words. While I can see that treating it as a single word might be useful in some circumstances, if you type the text above into a Microsoft Word document, it reports five words, which I think should be the default.

While I am still working on it, I have not had success. Tried treating the text in the variable as items and a space as the delimiter but that did not work.

Appreciate any thoughts or comments on what I could try.

shaosean
Posts: 906
Joined: Thu Nov 04, 2010 7:53 am

Re: Get Number of Words Within Quotation Marks

Post by shaosean » Thu Oct 27, 2016 7:50 pm

Look at the keyword "token" (or maybe "tokens")..
Grab the quoted string as a word and then go over each token of that word..

FourthWorld
VIP Livecode Opensource Backer
Posts: 9837
Joined: Sat Apr 08, 2006 7:05 am
Location: Los Angeles

Re: Get Number of Words Within Quotation Marks

Post by FourthWorld » Thu Oct 27, 2016 7:53 pm

See the trueWord chunk type in the Dictionary.

Token is useful for parsing LiveCode expressions, but will likely return a much higher number of elements than there are actual words.

TrueWord takes advantage of IBM's natural language algorithms in the standard Unicode libraries to provide a good word count that takes punctuation into account.
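
To compare the chunk types on the example from the first post, here is a minimal sketch. It assumes LiveCode 7 or later (where trueWord is available) and a button to hold the handler; the variable names are illustrative only.

```livecode
on mouseUp
   local tText, tReport
   -- build the sample with literal quotation marks around it
   put quote & "This is a quoted comment" & quote into tText
   -- word: whitespace-delimited, but a quoted run counts as a single word
   put "words: " & the number of words in tText & return after tReport
   -- trueWord: Unicode word breaking, punctuation is not part of a word
   put "trueWords: " & the number of trueWords in tText & return after tReport
   -- token: LiveCode script tokens, meant for parsing expressions
   put "tokens: " & the number of tokens in tText after tReport
   answer tReport
end mouseUp
```

Going by the descriptions in this thread, the word count should come out as 1 and the trueWord count as 5; the token count is worth checking against your own text before relying on it.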
Richard Gaskin
LiveCode development, training, and consulting services: Fourth World Systems
LiveCode Group on Facebook
LiveCode Group on LinkedIn

dunbarx
VIP Livecode Opensource Backer
Posts: 9663
Joined: Wed May 06, 2009 2:28 pm
Location: New York, NY

Re: Get Number of Words Within Quotation Marks

Post by dunbarx » Thu Oct 27, 2016 8:00 pm

Hi.

The old-fashioned way might be to replace all spaces with some obscure character, set the itemDel, and count the items. You would have to eliminate all empty items, though, resulting from consecutive spaces in the original text.

But as you see, there are more modern methods...

Craig Newman

EDIT, of course, you could use space as a delimiter too, I suppose. I am still in v6x, so there are no trueWords yet. Setting the itemDel to space will count "trueWords".

tonymac
Posts: 23
Joined: Thu Jan 05, 2012 9:17 pm

Re: Get Number of Words Within Quotation Marks

Post by tonymac » Thu Oct 27, 2016 9:24 pm

Well the problem with using the keyword "trueword" to determine the number of words between quotation marks is that it returns the first and last words between the quotes without the quotation mark itself.

Example: "Here is some-text between quotes."

The first Livecode trueword is: Here... not "Here. I need the quotation mark to stay with the word; it's not separated from it by a space, so it should go with the word. Also, even though "some-text" contains no space, trueword treats it as two words, ignoring the dash between "some" and "text".

Text editors use the space character in determining the number of words. I was hoping I wasn't going to have to massage the text before displaying words... one word at a time in a field... separated by spaces.

Continuing to search for a solution...

dunbarx
VIP Livecode Opensource Backer
Posts: 9663
Joined: Wed May 06, 2009 2:28 pm
Location: New York, NY

Re: Get Number of Words Within Quotation Marks

Post by dunbarx » Thu Oct 27, 2016 9:42 pm

Hmmm.

Does the old-fashioned way then seem more apt? Quotes would travel with any chars between spaces, since they are just chars, after all. Spaces, again making sure you condense any strings of continuous spaces, don't care what is around them.

Craig

FourthWorld
VIP Livecode Opensource Backer
Posts: 9837
Joined: Sat Apr 08, 2006 7:05 am
Location: Los Angeles

Re: Get Number of Words Within Quotation Marks

Post by FourthWorld » Thu Oct 27, 2016 9:45 pm

Whether punctuation surrounding a word is also part of the word may differ among software, but linguistically it would be seen as separate.

If IBM's Unicode libraries for natural language parsing won't cover your use case, it may be specific enough that you should expect to write a custom solution.

Do you want the number of words, or the words (or words+punctuation) themselves?

dunbarx
VIP Livecode Opensource Backer
Posts: 9663
Joined: Wed May 06, 2009 2:28 pm
Location: New York, NY

Re: Get Number of Words Within Quotation Marks

Post by dunbarx » Thu Oct 27, 2016 10:20 pm

Richard.

Old fashioned aside, doesn't massaging the text with space delimiters cut through any possible issue? Punctuation can be explicitly excluded by replacing specific chars with empty, and consecutive spaces can be readily discovered and shortened. What remains is a string containing a custom definition of a "word", which can then be processed as desired.

Craig

[-hh]
VIP Livecode Opensource Backer
Posts: 2262
Joined: Thu Feb 28, 2013 11:52 pm
Location: Göttingen, DE

Re: Get Number of Words Within Quotation Marks

Post by [-hh] » Thu Oct 27, 2016 10:42 pm

Hi all.
IMHO, "token" (shaosean's post) is clearly a way to go. But "tokens" are hard to understand and to remember, since one uses the keyword rather seldom.

Yet another method could be:

Code: Select all

replace quote with numToChar(1) in str
-- act on words in str
replace numToChar(1) with quote in str
@Craig. Sadly numToChar(42) is not usable without additional efforts because there may be more "*" in str ...
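
To fill in the "act on words" step, one possible sketch of this placeholder approach is below. The function name wordsKeepingQuotes and its variables are hypothetical, and it assumes the text never contains numToChar(1) itself.

```livecode
function wordsKeepingQuotes pText
   local tPlaceholder, tWord, tRestored, tList
   put numToChar(1) into tPlaceholder
   -- hide the quotes so the word chunk no longer groups quoted runs
   replace quote with tPlaceholder in pText
   repeat for each word tWord in pText
      -- work on a copy and put the quotes back where they were
      put tWord into tRestored
      replace tPlaceholder with quote in tRestored
      put tRestored & return after tList
   end repeat
   delete the last char of tList -- drop the trailing return
   return tList
end wordsKeepingQuotes
```

Calling `put the number of lines in wordsKeepingQuotes(fld 1)` would then give a space-based word count, with each line holding one word and its attached punctuation.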
shiftLock happens

Thierry
VIP Livecode Opensource Backer
Posts: 875
Joined: Wed Nov 22, 2006 3:42 pm

Re: Get Number of Words Within Quotation Marks

Post by Thierry » Fri Oct 28, 2016 8:35 am

Hi tonymac,

Not sure if this will make you happy, but following your idea of space as an item delimiter,
here is a quick and dirty working test:

in fld 1 (with some extra spaces and a return ):

Code: Select all

Example:    "Here is some-text between quotes."
Does this help?
the script:

Code: Select all

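on mouseUp
   local T, R
   -- collapse every run of whitespace (tabs, spaces, returns) into one space
   put replaceText( fld 1 , "[\t\s\r\n]+", space) into T
   -- with space as the item delimiter, quotes stay attached to their words
   set the itemdel to space
   repeat with n=1 to the number of items in T
      put n &": " & item n of T &cr after R
   end repeat
   put R into fld 2
   answer "Find " & n & " words."
end mouseUp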
Result:

Code: Select all

1: Example:
2: "Here
3: is
4: some-text
5: between
6: quotes."
7: Does
8: this
9: help?
Thierry
!
SUNNY-TDZ.COM doesn't belong to me since 2021.
To contact me, use the Private messages. Merci.
!

richmond62
Livecode Opensource Backer
Posts: 9386
Joined: Fri Feb 19, 2010 10:17 am
Location: Bulgaria

Re: Get Number of Words Within Quotation Marks

Post by richmond62 » Fri Oct 28, 2016 11:52 am

Strip out the quotation marks, then Livecode won't have the problem you described.
[Image: anti-quote2.png]
Attachment: anti-quote.livecode.zip (the stack, 4.74 KiB)

[-hh]
VIP Livecode Opensource Backer
Posts: 2262
Joined: Thu Feb 28, 2013 11:52 pm
Location: Göttingen, DE

Re: Get Number of Words Within Quotation Marks

Post by [-hh] » Fri Oct 28, 2016 1:13 pm

Hi Richmond,
as I understand it, the OP wants the quotes put back where they were after the partitioning into words. For example,
"Here I am" should translate to the three 'parts': <"Here> and <I> and <am">.

dunbarx
VIP Livecode Opensource Backer
Posts: 9663
Joined: Wed May 06, 2009 2:28 pm
Location: New York, NY

Re: Get Number of Words Within Quotation Marks

Post by dunbarx » Fri Oct 28, 2016 2:02 pm

Old fashioned.

Code: Select all

on mouseUp
   get fld 1 --with the text in it
   set the itemDel to space
   -- walk backwards so deleting a char does not shift the positions still to be checked
   repeat with y = the number of chars of it down to 1
      if char y of it = space and char y - 1 of it = space then delete char y of it
   end repeat
   -- each space-delimited item is now one "word", quotes included
   repeat for each item tItem in it
      put titem & return after temp
   end repeat
   answer temp
end mouseUp
Now this may take a bit of time with a large body of text. I bet there is a regex that can winnow out multiple spaces. That still makes it old-fashioned.

There is a hidden issue if a lone quote is surrounded by multiple spaces on both sides, or if two quotes sit together. Just another line or two to fix that.

Craig

Edit. Even with a half meg of text, only a second or two is required to finish.
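
For the regex route mentioned above, a minimal sketch might look like this. The function name spaceDelimitedWords is hypothetical; replaceText is the same built-in used earlier in the thread, and the pattern treats tabs and returns as separators too, which is an assumption about what the text needs.

```livecode
function spaceDelimitedWords pText
   local tItem, tList
   -- collapse any run of whitespace into a single space
   put replaceText(pText, "\s+", space) into pText
   set the itemDelimiter to space
   repeat for each item tItem in pText
      -- a leading space would otherwise yield an empty first item
      if tItem is empty then next repeat
      put tItem & return after tList
   end repeat
   delete the last char of tList -- drop the trailing return
   return tList
end spaceDelimitedWords
```

Because the whitespace is collapsed in one pass, the character-by-character loop is no longer needed, and quotes stay attached to whatever they touch.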

FourthWorld
VIP Livecode Opensource Backer
Posts: 9837
Joined: Sat Apr 08, 2006 7:05 am
Location: Los Angeles

Re: Get Number of Words Within Quotation Marks

Post by FourthWorld » Fri Oct 28, 2016 3:40 pm

dunbarx wrote: Richard.

Old fashioned aside, doesn't massaging the text with space delimiters cut through any possible issue?
I don't know. We'll need to hear back from tonymac on my question about what specifically he needs to do. His OP mentions needing to retain the punctuation surrounding words, but as outcomes he mentions only word counts.

If he's just looking for word counts it'll be hard to beat the simplicity and efficiency of leveraging the industry-standard Unicode parsing available to us with trueWords.

If he needs something else, the best solution will depend on what that something else is.

AxWald
Posts: 578
Joined: Thu Mar 06, 2014 2:57 pm

Re: Get Number of Words Within Quotation Marks

Post by AxWald » Fri Oct 28, 2016 9:12 pm

Craig.
dunbarx wrote: I bet there is a regex that can winnow out multiple spaces.
Bah. Brute force for the win!

Code: Select all

function killSpaces MyStr
   repeat until offset("  ", MyStr) = 0
      replace "  " with " " in MyStr
   end repeat
   return MyStr
end killSpaces
"A clockwork orange", converted from epub to text:

Code: Select all

Kill Spaces: Initial size: 319316 Bytes; Reduced by: 5334 Bytes; Milliseconds used: 38
;-)

Have fun!

PS: Found no peace until I tried this word list thingie. Again, poor "A clockwork orange" had to suffer:

Code: Select all

Kill Spaces: Initial size: 319316 Bytes; Reduced by: 5334 Bytes; Milliseconds used: 39
RubbishRem: Initial size: 313982 Bytes; Reduced by: 9245 Bytes; Milliseconds used: 166
ListMake: Initial size: 304737 Bytes; Resulting words: 5412; Milliseconds used: 3060
-----------------------------------------------------------------------------------
Time spent for all of this: 3.265 Seconds
List looks like:

Code: Select all

"Aaaaaaarhgh"
"About
"After
"Ah
"Ah"
"Aha
"Alekth
"All
"Am
...
Stack attached.
Attachment: AWA_WordListMaker.zip (a little demo, 1.84 KiB)
All code published by me here was created with Community Editions of LC (thus is GPLv3).
If you use it in closed source projects, or for the Apple AppStore, or with XCode
you'll violate some license terms - read your relevant EULAs & Licenses!
