Get Number of Words Within Quotation Marks
I appeal to the forum experience and intellect...
Task: I need to download a .txt file from my website, put it in a variable, and then show the text "one WORD at a time". The text has hundreds and hundreds of quotes with quotation marks in it. Livecode treats all text between quotation marks as a single word. This does not work for me.
If Livecode comes across this text in a variable, "This is a quoted comment", it sees all the words between the quotes as a single word. I need to figure out how to get Livecode to see this as five separate words ("This is a quoted comment"). While I can see in some circumstances this behavior might be useful seeing it as a single word, if you type the text above into a Microsoft Word document, it shows there are five words which I think should be the default.
While I am still working on it, I have not had success. Tried treating the text in the variable as items and a space as the delimiter but that did not work.
Appreciate any thoughts or comments on what I could try.
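One way the task described above might be sketched. The URL, the field name "display", the half-second delay, and the handler names startWordShow and showNextWord are all assumptions for illustration; none come from the original post. The text is split on whitespace only, so each quotation mark stays attached to the word next to it.

```livecode
local sWordList, sIndex

on startWordShow
   local tText
   -- Assumption: placeholder URL; substitute the real location of the .txt file.
   put URL "https://example.com/quotes.txt" into tText
   -- Turn every run of whitespace into a line break: one "word" per line.
   put replaceText(tText, "\s+", return) into sWordList
   -- Drop any empty line left by leading or trailing whitespace.
   filter sWordList without empty
   put 0 into sIndex
   showNextWord
end startWordShow

on showNextWord
   add 1 to sIndex
   if sIndex > the number of lines of sWordList then exit showNextWord
   -- Assumption: a field named "display" exists on the current card.
   put line sIndex of sWordList into field "display"
   -- Assumption: half a second per word; adjust to taste.
   send "showNextWord" to me in 500 milliseconds
end showNextWord
```

Calling startWordShow from a button's mouseUp handler would start the display. The sketch does not check whether the download succeeded.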
Re: Get Number of Words Within Quotation Marks
Look at the keyword "token" (or maybe "tokens")..
Grab the quoted string as a word and then go over each token of that word..
Re: Get Number of Words Within Quotation Marks
See the trueWord chunk type in the Dictionary.
Token is useful for parsing LiveCode expressions, but will likely return a much higher number of elements than there are actual words.
TrueWord takes advantage of IBM's natural language algorithms in the standard Unicode libraries to provide a good word count that takes punctuation into account.
Richard Gaskin
LiveCode development, training, and consulting services: Fourth World Systems
LiveCode Group on Facebook
LiveCode Group on LinkedIn
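A minimal sketch for comparing the two chunk types on the sentence from the original question, assuming LiveCode 7 or later (trueWord is not available in the 6.x series). The variable names are illustrative only. As described in the thread, "word" is expected to treat the quoted run as a single chunk, while "trueWord" splits it and leaves the quotation marks out.

```livecode
on mouseUp
   local tText, tReport
   -- Sample text from the original question, including its quotation marks.
   put quote & "This is a quoted comment" & quote into tText
   -- Count with the classic word chunk.
   put "words: " & the number of words in tText & return after tReport
   -- Count with the Unicode-aware trueWord chunk.
   put "trueWords: " & the number of trueWords in tText & return after tReport
   -- Show the first trueWord to see what happens to the opening quote.
   put "first trueWord: " & trueWord 1 of tText after tReport
   answer tReport
end mouseUp
```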
Re: Get Number of Words Within Quotation Marks
Hi.
The old-fashioned way might be to replace all spaces with some obscure character, set the itemDel, and count the items. You would have to eliminate all empty items, though, resulting from consecutive spaces in the original text.
But as you see, there are more modern methods...
Craig Newman
EDIT, of course, you could use space as a delimiter too, I suppose. I am still in v6x, so there are no trueWords yet. Setting the itemDel to space will count "trueWords".
Re: Get Number of Words Within Quotation Marks
Well the problem with using the keyword "trueword" to determine the number of words between quotation marks is that it returns the first and last words between the quotes without the quotation mark itself.
Example: "Here is some-text between quotes."
The first LiveCode trueWord is: Here... not "Here. I need the quotation mark to stay with the word; it's not separated by a space anyway, so it should go with the word. Also, even though "some-text" is not separated by a space, trueWord shows this as two words, ignoring the dash between "some" and "text".
Text editors use the space character in determining the number of words. I was hoping I wasn't going to have to massage the text before displaying words... one word at a time in a field... separated by spaces.
Continuing to search for a solution...
Re: Get Number of Words Within Quotation Marks
Hmmm.
Does the old-fashioned way then seem more apt? Quotes would travel with any chars between spaces, since they are just chars, after all. Spaces, again making sure you condense any strings of continuous spaces, don't care what is around them.
Craig
Re: Get Number of Words Within Quotation Marks
Whether punctuation surrounding a word is also part of the word may differ among software, but linguistically it would be seen as separate.
If IBM's Unicode libraries for natural language parsing won't cover your use case, it may be specific enough to expect to write a custom solution.
Do you want the number of words, or the words (or words+punctuation) themselves?
Richard Gaskin
Re: Get Number of Words Within Quotation Marks
Richard.
Old fashioned aside, doesn't massaging the text with space delimiters cut through any possible issue? Punctuation can be explicitly excluded by replacing specific chars with empty, and consecutive spaces can be readily discovered and shortened. What remains is a string containing a custom definition of a "word", which can then be processed as desired.
Craig
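The idea above could be sketched roughly as follows. The function name customWordList and its pStripChars parameter are hypothetical, and instead of shortening runs of spaces it uses a variant: every space, tab, and line break becomes a line break, and the empty lines are then filtered out.

```livecode
-- Hypothetical helper: returns one custom "word" per line.
-- pStripChars is a string of punctuation characters to discard;
-- pass empty to keep all punctuation attached to its word.
function customWordList pText, pStripChars
   local tChar
   -- Remove each unwanted punctuation character.
   repeat for each char tChar in pStripChars
      replace tChar with empty in pText
   end repeat
   -- Treat tabs and spaces alike: each becomes a line break.
   replace tab with return in pText
   replace space with return in pText
   -- Consecutive delimiters leave empty lines behind; drop them.
   filter pText without empty
   return pText
end customWordList
```

For example, customWordList(fld 1, empty) would keep the quotation marks with their words, while customWordList(fld 1, quote & ".,;:!?") would strip them along with common punctuation. The number of lines of the result is then the word count under this custom definition.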
Re: Get Number of Words Within Quotation Marks
Hi all.
TMHO, "token" (shaosean's post) is clearly a way to go. But "tokens" are hard to understand and to remember, since one uses the keyword rather seldom.
Yet another method could be:
Code: Select all
replace quote with numToChar(1) in str
-- act on words in str
replace numToChar(1) with quote in str
@Craig. Sadly numToChar(42) is not usable without additional efforts because there may be more "*" in str ...
shiftLock happens
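The placeholder trick above might be fleshed out like this. The function name wordsKeepingQuotes is hypothetical, and the sketch assumes numToChar(1) never occurs in the text itself.

```livecode
-- Hypothetical helper: returns the words of pText, one per line,
-- with quotation marks still attached to their words.
function wordsKeepingQuotes pText
   local tPlaceholder, tWord, tList
   -- Assumption: this control character does not appear in the text.
   put numToChar(1) into tPlaceholder
   -- Hide the quotes so the word chunk splits on whitespace only.
   replace quote with tPlaceholder in pText
   repeat for each word tWord in pText
      put tWord & return after tList
   end repeat
   -- Remove the trailing line break.
   delete the last char of tList
   -- Put the quotation marks back where they were.
   replace tPlaceholder with quote in tList
   return tList
end wordsKeepingQuotes
```

Something like "put wordsKeepingQuotes(fld 1) into fld 2" would then list the words, and the number of lines of the result gives the count.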
Re: Get Number of Words Within Quotation Marks
Hi tonymac,
Not sure if this will make you happy, but following your idea of space as an item delimiter,
here is a quick and dirty working test:
in fld 1 (with some extra spaces and a return):
Code: Select all
Example: "Here is some-text between quotes."
Does this help?
the script:
Code: Select all
on mouseUp
   local T, R
   -- collapse every run of whitespace (tabs, spaces, line breaks) into one space
   put replaceText( fld 1 , "[\t\s\r\n]+", space) into T
   set the itemdel to space
   repeat with n=1 to the number of items in T
      put n &": " & item n of T &cr after R
   end repeat
   put R into fld 2
   -- report the item count itself; after the loop, n is one past the last item
   answer "Found " & the number of items in T & " words."
end mouseUp
Result:
Code: Select all
1: Example:
2: "Here
3: is
4: some-text
5: between
6: quotes."
7: Does
8: this
9: help?
Thierry
SUNNY-TDZ.COM doesn't belong to me since 2021.
To contact me, use the Private messages. Merci.
Re: Get Number of Words Within Quotation Marks
Strip out the quotation marks, then Livecode won't have the problem you described.
Re: Get Number of Words Within Quotation Marks
Hi Richmond,
As I understand it, after partitioning the text into words, the OP wants the quotes back where they were before. For example,
"Here I am!" should translate to the three 'parts': <"Here> and <I> and <am!">.
shiftLock happens
Re: Get Number of Words Within Quotation Marks
Old fashioned.
Code: Select all
on mouseUp
   get fld 1 -- with the text in it
   set the itemDel to space
   -- walk backwards so deleting a char does not shift the ones still to be checked
   repeat with y = the number of chars of it down to 1
      if char y of it = space and char (y - 1) of it = space then delete char y of it
   end repeat
   -- each space-delimited item is now one "word", quotes included
   repeat for each item tItem in it
      put tItem & return after temp
   end repeat
   answer temp
end mouseUp
Now this may take a bit of time with a large body of text. I bet there is a regex that can winnow out multiple spaces. That still makes it old-fashioned.
There is a hidden issue if a lone quote is surrounded by multiple spaces on both sides, or if two quotes sit together. Just another line or two to fix that.
Craig
Edit. Even with a half meg of text, only a second or two is required to finish.
Re: Get Number of Words Within Quotation Marks
dunbarx wrote:
Richard.
Old fashioned aside, doesn't massaging the text with space delimiters cut through any possible issue?
I don't know. We'll need to hear back from tonymac on my question about what specifically he needs to do. His OP mentions needing to retain the punctuation surrounding words, but as outcomes he mentions only word counts.
If he's just looking for word counts it'll be hard to beat the simplicity and efficiency of leveraging the industry-standard Unicode parsing available to us with trueWords.
If he needs something else, the best solution will depend on what that something else is.
Richard Gaskin
Re: Get Number of Words Within Quotation Marks
Craig.
dunbarx wrote:
I bet there is a regex that can winnow out multiple spaces.
Bah. Brute force for the win!
Code: Select all
function killSpaces MyStr
   -- keep halving runs of spaces until no double space is left
   repeat until offset(space & space, MyStr) = 0
      replace (space & space) with space in MyStr
   end repeat
   return MyStr
end killSpaces
"A clockwork orange", converted from epub to text:
Code: Select all
Kill Spaces: Initial size: 319316 Bytes; Reduced by: 5334 Bytes; Milliseconds used: 38
;-)
Have fun!
PS: Found no peace until I tried this word list thingie. Again, poor "A clockwork orange" had to suffer:
Code: Select all
Kill Spaces: Initial size: 319316 Bytes; Reduced by: 5334 Bytes; Milliseconds used: 39
RubbishRem: Initial size: 313982 Bytes; Reduced by: 9245 Bytes; Milliseconds used: 166
ListMake: Initial size: 304737 Bytes; Resulting words: 5412; Milliseconds used: 3060
-----------------------------------------------------------------------------------
Time spent for all of this: 3.265 Seconds
List looks like:
Code: Select all
"Aaaaaaarhgh"
"About
"After
"Ah
"Ah"
"Aha
"Alekth
"All
"Am
...
Stack attached.
Attachments:
AWA_WordListMaker.zip (a little demo, 1.84 KiB)
All code published by me here was created with Community Editions of LC (thus is GPLv3).
If you use it in closed source projects, or for the Apple AppStore, or with XCode
you'll violate some license terms - read your relevant EULAs & Licenses!