Count # of Unique Two-Word Phrases (Strings/Items)



theotherbassist
Posts: 115
Joined: Thu Mar 06, 2014 9:29 am
Location: UK

Count # of Unique Two-Word Phrases (Strings/Items)

Post by theotherbassist » Thu Apr 07, 2016 11:40 pm

I'm using the following code to tally the number of times unique words appear in some text.

Code:

on mouseUp
   createwordfrequencyanalysis field "list"
end mouseUp

on createwordfrequencyanalysis pTexts
   get listFrequencies(pTexts) 
   put it into fld "freqlist"
end createwordfrequencyanalysis

function listFrequencies pText
   local tArray, tList, tTotal
   repeat for each trueWord tWord in pText
      add 1 to tArray[tWord] -- add 1 to the word count of each word
   end repeat
   repeat for each key tKey in tArray
      if length(tKey) < 4 then next repeat
      put tKey & comma & tArray[tKey] & cr after tList -- put each unique word and its frequency into tList
   end repeat
   sort tList descending numeric by item 2 of each
   return line 1 to -1 of tList
end listFrequencies
How can I modify this to tally how many times each unique *two-word* phrase appears? Arrays calculate quickly, and I want to do it that way. I'd rather not use x and y variables with if statements to join each word to the following word a hundred thousand times.

dunbarx
VIP Livecode Opensource Backer
Posts: 9648
Joined: Wed May 06, 2009 2:28 pm
Location: New York, NY

Re: Count # of Unique Two-Word Phrases (Strings/Items)

Post by dunbarx » Fri Apr 08, 2016 4:43 am

Hi.

Someone more clever than I will undoubtedly come up with a solution along the lines you asked for.

I am the one who touts "a little analysis is worth a million lines of code." But after a little while, I resorted to an ordinary brute-force method. Maybe it will be of some use to you. I randomly sprinkled "my dog has fleas" 2000 times in a block of text of 26000 words. This is in field 3. In a button:

Code:

on mouseUp
   put fld 3 into tText
   put "dog" into firstWord
   put "has" into secondWord
   put 0 into wordsToSkip -- wordOffset skips words, not characters
   repeat
      -- the result is relative to the number of words skipped
      put wordOffset(firstWord,tText,wordsToSkip) into foundNumber
      if foundNumber <> 0 then
         add foundNumber to wordsToSkip -- now the absolute word index of this match
         put wordsToSkip & comma after wordIndexes
      else exit repeat
   end repeat
   
   -- keep only the matches whose next word is the second word
   repeat for each item tItem in wordIndexes
      if word (tItem + 1) of tText = secondWord then put tItem & comma after foundList
   end repeat
   put foundList into fld "allThewordPairOffsets"
end mouseUp
It took nine seconds to find all the pairs, returned as a list of their word offsets in the text. You could simply count the instances instead, of course, which is slightly easier.

theotherbassist
Posts: 115
Joined: Thu Mar 06, 2014 9:29 am
Location: UK

Re: Count # of Unique Two-Word Phrases (Strings/Items)

Post by theotherbassist » Fri Apr 08, 2016 9:34 am

That's similar to what I had been thinking I'd do if I couldn't figure out some way to do it with an array. I appreciate the code! My reservation about the "brute force" method is that I want to find the frequency of every single word pair in the field. When I do it for individual trueWords, it takes mere milliseconds.

I'm basically looking to display the most common keywords after I gather RSS news feed data from up to 50 feeds that average about 30 articles each at any given time. I've got all the XML legwork done, and the feeds import fine. The single-word keyword method is impressively quick, considering the massive amount of text that I throw into the field. Ultimately, the XML node reading takes a couple seconds per feed already, so I'm looking to cut down on any lengthy analytics so the whole app doesn't just become a waiting game.

dunbarx
VIP Livecode Opensource Backer
Posts: 9648
Joined: Wed May 06, 2009 2:28 pm
Location: New York, NY

Re: Count # of Unique Two-Word Phrases (Strings/Items)

Post by dunbarx » Fri Apr 08, 2016 2:16 pm

Hi.

Check out a post I made in the "Feature Request" area, "Custom Chunk"

Craig

dunbarx
VIP Livecode Opensource Backer
Posts: 9648
Joined: Wed May 06, 2009 2:28 pm
Location: New York, NY

Re: Count # of Unique Two-Word Phrases (Strings/Items)

Post by dunbarx » Fri Apr 08, 2016 5:34 pm

I am still on v. 6.7, but if you have read the thread in the "Feature Requests" area, the newest itemDel is your friend. It can be the "custom chunk".

I am not sure in which version the itemDelimiter stopped being restricted to a single character and began accepting any string (I think this is right), but if you set the itemDel to two words and use itemOffset, the code will become much more compact, and you can run the array counter directly.

Craig

[-hh]
VIP Livecode Opensource Backer
Posts: 2262
Joined: Thu Feb 28, 2013 11:52 pm
Location: Göttingen, DE

Re: Count # of Unique Two-Word Phrases (Strings/Items)

Post by [-hh] » Fri Apr 08, 2016 5:56 pm

... was wrong, sorry ...
Last edited by [-hh] on Mon Apr 11, 2016 9:12 pm, edited 3 times in total.
shiftLock happens

theotherbassist
Posts: 115
Joined: Thu Mar 06, 2014 9:29 am
Location: UK

Re: Count # of Unique Two-Word Phrases (Strings/Items)

Post by theotherbassist » Fri Apr 08, 2016 6:25 pm

Wow, this thread blew up all at once. I actually engineered a solution in the meantime involving functions that place ordered trueWords into another variable and THEN into the array. I've got functions now that do this for up to four-word combinations. I'm on my phone now, but I'll post code soon. Good posts, guys. Appreciate it.
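
For readers following along, here is a minimal sketch of that idea. It is not the code the poster went on to write: it simply remembers the previous trueWord, joins it to the current one, and uses the joined pair as the array key. The handler name listPairFrequencies and its variable names are made up for illustration, and it assumes LiveCode 7 or later because it relies on the trueWord chunk.

```livecode
-- Hypothetical helper (not from the thread): tally adjacent two-word phrases
function listPairFrequencies pText
   local tArray, tList, tPrev, tPair
   put empty into tPrev
   repeat for each trueWord tWord in pText
      if tPrev is not empty then
         put tPrev && tWord into tPair -- previous word, a space, current word
         add 1 to tArray[tPair] -- one array lookup per word, no searching
      end if
      put tWord into tPrev -- the current word starts the next pair
   end repeat
   repeat for each key tKey in tArray
      put tKey & comma & tArray[tKey] & cr after tList
   end repeat
   delete last char of tList -- drop the trailing return
   sort lines of tList descending numeric by item 2 of each
   return tList
end listPairFrequencies
```

It could be called like the single-word version, for example with put listPairFrequencies(field "list") into fld "freqlist". The text is walked once, so the cost should stay close to that of the single-word tally. Because trueWord skips punctuation, a pair that straddles a sentence boundary is counted too, which may or may not be what you want.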

FourthWorld
VIP Livecode Opensource Backer
Posts: 9823
Joined: Sat Apr 08, 2006 7:05 am
Location: Los Angeles

Re: Count # of Unique Two-Word Phrases (Strings/Items)

Post by FourthWorld » Fri Apr 08, 2016 7:35 pm

-hh wrote:
kernel[0]: process LiveCode-Indy[846] thread 82336 caught burning CPU! It used more than 50% CPU (Actual recent usage: 91%) over 180 seconds. thread lifetime cpu usage 90.046699 seconds, (89.323192 user, 0.723507 system) ledger info: balance: 90005959970 credit: 90010116731 debit: 4156761 limit: 90000000000 (50%) period: 180000000000 time since last refill (ns): 97917706200
ReportCrash[857]: Invoking spindump for pid=852 thread=83578 percent_cpu=73 duration=124 because of excessive cpu utilization

Did you file a bug report on that? I'm sure the team would want to fix that if they have the crash info and the sample code.
Richard Gaskin
LiveCode development, training, and consulting services: Fourth World Systems
LiveCode Group on Facebook
LiveCode Group on LinkedIn

[-hh]
VIP Livecode Opensource Backer
Posts: 2262
Joined: Thu Feb 28, 2013 11:52 pm
Location: Göttingen, DE

Re: Count # of Unique Two-Word Phrases (Strings/Items)

Post by [-hh] » Fri Apr 08, 2016 9:31 pm

... There is a bug, but my logic was wrong, sorry ...
Last edited by [-hh] on Mon Apr 11, 2016 9:11 pm, edited 1 time in total.
shiftLock happens

Mark
Livecode Opensource Backer
Posts: 5150
Joined: Thu Feb 23, 2006 9:24 pm

Re: Count # of Unique Two-Word Phrases (Strings/Items)

Post by Mark » Sat Apr 09, 2016 8:41 am

I made a text analysis tool, a long time ago. Here's how I count the words:

Code:

repeat for each word myWord in myText
  if myArray[myWord] is empty then
    put 1 into myArray[myWord]
  else
    add 1 to myArray[myWord]
  end if
end repeat
Before starting the count, you might want to remove all non-alphanumeric characters except for white space. You could also compare this with the other solutions given above and see which is faster.
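
One way that clean-up step might look is sketched below. The function name stripPunctuation is hypothetical; it relies only on the built-in replaceText function, which takes a regular expression. The pattern is an assumption about what "alphanumeric" should mean here: it keeps ASCII letters, digits, and white space only, so accented letters and apostrophes would be removed as well.

```livecode
-- Hypothetical helper (not from the thread): keep letters, digits, and white space
function stripPunctuation pText
   -- [^A-Za-z0-9\s] matches any single character that is NOT an ASCII letter,
   -- a digit, or white space; each match is replaced with nothing
   return replaceText(pText, "[^A-Za-z0-9\s]", empty)
end stripPunctuation
```

You would run the text through it once before the counting loop, for example with put stripPunctuation(myText) into myText.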

Kind regards,

Mark
The biggest LiveCode group on Facebook: https://www.facebook.com/groups/livecode.developers
The book "Programming LiveCode for the Real Beginner"! Get it here! http://tinyurl.com/book-livecode

[-hh]
VIP Livecode Opensource Backer
Posts: 2262
Joined: Thu Feb 28, 2013 11:52 pm
Location: Göttingen, DE

Re: Count # of Unique Two-Word Phrases (Strings/Items)

Post by [-hh] » Sat Apr 09, 2016 12:55 pm

... was wrong, sorry ...
Last edited by [-hh] on Mon Apr 11, 2016 9:10 pm, edited 1 time in total.
shiftLock happens

TerryL
Posts: 78
Joined: Sat Nov 23, 2013 8:57 pm

Re: Count # of Unique Two-Word Phrases (Strings/Items)

Post by TerryL » Sat Apr 09, 2016 6:36 pm

Filter would be another fast method, especially to display the lines that contain the search string, but it wouldn't accurately count occurrences if the search string appeared more than once per line. For that, first filter, then use the offset() function to count occurrences within the smaller subset of lines. The new itemDel string and number(items) seem best for versions after v6. Terry

Code:

on mouseUp  --keep lines containing search string
   put fld "A" into tTemp
   filter tTemp with ("*"& "my dog" &"*")  --lines containing
   --filter tTemp with ("my dog" &"*")  --lines beginning with
   --filter tTemp with ("my dog")  --lines only of
   put number(lines in tTemp) &&"lines" &cr& tTemp
end mouseUp
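
The second step described above, counting every occurrence rather than every line, might look like the sketch below. The function name countOccurrences is hypothetical; it relies only on the built-in offset() function, whose third parameter is the number of characters to skip and whose result is relative to that skip.

```livecode
-- Hypothetical helper (not from the thread): count non-overlapping matches
function countOccurrences pNeedle, pText
   local tCount, tSkip, tFound
   put 0 into tCount
   put 0 into tSkip
   if pNeedle is empty then return 0
   repeat
      put offset(pNeedle, pText, tSkip) into tFound -- 0 means no further match
      if tFound = 0 then exit repeat
      add 1 to tCount
      -- skip past the end of this match before searching again
      add tFound + length(pNeedle) - 1 to tSkip
   end repeat
   return tCount
end countOccurrences
```

After the filter line it could be used as put countOccurrences("my dog", tTemp). Note that offset() does not respect word boundaries, so "my doghouse" would be counted as well.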
Beginner Lab (LiveCode tutorial) and StarterKit (my public stacks)
https://tlittle72.neocities.org/info.html

Mark
Livecode Opensource Backer
Posts: 5150
Joined: Thu Feb 23, 2006 9:24 pm

Re: Count # of Unique Two-Word Phrases (Strings/Items)

Post by Mark » Sat Apr 09, 2016 10:23 pm

hh,

Why do you use 3 repeat loops and even a nested repeat loop? That seems superfluous to me.

Mark
The biggest LiveCode group on Facebook: https://www.facebook.com/groups/livecode.developers
The book "Programming LiveCode for the Real Beginner"! Get it here! http://tinyurl.com/book-livecode

dunbarx
VIP Livecode Opensource Backer
Posts: 9648
Joined: Wed May 06, 2009 2:28 pm
Location: New York, NY

Re: Count # of Unique Two-Word Phrases (Strings/Items)

Post by dunbarx » Sat Apr 09, 2016 11:17 pm

If all the OP wanted to do was count two-word phrases, and not find their offsets as well, then why not just (in v.7 and above):

Code:

set the itemDel to "has fleas"
answer the number of items of fld "yourField" - 1
Rereading the very first post, this is all that was asked for. The new itemDel can do much more now that it is freed from a single char.
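
A slightly more defensive version of the same idea is sketched below, wrapped in a function with a made-up name, countPhrase. It assumes LiveCode 7 or later for the multi-character delimiter. It also assumes that a trailing delimiter does not start a new item, as is the case with a trailing comma, so a phrase at the very end of the text would otherwise go uncounted; appending one character works around that.

```livecode
-- Hypothetical helper (not from the thread): count one phrase via the itemDelimiter
function countPhrase pPhrase, pText
   -- the itemDelimiter set here is local to this handler
   set the itemDelimiter to pPhrase
   -- the appended space closes a final item when the text ends with the phrase;
   -- n occurrences of the phrase then yield n + 1 items
   return (the number of items of (pText & space)) - 1
end countPhrase
```

For example, answer countPhrase("has fleas", fld "yourField"). Like the two-line version, this counts one known phrase at a time and matches the phrase inside longer words too.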

Craig

[-hh]
VIP Livecode Opensource Backer
Posts: 2262
Joined: Thu Feb 28, 2013 11:52 pm
Location: Göttingen, DE

Re: Count # of Unique Two-Word Phrases (Strings/Items)

Post by [-hh] » Sat Apr 09, 2016 11:43 pm

... was wrong, sorry ...
Last edited by [-hh] on Mon Apr 11, 2016 9:09 pm, edited 1 time in total.
shiftLock happens
