Stemming algorithms in LC?


Moderators: FourthWorld, heatherlaine, Klaus, kevinmiller, robinmiller

grimaldiface
Posts: 43
Joined: Tue Apr 08, 2008 9:56 pm

Stemming algorithms in LC?

Post by grimaldiface » Thu Oct 24, 2013 9:18 pm

Has anyone written any stemming algorithms in LC (for example, the Porter stemmer)? I tried translating the Porter algorithm a while back, but it didn't work quite right and I gave up. I eventually started just piping text through an external Perl script to stem, but it would make distribution of my program a lot simpler if I could just include the stemmer directly in my stack.

I found this thread from a while back: http://mail.on-rev.com/forums/viewtopic ... a3fb#p1536 . The links in it no longer work, and nothing else came up in my searches.

Any help would be much appreciated! Thanks!

-Phil

dunbarx
VIP Livecode Opensource Backer
Posts: 10354
Joined: Wed May 06, 2009 2:28 pm

Re: Stemming algorithms in LC?

Post by dunbarx » Thu Oct 24, 2013 9:24 pm

Hi.

There might be something already written, but this is a simple thing to do in LC.

That is, if I understand what you want, to take a list such as:

stem
stemmed
stemming
distemper

and extract the common string among them, that is, "stem".
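
Read literally, the idea above is a longest-common-substring search rather than a Porter-style stemmer. A minimal sketch of that reading follows, in Python rather than LiveCode, assuming plain lowercase words; the function name `longest_common_substring` is made up for illustration and does not come from any existing stack.

```python
def longest_common_substring(words):
    """Return the longest string found in every word (first match on ties)."""
    if not words:
        return ""
    # Any common substring must fit inside the shortest word.
    shortest = min(words, key=len)
    # Try candidate substrings of the shortest word, longest first.
    for length in range(len(shortest), 0, -1):
        for start in range(len(shortest) - length + 1):
            candidate = shortest[start:start + length]
            if all(candidate in word for word in words):
                return candidate
    return ""


words = ["stem", "stemmed", "stemming", "distemper"]
print(longest_common_substring(words))  # prints "stem"
```

This only finds what a given list of words shares; it cannot stem a single word on its own, which is what a rule-based stemmer such as Porter's does.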

Craig Newman

EDIT.
Cavalier always. Simple, but may take a bit of thinking and effort.

grimaldiface
Posts: 43
Joined: Tue Apr 08, 2008 9:56 pm

Re: Stemming algorithms in LC?

Post by grimaldiface » Thu Oct 24, 2013 9:32 pm

Yes, stemmers extract the stem of a word. More info here: http://tartarus.org/martin/PorterStemmer/

And a description of the algorithm here: http://tartarus.org/martin/PorterStemmer/def.txt

The most basic algorithm is the Porter stemmer. I tried implementing it a while ago. It was pretty good, but it didn't always produce the correct result. I honestly found it quite difficult, and I don't feel like doing it again if someone else has already gone through the trouble.
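
For readers unfamiliar with the algorithm, here is a rough sketch of its first rule group only (Step 1a in the def.txt description linked above), written in Python rather than LiveCode. The function name `porter_step_1a` is an illustrative choice, and this is not a full stemmer.

```python
def porter_step_1a(word):
    """Apply only Step 1a of the original Porter algorithm (plural endings)."""
    # Rules are listed longest suffix first; the first match wins.
    rules = [("sses", "ss"), ("ies", "i"), ("ss", "ss"), ("s", "")]
    for suffix, replacement in rules:
        if word.endswith(suffix):
            return word[:len(word) - len(suffix)] + replacement
    return word


for example in ["caresses", "ponies", "ties", "caress", "cats"]:
    print(example, "->", porter_step_1a(example))
# caresses -> caress
# ponies -> poni
# ties -> ti
# caress -> caress
# cats -> cat
```

The later steps are conditioned on the measure m of the remaining stem (its count of vowel-consonant sequences) and are not shown here.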

Simon
VIP Livecode Opensource Backer
Posts: 3901
Joined: Sat Mar 24, 2007 2:54 am

Re: Stemming algorithms in LC?

Post by Simon » Thu Oct 24, 2013 10:49 pm

Hi Phil,
The stack in that post was made by Eric Chatonet who has since passed away.
I have been able to find traces of communications about it but no actual stack. The last I can find is from Richard Gaskin (FourthWorld) in 2012. In it he says he contacted Eric's estate asking for an MIT-licensed version, but the thread ends there. He also mentions a second stack made by someone called "Andrew"; that stack is apparently English only (Eric's covered about 5 languages). I'm not sure, but Andrew's may be Porter2.

Luckily Richard shows up here often, so he may chime in.

Simon
Edit: Richard is one of this forum's moderators, so you can just click FourthWorld up at the top and email him.
I used to be a newbie but then I learned how to spell teh correctly and now I'm a noob!

paul_gr
Posts: 319
Joined: Fri Dec 08, 2006 7:38 pm

Re: Stemming algorithms in LC?

Post by paul_gr » Thu Oct 24, 2013 11:43 pm

Found this 2005 PorterStemmer stack in an old archive.
Is this the one you are looking for?

Paul
Attachments
PorterAlgorithm.zip
(161.14 KiB) Downloaded 241 times

FourthWorld
VIP Livecode Opensource Backer
Posts: 10058
Joined: Sat Apr 08, 2006 7:05 am

Re: Stemming algorithms in LC?

Post by FourthWorld » Thu Oct 24, 2013 11:58 pm

Good find, Paul. Where did you turn that up? I had to write Ken for a copy last year when I was exploring this.

Ken's is pretty good, but it's Porter1 rather than the newer Porter2. And unlike Eric's, it doesn't have Porter's Romance language stemmers, just US English.

If any of you have time to bring it up to Porter2, and/or add the Romance stemmers, please post it.

Porter1 isn't bad as-is; Porter himself notes that Porter2 only improves about 5% of words, which is largely why he's put so little effort into it since.

Anyone know of any other stemmers in the xTalk world, such as Lovins or Paice/Husk? In my circumstance speed is more important than reduced index set size (up to a point, of course <g>), and I wonder if Lovins' is faster - Ken's Porter implementation has few opportunities for optimization, and offhand I didn't see any that could speed it up as much as I need for frequently-added data.
Richard Gaskin

paul_gr
Posts: 319
Joined: Fri Dec 08, 2006 7:38 pm

Re: Stemming algorithms in LC?

Post by paul_gr » Fri Oct 25, 2013 2:03 am

FourthWorld wrote: Good find, Paul. Where did you turn that up? I had to write Ken for a copy last year when I was exploring this.
Found it on an old drive gathering dust; unfortunately no links.

Paul

grimaldiface
Posts: 43
Joined: Tue Apr 08, 2008 9:56 pm

Re: Stemming algorithms in LC?

Post by grimaldiface » Fri Oct 25, 2013 3:08 am

Awesome! Thank you Paul!
