Creating hunspell library?

Posted: Sat Jan 02, 2016 7:12 pm
by trevordevore
Do we have everything in place that we would need to wrap hunspell up as a library? If so, this might be an interesting project for the community to work on. I have a hunspell external that Monte created for me a while ago, which I use in my projects. It has a wrapper which makes it easy to use the flaggedRanges to mark words as misspelled. If someone could set up an initial project that imports hunspell and has an example or two, I could work on fleshing out the API.

Re: Creating hunspell library?

Posted: Sun Jan 03, 2016 12:42 am
by monte
Hmm... you should be able to wrap the C API in hunspell.h, but I'm not sure how we would handle the char *** out parameters used for suggestions etc. Perhaps ZStringUTF8Array (List?) would need to be implemented first.
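For readers less used to C, here is roughly what the `char ***` problem looks like. The sketch uses a hypothetical `demo_suggest` with the same shape as `Hunspell_suggest`: the library allocates an array of C strings, writes the array's address through the `char ***` parameter and returns the count, and the caller later hands the list back to a matching free routine (the hypothetical `demo_free_list` stands in for `Hunspell_free_list`). It illustrates the calling convention only; it is not Hunspell's implementation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copy a NUL-terminated string into freshly malloc'd memory. */
static char *dup_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}

/* Hypothetical stand-in with the same shape as Hunspell_suggest: the callee
   allocates an array of C strings, stores the array's address through the
   char *** parameter and returns the number of elements. */
static int demo_suggest(char ***out_list, const char *word)
{
    static const char *const canned[] = { "hello", "help", "hell" };
    const int count = (int)(sizeof canned / sizeof canned[0]);
    (void)word; /* a real checker would compute suggestions from this */

    *out_list = NULL;
    char **list = malloc((size_t)count * sizeof *list);
    if (list == NULL)
        return 0;
    for (int i = 0; i < count; i++) {
        list[i] = dup_string(canned[i]);
        if (list[i] == NULL) { /* roll back on allocation failure */
            while (i-- > 0)
                free(list[i]);
            free(list);
            return 0;
        }
    }
    *out_list = list; /* write the array's address into the caller's char ** */
    return count;
}

/* Plays the role of Hunspell_free_list in this sketch: free every string,
   then the array, and clear the caller's variable. */
static void demo_free_list(char ***list, int count)
{
    assert(list != NULL);
    for (int i = 0; i < count; i++)
        free((*list)[i]);
    free(*list);
    *list = NULL;
}
```

Reading the stars from the inside out: `char *` is one string, `char **` is the array of strings, and the outer `*` is the slot the library writes that array into. A binding layer therefore needs two dereferences plus a count before it can build a List.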

Re: Creating hunspell library?

Posted: Mon Jan 04, 2016 3:44 pm
by trevordevore
Thanks for looking into that, Monte. If we do need the API updated in order to make it happen, then it looks like it will have to wait a while unless a community member is willing to update the engine.

My opinion (for the time being at least) is that it is best to wait until the builder syntax is a little more polished before venturing down the road of creating an extension or widget for the general public to use.

Re: Creating hunspell library?

Posted: Mon Jan 04, 2016 3:46 pm
by peter-b
I think you may be able to use some low-level libfoundation functions to do pointer arithmetic, so as to handle the char *** unpacking. I've got some similarly horrible code in my poll(2) wrapper in undergrowth (I haven't looked at or touched that code for months though).
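As a rough illustration of the pointer-arithmetic approach, the sketch below unpacks a `char ***` out parameter using only an opaque address and byte offsets, which is the kind of primitive a binding might expose. `read_array_from_slot` and `read_string_at` are hypothetical helpers written for this example, not libfoundation functions; the sketch assumes the binding can read a pointer-sized value at an arbitrary address.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical primitive, step 1: a char *** out parameter points at a slot
   holding the array's address. memcpy avoids alignment/aliasing problems. */
static char **read_array_from_slot(const void *slot)
{
    char **array;
    assert(slot != NULL);
    memcpy(&array, slot, sizeof array);
    return array;
}

/* Hypothetical primitive, step 2: element 'index' of a char ** array lives
   at base + index * sizeof(char *) bytes. */
static const char *read_string_at(const void *array_base, int index)
{
    char *s;
    assert(array_base != NULL && index >= 0);
    memcpy(&s, (const unsigned char *)array_base + (size_t)index * sizeof s,
           sizeof s);
    return s;
}
```

A binding would call the first helper once on the out-parameter slot, then the second helper once per element up to the returned count.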

Re: Creating hunspell library?

Posted: Mon Jan 04, 2016 10:16 pm
by monte
I've looked at your struct packing and unpacking in undergrowth and run away screaming... In the end, if we need to do stuff like that, it is much simpler just to write an external. Which luckily we already have for hunspell ;-)

Re: Creating hunspell library?

Posted: Tue Jan 05, 2016 8:35 pm
by LCMark
Looking at the Hunspell API, the crucial piece currently missing from engine APIs is the ability to unpack / pack a pointer to/from a native type. If we had that then wrapping Hunspell is actually quite trivial (certainly as easy as writing bindings for it in other high-level languages, and you would avoid writing any C).

Here is a simple (not running / not finished) example of a way to bind to it assuming the 'notyetpossible' module existed:

Code:

library module org.runrev.hunspell

type Hunhandle as optional Pointer
type Hunslist as optional Pointer

foreign handler Hunspell_create(in pAffPath as UTF8CString, in pDPath as UTF8CString) returns Hunhandle
foreign handler Hunspell_create_key(in pAffPath as UTF8CString, in pDPath as UTF8CString, in pKey as UTF8CString) returns Hunhandle
foreign handler Hunspell_destroy(in pHunspell as Hunhandle)
foreign handler Hunspell_spell(in pHunspell as Hunhandle, in pWord as UTF8CString) returns CInt
foreign handler Hunspell_get_dic_encoding(in pHunspell as Hunhandle) returns UTF8CString
foreign handler Hunspell_suggest(in pHunspell as Hunhandle, out rSLst as Hunslist, in pWord as UTF8CString) returns CInt
foreign handler Hunspell_analyze(in pHunspell as Hunhandle, out rSLst as Hunslist, in pWord as UTF8CString) returns CInt
foreign handler Hunspell_stem(in pHunspell as Hunhandle, out rSLst as Hunslist, in pWord as UTF8CString) returns CInt
foreign handler Hunspell_stem2(in pHunspell as Hunhandle, out rSLst as Hunslist, in pDesc as Pointer, in pDescCount as CInt) returns CInt
foreign handler Hunspell_generate(in pHunspell as Hunhandle, out rSLst as Hunslist, in pWord as UTF8CString, in pWord2 as UTF8CString) returns CInt
foreign handler Hunspell_generate2(in pHunspell as Hunhandle, out rSLst as Hunslist, in pWord as UTF8CString, in pDesc as Hunslist, in pDescCount as CInt) returns CInt
foreign handler Hunspell_free_list(in pHunspell as Hunhandle, inout rSLst as Hunslist, in pSlstCount as CInt) returns nothing

--------

private variable sHandle as optional Hunhandle

public handler hunspellInitialize(in pAffPath as String, in pDPath as String) returns nothing
	if sHandle is not nothing then
		throw "Hunspell already initialised"
	end if
	put Hunspell_create(pAffPath, pDPath) into sHandle
end handler

public handler hunspellFinalize() returns nothing
	if sHandle is nothing then
		return
	end if
	Hunspell_destroy(sHandle)
	put nothing into sHandle
end handler

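private handler __hunspellEnsure() returns nothing
	if sHandle is nothing then
		throw "Hunspell not initialised"
	end if
end handler

public handler hunspellSpell(in pWord as String) returns Boolean
	__hunspellEnsure()
	return Hunspell_spell(sHandle, pWord) is not 0
end handler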

public handler hunspellSuggest(in pWord as String) returns List
	__hunspellEnsure()
	
	variable tSuggestions as optional Hunslist
	variable tSuggestionCount as CInt
	put Hunspell_suggest(sHandle, tSuggestions, pWord) into tSuggestionCount

	variable tList as List
	put notyetpossible.UnpackPointerAsArrayOfString(tSuggestions, tSuggestionCount) into tList

	Hunspell_free_list(sHandle, tSuggestions, tSuggestionCount)

	return tList
end handler

end module

So, we do need to do a little more work to make this kind of thing possible, but perhaps not as much as it appears (at first sight at least; famous last words!).
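Whatever form `notyetpossible.UnpackPointerAsArrayOfString` eventually takes, it has to deep-copy the native strings, because `hunspellSuggest` calls `Hunspell_free_list` straight afterwards. The C sketch below shows that copy-then-free ordering; `copy_string_list`, `free_string_list` and `dup_string` are hypothetical helpers, and `free_string_list` applied to the native list simply plays the part of the library's own free routine.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copy a NUL-terminated string into freshly malloc'd memory. */
static char *dup_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}

/* Free 'count' strings and the array holding them. */
static void free_string_list(char **list, int count)
{
    if (list == NULL)
        return;
    for (int i = 0; i < count; i++)
        free(list[i]);
    free(list);
}

/* What a hypothetical "unpack" helper must do: deep-copy the native strings
   so the result stays valid after the library frees its own list. */
static char **copy_string_list(char *const *src, int count)
{
    if (src == NULL || count <= 0)
        return NULL;
    char **copy = calloc((size_t)count, sizeof *copy);
    if (copy == NULL)
        return NULL;
    for (int i = 0; i < count; i++) {
        copy[i] = dup_string(src[i]);
        if (copy[i] == NULL) {
            free_string_list(copy, i); /* release only what was copied */
            return NULL;
        }
    }
    return copy;
}
```

Reversing the order (free first, then copy) would read freed memory, which is why the unpack has to happen before `Hunspell_free_list` in the module.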

Re: Creating hunspell library?

Posted: Tue Jan 05, 2016 9:50 pm
by monte
It would be neater if we could just add an optional List qualifier after the foreign type in the foreign handler declaration, so that the bridge would automatically pack and unpack the array for us.

Code:

foreign handler Hunspell_suggest(in pHunspell as Hunhandle, out rSLst as UTF8CString List, in pWord as UTF8CString) returns CInt

Re: Creating hunspell library?

Posted: Tue Jan 05, 2016 10:26 pm
by LCMark
@monte: The problem with that is that it doesn't specify how many elements are in the returned list. C APIs tend to return the number of elements in a native array either as the return value or via another out parameter (i.e. a pointer to a slot), so this would need to be encoded in the declaration somehow. Perhaps something like:

Code:

-- The Hunspell API as it is
foreign handler Hunspell_suggest(in pHunspell as Hunhandle, out rSLst as UTF8CString[result], in pWord as UTF8CString) returns CInt
-- A modified version which returns the number of elements as an out parameter
foreign handler Hunspell_suggest(in pHunspell as Hunhandle, out rSLst as UTF8CString[rSLstCount], out rSLstCount as CInt, in pWord as UTF8CString) returns nothing

The idea here is that the '[]' annotation would indicate that the parameter is a 'native' C array of the given type, which would bridge to a List. Indeed, in this case it would be nice to be able to mark the rSLstCount parameter as 'silent' in the LCB binding.
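To make the two count conventions concrete, here is a C sketch of what a generated bridge might do for `UTF8CString[result]` versus `UTF8CString[rSLstCount]`. Both native functions and the `StringView`, `bridge_result` and `bridge_out` names are hypothetical; the point is that the bridge owns the count variable itself, which is what would let the count parameter stay 'silent' on the LCB side.

```c
#include <assert.h>
#include <string.h>

/* A borrowed view of a native string array plus its length. */
typedef struct {
    const char *const *items;
    int count;
} StringView;

static const char *const kWords[] = { "one", "two" };

/* Hypothetical API, style 1: the count is the return value
   (the Hunspell_suggest shape, i.e. UTF8CString[result]). */
static int words_by_return(const char *const **r_items)
{
    *r_items = kWords;
    return 2;
}

/* Hypothetical API, style 2: the count comes back through a second
   out parameter (i.e. UTF8CString[rSLstCount]). */
static void words_by_out(const char *const **r_items, int *r_count)
{
    *r_items = kWords;
    *r_count = 2;
}

/* What a bridge might generate for each annotation: the count lives in the
   bridge, so it never has to appear in the high-level signature. */
static StringView bridge_result(void)
{
    StringView view;
    view.count = words_by_return(&view.items);
    assert(view.count >= 0);
    return view;
}

static StringView bridge_out(void)
{
    StringView view;
    words_by_out(&view.items, &view.count);
    assert(view.count >= 0);
    return view;
}
```

Static data keeps the sketch allocation-free; a real binding would also need to know who frees the array, which is where ownership annotations would come in.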

Of course, things get more complicated when you have to start considering exceptions / error returns. Imagine an API which returns true if it succeeds or false if it fails; if it returns false, the out parameters (which map to pointer-to-type) are left untouched:

Code:

foreign handler GetMyStrings(out rStrings as UTF8CString[rStringCount], out rStringCount as CInt) returns CBool

In this case, there needs to be a way to express the relationship between a return value of false and an exception, to ensure that out parameters which aren't well defined are not touched.
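One way a bridge could honour that contract, sketched in C with a hypothetical `get_my_strings` modelled on `GetMyStrings` (the `simulate_failure` flag exists only so the example can exercise both paths): initialise the outputs before the call, read them only when the function returns true, and turn false into an error status that an LCB binding would raise as an exception.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static const char *const kStrings[] = { "a", "b", "c" };

/* Hypothetical native API in the style of GetMyStrings: on failure it
   returns false and leaves both out parameters untouched. */
static bool get_my_strings(bool simulate_failure,
                           const char *const **r_strings, int *r_count)
{
    if (simulate_failure)
        return false;
    *r_strings = kStrings;
    *r_count = 3;
    return true;
}

typedef enum { BRIDGE_OK, BRIDGE_FAILED } BridgeStatus;

/* Hypothetical bridge: outputs start in a defined state, are trusted only
   when the call returns true, and false becomes an error status. */
static BridgeStatus bridge_get_my_strings(bool simulate_failure,
                                          const char *const **r_items,
                                          int *r_count)
{
    const char *const *items = NULL;
    int count = 0;
    if (!get_my_strings(simulate_failure, &items, &count)) {
        *r_items = NULL; /* never expose what the callee did not write */
        *r_count = 0;
        return BRIDGE_FAILED; /* an LCB binding would throw here */
    }
    assert(count >= 0 && (count == 0 || items != NULL));
    *r_items = items;
    *r_count = count;
    return BRIDGE_OK;
}
```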

I've actually been pondering whether the foreign handler declaration needs to be multi-line to allow greater richness in specifying 'safe' bindings - not just for array counts, but also for ownership / lifetime annotations.

Of course, the advantage of finding a way to specify foreign handler bindings in a high-level fashion (without having to use Pointer and friends) is that they are 'safer', in the sense that more type and range checking can be done at runtime. In lieu of that, however, I think we can get a fair way by using small 'wrapper' functions along with some Pointer manipulation functions.
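A minimal sketch of the 'small wrapper function' route, assuming a tiny C shim compiled alongside Hunspell: the shim flattens the suggestion list into one newline-delimited string, so the LCB side only needs a plain C-string return instead of `char ***`. `join_lines` is a hypothetical helper; a real shim would call `Hunspell_suggest`, pass the list to it, then call `Hunspell_free_list`, and would also need to export a matching free function for the joined string.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical wrapper helper: flatten a native string array into one
   newline-delimited, NUL-terminated string; the caller frees it with free().
   Assumes no item itself contains a newline. */
static char *join_lines(char *const *items, int count)
{
    assert(count == 0 || items != NULL);
    size_t total = 1; /* trailing NUL */
    for (int i = 0; i < count; i++)
        total += strlen(items[i]) + 1; /* text plus at most one '\n' */

    char *out = malloc(total);
    if (out == NULL)
        return NULL;
    char *p = out;
    for (int i = 0; i < count; i++) {
        size_t len = strlen(items[i]);
        memcpy(p, items[i], len);
        p += len;
        if (i + 1 < count)
            *p++ = '\n'; /* separator between items, none at the end */
    }
    *p = '\0';
    return out;
}
```

The trade-off is an extra copy and a delimiter convention in exchange for a binding that needs no Pointer arithmetic at all.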

Re: Creating hunspell library?

Posted: Wed Jan 06, 2016 12:56 am
by monte
Ah, yes, sorry, too early in January for me to be thinking straight...

By 'silent' in the binding, do you mean there's no need to include it in the parameter list because it's all just handled for you? I'm not sure how common it would be, but how would you handle the array size coming from an extra input parameter? Maybe:

Code:

foreign handler GetMyStrings(out rStrings as UTF8CString[in pStringCount as CInt])

variable tStrings as List
variable tSize as Integer
put 5 into tSize
GetMyStrings(tStrings[tSize])
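For the case where the size comes in as an extra parameter, the usual C shape is a caller-allocated buffer plus its capacity. The hypothetical sketch below shows what a bridge for `UTF8CString[in pStringCount as CInt]` might do: allocate from the caller's size, call the native function, and keep only as many entries as it reports writing. `fill_strings` and `bridge_fill` are invented for illustration.

```c
#include <assert.h>
#include <stdlib.h>

static const char *const kSource[] = { "red", "green", "blue",
                                       "cyan", "magenta", "yellow" };
#define SOURCE_COUNT ((int)(sizeof kSource / sizeof kSource[0]))

/* Hypothetical native API: the caller supplies the buffer and its capacity;
   the callee writes at most 'capacity' entries and returns how many. */
static int fill_strings(const char **buffer, int capacity)
{
    int n = capacity < SOURCE_COUNT ? capacity : SOURCE_COUNT;
    for (int i = 0; i < n; i++)
        buffer[i] = kSource[i];
    return n;
}

/* Hypothetical bridge for an 'in'-sized array: allocate from the requested
   size, call the native function, and report how many entries are valid. */
static const char **bridge_fill(int requested, int *r_written)
{
    *r_written = 0;
    if (requested <= 0)
        return NULL;
    const char **buffer = malloc((size_t)requested * sizeof *buffer);
    if (buffer == NULL)
        return NULL;
    *r_written = fill_strings(buffer, requested);
    assert(*r_written >= 0 && *r_written <= requested);
    return buffer;
}
```

Note that the requested size and the number actually written can differ, so the bridge still needs a second count to build the resulting List.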

In cases where out params are untouched, they should still be nil though, right? I expect you are already checking for nil before trying to set the LCB variable, which would leave exception handling up to the calling code.

Multi-line foreign handler declarations would reduce the number of very long lines, I suppose, and would perhaps look a bit like lcidl...