Hi,
I'm happy you have use for my "finger exercise" :)
Just be sure to check the new version - it's much clearer and more reliable! The first version used a lot of ugly workarounds due to quick & lazy coding ...
A few answers:
lewis wrote:... you were not using 'set the htmlText' at all ...
Yup. I wanted to have this usable for any kind of text. And, to be honest, I'd even have to look up "htmlText" - I don't use this in my work.
I've seen a quirk in my v 0.0.2 already: "Samuel Bjørk", for instance, comes out with the "ø" still encoded as an HTML entity. So the output needs to be translated from HTML entities to standard text. Something for the "toDo" list :)
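A minimal sketch of that toDo item, assuming LiveCode 7 or later (for numToCodepoint) and only a handful of named entities. The handler name decodeEntities is made up for illustration and is not part of the script:

```livecode
function decodeEntities pText
   -- hypothetical helper; extend the list as more entities turn up
   replace "&oslash;" with numToCodepoint(248) in pText -- ø
   replace "&aring;" with numToCodepoint(229) in pText -- å
   replace "&aelig;" with numToCodepoint(230) in pText -- æ
   -- "&amp;" goes last, so that "&amp;oslash;" is not decoded twice
   replace "&amp;" with "&" in pText
   return pText
end decodeEntities
```

It would be called on each chunk before it goes into the field, e.g. put decodeEntities(myLine) into myLine.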
lewis wrote:I should be able to work out how to do the listing as per your method but in a simpler fashion
The new version stores any found chunks in properly named variables before formatting the output in the field. The plan is to collect them in an array, for storage & later use. v 0.0.3 will write a table-like file containing all the data regarding the selected author.
(Actually this script works a lot better with my local epub library, so I'll keep on improving it - after all, only the basic structure of the page and the tags differ now. And with my new version it's quite easy to adapt to this.)
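Roughly how such an array and table could look, as a sketch only: the handler names storeBook and booksAsTable and the keys "title" and "year" are assumptions for illustration, not what v 0.0.3 will actually use.

```livecode
-- append one book to a numerically keyed array (passed by reference)
command storeBook @pBooks, pTitle, pYear
   local tIndex
   put (the number of lines of the keys of pBooks) + 1 into tIndex
   put pTitle into pBooks[tIndex]["title"]
   put pYear into pBooks[tIndex]["year"]
end storeBook

-- turn the array into tab-delimited lines, one book per line
function booksAsTable pBooks
   local tTable, tIndex
   repeat with tIndex = 1 to the number of lines of the keys of pBooks
      put pBooks[tIndex]["title"] & tab & pBooks[tIndex]["year"] & CR after tTable
   end repeat
   delete the last char of tTable -- drop the trailing CR
   return tTable
end booksAsTable
```

The result of booksAsTable() could then be written out with something like put it into URL ("file:" & tPath), where tPath is whatever file path you choose.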
lewis wrote:I noticed though is that your code does not give the numbers [...] nor does it give the publication date
The new version gives the years. And if you want the ordinal numbers, it's a little change in the processGrp function:
Code:
-- Replace this line (line 91 in the "processGrp" function):
get inBeet(myLine,quote & ">","</a>",0) -- get Name
-- with the following:
if char 1 of myLine is a number then -- numbered entry
   get 0 & CR & line 2 of (inBeet(myLine,".","<a h",-3)) & line 2 of (inBeet(myLine,quote & ">","</a>",0))
else -- not numbered
   get inBeet(myLine,quote & ">","</a>",0)
end if
This uses a rather esoteric side effect of the inBeet() function - it should work at least with 2-digit numbers and small fractional numbers, but I don't fully understand it yet ;-)
lewis wrote:I see that you replaced <br/> with 'CR' which I understand is a synonym for 'return'
Should be a synonym, right. I use CR because "return" is also the keyword that hands a value back from a handler, and it's easier to search for those "return" lines when I don't also use "return" for the line feed constant.
lewis wrote:your function 'processGrp'
Could be a command handler, too. For some strange reason it's a habit of mine to write anything using parameters as a function ;-)
Doesn't matter; as I see it, both command & function handlers are internally equal, they only differ in the syntax of calling & returning data.
And I make a separate handler of it because it has a different task: The mouseUp handler grabs the data & makes handy portions of it, and the processGrp function just grabs one such portion, and works with it. Much easier to debug, by the way :)
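To show what "differ in the syntax of calling & returning data" means in practice, here is a tiny made-up pair (doubleIt and doubleItCmd are illustration names only, nothing from the script):

```livecode
-- as a function: called inside an expression, the value comes back directly
function doubleIt pNumber
   return pNumber * 2
end doubleIt

-- as a command: called as a statement, the value is picked up from "the result"
command doubleItCmd pNumber
   return pNumber * 2
end doubleItCmd

on mouseUp
   local tA, tB
   put doubleIt(21) into tA
   doubleItCmd 21
   put the result into tB
   answer tA && tB -- shows "42 42"
end mouseUp
```

Same body in both handlers; only the call and the way the value is fetched change.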
About the offset stuff & the use of chars:
A strict limitation to chars & offset should compile into very fast assembler code, equal to the use of regular expressions (just not _that_ esoteric), IMHO. This comes at the cost of heavy arithmetic in the "charsToSkip" parameter and leads to code that is nearly unreadable & very difficult to maintain (again, just like regexp ;-) ).
So I encapsulated the nasty stuff in a rather simple function for v 0.0.2, and this makes it much more enjoyable.
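For anyone who wants to see the idea without opening the stack, here is a stripped-down sketch of an "in-between" search built on offset(). It is not the actual inBeet() code: the name textBetween and its three parameters are assumptions, and it returns only the first match.

```livecode
function textBetween pText, pStart, pEnd
   local tFrom, tTo
   -- find the start delimiter
   put offset(pStart, pText) into tFrom
   if tFrom = 0 then return empty
   -- tFrom now points at the first char after the start delimiter
   add length(pStart) to tFrom
   -- look for the end delimiter, skipping everything before tFrom;
   -- offset() then counts from the skipped chars, not from char 1
   put offset(pEnd, pText, tFrom - 1) into tTo
   if tTo = 0 then return empty
   return char tFrom to (tFrom + tTo - 2) of pText
end textBetween
```

All the "charsToSkip" arithmetic stays inside this one handler, so a caller only writes something like get textBetween(myLine, quote & ">", "</a>").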
lewis wrote:I certainly don't expect you to do any work on the above.
For sure I will!
For one, I finally want to be able to filter those of the 23,383 epubs of my town library that are a.) available and b.) of interest to me, without clicking for hours through very poorly made web pages.
And, on the other hand, my "finger exercise" here not only provided fun & satisfaction, it additionally resulted in a fast & robust "inbetween search" function, a benefit for my private library stack. Some of my commercial products are already marked for an implementation. Win - win :)
Have fun!