revxml

LCMark · Post by **LCMark** » Thu Jul 18, 2013 9:44 am

So my question: is there anything that this could mess up? Concatenating the data elements all together with no delimiters doesn't seem like a useful way to return the data elements, so it's hard for me to imagine a situation in which anyone would be currently relying on this.

Is anybody's code relying on the current functionality? (Although I'm not clear which revXML handlers use that particular libxml function, as I think revXML uses its own traversal of the tree to get contents in various ways.)

In this case, getting the elided form of all of a node's contents might be useful in specific circumstances - for example, if the XML contains some sort of styled-text representation, then an elided version of the content will give just the text without the formatting tags.

If doing what you have suggested would be useful, it should probably be exposed as a separate function - one that uses the libxml functions that are available to get the same effect - in general, modifying third-party libraries should be avoided wherever possible (limited to adding functionality that can't be gained by the API they expose - libzip has some modifications to add a progress callback, libxml has a small modification to ignore namespaces and libcairo has some modifications to generate PDF metadata and share image data).

However, in the case you are giving above, it's using XML as a database-type structure, so really you'd expect to know what elements are there to traverse and do so directly.

In general, we take a policy of 'do no harm' in the sense that functionality is not changed unless it is strictly erroneous or it absolutely has to be changed in order to support a feature and the benefit of the feature far outweighs the potential backwards compatibility issues that might ensue.

mwieder · Post by **mwieder** » Thu Jul 18, 2013 4:42 pm

However, in the case you are giving above, it's using XML as a database-type structure, so really you'd expect to know what elements are there to traverse and do so directly.

Exactly. And that still works as before.

It's just weird that getting the node contents as a whole slaps all the content together in one string without giving you a chance to separate the individual elements. The getNodeContents function recursively goes through its component elements and concatenates their string values.

My alternative, I guess, is to write my own version of the getNodeContents function and the functions that it calls, effectively bypassing the libxml library. That's probably a cleaner way to implement the change, although it's quite redundant.

I can't help but think it's a bug with the libxml library and I should probably report it there.

LCMark · Post by **LCMark** » Thu Jul 18, 2013 5:13 pm

It's just weird that getting the node contents as a whole slaps all the content together in one string without giving you a chance to separate the individual elements. The getNodeContents function recursively goes through its component elements and concatenates their string values.

Well, as I said there is a use-case for that in something like:

Code:

<styledtext>Foo bar <bold>baz</bold> foo bar</styledtext>

Here getting the contents would return you the text without the formatting.

Having it separate the nodes in some way textually would mean you are no longer getting the data that is there - indeed the doc description for xmlNodeGetContent is pretty clear about its function, so I doubt the libxml guys would entertain a bug about it... I'd imagine their response would be: if you want that functionality, just step through the nodes yourself. (If you look at it from the point of view that 'the content of an xml node is all the text node children', then eliding them without a delimiter makes sense.)

Since there are potentially cases where you might want the elided text, and in others might want delimited text (depending on the structure of the XML document), why not crib and modify the xmlNodeGetContent call taking a second delimiter and then add a second delimiter to the query call? That way both cases are served and the call becomes more flexible as a result.
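To make the two cases concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree rather than libxml or revXML. The helper name node_content and its delimiter parameter are hypothetical, chosen only to illustrate the idea of an optional delimiter; this is not how revXML implements it.

```python
import xml.etree.ElementTree as ET


def node_content(element, delimiter=""):
    """Join every text fragment under element, in document order.

    Hypothetical helper: an empty delimiter gives the plain concatenated
    text, a non-empty one keeps the fragments separable.
    """
    return delimiter.join(element.itertext())


styled = ET.fromstring("<styledtext>Foo bar <bold>baz</bold> foo bar</styledtext>")
record = ET.fromstring("<person><first>Ada</first><last>Lovelace</last></person>")

print(node_content(styled))       # Foo bar baz foo bar
print(node_content(record))       # AdaLovelace
print(node_content(record, ","))  # Ada,Lovelace
```

With no delimiter the styled-text case reads naturally, while the record-like case runs its fields together; passing a delimiter keeps the fields apart.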

Edit: Of course I just realized the above was probably precisely your intent - I got a bit confused about things when you talked about changing existing functionality potentially... revXML as it stands doesn't use the NodeGetContent call from libxml - it cribs and uses its own version (exposed as GetElementContent). This also stands as precedent for doing the same for Xpath purposes and augmenting it with a delimiter.

mwieder · Post by **mwieder** » Thu Jul 18, 2013 10:54 pm

Well, there's a BIG difference between revXMLNodeContents and the way the xpath functions are handling node data:

Given "<styledtext>Foo bar <bold>baz</bold> foo bar</styledtext>", revXMLNodeContents will only return "Foo bar" and ignore the rest of the "styledtext" element, i.e., only the first element it encounters. The libxml getNodeContents() function that the xpath functions call will return all the styledtext, but in the form "Foo bar baz foo bar". In this very specific instance that may be what's desired, but note that without the embedded spaces all the text would run together. And that there's no way to get the internal "<bold>" formatting without iterating through all the child nodes, which isn't possible with the xpath functions.
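The same distinction can be reproduced outside LiveCode. This is only an analogy in Python's standard xml.etree.ElementTree, not the revXML or libxml code itself: the leading text of a node, all descendant text joined together, and the child-by-child walk needed to see the markup.

```python
import xml.etree.ElementTree as ET

node = ET.fromstring("<styledtext>Foo bar <bold>baz</bold> foo bar</styledtext>")

# Text before the first child element only, comparable to the
# "Foo bar" result described above for revXMLNodeContents.
first_text = node.text

# All descendant text concatenated, comparable to the libxml result.
all_text = "".join(node.itertext())

# Walking the children is what recovers the markup structure.
pieces = [(child.tag, child.text, child.tail) for child in node]

print(repr(first_text))  # 'Foo bar '
print(all_text)          # Foo bar baz foo bar
print(pieces)            # [('bold', 'baz', ' foo bar')]
```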

I'm not trying to write my own xpath functions here, just exposing the functionality that's already present in the libxml library.

That said, I have now copied a good chunk of the xpath libxml code into revxml.cpp and modified it to do the right thing, leaving the libxml code intact. I do agree, in spite of the redundancy, that it's a better way to go. For one thing, it allows specification of delimiters from a script.

So the function syntax is now

Code:

revXMLEvaluateXPath xmlID, xPathExpression, [lineDelimiter]
revXMLDataFromXPathQuery xmlID, xPathExpression, [charDelimiter], [lineDelimiter]

where the delimiters all default to newLine.

Update: hmmm... on rereading that, maybe the delimiters should default to nothing and put the burden on the developer?

LCMark · Post by **LCMark** » Wed Aug 07, 2013 2:00 pm

@mwieder: I've merged the xpath changes into 'develop' - although could you check the release note I added (I took it from your xpath-docs branch). I'm not sure it's quite in sync (syntax-wise) with the final versions you sent a pull-request for (the delimiter parameters may have changed slightly?).

mwieder · Post by **mwieder** » Wed Aug 07, 2013 6:54 pm

@runrevmark- the release notes are almost correct. The syntax for revXMLDataFromXPathQuery has now changed to

Code:

revXMLDataFromXPathQuery(pDocID, pXpathExpression [, charDelimiter [, lineDelimiter ] ])

where the optional delimiters for both revXMLDataFromXPathQuery and revXMLEvaluateXPath all default to cr if not explicitly specified.

I also noticed that I never put in syntax or example code for either of the two xpath functions, so I just pushed that now.
And I specified 6.2 as the release version for both, just to get something in there.

toddgeist · Post by **toddgeist** » Mon Sep 09, 2013 5:09 pm

Hi Guys,

Thanks for doing all this work on Xpath, I could really use it. Is this stuff usable at this point? Can I get access to it? Is it going to make it into full release?

(Sorry, I backed the KickStarter campaign, but I haven't done much with LiveCode since, so I am not sure how things work in the new Open Source way.)

Thanks

Todd

mwieder · Post by **mwieder** » Mon Sep 09, 2013 5:26 pm

Todd- the xpath stuff is in the develop branch of the engine source if you want to build your own engine. If not, it's scheduled for a future release.

toddgeist · Post by **toddgeist** » Mon Sep 09, 2013 6:16 pm

Thanks Mark,

Can I just build revXML or do I need to build the whole engine?

Todd

toddgeist · Post by **toddgeist** » Mon Sep 09, 2013 6:58 pm

The question is moot for me. I can't get anything to build.

Compiling the engine is not my cup of tea. I guess I am waiting until it makes it into a release. I hope that is soon.

Thanks for the work on this

Todd

Martin Koob · Post by **Martin Koob** » Mon Sep 09, 2013 11:24 pm

This looks very interesting. I am looking at adding search of XML document nodes and thought I would have to iterate through the siblings. This looks much more efficient.

Your example uses a > operator.

With this, would you also be able to evaluate strings, e.g.

/bookstore/book/title[contains(.,"Harry")]

/bookstore/book[title="Harry Potter"]

Also which release is this projected to be in?

Thanks everyone for all the work on this.

Martin

mwieder · Post by **mwieder** » Tue Sep 10, 2013 3:27 am

@Martin-

Yes, happily both of those syntaxes are supported. The xpath functions are in the develop branch of the engine now, and I've marked them for release 6.2. But that's just me, and the actual release cycles for the official builds are, of course, in the hands of the rev team and not under my control.
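For anyone who wants to see what those two expressions select before the LiveCode functions ship, here is a rough sketch in Python's standard xml.etree.ElementTree. The sample bookstore data is made up for illustration. ElementTree only implements a subset of XPath and has no contains(), so the substring test is written as a plain Python filter; libxml's XPath engine handles contains() directly.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, not taken from the thread.
bookstore = ET.fromstring(
    "<bookstore>"
    "<book><title>Harry Potter</title><price>29.99</price></book>"
    "<book><title>Learning XML</title><price>39.95</price></book>"
    "</bookstore>"
)

# Equality predicate, equivalent to /bookstore/book[title="Harry Potter"].
exact = bookstore.findall("./book[title='Harry Potter']")

# Stand-in for /bookstore/book/title[contains(.,"Harry")].
partial = [
    t for t in bookstore.iterfind("./book/title") if "Harry" in (t.text or "")
]

print([b.findtext("title") for b in exact])  # ['Harry Potter']
print([t.text for t in partial])             # ['Harry Potter']
```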

toddgeist · Post by **toddgeist** » Tue Sep 10, 2013 5:04 am

Sadly I can't compile the develop branch. I can usually follow instructions, but I can't get it to work using either the LiveCode build instructions or Trevor's. I am too clueless to know what's wrong.

Is there anyone who is maintaining development binaries? I'd gladly sponsor someone to do it, at least until this xpath stuff ships. I need it ASAP, but only on MacOS for now, if anyone is interested.

Thanks

Todd

malte · Post by **malte** » Tue Sep 10, 2013 11:42 am

Count me in on sponsoring, but I would need the whole desktop range.

mwieder · Post by **mwieder** » Tue Sep 10, 2013 4:40 pm

I sent email to Todd about the development branch, but it's worth repeating here:

I'd be *very* wary of using this for any real production work since there is a lot in there that is experimental, and so may or may not work and/or may or may not change when it finally gets to a release.
