replaceText

Post by **nkrontir** » Wed Apr 12, 2006 4:05 am

The replaceText function does not seem to support (or perhaps no documentation exists to show how to use) all the regular expression features of Perl's s/// operator or Python's re.sub, even though the docs state "Searches for a regular expression and replaces the portions that match the regular expression". More specifically, I could not find a way to use regular expression groups in the replacement (not even in Dan Shafer's book). In PCRE, Perl, Python, etc., a group is referenced either as $groupnumber or as \groupnumber, but in Runtime Revolution neither way seems to work. To illustrate with a simple example (I am aware that the following can be done with chunks, but more complicated expressions cannot):

Let us assume that var1 equals "Hello again world!"
In Perl you can do:

$var1 =~ s/(Hello)(.*)(world!)/$1 $3/;  # Group 1 is "Hello", group 3 is "world!"; group 2 (" again ") is discarded.

In Python you can do:

import re
var1 = re.sub(r'(Hello)(.*)(world!)', r'\1 \3', var1)  # Group 1 is "Hello", group 3 is "world!"; group 2 (" again ") is discarded.

In both situations, the resulting var1 would be "Hello world!".
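For comparison, here is the Python example as a self-contained script, together with two equivalent spellings Python also accepts: the `\g<n>` form and a replacement function. The function form is worth a look because it is what any scripting-level workaround has to reproduce: take the groups of a match and build the replacement text from them.

```python
import re

var1 = "Hello again world!"

# Numbered backreferences in the replacement string.
numbered = re.sub(r'(Hello)(.*)(world!)', r'\1 \3', var1)

# \g<n> is Python's unambiguous spelling of the same reference
# (useful when a digit follows, e.g. \g<1>0 instead of \10).
explicit = re.sub(r'(Hello)(.*)(world!)', r'\g<1> \g<3>', var1)

# A function replacement receives the match object and builds the
# result from its groups: the same substitution, done by hand.
by_hand = re.sub(r'(Hello)(.*)(world!)',
                 lambda m: m.group(1) + ' ' + m.group(3), var1)

print(numbered)  # Hello world!
```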

So, to cut a long story short, is there a possibility of full regular expression support (either through replaceText, or even through a new future function)?

Post by **marielle** » Wed Apr 12, 2006 8:43 am

It doesn't let you use $1 etc. A function to address this is on my todo list.

Here is what I have extracted from the PCRE documentation for my own use:

Code:

____________________________________________________________
|   \w     matches word-constituent characters (letters, "_", & digits);
|   \s     matches exactly one character of whitespace. (Whitespace is
|             defined as spaces, tabs, newlines, or any character
|             which would not use ink if printed on a printer.)
|   (?i)   If this bit is set, letters in the pattern match
|             both upper and lower case letters.
|   (?s)   If this bit is set, a dot metacharacter in the pattern matches all characters,
|             including newlines. Without it, use [\s\S] rather than . to include "\n"
|             in what is matched.
|   (?x)   If this bit is set, whitespace data characters in the pattern are totally ignored
|             except when escaped or inside a character class. Whitespace does not include
|             the VT character (code 11). In addition, characters between an unescaped #
|             outside a character class and the next newline character, inclusive, are also ignored.
|   (?m)   When PCRE_MULTILINE is set, the "start of line" and "end of line"
|             constructs match immediately following or immediately before any newline
|             in the subject string, respectively, as well as at the very start
|             and end.
|   (?U)   This option inverts the "greediness" of the quantifiers so that they
|             are not greedy by default, but become greedy if followed by "?".
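The (?i), (?s), (?m) and (?x) flags above mean the same thing in Python's `re` module, which makes them easy to try out. Python's `re` is not PCRE, so this is only a sketch for experimenting, and edge cases may differ from the PCRE build Revolution uses. Python has no `(?U)` flag, so the last lines make a single quantifier lazy with `+?` instead. The sample text is made up for illustration.

```python
import re

text = "Title\nfirst line\nsecond line"

# (?i): letters match regardless of case.
ci = re.search(r'(?i)title', text).group()           # 'Title'

# Without (?s) the dot stops at "\n"; with (?s) it crosses newlines.
no_s = re.search(r'Title.*', text).group()           # 'Title'
with_s = re.search(r'(?s)Title.*', text).group()     # the whole text

# (?m): ^ and $ also match next to each embedded newline.
line_starts = re.findall(r'(?m)^\w+', text)          # ['Title', 'first', 'second']

# (?x): unescaped whitespace and #-comments in the pattern are ignored.
verbose = re.compile(r'''(?x)
    (\w+)      # a word
    \s+        # some whitespace
    line       # the literal word "line"
''')
pairs = verbose.findall(text)                        # ['first', 'second']

# Python's re has no (?U) flag; make individual quantifiers lazy instead.
greedy = re.search(r'<.+>', '<a><b>').group()        # '<a><b>'
lazy = re.search(r'<.+?>', '<a><b>').group()         # '<a>'
```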

And here are some examples of patterns I use to extract XML nodes or parts of them:

Code:

node_full1	(?s)(\s*<{{nodename}}(\s+[^>]*)?>[\s\S]+?</{{nodename}}>\s*)
node_full2	(?s)(\s*<{{nodename}}(\s+[^>]*)? />\s*)
node_fullwithDummy	(?s)(<{{nodename}}[^>]*>[\s\S]*{{chunk}}[\s\S]*</{{nodename}}>)
node_namepart	<(\w+)
node_attributespart	<[^\s]+\s+([\s\S]*?)(\s+/)?>
node_endtag	</([^>]+)>
attribs_last	(\s*(\w+)=([^=]+?)$)

Note that the advantage of using [\s\S]+ is that it captures any character, including an end-of-line character, while .+ does not unless the (?s) option is set.
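A quick check of that difference, using Python's `re` (assumed to agree with PCRE on these constructs), plus `node_full1` from the list above with `{{nodename}}` filled in by hand as `item`. The XML sample is made up for illustration.

```python
import re

xml = "<root>\n  <item id=\"1\">\n    first\n  </item>\n</root>"

# Without (?s), "." does not match "\n", so a multi-line element is missed.
dot = re.search(r'<item[^>]*>.+?</item>', xml)             # None

# [\s\S] matches any character, newlines included, with no flag needed.
any_char = re.search(r'<item[^>]*>[\s\S]+?</item>', xml)   # matches

# node_full1 with {{nodename}} replaced by "item".
node_full1 = r'(?s)(\s*<item(\s+[^>]*)?>[\s\S]+?</item>\s*)'
m = re.search(node_full1, xml)
full = m.group(1)        # the whole <item>...</item> node plus surrounding whitespace
attributes = m.group(2)  # ' id="1"'
```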

To fill in the {{xxx}} placeholders automatically, I use a template.fill function. All the code involved is pasted below.

Code:

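-- tXML holds the XML text, tSearchName the name of the node to find
if matchText(tXML, regexp.compile("xmlnode_full1", "nodename=" & tSearchName), tFull) then
  -- tFull now holds the whole matched node
end if

function regexp.compile pPattern, pParams
  -- regexp.prebuilt (not shown) returns the stored pattern with this name
  put regexp.prebuilt(pPattern) into tPattern
  return template.fill(tPattern, pParams)
end regexp.compile

function template.fill pText, pParams
  -- every parameter after pText is a "name=value" pair;
  -- each {{name}} placeholder in pText is replaced by its value
  repeat with p = 2 to the paramCount
    put param(p) into tParam
    split tParam by "="
    replace "{{" & toLower(tParam[1]) & "}}" with tParam[2] in pText
  end repeat
  return pText
end template.fill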
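A rough Python equivalent of template.fill, for anyone porting the idea. It is an illustration under assumptions, not the poster's code: `template_fill` and `NODE_FULL1` are hypothetical names, and parameters are assumed to be `name=value` strings as in the regexp.compile call above. One deliberate difference: `str.partition` splits at the first "=" only, so a value may itself contain "=", whereas splitting by "=" in the transcript version probably keeps only the text up to the next "=".

```python
import re

def template_fill(text, *params):
    """Replace each {{name}} placeholder in text using "name=value" strings.

    As in template.fill, the name is lower-cased before lookup.
    partition() splits at the first "=" only, so values may contain "=".
    """
    for param in params:
        name, _, value = param.partition("=")
        text = text.replace("{{" + name.lower() + "}}", value)
    return text

NODE_FULL1 = r'(?s)(\s*<{{nodename}}(\s+[^>]*)?>[\s\S]+?</{{nodename}}>\s*)'

pattern = template_fill(NODE_FULL1, "nodename=item")
m = re.search(pattern, '<list><item>a</item></list>')
print(m.group(1))  # <item>a</item>
```

The value is inserted into the pattern as-is. If it can contain regex metacharacters (XML names may contain "."), escaping it first, for example with `re.escape(value)`, keeps it literal.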

nkrontir wrote: "So, to cut a long story short, is there a possibility of full regular expression support (either through replaceText, or even through a new future function)?"

Sure, there is a "possibility" for it, but you will have to roll up your sleeves. First, use matchText to find the matches for all the () groups and put them in an array. Make sure you add "()" around the whole pattern before running matchText, so that the full text being matched is stored in a tFull variable. Then loop through the text. For each successful match, replace \1 with aMatches[1] and \3 with aMatches[3], then check that matchText still returns true with tFull as the string and, as the regular expression, the pattern in which \1 and \3 have been replaced by their aMatches equivalents.
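The workaround above boils down to a find-match-then-expand loop. Below is a minimal sketch of that loop in Python, assuming only a primitive that returns the next match and its groups: `re.compile(...).search` with a start position stands in for matchText, and `replace_with_groups` is a hypothetical name. It expands `\1` to `\9`, plus `\0` for the whole match (the role of the extra "()" and tFull), in a single pass over the replacement string, so text captured by one group is never expanded again. Because the match object records where each match starts and ends, the sketch copies the text between matches directly and does not reproduce the re-check step.

```python
import re

def replace_with_groups(pattern, template, text):
    """Replace every match of pattern in text with template, where \\0-\\9
    in template stand for the whole match and capture groups 1-9.

    Only a "find the next match and its groups" step is used. A reference
    to a group the pattern does not have raises IndexError.
    """
    regex = re.compile(pattern)
    out = []
    pos = 0
    while pos <= len(text):
        m = regex.search(text, pos)
        if m is None:
            break
        # Expand \N in one pass over the template; a group that did not
        # take part in the match contributes an empty string.
        replacement = re.sub(r'\\(\d)',
                             lambda ref: m.group(int(ref.group(1))) or '',
                             template)
        out.append(text[pos:m.start()])
        out.append(replacement)
        if m.end() == m.start():
            # Empty match: copy one character so the loop always advances.
            out.append(text[m.end():m.end() + 1])
            pos = m.end() + 1
        else:
            pos = m.end()
    out.append(text[pos:])
    return ''.join(out)

print(replace_with_groups(r'(Hello)(.*)(world!)', r'\1 \3', 'Hello again world!'))
# Hello world!
```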

Post by **bfisher** » Sun Feb 18, 2007 8:38 pm

This is a crucial regular expression feature. Revolution cannot claim PCRE regex support without allowing groups like \1 or $1 in the replacement.

This serious omission has kept me from doing real regular expression text processing in Revolution.

LiveCode Forums.