regex can degrade performance of subsequent statements


Thierry
VIP Livecode Opensource Backer

Post by Thierry » Wed Dec 14, 2016 12:31 pm

OK,

I've got a few minutes of free time...

Trying to narrow down the problem,
I expanded my tdz1 function:

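function tdz1 random12, text1080
    local R
    -- join the return-delimited words into an alternation ("a|b|c") and
    -- keep the lines that start with three of them, each followed by whitespace
    filter lines of text1080 with regex \
          pattern format("^(?:(?:%s)\\s){3}",replaceText(random12,return,"|")) \
          into R
    return R
end tdz1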
into this:

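function tdz random12, text1080
    local REX, R
    -- same kind of pattern as tdz1, but built with the replace command;
    -- random12 is a by-value parameter, so only the local copy changes
    replace return with "|" in random12
    put "^((?:(?:" & random12 &  ")\s){3})" into REX
    filter lines of text1080 with regex pattern REX into R
    return R
end tdz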

The only change I made is replacing this:

replaceText(random12,return,"|")
with:

replace return with "|" in random12
and now, with the original stack script,
there is no more timing distortion.

Hope this helps with a more precise investigation,

Thierry
Last edited by Thierry on Wed Dec 14, 2016 4:53 pm, edited 1 time in total.
Thierry Douez - https://sunny-tdz.com
Why so many notes when it's enough to play the most beautiful ones... [Barbara]

LCMark
Livecode Staff Member

Post by LCMark » Wed Dec 14, 2016 2:42 pm

@Thierry: I can see what is happening here - the reason you don't see any timing distortion when you remove the text from the script is that it is used (in the init1 handler) to initialize the test data. When you remove it, the test data becomes empty so all the tests run on nothing.

When I run the tests I get reasonably consistent results for tdz1 on each cycle (0.7-1; this is reasonable for single-iteration timings on a desktop machine). However, the time taken by tdz2 does double, from around 1.6 to 3.2. The reason is that after the original tdz1 has run, the random12 parameter, which shares its value with sCards, has been converted internally to 16-bit units. So when it is passed to tdz2 and the other functions, they all use the 16-bit (Unicode) processing paths when manipulating it, which will be at least twice as slow as the 8-bit ones simply because the string now takes up twice as much memory.

So, the effects you are seeing are all down to replaceText (and friends) converting the internal representation of the target text string from 8-bit units to 16-bit units.
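The conversion can be observed with a small timing script. The following is a minimal sketch, assuming an affected build such as 8.1.1 (the thread reports the fixes first appearing in 8.1.3-rc-1). The names buildSampleText and testRepresentationSlowdown are hypothetical, and `offset` is just one example of a read-only text operation that scans the whole string. On an affected build the second timing may come out noticeably higher even though the text itself has not changed; on a fixed build the two should be close. Actual numbers depend on the machine.

```livecode
-- Hypothetical helper: builds pLineCount lines of plain ASCII text.
function buildSampleText pLineCount
   local tText
   repeat pLineCount times
      put "abc def ghi" & return after tText
   end repeat
   return tText
end buildSampleText

-- Hypothetical benchmark: times the same read-only scan of a string
-- before and after replaceText has been run on that string.
on testRepresentationSlowdown
   local tText, tJoined, tStart, tBefore, tAfter

   put buildSampleText(1080) into tText

   -- Scan the whole string repeatedly ("zzz" never occurs in it).
   put the milliseconds into tStart
   repeat 1000 times
      get offset("zzz", tText)
   end repeat
   put the milliseconds - tStart into tBefore

   -- Assumption taken from the thread: in affected builds this call
   -- converts tText's internal storage from 8-bit to 16-bit units.
   put replaceText(tText, return, "|") into tJoined

   -- Repeat the identical scan on the same, textually unchanged string.
   put the milliseconds into tStart
   repeat 1000 times
      get offset("zzz", tText)
   end repeat
   put the milliseconds - tStart into tAfter

   put "before replaceText:" && tBefore && "ms, after:" && tAfter && "ms"
end testRepresentationSlowdown
```

Run it from the message box with `testRepresentationSlowdown`.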

Thierry
VIP Livecode Opensource Backer

Post by Thierry » Wed Dec 14, 2016 3:15 pm

Thanks Mark, all this makes sense :)

Regards,

Thierry

LCMark
Livecode Staff Member

Post by LCMark » Wed Dec 14, 2016 4:12 pm

As it turns out, the change in the pattern's internal representation is also causing adverse behavior in the code used to cache previously used regexes. Changing how the cache is searched restores a consistent speed and appears to give a tiny bump in the performance of matchText/replaceText.

LCMark
Livecode Staff Member

Post by LCMark » Wed Dec 14, 2016 5:19 pm

So - there are two problems here.

The first is that the target string of replaceText or matchText will change from 8-bit to 16-bit internally as soon as either is run on it - this causes different code paths to be used subsequently for all text processing operations - http://quality.livecode.com/show_bug.cgi?id=19005.

The second is that the cache implementation for compiled regexes is very inefficient and heavily affected by the 16-bit versus 8-bit internal representation - http://quality.livecode.com/show_bug.cgi?id=19004.

The inefficient cache shows up most heavily in tdz2, as that test uses the same regex again and again. The conversion problem shows up in all of the tests that process the target string with text operations, although the fix for it also improves the performance of tdz2 without any change to the cache implementation.
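The cache effect can be sketched by filtering the same text many times with two patterns that are textually identical but built differently: one with replaceText, mirroring tdz1, and one with the replace command, mirroring tdz. This is a minimal sketch assuming an affected build such as 8.1.1; all handler and function names are hypothetical, and the idea that the replaceText-built pattern ends up in 16-bit form comes from LCMark's explanation rather than from anything the script itself checks.

```livecode
-- Hypothetical helper mirroring tdz1: pattern built via replaceText.
function patternViaReplaceText pWords
   return format("^(?:(?:%s)\\s){3}", replaceText(pWords, return, "|"))
end patternViaReplaceText

-- Hypothetical helper mirroring tdz: pattern built via the replace command
-- (pWords is a by-value parameter, so the caller's text is not modified).
function patternViaReplace pWords
   replace return with "|" in pWords
   return "^(?:(?:" & pWords & ")\s){3}"
end patternViaReplace

-- Hypothetical benchmark: reuses each pattern many times, so the engine's
-- compiled-regex cache is consulted on every filter call.
on compareRegexPatternTiming
   local tWords, tText, tPatA, tPatB, tOut, tStart, tTimeA, tTimeB, i

   put "alpha,beta,gamma,delta,epsilon,zeta,eta,theta,iota,kappa,lambda,mu" into tWords
   replace comma with return in tWords

   -- 1080 lines; every other line starts with three of the listed words.
   repeat with i = 1 to 540
      put "alpha beta gamma rest" & return & "alpha nu beta rest" & return after tText
   end repeat

   -- Build the replace-based pattern first: per the thread, replaceText may
   -- convert the shared value of tWords itself to 16-bit units.
   put patternViaReplace(tWords) into tPatA
   put patternViaReplaceText(tWords) into tPatB

   put the milliseconds into tStart
   repeat 200 times
      filter lines of tText with regex pattern tPatA into tOut
   end repeat
   put the milliseconds - tStart into tTimeA

   put the milliseconds into tStart
   repeat 200 times
      filter lines of tText with regex pattern tPatB into tOut
   end repeat
   put the milliseconds - tStart into tTimeB

   put "replace:" && tTimeA && "ms, replaceText:" && tTimeB && "ms"
end compareRegexPatternTiming
```

Both patterns select the same 540 lines; only the timings are expected to differ, and only on builds without the cache fix.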

FourthWorld
VIP Livecode Opensource Backer

Post by FourthWorld » Wed Dec 14, 2016 7:14 pm

Might this also be related to this report from Mark Talluto?
http://quality.livecode.com/show_bug.cgi?id=15711

That script relies on the matchText function.
Richard Gaskin
Community volunteer LiveCode Community Liaison

LiveCode development, training, and consulting services: Fourth World Systems: http://FourthWorld.com
LiveCode User Group on Facebook : http://FaceBook.com/groups/LiveCodeUsers/

LCMark
Livecode Staff Member

Post by LCMark » Fri Dec 16, 2016 9:18 am

@FourthWorld: Thanks for digging up that bug report. I just ran a quick test here comparing the latest develop-8.1 build (which contains the two fixes above) against 8.1.1: running 'Clean my code' on revliburl takes a couple of seconds with the patches, and a few minutes in 8.1.1.

(The patches have now been merged - the changes will first appear in 8.1.3-rc-1).
