Randomize Lines of a Large Text File

Got a LiveCode personal license? Are you a beginner, hobbyist or educator that's new to LiveCode? This forum is the place to go for help getting started. Welcome!

Moderators: FourthWorld, heatherlaine, Klaus, kevinmiller

xoxiwe
Posts: 3
Joined: Fri May 28, 2021 12:45 pm

Randomize Lines of a Large Text File

Post by xoxiwe » Fri May 28, 2021 1:00 pm

hello everybody,

hope you're doing good.

By the way, I was trying to randomize the lines of a large text file (around 100 MB). The first time, I used this method:

   put empty into temp
   answer file "Select a Text File:" with type "Text File|txt"
   put it into ifile
   put url ("file:" & ifile) into temp
   set the itemdel to cr
   sort lines of temp by random(number of lines of temp)
   put temp into field "f1"
Since it was taking a long time, and I never got the randomized output because the application crashed after a while, I rewrote the code as follows:

   answer file "Select a Text File:" with type "Text File|txt"
   put it into ifile
   put url ("file:" & ifile) into temp
   repeat the number of lines of temp
      get random(the number of lines of temp)
      put line it of temp & cr after temp2
      delete line it of temp
   end repeat
   put temp2 into fld "f1"
   put empty into temp
   put empty into temp2
Either way, I couldn't get the desired output, since the app also crashed after a while.

If any of you can show me a faster way to do this without the app crashing, it would be much appreciated. In case someone wants to know what's inside my text file, here are a few sample lines:
5122487-simba2006
3080803-fantasia10
12516196-19diver70
7238503-kreidler1
1671514-yamaha

dunbarx
VIP Livecode Opensource Backer
Posts: 9648
Joined: Wed May 06, 2009 2:28 pm
Location: New York, NY

Re: Randomize Lines of a Large Text File

Post by dunbarx » Fri May 28, 2021 2:00 pm

Hi.

Both handlers use time-tested methods to do what you want. You have several unnecessary lines, but they should not impact the process.

Here is what I suspected, and tested. On a new card I made two fields and a button. I put a dozen short lines of text into fld 1, and this in the button script:

on mouseUp
   get fld 1
   repeat 1000
      put it & return after temp
   end repeat
   sort temp by random(the number of lines of temp)
   put temp into fld 2
end mouseUp


Running this, I got (on Mac) the beachball of death, but it soon resolved itself and the randomized text appeared in fld 2. The large dataset you are working with probably just takes quite a while. The time goes into the "sort" line: while it runs, the engine does not answer the OS, so the OS assumes the program has fallen into an infinite loop and responds with the beachball.

LC has not "crashed", it is just working hard.

Try your first handler again, but use a fixed number as the argument to random() (that argument is the upper limit of the random number, not a seed), something like "sort whatever by random(9999)". This should fix it.

Craig
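One possible reason the fixed number helps, offered as an assumption worth testing rather than a confirmed fact: the expression after "by" is evaluated once for every line being sorted, so "the number of lines of temp" inside it makes LiveCode recount the whole 100 MB variable for each of millions of lines. A minimal sketch that counts once and keeps the original range (the name shuffleLinesBySort is hypothetical):

```livecode
-- Hypothetical helper: shuffle the lines of pText with a single sort.
-- The line count is computed once, outside the sort key expression,
-- so the per-line key is just a call to random().
function shuffleLinesBySort pText
   local tCount
   put the number of lines of pText into tCount
   -- each line gets a random key from 1 to tCount; LiveCode's sort is
   -- stable, so lines that draw the same key keep their relative order
   sort lines of pText by random(tCount)
   return pText
end shuffleLinesBySort
```

Usage, with the file path from the original handler: put shuffleLinesBySort(URL ("file:" & ifile)) into tShuffled. Using a larger upper limit than tCount would make ties rarer, at no extra cost per line.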

dunbarx
VIP Livecode Opensource Backer
Posts: 9648
Joined: Wed May 06, 2009 2:28 pm
Location: New York, NY

Re: Randomize Lines of a Large Text File

Post by dunbarx » Fri May 28, 2021 2:19 pm

Playing around with this, LC does indeed seem to slow to a crawl with a couple of hundred thousand lines of original text.

Anyone know anything about the working limits of this sort of thing?

Craig

FourthWorld
VIP Livecode Opensource Backer
Posts: 9824
Joined: Sat Apr 08, 2006 7:05 am
Location: Los Angeles

Re: Randomize Lines of a Large Text File

Post by FourthWorld » Fri May 28, 2021 5:54 pm

There are likely ways to serve the goal of this app quite efficiently, once we know what the goal is.

I'm guessing the user is not expected to read 100 MBs of randomized text - is that correct?

If so, is displaying the text in a field necessary?

What is done with the list after it's randomized?
Richard Gaskin
LiveCode development, training, and consulting services: Fourth World Systems
LiveCode Group on Facebook
LiveCode Group on LinkedIn

SparkOut
Posts: 2852
Joined: Sun Sep 23, 2007 4:58 pm

Re: Randomize Lines of a Large Text File

Post by SparkOut » Fri May 28, 2021 10:11 pm

1) The apparent "crash" (which is just an unresponsive interface) happens because a tight repeat loop hogs all the processor cycles and doesn't give the engine an opportunity to update the screen, poll the keyboard, and so on.
Inside the repeat loop, add the line

wait 0 milliseconds with messages
and that will give the engine a moment, as close to 0 ms as possible, in which to do those housekeeping tasks.
2) The speed will not be improved by that wait statement, and handling a 100MB file in one go might not be the best approach, as Richard mentioned. What is the desired outcome? For what purpose are you randomising a file that large?
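A minimal sketch of the wait advice inside a line-by-line loop. Two assumptions go beyond the post: the wait runs every 10000 lines rather than on every pass (yielding on every single pass adds overhead across millions of lines), and the countNonEmptyLines name and task are hypothetical placeholders for whatever the loop really does.

```livecode
-- Hypothetical example loop: counts non-empty lines while staying responsive.
function countNonEmptyLines pText
   local tLine, tSeen, tCount
   put 0 into tSeen
   put 0 into tCount
   repeat for each line tLine in pText
      add 1 to tSeen
      if tLine is not empty then add 1 to tCount
      -- every 10000 lines (an arbitrary interval), let the engine
      -- redraw the window and process pending events
      if tSeen mod 10000 is 0 then
         wait 0 milliseconds with messages
      end if
   end repeat
   return tCount
end countNonEmptyLines
```

"repeat for each line" walks the text once from start to end, unlike "line N of" inside a counted loop, which has to locate line N again on every pass.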

bogs
Posts: 5435
Joined: Sat Feb 25, 2017 10:45 pm

Re: Randomize Lines of a Large Text File

Post by bogs » Fri May 28, 2021 10:20 pm

By the way, you don't have to be so stingy; give it a whole 10 ms, which is a WHOPPING 0.01 of a second. Interestingly, I was running some polling tests on the mouse in a repeat loop, and found that increasing the 'wait with messages' time sometimes actually *did* improve performance and dropped the CPU usage quite a bit, which made the whole routine not only run better but feel quite a bit snappier.

dunbarx
VIP Livecode Opensource Backer
Posts: 9648
Joined: Wed May 06, 2009 2:28 pm
Location: New York, NY

Re: Randomize Lines of a Large Text File

Post by dunbarx » Sat May 29, 2021 3:17 pm

All good stuff.

The beachball can be eliminated by giving the OS just a little relief.

But Richard makes the real point: why do we need to randomize a 100 MB dataset? And read it as well? As the old saw goes, "if you are dealing with something outlandish, rethink from the beginning".

So then, what is desired and required?

Craig

xoxiwe
Posts: 3
Joined: Fri May 28, 2021 12:45 pm

Re: Randomize Lines of a Large Text File

Post by xoxiwe » Sat May 29, 2021 4:10 pm

Thanks to all of you who took your valuable time to help me.

By the way, for those interested in my purpose: the data in that large text file are categorized. Say, for example, the first third is category 1, the second third is category 2, and the last third is category 3. I don't want it like that; I want it all mixed up. Therefore repeating 1000 times or the like won't give me the desired result, as I want even the last line to have a chance to become the first, second or third line, and so on.

If anyone can give me an idea of how my OS (Windows 10) can handle that much data without crashing, it would be much appreciated :)
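The requirement above, that any line (even the last) can end up anywhere, is what a Fisher-Yates shuffle provides: every ordering is equally likely, assuming random() itself is uniform. This technique is swapped in here and was not posted in the thread. The sketch copies the lines into an array first (looking up "line N of" a huge text is slow, as AxWald notes further down), shuffles the array in place with one swap per line, and yields every 10000 swaps. The name shuffleLinesFisherYates is hypothetical.

```livecode
-- Hypothetical helper: uniform Fisher-Yates shuffle of the lines of pText.
function shuffleLinesFisherYates pText
   local tLine, tLines, tCount, i, j, tTemp, tResult
   -- copy each line into an array keyed 1..tCount for fast random access
   put 0 into tCount
   repeat for each line tLine in pText
      add 1 to tCount
      put tLine into tLines[tCount]
   end repeat
   -- walk down from the last slot, swapping slot i with a random
   -- slot j in 1..i (random(i) returns an integer from 1 to i)
   repeat with i = tCount down to 2
      put random(i) into j
      put tLines[i] into tTemp
      put tLines[j] into tLines[i]
      put tTemp into tLines[j]
      if i mod 10000 is 0 then wait 0 milliseconds with messages
   end repeat
   -- rebuild the text in the shuffled order
   repeat with i = 1 to tCount
      put tLines[i] & return after tResult
   end repeat
   delete char -1 of tResult -- drop the trailing return
   return tResult
end shuffleLinesFisherYates
```

Unlike the second handler in the question, nothing is deleted from the middle of a 100 MB string, so each step costs about the same no matter how large the file is.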

FourthWorld
VIP Livecode Opensource Backer
Posts: 9824
Joined: Sat Apr 08, 2006 7:05 am
Location: Los Angeles

Re: Randomize Lines of a Large Text File

Post by FourthWorld » Sat May 29, 2021 7:59 pm

The Bible is roughly 4MB of text. Your program is putting the equivalent of 25 bibles into a field for display. Given both the length and the nature of your text, I'm guessing you don't expect the user to read it.

Does it need to be displayed in a field at all?

Right off the bat you'll save significant time by not doing that. Field rendering requires a great many steps under the hood, all the way down to antialiased font rendering for every character.

If you would kindly tell us what your program does with the randomized text, we may be able to speed it up even more, far beyond the gains from bypassing unneeded field rendering.

But the methods for doing so are varied, so it won't be possible to provide useful guidance until we know how the text is used.
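If the field is only there to hold the result, the text can go straight to disk instead. A minimal sketch; saveTextToFile and any path passed to it are hypothetical, and it assumes that for a "file:" URL a failed write leaves an error message in "the result".

```livecode
-- Hypothetical helper: write pText to the file at pPath, no field involved.
-- Returns empty on success, or the error text the engine reported.
function saveTextToFile pText, pPath
   put pText into URL ("file:" & pPath)
   return the result
end saveTextToFile
```

Usage: get saveTextToFile(tShuffled, specialFolderPath("documents") & "/shuffled.txt"), then check "it" for an error message.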

SparkOut
Posts: 2852
Joined: Sun Sep 23, 2007 4:58 pm

Re: Randomize Lines of a Large Text File

Post by SparkOut » Sat May 29, 2021 8:54 pm

What Richard said.

Just for info, I made a dummy file with a repeat loop that created a series of lines "This is line" && line number & cr

It took 5 million iterations to make a file over 100MB. While there are certainly sources that create that much data in a single file, operating over millions of data lines is not going to be optimal. The question is why do you need to process it, to what end?
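SparkOut's dummy data can be reproduced with a loop like this sketch (the handler name is hypothetical). The text is built in a variable and written once at the end, rather than touching the disk or a field on every iteration.

```livecode
-- Hypothetical helper: pCount lines of the form "This is line N".
function makeDummyLines pCount
   local tText, i
   repeat with i = 1 to pCount
      put "This is line" && i & return after tText
   end repeat
   delete char -1 of tText -- drop the trailing return
   return tText
end makeDummyLines
```

For example, put makeDummyLines(5000000) into URL ("file:" & tPath). Most of those lines are about 20 characters plus a line ending, which is how 5 million of them pass the 100 MB mark.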

xoxiwe
Posts: 3
Joined: Fri May 28, 2021 12:45 pm

Re: Randomize Lines of a Large Text File

Post by xoxiwe » Sun May 30, 2021 7:17 pm

Well, my program checks every line in a text file against a given condition and returns the lines that match it.

What I need is to randomize the lines so that I have a better chance of getting the expected lines sooner, since the lines are grouped by category and I don't know at which line a new category starts.

Also, I forgot to mention that showing the outcome in a field isn't necessary.

I have already made a tool in LiveCode that can reverse all the lines in a short time, and it works perfectly when I need a reversal. What I need now is to shuffle the lines so the middle ones have a chance to move up.

I understand if it's somewhat complex and may not meet my expectations, since handling such a large file isn't easy and might cause a crash.

I'm also sorry to all of you who have spent your valuable time on me. Thanks to you all :)
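Since the program described above examines every line anyway, the order of the lines does not change which ones match; shuffling only changes how early the first hit turns up. If the condition can be written as a wildcard pattern, LiveCode's built-in "filter" command checks the whole text in one pass. A sketch; the function name and the "*yamaha*" pattern (taken from the sample lines) are illustrative assumptions.

```livecode
-- Hypothetical helper: return only the lines of pText that match pPattern.
-- pPattern is a wildcard pattern; * matches any run of characters.
function linesMatching pText, pPattern
   filter pText with pPattern
   return pText
end linesMatching
```

Usage: put linesMatching(URL ("file:" & ifile), "*yamaha*") into tHits.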

jacque
VIP Livecode Opensource Backer
Posts: 7229
Joined: Sat Apr 08, 2006 8:31 pm
Location: Minneapolis MN

Re: Randomize Lines of a Large Text File

Post by jacque » Sun May 30, 2021 8:33 pm

I'd use an array, which is very fast and supports huge numbers of entries. The method would be to convert the data to an array, sort the keys randomly, and then check each key in its sorted order to see if the info you want is the key's element. This is one way:

local sData -- array of all data
local sRandomList -- list of randomized keys

on mouseUp
  put makeData(fld 1) into sData -- here you would import your data instead of creating test data
  randomizeTxt -- this creates a randomly sorted list of keys
end mouseUp

on randomizeTxt
  local tKeys, tCount
  split sData by cr -- keys become the line numbers 1..N
  put the keys of sData into tKeys
  put the number of lines in tKeys into tCount -- count once, not once per line
  sort lines of tKeys by random(tCount)
  put tKeys into sRandomList
end randomizeTxt

function makeData pText -- create false test data; you won't need this
  local temp
  repeat 100
    put pText & return after temp
  end repeat
  return temp
end makeData
If your data uses unique leading numbers, you could split the data by cr and "-". You probably wouldn't need to sort the keys in that case since the numbers in your example look pretty random to me already.

To search the array for specific info:

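function searchArray pText
  -- sRandomList is a return-delimited list, not an array, so step through its lines
  repeat for each line k in sRandomList
    if sData[k] = pText then return sData[k]
  end repeat
  return empty -- no match
end searchArray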
Jacqueline Landman Gay | jacque at hyperactivesw dot com
HyperActive Software | http://www.hyperactivesw.com
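A sketch of the split-by-number variant Jacque mentions, assuming every line looks like the samples ("number-name") and the numbers are unique; a repeated number would overwrite the earlier entry. The name buildIdArray is hypothetical.

```livecode
-- Hypothetical helper: turn "5122487-simba2006" lines into an array
-- keyed by the leading number, e.g. tArray[5122487] = "simba2006".
function buildIdArray pText
   -- return separates the entries; "-" separates each key from its element
   split pText by return and "-"
   return pText
end buildIdArray
```

A lookup by number is then a single array access, such as tArray[1671514], with no scan of the data.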

jiml
Posts: 336
Joined: Sat Dec 09, 2006 1:27 am
Location: Los Angeles

Re: Randomize Lines of a Large Text File

Post by jiml » Mon May 31, 2021 11:25 pm

You might want to consider using a database.

AxWald
Posts: 578
Joined: Thu Mar 06, 2014 2:57 pm

Re: Randomize Lines of a Large Text File

Post by AxWald » Tue Jun 01, 2021 1:25 pm

Hi,

I guess sorting a big bunch of data is the problem here. So just avoid sorting a big bunch of data ;-)

This looked interesting, so I played with it a bit. See my results:

To avoid having to sort huge amounts of data, I first create an array with the line numbers as keys and add 2 random values to each key. Then I sort the list of keys 2 times, once by each of the values. This gives a list containing every line number exactly once, in random order:

function getLiNus theNum
   /* Creates a list containing all numbers from 1 to "theNum",
   in randomized order, each on its line   */
   
   put 1 into myCnt
   repeat theNum
      put random(99999) into myArr[myCnt][1]          --  create  sort array
      put random(99999) into myArr[myCnt][2]
      add 1 to myCnt
   end repeat
   
   get the keys of myArr
   sort lines of it numeric by myArr[each][1]       --  sort it
   sort lines of it numeric by myArr[each][2]
   
   return it
end getLiNus
This is sufficiently fast, and sufficiently randomized (you might play with the number of keys, and the random(x) value). And this is how you utilize it:

on mouseUp
   answer file "Which file?"
   if it is empty then exit mouseUp
   
   put URL ("file:" & it) into myData
   put 1 into myCnt
   repeat for each line L in myData                      --  create data array
      put L into myDatArr[myCnt]
      add 1 to myCnt
   end repeat
   
   put getLiNus(the number of lines of myData) into mySort    --  fetch the sort list   
   repeat for each line L in mySort                           --  and apply it
       put myDatArr[L] & CR after myRes
   end repeat
   delete char -1 of myRes

   put myRes
end mouseUp
Hint: For readability, this code is stripped of everything not strictly necessary. For instance, each repeat loop should start with:

      wait 0 millisec with messages
      if the controlKey is down then exit repeat
A problem I ran into is that fetching single lines of our text data becomes slow once it's a really phat chunk with thousands of lines. So I start by throwing all lines of our data into an array, with [lineNumber] as key. This way I can access them much faster later, once I have my sort order :)

For a file with ~55MB & ~1.000.000 lines I get (millisecs):

Make data array: 4757 | Rearrange lines of data: 5570
Make sort array: 6295 | Sort sort array: 3924
Over all: 20867
Lines: 1000436
Remark: Even with "wait 0 with messages" this still runs into "unresponsiveness" during the last loop (rearranging the lines).

Anyway, perhaps something here is useful for someone. Have fun!
All code published by me here was created with Community Editions of LC (thus is GPLv3).
If you use it in closed source projects, or for the Apple AppStore, or with XCode
you'll violate some license terms - read your relevant EULAs & Licenses!
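AxWald's two sorts rely on LiveCode's sort being stable, so the final order follows the second random value, with ties broken by the first. A single sort on one combined key should behave much the same. This variant is an untimed sketch, not one of the measured results above, and getShuffledLineNumbers is a hypothetical name.

```livecode
-- Hypothetical single-sort variant: returns 1..pCount in random order.
function getShuffledLineNumbers pCount
   local tList, i
   repeat with i = 1 to pCount
      put i & return after tList
   end repeat
   delete char -1 of tList
   -- combine two draws into one numeric key (about 10^10 possible
   -- values), so ties, which a stable sort leaves in order, are rare
   sort lines of tList numeric by random(99999) * 100000 + random(99999)
   return tList
end getShuffledLineNumbers
```

It can replace getLiNus in the mouseUp handler above without other changes.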

FourthWorld
VIP Livecode Opensource Backer
Posts: 9824
Joined: Sat Apr 08, 2006 7:05 am
Location: Los Angeles

Re: Randomize Lines of a Large Text File

Post by FourthWorld » Tue Jun 01, 2021 5:56 pm

xoxiwe wrote:
Sun May 30, 2021 7:17 pm
well, my program checks every line in a text file based on some given condition, and gets me the lines according to the condition given
In my reading that sounds like what you need is nearly the opposite of random, something very specific.

What is done with the line after it is retrieved?

And how often is the list file updated?
