How to sort out this

alemrantareq
Posts: 203
Joined: Wed Jul 23, 2008 8:46 am

How to sort out this

Post by alemrantareq » Mon Mar 12, 2018 7:08 am

hello,

I have some data in a text file like:

rawdata@data1:info5121
data665@data5:info34612
rawdata@data4:info2012
rawdata@data76:info215
dataef78@data1:info863
rawdata@data85:info3666
rawdata@data1:info5121
myrawdata@data54:info785
rawdata@data4:info245
rawdata75@data00:info312

I want to remove all the lines that have exactly 'rawdata' before the '@', so the output should look like this:

data665@data5:info34612
dataef78@data1:info863
myrawdata@data54:info785
rawdata75@data00:info312

The text file is about 100 MB in size, so a fast way to filter it would be much appreciated. Please could someone help me sort this out? Thanks in advance.

MaxV
Posts: 1579
Joined: Tue May 28, 2013 2:20 pm
Location: Italy

Re: How to sort out this

Post by MaxV » Mon Mar 12, 2018 2:11 pm

Try this:

Code: Select all

put URL "file:path/to/myfile.txt" into temp
filter lines of temp without "rawdata*@*"
put temp into URL "file:path/to/myfile_purged.txt"
This also removes rawdata75@, because rawdata comes before the @. If you want to keep that line, change the pattern to "rawdata@*".

See: http://livecode.wikia.com/wiki/Filter
Livecode Wiki: http://livecode.wikia.com
My blog: https://livecode-blogger.blogspot.com
To post code use this: http://tinyurl.com/ogp6d5w

alemrantareq
Posts: 203
Joined: Wed Jul 23, 2008 8:46 am

Re: How to sort out this

Post by alemrantareq » Mon Mar 12, 2018 6:27 pm

thanks MaxV for your reply.

Actually, I have several files like that, and they all contain this kind of data, but the string before the '@' differs from file to file. Developing a separate app for each file would take far too much time and isn't possible for me. So I want to develop a single app that takes the string before the '@' on each line and tries to match it against the rest of the lines. If it finds exactly the same string before the '@' on other lines, it should delete all the lines that have that exact string. I think I made myself clear :)
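
One way to read the requirement above is: drop every line whose text before the '@' also appears before the '@' on some other line. A minimal sketch of that reading follows. The function name removeRepeatedPrefixes is made up for illustration and does not come from this thread, and the code assumes '@' is always the separator and that blank lines can be discarded.

```livecode
-- Sketch only: removeRepeatedPrefixes is a hypothetical name.
-- Returns pData without any line whose text before "@" occurs on more than one line.
function removeRepeatedPrefixes pData
   local tCount, tResult
   set the itemDelimiter to "@"
   -- Pass 1: count how many lines share each prefix
   repeat for each line tLine in pData
      if tLine is empty then next repeat
      add 1 to tCount[item 1 of tLine]
   end repeat
   -- Pass 2: keep only the lines whose prefix was counted exactly once
   repeat for each line tLine in pData
      if tLine is empty then next repeat
      if tCount[item 1 of tLine] = 1 then
         put tLine & return after tResult
      end if
   end repeat
   delete last char of tResult -- drop the trailing return
   return tResult
end removeRepeatedPrefixes
```

It could be called in the same way as the filter example earlier in the thread, for instance by putting URL "file:path/to/myfile.txt" into a variable and writing removeRepeatedPrefixes of that variable back to a second file. No string has to be typed in, because the counts decide which prefixes are repeated.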

dunbarx
VIP Livecode Opensource Backer
Posts: 9582
Joined: Wed May 06, 2009 2:28 pm
Location: New York, NY

Re: How to sort out this

Post by dunbarx » Mon Mar 12, 2018 7:52 pm

Hi.

A little trickier than it first seems, mainly because of the way items work.

Try this. Make two fields and a button. Put your data into fld 1. Put this in the button script:

Code: Select all

on mouseUp
   put fld 1 into temp
   ask "Enter string to delete" with "rawData@" --just for testing
   set the itemDel to "@"
   put the length of it - 1 into tLength
   
   repeat for each line tLine in temp
      if it is in tLine and the length of item 1 of tLine = tLength then next repeat
      put tLine & return after newData
   end repeat
   put newData into fld 2
end mouseUp
This is a kluge, in a way, but should work. It assumes that "@" is always the delimiter, otherwise you need to load that as well. The issue is the way items work. I first tried to make "rawData@" the itemDelimiter. This proved to be more of a headache than the method above, because the "@" cannot be part of the itemDel string, but it has to be in order to isolate the string. If that makes any sense.

Anyway, if we are kluging, there are about a million ways to go about it. And I bet Mr. Thierry will chime in with a regex in about 20 minutes.

Craig Newman

bogs
Posts: 5435
Joined: Sat Feb 25, 2017 10:45 pm

Re: How to sort out this

Post by bogs » Mon Mar 12, 2018 10:24 pm

Or, you could just put a field on the card to type in the data you want to filter out, and use either of the two suggestions. Put the field with the text to filter out into a variable on close field, or even as is into the statement (shown), then use that variable as the filter key, something like (Max's code for example) -

Code: Select all

put URL "file:path/to/myfile.txt" into temp
filter lines of temp without (the text of field 1 & "@*")
put temp into URL "file:path/to/myfile_purged.txt"

SparkOut
Posts: 2839
Joined: Sun Sep 23, 2007 4:58 pm

Re: How to sort out this

Post by SparkOut » Mon Mar 12, 2018 10:33 pm

alemrantareq wrote:
Mon Mar 12, 2018 6:27 pm
I think I made myself clear :)
No, you didn't.
Do you want to filter a list without duplicates? Or do you want to eliminate lines beginning with one particular prefix? Or do you want to get others to do all the work for you without explaining what work you want done, and blaming the ones trying to help for not getting it "right" when you don't even say what "right" is?

In case anyone thinks I am being harsh, there is a history here.

dunbarx
VIP Livecode Opensource Backer
Posts: 9582
Joined: Wed May 06, 2009 2:28 pm
Location: New York, NY

Re: How to sort out this

Post by dunbarx » Mon Mar 12, 2018 10:51 pm

Sparkout.

He did include a smiley. And I do not think English is his first language.

I had also originally added a "I do not quite think so..." in my post, but then deleted it as per the above.

But I do not know about the history. I just do not want another amthonyBlack (sic) travail. :wink:

Craig

bogs
Posts: 5435
Joined: Sat Feb 25, 2017 10:45 pm

Re: How to sort out this

Post by bogs » Mon Mar 12, 2018 11:12 pm

I think I missed a page (or 5000) from this conversation, but that is ok :)

alemrantareq
Posts: 203
Joined: Wed Jul 23, 2008 8:46 am

Re: How to sort out this

Post by alemrantareq » Tue Mar 13, 2018 5:51 am

hi SparkOut,
SparkOut wrote:
Mon Mar 12, 2018 10:33 pm
Do you want to filter a list without duplicates? Or do you want to eliminate lines beginning with one particular prefix?
Yes, I want to filter a list so that it has no dups, but the dups are not whole lines, just the string before the '@' sign.
SparkOut wrote:
Mon Mar 12, 2018 10:33 pm
Or do you want to get others to do all the work for you without explaining what work you want done, and blaming the ones trying to help for not getting it "right" when you don't even say what "right" is?
I started this thread because I couldn't find a way to write a script that would truly help me out. If it were only a matter of removing duplicate lines, I would never have asked for help here, as I have already made a duplicate-line remover of my own. I came here only to seek help; blaming others was never my intention ;)
SparkOut wrote:
Mon Mar 12, 2018 10:33 pm
In case anyone thinks I am being harsh, there is a history here.
Sir, I'm not a newbie here. I've received help from you several times before, and I will always be grateful to you. I'm quite familiar with your style of talking, and I've gotten used to it :D
Last edited by alemrantareq on Tue Mar 13, 2018 6:21 am, edited 1 time in total.

alemrantareq
Posts: 203
Joined: Wed Jul 23, 2008 8:46 am

Re: How to sort out this

Post by alemrantareq » Tue Mar 13, 2018 6:07 am

Thanks a lot dear Craig and bogs for taking the time to find a solution for me.

Both of your scripts require me to input the string I want to filter out, and that's the problem. The files I want to filter contain more than 48 lakh (4.8 million) lines. If I have to search for each such string myself, then I could just as well delete those lines in a text editor. But as I said, that would be far too time-consuming and isn't possible for me.

A 'repeat for each line' solution was also on my mind. But if my app starts doing this, I'm sure it will crash because of the huge number of lines in the list.
dunbarx wrote:
Mon Mar 12, 2018 10:51 pm
And I do not think English is his first language.
You are right. English isn't my first language, and that's why sometimes I fail to express properly what I actually want. Thanks for taking it into consideration :p

bogs
Posts: 5435
Joined: Sat Feb 25, 2017 10:45 pm

Re: How to sort out this

Post by bogs » Tue Mar 13, 2018 2:00 pm

alemrantareq wrote:
Tue Mar 13, 2018 6:07 am
Both of your scripts require me to input the string I want to filter out. And that's the problem. The files I want to filter include more than 48 lakh lines. So if I'm to go search for such string, then I can delete those lines by text editor too.
To remove anything, a computer needs to know what it has to remove. A computer doesn't work on magic, it works by way of being programmed to do a task over and over, relieving you of doing it yourself in this case.

You are right that you can just as easily do this particular kind of thing in a text editor; in most of them, a find and replace all with "" is what you would be running. The only advantage to using LC to do it (over, say, a text editor) is that you can set up all the keywords you want at the beginning, then filter the files through a repeat loop.

So, for instance, if all your text files are in a folder, you could write a little code like this (untested):

Code: Select all

// code off the top of my head; tPath is assumed to hold the folder path
// (passing a folder to files() needs a recent LiveCode version)
repeat for each line tFile in files(tPath)
   put URL ("file:" & tPath & "/" & tFile) into temp
   repeat for each line tWord in the text of field 1
      // drop every line that starts with this keyword followed by "@"
      filter lines of temp without (tWord & "@*")
   end repeat
   put temp into URL ("file:" & tPath & "/" & tFile)
end repeat
You would, in the text box, need to put the words you are filtering of course, one per line. For instance, if you want to remove the lines with rawdata, thisdata, thatdata, you would put them into the text box as:

Code: Select all

rawdata
thisdata
thatdata
The time saving by automating it through programming is that you don't have to open each and every file, but you do still have to provide a program a way to know what to remove.

dunbarx
VIP Livecode Opensource Backer
Posts: 9582
Joined: Wed May 06, 2009 2:28 pm
Location: New York, NY

Re: How to sort out this

Post by dunbarx » Tue Mar 13, 2018 7:56 pm

Both of your scripts require me to input the string I want to filter out. And that's the problem.
Like Bogs, I too wonder what your thinking is. Do the filter criteria change from reading to reading? If not, the filter string can be stored and the handler size can be reduced by a few lines. But if so, what would you think is the way forward?
The files I want to filter include more than 48 lakh lines
Why does the size of the file matter? LC can process it in a heartbeat.
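
For anyone who would rather measure than take that on faith, here is a small sketch that times one 'repeat for each line' pass over a file. The path is the placeholder used earlier in the thread, and the variable names are only illustrative. The time reported will depend on the machine and the file.

```livecode
on mouseUp
   local tData, tStart, tLineCount
   -- placeholder path, as in the earlier examples
   put URL "file:path/to/myfile.txt" into tData
   put the milliseconds into tStart
   put 0 into tLineCount
   -- visit every line once, doing no real work
   repeat for each line tLine in tData
      add 1 to tLineCount
   end repeat
   answer tLineCount && "lines visited in" && (the milliseconds - tStart) && "ms"
end mouseUp
```

If this finishes without trouble on the real file, a filtering loop of the same shape should be workable too, since it only adds a comparison per line.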

Craig

bwmilby
Posts: 438
Joined: Wed Jun 07, 2017 5:37 am
Location: Henrico, VA

Re: How to sort out this

Post by bwmilby » Tue Mar 13, 2018 11:10 pm

You need a repeat loop before the filter. Loop through each line and record the text before the @. Check to see if the current line text is in the list. If so, exit the loop. Use the found text to filter the list.

I tested the below on the sample that you provided and it seemed to give the expected result.

########CODE to copy and paste with you mouse#######
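on mouseUp
   local tList, tKey, tKeys
   set the itemDel to "@"
   -- compare whole lines so "rawdata@" is not found inside "myrawdata@"
   set the wholeMatches to true
   put the text of field "Source" into tList
   repeat for each line tLine in tList
      -- the key is everything up to and including the "@"
      put (item 1 of tLine) & "@" into tKey
      -- stop at the first key that has been seen before
      if lineOffset(tKey, tKeys) > 0 then exit repeat
      put cr & tKey after tKeys
   end repeat
   set the text of field "Key" to tKey
   -- remove every line that starts with the repeated key
   filter tList without (tKey & "*")
   set the text of field "Destination" to tList
end mouseUp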
#####END OF CODE generated by this livecode app: http://tinyurl.com/j8xf3xq with livecode 9.0.0-dp-11#####

Edit 1: Add code
Edit 2: Add image
Edit 3: The code does not ensure that there was a duplicate line, so if none were found it would remove the last line of the file. That would be an easy thing to add though.

Last edited by bwmilby on Wed Mar 14, 2018 5:29 pm, edited 3 times in total.
Brian Milby

Script Tracker https://github.com/bwmilby/scriptTracker

rkriesel
VIP Livecode Opensource Backer
Posts: 118
Joined: Thu Apr 13, 2006 6:25 pm

Re: How to sort out this

Post by rkriesel » Wed Mar 14, 2018 12:22 am

alemrantareq wrote:
Tue Mar 13, 2018 5:51 am
I want to filter a list without dups, but these dups are not lines, just a string before the '@' sign.
Hi, alemrantareq. Here's a technique that derives your desired output without invoking repeat:

Code: Select all

function digestRecords t
   split t using return and "@"
   delete variable t["rawdata"]
   combine t using return and "@"
   return t
end digestRecords
Does it do what you need with your real data?
-- Dick

dunbarx
VIP Livecode Opensource Backer
VIP Livecode Opensource Backer
Posts: 9582
Joined: Wed May 06, 2009 2:28 pm
Location: New York, NY

Re: How to sort out this

Post by dunbarx » Wed Mar 14, 2018 3:24 am

Of course.

Well done.

Craig
