How to sort out this
-
- Posts: 203
- Joined: Wed Jul 23, 2008 8:46 am
How to sort out this
hello,
I have some data in a text file like:
rawdata@data1:info5121
data665@data5:info34612
rawdata@data4:info2012
rawdata@data76:info215
dataef78@data1:info863
rawdata@data85:info3666
rawdata@data1:info5121
myrawdata@data54:info785
rawdata@data4:info245
rawdata75@data00:info312
I want to remove all the lines that hold only 'rawdata' before '@'. So the output should be like:
data665@data5:info34612
dataef78@data1:info863
myrawdata@data54:info785
rawdata75@data00:info312
The text file is about 100 MB in size, so any fast method to filter it would be much appreciated. Please, could someone help me sort this out? Thanks in advance.
Re: How to sort out this
Try this:
This also removes rawdata75@, because it starts with rawdata before the @. If you want to keep it, change the pattern to "rawdata@*".
See: http://livecode.wikia.com/wiki/Filter
Code: Select all
put URL "file:path/to/myfile.txt" into temp
filter lines of temp without "rawdata*@*"
put temp into URL "file:path/to/myfile_purged.txt"
Livecode Wiki: http://livecode.wikia.com
My blog: https://livecode-blogger.blogspot.com
To post code use this: http://tinyurl.com/ogp6d5w
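The difference between the two patterns can be seen on a three-line sample. This is a minimal, untested sketch: the variable names (tSample, tLoose, tExact) are made up for the illustration, and it assumes that a wildcard pattern has to match the whole line, which is how filter is normally described.

```livecode
on mouseUp
   -- three sample lines taken from the original question
   put "rawdata@data1:info5121" & return & \
         "rawdata75@data00:info312" & return & \
         "myrawdata@data54:info785" into tSample

   -- loose pattern: anything that starts with "rawdata" and contains "@"
   put tSample into tLoose
   filter lines of tLoose without "rawdata*@*"

   -- exact pattern: only lines where "rawdata" is immediately followed by "@"
   put tSample into tExact
   filter lines of tExact without "rawdata@*"

   put "Loose:" & return & tLoose & return & "Exact:" & return & tExact
end mouseUp
```

If that assumption holds, tLoose should keep only the myrawdata line, while tExact should keep both the rawdata75 and the myrawdata lines.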
Re: How to sort out this
thanks MaxV for your reply.
Actually, I have several files like that. All the files contain such data, but the string before '@' is different in each file. If I develop an app for each file, it will be hugely time consuming and not possible for me. So I want to develop only one app that checks the string before '@' on each line and tries to match it against the rest of the lines. If it finds exactly the same string before '@' on other lines, it deletes all the lines that have that exact string. I think I made myself clear
-
- VIP Livecode Opensource Backer
- Posts: 9582
- Joined: Wed May 06, 2009 2:28 pm
- Location: New York, NY
Re: How to sort out this
Hi.
A little trickier than it first seems, mainly because of the way items work.
Try this. Make two fields and a button. Put your data into fld 1. Put this in the button script:
Code: Select all
on mouseUp
   put fld 1 into temp
   ask "Enter string to delete" with "rawData@" --just for testing
   set the itemDel to "@"
   put the length of it - 1 into tLength --length of the string without the "@"
   repeat for each line tLine in temp
      --skip lines whose text before "@" is exactly the entered string
      if it is in tLine and the length of item 1 of tLine = tLength then next repeat
      put tLine & return after newData
   end repeat
   put newData into fld 2
end mouseUp
This is a kluge, in a way, but should work. It assumes that "@" is always the delimiter, otherwise you need to load that as well. The issue is the way items work. I first tried to make "rawData@" the itemDelimiter. This proved to be more of a headache than the method above, because the "@" cannot be part of the itemDel string, but it has to be in order to isolate the string. If that makes any sense.
Anyway, if we are kluging, there are about a million ways to go about it. And I bet Mr. Thierry will chime in with a regex in about 20 minutes.
Craig Newman
Re: How to sort out this
Or, you could just put a field on the card to type in the text you want to filter out, and use either of the two suggestions. Put the field's text into a variable on closeField, or use the field directly in the statement (as shown), and use that as the filter key. Something like this (using Max's code as the example):
Code: Select all
put URL "file:path/to/myfile.txt" into temp
filter lines of temp without (the text of field 1 & "@*")
put temp into URL "file:path/to/myfile_purged.txt"
Re: How to sort out this
No, you didn't.
Do you want to filter a list without duplicates? Or do you want to eliminate lines beginning with one particular prefix? Or do you want to get others to do all the work for you without explaining what work you want done, and blaming the ones trying to help for not getting it "right" when you don't even say what "right" is?
In case anyone thinks I am being harsh, there is a history here.
Re: How to sort out this
Sparkout.
He did include a smiley. And I do not think English is his first language.
I had also originally added a "I do not quite think so..." in my post, but then deleted it as per the above.
But I do not know about the history. I just do not want another amthonyBlack (sic) travail.
Craig
Re: How to sort out this
I think I missed a page (or 5000) from this conversation, but that is ok
Re: How to sort out this
hi SparkOut,
yes, I want to filter a list without dups, but these dups are not lines, just a string before the '@' sign.
I made this thread because I couldn't find a way to write a script that truly does what I need. If it were just a matter of removing duplicate lines, I would never have asked for help here, as I have already made a duplicate-line remover of my own. I came here only to seek help; blaming others was never my intention
Sir, I'm not a newbie here. I've received help from you several times before, and I'm always grateful to you. I'm quite familiar with your style of talking, and I've become used to it
Last edited by alemrantareq on Tue Mar 13, 2018 6:21 am, edited 1 time in total.
Re: How to sort out this
Thanks a lot dear Craig and bogs for taking the time to find a solution for me.
Both of your scripts require me to input the string I want to filter out, and that's the problem. The files I want to filter contain more than 48 lakh (4.8 million) lines. If I have to search for each such string myself, I can just as well delete those lines in a text editor. But as I said, that would be far too time consuming and not possible for me.
I had also considered a 'repeat for each line' solution. But if my app starts doing this, it will surely crash because of the huge number of lines in the list.
You are right. English isn't my first language, and that's why sometimes I fail to express properly what I actually want. Thanks for taking it into consideration :p
Re: How to sort out this
alemrantareq wrote: Tue Mar 13, 2018 6:07 am
Both of your scripts require me to input the string I want to filter out. And that's the problem. The files I want to filter include more than 48 lakh lines. So if I'm to go search for such string, then I can delete those lines by text editor too.
To remove anything, a computer needs to know what it has to remove. A computer doesn't work on magic; it works by being programmed to do a task over and over, relieving you of doing it yourself in this case.
You are right that you can just as easily do this particular kind of thing in a text editor; in most of them, a find and replace all with "" is what you would be running. The only advantage of using LC to do it (over, say, a text editor) is that you can set up all the keywords you want at the beginning, then filter the files through a repeat loop.
So, for instance, if all your text files are in a folder, you could write a little code like this (pseudo):
Code: Select all
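-- code off the top of my head, untested
put "path/to" into tPath -- folder holding the text files
put the defaultFolder into tOldFolder
set the defaultFolder to tPath
put the files into tFiles -- one file name per line
set the defaultFolder to tOldFolder
repeat for each line x in tFiles
   put URL ("file:" & tPath & "/" & x) into temp
   filter lines of temp without (the text of field 1 & "@*")
   put temp into URL ("file:" & tPath & "/" & x)
end repeat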
Code: Select all
rawdata-thisdata-thatdata
Re: How to sort out this
"Both of your scripts require me to input the string I want to filter out. And that's the problem."
Like Bogs, I too wonder what your thinking is. Do the filter criteria change from reading to reading? If not, the filter string can be stored and the handler size can be reduced by a few lines. But if so, what would you think is the way forward?
"The files I want to filter include more than 48 lakh lines"
Why does the size of the file matter? LC can process it in a heartbeat.
Craig
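One way to check the speed claim on your own machine is to time a plain pass over the file. This is a rough sketch, not a benchmark: the path is the same placeholder used earlier in the thread, the variable names are made up, and the whole file is assumed to fit in memory.

```livecode
on mouseUp
   put the milliseconds into tStart
   -- placeholder path from the earlier examples; point it at a real file
   put URL "file:path/to/myfile.txt" into tData
   put 0 into tCount
   -- "repeat for each" walks the text once instead of re-counting lines on every pass
   repeat for each line tLine in tData
      add 1 to tCount
   end repeat
   put tCount && "lines read in" && (the milliseconds - tStart) && "ms"
end mouseUp
```

If the loop alone finishes quickly on the 100 MB file, a filter built on the same kind of loop should not be a problem either.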
Re: How to sort out this
You need a repeat loop before the filter. Loop through each line and record the text before the @. Check to see if the current line text is in the list. If so, exit the loop. Use the found text to filter the list.
I tested the below on the sample that you provided and it seemed to give the expected result.
########CODE to copy and paste with your mouse#######
on mouseUp
   local tList, tKey, tKeys
   set the itemDel to "@"
   set the wholeMatches to true -- so "data@" is not found inside "rawdata@"
   put the text of field "Source" into tList
   -- collect each prefix (text before "@") until one shows up a second time
   repeat for each line tLine in tList
      put (item 1 of tLine) & "@" into tKey
      if lineOffset(tKey, tKeys) > 0 then exit repeat
      put cr & tKey after tKeys
   end repeat
   -- tKey now holds the first repeated prefix
   set the text of field "Key" to tKey
   filter tList without (tKey & "*")
   set the text of field "Destination" to tList
end mouseUp
#####END OF CODE generated by this livecode app: http://tinyurl.com/j8xf3xq with livecode 9.0.0-dp-11#####
Edit 1: Add code
Edit 2: Add image
Edit 3: The code does not ensure that there was a duplicate line, so if none were found it would remove the last line of the file. That would be an easy thing to add though.
Last edited by bwmilby on Wed Mar 14, 2018 5:29 pm, edited 3 times in total.
Brian Milby
Script Tracker https://github.com/bwmilby/scriptTracker
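The idea of recording the text before the @ can be pushed one step further, so that no prefix has to be typed in and a list without repeats is left alone. The following is only a sketch under stated assumptions, not anyone's posted solution: the function name removeRepeatedPrefixes and its variables are made up, every non-empty line is assumed to have some text before its first "@", and prefixes are compared case-insensitively (the default for array keys). It is untested on the real 100 MB files.

```livecode
function removeRepeatedPrefixes pText
   local tCount, tLine, tOut
   set the itemDelimiter to "@"
   -- Pass 1: count how often each prefix (text before the first "@") occurs
   repeat for each line tLine in pText
      if tLine is empty then next repeat
      add 1 to tCount[item 1 of tLine]
   end repeat
   -- Pass 2: keep only the lines whose prefix occurred exactly once
   repeat for each line tLine in pText
      if tLine is empty then next repeat
      if tCount[item 1 of tLine] = 1 then put tLine & return after tOut
   end repeat
   delete the last char of tOut -- drop the trailing return
   return tOut
end removeRepeatedPrefixes

on mouseUp
   -- placeholder paths, as in the earlier examples
   put URL "file:path/to/myfile.txt" into tData
   put removeRepeatedPrefixes(tData) into URL "file:path/to/myfile_purged.txt"
end mouseUp
```

On the ten sample lines from the first post, rawdata is the only prefix that occurs more than once, so this should leave the four lines given as the desired output. Both passes use repeat for each, so the text is walked only twice however many different prefixes there are.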
Re: How to sort out this
alemrantareq wrote: Tue Mar 13, 2018 5:51 am
I want to filter a list without dups, but these dups are not lines, just a string before the '@' sign.
Hi, alemrantareq. Here's a technique that derives your desired output without invoking repeat:
Code: Select all
function digestRecords t
split t using return and "@"
delete variable t["rawdata"]
combine t using return and "@"
return t
end digestRecords
-- Dick
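To see what the split step produces, here is a small illustration on three of the sample lines. It is a sketch with made-up variable names, and it rests on the assumption that the secondary delimiter divides each line at its first "@" into a key and a value.

```livecode
on mouseUp
   put "rawdata@data1:info5121" & return & \
         "data665@data5:info34612" & return & \
         "rawdata@data4:info2012" into t
   -- t becomes an array keyed by the text before the first "@"
   split t using return and "@"
   -- both rawdata lines share one key, so only two elements remain
   put the number of lines of the keys of t into tKeyCount
   put "Keys:" && tKeyCount & return & the keys of t
end mouseUp
```

Because array keys are unique, any repeated prefix collapses into a single element, which is why deleting t["rawdata"] clears every rawdata line at once. Two things are worth checking before relying on this for the real files: the key to delete is still named in the script, and the lines that come back from combine may not be in their original order.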
Re: How to sort out this
Of course.
Well done.
Craig