-
Ledigimate
- Livecode Opensource Backer
- Posts: 132
- Joined: Mon Jan 14, 2013 3:37 pm
Post
by Ledigimate » Wed Dec 12, 2018 1:37 pm
Hi
I've tested LiveCode's md5digest function and it's super fast, even with files larger than the computer's RAM.
Code:
function fileMd5Digest pFilePath
   local tCheckSum
   get binarydecode("h*", md5digest(url ("binfile:" & pFilePath)), tCheckSum)
   return tCheckSum
end fileMd5Digest
I just don't know how reliable it is for verifying the checksum of large files. How is it that it can compute it so quickly? It would almost seem like it doesn't actually read the entire file, and if it doesn't, how can this be reliable?
Regards
Gerrie
Last edited by
Ledigimate on Sat Dec 15, 2018 8:55 am, edited 1 time in total.
010100000110010101100001011000110110010100111101010011000110111101110110011001010010101101010100011100100111010101110100011010000010101101001010011101010111001101110100011010010110001101100101
by Ledigimate » Wed Dec 12, 2018 3:00 pm
I just tested the above code on two large files that differ by a single bit.
The result is disappointing.
It returns the same result for both files.
So now my question becomes, am I going about this the wrong way?
-
dunbarx
- VIP Livecode Opensource Backer
- Posts: 9579
- Joined: Wed May 06, 2009 2:28 pm
- Location: New York, NY
Post
by dunbarx » Wed Dec 12, 2018 4:17 pm
Hi.
I never used this function before, but with a string of about 7000 chars in two variables, I get a different value if they differ by a single char, and the same value if they are in fact the same.
Craig Newman
by Ledigimate » Wed Dec 12, 2018 4:45 pm
If you pass a file URL to the md5Digest function, does it always read the whole file?
If not, that might explain why I got the same value for two very large files that differ by only one bit.
I made a copy of a 3.04 GB file, changed only one bit using a raw disk editor, ran the function against both files, and it gave the same result.
-
FourthWorld
- VIP Livecode Opensource Backer
- Posts: 9802
- Joined: Sat Apr 08, 2006 7:05 am
- Location: Los Angeles
-
Contact:
Post
by FourthWorld » Wed Dec 12, 2018 5:57 pm
Ledigimate wrote: ↑Wed Dec 12, 2018 1:37 pm
I just don't know how reliable it is for verifying the checksum of large files. How is it that it can compute it so quickly? It would almost seem like it doesn't actually read the entire file, and if it doesn't, how can this be reliable?
How large is "large"?
Once generated, how is the md5 value being used?
by Ledigimate » Wed Dec 12, 2018 11:17 pm
How large is "large"?
Files that are too large to be loaded entirely into RAM, I guess.
Once generated, how is the md5 value being used?
I would like to use the md5 value to verify the integrity of a copied file.
by FourthWorld » Wed Dec 12, 2018 11:23 pm
Ledigimate wrote: ↑Wed Dec 12, 2018 11:17 pm
How large is "large"?
Files that are too large to be loaded entirely into RAM, I guess.
Which OS are you using? Many provide hashing functions that can be called from the command line via LC's shell function.
Once generated, how is the md5 value being used?
I would like to use the md5 value to verify the integrity of a copied file.
Will you be doing that manually? For one file, 10 files, 10,000 files? Why MD5 as opposed to more recent algos? Is this all on your local hard drive, or by "copy" do you mean "download"?
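A minimal sketch of the command-line route, assuming a Windows system where the built-in certutil tool supports the -hashfile option (an assumption: it may be missing on older systems such as XP, and the thread does not name a specific utility). The handler name shellMd5OfFile is hypothetical, and the output parsing assumes the hash appears on the second line of certutil's output.

```livecode
function shellMd5OfFile pFilePath
   local tPath, tOutput, tHash
   put pFilePath into tPath
   -- LiveCode paths use "/"; the Windows command line expects "\"
   replace "/" with "\" in tPath
   set the hideConsoleWindows to true
   put shell("certutil -hashfile" && quote & tPath & quote && "MD5") into tOutput
   -- drop any carriage returns left over from Windows line endings
   replace numToChar(13) with empty in tOutput
   -- assumption: line 1 is a header and line 2 holds the hash
   put line 2 of tOutput into tHash
   -- some Windows versions print the bytes separated by spaces
   replace space with empty in tHash
   -- anything that is not 32 hex digits is treated as a failure
   if the number of chars in tHash is not 32 then return empty
   return toLower(tHash)
end shellMd5OfFile
```

Because the external tool reads the file itself, the file never has to fit into LiveCode's memory.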
by Ledigimate » Thu Dec 13, 2018 12:23 am
Which OS are you using?
Any version of Windows from XP up to 10.
Many provide hashing functions that can be called from the command line via LC's shell function.
I have a command line utility from Microsoft that can do the job, but I just wanted to try the LC function first. If it could spare me some effort, it's worth a shot.
Will you be doing that manually? For one file, 10 files, 10,000 files?
I want to create a utility that runs off the root directory of a removable drive, recursively calculates the checksum of each file on the drive, and presents the results in a user-friendly format. I want to distribute it along with our software so the users can check for corrupted files on the installation media. It's about 15 files.
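The recursive part of such a utility could be sketched roughly as follows. This assumes a LiveCode version in which files() and folders() accept a folder argument (older versions require setting the defaultFolder instead); the function name listFilesRecursively is hypothetical.

```livecode
function listFilesRecursively pFolder
   local tList, tName
   -- avoid a doubled slash when starting from a drive root such as "E:/"
   if the last char of pFolder is "/" then delete the last char of pFolder
   -- files directly inside this folder
   repeat for each line tName in files(pFolder)
      put pFolder & "/" & tName & return after tList
   end repeat
   -- descend into subfolders, skipping the "." and ".." entries
   repeat for each line tName in folders(pFolder)
      if tName is "." or tName is ".." then next repeat
      put listFilesRecursively(pFolder & "/" & tName) after tList
   end repeat
   return tList
end listFilesRecursively
```

Each line of the returned list is a full file path that can then be passed to whichever checksum function is chosen.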
-
trevordevore
- VIP Livecode Opensource Backer
- Posts: 1005
- Joined: Sat Apr 08, 2006 3:06 pm
- Location: Overland Park, Kansas
-
Contact:
Post
by trevordevore » Thu Dec 13, 2018 1:38 am
I use the following function to get the md5 digest of a file. I think Mark Waddingham (LiveCode CTO) was the original author.
Code:
function MD5DigestOfFile pFile
   -----
   local CHUNK_SIZE, theMD5
   local theError, theResult
   -----
   ## This combination gave the best results in a very rough test on
   ## OS X 10.5 on an Intel iMac.
   ## Compared with [3|1|2] * 1024 * 1024
   ## Compared 1024 * [128|32|8]
   put 1024 * 128 into CHUNK_SIZE
   open file pFile for binary read
   put the result into theError
   if theError is empty then
      repeat
         read from file pFile for CHUNK_SIZE chars
         put the result into theResult
         ## Digest whatever was read before checking for EOF: the final read
         ## can return data together with EOF, and that last partial chunk
         ## must be included.
         if it is not empty then put the md5Digest of it after theMD5
         if theResult is EOF then exit repeat
         if theResult is not empty then
            put theResult into theError
            exit repeat
         end if
      end repeat
      close file pFile
   end if
   ## Note: this is a digest of the concatenated per-chunk digests, not the
   ## standard MD5 of the file contents.
   return the md5Digest of theMD5
end MD5DigestOfFile
Trevor DeVore
ScreenSteps - https://www.screensteps.com
LiveCode Repos - https://github.com/search?q=user%3Atrevordevore+topic:livecode
LiveCode Builder Repos - https://github.com/search?q=user%3Atrevordevore+topic:livecode-builder
by FourthWorld » Thu Dec 13, 2018 1:46 am
Thanks, Trevor. Is that chunksize and aggregating method the same as used by macOS's md5 command?
by trevordevore » Thu Dec 13, 2018 1:58 am
I don't know, Richard. I would guess not. I've only used it in situations where I ran the same function on all files.
by FourthWorld » Thu Dec 13, 2018 2:07 am
trevordevore wrote: ↑Thu Dec 13, 2018 1:58 am
I don't know, Richard. I would guess not. I've only used it in situations where I ran the same function on all files.
Thanks. That's why I was asking about usage above. Solutions are easy to come by when the producer and the consumer of the information are the same party. But if the scenario were to provide a checksum to others for a file being offered for download, then depending on the method used, any checksum we post may or may not bear any relationship to the algo they use to run a confirming checksum.
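To make that concrete, here is a small sketch comparing a single MD5 pass over some data with the chunked approach used in the function above, which digests the concatenated per-chunk digests. The handler name and the sample string are illustrative only. The two hex values are expected to differ, which is why a chunk-of-digests checksum only matches checksums produced by the same method.

```livecode
on demoChunkedVersusWhole
   local tData, tWholeHex, tChunkedHex
   put "abcdefgh" into tData
   -- one pass over all the data: what standard md5 tools report
   get binaryDecode("H*", md5Digest(tData), tWholeHex)
   -- digest of the concatenated per-chunk digests (two 4-char chunks here)
   get binaryDecode("H*", md5Digest(md5Digest(char 1 to 4 of tData) & md5Digest(char 5 to 8 of tData)), tChunkedHex)
   -- show both values in the message box for comparison
   put tWholeHex & return & tChunkedHex
end demoChunkedVersusWhole
```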
by Ledigimate » Thu Dec 13, 2018 9:30 am
After some more testing, I also discovered that the binarydecode function swaps the hex characters in the result when asked to decode binary data to hex data, e.g. it yields
9c0f8f59bf89ba19955ff10d92e732d6
instead of
c9f0f895fb98ab9159f51fd0297e236d
Why on earth would it do that? Or is the md5Digest function to blame?
by FourthWorld » Thu Dec 13, 2018 10:53 am
Ledigimate wrote: ↑Thu Dec 13, 2018 9:30 am
After some more testing, I also discovered that the binarydecode function swaps the hex characters in the result when asked to decode binary data to hex data, e.g. it yields
9c0f8f59bf89ba19955ff10d92e732d6
instead of
c9f0f895fb98ab9159f51fd0297e236d
Why on earth would it do that? Or is the md5Digest function to blame?
Double check that first argument. Options are provided for most data sizes in both byte orders.
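For reference, the two values quoted above contain the same bytes with the two hex digits of every byte swapped (9c versus c9, 0f versus f0, and so on). In binaryDecode, the "h" format emits the low nibble of each byte first, while "H" emits the high nibble first, which is the conventional way to write a checksum. A minimal sketch, with a hypothetical handler name and sample input:

```livecode
on demoHexNibbleOrder
   local tDigest, tLowFirst, tHighFirst
   put md5Digest("hello") into tDigest
   -- "h*": low nibble of each byte first (digit pairs appear swapped)
   get binaryDecode("h*", tDigest, tLowFirst)
   -- "H*": high nibble first, the usual checksum notation
   get binaryDecode("H*", tDigest, tHighFirst)
   -- show both decodings in the message box
   put tLowFirst & return & tHighFirst
end demoHexNibbleOrder
```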
by Ledigimate » Thu Dec 13, 2018 7:13 pm
Double check that first argument. Options are provided for most data sizes in both byte orders.
Thanks Richard, that was it. I couldn't make sense of the relevant dictionary entry from inside LC, so I looked it up online and even there I had to decipher the text which wasn't properly punctuated. But I figured it out.
Update: The problem where I get the same result on two different files occurs only when the file size exceeds 504 MB.
Code:
function fileMd5Digest pFilePath
   local tCheckSum
   local tData
   put url ("binfile:" & pFilePath) into tData
   get binarydecode("H*", md5digest(tData), tCheckSum)
   return tCheckSum
end fileMd5Digest
When the file size exceeds 504 MB, tData is empty. I don't know if this is due to a limitation in LiveCode, or due to insufficient memory.
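An empty tData would explain the identical checksums: md5digest is then hashing empty input, and the MD5 of empty input is always d41d8cd98f00b204e9800998ecf8427e, whatever the file. A hedged sketch of a guard follows. The function name is hypothetical, and it assumes that a failed read leaves a message in the result, which may not hold for every failure mode.

```livecode
function fileMd5DigestChecked pFilePath
   local tData, tCheckSum
   put url ("binfile:" & pFilePath) into tData
   -- assumption: a failed read reports an error message in the result
   if the result is not empty then return "error:" && the result
   -- empty data hashes to the same value for every file, so refuse it here
   -- (this also rejects files that really are zero bytes long)
   if tData is empty then return "error: no data read from" && pFilePath
   get binaryDecode("H*", md5Digest(tData), tCheckSum)
   return tCheckSum
end fileMd5DigestChecked
```

A caller can check whether the return value begins with "error:" before comparing checksums.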