function fileMd5Digest pFilePath
   local tCheckSum
   get binarydecode("h*", md5digest(url ("binfile:" & pFilePath)), tCheckSum)
   return tCheckSum
end fileMd5Digest
I just don't know how reliable it is for verifying the checksum of large files. How is it that it can compute it so quickly? It would almost seem like it doesn't actually read the entire file, and if it doesn't, how can this be reliable?
Regards
Gerrie
Last edited by Ledigimate on Sat Dec 15, 2018 8:55 am, edited 1 time in total.
I just tested the above code on two large files that differ by a single bit.
The result is disappointing.
It returns the same result for both files.
So now my question becomes, am I going about this the wrong way?
I never used this function before, but with a string of about 7000 chars in two variables, I get a different value if they differ by a single char, and the same value if they are in fact the same.
If you pass a file URL to the md5Digest function, does it always read the whole file?
If not, that might explain why I got the same value for two very large files that differ by only one bit.
I made a copy of a 3.04 GB file, changed only one bit using a raw disk editor, ran the function against both files, and it gave the same result.
Ledigimate wrote: Wed Dec 12, 2018 1:37 pm
I just don't know how reliable it is for verifying the checksum of large files. How is it that it can compute it so quickly? It would almost seem like it doesn't actually read the entire file, and if it doesn't, how can this be reliable?
How large is "large"?
Once generated, how is the md5 value being used?
Richard Gaskin LiveCode development, training, and consulting services: Fourth World Systems
LiveCode Group on Facebook
LiveCode Group on LinkedIn
Files that are too large to be loaded entirely into RAM, I guess.
Which OS are you using? Many provide hashing functions that can be called from the command line via LC's shell function.
Once generated, how is the md5 value being used?
I would like to use the md5 value to verify the integrity of a copied file.
Will you be doing that manually? For one file, 10 files, 10,000 files? Why MD5 as opposed to more recent algos? Is this all on your local hard drive, or by "copy" do you mean "download"?
Many provide hashing functions that can be called from the command line via LC's shell function.
I have a command line utility from Microsoft that can do the job, but I just wanted to try the LC function first. If it could spare me some effort, it's worth a shot.
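For reference, the shell route could look roughly like the sketch below. It assumes Windows with `certutil` available on the PATH (not necessarily the Microsoft utility mentioned above), and `shellMd5OfFile` is a made-up name. The parsing assumes the digest sits on the second line of certutil's output, which is worth confirming on the Windows versions you target.

```livecode
function shellMd5OfFile pFilePath
   local tOutput, tHash
   -- avoid flashing a console window while the command runs
   set the hideConsoleWindows to true
   -- LiveCode paths use "/", so convert to backslashes for the command line
   replace "/" with "\" in pFilePath
   put shell("certutil -hashfile" && quote & pFilePath & quote && "MD5") into tOutput
   -- assumption: line 1 is a header and line 2 holds the digest
   put line 2 of tOutput into tHash
   -- some Windows versions print the digest as space-separated byte pairs
   replace space with empty in tHash
   -- strip a stray carriage return, if any
   replace numToChar(13) with empty in tHash
   return toLower(tHash)
end shellMd5OfFile
```

Because the hashing happens in a separate process, the file never has to fit into LiveCode's memory.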
Will you be doing that manually? For one file, 10 files, 10,000 files?
I want to create a utility that runs off the root directory of a removable drive and recursively calculates the checksum values of each file on the drive, and presents the results in a user friendly format. I want to distribute it along with our software so the users can check for corrupted files on the installation media. It's about 15 files.
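A minimal sketch of the recursive listing part, assuming a LiveCode version in which `files()` and `folders()` accept a folder path as a parameter (8.1 or later, as far as I know). The handler name `listFilesRecursive` is hypothetical.

```livecode
function listFilesRecursive pFolder
   local tList, tName, tSubList
   -- normalise a trailing slash such as "E:/" so paths don't get "//"
   if the last char of pFolder is "/" then delete the last char of pFolder
   -- files directly inside this folder
   repeat for each line tName in files(pFolder)
      put pFolder & "/" & tName & return after tList
   end repeat
   -- descend into subfolders, skipping the "." and ".." entries
   repeat for each line tName in folders(pFolder)
      if tName is "." or tName is ".." then next repeat
      put listFilesRecursive(pFolder & "/" & tName) into tSubList
      if tSubList is not empty then put tSubList & return after tList
   end repeat
   -- drop the trailing line break
   if tList is not empty then delete the last char of tList
   return tList
end listFilesRecursive
```

Each line of the returned list is a full path that could then be passed to whichever checksum function ends up being used.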
function MD5DigestOfFile pFile
   -----
   local CHUNK_SIZE, theMD5
   local theError, theReadResult
   -----
   ## This combination gave the best results in a very rough test on
   ## OS X 10.5 on an Intel iMac.
   ## Compared with [3|1|2] * 1024 * 1024
   ## Compared 1024 * [128|32|8]
   put 1024 * 128 into CHUNK_SIZE
   open file pFile for binary read
   put the result into theError
   if theError is empty then
      repeat
         read from file pFile for CHUNK_SIZE chars
         put the result into theReadResult
         ## Hash whatever was read, including the final partial chunk at EOF.
         if it is not empty then put the md5Digest of it after theMD5
         if theReadResult is not empty then
            if theReadResult is not EOF then put theReadResult into theError
            exit repeat
         end if
      end repeat
      close file pFile
   end if
   ## Note: this is a digest of the concatenated per-chunk digests,
   ## not the standard MD5 of the whole file.
   return the md5Digest of theMD5
end MD5DigestOfFile
trevordevore wrote: Thu Dec 13, 2018 1:58 am
I don't know Richard. I would guess not. I've only used it in situations where I ran the same function on all files.
Thanks. That's why I was asking about usage above. Solutions are easy to come by when the producer and the consumer of the information are the same party. But if the scenario were to provide a checksum to others for a file being offered for download, then depending on the method used, any checksum we post may or may not bear any relationship to the algo they use to run a confirming checksum.
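To illustrate the point with a small sketch: hashing the data in pieces and then hashing the concatenated digests does not give the same value as a standard MD5 over the whole data. The handler name and the sample string below are made up for the demonstration.

```livecode
on demoChunkedVersusWhole
   local tData, tWhole, tChunked, tWholeHex, tChunkedHex, tCount
   put "abcdefgh" into tData
   -- standard MD5 over all of the data
   put md5Digest(tData) into tWhole
   -- digest of the concatenated per-chunk digests (two 4-byte chunks)
   put md5Digest(md5Digest(byte 1 to 4 of tData) & md5Digest(byte 5 to 8 of tData)) into tChunked
   -- convert both binary digests to hex, high nibble first
   put binaryDecode("H*", tWhole, tWholeHex) into tCount
   put binaryDecode("H*", tChunked, tChunkedHex) into tCount
   -- show both values in the message box; they should not match
   put "whole:" && tWholeHex & return & "chunked:" && tChunkedHex
end demoChunkedVersusWhole
```

A chunked digest of this kind is fine when the same function produces and verifies the value, but it will not match what md5sum, certutil, or similar tools report for the same file.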
After some more testing, I also discovered that the binarydecode function swaps the hex characters in the result when asked to decode binary data to hex data, e.g. it yields
9c0f8f59bf89ba19955ff10d92e732d6
instead of
c9f0f895fb98ab9159f51fd0297e236d
Why on earth would it do that? Or is the md5Digest function to blame?
Double check that first argument. Options are provided for most data sizes in both byte orders.
Thanks Richard, that was it. I couldn't make sense of the relevant dictionary entry from inside LC, so I looked it up online and even there I had to decipher the text which wasn't properly punctuated. But I figured it out.
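For anyone else who trips over this, a small sketch of the difference: in the binaryDecode format string, "h*" emits each byte's low nibble first, while "H*" emits the high nibble first, which is the order checksum tools normally display. The handler name is made up.

```livecode
on demoHexNibbleOrder
   local tDigest, tLowFirst, tHighFirst, tCount
   put md5Digest("abc") into tDigest
   -- "h*": low nibble first, so each pair of hex characters comes out swapped
   put binaryDecode("h*", tDigest, tLowFirst) into tCount
   -- "H*": high nibble first, the conventional way to print a digest
   put binaryDecode("H*", tDigest, tHighFirst) into tCount
   -- show both strings in the message box for comparison
   put "h*:" && tLowFirst & return & "H*:" && tHighFirst
end demoHexNibbleOrder
```

Comparing the two lines character by character shows the same swapped pairs as in the values quoted above (9c versus c9, and so on).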
Update: The problem where I get the same result for two different files occurs only when the file size exceeds 504 MB.
function fileMd5Digest pFilePath
   local tCheckSum
   local tData
   put url ("binfile:" & pFilePath) into tData
   get binarydecode("H*", md5digest(tData), tCheckSum)
   return tCheckSum
end fileMd5Digest
When the file size exceeds 504 MB, tData is empty. I don't know if this is due to a limitation in LiveCode, or due to insufficient memory.
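If the data comes back empty, the function ends up hashing an empty string. The MD5 of empty input is always d41d8cd98f00b204e9800998ecf8427e, which would be consistent with two different large files producing the same value. A minimal sketch of a guard follows. It assumes that a failed binfile read leaves an error string in the result, and `fileMd5DigestChecked` is a hypothetical name.

```livecode
function fileMd5DigestChecked pFilePath
   local tData, tCheckSum, tCount
   put url ("binfile:" & pFilePath) into tData
   -- assumption: a failed read leaves an error string in the result
   if the result is not empty then return "error:" && the result
   -- nothing loaded: either the file is empty or the read failed silently
   if tData is empty then return "error: no data read"
   put binaryDecode("H*", md5Digest(tData), tCheckSum) into tCount
   return tCheckSum
end fileMd5DigestChecked
```

This treats a genuinely empty file as an error too, so a caller that expects zero-length files would need to check the file size separately.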