Save data using URL command mangles bit values

Simon Knight
Posts: 854
Joined: Wed Nov 04, 2009 11:41 am
Location: Gunthorpe, North Lincs, UK

Post by Simon Knight » Sat May 21, 2022 8:40 am

Hi,

I have some code that attempts to create a valid XMP sidecar file. These files are used to store metadata about images and other media files. An XMP file is a form of XML, so it is mostly text. However, there is an exception in the first line, which includes three high-value characters.
<?xpacket begin='' id='W5M0MpCehiHzreSzNTczkc9d'?>
The three characters are between the two single quotes following begin=. This header is common to all XMP files, so my code stores the header in a variable. I have confirmed that the three characters are in the stream of bytes and that they are EF BB BF. My code uses the URL command to write the final file, and this strips the characters, replacing them with a question mark.

Code: Select all

put tXMPhead & cr & tXMPKeywords & tXMPTail into tXMPData
put tXMPdata into URL ("binfile:" & pRawFileDetailsA[tKey]["XMPfilePath"])
I have confirmed that tXMPhead contains the characters but when written to file they get replaced.

I have also tried the following code :

Code: Select all

put pRawFileDetailsA[tKey]["XMPfilePath"] into tFilePath
open file tFilePath for binary write
write tXMPdata to file tFilePath
close file tFilePath
This also fails.

The results of the LiveCode file operations:
[Screenshot: character decode of the LiveCode-created file]
Hex values:
[Screenshot: hex decode of the LiveCode-created file]

The screen shots below show portions of the variable that is being saved into a file. They are taken from a hex editor. I placed a break point in my code and copied the contents of the variable into a text file and then opened the text file in the hex editor.
[Screenshot: character decode of the variable]

best wishes

Simon

Post by Simon Knight » Sat May 21, 2022 8:41 am

Here is the hex value of the variable described above. (The forum would not allow me to post a fourth image.)
[Screenshot: hex of the variable]

LCMark
Livecode Staff Member
Posts: 1208
Joined: Thu Apr 11, 2013 11:27 am

Post by LCMark » Sat May 21, 2022 9:15 am

How is tXMPhead being constructed? It looks like you are mixing text and binary data. Since you are constructing a binary file, you need to make sure that all parts are binary to stop the engine applying default conversions from text to binary (in particular 'invalid' chars will map to ?).

If you do:

Code: Select all

put "<?xpacket begin='" & numToByte(0xEF) & numToByte(0xBB) & numToByte(0xBF) & "' id='W5M0MpCehiHzreSzNTczkc9d'?>" into tXMPhead
I suspect the problem will go away.

Post by Simon Knight » Sat May 21, 2022 9:58 am

Yes it works - thanks.

What I find confusing is why or how using numToByte stops the engine from stripping these values. I can understand that the engine looks at a stream of bytes and checks whether they are within the normal printable ASCII range, meaning that it sees the strange non-character values and strips them out. But I don't understand how numToByte is working. Is it adding a special control byte to tell the engine to pass the values on into the file? Obviously I have no need to know how it works, but it would be good to have some greater understanding.

Simon

Post by LCMark » Sat May 21, 2022 1:20 pm

So I can't say for sure what was going on in your case as I don't know what your original code was doing (I've tried to reproduce the effect you described and cannot).

Prior to 7 there was no difference between binary data and text - they could be the same because text was only ever single-byte values interpreted relative to the native text encoding (e.g. MacRoman on macOS and Windows-1252 on Windows). When we moved to 7, text became (from script's point of view) a sequence of characters (relative to Unicode) - so the 1-1 mapping between text and binary data (via the native encoding) no longer existed, however we still needed to keep compatibility with existing code.

Internally the engine has a separate datatype for binary data (data) vs text (string). Byte operations generate data, text operations generate strings, and operations which make sense on both will convert from data to string unless all operands are data.
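
A minimal sketch of that rule, assuming LiveCode 8 or later for the `is strictly` operator. The handler name `showDataVsString` and the variable names are made up for illustration. It builds the three bytes with numToByte, concatenates them with a text literal, and reports which internal type each value has.

```livecode
on showDataVsString
   local tBytes, tMixed
   -- Byte operations only: every operand is data, so the result should stay data.
   put numToByte(0xEF) & numToByte(0xBB) & numToByte(0xBF) into tBytes
   -- Mixing data with a text literal: one operand is a string.
   put "begin='" & tBytes & "'" into tMixed
   -- Report the internal type of each value in the message box.
   put "tBytes is binary data:" && (tBytes is strictly a binary string) & cr & \
         "tMixed is binary data:" && (tMixed is strictly a binary string)
end showDataVsString
```

Calling `showDataVsString` from the message box prints one line per variable, which makes it easy to see where a value stops being data.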

Converting from data to string is done via the native text encoding (as it always implicitly was - even though the engine had to do nothing to achieve this prior to 7) - similarly, when you use a string in a context expecting data, the engine will convert to the native text encoding.

So if you have a string and put it into a binfile, the first thing the engine does is convert the string to binary data by mapping each character to the matching character in the native text encoding - if the character cannot be represented there, then it is replaced with ?.
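
To see that conversion in isolation, here is a small sketch (the handler name `demoNativeMapping` is hypothetical). It writes a string containing U+FEFF, which has no equivalent in MacRoman or Windows-1252, to a temporary binfile and then dumps the bytes that actually reached the disk as hex. If the mapping described above applies, the position of that character should show 3F, the code for ?.

```livecode
on demoNativeMapping
   local tText, tPath, tBytes, tHex
   -- A string (not data) holding one character outside the native encoding.
   put "begin='" & numToCodepoint(0xFEFF) & "'" into tText
   put the tempName into tPath
   -- The string is converted to binary data via the native encoding here.
   put tText into URL ("binfile:" & tPath)
   -- Read the raw bytes back and turn them into a hex string.
   put URL ("binfile:" & tPath) into tBytes
   get binaryDecode("H*", tBytes, tHex)
   delete file tPath
   put tHex
end demoNativeMapping
```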

In this case, it looks to me like you actually managed to have U+FFFD as a character in your tXMPhead variable (rather than the three characters you expected) - this isn't present in the native encoding so maps to ?.
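
One way to check what a variable really holds, without round-tripping it through a text file and a hex editor, is to list its code points from script. A sketch, with `codepointList` as a hypothetical helper name:

```livecode
function codepointList pText
   local tList, tIndex
   -- Walk the string one Unicode code point at a time.
   repeat with tIndex = 1 to the number of codepoints in pText
      -- baseConvert turns the decimal code point value into hex digits.
      put "U+" & baseConvert(codepointToNum(codepoint tIndex of pText), 10, 16) & space after tList
   end repeat
   return tList
end codepointList
```

For example, `put codepointList(tXMPhead)` at a breakpoint. A single FFFD entry between the quotes would point to the replacement character, while three separate entries would point to three native characters.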

If you can share your original (not working) code I can probably work out precisely what was going on in your case though.

Post by Simon Knight » Sat May 21, 2022 1:53 pm

Mark,
Thanks for taking the time to write your response and to offer to look at my stack.

I have created an archive file that includes an image file, which means that it is too large to fit in the forum. If all is well it should be available on this link https://www.dropbox.com/s/pjdi0l9gys162 ... k.zip?dl=0

The handler of interest is in the stack and named "UpdateCreateXMPSidecars"

This reads a custom property of the stack which holds the boilerplate "text" copied from a valid XMP file.

I have disabled all but the button of interest. Pressing the button "Add Filename -tags- to xmp as keywords" will prompt for a folder of images and add the -tagwords- in each filename to an XMP sidecar file.

I hope that all makes some sense.

Simon
[Attachment: Stacks.zip (stack files minus the example DNG image file; main stack is named SideCarSync...)]

Post by LCMark » Sat May 21, 2022 2:19 pm

So `the XMLHead` custom property of your main stack contains a string which contains U+FFFD (essentially the 'unknown character' unicode character) at the place you were expecting three bytes. So why that is there, rather than what you expected, depends on how you set that custom property.

It's worth pointing out that the 3-byte sequence you were expecting (EF BB BF) is the UTF-8 encoding of U+FEFF, the byte order mark (U+FFFD itself encodes as EF BF BD) - so an alternative way (and morally correct way!) to do this is to have U+FEFF at that position in the text and then textEncode as UTF-8 before putting it into the binfile URL.
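
A sketch of that textEncode route, under some assumptions: the handler name `writeXMPSidecar` and its parameters are hypothetical, and the header is rebuilt in script rather than read from the custom property. The code point placed between the quotes is U+FEFF, because that is the character whose UTF-8 form is EF BB BF.

```livecode
on writeXMPSidecar pFilePath, pKeywordsXML, pTailXML
   local tHead, tText
   -- Keep everything as text; the marker is a real Unicode character here.
   put "<?xpacket begin='" & numToCodepoint(0xFEFF) & "' id='W5M0MpCehiHzreSzNTczkc9d'?>" into tHead
   put tHead & cr & pKeywordsXML & pTailXML into tText
   -- Encode explicitly, so the engine never falls back to the native encoding.
   put textEncode(tText, "UTF-8") into URL ("binfile:" & pFilePath)
end writeXMPSidecar
```

Because the whole document is encoded in one step, any non-ASCII characters in the keywords are written as UTF-8 too, rather than being mapped through the native encoding.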
