text copied from LC generated PDF, WTF?


Klaus
Posts: 11752
Joined: Sat Apr 08, 2006 8:41 am
Location: Germany

Re: text copied from LC generated PDF, WTF?

Post by Klaus » Thu Feb 20, 2020 12:19 pm

Hi AxWald,

OK, got it. Here is the PDF, good luck.
No one on the mailing list thought this might be a bug!? :?


Best

Klaus
Attachments
drucken.pdf.zip
(17.22 KiB) Downloaded 32 times

LCMark
Livecode Staff Member
Posts: 1087
Joined: Thu Apr 11, 2013 11:27 am

Re: text copied from LC generated PDF, WTF?

Post by LCMark » Thu Feb 20, 2020 1:55 pm

No one on the mailing list thought this might be a bug!? :?
Well someone (i.e. me) said it wasn't a bug on the forums - I can say it isn't a bug on the mailing list too if that helps ;)

As I said above, try explicit fonts (e.g. Courier, Arial, Helvetica) and see if things improve.

The font in that PDF is '.SFNSText' which is some macOS internal thing tied to the default system label font - it seems numeric glyphs aren't mapping back to numbers when using that (the other text is fine).
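
A toy sketch of how that kind of failure could look, for anyone curious. This is not LiveCode's PDF code or any real viewer's; the glyph IDs, the two font tables and the function name are invented for illustration. Real PDFs can carry a per-font ToUnicode table for this purpose, and the assumption here is simply that the system font's table has no entries for the digit glyphs.

```python
# Toy model only: glyph IDs and both font tables are made up for illustration.
REPLACEMENT = "\ufffd"  # stands in for "no character known for this glyph"


def extract_text(glyph_ids, to_unicode):
    """Map each glyph ID back to a character, or REPLACEMENT if unmapped."""
    return "".join(to_unicode.get(gid, REPLACEMENT) for gid in glyph_ids)


# Hypothetical explicit font: every glyph used has a character mapping.
explicit_font = {1: "0", 2: "5", 3: ".", 4: "1", 5: "2"}

# Hypothetical system font: the digit glyphs have no mapping, only "." does.
system_font = {3: "."}

# The glyph sequence a PDF might draw for the date "05.01.20".
glyphs_for_date = [1, 2, 3, 1, 4, 3, 5, 1]

print(extract_text(glyphs_for_date, explicit_font))
print(extract_text(glyphs_for_date, system_font))
```

With the first table the copied text comes back intact; with the second, only the dots survive, which is roughly the symptom described above.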

Klaus
Posts: 11752
Joined: Sat Apr 08, 2006 8:41 am
Location: Germany

Re: text copied from LC generated PDF, WTF?

Post by Klaus » Thu Feb 20, 2020 2:21 pm

Hi Mark,
LCMark wrote:
Thu Feb 20, 2020 1:55 pm
No one on the mailing list thought this might be a bug!? :?
Well someone (i.e. me) said it wasn't a bug on the forums - I can say it isn't a bug on the mailing list too if that helps ;)
yes, thank you, very helpful! :-D
LCMark wrote:
Thu Feb 20, 2020 1:55 pm
As I said above, try explicit fonts (e.g. Courier, Arial, Helvetica) and see if things improve.
Again, I don't really care for myself, it is a matter of principle!
But changing the font to "Lucida Grande" solved the mystery. Copying the first column (05.01.20) and pasting it into TextEdit, I now get:
05.01.20


Yes, and two empty lines, but we are at least used to THIS behaviour when copying text from PDF files. :-D
LCMark wrote:
Thu Feb 20, 2020 1:55 pm
The font in that PDF is '.SFNSText' which is some macOS internal thing tied to the default system label font - it seems numeric glyphs aren't mapping back to numbers when using that (the other text is fine).
Yes, the field's font was (pre-)set to (Text), so one would not expect that phenomenon.


Best

Klaus

FourthWorld
VIP Livecode Opensource Backer
Posts: 7653
Joined: Sat Apr 08, 2006 7:05 am
Location: Los Angeles

Re: text copied from LC generated PDF, WTF?

Post by FourthWorld » Thu Feb 20, 2020 5:06 pm

bogs wrote:
Wed Feb 19, 2020 10:20 am
LCMark wrote:
Wed Feb 19, 2020 8:52 am
P.S. Cross-posting to the forums and mailing-list is a little irksome and not very good practice. Where is someone meant to reply if they are on both?
I have to admit, I'm not sure why this would be 'irksome' or 'not a good practice' myself.
The problem is a larger form of the issue raised by creating duplicate threads for a given topic in these forums: it introduces ambiguity for readers about where they should reply, and raises the likelihood that future readers will encounter an incomplete version of the discussion.

When an authoritative answer becomes available (hard to find a better authority on the inner workings of LC than the lead engineer), the only way to make sure future readers find the right answer is to duplicate the answer in each venue where the question was duplicated.

Had Mark only replied to the list, those readers would have an adequate understanding of the nature of the problem, while readers here would believe the issue is somehow a bug in LC.

A better approach to covering bases without replicating the discussion itself might be to post the question here, and on the list merely describe the issue with a link to this forum thread, and encourage replies to go here.
Richard Gaskin
Community volunteer LiveCode Community Liaison

LiveCode development, training, and consulting services: Fourth World Systems: http://FourthWorld.com
LiveCode User Group on Facebook : http://FaceBook.com/groups/LiveCodeUsers/

FourthWorld
VIP Livecode Opensource Backer
Posts: 7653
Joined: Sat Apr 08, 2006 7:05 am
Location: Los Angeles

Re: text copied from LC generated PDF, WTF?

Post by FourthWorld » Thu Feb 20, 2020 5:31 pm

Klaus wrote:
Wed Feb 19, 2020 10:56 am
I do not want to convert the PDF to text as I wrote...
Ah, but you do: the goal is to translate the hodgepodge of vector imaging instructions that is PDF into a string, for at least the portion the user has copied from the document.
As always, I am just trying to put myself in the position of a newbie!
They will consider this a bug in the software that generated the PDF, AND they will always use what they already got with the OS, like in my case.
Supporting user expectations is a noble and worthy goal all earnest developers aspire to. But when an expectation is unmeetable, the problem can be less about managing the technology than about managing expectations.

For example, if a user wants to save a file to a specific volume they've unmounted, as much as we'd like to satisfy the expectation we understand that doing so isn't possible. So instead we have to provide guidance to manage the expectation, to remind them that even though the volume may still be plugged into the computer, if they've unmounted it, it's no longer available and will need to be remounted. Yes, it's technical gobbledygook that we'd rather not introduce into our conversations with users, but from time to time in computing, technical limitations are unavoidable and must be explained.

In an ideal world, PDF usage would be limited to the one use case it was designed for: conveyance of a document destined for a printer.

For all other use cases where reading the document on printed paper is not the goal, PDF is a subpar experience, ever more so as screen sizes only continue to diversify. Most of those would be better handled with ePub, which is not only designed to reflow document content to fit the reader's screen but also uses actual text which can be reliably copied.

But alas, we don't live in an ideal world. Thanks to a series of historical accidents, we live in a world where PDF is used in far more cases than it was either designed for or is even the better choice, and until OS vendors modernize to make "Print to ePub" as ubiquitous as the less useful "Print to PDF" they have now, the widespread misuse of PDF can be expected to continue.

As app devs, we're unlikely to shift the tide convincingly in a more sane direction. So we're left with the most challenging user experience task of all: managing expectations.

dunbarx
VIP Livecode Opensource Backer
Posts: 6629
Joined: Wed May 06, 2009 2:28 pm
Location: New York, NY

Re: text copied from LC generated PDF, WTF?

Post by dunbarx » Thu Feb 20, 2020 7:25 pm

There is a PDF to ePub converter, but it is a web-based gadget:
https://ebook.online-convert.com/convert-to-epub

I have never needed to read a PDF in my LC world. This even though LC makes PDFs every day here, which are edited after the fact and then sent along to whomever.

Even if the above converter was a resident application, would it be straightforward to use? Do I assume AppleScript at the least?

Craig

FourthWorld
VIP Livecode Opensource Backer
Posts: 7653
Joined: Sat Apr 08, 2006 7:05 am
Location: Los Angeles

Re: text copied from LC generated PDF, WTF?

Post by FourthWorld » Thu Feb 20, 2020 7:33 pm

dunbarx wrote:
Thu Feb 20, 2020 7:25 pm
Even if the above converter was a resident application, would it be straightforward to use?
Unlikely to solve the problem Klaus is experiencing, given the nature of PDF itself.

From Mark Waddingham's post earlier:
PDFs don't actually contain the source text - all PDF viewers have to reverse engineer the text content by looking at the glyphs, fonts, and their locations, then reverse mapping those details back into a linear form of text. All PDF viewers do this differently and it isn't 100% accurate (most do it quite badly actually, for more than simple western-language paragraphs!). The text layout and fonts used can have a great effect on the efficacy of the process.
https://forums.livecode.com/viewtopic.p ... 83#p188189
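
To make the "reverse mapping back into a linear form" part a bit more concrete, here is a deliberately simplified sketch. It is not how any particular viewer works: the PlacedGlyph type, the rebuild_lines function and the line_tolerance value are all invented for illustration, and it assumes each glyph has already been resolved to a character and only its position is in question.

```python
from typing import NamedTuple


class PlacedGlyph(NamedTuple):
    """A character plus where it was drawn (hypothetical structure)."""
    char: str
    x: float
    y: float  # PDF-style coordinates: larger y is higher on the page


def rebuild_lines(glyphs, line_tolerance=2.0):
    """Group glyphs with similar y into lines, then read each line left to right."""
    lines = []  # each entry: (y of the line's first glyph, glyphs on that line)
    for g in sorted(glyphs, key=lambda g: -g.y):  # top of page first
        if lines and abs(lines[-1][0] - g.y) <= line_tolerance:
            lines[-1][1].append(g)
        else:
            lines.append((g.y, [g]))
    return [
        "".join(p.char for p in sorted(row, key=lambda p: p.x))
        for _, row in lines
    ]


# Glyphs may be drawn in any order; only their positions imply reading order.
page = [
    PlacedGlyph("i", 10, 100),
    PlacedGlyph("H", 0, 100),
    PlacedGlyph("k", 10, 80.5),
    PlacedGlyph("O", 0, 80),
]
print(rebuild_lines(page))
```

Even this toy version has to guess at a tolerance for what counts as "the same line", which hints at why columns, tables and tightly spaced layouts tend to copy out differently from one viewer to the next.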

SparkOut
Posts: 2167
Joined: Sun Sep 23, 2007 4:58 pm

Re: text copied from LC generated PDF, WTF?

Post by SparkOut » Thu Feb 20, 2020 9:30 pm

FourthWorld wrote:
Thu Feb 20, 2020 5:06 pm
A better approach to covering bases without replicating the discussion itself might be to post the question here, and on the list merely describe the issue with a link to this forum thread, and encourage replies to go here.
Oh yes! That would be brilliant. (If only...)

okk
Posts: 125
Joined: Wed Feb 04, 2015 11:37 am

Re: text copied from LC generated PDF, WTF?

Post by okk » Wed Feb 26, 2020 11:25 am

Hi, I can't help much with the original question posted by Klaus but wanted to chime in on the second thread that emerged here: I am a bit confused about the mailing list and the forum running in parallel. Is there some logic to when to post to the mailing list and when to the forum? I find the forum more user-friendly, but I noticed that the mailing list is quite active and that there is not much cross-referencing between the two.

I second Richard's suggestion 100%.

Best
Oliver

bogs
Posts: 4620
Joined: Sat Feb 25, 2017 10:45 pm

Re: text copied from LC generated PDF, WTF?

Post by bogs » Wed Feb 26, 2020 12:25 pm

The mailing list is far older than the forums. Almost everyone who was in this way back when was signed up to it, and became used to its (pick one: format, method of reading, method of responding, etc.).

The forums showed up later, date unknown. People who were new to LC were encouraged to sign up for them. Most people newer to computers in general are used to the (pick one: format, method of reading, method of responding, etc.) of forums.

The older users don't cross over much, as you've mentioned. The newer members don't cross over much either. Some members get tired of having to enter email addresses and sign-ins for one product and feel that one entry should stand everywhere for that product.

For instance, if you sign up for the 4 major parts LC asks for, you'd:
provide email and password to register initially...
provide email and password to join forum ...
provide email and password to join Quality DB ...
provide email and password to join Use list ...

Then you'd have to keep track of each if you so desire. I have an RSS reader which sets up the forum posts for me, but I get more than enough emails from everywhere else without also having to go in and check use-list responses. I'm sure you see where that is going.

FourthWorld
VIP Livecode Opensource Backer
Posts: 7653
Joined: Sat Apr 08, 2006 7:05 am
Location: Los Angeles

Re: text copied from LC generated PDF, WTF?

Post by FourthWorld » Wed Feb 26, 2020 5:22 pm

bogs wrote:
Wed Feb 26, 2020 12:25 pm
For instance, if you sign up for the 4 major parts LC asks for, you'd:
provide email and password to register initially...
provide email and password to join forum ...
provide email and password to join Quality DB ...
provide email and password to join Use list ...

Then you'd have to keep track of each if you so desire. I have an RSS reader which sets up the forum posts for me, but I get more than enough emails from everywhere else without also having to go in and check use-list responses. I'm sure you see where that is going.
And then there's:
GitHub: https://github.com/livecode/livecode
Stack Overflow: https://stackoverflow.com/questions/tagged/livecode
LinkedIn: https://www.linkedin.com/groups/50811/
Facebook: https://www.facebook.com/groups/livecodeusers/
And there are more, including several language-specific forums for German, Italian, Spanish, and others.

This is simply how communities organically grow when they have the good fortune of being around a while.

Use what works for you.