Returns in a column

Creating desktop or client-server database solutions?

Moderators: FourthWorld, heatherlaine, Klaus, kevinmiller, robinmiller

GregWills
Posts: 69
Joined: Sun Aug 30, 2015 7:51 am

Returns in a column

Post by GregWills » Sat Jul 13, 2019 5:19 am

Hi Everyone

A question regarding a .csv file. I have a spreadsheet with a number of single word columns. One column however is a ‘Notes’ column with text and returns.

In Excel, the ‘Notes’ data is displayed and contained within the column. When this is imported into LiveCode each return creates a new line. So instead of 20 entries, there are 32, for example.

I have tried different delimiters for each item, however the return continue to do what it thinks it should do!!

Is there a way to ‘contain’ the text and return data in one item in an import?

Thanks

Greg

bogs
Posts: 5435
Joined: Sat Feb 25, 2017 10:45 pm

Re: Returns in a column

Post by bogs » Sat Jul 13, 2019 11:10 am

Heya Greg,

What is actually separating the columns? commas, tabs, some other symbol?
Image

Klaus
Posts: 13793
Joined: Sat Apr 08, 2006 8:41 am
Location: Germany
Contact:

Re: Returns in a column

Post by Klaus » Sat Jul 13, 2019 11:28 am

Hi Greg,

that is really a pain in the ass! :(

Unfortunately there is no easy solution for this problem, but I remebered this link, which give you a working function for this incinvenience: https://www.fourthworld.com/embassy/art ... t-die.html

I really dig the headline! :D


Best

Klaus

bogs
Posts: 5435
Joined: Sat Feb 25, 2017 10:45 pm

Re: Returns in a column

Post by bogs » Sat Jul 13, 2019 11:59 am

LOL ! I remember when I read that, and it is still a good read today :D

I still think using null characters is the way to go with making a csv. After all, they can't be typed in on a keyboard, and most people would have no idea how to create one.

You might also want to look at this thread, and in particular this post by Thierry in particular for another take on it.
Image

AxWald
Posts: 578
Joined: Thu Mar 06, 2014 2:57 pm

Re: Returns in a column

Post by AxWald » Sun Jul 14, 2019 1:39 pm

Hi,
abusing a spreadsheet for layout or database tasks is the first step toward the swamps of sorrow ...
Additionally, Excel (at least my 2003 here) has poor support for .csv (Character Separated Values).

Still you're not lost:
  1. When saving as .csv (comma/semicolon separated) or .txt (tab separated) Excel will quote fields that contain 'dangerous' chars.
    So your task is to repeat through the text until you find [delimiter& quote], from there you're "in a field", until you find the closing [quote & delimiter]. And so on.
    Not elegant, but rather fast.
    .
  2. Or you read your Excel .csv into Access, and export it from there - in Access you can specify the separator chars. (But Access will replace your CR with empty)
    .
  3. Or you could use ODBC to connect to your Excel sheet, and use it with the revdb functions. With similar problems, I assume.
I'd use solution #1, writing a quick parser:

2 modes, switched whenever you encounter [field delimiter & quote] (= In Field) and
[quote & field/record delimiter] (= Out of Field).
When "in Field", change the dangerous chars (that are delimiters outside of it) to something innocent (•¶Ÿ¥¤¦¬ ...).
Quick & dirty code example (not really tested!):

Code: Select all

   put ";" into myFD      --  this is what Excel gives me as FD
   put CR into myRD
   put "•" into myReplFD  --  new field delimiter in text field
   put "¬" into myReplCR  --  new CR in text field
   put "¤" into myReplQuote   --  temp. char for unused quotes, to kill later
   put URL "file:D:\_Development\Mappe1.csv" into myData
   put false into myInFld
   repeat with i = 1 to len(myData)
      put char i of myData into thisChar
      if not myInFld then                                             --  we're outside a quoted field:
         if (thisChar = myFD) and (char i+1 of myData = quote) then   --  and found the beginning of a quote
            put true into myInFld                                     --  switch mode
            put myReplQuote into char i+1 of myData                   --  prepare quote for killing
         end if
      else                                                            --  we're inside a quoted field:
         if (thisChar = quote) and \
               ((char i+1 of myData = myFD) or (char i+1 of myData = myRD)) then   --  and found the end of a quote
            put false into myInFld                                    --  switch mode again
            put myReplQuote into char i of myData                     --  prepare quote for killing
         else
            if (thisChar = myFD) then put myReplFD into char i of myData   --  replace FD inside text fld
            if (thisChar = myRD) then put myReplCR into char i of myData   --  replace RD inside text fld
         end if
      end if
   end repeat
   replace myReplQuote with empty in myData                           --  kill unused quotes
   put myData
Here I only changed hazardous characters inside the text fields, and removed the quotes Excel used to designate these. Feel free to change these again when inserting the data in the appropriate fields of your GUI.

Notice that ";" and CR (the Excel delimiters), are now only used as such anymore, and can be quickly replaced to anything more desirable, at the end of the script.

When it comes to ASCII delimiters, look here, I usually suggest ASCII 28 - 31, as these are designated as such! But the ones I used work well, too (and are visible ;-) ).

Have fun!
All code published by me here was created with Community Editions of LC (thus is GPLv3).
If you use it in closed source projects, or for the Apple AppStore, or with XCode
you'll violate some license terms - read your relevant EULAs & Licenses!

richmond62
Livecode Opensource Backer
Livecode Opensource Backer
Posts: 9249
Joined: Fri Feb 19, 2010 10:17 am
Location: Bulgaria

Re: Returns in a column

Post by richmond62 » Sun Jul 14, 2019 1:49 pm

Just run through you "guff" as it is imported and replace the carriage returns with something else . . .

GregWills
Posts: 69
Joined: Sun Aug 30, 2015 7:51 am

Re: Returns in a column

Post by GregWills » Tue Jul 16, 2019 8:05 am

Thank you all for your replies. I have been wandering around in this 'swamp of sorrow' :-) for some time now!

I will continue to see if I can get out, but I need some more assistance, before I go under :-(

AxWald, I liked your approach, but couldn't get your script to work. Am I missing something?

I also enjoyed Richard's article and all the discussion that followed so many years ago!

I still have some things to try, but if there are any scraps of food available to feed the crocodiles in the swamp, they will gratefully be accepted.

Cheers

Greg

richmond62
Livecode Opensource Backer
Livecode Opensource Backer
Posts: 9249
Joined: Fri Feb 19, 2010 10:17 am
Location: Bulgaria

Re: Returns in a column

Post by richmond62 » Tue Jul 16, 2019 11:01 am

I have a feeling that you need to replace your CR chars with something like a SPACE.

HOWEVER, as far as I recall, there are several different Unicode addresses that render as a CR . . .

The 'standard' CR is U+000D

This is slightly unpleasant:

https://stackoverflow.com/questions/309 ... -form-feed

You have, certainly got to work out if your problem is with CR or LF (line feed).

'standard' LF is U+000A

Here's something lumpy as well:

"What is CRLF and LF?
The term CRLF refers to Carriage Return (ASCII 13, \r) Line Feed (ASCII 10, \n). ... For example: in Windows both a CR and LF are required to note the end of a line, whereas in Linux/UNIX a LF is only required. In the HTTP protocol, the CR-LF sequence is always used to terminate a line."

One thing that I don't understand is HOW (except, possibly by Copy-Paste) text with CR chars was inserted into spreadsheet cells. Certainly, in CALC (LibreOffice) hitting the ENTER key takes me to another cell in the spreadsheet.

Um, possibly: "By default key enter finishes editing, to let users to insert LF by Enter key, set AcceptEnters='1', the editing will be finished by Ctrl+Enter or Shift+Enter."

AxWald
Posts: 578
Joined: Thu Mar 06, 2014 2:57 pm

Re: Returns in a column

Post by AxWald » Wed Jul 17, 2019 11:26 am

Hi,
GregWills wrote:
Tue Jul 16, 2019 8:05 am
AxWald, I liked your approach, but couldn't get your script to work. Am I missing something?
It may be that your Excel writes different .csv's. My demo was this:
spreaddemo.png
a tiny spreadsheet
spreaddemo.png (2.88 KiB) Viewed 10059 times
Saved as .csv it looks like this:

Code: Select all

a;"This is
a text.";c
1;2;3
When I paste my code into a button & run it, msg contains:

Code: Select all

a;This is¬a text.;c
1;2;3
Remember:
AxWald wrote:
Sun Jul 14, 2019 1:39 pm
Here I only changed hazardous characters inside the text fields, and removed the quotes Excel used to designate these.
It may be that your Excel writes the csv with comma - it may be that in English based language settings this is used. After all, me, based in western Europe, never ever encountered one of the real comma-delimited .csvs that Richards condemns with such ferocity ;-)
In this case you better start the code this way:

Code: Select all

   put comma into myFD
   -- put ";" into myFD      --  this is what Excel gives me as FD
Another culprit may be the line/ record delimiter. My text editor shows me the line delim in the text field as LF, whereas the records are delimited with CRLF. My use of CR in the code should take care of this (CR/ return has shown as rather robust, for me)

Then, it may be that you're using a bleeding edge version of LC that has a bug/ regression - I write with 6.7.10 & tested with 9.0.1, and it worked.

Hope this helps, have fun!
All code published by me here was created with Community Editions of LC (thus is GPLv3).
If you use it in closed source projects, or for the Apple AppStore, or with XCode
you'll violate some license terms - read your relevant EULAs & Licenses!

GregWills
Posts: 69
Joined: Sun Aug 30, 2015 7:51 am

Re: Returns in a column

Post by GregWills » Thu Jul 18, 2019 8:26 am

Thank you so much AxWald for your very clear explanation. Much appreciated.

That was the problem. I am in Australia, so there are many commas running wild here. I have decided to use OpenOffice, as it provides more control over what csv’s will look like when saved. I’ve abandoned Excel, as it was stripping out any European characters in the spreadsheet too (not so commonly used here in OZ).

You have given me much to play with, so I will see if I can put this issue to rest (for me at least), but I may be back :-)

When I ran your solution with my database (over 11,000 entries, but over 33,000 lines when imported!!), it wasn’t happy and I needed to stop the import after more than 10 minutes.

Regards

Greg

FourthWorld
VIP Livecode Opensource Backer
VIP Livecode Opensource Backer
Posts: 9801
Joined: Sat Apr 08, 2006 7:05 am
Location: Los Angeles
Contact:

Re: Returns in a column

Post by FourthWorld » Thu Jul 18, 2019 9:29 am

IIRC, the field enhancements in v5.5 include rendering ASCII 11 as Vertical Tab.
Richard Gaskin
LiveCode development, training, and consulting services: Fourth World Systems
LiveCode Group on Facebook
LiveCode Group on LinkedIn

AxWald
Posts: 578
Joined: Thu Mar 06, 2014 2:57 pm

Re: Returns in a column

Post by AxWald » Thu Jul 18, 2019 3:52 pm

Hi,
GregWills wrote:
Thu Jul 18, 2019 8:26 am
When I ran your solution with my database (over 11,000 entries, but over 33,000 lines when imported!!), it wasn’t happy and I needed to stop the import after more than 10 minutes.
Ooops. Wanted to do some timing, and created a file like this:

Code: Select all

"zero¬one",two,three,four,five,six,seven,eight,nine
zero,"one¬two",three,four,five,six,seven,eight,nine
zero,one,"two¬three",four,five,six,seven,eight,nine
...
where the "¬" are CRs.
Records: ~300000; Size resulting: ~ 15 MB
My script (as in my prev. post) runs it in ~8 - 10 seconds on my venerable HP business desktop. So you obviously have huge records, and the data file must be in the GB area.
(Not to mention that my script fails if the first field in a record is a text field ...)

I tried some other possibilities (using offset() and such ...) and all was ways slower.
We could speed it up using a "repeat for each char" approach (building an output variable during the loop), but this would also double the RAM requirements. And I assume we're already tight here?

What would I do? Well, .csv's usually come from databases, or can be fed to them.
The export as this here (mySQL dialect):

Code: Select all

SELECT ID, REPLACE(aText, CHAR(10), '¬') AS newText FROM `t_test` ...
will save you a lot of hassle. You may replace 'CHAR(10)' with 'CHAR(13)' or 'CHAR(13,10)', depending of your data.
If you have access to a database on a real database server, this is not noticeable slower than a normal export.

Have fun!
All code published by me here was created with Community Editions of LC (thus is GPLv3).
If you use it in closed source projects, or for the Apple AppStore, or with XCode
you'll violate some license terms - read your relevant EULAs & Licenses!

GregWills
Posts: 69
Joined: Sun Aug 30, 2015 7:51 am

Re: Returns in a column

Post by GregWills » Sun Jul 21, 2019 3:33 am

:-(
I have been trying to apply Alex Tweedly’s script but am struggling to understand how I use it. I haven’t required a custom function before so have been reading and exploring how they are constructed.

Unfortunately I can’t work out how to call this script or what ‘defaults’ the script is referring to. Therefore I keep on bumping into a brick wall (a few actually!).

Any assistance to get me out of this dark space will be greatly appreciated. (i.e., how to refer to the function in a button script, and what defaults are needed - the tab delimited file, I can do!)

Thanks

Greg

GregWills
Posts: 69
Joined: Sun Aug 30, 2015 7:51 am

Re: Returns in a column

Post by GregWills » Mon Jul 29, 2019 6:17 am

Hi All
I have been exploring AxWald’s approach to removing returns in an item and am having success at removing the returns from imported data, adding data to this item, then replacing the returns on export - to be opened in OpenOffice (then Joomla).

I know that item 15 in the import has returns within the item, so only need to convert this item only.

With small numbers of individual items (abut 4,000 in total), the conversion takes about 7 seconds. When the number of items in the data comes larger, (6,000) the conversion takes about 40 seconds. Over 70,000 items takes over 10 minutes. (There are over 180,000 items in the current database, so the conversion is not practical.)

Is there a way of speeding up this script?

Code: Select all

on mouseUp
   set the itemDelimiter to ";" -- each item is also quoted
   put field "Data" into tDataBaseHold -- there are 16 items for each student entry
   put the number of items of tDataBaseHold-1 into tHowManyItems -- minus 1 works?
   put the round of (tHowManyItems / 15) into tStudentNumbers -- get the number of student entries in the database
   -- want the 15th item for each student (a data area with returns within the item)
   
   put 0 into tCounter
  
   repeat for tStudentNumbers
      add 1 to tCounter
      put tCounter * 15 into tNextSet -- item 15 of the next student
      put item tNextSet of tDataBaseHold into tSecond 
      repeat for each char tChar in tSecond
         if tChar is return then
            replace return with "¬" in tChar
         end if
         put tChar after tCharHold
      end repeat
      put tCharHold into item tNextSet in tDataBaseHold
   end repeat
   
   put tDataBaseHold into field "Result"
   
end mouseUp
Cheers
Greg

GregWills
Posts: 69
Joined: Sun Aug 30, 2015 7:51 am

Re: Returns in a column

Post by GregWills » Mon Jul 29, 2019 6:40 am

Interesting how writing it down and seeing the information in another format can help!!

I have answered my own question!

Code: Select all

repeat for tStudentNumbers
      add 1 to tCounter
      put tCounter * 15 into tNextSet -- item 15 of the next student
      put item tNextSet of tDataBaseHold into tSecond 
      replace return with "¬" in tSecond
      put tSecond into item tNextSet in tDataBaseHold
   end repeat
I'll now run some tests.

Post Reply

Return to “Databases”