Fastest method for CSV to MySQL table

Creating desktop or client-server database solutions?


newpie
Livecode Opensource Backer
Posts: 155
Joined: Sat Jun 29, 2013 11:24 pm

Fastest method for CSV to MySQL table

Post by newpie » Sat Oct 10, 2015 2:03 am

Hello, I am fairly new to LiveCode and especially to MySQL.

I have a CSV file with around 2800 lines.
Format:
"#########";"ABCD";"#ABC";"ABCDEFG";"ABCDEFGH";"2010/11/18 12:12:12"

This CSV report gets replaced every hour, and I wish to load it into a MySQL table. Each time I need to wipe the current table and upload the new CSV file, as the old data in the MySQL table is invalid.

The MySQL table is set up with these columns: "ID", "Column1", "Column2", "Column3", "Column4", "Column5", "Column6"

My test code:

Code: Select all

global gConnectionID
on mouseUp
   local tFileName, tFileContents
   answer file "Please choose a file to import" with type "Comma Separated Values|csv|CSV"
   
   if the result is not "cancel" then
      put it into tFileName
      ## Get the contents of the file
      put URL ("file:" & tFileName) into tFileContents
      connectToDB
      set the itemDelimiter to ";"
      put "TestTable" into tTableName
      put "Column1, Column2, Column3, Column4, Column5, Column6" into tFields
      revexecutesql gConnectionID,"BEGIN IMMEDIATE TRANSACTION"  -- not using a separate function for this, just using the native stuff.
      put the seconds into tStart -- track time
      repeat for each line tLine in tFileContents
         put item 1 of tLine into Column1
         put item 2 of tLine into Column2
         put item 3 of tLine into Column3
         put item 4 of tLine into Column4
         put item 5 of tLine into Column5
         put item 6 of tLine into Column6
         -- construct the SQL - the :1, :2 & :3 placeholders in the SQL will be filled by variables in the revExecuteSQL line
         put "INSERT INTO " & tTableName & " (" & tFields & ") VALUES (:1, :2, :3, :4, :5, :6)" into tSQL
         -- send the SQL to the database, filling in the placeholders with data from variables
         revExecuteSQL gConnectionID, tSQL, "Column1", "Column2", "Column3", "Column4", "Column5", "Column6"   
      end repeat
      revexecutesql gConnectionID, "COMMIT"
      closeDB gConnectionID 
      put the seconds - tStart -- show the time in milliseconds    
      answer tStart
   end if  
end mouseUp
I also had a few questions related to how the columns are set up.

1. I set up Column2 as an integer but the value doesn't come over, and the same happens with the date column. Any help would be great.

2. I noticed that when I ran this code it took 4.30 minutes to complete successfully. That seems pretty long, and I am not sure what to do about users who try to use the table before it finishes refreshing. Any insight would be great.

3. Also, I don't need the first line of the CSV file, as it is just headings anyway.

Thanks for your time.

FourthWorld
VIP Livecode Opensource Backer
Posts: 9842
Joined: Sat Apr 08, 2006 7:05 am
Location: Los Angeles

Re: Fastest method for CSV to MySQL table

Post by FourthWorld » Sat Oct 10, 2015 2:53 am

Richard Gaskin
LiveCode development, training, and consulting services: Fourth World Systems
LiveCode Group on Facebook
LiveCode Group on LinkedIn

newpie
Livecode Opensource Backer
Posts: 155
Joined: Sat Jun 29, 2013 11:24 pm

Re: Fastest method for CSV to MySQL table

Post by newpie » Sat Oct 10, 2015 4:58 am

Thanks FourthWorld, but I have no control over whether I get a CSV file to convert to my MySQL database, so I can't eliminate much here, unfortunately. Any insights on making my code faster, or on the questions I asked, would be appreciated if anyone has time.

Thanks

FourthWorld
VIP Livecode Opensource Backer
Posts: 9842
Joined: Sat Apr 08, 2006 7:05 am
Location: Los Angeles

Re: Fastest method for CSV to MySQL table

Post by FourthWorld » Sat Oct 10, 2015 6:02 am

I posted the link because the main piece of code in the middle of that page is the most efficient and reliable CSV parser our community has produced to date.
Richard Gaskin
LiveCode development, training, and consulting services: Fourth World Systems
LiveCode Group on Facebook
LiveCode Group on LinkedIn

scrabbles
Posts: 25
Joined: Sat Dec 20, 2014 4:32 am
Location: Melbourne, Australia

Re: Fastest method for CSV to MySQL table

Post by scrabbles » Sat Oct 10, 2015 10:21 am

Hi,

Do you have a way to put the file on the server? If so, the fastest way might be to just use MySQL's own import facility, LOAD DATA: https://dev.mysql.com/doc/refman/5.7/en/load-data.html

As for the int and date: are you sure the values are passed correctly? Perhaps you are making the int a string inadvertently. The date could be a mismatch in format. To debug, you can try manually executing the SQL with a row from the file and see what MySQL says.

– Mark

Mikey
VIP Livecode Opensource Backer
Posts: 755
Joined: Fri Jun 27, 2008 9:00 pm

Re: Fastest method for CSV to MySQL table

Post by Mikey » Sat Oct 10, 2015 2:47 pm

2800 lines should import in nearly the blink of an eye. There are fast ways to process CSVs in LC and slow ways. Sometimes it isn't obvious why one way will be faster than another, and there are so many ways to accomplish things in LC that you may never even consider testing a fast way, so don't worry about the situation you're in.

I see multiple opportunities to improve your code. I have not tested these against your code, specifically, so you know how to find all of us if I boffed something:
0) Your time tracking is not in milliseconds but in seconds (see the comment on your "put" near the end, where you put the stop seconds minus the start seconds).

1) Build your INSERT statement outside of the loop - no need to rebuild it every time through, since it isn't changing.

2) I think the reason your dates and integers aren't coming over is that you aren't dealing with the quotes that surround each of the items. I would suggest you add a call to the CSV converter cited previously (but don't forget to replace the comma reference in the script with a semicolon, in your case), or roll your own; but again, the way you implement it will affect speed. In the very simplest case, where there are no embedded <CR>s in fields and no embedded semicolons either, you can even do this (before the loop):

Code: Select all

delete char 1 of tFileContents # leading quote
delete last char of tFileContents #trailing quote
replace quote&CR&quote with CR in tFileContents # remove the quote from the last item in each line and from the first item of the following line
replace quote&";"&quote with ";" in tFileContents # remove the trailing quotes on items and leading quotes on the following items on a line
But, again, probably just passing your file through the CSV parser is the most bulletproof solution.

3) Replace the put item... lines with split tLine by ";" #makes tLine into a parameter array for passing in the INSERT statement
If you are used to static typing of variables in other languages and don't like the dynamic typing of containers in LC, then you can put tLine into tLineArray followed by split tLineArray by ";" instead.

4) Change your revexecutesql to use the tLine (or tLineArray if you went that route) parameter array instead of the variable list.


Let us know if your speed improves.

newpie
Livecode Opensource Backer
Posts: 155
Joined: Sat Jun 29, 2013 11:24 pm

Re: Fastest method for CSV to MySQL table

Post by newpie » Sun Oct 11, 2015 6:32 pm

Hello, I apologize for my late response.

@fourthworld = thank you for the clarification, FourthWorld; I will use the parser you referred to.

@scrabbles = I love this idea; I will definitely research whether I can do it. I would like to know multiple methods for my knowledge base as well, so I will try the parser method too. The int and date come over now once I use the parser, FYI - so you were correct.

@Mikey
- I am glad to hear that it should be a fast process; the 4-minute deal was a little unnerving, lol.
0) Thank you for pointing out the milliseconds vs. seconds - I forgot to change the comment after I changed it to seconds (due to it taking so long).

1) You say to build the INSERT statement outside the loop as it doesn't change. Sorry, I am confused: doesn't it change, since it is looking at a new line of the CSV through each repeat loop? Any examples would be great if you have time.

Code: Select all

      repeat for each line tLine in tFileContents
         put item 1 of tLine into Column1
         put item 2 of tLine into Column2
         put item 3 of tLine into Column3
         put item 4 of tLine into Column4
         put item 5 of tLine into Column5
         put item 6 of tLine into Column6
         -- construct the SQL - the :1, :2 & :3 placeholders in the SQL will be filled by variables in the revExecuteSQL line
         put "INSERT INTO " & tTableName & " (" & tFields & ") VALUES (:1, :2, :3, :4, :5, :6)" into tSQL
         -- send the SQL to the database, filling in the placeholders with data from variables
         revExecuteSQL gConnectionID, tSQL, "Column1", "Column2", "Column3", "Column4", "Column5", "Column6"   
      end repeat
2) Yes, the reason was the quotes; once I implemented FourthWorld's parser they came over great.

3) I would love to learn both ways you mentioned. Is there example code or a tutorial somewhere on "the split tLine"? I don't have a full grasp of arrays; is there a tutorial, or can you show me an example I can analyze and work into my code?

4) Same as #3.

--------------------------------------
New question if I could:

I noticed that my tStart variable is not resetting. I tried adding "put empty into tStart" to the code, but it doesn't affect it. I read in another post about pendingMessages, but that doesn't seem to help me.

Thanks for your patience and help.

Mikey
VIP Livecode Opensource Backer
Posts: 755
Joined: Fri Jun 27, 2008 9:00 pm

Re: Fastest method for CSV to MySQL table

Post by Mikey » Sun Oct 11, 2015 7:05 pm

There is no need to recreate the INSERT - you aren't embedding anything in it. The statement that is sent to the SQL parser doesn't change when the data changes. The :1..:6 are placeholders that tell the SQL engine that what LC passes in the other parameters are the values that go with the placeholders. This is common with SQL engines.

SPLIT takes a container and parses it, using the delimiter you specify, into an array. The array is then passed in the revExecuteSQL statement. That's all you have to do with the array.

So, if you use the CSV parser that Richard pointed you to, find the comma reference in the code (near the end) and change it to a semicolon, because the sample data you sent us uses a semicolon to separate the values. Richard's code is designed for lines that use commas as separators, i.e. "A","B","C" instead of "A";"B";"C".

If you use Richard's code, what you get back will be a tab-delimited container without the semicolons and quotes marking the fields. So, your new code would be:

Code: Select all

global gConnectionID
on mouseUp
   local tFileName, tFileContents
   answer file "Please choose a file to import" with type "Comma Separated Values|csv|CSV"
   
   if the result is not "cancel" then
      put it into tFileName
      ## Get the contents of the file
      put URL ("file:" & tFileName) into tFileContents
      put CSV2Tab(tFileContents) into tFileContents
      connectToDB
      set the itemDelimiter to ";"
      put "TestTable" into tTableName
      put "Column1, Column2, Column3, Column4, Column5, Column6" into tFields
      revexecutesql gConnectionID,"BEGIN IMMEDIATE TRANSACTION"  -- not using a separate function for this, just using the native stuff.
      put the seconds into tStart -- track time
      put "INSERT INTO " & tTableName & " (" & tFields & ") VALUES (:1, :2, :3, :4, :5, :6)" into tSQL #mikey this goes here, no need to recreate each time through

      #<mikey new loop>
      repeat for each line tLine in tFileContents
            split tLine by tab
            revExecuteSQL gConnectionID, tSQL, "tLine"
      end repeat #for each line tLine in tFileContents
      #</mikey new loop>
     revexecutesql gConnectionID, "COMMIT"
      closeDB gConnectionID 
      put the seconds - tStart
      answer tStart
   end if # the result is not "Cancel"
end mouseUp



function CSV2Tab pData
  local tNuData -- contains tabbed copy of data
  local tReturnPlaceholder -- replaces cr in field data to avoid line
  --                       breaks which would be misread as records;
  --                       replaced later during display
  local tEscapedQuotePlaceholder -- used for keeping track of quotes
  --                       in data
  local tInQuotedText -- flag set while reading data between quotes
  --
  put numtochar(11) into tReturnPlaceholder -- vertical tab as
  --                       placeholder
  put numtochar(2)  into tEscapedQuotePlaceholder -- used to simplify
  --                       distinction between quotes in data and those
  --                       used in delimiters
  local k -- char string
  --
  -- Normalize line endings:
  replace crlf with cr in pData          -- Win to UNIX
  replace numtochar(13) with cr in pData -- Mac to UNIX
  --
  -- Put placeholder in escaped quote (non-delimiter) chars:
  replace ("\"&quote) with tEscapedQuotePlaceholder in pData
  replace quote&quote with tEscapedQuotePlaceholder in pData
  --
  put space before pData   -- to avoid ambiguity of starting context
  split pData by quote
  put False into tInQuotedText
  repeat with tCounter = 1 to the number of lines in the keys of pData
    put pData[tCounter] into k
    if (tInQuotedText) then
      replace cr with tReturnPlaceholder in k
      put k after tNuData
      put False into tInQuotedText
    else
      replace ";" with tab in k #mikey because the format we're using uses a semicolon instead of a comma
      put k after tNuData
      put true into tInQuotedText
    end if
  end repeat
  --
  delete char 1 of tNuData -- remove the leading space
  replace tEscapedQuotePlaceholder with quote in tNuData
  return tNuData
end CSV2Tab


newpie
Livecode Opensource Backer
Posts: 155
Joined: Sat Jun 29, 2013 11:24 pm

Re: Fastest method for CSV to MySQL table

Post by newpie » Mon Oct 12, 2015 6:42 pm

Wow, this is great Mikey. Thank you so much. I will try this soon and let you know the results.

Any idea why this tStart doesn't want to reset:
New question if I could:

I noticed that my tStart variable is not resetting. I tried adding "put empty into tStart" to the code, but it doesn't affect it. I read in another post about pendingMessages, but that doesn't seem to help me.

Thanks for your patience and help.
Thanks again

Later Edit:
Now it seems to be resetting correctly. Not sure why it was not before; I will let you know if it crops up again. I also see that on my local server it is running quite fast now, at about 400 milliseconds. I will try to test it on the public one.

Thanks for the help.

