Hello. I have a problem. We download a large number of zip files from central servers. Each zip file contains several separate files. Each file needs to be modified by inserting text AFTER a special phrase. The phrase may occur several times, and the insertion must be made at every occurrence. What I want is a script that finds all files under this dir (and possibly subdirs), opens each file, finds the phrase, inserts a new line with data beneath it, and repeats this until EOF. The modified file should then be written back to the same location. This repeats until all files have been processed. I have been looking through the posts here, but have not found a good place to start. Help will be much appreciated. Jon
I don't have something like that at hand right now, but you could use the DirPlus() UDF to enumerate all files, the ReadFile() UDF to read each file one by one, and the built-in InStr() function to check whether a line contains the text you are looking for, adding your text if it does. Then write each line to a temp file using the built-in WriteLine() function, rename the original file to .old or something similar, and rename the temp file to the original filename. I'm sure there are other ways, but this is one approach that would work.
UDF Library » DirPlus() - a recursive dir tool
UDF Library » ReadFile() - Read a file into an array
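The flow described above (enumerate files, copy each line to a temp file, add the new line after every match, then swap the temp file in) can be sketched as follows. This is a minimal illustration in Python, not KiXtart, and the function names `insert_after_phrase` and `process_tree` are hypothetical. It assumes UTF-8 text files and replaces the original directly instead of keeping a .old copy.

```python
import os


def insert_after_phrase(path, phrase, new_line):
    """Rewrite path in place, adding new_line after each line containing phrase."""
    tmp_path = path + ".tmp"
    hits = 0
    with open(path, "r", encoding="utf-8") as src, \
            open(tmp_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(line)
            if phrase in line:
                if not line.endswith("\n"):
                    dst.write("\n")  # last line had no newline; add one first
                dst.write(new_line + "\n")
                hits += 1
    os.replace(tmp_path, path)  # swap the temp file in for the original
    return hits


def process_tree(root, phrase, new_line):
    """Apply insert_after_phrase to every file under root, including subdirs."""
    total = 0
    for folder, _dirs, files in os.walk(root):
        for name in files:
            total += insert_after_phrase(os.path.join(folder, name), phrase, new_line)
    return total
```

Writing to a temp file first means a crash halfway through leaves the original file untouched.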
Well, are you unzipping these files or leaving them as zip files? That is an important item that I don't see you mention. If they stay zipped, each archive would need to be unzipped, its files updated, and then zipped again. What about full or relative paths in the zip?
Whoops, missed the zip part. Thanks Doc. 7zip, for example, works just fine for unzipping and/or zipping files from a script. http://www.7-zip.org/
The zip part isn't the most essential part of the script; I imagine it would complicate the script dramatically if it were implemented as well. But if anyone has good suggestions on how to implement this in the script, that would be VERY good. Jon
As suggested, you could use 7zip. If you get all the filenames (zip files), create a folder named after each zip file, and unzip its contents there, then it would be easy to re-zip the files when the entire script is done.
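The folder-per-archive idea can be sketched like this. Note the swap: this illustration uses Python's standard `zipfile` module rather than 7zip or KiXtart, and the function name `extract_each_zip` is hypothetical. It assumes all archives sit directly in one directory.

```python
import os
import zipfile


def extract_each_zip(zip_dir):
    """Extract every .zip in zip_dir into a folder named after the archive.

    Returns a dict mapping archive file name to its extraction folder.
    """
    extracted = {}
    for name in sorted(os.listdir(zip_dir)):
        if not name.lower().endswith(".zip"):
            continue
        # Folder name is the archive name without the .zip extension.
        target = os.path.join(zip_dir, os.path.splitext(name)[0])
        os.makedirs(target, exist_ok=True)
        with zipfile.ZipFile(os.path.join(zip_dir, name)) as archive:
            archive.extractall(target)
        extracted[name] = target
    return extracted
```

Because each archive gets its own folder, identical file names in different archives cannot overwrite each other, and the folder name records which zip a file came from.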
THINK! (Ex-IBM)
Dealing with ZIP files can be incredibly complex... Do the zip files contain paths? How will you identify those paths? Do you need to modify the file content IN the zip file? That implies extracting the file, modifying it, and putting it back in the same logical place in the zip file. Will multiple zip files have identical content? If so, you need to process the zips one by one instead of first extracting them all and then processing the data. You need to deal with extracting the files, modify the files, put the files back in the zip, and remove the files you extracted without removing any other files.
This would be much easier if, for example, you received the zip files and merely needed to extract them for processing. Then you could enumerate all the zip files with a Dir(), extract the files in each one, and then process all the resulting files - removing the Zip files after the extraction was complete. Of course, this assumes that the same file name is not present in multiple zip files.
My point is - it's relatively simple to process files, search for content, and then modify (I'll outline that in a moment), but throwing in zip files requires careful planning and a knowledge of the zip file content and structure.
As for basic search/replace, here's the basic idea...
Code:
    $SearchData = 'xyzzy'            ; insert when line containing this is found
    $InsertData = 'AbraCadabra'      ; data line to insert
    $RootPath = 'D:\folder...'       ; root path where files live
    $Files = DirList($RootPath, 6)   ; return a list of all files with complete paths
    For Each $File in $Files
      Move $File 'WorkFile.tmp'      ; rename the file
      If Open(2, 'workfile.tmp')
        $ = RedirectOutput($File)    ; output to original file
        $Line = ReadLine(2)
        While Not @ERROR
          If InStr($Line, $SearchData)
            $InsertData ?            ; output extra (inserted) line
          EndIf
          $Line ?                    ; output original line
          $Line = ReadLine(2)
        Loop
        $ = Close(2)
        $ = RedirectOutput('')       ; close output file
        Del 'workfile.tmp'
      EndIf
    Next
This is UNTESTED, and presented to illustrate the logic flow. This could easily be modified to REPLACE a line or APPEND a line rather than INSERT a line. A nice mod might be to define a var $MODE, and use a Case statement to Insert, Replace, or Append depending on the value of $MODE.
Glenn
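The $MODE idea can be illustrated on a plain list of lines. This is a sketch in Python, not KiXtart, and `modify_lines` is a hypothetical name. It assumes that "insert" puts the new line before the match (as the posted code does) and that "append" puts it after the match; the post does not spell out either meaning.

```python
def modify_lines(lines, search, new_line, mode="insert"):
    """Return a new list with new_line applied at every line containing search.

    mode: "insert"  -> new_line goes before the matching line (assumed meaning)
          "append"  -> new_line goes after the matching line (assumed meaning)
          "replace" -> new_line takes the place of the matching line
    """
    result = []
    for line in lines:
        if search not in line:
            result.append(line)
        elif mode == "insert":
            result.extend([new_line, line])
        elif mode == "append":
            result.extend([line, new_line])
        elif mode == "replace":
            result.append(new_line)
        else:
            raise ValueError("unknown mode: " + mode)
    return result
```

Keeping the per-line decision in one function means the file handling around it stays identical for all three modes.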
If you know the exact file name and path inside the archive, you could extract just that file, update it, then UPDATE the archive with it again on the fly. But that requires that you know EXACTLY the name and path, and that they are always the same. Otherwise, as Glenn says, it becomes much more complicated: you would need to do file cleanup and maybe even use a separate folder for each user/zip file you deal with - or work on them individually, which could be a slow process, but safer as well. Can it be done - YES - is it simple - NO, but it can be done. However, it could take a lot of work and debugging time from someone who understands KiX, so you may need to learn KiX some more so that you can debug it yourself, or hope that someone here has the time to work with you on debugging it.
Originally Posted By: Glenn Barnas
THINK! (Ex-IBM) Dealing with ZIP files can be incredibly complex... Do the zip files contain paths? How will you identify those paths? Do you need to modify the file content IN the zip file? That implies extracting the file, modifying it, and putting it back in the same logical place in the zip file. Will multiple zip files have identical content? If so, you need to process the zips one by one instead of first extracting them all and then processing the data.
Well guys, as I said, the zip part isn't the most essential thing, so if I get the other part working first, I can implement this as a "bonus". Anyway, with the potential zip solution, relative paths aren't an issue. The zip files to be processed will all be in one dir; a dir named after each zip must be created under this dir, and each zip extracted into its own dir. The files processed later in the script do NOT need to be zipped again afterwards. I will now start working more intensively on the other part of the script, using (and looking at) some of the solutions you guys have mentioned. Thanks for all the help so far, and I will MOST certainly be in need of more. Jon
Originally Posted By: NTDOC
If you know the exact file name and path inside the archive you could extract just that file, update it, then UPDATE the archive with it again on the fly. But that requires that you know EXACTLY the name and path and it has to always be that. Otherwise as Glenn says it becomes much more complicated and you would need to do file cleanup and maybe even a folder of its own for each user/zip file you deal with - or work on them individually which could be a slow process, but safer as well. Can it be done - YES - is it simple - NO but it can be done. However it could take a lot of work and time debugging from someone that does understand KiX so you may need to learn KiX some more so that you can debug it or hope that someone here has the time to work with you on debugging it.
Yes, I reckoned this wouldn't be quite a walk in the park. Regarding KiX, I have used it in rather simple ways, and therefore thought it would be smart to ask you gurus to point me in the right direction instead of fumbling around by myself. Glenn: thanks for the example, this seems quite straightforward. BUT, there's one thing: the files extracted from one zip need a different inserted text than the files from the next zip. The text to be inserted will be based on the name of the zip file (the first two chars). I am not sure how complicated I will make this script; maybe I must accept that organizing the zips in advance, with a much simpler script, is one way to do it. Jon
Jon, the logic example I presented could be wrapped up into a UDF. You would pass it the SearchData, NewData, and (with the Select/Case mod) Method arguments. You'd prepare the directory ahead of time, placing or extracting files to "WorkFolder", and then after modification, "WorkFolder" would be renamed using something like a sequence ID. You could also prepare the folder and pass the folder name to the UDF, but - for me - using a temporary WorkFolder name allows the process to be interrupted and recovered more easily. Glenn
Originally Posted By: Glenn Barnas
As for basic search/replace, here's the basic idea...
Code:
    $SearchData = 'xyzzy'            ; insert when line containing this is found
    $InsertData = 'AbraCadabra'      ; data line to insert
    $RootPath = 'D:\folder...'       ; root path where files live
    $Files = DirList($RootPath, 6)   ; return a list of all files with complete paths
    For Each $File in $Files
      Move $File 'WorkFile.tmp'      ; rename the file
      If Open(2, 'workfile.tmp')
        $ = RedirectOutput($File)    ; output to original file
        $Line = ReadLine(2)
        While Not @ERROR
          If InStr($Line, $SearchData)
            $InsertData ?            ; output extra (inserted) line
          EndIf
          $Line ?                    ; output original line
          $Line = ReadLine(2)
        Loop
        $ = Close(2)
        $ = RedirectOutput('')       ; close output file
        Del 'workfile.tmp'
      EndIf
    Next
This is UNTESTED, and presented to illustrate the logic flow. This could easily be modified to REPLACE a line or APPEND a line rather than INSERT a line. A nice mod might be to define a var $MODE, and use a Case statement to Insert, Replace, or Append depending on the value of $MODE. Glenn
I've started to work on this more intensively today, and took this code as a starting point. I understand that it also uses dirlist.udf. The basics work fine, but what's strange is that the script doesn't do anything with file #1; it just cuts file #1 and pastes it into the beginning of file #2. File #2 is processed, and the insertion is OK, but as you see, file #1 then becomes part of this file. After this, file #1 gets deleted. Actually, that isn't totally true: the file is put where the KiX script is started from, renamed to 'workfile.tmp'. I haven't tried with more than 2 files yet. Jon
Code:
    If Open(2, 'workfile.tmp')=0
I had to add '=0' at the end of that line. That did the trick. Each file is now being processed. So far so good. Jon
Code:
    $SearchData = 'objtype'          ; insert when line containing this is found
    $RootPath = 'c:\test\sosi\'      ; root path where files live
    $Files = DirList($RootPath, 6)   ; return a list of all files with complete paths
    For Each $File In $Files
      $kommunenr = SubStr ("$file", 14,2)     ; extract "county"-number from filename
      $InsertData = '..KOMMUNE $kommunenr'    ; data line to insert
      Move $File 'WorkFile.tmp'      ; rename the file
      If Open(2, 'workfile.tmp')=0
        $ = RedirectOutput($File)    ; output to original file
        $Line = ReadLine(2)
        While NOT @ERROR
          If InStr($Line, $SearchData)
            $InsertData ?            ; output extra (inserted) line
          EndIf
          $Line ?                    ; output original line
          $Line = ReadLine(2)
        Loop
        $ = Close(2)
        $ = RedirectOutput('')       ; close output file
        Del 'workfile.tmp'
      EndIf
    Next
This is how the script looks so far. Right now the files are extracted into one explicit dir, and the script handles all files there, inserting a line containing some text and the first two chars of the filename for each hit on the specific text. Next I will start expanding the script, trying to process files from different directories, and look into the possibility of handling the unzipping as well. Jon
Once you've got the logic working, think about making it a function that can be called. Test again.. then, with the core code working, it's easier to expand the functionality of the overall script. Everything after this line:
    $InsertData = '..KOMMUNE $kommunenr' ; data line to insert
up to the Next can be put in a function and called by
    ModFile($File, $SearchData, $InsertData)
Speaking of that line, it should be more like:
    $InsertData = '..KOMMUNE ' + $kommunenr ; data line to insert
so you don't put Vars inside Strings (bad.. very bad!)
Just my $0.42... Glenn
I will look at the possibility of making a UDF later. Right now, I am concentrating on implementing the zip routine. I will have all zip files that need to be processed ready in one dir. I will use 7z.exe. What I need to get done: extract all zip files in this area, and put the extracted files in one location. 7z.exe will be in the $ZipDir. The syntax for 7z will be like this:
Code:
    7z e *.zip *.sos -oOutDir
where *.sos is the file type to be extracted, and OutDir is the output directory. I was thinking the script will be something like this:
Code:
    $zipDir = 'c:\sosi\zip\'
    $zipOutDir = 'Utpakket\'
    Run "%COMSPEC% /e:1024 /c $ZipDir'7z.exe e'+$ZipDir+ '*.zip *.sos' + '-o'+$ZipDir+$ZipOutDir"
But something is wrong; I can't see it right now. Might this not be the right way to run this DOS program? Btw Glenn, I will try to "sharpen up" regarding vars in strings etc.
I was looking into this Run command, and it seems to me that there are problems using vars inside the Run syntax. The solution (at least so far..) was to define a var containing the command line for Run, like this:
Code:
    $ZipDir = 'c:\sosi\zip\'
    $zipOutDir = 'Utpakket'
    $RunZip = $ZipDir+'7z.exe e '+$ZipDir+'*.zip *.sos -o'+$ZipDir+$ZipOutDir
    Run "%COMSPEC% /e:1024 /c $RunZip"
Is this the best way to do this?? Jon
A - try Shell, not Run. Shell will cause Kix to wait for the files to unzip, while Run will not.
B - waiting to turn your small working code into a UDF will actually complicate your process.. when you have a UDF that does what you need, you isolate that from the new code you develop around it.
I STRONGLY recommend the following format for any Run/Shell commands:
Code:
    $Cmd = '%COMSPEC% /c'
    $Cmd = $Cmd + ' mycommand.exe /a:' + $ArgInVar
    $Cmd = $Cmd + ' /switchA /switchB'
    $Cmd = $Cmd + ' >' + $OutFile
    ; following line for debugging
    'About to run the following command: ' ? $Cmd ?
    Shell $Cmd
This allows you to build the command step by step and display it before it runs, so you can see if it looks right. You can even copy the screen output and paste it into the command prompt to run it. Glenn
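The same build-then-display-then-run habit, sketched in Python rather than KiXtart. The helper names `build_7z_command` and `run_command` are hypothetical; the `e`, `-o` and `-y` arguments are the 7z ones already used in this thread. `subprocess.run` waits for the program to finish, which is the behaviour Shell gives you in KiX.

```python
import subprocess


def build_7z_command(seven_zip, archive, pattern, out_dir):
    """Build the 7z extract command piece by piece as an argument list."""
    cmd = [seven_zip, "e", archive, pattern]
    cmd.append("-o" + out_dir)  # output directory, no space after -o
    cmd.append("-y")            # answer yes to all prompts
    return cmd


def run_command(cmd, dry_run=True):
    """Show the command, then run it and wait unless dry_run is set."""
    print("About to run the following command:", " ".join(cmd))
    if dry_run:
        return None
    # subprocess.run blocks until the program exits and reports its exit code.
    return subprocess.run(cmd, check=False).returncode
```

With `dry_run=True` the command is only displayed, so it can be checked or pasted into a prompt before anything is executed.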
You may want WrapAtEOL on for that.
I was struggling a bit with the Run part of the script, and working on another way to fix the zip extraction, when I looked in here and saw your post, Glenn. I was working on something in that direction, but thanks (once again..) for your time and help. Now my script does almost everything I want it to. I am sure the code could be written much better, but anyhow, it does what I need it to. I will look a bit at making UDFs of some parts of the script and see what I can make out of it. Here's the script for now. Right now it is running and has finished unpacking 200 zip files, giving a bit under a thousand files (about 10 gigs in total), which are now being rewritten with the correct line inserted. Works like a charm.. Not far from it anyway. Some text is written in Norwegian.
Code:
    ; Routine to search given directory for textfiles
    ; Uses DirList()Udf to search through Directory.
    ; Scans files in specific directory and searches for specific text
    ; When found - Dependent on file-naming, Insert line with different text ($InsertData) from var $Kommunenr
    ; every time text is found (numerous times in one single file).
    ; Version 1.0 - 20080402
    ; Necessary File-structure:
    ; d:\Sosi - Root-Dir
    ; d:\Sosi\Zip - Zip-files are placed here
    ; Rest of needed dirs are created in the script
    ;Dependencies: Dirlist()udf - Embedded in script
    ;7z.exe must reside in zip-dir

    ;Defined Vars
    $ZipDir = 'd:\sosi\zip\'
    $7z = $ZipDir + '7z.exe'
    $zipOutDir = $ZipDir +'Utpakket\'
    $ZipSyntax = $7z + ' e ' + $ZipDir +'*.zip *.sos -y -o'+$ZipOutDir
    $SearchData = 'objtype'                     ; insert when line containing this is found
    $RootPath = 'd:\sosi\'                      ; root path where files live
    $Original = 'Zip\Utpakket\'
    $Files = DirList($RootPath+$Original, 2)    ; return a list of all files with complete paths
    $LogDir = 'Log\'                            ; Log-dir
    $LogFile = $RootPath+$logdir+'logfile.txt'  ; Logfile-placement
    $counter = 0                                ; Counter for printing # of files on screen
    $WorkDir = 'WorkDir\'
    $WorkFile = $RootPath+$Workdir+'WorkFile.tmp'
    $OutPutDir = $RootPath + 'OutDir\'

    ;Create Necessary directories.. if already existing, empty these ones.
    If NOT Exist ($ZipOutDir)
      MD ($ZipOutDir)
    Else
      Del $ZipOutDir
    EndIf
    If Exist ($logfile)
      Del ($logfile)             ; Check if log-file exists, if it does, delete it
    Else
      MD ($RootPath+$logdir)     ; If log-dir doesn't exist, create it
    EndIf
    If NOT Exist ($RootPath+$WorkDir)
      MD ($RootPath+$WorkDir)
    EndIf
    If NOT Exist ($OutPutDir)
      MD ($OutPutDir)
    Else
      Del $OutPutDir
    EndIf

    ;Running zip-routine:
    ;7zip command to unpack given file types to one fixed location:
    ; '7z e zipfil.zip *.sos -od:\sosi\zip\utpakket' (or just -outpakket if you are in the dir above)
    $Cmd = '%COMSPEC% /c'
    $Cmd = $Cmd + ' ' + $ZipSyntax
    Shell $Cmd
    ? ? "Zip-filer pakket ut og skrevet til fil.."   ; Confirmation after zip-extraction
    Sleep 10

    ;Checking each file for given text, starts here..
    CLS
    For Each $File In $Files
      $counter = $counter+1                          ; Counter to use on screen and in logfile.
      $FileProcessed = SubStr ("$file",22,30)        ;Extract filename from $file
      ? "Skriver Fil# " $counter " " $fileprocessed " til fil.."   ; Display on screen file-progress.
      Open(1, $logfile, 5 )                          ;Write all on screen to log-file
      WriteLine(1 , "Skriver Fil# "+ $counter + " "+ $fileprocessed + " til fil.." + @CRLF)
      ;Check if file is N50-data
      $kommunenr = SubStr ("$file", 22,4)            ; File is probably n50
      ; Check if file is Fkb-data
      If SubStr("$file",22,2)='32'
        $kommunenr = SubStr("$file",25,4)
      EndIf
      ;Check if file is N250-data
      If SubStr("$file",22,3)='08_' OR SubStr("$file",22,3)='06_'
        If SubStr("$file",22,3)='08_'
          $kommunenr = SubStr("$file",22,2)+'00'
        EndIf
        If SubStr("$file",22,3)='06_'
          $kommunenr = SubStr("$file",22,2)+'00'
        EndIf
      EndIf
      $InsertData = '..KOMM '+ $kommunenr            ;$kommunenr will decide which text to put in written out-file
      Copy $File $WorkFile                           ; copy the file to the work file
      If Open(2, $WorkFile)=0
        $ = RedirectOutput($OutPutDir+$FileProcessed,1)   ; output to output dir with original file-name
        $Line = ReadLine(2)
        While NOT @ERROR
          If InStr($Line, $SearchData)
            $InsertData ?                            ; output extra (inserted) line
          EndIf
          $Line ?                                    ; output original line
          $Line = ReadLine(2)
        Loop
        $ = Close(2)
        $ = RedirectOutput('')                       ; close output file
        Del $WorkFile
      EndIf
    Next
    ? ? ? "Skrevet " $counter " filer til lagringsmappe " $OutPutDir   ; Write confirmation of written files to output-dir
    ? "Loggfil 'Logfile.txt' ligger i mappen " $LogFile                ; Display the placement of the log-file
    WriteLine(1 ,@CRLF + @CRLF + "Skrevet "+ $counter +" filer til lagringsmappe "+ $OutPutDir + @CRLF)   ; Same info written in the log-file
    Close (1)
    Sleep 15
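The filename rules in the script are easier to check when pulled out on their own. The sketch below restates them in Python, not KiXtart, under the assumption that position 22 in the script's SubStr() calls is the first character of the bare file name (the path 'd:\sosi\Zip\Utpakket\' is 21 characters long). The function name `kommune_number` and the sample file names in the usage note are hypothetical.

```python
def kommune_number(file_name):
    """Derive the number to insert from a bare file name (no path).

    Mirrors the script's rules:
      - names starting with "08_" or "06_" (N250 data): first two chars + "00"
      - names starting with "32" (FKB data): four chars starting at the 4th char
      - anything else (assumed N50 data): the first four chars
    """
    if file_name.startswith(("08_", "06_")):
        return file_name[:2] + "00"
    if file_name.startswith("32"):
        return file_name[3:7]
    return file_name[:4]
```

For example, `kommune_number("08_N250.sos")` gives "0800" and `kommune_number("0412_veg.sos")` gives "0412". Working on the bare name instead of fixed path offsets means the rules keep working if the root folder moves.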
Norwegian? Closest I can come is one word in Swedish! Glad it's working! It's a perfect opportunity to illustrate RUN vs. SHELL, too.
Let's say that some external process delivers 50 Zip files each day to a network share by 1am. You need to expand and process them by 3am. It takes 40 minutes to unzip the files, one by one, using Shell. It then takes 90 minutes to process the files. Your boss is upset because the report comes out at 03:10 instead of before 03:00. You can't speed up the processing, or can you???
Well, this is where RUN can come in handy, creating a form of multi-threaded scripting. You create a script that runs at 1am and enumerates all of the zip files. For each zip file it finds, it runs an unzip command via RUN. It keeps a running count of the files, and, say, every 10 it processes, it sleeps 30 seconds. Now you have 10 CONCURRENT unzip processes running instead of one at a time. You realize that you can now unzip all of the files in under 10 minutes!! Knowing this, you can run the second script that processes the unzipped files, starting at 01:20, and completing at 02:50.
I use this type of multi-threading to fire off 25 Kix scripts at a time. These process logs from 300+ systems in about 12 minutes instead of the 3 hours it took doing them one at a time. In my environment, I actually keep track of the number of active subprocesses, firing off 25, and 5 more each time the count drops to 20 or less. Shawn used this concept recently with some impressive results.
Also, just something to consider - this was pulled directly from your code..
Code:
    $WorkFile = $RootPath+$Workdir+'WorkFile.tmp'
    $OutPutDir = $RootPath + 'OutDir\'
Which line is easier to read? Having an open and clear (and consistent) format will go a long way in making your code more supportable. Just something to consider. Glenn
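A capped number of concurrent jobs can be sketched as below. This is Python, not KiXtart, and it swaps the RUN-plus-Sleep counting described above for a standard-library thread pool, which starts the next job as soon as one of the slots frees up. The name `run_in_batches` and the default of 10 workers are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def run_in_batches(items, worker, max_workers=10):
    """Call worker(item) for every item, with at most max_workers running at once.

    Results come back in the same order as items.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, items))
```

In a real unzip job, `worker` would be a function that launches one extraction and waits for it, so the pool size decides how many extractions overlap.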
Yes, Norway here. Thanks for your input. I see the benefit you can get from multiple processes, but I guess it will complicate any script a lot to use this method. Also, I am sure you handle much bigger systems than I do. In addition, I don't use KiXtart for much more than a rather advanced logon script and some jobs in conjunction with it. What I mean to say is that my scripting is very simple in comparison to yours. But I really appreciate your input; I am always trying to improve my scripting, although I unfortunately haven't got the time I would like for doing this. I know those vars were a bit messy; things went a bit fast, but I am cleaning up bit by bit.
Another question: I am using kix-editor. It has a function to make an exe file of the script, embedding kix32.exe. I have implemented a check on a special var passed in when running the script, like 'RunnKix.exe $var=yes'. When the var has this value, I use it in an If statement that gives the var another value (and writes some stuff to the registry). When running the kix file this works nicely, but when using the generated exe, it doesn't work. Is there a way to fix this, or is this just how it will be, because KiX can't take the var from the command prompt inside the exe? Jon
Jon, I wasn't suggesting that you implement the multi-threading - just thought it was a good opportunity to illustrate the benefits of and differences between RUN and SHELL with a potentially realistic example. I don't use the make-exe capabilities, so I'm not familiar with any limitations it might have. I'm sure someone on the boards here might have some insight into that. Glenn
Glenn, I understood that you didn't mean for me to use multi-threading, but nevertheless I'm curious how you build your scripts to do that. So if you have a script or some examples, it would be interesting to take a peek. Anyway.. thanks to all for the input on this thread. Jon