Curious
(Getting the hang of it)
2008-03-25 03:48 PM
Open all files in dir, find phrase, insert text, write file

Hello.

I have a problem.

We are downloading a large number of zip files from central servers. Each zip file contains several separate files. Each file needs to be modified by inserting text AFTER a special phrase. The phrase may occur several times, and the insertion must be made after every occurrence.

What I want to do is make a script that finds all files under this dir (and possibly its sub-dirs), opens each file, finds the phrase, inserts a new line with data beneath it, and repeats this until EOF. The modified file should then be written back to the same location. This repeats until all files have been processed.

I have been looking through the posts here, but have not found a good place to start.

Help will be much appreciated.

Jon


Mart
(KiX Supporter)
2008-03-25 04:50 PM
Re: Open all files in dir, find phrase, insert text, write file

I don't have something like that at hand right now, but you could use the DirPlus() UDF to enumerate all files and the ReadFile() UDF to read each file one by one. Use the built-in InStr() function to see whether a line contains the text you are looking for, and add your data if it does. Write each line to a temp file using the built-in WriteLine() function, then rename the original file to .old or something similar and rename the temp file to the original filename.
I'm sure that there are other ways but this is one example that would work.

UDF Library » DirPlus() - a recursive dir tool
UDF Library » ReadFile() - Read a file into an array
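
A minimal sketch of the temp-file approach described above, using only built-in functions for a single file. The file name, search phrase, and inserted text are made-up example values, and the sketch assumes no earlier .old or .tmp copy is in the way.

```kixtart
; Sketch only: insert a line after each matching line in one file.
; $File, $Search and $Insert are hypothetical example values.
$File   = 'C:\test\sample.txt'
$Search = 'phrase'
$Insert = 'new line of data'
$Temp   = $File + '.tmp'
$Old    = $File + '.old'

If Open(1, $File) = 0                    ; Open() returns 0 on success
  If Open(2, $Temp, 5) = 0               ; 5 = create the file (1) + open for write (4)
    $Line = ReadLine(1)
    While @ERROR = 0                     ; ReadLine() sets @ERROR at end of file
      $ = WriteLine(2, $Line + @CRLF)    ; WriteLine() adds no line break by itself
      If InStr($Line, $Search)
        $ = WriteLine(2, $Insert + @CRLF)
      EndIf
      $Line = ReadLine(1)
    Loop
    $ = Close(2)
    $ = Close(1)
    Move $File $Old                      ; keep the original as a backup
    Move $Temp $File                     ; the temp file takes the original name
  Else
    $ = Close(1)                         ; could not create the temp file
  EndIf
EndIf
```

The files are only renamed after both handles are closed, so a failure to create the temp file leaves the original untouched.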


NTDOC (Administrator)
(KiX Master)
2008-03-25 06:29 PM
Re: Open all files in dir, find phrase, insert text, write file

Well are you unzipping these files or leaving them as zip files?
That is an important item that I don't see you mention.

If they stay zipped, each archive would need to be unzipped, the file updated, and then zipped again. What about full or relative paths in the zip?


Mart
(KiX Supporter)
2008-03-25 08:24 PM
Re: Open all files in dir, find phrase, insert text, write file

Whoops, missed the zip part. Thanks Doc.
7zip for example works just fine for unzipping and/or zipping files using a script.
http://www.7-zip.org/


Curious
(Getting the hang of it)
2008-03-26 11:06 AM
Re: Open all files in dir, find phrase, insert text, write file

The zip part isn't the most essential part of the script, and I imagine it would complicate the script dramatically if it were implemented as well.

But if anyone has good suggestions on how to implement this in the script, that would be VERY good.

Jon


Mart
(KiX Supporter)
2008-03-26 01:35 PM
Re: Open all files in dir, find phrase, insert text, write file

As suggested you could use 7zip.
If you get the names of all zip files, create a folder named after each one, and unzip its contents there, it would be easy to re-zip the files when the entire script is done.
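
A rough sketch of that idea, assuming all zip files sit in one folder and 7z.exe can be reached at the path shown. Both paths are placeholders, not values from this thread.

```kixtart
; Sketch only: unpack each zip file into a folder named after the zip.
; $ZipDir and $SevenZip are assumed locations.
$ZipDir   = 'C:\zips\'
$SevenZip = 'C:\Tools\7z.exe'

$Name = Dir($ZipDir + '*.zip')
While $Name <> '' And @ERROR = 0
  $Target = $ZipDir + Left($Name, Len($Name) - 4)   ; folder name = zip name without ".zip"
  ; x = extract with paths, -o = output folder (7-Zip creates it), -y = answer yes to prompts
  $Cmd = $SevenZip + ' x "' + $ZipDir + $Name + '" -o"' + $Target + '" -y'
  'Running: ' + $Cmd ?
  Shell $Cmd                                        ; Shell waits for 7z to finish
  $Name = Dir()                                     ; next zip file
Loop
```

Because each archive gets its own folder, files with the same name in different zips cannot overwrite each other.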


Glenn Barnas (Administrator)
(KiX Supporter)
2008-03-26 03:36 PM
Re: Open all files in dir, find phrase, insert text, write file

THINK!
(Ex-IBM)

Dealing with ZIP files can be incredibly complex...
Do the zip files contain paths?
How will you identify those paths?
Do you need to modify the file content IN the zip file? That implies extracting the file, modifying it, and putting it back in the same logical place in the zip file.
Will multiple zip files have identical content? If so, you need to process the zips one by one instead of first extracting them all and then processing the data.

You need to deal with extracting the files, modify the files, put the files back in the zip, and remove the files you extracted without removing any other files. This would be much easier if, for example, you received the zip files and merely needed to extract them for processing. Then you could enumerate all the zip files with a Dir(), extract the files in each one, and then process all the resulting files - removing the Zip files after the extraction was complete. Of course, this assumes that the same file name is not present in multiple zip files.

My point is - it's relatively simple to process files, search for content, and then modify (I'll outline that in a moment), but throwing in zip files requires careful planning and a knowledge of the zip file content and structure.

As for basic search/replace, here's the basic idea...
 Code:
$SearchData = 'xyzzy'           ; insert when line containing this is found
$InsertData = 'AbraCadabra'     ; data line to insert
$RootPath = 'D:\folder...' ; root path where files live
$Files = DirList($RootPath, 6) ; return a list of all files with complete paths
For Each $File in $Files
  Move $File 'WorkFile.tmp' ; rename the file
  If Open(2, 'workfile.tmp')
    $ = RedirectOutput($File) ; output to original file
    $Line = ReadLine(2)
    While Not @ERROR
      If InStr($Line, $SearchData)
        $InsertData ?        ; output extra (inserted) line
      EndIf
      $Line ?                  ; output original line
      $Line = ReadLine(2)
    Loop
    $ = Close(2)
    $ = RedirectOutput('')  ; close output file
    Del 'workfile.tmp'
  EndIf
Next

This is UNTESTED, and presented to illustrate the logic flow. This could easily be modified to REPLACE a line or APPEND a line rather than INSERT a line. A nice mod might be to define a var $MODE, and use a Case statement to Insert, Replace, or Append depending on the value of $MODE.
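
One way the $MODE idea could look, as a sketch. The function name EmitLine and the mode numbers are invented for illustration, and "append" is read here as "put the new line after the matching line".

```kixtart
; Sketch only: $Mode is a hypothetical switch (1 = insert, 2 = replace, 3 = append).
; Output goes to the console, or to a file if RedirectOutput() is active.
Function EmitLine($Line, $SearchData, $NewData, $Mode)
  If InStr($Line, $SearchData) = 0
    $Line ?                      ; no match: pass the line through unchanged
  Else
    Select
      Case $Mode = 1             ; insert the new line before the match
        $NewData ?
        $Line ?
      Case $Mode = 2             ; replace the matching line
        $NewData ?
      Case $Mode = 3             ; append the new line after the match
        $Line ?
        $NewData ?
      Case 1                     ; unknown mode: leave the line alone
        $Line ?
    EndSelect
  EndIf
EndFunction
```

Inside the read loop, the If/EndIf block and the "$Line ?" statement would then be replaced by a single call such as $ = EmitLine($Line, $SearchData, $InsertData, $Mode).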

Glenn


NTDOC (Administrator)
(KiX Master)
2008-03-26 08:10 PM
Re: Open all files in dir, find phrase, insert text, write file

If you know the exact file name and path inside the archive you could extract just that file, update it, then UPDATE the archive with it again on the fly.

But that requires that you know EXACTLY the name and path and it has to always be that. Otherwise as Glenn says it becomes much more complicated and you would need to do file cleanup and maybe even a folder of its own for each user/zip file you deal with - or work on them individually which could be a slow process, but safer as well.

Can it be done - YES - is it simple - NO but it can be done.
However it could take a lot of work and time debugging from someone that does understand KiX so you may need to learn KiX some more so that you can debug it or hope that someone here has the time to work with you on debugging it.


Curious
(Getting the hang of it)
2008-04-01 09:32 AM
Re: Open all files in dir, find phrase, insert text, write file

 Originally Posted By: Glenn Barnas
THINK!
(Ex-IBM)

Dealing with ZIP files can be incredibly complex...
Do the zip files contain paths?
How will you identify those paths?
Do you need to modify the file content IN the zip file? That implies extracting the file, modifying it, and putting it back in the same logical place in the zip file.
Will multiple zip files have identical content? If so, you need to process the zips one by one instead of first extracting them all and then processing the data.




Well guys, as I said, the zip part isn't the most essential thing, so if I get the other part working first, I can implement this as a "bonus". Anyway, with the potential zip solution, relative paths aren't an issue. The zip files to be processed will all be in one dir. A dir named after each zip must be created under this dir, and each zip is then extracted into its own dir.

The files that have been processed later in the script do NOT need to be zipped again afterwards.

I will now start working more intensively on the other part of the script, using (and looking at) some of the solutions you guys have mentioned.

Thanks for all the help so far, and I will MOST certainly be in need of more :)

Jon


Curious
(Getting the hang of it)
2008-04-01 09:57 AM
Re: Open all files in dir, find phrase, insert text, write file

 Originally Posted By: NTDOC
If you know the exact file name and path inside the archive you could extract just that file, update it, then UPDATE the archive with it again on the fly.

But that requires that you know EXACTLY the name and path and it has to always be that. Otherwise as Glenn says it becomes much more complicated and you would need to do file cleanup and maybe even a folder of its own for each user/zip file you deal with - or work on them individually which could be a slow process, but safer as well.

Can it be done - YES - is it simple - NO but it can be done.
However it could take a lot of work and time debugging from someone that does understand KiX so you may need to learn KiX some more so that you can debug it or hope that someone here has the time to work with you on debugging it.


Yes, I reckoned this wouldn't be a walk in the park. Regarding KiX, I have used it in rather simple ways, and therefore thought it would be smart to check with you gurus and be pointed in the right direction instead of fumbling around by myself.

Glenn: thanks for the example, it seems quite straightforward. BUT there's one thing: all the files extracted from one zip need a different text inserted than the files from the next zip. The text to be inserted will be based on the name of the zip file (the first two chars).

I am not sure how complicated I will make this script. Maybe I must accept that organizing the zips in advance, combined with a much simpler script, is one way to do it.


Jon


Glenn Barnas (Administrator)
(KiX Supporter)
2008-04-01 01:05 PM
Re: Open all files in dir, find phrase, insert text, write file

Jon

The logic example I presented could be wrapped up into a UDF. You would pass it the SearchData, NewData, and (with the Select/Case mod) Method arguments.

You'd prepare the directory ahead of time, placing or extracting files to "WorkFolder", and then after modification, "WorkFolder" would be renamed using something like a sequence ID. You could also prepare the folder and pass the folder name to the UDF, but - for me - using a temporary WorkFolder name allows the process to be interrupted and recovered more easily.

Glenn


Curious
(Getting the hang of it)
2008-04-01 04:05 PM
Re: Open all files in dir, find phrase, insert text, write file

 Originally Posted By: Glenn Barnas

As for basic search/replace, here's the basic idea...
 Code:
$SearchData = 'xyzzy'           ; insert when line containing this is found
$InsertData = 'AbraCadabra'     ; data line to insert
$RootPath = 'D:\folder...' ; root path where files live
$Files = DirList($RootPath, 6) ; return a list of all files with complete paths
For Each $File in $Files
  Move $File 'WorkFile.tmp' ; rename the file
  If Open(2, 'workfile.tmp')
    $ = RedirectOutput($File) ; output to original file
    $Line = ReadLine(2)
    While Not @ERROR
      If InStr($Line, $SearchData)
        $InsertData ?        ; output extra (inserted) line
      EndIf
      $Line ?                  ; output original line
      $Line = ReadLine(2)
    Loop
    $ = Close(2)
    $ = RedirectOutput('')  ; close output file
    Del 'workfile.tmp'
  EndIf
Next

This is UNTESTED, and presented to illustrate the logic flow. This could easily be modified to REPLACE a line or APPEND a line rather than INSERT a line. A nice mod might be to define a var $MODE, and use a Case statement to Insert, Replace, or Append depending on the value of $MODE.

Glenn


I started working on this more intensively today, using this code as a starting point. I understand that it also needs the DirList() UDF. The basics work, but one thing is strange: the script does nothing with file #1 except cut its contents and paste them into the beginning of file #2. File #2 is then processed and the insertion is fine, but file #1 is now part of that file. After this, file #1 is deleted. Or rather, that isn't entirely true: the file ends up in the directory the KiX script was started from, renamed to 'workfile.tmp'.

I haven't tried with more than 2 files yet.


Jon


Curious
(Getting the hang of it)
2008-04-01 08:57 PM
Re: Open all files in dir, find phrase, insert text, write file

 Code:
  If Open(2, 'workfile.tmp')=0


Had to add '=0' at the end of that line. That did the trick.. Each file is now being processed. So far so good.
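
The reason is that Open() returns 0 when it succeeds, and an If test treats 0 as false, so the body was skipped exactly when the file had opened correctly. A small sketch that makes the return code visible (the variable name $rc is just an example):

```kixtart
; Sketch only: show the return code of Open() instead of testing it blindly.
$rc = Open(2, 'workfile.tmp')
If $rc = 0
  ; the file is open here, so ReadLine(2) can be used
  $ = Close(2)
Else
  ; a negative code points to a bad handle or file name, a positive one to a system error
  'Open failed with code ' + $rc ?
EndIf
```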

Jon


Curious
(Getting the hang of it)
2008-04-01 11:27 PM
Re: Open all files in dir, find phrase, insert text, write file

 Code:
$SearchData = 'objtype'           ; insert when line containing this is found
$RootPath = 'c:\test\sosi\' ; root path where files live
$Files = DirList($RootPath, 6) ; return a list of all files with complete paths

For Each $File In $Files
  $kommunenr = SubStr ("$file", 14,2)   ; extract "county"-number from filename
  $InsertData = '..KOMMUNE $kommunenr'     ; data line to insert
   Move $File 'WorkFile.tmp' ; rename the file
    If Open(2, 'workfile.tmp')=0
      $ = RedirectOutput($File) ; output to original file
      $Line = ReadLine(2)
      While NOT @ERROR
        If InStr($Line, $SearchData)
          $InsertData ?        ; output extra (inserted) line
        EndIf
      $Line ?                  ; output original line
      $Line = ReadLine(2)
      Loop
      $ = Close(2)
      $ = RedirectOutput('')  ; close output file
      Del 'workfile.tmp'
    EndIf
Next


This is how the script looks so far. Right now the files are extracted to one explicit dir, and the script handles all files there. For each hit on the specific text it inserts a line containing some text and two chars taken from the filename.

Next I will start expanding the script, trying to process files from different directories, and look into the possibility of handling the unzipping as well.

Jon


Glenn Barnas (Administrator)
(KiX Supporter)
2008-04-02 12:32 AM
Re: Open all files in dir, find phrase, insert text, write file

Once you've got the logic working, think about making it a function that can be called. Test again.. then, with the core code working, it's easier to expand the functionality of the overall script.

Everything after this line:
$InsertData = '..KOMMUNE $kommunenr' ; data line to insert
up to the Next can be put in a function and called by
ModFile($File, $SearchData, $InsertData)

Speaking of that line, it should be more like:
$InsertData = '..KOMMUNE ' + $kommunenr ; data line to insert
so you don't put Vars inside Strings (bad.. very bad!) ;)
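
As a sketch, the function could look something like this. It is the same loop as in the script above, untested, with a return value added purely for illustration.

```kixtart
; Sketch only: the loop body wrapped as a UDF.
; Returns 1 if the file was processed, 0 if the work file could not be opened.
Function ModFile($File, $SearchData, $InsertData)
  Dim $Line
  $ModFile = 0
  Move $File 'workfile.tmp'              ; rename the original
  If Open(2, 'workfile.tmp') = 0
    $ = RedirectOutput($File)            ; console output now goes to the original name
    $Line = ReadLine(2)
    While Not @ERROR
      If InStr($Line, $SearchData)
        $InsertData ?                    ; extra (inserted) line
      EndIf
      $Line ?                            ; original line
      $Line = ReadLine(2)
    Loop
    $ = Close(2)
    $ = RedirectOutput('')               ; back to the screen
    Del 'workfile.tmp'
    $ModFile = 1
  EndIf
EndFunction

; Calling it from the main loop:
For Each $File In $Files
  $kommunenr = SubStr($File, 14, 2)
  $ = ModFile($File, $SearchData, '..KOMMUNE ' + $kommunenr)
Next
```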



Just my $0.42...

Glenn


Curious
(Getting the hang of it)
2008-04-03 01:20 PM
Re: Open all files in dir, find phrase, insert text, write file

I will look at the possibility of making a UDF later.

Right now I am concentrating on implementing the zip routine. I will have all the zip files that need to be processed ready in one dir. I will use 7z.exe. What I need to get done: extract all zip files in this dir, and put the extracted files in one location. 7z.exe will be in the $ZipDir. The syntax for 7z will be like this:

 Code:
7z e *.zip *.sos -oOutDir


where *.sos is the file type to be extracted, and OutDir is the output directory.

I was thinking the script will be something like this:

 Code:
$zipDir = 'c:\sosi\zip\'
$zipOutDir = 'Utpakket\'

Run "%COMSPEC% /e:1024 /c $ZipDir'7z.exe e'+$ZipDir+ '*.zip *.sos' + '-o'+$ZipDir+$ZipOutDir"


But something is wrong, and I can't see it right now. Maybe this is not the right way to run this DOS program?

Btw Glenn, I will try to "sharpen up" regarding vars in strings etc.

I looked further into the Run command, and it seems to me that there are problems using vars inside the Run syntax.
The solution (at least so far..) was to define a var containing the syntax for the Run command, like this:

 Code:
$ZipDir = 'c:\sosi\zip\'
$zipOutDir = 'Utpakket'
$RunZip = $ZipDir+'7z.exe e '+$ZipDir+'*.zip *.sos -o'+$ZipDir+$ZipOutDir

Run "%COMSPEC% /e:1024 /c $RunZip"


Is this the best way to do this??


Jon


Glenn Barnas (Administrator)
(KiX Supporter)
2008-04-03 03:59 PM
Re: Open all files in dir, find phrase, insert text, write file

A - try Shell, not Run. Shell will cause Kix to wait for the files to unzip, while Run will not.

B - waiting to turn your small working code into a UDF will actually complicate your process.. when you have a UDF that does what you need, you isolate that from the new code you develop around it.

I STRONGLY recommend the following format for any Run/Shell commands:
 Code:
$Cmd = '%COMSPEC% /c'
$Cmd = $Cmd + ' mycommand.exe /a:' + $ArgInVar
$Cmd = $Cmd + ' /switchA /switchB'
$Cmd = $Cmd + ' >' + $OutFile
; following line for debugging
'About to run the following command: ' ? $Cmd ?
Shell $Cmd

This allows you to build the command step by step, display it before it runs so you can see if it looks right. You can even copy the screen output and paste it into the command prompt to run it.

Glenn


Les
(KiX Master)
2008-04-03 05:06 PM
Re: Open all files in dir, find phrase, insert text, write file

You may want WrapAtEOL on for that.
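
For reference, that option is switched on with SetOption(). With it on, a long command line shown by the debugging statement wraps to the next line of the console instead of running off the edge, which matters if you want to copy the full command from the screen.

```kixtart
; Wrap console output at the end of the line (the option is off by default)
$ = SetOption('WrapAtEOL', 'On')
```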

Curious
(Getting the hang of it)
2008-04-04 05:03 PM
Re: Open all files in dir, find phrase, insert text, write file

I was struggling a bit with the Run part of the script, and was working on another way to fix the zip extraction, when I looked in here and saw your post, Glenn. I was working on something in that direction, but thanks (once again..) for your time and help.

Now my script is doing almost everything I want it to. I am sure the code can be written much better, but anyhow, it does what I need it to. I will look into making UDFs of some parts of the script and see what I can make of it.

Here's the script for now. Right now it is running and has finished unpacking 200 zip files, giving a bit under a thousand files (about 10 gigs in total), which are now being rewritten with the correct line inserted. Works like a charm.. not far from it anyway.
Some text is written in Norwegian.


 Code:

; Routine to search given directory for textfiles
; Uses DirList()Udf to search through Directory.
; Scans files in a specific directory and searches for specific text
; When found - Dependent on file-naming, Insert line with different text ($InsertData) from var $Kommunenr
; every time text is found (numerous times in one single file).

; Version 1.0 - 20080402

; Necessary File-structure: 
	; d:\Sosi - Root-Dir
	; d:\Sosi\Zip - Zip-files is placed here
	; Rest of needed dirs is created in the script

;Dependencies: DirList() UDF - Embedded in script
	;7z.exe must reside in zip-dir


;Defined Vars
$ZipDir = 'd:\sosi\zip\'
$7z = $ZipDir + '7z.exe'
$zipOutDir = $ZipDir +'Utpakket\'
$ZipSyntax = $7z + ' e ' + $ZipDir  +'*.zip *.sos -y -o'+$ZipOutDir
$SearchData = 'objtype'           ; insert when line containing this is found
$RootPath = 'd:\sosi\' ; root path where files live
$Original = 'Zip\Utpakket\'
$Files = DirList($RootPath+$Original, 2) ; return a list of all files with complete paths
$LogDir = 'Log\'  ; Log-dir
$LogFile = $RootPath+$logdir+'logfile.txt' ; Logfile-placement 
$counter = 0  ; Counter for printing # of files on screen
$WorkDir = 'WorkDir\'
$WorkFile = $RootPath+$Workdir+'WorkFile.tmp'
$OutPutDir = $RootPath + 'OutDir\'


;Create necessary directories.. if already existing, empty these ones.
If NOT Exist ($ZipOutDir) MD ($ZipOutDir)
  Else Del $ZipOutDir  ; $ZipOutDir already holds the full path, so $ZipDir is not added again
EndIf

If Exist ($logfile) Del ($logfile)  ; Check if log-file exists, if it does, delete it
  Else MD ($RootPath+$logdir)  ; If log-dir doesn't exist, create it
EndIf

If NOT Exist ($RootPath+$WorkDir) MD ($RootPath+$WorkDir)
EndIf

If NOT Exist ($OutPutDir) MD ($OutPutDir)
  Else Del $OutPutDir
EndIf


;Running zip-routine:

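;7zip command for unpacking given file types to one fixed location:
; '7z e zipfil.zip *.sos -od:\sosi\zip\utpakket' (or just -outpakket if you are in the dir above)


$Cmd = '%COMSPEC% /c '
$Cmd = $Cmd + $ZipSyntax

Shell $Cmd

$Files = DirList($RootPath+$Original, 2) ; list the files again, now that the zip-files have been unpacked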

?
? "Zip-filer pakket ut og skrevet til fil.."  ; Confirmation after zip-extraction
Sleep 10


;Checking each file for given text, starts here..
CLS

$ = Open(1, $logfile, 5)  ; Open the log-file once, before the loop (5 = create + write)

For Each $File In $Files

  $counter = $counter+1  ; Counter to use on screen and in logfile.
  $FileProcessed = SubStr ($File,22,30)  ;Extract filename from $file
  ? "Skriver Fil# " $counter " " $fileprocessed " til fil.."  ; Display on screen file-progress.
  $ = WriteLine(1 , "Skriver Fil# "+ $counter + " "+ $fileprocessed + " til fil.." + @CRLF)  ; Same line written to the log-file
  
  ;Check if file is N50-data
  $kommunenr = SubStr ($File, 22,4)   ; File is probably n50
  
  ; Check if file is Fkb-data
  If SubStr($File,22,2)='32' $kommunenr = SubStr($File,25,4)
  EndIf
   
  ;Check if file is N250-data (name starts with '08_' or '06_')
  If SubStr($File,22,3)='08_' OR SubStr($File,22,3)='06_' 
  	$kommunenr = SubStr($File,22,2)+'00'
  EndIf  	
    	
  $InsertData = '..KOMM '+ $kommunenr  ;$kommunenr will decide which text to put in written out-file 
  
   Copy $File $WorkFile ; copy the file to the work file (the original stays in place)

    If Open(2, $WorkFile)=0
      
      $ = RedirectOutput($OutPutDir+$FileProcessed,1) ; output to output dir with original file-name
      $Line = ReadLine(2)
      
      While NOT @ERROR
        If InStr($Line, $SearchData)
          $InsertData ?        ; output extra (inserted) line
        EndIf
      $Line ?                  ; output original line
      $Line = ReadLine(2)
      Loop
    
      $ = Close(2)
      $ = RedirectOutput('')  ; close output file
      Del $WorkFile
    EndIf
Next
?
?
? "Skrevet " $counter " filer til lagringsmappe " $OutPutDir ; Write confirmation of written files to output-dir

? "Loggfil 'Logfile.txt' ligger i mappen " $LogFile  ; Display the placement of the log-file
$ = WriteLine(1 ,@CRLF + @CRLF + "Skrevet "+ $counter +" filer til lagringsmappe "+ $OutPutDir + @CRLF)  ; Same info written in the log-file
$ = Close (1)
Sleep 15



Glenn Barnas (Administrator)
(KiX Supporter)
2008-04-04 08:17 PM
Re: Open all files in dir, find phrase, insert text, write file

Norwegian? Closest I can come is one word in Swedish! ;)

Glad it's working! It's a perfect opportunity to illustrate RUN vs. SHELL, too.

Let's say that some external process delivers 50 Zip files each day to a network share by 1am. You need to expand and process them by 3am. It takes 40 minutes to unzip the files, one by one, using Shell. It then takes 90 minutes to process the files. Your boss is upset because the report comes out at 03:10 instead of before 03:00. You can't speed up the processing, or can you???

Well, this is where RUN can come in handy, creating a form of multi-threaded scripting. You create a script that runs at 1am and enumerates all of the zip files. For each zip file it finds, it runs an unzip command via RUN. It keeps a running count of the files, and, say, every 10 it processes, it sleeps 30 seconds. Now you have 10 CONCURRENT unzip processes running instead of one at a time. You realize that you can now unzip all of the files in under 10 minutes!! Knowing this, you can run the second script that processes the unzipped files, starting at 01:20, and completing at 02:50.

I use this type of multi-threading to fire off 25 Kix scripts at a time. These process logs from 300+ systems in about 12 minutes instead of the 3 hours it took doing them one at a time. In my environment, I actually keep track of the number of active subprocesses, firing off 25, and 5 more each time the count drops to 20 or less. Shawn used this concept recently with some impressive results.
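
A stripped-down sketch of the simple version (start a batch, then pause) might look like this. It borrows the folder layout and 7z switches from Jon's script, uses the batch size and pause from the example above, and assumes zip names without spaces. It is not the subprocess-counting version I use in production.

```kixtart
; Sketch only: start unzip jobs with Run so that several work at once.
$ZipDir  = 'd:\sosi\zip\'
$OutDir  = $ZipDir + 'Utpakket\'
$InBatch = 0

$Name = Dir($ZipDir + '*.zip')
While $Name <> '' And @ERROR = 0
  $Cmd = $ZipDir + '7z.exe e ' + $ZipDir + $Name + ' *.sos -y -o' + $OutDir
  Run $Cmd                      ; Run returns at once; Shell would wait for 7z
  $InBatch = $InBatch + 1
  If $InBatch = 10
    Sleep 30                    ; give the current batch time to make progress
    $InBatch = 0
  EndIf
  $Name = Dir()                 ; next zip file
Loop
```

The script ends while the last unzip jobs may still be running, so whatever processes the extracted files has to start late enough, or check that no 7z process is left.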

Also, just something to consider - this was pulled directly from your code..
 Code:
$WorkFile = $RootPath+$Workdir+'WorkFile.tmp'
$OutPutDir = $RootPath + 'OutDir\'

Which line is easier to read? :)
Having an open and clear (and consistent) format will go a long way in making your code more supportable. Just something to consider.

Glenn


Curious
(Getting the hang of it)
2008-04-05 12:11 AM
Re: Open all files in dir, find phrase, insert text, write file

Yes, Norway here :)

Thanks for your input. I see the benefit you can get from multiple processes, but I guess using this method will complicate any script a lot.

Also, I am sure you handle much bigger systems than I do. In addition, I don't use KiXtart for much more than a rather advanced logon script and some jobs in conjunction with it. What I mean to say is that my scripting is very simple in comparison to yours.

But I really appreciate your input. I am always trying to improve my scripting, although I unfortunately haven't got as much time for it as I would like.

I know those vars were a bit messy. Things went a bit fast, but I am cleaning up bit by bit.

Another question: I am using kix-editor. It has a function for making an exe file of the script, embedding kix32.exe. My script checks a variable passed on the command line when it is started, like 'RunnKix.exe $var=yes'. When the variable has this value, an If statement in the script takes another branch (writing some stuff to the registry). This works nicely when running the .kix file, but not when using the generated exe. Is there a way to fix this, or is that just how it is, because KiX can't take the variable from the command line inside the exe?

Jon


Glenn Barnas (Administrator)
(KiX Supporter)
2008-04-05 01:45 AM
Re: Open all files in dir, find phrase, insert text, write file

Jon

I wasn't suggesting that you implement the multi-threading - just thought it was a good opportunity to illustrate the benefits of and differences between RUN and SHELL with a potentially realistic example. :)

I don't use the make-exe capabilities, so I'm not familiar with any limitations it might have. I'm sure someone on the boards here might have some insight into that.

Glenn


Curious
(Getting the hang of it)
2008-04-05 04:21 PM
Re: Open all files in dir, find phrase, insert text, write file

Glenn

I understood that you didn't mean for me to use multi-threading, but nevertheless I'm curious how you build your scripts for that. So if you have a script or some examples, it would be interesting to take a peek.

Anyway.. thanks to all for the input on this thread.

Jon