split on page break - KiXtart.org

#200385 - 2010-10-26 11:05 PM split on page break
booey

Getting the hang of it

Registered: 2005-07-25
Posts: 76
Loc: USA

I have a large text file with page breaks in it. I'd like to break the larger document into separate text files, one per page. Is this possible using the Split() function or some other way? Thanks.

#200386 - 2010-10-26 11:07 PM Re: split on page break [Re: booey]
Allen

KiX Supporter

Registered: 2003-04-19
Posts: 4562
Loc: USA

Should be possible. Can you provide a sample of what the page break looks like?

#200387 - 2010-10-26 11:15 PM Re: split on page break [Re: Allen]
booey

Getting the hang of it

Registered: 2005-07-25
Posts: 76
Loc: USA

I attached a sample of the file. I can see the page breaks in TextPad, but not in Notepad. Thanks.

Attachment: Document2.txt

#200388 - 2010-10-27 01:00 AM

Re: split on page break [Re: booey]

Allen

KiX Supporter

Registered: 2003-04-19
Posts: 4562
Loc: USA

This was a fun puzzle... Your page break was a chr(12) character (form feed).

Requires the LoadFile() UDF:
http://www.kixtart.org/forums/ubbthreads.php?ubb=showflat&Number=165959

This is a little dirty and could probably be optimized, but I think it works. It creates one file per page in the same directory as the original, named after the original with a _##### suffix before the extension (for example, 222.txt becomes 222_00001.txt, 222_00002.txt, ...). See what you get.

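break off

$filename="d:\temp\222.txt"

$RC=pagebreak($filename)

; Write each page of $filename to its own file, breaking on form feed (Chr(12)).
; Requires the LoadFile() UDF. Assumes a 3-character extension such as ".txt".
function pagebreak($filename)
  dim $count,$line,$ffh,$rc,$folder,$ext,$file
  if exist($filename)
    $folder=left($filename,instrrev($filename,"\"))               ; path up to and including the last "\"
    $ext=right($filename,4)                                        ; e.g. ".txt"
    $file=left(right($filename,(len($filename)-len($folder))),-4)  ; base name without the extension
    $count=1
    $ffh=freefilehandle()
    for each $line in loadfile($filename,@CRLF)
      if asc(left($line,1))=12
        ; a form feed starts a new page; keep any text that follows it on the same line
        $count=$count+1
        $line=substr($line,2)
      endif
      ; mode 5 = create if missing + write; WriteLine appends to the current page file
      if open($ffh,$folder + $file + "_" + right("0000" + $count,5) + $ext,5)=0
        $rc=writeline($ffh,$line + @CRLF)
        $rc=close($ffh)
      endif
    next
  else
    exit 2   ; source file not found
  endif
endfunction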


#200390 - 2010-10-27 09:28 AM Re: split on page break [Re: Allen]
Richard H.

Administrator

Registered: 2000-01-24
Posts: 4946
Loc: Leatherhead, Surrey, UK

Out of interest, why not pass the form feed character (Chr(12)) to the LoadFile() UDF as the delimiter? That way you'll get back an array of pages.

#200394 - 2010-10-27 02:48 PM Re: split on page break [Re: Richard H.]
Allen

KiX Supporter

Registered: 2003-04-19
Posts: 4562
Loc: USA

Good point; there's no reason that shouldn't work. Like I said, though, it could use some work / TLC (last night I was rushing to get it done before leaving for dinner).

#200396 - 2010-10-27 03:13 PM

Re: split on page break [Re: Allen]

Allen

KiX Supporter

Registered: 2003-04-19
Posts: 4562
Loc: USA

Here's Richard's suggestion in action. Still needs more error checking, etc.

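; Write each page of $filename to its own file, using form feed (Chr(12)) as
; the page delimiter. Requires the LoadFile() UDF.
function pagebreak($filename)
  dim $i,$ffh,$rc,$folder,$ext,$file,$pages,$outfile
  if exist($filename)
    $folder=left($filename,instrrev($filename,"\"))               ; path up to and including the last "\"
    $ext=right($filename,4)                                        ; assumes a 3-character extension, e.g. ".txt"
    $file=left(right($filename,(len($filename)-len($folder))),-4)  ; base name without the extension
    $ffh=freefilehandle()
    $pages=loadfile($filename,chr(12))                             ; one array element per page
    for $i= 0 to ubound($pages)
      $outfile=$folder + $file + "_" + right("0000" + ($i+1),5) + $ext
      if exist($outfile)
        del $outfile                                               ; WriteLine appends, so clear output from a previous run
      endif
      if open($ffh,$outfile,5)=0                                   ; mode 5 = create if missing + write
        $rc=writeline($ffh,$pages[$i])
        $rc=close($ffh)
      endif
    next
  else
    exit 2                                                         ; source file not found
  endif
endfunction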
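This version has no example call. A usage sketch, assuming the function above is in the same script and reusing the placeholder path from the first version. Because pagebreak() ends with exit 2 when the source file is missing, the caller can test @ERROR afterwards:

Code:

Break ON

; placeholder path, as in the first version
$RC = pagebreak("d:\temp\222.txt")
If @ERROR = 2
  "Source file not found" ?
EndIf
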


#200398 - 2010-10-27 03:53 PM Re: split on page break [Re: Allen]
booey

Getting the hang of it

Registered: 2005-07-25
Posts: 76
Loc: USA

You guys are amazing. That works great. By the way, how were you able to determine that page break was a chr(12)?

#200400 - 2010-10-27 04:52 PM Re: split on page break [Re: booey]
Richard H.

Administrator

Registered: 2000-01-24
Posts: 4946
Loc: Leatherhead, Surrey, UK

Originally Posted By: Young 'un
By the way, how were you able to determine that page break was a chr(12)?

Us old timers know the ASCII control set forwards and backwards.

CHR(12) is a standard character used in device control, primarily by printers, teletypes and old-style character terminals. It is the "form feed" character, which in the days of continuous fan-fold paper meant "advance to the top of the next page". For modern page printers (LaserJet) it means "eject the current page", and for character terminals it means "clear the screen".

Control characters do other things too: carriage return, horizontal and vertical tabs, bell (or beep), and the good old "introduce a non-standard sequence" escape character.

For more information (though Lord knows why you'd want it), see the Wikipedia page: http://en.wikipedia.org/wiki/ASCII
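For reference, the control characters listed above can be given readable names in a script. A minimal sketch: the variable names are illustrative only, the codes are the standard ASCII values, and KiXtart's built-in @CRLF macro already supplies the carriage return / line feed pair:

Code:

; Illustrative names for common ASCII control characters
$BEL = Chr(7)    ; bell (beep)
$HT  = Chr(9)    ; horizontal tab
$LF  = Chr(10)   ; line feed
$VT  = Chr(11)   ; vertical tab
$FF  = Chr(12)   ; form feed (page break)
$CR  = Chr(13)   ; carriage return
$ESC = Chr(27)   ; escape

; Example: print two tab-separated columns followed by CR + LF
"Column1" + $HT + "Column2" + $CR + $LF
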

#200401 - 2010-10-27 05:27 PM Re: split on page break [Re: Richard H.]
booey

Getting the hang of it

Registered: 2005-07-25
Posts: 76
Loc: USA

Richard, thanks for the good background and information about ASCII.

Now that I'm able to split the document by page break, I have a new dilemma. I need to search the file for some text, shown as "Findme" in the subset of the file below. The codes (4012F, 98966) under FindMe may or may not exist, and they'll always be different. I need to determine whether they exist.

My question is: after I find the "FindMe" text in the file, how do I have the script check two lines below it to see if any text exists? Thanks.

Start subset of file

================================================================================
Example Code Description Mod1 Mod2 Mod3 Level
--------------------------------------------------------------------------------
99441 Description here 1 1
================================================================================
Findme Description Mod1 Mod2 Mod3 Level
--------------------------------------------------------------------------------
4012F Description1 1 1
98966 Description2 1 1

#200402 - 2010-10-27 06:14 PM

Re: split on page break [Re: booey]

Richard H.

Administrator

Registered: 2000-01-24
Posts: 4946
Loc: Leatherhead, Surrey, UK

The following script will search your original file and output the information that you are looking for.

This assumes that lines are terminated by carriage-return / line-feed pairs.

It requires the LoadFile() UDF as before.

Code:

Break ON

$=SetOption("Explicit","ON")

Dim $sSource,$asPages,$iPageCount,$sSentinelText,$iIndex,$asLines

$sSource=".\findme.txt"
$sSentinelText="Findme"

; One array element per page (pages are separated by form feeds)
$asPages=LoadFile($sSource,Chr(12))

For $iPageCount=0 to UBound($asPages)
	"Working on page # "+(1+$iPageCount)+@CRLF
	$iIndex=InStr($asPages[$iPageCount],$sSentinelText)
	If Not $iIndex "   String '"+$sSentinelText+"' not found on page"+@CRLF EndIf
	; Loop in case the sentinel text appears more than once on a page
	While $iIndex
		; Discard everything up to and including the sentinel text
		$asPages[$iPageCount]=SubStr($asPages[$iPageCount],$iIndex+Len($sSentinelText))
		; Pad with CR/LFs so elements [2] and [3] always exist.
		; [0] = rest of the sentinel line, [1] = dashed rule, [2] and [3] = candidate code lines
		$asLines=Split($asPages[$iPageCount]+@CRLF+@CRLF+@CRLF,@CRLF)
		If (""+$asLines[2]+$asLines[3])=""
			"   String '"+$sSentinelText+"' found on page, but no data present"+@CRLF
		Else
			; The code is the first space-delimited word on each line
			"   First code : "+Split($asLines[2])[0]+@CRLF
			"   Second code: "+Split($asLines[3])[0]+@CRLF
		EndIf
		$iIndex=InStr($asPages[$iPageCount],$sSentinelText)
	Loop
Next
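The script above assumes CR/LF line endings. If a report might arrive with bare line feeds (Unix-style) or bare carriage returns instead, one option is to normalize each page before searching it. A minimal sketch; NormalizeCRLF() is a hypothetical helper, not part of LoadFile() or KiXtart itself:

Code:

; Hypothetical helper: convert CR/LF, bare LF or bare CR line endings to CR/LF
Function NormalizeCRLF($sText)
  $sText = Join(Split($sText, @CRLF), Chr(10))          ; CR/LF   -> LF
  $sText = Join(Split($sText, Chr(13)), Chr(10))        ; bare CR -> LF
  $NormalizeCRLF = Join(Split($sText, Chr(10)), @CRLF)  ; LF      -> CR/LF
EndFunction

In the loop above it would be called once per page, before the InStr() search, e.g. $asPages[$iPageCount]=NormalizeCRLF($asPages[$iPageCount]). Collapsing everything to LF first avoids turning an existing CR/LF into CR/CR/LF.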


#200403 - 2010-10-27 09:15 PM Re: split on page break [Re: Richard H.]
booey

Getting the hang of it

Registered: 2005-07-25
Posts: 76
Loc: USA

Richard, thank you very much. I'm envious of your and others' KiXtart skills. I should be able to take it from here. Thanks to all who helped me with this.

#200404 - 2010-10-27 09:45 PM Re: split on page break [Re: booey]
Allen

KiX Supporter

Registered: 2003-04-19
Posts: 4562
Loc: USA

Quote:
By the way, how were you able to determine that page break was a chr(12)?

I didn't know for certain what the page break code was, but I confirmed it using "type" in the cmd shell. At the command prompt I typed: type yourtextfile.txt, and it displayed the contents along with the page break symbols. Then I wrote a little code that checked the first character of each line and displayed its ASCII code with asc($letter). This printed 12 for the page break.
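The check described above can be sketched as a standalone script. This version reads the file with the built-in Open()/ReadLine() functions instead of LoadFile(), so no UDF is needed, and it reports only lines whose first character is a control code (below 32). The path c:\temp\sample.txt is a placeholder:

Code:

Break ON

Dim $ffh, $line, $n, $code

$ffh = FreeFileHandle()
If Open($ffh, "c:\temp\sample.txt", 2) = 0      ; placeholder path; mode 2 = open for read
  $n = 0
  $line = ReadLine($ffh)
  While @ERROR = 0                              ; ReadLine sets @ERROR at end of file
    $n = $n + 1
    If Len($line) > 0
      $code = Asc(Left($line, 1))
      If $code < 32                             ; ASCII control character, e.g. 12 = form feed
        "Line " + $n + " starts with Chr(" + $code + ")" ?
      EndIf
    EndIf
    $line = ReadLine($ffh)
  Loop
  $ = Close($ffh)
EndIf
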

#200408 - 2010-10-28 02:16 AM

Re: split on page break [Re: Richard H.]

Glenn Barnas

KiX Supporter

Registered: 2003-01-28
Posts: 4401
Loc: New Jersey

Well, personally, I think an ASCII primer should be required reading in grade school. ;)

Despite Unicode and MultiByte and even the occasional UniCycle ;), basic scripting and page formatting still rely on control codes, and a basic understanding of them is important.

I regularly use US, FS, SOT, EOT and other "separator" characters in my scripts. Chr(31) (US, the unit separator) is a valid ASCII code that works well (and somewhat officially) as a delimiter in Split(), Join(), and even message strings when I need a delimiter and can't use a printable character.

In fact, I regularly transmit arrays of arrays via socket communications in Kix where the outer array (record) is delimited with Chr(31) and the inner (field) array is delimited with Chr(30). Works exceptionally well and doesn't interfere with the payload.
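A minimal sketch of the record/field scheme described above, packing an array of arrays into one string and unpacking it again, with records separated by Chr(31) and fields by Chr(30). Socket transport is left out, and the sample data and variable names are hypothetical. It assumes no field ever contains Chr(30) or Chr(31), since those would be mistaken for delimiters:

Code:

Break ON

; Hypothetical sample data: two records of three fields each
Dim $aRecords[1]
$aRecords[0] = Split("alice,admin,42", ",")
$aRecords[1] = Split("bob,user,7", ",")

; Pack: fields joined with Chr(30), records joined with Chr(31)
$sPayload = ""
For $i = 0 To UBound($aRecords)
  If $i > 0
    $sPayload = $sPayload + Chr(31)
  EndIf
  $sPayload = $sPayload + Join($aRecords[$i], Chr(30))
Next

; Unpack on the receiving side
For Each $sRow In Split($sPayload, Chr(31))
  $aFields = Split($sRow, Chr(30))
  "Name: " + $aFields[0] + "  Role: " + $aFields[1] + "  Id: " + $aFields[2] ?
Next
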

Just to throw an alternative solution out there, here's a simple script that accomplishes both tasks - break a file into separate page files, and locate a string on each page and determine if the line(s) that follow contain data:

Code:

; This script relies on the FileIO function
Call '%KIXLIBPATH%\FileIO.kxf'

$FPath = 'c:\temp\'					; location of file(s)
$File = 'test.txt'					; name of file

$aData = FileIO($FPath + $File, 'R')			; read the original file
$aData = Join($aData, @CRLF)				; Combine into a single string
$aData = Split($aData, Chr(12))				; break on FormFeed chars

; This block will break the original file into separate files per page
; This solves the original request, but is not needed to continue 
; to the next section which checks for data after a specific string
; on each page
For $Page = 0 to UBound($aData)				; enumerate pages
  $SubFile = Right('0000' + $Page, 4) + '_' + $File	; create page filename "0000_filename.txt"
  $aSubData = Split($aData[$Page],@CRLF)		; create array of lines for current page
  $ = FileIO($FPath + $SubFile, 'W', $aSubData)		; write the sub-file
Next


; at this point, we have an array of pages as a simple string
; Using a similar logic block, we can search each page for a string and then check the next two lines

For $Page = 0 to UBound($aData)				; enumerate pages
  $aSubData = Split($aData[$Page],@CRLF)		; create array of lines for current page
  $SearchStart = AScan($aSubData, 'findme', 0, , 1)	; locate the line with the search phrase, starting at line 0
  If $SearchStart >= 0					; was it found? AScan() returns -1 if not
    ; the search phrase was found in the current page on the line represented by $SearchStart
    ; Check the next two lines for data, but only if at least 2 more lines are present on the page
    ; (in the sample posted above, the line right after Findme is a dashed rule)
    If UBound($aSubData) >= $SearchStart + 2
      If $aSubData[$SearchStart + 1] 			; FindMe + 1 has data
        ; do something, such as
        'Page ' ($Page + 1) ' - search line 1 contains ' $aSubData[$SearchStart + 1] ?
      EndIf
      If $aSubData[$SearchStart + 2] 			; FindMe + 2 has data
        ; do something, such as
        'Page ' ($Page + 1) ' - search line 2 contains ' $aSubData[$SearchStart + 2] ?
      EndIf
    EndIf
  EndIf
Next

Glenn

_________________________
Actually I am a Rocket Scientist! :D
