Regain 2.1.0-STABLE API

net.sf.regain.crawler.preparator
Class JacobMsExcelPreparator

java.lang.Object
  extended by net.sf.regain.crawler.document.AbstractPreparator
      extended by net.sf.regain.crawler.preparator.AbstractJacobMsOfficePreparator
          extended by net.sf.regain.crawler.preparator.JacobMsExcelPreparator
All Implemented Interfaces:
Pluggable, Preparator, WriteablePreparator

public class JacobMsExcelPreparator
extends AbstractJacobMsOfficePreparator

Prepares a Microsoft Excel document for indexing using the Jacob API; Jacobgen was used to simplify access.

The raw data of the document is stripped of formatting information, and the title is extracted.

Author:
Til Schneider, www.murfman.de, Reinhard Balling

Field Summary
private  de.filiadata.lucene.spider.generated.msoffice2000.excel.Application mExcelApplication
          The Excel application.
 
Fields inherited from interface net.sf.regain.crawler.document.Preparator
DEFAULT_BUFFER_SIZE
 
Constructor Summary
JacobMsExcelPreparator()
          Creates a new instance of JacobMsExcelPreparator.
 
Method Summary
 void close()
          Frees all resources reserved by the preparator.
 de.filiadata.lucene.spider.generated.msoffice2000.excel.Range getCells(de.filiadata.lucene.spider.generated.msoffice2000.excel.Worksheet sheet, int row, int col)
          Wrapper for calling the ActiveX method with input parameters.
 void init(PreparatorConfig config)
          Initializes the preparator.
 void prepare(RawDocument rawDocument)
          Prepares a document for indexing.
 
Methods inherited from class net.sf.regain.crawler.preparator.AbstractJacobMsOfficePreparator
readProperties
 
Methods inherited from class net.sf.regain.crawler.document.AbstractPreparator
accepts, addAdditionalField, cleanUp, concatenateStringParts, getAdditionalFields, getCleanedContent, getCleanedMetaData, getHeadlines, getPath, getPriority, getSummary, getTitle, setCleanedContent, setCleanedMetaData, setHeadlines, setPath, setPriority, setSummary, setTitle, setUrlRegex
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

mExcelApplication

private de.filiadata.lucene.spider.generated.msoffice2000.excel.Application mExcelApplication
The Excel application. Is null as long as no document has been processed yet.
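
The field description implies that the Excel application is created lazily, on first use. The following is a minimal sketch of that pattern, not the class's actual code: `FakeApplication`, `getApplication`, and `isStarted` are hypothetical names, and the stand-in type replaces the Jacobgen-generated `Application` class so the example runs without COM.

```java
/** Hypothetical sketch of a lazily created application handle. */
public class LazyHandleSketch {

    /** Stand-in for the generated Excel Application class (assumption). */
    public static class FakeApplication {
        public static int created = 0;

        public FakeApplication() {
            created++; // count how often the expensive object is built
        }
    }

    // Stays null until the first document is processed.
    private FakeApplication mApplication;

    /** Creates the handle on first call and reuses it afterwards. */
    public FakeApplication getApplication() {
        if (mApplication == null) {
            mApplication = new FakeApplication();
        }
        return mApplication;
    }

    /** Reports whether the handle has been created yet. */
    public boolean isStarted() {
        return mApplication != null;
    }

    /** Drops the handle; real code would also shut the application down. */
    public void close() {
        mApplication = null;
    }

    public static void main(String[] args) {
        LazyHandleSketch preparator = new LazyHandleSketch();
        System.out.println("started before first use: " + preparator.isStarted());
        preparator.getApplication();
        preparator.getApplication(); // second call reuses the same instance
        System.out.println("started after first use: " + preparator.isStarted());
    }
}
```

Deferring creation this way means a crawl that never meets an Excel file never pays the cost of starting the application.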

Constructor Detail

JacobMsExcelPreparator

public JacobMsExcelPreparator()
                       throws RegainException
Creates a new instance of JacobMsExcelPreparator.

Throws:
RegainException - If creating the preparator failed.

Method Detail

init

public void init(PreparatorConfig config)
          throws RegainException
Initializes the preparator.

Specified by:
init in interface Pluggable
Overrides:
init in class AbstractJacobMsOfficePreparator
Parameters:
config - The configuration
Throws:
RegainException - If the configuration has an error.

getCells

public de.filiadata.lucene.spider.generated.msoffice2000.excel.Range getCells(de.filiadata.lucene.spider.generated.msoffice2000.excel.Worksheet sheet,
                                                                              int row,
                                                                              int col)
Wrapper for calling the ActiveX method with input parameters.

Parameters:
sheet - the worksheet on which the method is called
row - an input parameter of type int
col - an input parameter of type int
Returns:
the result, of type Range
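
As an illustration of how a caller might use a row/column accessor of this kind, the sketch below walks a sheet cell by cell and joins the non-empty text. It does not use the generated `Worksheet` or `Range` classes: `SheetStandIn`, `cellText`, `collectText`, and `fromArray` are hypothetical names. Excel's COM object model addresses cells with 1-based indices; that `getCells` passes `row` and `col` through unchanged is an assumption.

```java
/** Hypothetical sketch: collecting text through a (row, col) cell accessor. */
public class CellWalkSketch {

    /** Stand-in for a worksheet; not the Jacobgen-generated Worksheet class. */
    public interface SheetStandIn {
        int rowCount();

        int colCount();

        /** Returns the text at the 1-based (row, col) position, or null if empty. */
        String cellText(int row, int col);
    }

    /** Joins all non-empty cell texts, row by row, separated by single spaces. */
    public static String collectText(SheetStandIn sheet) {
        StringBuilder content = new StringBuilder();
        for (int row = 1; row <= sheet.rowCount(); row++) {
            for (int col = 1; col <= sheet.colCount(); col++) {
                String text = sheet.cellText(row, col);
                if (text != null && !text.isEmpty()) {
                    if (content.length() > 0) {
                        content.append(' ');
                    }
                    content.append(text);
                }
            }
        }
        return content.toString();
    }

    /** Wraps a rectangular array as a sheet, translating 1-based to 0-based indices. */
    public static SheetStandIn fromArray(final String[][] data) {
        return new SheetStandIn() {
            public int rowCount() {
                return data.length;
            }

            public int colCount() {
                return data.length == 0 ? 0 : data[0].length;
            }

            public String cellText(int row, int col) {
                return data[row - 1][col - 1];
            }
        };
    }

    public static void main(String[] args) {
        String[][] data = {{"Name", "Qty"}, {"Bolt", null}, {"", "7"}};
        System.out.println(collectText(fromArray(data)));
    }
}
```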

prepare

public void prepare(RawDocument rawDocument)
             throws RegainException
Prepares a document for indexing.

Parameters:
rawDocument - The document to prepare.
Throws:
RegainException - If the preparation failed.

close

public void close()
           throws RegainException
Frees all resources reserved by the preparator.

Is called at the end of the crawler process after all documents were processed.

Specified by:
close in interface Preparator
Overrides:
close in class AbstractPreparator
Throws:
RegainException - If freeing the resources failed.
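
The method details above suggest a lifecycle of one `init` call, one `prepare` call per document, and a final `close` once all documents are processed. The sketch below illustrates that order with stand-in types; it is not the Regain crawler's code. `PreparatorStandIn`, `RecordingPreparator`, and `crawl` are hypothetical, strings replace `PreparatorConfig` and `RawDocument`, and an unchecked exception replaces the checked `RegainException` to keep the example short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the init / prepare / close call order. */
public class LifecycleSketch {

    /** Stand-in for the preparator contract (assumption, simplified types). */
    public interface PreparatorStandIn {
        void init(String config);

        void prepare(String rawDocument);

        void close();
    }

    /** Records every call so the order can be inspected. */
    public static class RecordingPreparator implements PreparatorStandIn {
        public final List<String> calls = new ArrayList<String>();

        public void init(String config) {
            calls.add("init");
        }

        public void prepare(String rawDocument) {
            if (rawDocument.isEmpty()) {
                // Stands in for a RegainException thrown on a failed preparation.
                throw new IllegalStateException("preparation failed");
            }
            calls.add("prepare:" + rawDocument);
        }

        public void close() {
            calls.add("close");
        }
    }

    /** Initializes once, prepares each document, and always closes at the end. */
    public static void crawl(PreparatorStandIn preparator, List<String> documents) {
        preparator.init("config");
        try {
            for (String document : documents) {
                preparator.prepare(document);
            }
        } finally {
            preparator.close(); // release resources even if a document failed
        }
    }

    public static void main(String[] args) {
        RecordingPreparator preparator = new RecordingPreparator();
        crawl(preparator, Arrays.asList("a.xls", "b.xls"));
        System.out.println(preparator.calls);
    }
}
```

Placing `close` in a `finally` block is a design choice of this sketch: an external application held by the preparator is released even when a document cannot be prepared.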

Regain 2.1.0-STABLE, Copyright (C) 2004-2010 Til Schneider, www.murfman.de, Thomas Tesche, www.clustersystems.info