Regain 2.1.0-STABLE API

net.sf.regain.crawler.preparator
Class JacobMsPowerPointPreparator

java.lang.Object
  extended by net.sf.regain.crawler.document.AbstractPreparator
      extended by net.sf.regain.crawler.preparator.AbstractJacobMsOfficePreparator
          extended by net.sf.regain.crawler.preparator.JacobMsPowerPointPreparator
All Implemented Interfaces:
Pluggable, Preparator, WriteablePreparator

public class JacobMsPowerPointPreparator
extends AbstractJacobMsOfficePreparator

Prepares a Microsoft PowerPoint document for indexing using the Jacob API. Jacobgen was used to simplify access to that API.

The raw data of the document is stripped of formatting information, and the title is extracted.

Author:
Til Schneider, www.murfman.de, Reinhard Balling

Field Summary
private  de.filiadata.lucene.spider.generated.msoffice2000.powerpoint.Application mPowerPointApplication
          The PowerPoint application.
private static int MSOGROUP
           
 
Fields inherited from interface net.sf.regain.crawler.document.Preparator
DEFAULT_BUFFER_SIZE
 
Constructor Summary
JacobMsPowerPointPreparator()
          Creates a new instance of JacobMsPowerPointPreparator.
 
Method Summary
 void close()
          Frees all resources reserved by the preparator.
private  void extractTextFrom(de.filiadata.lucene.spider.generated.msoffice2000.powerpoint.Shape shape, StringBuffer contentBuf)
          Extracts the text from a PowerPoint shape object and appends it to the StringBuffer.
 void init(PreparatorConfig config)
          Initializes the preparator.
 void prepare(RawDocument rawDocument)
          Prepares a document for indexing.
private  String removeHyphenation(String text)
          RB: Eliminates hyphenation, i.e. a hyphen followed by either \n\r or \013.
 
Methods inherited from class net.sf.regain.crawler.preparator.AbstractJacobMsOfficePreparator
readProperties
 
Methods inherited from class net.sf.regain.crawler.document.AbstractPreparator
accepts, addAdditionalField, cleanUp, concatenateStringParts, getAdditionalFields, getCleanedContent, getCleanedMetaData, getHeadlines, getPath, getPriority, getSummary, getTitle, setCleanedContent, setCleanedMetaData, setHeadlines, setPath, setPriority, setSummary, setTitle, setUrlRegex
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

mPowerPointApplication

private de.filiadata.lucene.spider.generated.msoffice2000.powerpoint.Application mPowerPointApplication
The PowerPoint application. Is null until the first document has been processed.


MSOGROUP

private static int MSOGROUP
Constructor Detail

JacobMsPowerPointPreparator

public JacobMsPowerPointPreparator()
                            throws RegainException
Creates a new instance of JacobMsPowerPointPreparator.

Throws:
RegainException - If creating the preparator failed.
Method Detail

init

public void init(PreparatorConfig config)
          throws RegainException
Initializes the preparator.

Specified by:
init in interface Pluggable
Overrides:
init in class AbstractJacobMsOfficePreparator
Parameters:
config - The configuration
Throws:
RegainException - If the configuration has an error.

prepare

public void prepare(RawDocument rawDocument)
             throws RegainException
Prepares a document for indexing.

Parameters:
rawDocument - The document to prepare.
Throws:
RegainException - If the preparation failed.

extractTextFrom

private void extractTextFrom(de.filiadata.lucene.spider.generated.msoffice2000.powerpoint.Shape shape,
                             StringBuffer contentBuf)
Extracts the text from a PowerPoint shape object and appends it to the StringBuffer.

Parameters:
shape - The PowerPoint shape object to search.
contentBuf - The buffer to which any text found is appended.
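
The following is a minimal sketch of how text could be collected from a shape into a StringBuffer. It does not use the Jacob-generated Shape class. The nested Shape class, its ofText and ofGroup factory methods, and the recursion into grouped shapes are all assumptions made for illustration. MSO_GROUP is set to 6, the value of msoGroup in Office's MsoShapeType enumeration; whether the private MSOGROUP field of this class holds that value, or is used this way, is not stated in this documentation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ShapeTextSketch {
    // Value of msoGroup in Office's MsoShapeType enumeration.
    // Assumption: the documented MSOGROUP field plays a similar role.
    static final int MSO_GROUP = 6;
    // Hypothetical marker for any non-group shape in this sketch.
    static final int OTHER = 0;

    /** Hypothetical stand-in for a PowerPoint shape; not the generated COM wrapper. */
    static final class Shape {
        final int type;
        final String text;          // null if the shape carries no text
        final List<Shape> children; // only used by group shapes

        Shape(int type, String text, List<Shape> children) {
            this.type = type;
            this.text = text;
            this.children = children;
        }

        static Shape ofText(String text) {
            return new Shape(OTHER, text, new ArrayList<Shape>());
        }

        static Shape ofGroup(Shape... children) {
            return new Shape(MSO_GROUP, null, Arrays.asList(children));
        }
    }

    /** Appends the text of the shape, or of all shapes nested in a group, to contentBuf. */
    static void extractTextFrom(Shape shape, StringBuffer contentBuf) {
        if (shape.type == MSO_GROUP) {
            // A group has no text of its own: descend into its members.
            for (Shape child : shape.children) {
                extractTextFrom(child, contentBuf);
            }
        } else if (shape.text != null && !shape.text.isEmpty()) {
            contentBuf.append(shape.text).append('\n');
        }
    }

    public static void main(String[] args) {
        StringBuffer buf = new StringBuffer();
        extractTextFrom(Shape.ofGroup(Shape.ofText("Title"), Shape.ofText("Body")), buf);
        System.out.print(buf);
    }
}
```

Passing the buffer in, as the documented signature does, lets one buffer accumulate the text of every shape on every slide without intermediate string concatenation.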

removeHyphenation

private String removeHyphenation(String text)
RB: Eliminates hyphenation, i.e. a hyphen followed by either \n\r or \013.
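
As an illustration of the behaviour described above, here is a minimal sketch that deletes a hyphen when it is directly followed by \n\r or by \013 (octal for the vertical tab, U+000B). The class name HyphenationSketch and the regular-expression approach are assumptions; the actual private method may be implemented differently.

```java
import java.util.regex.Pattern;

final class HyphenationSketch {
    // A hyphen followed by "\n\r" or by a vertical tab (\013 octal, \x0B hex).
    private static final Pattern HYPHEN_BREAK = Pattern.compile("-(?:\\n\\r|\\x0B)");

    /** Joins words that were split by a hyphen at a line break. */
    static String removeHyphenation(String text) {
        return HYPHEN_BREAK.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(removeHyphenation("prepa-\013rator")); // prints "preparator"
    }
}
```

A hyphen that is not followed by one of the two break sequences, as in "well-known", is left untouched.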


close

public void close()
           throws RegainException
Frees all resources reserved by the preparator.

Called at the end of the crawler process, after all documents have been processed.

Specified by:
close in interface Preparator
Overrides:
close in class AbstractPreparator
Throws:
RegainException - If freeing the resources failed.
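
The call order implied by init, prepare and close, together with the note that mPowerPointApplication is null until the first document has been processed, can be sketched as follows. This is not Regain code: SketchPreparator, FakeApplication and crawl are hypothetical stand-ins, since the real class needs the Regain library and a COM connection to PowerPoint. The lazy start of the application inside prepare is an assumption derived from the field description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class LifecycleSketch {
    /** Hypothetical stand-in for an expensive external application handle. */
    static final class FakeApplication {
        boolean running = true;
        void quit() { running = false; }
    }

    /** Hypothetical stand-in preparator that records the calls it receives. */
    static final class SketchPreparator {
        private FakeApplication application; // null until the first document is prepared
        final List<String> log = new ArrayList<String>();

        void init() {
            log.add("init");
        }

        void prepare(String document) {
            if (application == null) {
                // Start the application only when it is first needed.
                application = new FakeApplication();
                log.add("start application");
            }
            log.add("prepare " + document);
        }

        void close() {
            if (application != null) {
                application.quit();
                application = null;
                log.add("quit application");
            }
            log.add("close");
        }
    }

    /** Runs init once, prepare once per document, and close once at the end. */
    static List<String> crawl(List<String> documents) {
        SketchPreparator preparator = new SketchPreparator();
        preparator.init();
        try {
            for (String document : documents) {
                preparator.prepare(document);
            }
        } finally {
            preparator.close(); // runs even if a prepare call throws
        }
        return preparator.log;
    }

    public static void main(String[] args) {
        System.out.println(crawl(Arrays.asList("a.ppt", "b.ppt")));
    }
}
```

Placing close in a finally block keeps the external application from lingering if one document fails during preparation.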

Regain 2.1.0-STABLE, Copyright (C) 2004-2010 Til Schneider, www.murfman.de, Thomas Tesche, www.clustersystems.info