Regain 2.1.0-STABLE API

net.sf.regain.crawler.preparator
Class PdfBoxPreparator

java.lang.Object
  extended by net.sf.regain.crawler.document.AbstractPreparator
      extended by net.sf.regain.crawler.preparator.PdfBoxPreparator
All Implemented Interfaces:
Pluggable, Preparator, WriteablePreparator

public class PdfBoxPreparator
extends AbstractPreparator

Prepares a PDF document for indexing.

The document's raw data is stripped of formatting information, and the title is extracted.

Author:
Til Schneider, www.murfman.de

Field Summary
private static org.apache.log4j.Logger mLog
          The logger for this class
 
Fields inherited from interface net.sf.regain.crawler.document.Preparator
DEFAULT_BUFFER_SIZE
 
Constructor Summary
PdfBoxPreparator()
          Creates a new instance of PdfBoxPreparator.
 
Method Summary
 void prepare(RawDocument rawDocument)
          Prepares a document for indexing.
 
Methods inherited from class net.sf.regain.crawler.document.AbstractPreparator
accepts, addAdditionalField, cleanUp, close, concatenateStringParts, getAdditionalFields, getCleanedContent, getCleanedMetaData, getHeadlines, getPath, getPriority, getSummary, getTitle, init, setCleanedContent, setCleanedMetaData, setHeadlines, setPath, setPriority, setSummary, setTitle, setUrlRegex
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

mLog

private static org.apache.log4j.Logger mLog
The logger for this class

Constructor Detail

PdfBoxPreparator

public PdfBoxPreparator()
                 throws RegainException
Creates a new instance of PdfBoxPreparator.

Throws:
RegainException - If creating the preparator failed.
Method Detail

prepare

public void prepare(RawDocument rawDocument)
             throws RegainException
Prepares a document for indexing.

Parameters:
rawDocument - The document to prepare.
Throws:
RegainException - If the preparation failed.
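
The call sequence implied by this page (construct the preparator, call prepare, read the results through the inherited getters, then call cleanUp) can be sketched as follows. This is only an illustration under stated assumptions: the nested types are simplified stand-ins and not the real Regain classes, the String return types of getTitle and getCleanedContent are assumed, the HTML-like tag stripping is a toy substitute for actual PDF parsing, and calling cleanUp after each document is an assumption about how the crawler uses a preparator.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Self-contained sketch. All nested types are hypothetical stand-ins,
// NOT the real Regain classes; only the method names mirror this page.
class PreparatorUsageSketch {

    /** Stand-in for RegainException (a checked exception here). */
    static class SketchException extends Exception {
        SketchException(String message) { super(message); }
    }

    /** Stand-in for RawDocument: carries already-decoded text only. */
    static class SketchRawDocument {
        final String rawText;
        SketchRawDocument(String rawText) { this.rawText = rawText; }
    }

    /** Stand-in preparator: strips toy markup and extracts a title. */
    static class SketchPreparator {
        private String title;
        private String cleanedContent;

        void prepare(SketchRawDocument doc) throws SketchException {
            if (doc == null || doc.rawText == null) {
                throw new SketchException("No document given");
            }
            // Toy "formatting": an optional <title> element plus other tags.
            Matcher m = Pattern.compile("<title>(.*?)</title>").matcher(doc.rawText);
            title = m.find() ? m.group(1).trim() : null;
            // Replace every tag with a space, then collapse whitespace.
            cleanedContent = doc.rawText
                    .replaceAll("<[^>]*>", " ")
                    .replaceAll("\\s+", " ")
                    .trim();
        }

        String getTitle() { return title; }               // return type assumed
        String getCleanedContent() { return cleanedContent; } // return type assumed

        /** Resets per-document state so the instance can be reused. */
        void cleanUp() {
            title = null;
            cleanedContent = null;
        }
    }

    /** Runs prepare, reads both results, and always cleans up afterwards. */
    static String[] extract(String raw) {
        SketchPreparator preparator = new SketchPreparator();
        try {
            preparator.prepare(new SketchRawDocument(raw));
            return new String[] { preparator.getTitle(), preparator.getCleanedContent() };
        } catch (SketchException e) {
            // The checked exception must be handled or declared by the caller.
            throw new IllegalStateException("Preparation failed", e);
        } finally {
            preparator.cleanUp();
        }
    }

    public static void main(String[] args) {
        String[] result = extract("<title>Report</title><b>Hello</b> world");
        System.out.println("Title:   " + result[0]);
        System.out.println("Content: " + result[1]);
    }
}
```

Because both the constructor and prepare declare RegainException, real calling code has to catch or propagate it; the sketch wraps its stand-in exception in an unchecked one only to keep the example short.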


Regain 2.1.0-STABLE, Copyright (C) 2004-2010 Til Schneider, www.murfman.de, Thomas Tesche, www.clustersystems.info