Regain 2.1.0-STABLE API

net.sf.regain.crawler.preparator
Class ExternalPreparator

java.lang.Object
  extended by net.sf.regain.crawler.document.AbstractPreparator
      extended by net.sf.regain.crawler.preparator.ExternalPreparator
All Implemented Interfaces:
Pluggable, Preparator, WriteablePreparator

public class ExternalPreparator
extends AbstractPreparator

Prepares a document by calling an external program that writes the document's plain text to standard output (stdout).
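
The idea described above, starting an external program and collecting whatever it prints to standard output, can be sketched with the JDK's `ProcessBuilder`. This is a minimal illustration and not the Regain implementation: the class name `ExternalTextSketch`, the method `runAndCapture`, the UTF-8 decoding and the use of unchecked exceptions in place of `RegainException` are all assumptions made for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper class; not part of the Regain API.
class ExternalTextSketch {

    /**
     * Runs the given command and returns everything it wrote to standard output.
     * If checkExitCode is true, a non-zero exit code is treated as a failure.
     */
    static String runAndCapture(List<String> command, boolean checkExitCode) {
        ProcessBuilder builder = new ProcessBuilder(command);
        // Forward the child's stderr to ours so a full stderr pipe cannot block it.
        builder.redirectError(ProcessBuilder.Redirect.INHERIT);
        try {
            Process process = builder.start();
            // This sketch passes input via arguments only, so close the child's stdin.
            process.getOutputStream().close();
            // Read stdout completely before waiting, so the child never stalls on a full pipe.
            byte[] output = process.getInputStream().readAllBytes();
            int exitCode = process.waitFor();
            if (checkExitCode && exitCode != 0) {
                throw new IllegalStateException(
                        "Command exited with code " + exitCode + ": " + command);
            }
            // Assumption: the external program emits UTF-8 text.
            return new String(output, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException("Could not run " + command, e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for " + command, e);
        }
    }
}
```

A call such as `ExternalTextSketch.runAndCapture(List.of("/path/to/extractor", "/path/to/file"), true)` would return the extractor's output as a string; both paths are placeholders.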

Author:
Til Schneider, www.murfman.de, Paul Ortyl

Field Summary
private  boolean[] mCheckExitCodeArr
           
private  String[] mCommandLineArr
          The command pattern.
private  org.apache.regexp.RE[] mUrlRegexArr
           
 
Fields inherited from interface net.sf.regain.crawler.document.Preparator
DEFAULT_BUFFER_SIZE
 
Constructor Summary
ExternalPreparator()
          Creates a new instance of ExternalPreparator.
 
Method Summary
 boolean accepts(RawDocument rawDocument)
          Gets whether the preparator is able to process the given document.
 void init(PreparatorConfig config)
          Initializes the preparator.
 void prepare(RawDocument rawDocument)
          Prepares a document for indexing.
 
Methods inherited from class net.sf.regain.crawler.document.AbstractPreparator
addAdditionalField, cleanUp, close, concatenateStringParts, getAdditionalFields, getCleanedContent, getCleanedMetaData, getHeadlines, getPath, getPriority, getSummary, getTitle, setCleanedContent, setCleanedMetaData, setHeadlines, setPath, setPriority, setSummary, setTitle, setUrlRegex
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

mCommandLineArr

private String[] mCommandLineArr
The command pattern.


mUrlRegexArr

private org.apache.regexp.RE[] mUrlRegexArr

mCheckExitCodeArr

private boolean[] mCheckExitCodeArr

Constructor Detail

ExternalPreparator

public ExternalPreparator()
                   throws RegainException
Creates a new instance of ExternalPreparator.

Throws:
RegainException - If creating the preparator failed.

Method Detail

init

public void init(PreparatorConfig config)
          throws RegainException
Description copied from class: AbstractPreparator
Initializes the preparator.

Does nothing by default. May be overridden by subclasses.

Specified by:
init in interface Pluggable
Overrides:
init in class AbstractPreparator
Parameters:
config - The configuration for this preparator.
Throws:
RegainException - If the regular expression or the configuration has an error.
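
The three private array fields suggest that each configured command pairs a URL regex with a command line and an exit-code flag at the same index. The sketch below shows one way such parallel arrays could be built and validated during initialization. It is an assumption about the layout, not the actual `init` code: `CommandTableSketch` is a hypothetical class, `java.util.regex.Pattern` is swapped in for `org.apache.regexp.RE`, and `IllegalArgumentException` stands in for `RegainException`.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical holder for the per-command settings; not part of the Regain API.
class CommandTableSketch {
    final Pattern[] urlRegexArr;
    final String[] commandLineArr;
    final boolean[] checkExitCodeArr;

    CommandTableSketch(String[] urlRegexes, String[] commandLines, boolean[] checkExitCodes) {
        // A configuration error: the three settings must come in matching triples.
        if (urlRegexes.length != commandLines.length
                || urlRegexes.length != checkExitCodes.length) {
            throw new IllegalArgumentException(
                    "Each command needs a URL regex, a command line and an exit-code flag");
        }
        urlRegexArr = new Pattern[urlRegexes.length];
        for (int i = 0; i < urlRegexes.length; i++) {
            try {
                urlRegexArr[i] = Pattern.compile(urlRegexes[i]);
            } catch (PatternSyntaxException e) {
                // A regular-expression error: report which entry is broken.
                throw new IllegalArgumentException(
                        "Invalid URL regex at index " + i + ": " + urlRegexes[i], e);
            }
        }
        // Copy the inputs so later changes by the caller do not affect this table.
        commandLineArr = commandLines.clone();
        checkExitCodeArr = checkExitCodes.clone();
    }
}
```

Failing early at construction time keeps a malformed regex or an incomplete entry from surfacing later, in the middle of a crawl.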

accepts

public boolean accepts(RawDocument rawDocument)
Description copied from class: AbstractPreparator
Gets whether the preparator is able to process the given document. This is the case if its URL matches the URL regex.

Specified by:
accepts in interface Preparator
Overrides:
accepts in class AbstractPreparator
Parameters:
rawDocument - The document to check.
Returns:
Whether the preparator is able to process the given document.
See Also:
AbstractPreparator.setUrlRegex(RE)
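
Because this class holds an array of URL regexes rather than a single one, the acceptance check plausibly amounts to asking whether any of them matches the document URL. The following sketch illustrates that reading. It uses `java.util.regex.Pattern` rather than `org.apache.regexp.RE`, and the class `UrlMatchSketch`, its method names and the choice of `find()` (a match anywhere in the URL) are assumptions made for the example.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; not part of the Regain API.
class UrlMatchSketch {

    /** Returns the index of the first pattern that matches the URL, or -1 if none does. */
    static int firstMatchingIndex(Pattern[] urlPatterns, String url) {
        for (int i = 0; i < urlPatterns.length; i++) {
            // find() succeeds if the pattern matches anywhere in the URL.
            if (urlPatterns[i].matcher(url).find()) {
                return i;
            }
        }
        return -1;
    }

    /** A document is accepted if at least one URL pattern matches. */
    static boolean accepts(Pattern[] urlPatterns, String url) {
        return firstMatchingIndex(urlPatterns, url) >= 0;
    }
}
```

Returning the index rather than only a boolean lets a caller look up the command line and exit-code flag stored at the same position.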

prepare

public void prepare(RawDocument rawDocument)
             throws RegainException
Description copied from interface: Preparator
Prepares a document for indexing.

Parameters:
rawDocument - The document to prepare.
Throws:
RegainException - If preparing the document failed.

Regain 2.1.0-STABLE, Copyright (C) 2004-2010 Til Schneider, www.murfman.de, Thomas Tesche, www.clustersystems.info