Regain 2.1.0-STABLE API
java.lang.Object
  extended by net.sf.regain.crawler.document.AbstractPreparator
    extended by net.sf.regain.crawler.preparator.ExternalPreparator
public class ExternalPreparator
extends AbstractPreparator

Prepares a document by calling an external program that writes the plain text to standard output (stdout).
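The core technique, starting a program and collecting everything it writes to standard output, can be sketched with the JDK's `java.lang.ProcessBuilder`. This is a minimal illustration under stated assumptions, not Regain's implementation: the class and method names (`StdoutCapture`, `run`, `readAll`) are hypothetical, and decoding with the platform default charset is an assumption, since the encoding of an external tool's output is not specified here.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.util.List;

/** Hypothetical helper: runs a command and captures its standard output. */
class StdoutCapture {

    /** Captured standard output plus the program's exit code. */
    static final class Result {
        final String stdout;
        final int exitCode;

        Result(String stdout, int exitCode) {
            this.stdout = stdout;
            this.exitCode = exitCode;
        }
    }

    /** Reads the stream to its end and decodes the bytes with the given charset. */
    static String readAll(InputStream in, Charset charset) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), charset);
    }

    /** Starts the command, drains its stdout, then waits for it to exit. */
    static Result run(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        // Send stderr to this JVM's stderr so an unread pipe cannot block the child.
        pb.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process process = pb.start();
        // Close the child's stdin so a program that reads input sees end-of-file.
        process.getOutputStream().close();
        String text;
        try (InputStream stdout = process.getInputStream()) {
            // Drain stdout before waitFor(): a full pipe buffer would otherwise stall the child.
            // Assumption: the tool writes in the platform default charset.
            text = readAll(stdout, Charset.defaultCharset());
        }
        int exitCode = process.waitFor();
        return new Result(text, exitCode);
    }
}
```

Reading stdout to the end before calling `waitFor()` matters: if the program produces more output than the operating system's pipe buffer holds, waiting first would deadlock both processes.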
Field Summary
Modifier and Type | Field | Description
---|---|---
private boolean[] | mCheckExitCodeArr |
private String[] | mCommandLineArr | The command pattern.
private org.apache.regexp.RE[] | mUrlRegexArr |
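Because `mCommandLineArr` holds command *patterns* rather than finished command lines, each pattern presumably has to be expanded with the document's file before it can run. The sketch below assumes a hypothetical `${filename}` placeholder and whitespace-separated arguments; neither the placeholder syntax nor the splitting rule is confirmed by this page, and `CommandPattern.expand` is an illustrative name only.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical expansion of a command pattern into an argument list. */
class CommandPattern {

    /** Assumed placeholder token; the real configuration syntax may differ. */
    static final String PLACEHOLDER = "${filename}";

    /**
     * Splits the pattern on whitespace first and substitutes afterwards,
     * so a file path containing spaces stays a single argument.
     */
    static List<String> expand(String pattern, String filePath) {
        List<String> args = new ArrayList<>();
        for (String token : pattern.trim().split("\\s+")) {
            // String.replace treats the placeholder literally, not as a regex.
            args.add(token.replace(PLACEHOLDER, filePath));
        }
        return args;
    }
}
```

Handing the resulting list to `ProcessBuilder`, rather than passing one string to a shell, avoids having to quote paths that contain spaces or shell metacharacters.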
Fields inherited from interface net.sf.regain.crawler.document.Preparator |
---|
DEFAULT_BUFFER_SIZE |
Constructor Summary
Constructor | Description
---|---
ExternalPreparator() | Creates a new instance of ExternalPreparator.
Method Summary
Modifier and Type | Method | Description
---|---|---
boolean | accepts(RawDocument rawDocument) | Gets whether the preparator is able to process the given document.
void | init(PreparatorConfig config) | Initializes the preparator.
void | prepare(RawDocument rawDocument) | Prepares a document for indexing.
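The three methods read as a lifecycle: `init` once with the configuration, then `accepts` for each document, and presumably `prepare` only for documents that were accepted. The sketch below models that calling order with a hypothetical, simplified interface (`SimplePreparator`, which takes a plain URL string instead of a `RawDocument` and a `Map` instead of a `PreparatorConfig`). It illustrates the convention only and is not Regain's `Preparator` interface or crawler loop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified stand-in for a preparator's lifecycle methods. */
interface SimplePreparator {
    void init(Map<String, String> config);
    boolean accepts(String url);
    String prepare(String url);
}

/** Hypothetical driver showing the assumed call order. */
class PreparatorDriver {
    /** Calls init once, then prepare only for the URLs the preparator accepts. */
    static List<String> run(SimplePreparator p, Map<String, String> config, List<String> urls) {
        p.init(config);
        List<String> results = new ArrayList<>();
        for (String url : urls) {
            if (p.accepts(url)) {
                results.add(p.prepare(url));
            }
        }
        return results;
    }
}
```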
Methods inherited from class net.sf.regain.crawler.document.AbstractPreparator |
---|
addAdditionalField, cleanUp, close, concatenateStringParts, getAdditionalFields, getCleanedContent, getCleanedMetaData, getHeadlines, getPath, getPriority, getSummary, getTitle, setCleanedContent, setCleanedMetaData, setHeadlines, setPath, setPriority, setSummary, setTitle, setUrlRegex |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
private String[] mCommandLineArr
private org.apache.regexp.RE[] mUrlRegexArr
private boolean[] mCheckExitCodeArr
Constructor Detail |
---|
public ExternalPreparator() throws RegainException
Throws:
RegainException - If creating the preparator failed.
Method Detail |
---|
public void init(PreparatorConfig config) throws RegainException
Description copied from class: AbstractPreparator
Does nothing by default. May be overridden by subclasses.
Specified by:
init in interface Pluggable
Overrides:
init in class AbstractPreparator
Parameters:
config - The configuration for this preparator.
Throws:
RegainException - If the regular expression or the configuration has an error.

public boolean accepts(RawDocument rawDocument)
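Description copied from class: AbstractPreparator
Gets whether the preparator is able to process the given document.
Specified by:
accepts in interface Preparator
Overrides:
accepts in class AbstractPreparator
Parameters:
rawDocument - The document to check.
See Also:
AbstractPreparator.setUrlRegex(RE)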
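Taken together, the three private fields suggest that `init` reads a list of command entries from the configuration into index-aligned arrays, and that `accepts` returns `true` when one of the URL patterns matches the document's URL. The sketch below shows that reading under loudly stated assumptions: `CommandEntry` is a hypothetical stand-in for a configuration section, `java.util.regex.Pattern` replaces `org.apache.regexp.RE`, `IllegalArgumentException` replaces `RegainException`, a pattern may match anywhere in the URL (`find()`), and the first matching entry wins.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

/** Hypothetical sketch of how the three parallel arrays could be filled and used. */
class ExternalCommandTable {

    /** One configured command; a hypothetical stand-in for a configuration section. */
    static final class CommandEntry {
        final String urlPattern;
        final String commandLine;
        final boolean checkExitCode;

        CommandEntry(String urlPattern, String commandLine, boolean checkExitCode) {
            this.urlPattern = urlPattern;
            this.commandLine = commandLine;
            this.checkExitCode = checkExitCode;
        }
    }

    // Index i of each array describes the same command entry.
    private Pattern[] urlRegexArr;
    private String[] commandLineArr;
    private boolean[] checkExitCodeArr;

    /** Mirrors init(): compiles every URL pattern; an empty list is treated as an error (assumed). */
    void init(List<CommandEntry> entries) {
        if (entries.isEmpty()) {
            throw new IllegalArgumentException("No commands configured");
        }
        int n = entries.size();
        urlRegexArr = new Pattern[n];
        commandLineArr = new String[n];
        checkExitCodeArr = new boolean[n];
        for (int i = 0; i < n; i++) {
            CommandEntry e = entries.get(i);
            try {
                urlRegexArr[i] = Pattern.compile(e.urlPattern);
            } catch (PatternSyntaxException ex) {
                throw new IllegalArgumentException("Bad URL pattern: " + e.urlPattern, ex);
            }
            commandLineArr[i] = e.commandLine;
            checkExitCodeArr[i] = e.checkExitCode;
        }
    }

    /** Index of the first entry whose pattern occurs in the URL, or -1; call init first. */
    int indexFor(String url) {
        for (int i = 0; i < urlRegexArr.length; i++) {
            if (urlRegexArr[i].matcher(url).find()) {
                return i;
            }
        }
        return -1;
    }

    /** Mirrors accepts(): true if at least one configured command handles the URL. */
    boolean accepts(String url) {
        return indexFor(url) >= 0;
    }

    String commandLineFor(int i) { return commandLineArr[i]; }

    boolean checkExitCodeFor(int i) { return checkExitCodeArr[i]; }
}
```

Keeping three arrays that share an index is compact but fragile; a single array of entry objects would make the pairing explicit. The sketch keeps the parallel layout only to mirror the documented fields.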
public void prepare(RawDocument rawDocument) throws RegainException
Description copied from interface: Preparator
Prepares a document for indexing.
Parameters:
rawDocument - The document to prepare.
Throws:
RegainException - If preparing the document failed.
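The field `mCheckExitCodeArr` suggests that, per configured command, `prepare` either treats a non-zero exit code as a failure or keeps whatever text the program produced. A minimal sketch of that decision, assuming a non-zero exit code signals failure and using `IOException` in place of `RegainException`; the class `ExitCodePolicy` and method `acceptOutput` are hypothetical. In the real class the accepted text would presumably be stored through the inherited `setCleanedContent`, which this sketch does not model.

```java
import java.io.IOException;

/** Hypothetical sketch of the exit-code policy applied after the command has run. */
class ExitCodePolicy {

    /**
     * Returns the captured text, or throws if checking is enabled and the
     * program reported failure through a non-zero exit code.
     */
    static String acceptOutput(String stdout, int exitCode, boolean checkExitCode, String url)
            throws IOException {
        if (checkExitCode && exitCode != 0) {
            throw new IOException("External program failed for " + url
                    + " (exit code " + exitCode + ")");
        }
        // With checking disabled, a tool that exits non-zero (e.g. on warnings) still yields its text.
        return stdout;
    }
}
```

Making the check configurable per command is useful because some converters return a non-zero code even when they have written usable text.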