Regain 2.1.0-STABLE API
java.lang.Object
  net.sf.regain.crawler.document.AbstractPreparator
    net.sf.regain.crawler.preparator.ExternalPreparator
public class ExternalPreparator
Prepares a document by calling an external program that writes the document's plain text to standard output (stdout).
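The general technique described above, starting an external program and reading the text it writes to standard output, can be sketched with `java.lang.ProcessBuilder`. This is a minimal sketch and not Regain's actual implementation: the class `ExternalTextExtractorSketch`, the method `runAndCapture`, the UTF-8 decoding, and the meaning given to the `checkExitCode` flag (suggested only by the field name `mCheckExitCodeArr`) are all assumptions.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper, not part of Regain: runs a program and returns its stdout. */
final class ExternalTextExtractorSketch {

  static String runAndCapture(List<String> command, boolean checkExitCode)
      throws IOException, InterruptedException {
    ProcessBuilder builder = new ProcessBuilder(command);
    // Forward the child's stderr to our own stderr so that only stdout is captured.
    builder.redirectError(ProcessBuilder.Redirect.INHERIT);
    Process process = builder.start();

    // Drain stdout completely before waiting, so the child cannot block on a full pipe.
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (InputStream stdout = process.getInputStream()) {
      byte[] chunk = new byte[16384];
      int read;
      while ((read = stdout.read(chunk)) != -1) {
        buffer.write(chunk, 0, read);
      }
    }

    int exitCode = process.waitFor();
    // Assumption: a non-zero exit code counts as failure only when checking is enabled.
    if (checkExitCode && exitCode != 0) {
      throw new IOException("Command " + command + " exited with code " + exitCode);
    }
    // Assumption: the external program emits UTF-8 text.
    return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
  }

  public static void main(String[] args) throws Exception {
    // Assumes a POSIX-like system with an `echo` executable on the PATH.
    String text = runAndCapture(Arrays.asList("echo", "hello"), true);
    System.out.print(text);
  }
}
```

Reading the stream to its end before calling `waitFor()` is a deliberate choice: a program that produces more output than the pipe buffer holds would otherwise stall and never exit.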
| Field Summary | |
|---|---|
| private boolean[] | mCheckExitCodeArr |
| private String[] | mCommandLineArr - The command pattern. |
| private org.apache.regexp.RE[] | mUrlRegexArr |
| Fields inherited from interface net.sf.regain.crawler.document.Preparator |
|---|
| DEFAULT_BUFFER_SIZE |
| Constructor Summary | |
|---|---|
| ExternalPreparator() | Creates a new instance of ExternalPreparator. |
| Method Summary | |
|---|---|
| boolean | accepts(RawDocument rawDocument) - Gets whether the preparator is able to process the given document. |
| void | init(PreparatorConfig config) - Initializes the preparator. |
| void | prepare(RawDocument rawDocument) - Prepares a document for indexing. |
| Methods inherited from class net.sf.regain.crawler.document.AbstractPreparator |
|---|
| addAdditionalField, cleanUp, close, concatenateStringParts, getAdditionalFields, getCleanedContent, getCleanedMetaData, getHeadlines, getPath, getPriority, getSummary, getTitle, setCleanedContent, setCleanedMetaData, setHeadlines, setPath, setPriority, setSummary, setTitle, setUrlRegex |

| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
private String[] mCommandLineArr
private org.apache.regexp.RE[] mUrlRegexArr
private boolean[] mCheckExitCodeArr
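The three array fields look like parallel arrays, where entry i pairs a URL pattern with a command pattern and an exit-code flag. That reading is an assumption drawn from the field names only, and the sketch below illustrates it rather than reproducing Regain's code. It uses `java.util.regex.Pattern` in place of `org.apache.regexp.RE`, and the class `CommandSelectionSketch`, its methods, the `${filename}` placeholder, and the example commands are all hypothetical.

```java
import java.util.regex.Pattern;

/** Hypothetical illustration of parallel arrays indexed by the matching URL pattern. */
final class CommandSelectionSketch {
  private final Pattern[] urlRegexArr;
  private final String[] commandLineArr;
  private final boolean[] checkExitCodeArr;

  CommandSelectionSketch(String[] urlRegexes, String[] commandLines, boolean[] checkExitCodes) {
    if (urlRegexes.length != commandLines.length || urlRegexes.length != checkExitCodes.length) {
      throw new IllegalArgumentException("All three arrays must have the same length");
    }
    urlRegexArr = new Pattern[urlRegexes.length];
    for (int i = 0; i < urlRegexes.length; i++) {
      urlRegexArr[i] = Pattern.compile(urlRegexes[i]);
    }
    commandLineArr = commandLines.clone();
    checkExitCodeArr = checkExitCodes.clone();
  }

  /** Returns the index of the first pattern found in the URL, or -1 if none matches. */
  int findIndex(String url) {
    for (int i = 0; i < urlRegexArr.length; i++) {
      if (urlRegexArr[i].matcher(url).find()) {
        return i;
      }
    }
    return -1;
  }

  /** Fills the assumed ${filename} placeholder of the command pattern at the given index. */
  String buildCommand(int index, String filename) {
    return commandLineArr[index].replace("${filename}", filename);
  }

  boolean checkExitCode(int index) {
    return checkExitCodeArr[index];
  }

  public static void main(String[] args) {
    CommandSelectionSketch sketch = new CommandSelectionSketch(
        new String[] {"\\.ps$", "\\.pdf$"},
        new String[] {"ps2ascii ${filename}", "pdftotext ${filename} -"},
        new boolean[] {true, false});
    int index = sketch.findIndex("file:///docs/report.pdf");
    System.out.println(sketch.buildCommand(index, "/tmp/report.pdf"));
  }
}
```

Under this reading, an `accepts`-style check would reduce to `findIndex(url) >= 0`, and the same index would then select both the command to run and whether its exit code is inspected.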
| Constructor Detail |
|---|
public ExternalPreparator() throws RegainException
Creates a new instance of ExternalPreparator.
Throws: RegainException - If creating the preparator failed.

| Method Detail |
|---|
public void init(PreparatorConfig config) throws RegainException
Description copied from class: AbstractPreparator
Initializes the preparator. Does nothing by default. May be overridden by subclasses.
Specified by: init in interface Pluggable
Overrides: init in class AbstractPreparator
Parameters: config - The configuration for this preparator.
Throws: RegainException - If the regular expression or the configuration has an error.

public boolean accepts(RawDocument rawDocument)
Description copied from class: AbstractPreparator
Gets whether the preparator is able to process the given document.
Specified by: accepts in interface Preparator
Overrides: accepts in class AbstractPreparator
Parameters: rawDocument - The document to check.
See Also: AbstractPreparator.setUrlRegex(RE)

public void prepare(RawDocument rawDocument) throws RegainException
Description copied from interface: Preparator
Prepares a document for indexing.
Parameters: rawDocument - The document to prepare.
Throws: RegainException - If preparing the document failed.