Regain 2.1.0-STABLE API
public interface WriteablePreparator
Prepares a document for indexing. Through this interface, the values held by the preparator can be changed from outside.
Preparation works by extracting the raw text from a document. In other words, the document is stripped of its formatting information. Specific text parts such as a title or a summary may be extracted as well.
The preparation procedure is as follows:
1. Pluggable.init(PreparatorConfig) is called.
2. Preparator.accepts(RawDocument) is called.
3. If true was returned, the actual preparation of the document takes place:
   - Preparator.prepare(RawDocument) is called. The preparator now extracts all necessary information.
   - The extracted information is read via Preparator.getCleanedContent(), Preparator.getHeadlines(), Preparator.getPath(), Preparator.getSummary() and Preparator.getTitle().
   - Preparator.cleanUp() is called. The preparator should release all information about the current document so that it can prepare the next one.
4. Preparator.close() is called.
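The call sequence described above can be sketched as a small driver loop. This is only an illustration under stated assumptions: `SimplePreparator`, `TagStrippingPreparator` and `run` are hypothetical stand-ins, plain strings replace `PreparatorConfig` and `RawDocument`, and calling `init` and `close` once around a batch of documents is an assumption rather than documented crawler behaviour.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Lifecycle sketch. Every type here is a simplified, hypothetical stand-in. */
final class PreparatorLifecycleSketch {

    /** Stand-in for Pluggable.init plus a subset of Preparator; String replaces the real argument types. */
    interface SimplePreparator {
        void init(String config);
        boolean accepts(String rawDocument);
        void prepare(String rawDocument);
        String getCleanedContent();
        String getTitle();
        void cleanUp();
        void close();
    }

    /** Toy preparator: accepts anything containing markup and strips the tags. */
    static final class TagStrippingPreparator implements SimplePreparator {
        private static final Pattern TITLE = Pattern.compile("<title>(.*?)</title>");
        private String cleanedContent;
        private String title;

        @Override public void init(String config) { /* nothing to configure in this toy */ }

        @Override public boolean accepts(String rawDocument) {
            return rawDocument.contains("<");
        }

        @Override public void prepare(String rawDocument) {
            Matcher matcher = TITLE.matcher(rawDocument);
            title = matcher.find() ? matcher.group(1) : "";
            // Replace every tag with a space, then collapse runs of whitespace.
            cleanedContent = rawDocument.replaceAll("<[^>]*>", " ").trim().replaceAll("\\s+", " ");
        }

        @Override public String getCleanedContent() { return cleanedContent; }
        @Override public String getTitle() { return title; }

        /** Releases everything known about the current document. */
        @Override public void cleanUp() {
            cleanedContent = null;
            title = null;
        }

        @Override public void close() { /* no resources held in this toy */ }
    }

    /** Drives init, accepts, prepare, the getters, cleanUp and close in the documented order. */
    static List<String> run(SimplePreparator preparator, List<String> rawDocuments) {
        List<String> indexed = new ArrayList<>();
        preparator.init("hypothetical-config");
        try {
            for (String raw : rawDocuments) {
                if (!preparator.accepts(raw)) {
                    continue; // not handled by this preparator
                }
                try {
                    preparator.prepare(raw);
                    indexed.add(preparator.getTitle() + ": " + preparator.getCleanedContent());
                } finally {
                    preparator.cleanUp(); // always reset before the next document
                }
            }
        } finally {
            preparator.close();
        }
        return indexed;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("<title>Hello</title><p>World</p>", "plain text");
        System.out.println(run(new TagStrippingPreparator(), docs));
    }
}
```

The `finally` blocks are a design choice of this sketch: they keep `cleanUp` and `close` from being skipped when `prepare` throws, which matches the stated intent that a preparator holds no state from one document when it starts the next.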
Field Summary |
---|
Fields inherited from interface net.sf.regain.crawler.document.Preparator |
---|
DEFAULT_BUFFER_SIZE |
Method Summary | |
---|---|
void | addAdditionalField(String fieldName, String fieldValue): Adds an additional field to the current document. |
void | setCleanedContent(String cleanedContent): Sets the content, stripped of formatting information, of the document currently being prepared. |
void | setCleanedMetaData(String mCleanedMetaData) |
void | setHeadlines(String headlines): Sets the headlines found in the document currently being prepared. |
void | setSummary(String summary): Sets the summary of the document currently being prepared. |
void | setTitle(String title): Sets the title of the document currently being prepared. |
Methods inherited from interface net.sf.regain.crawler.document.Preparator |
---|
accepts, cleanUp, close, getAdditionalFields, getCleanedContent, getCleanedMetaData, getHeadlines, getPath, getPriority, getSummary, getTitle, prepare, setPriority, setUrlRegex |
Methods inherited from interface net.sf.regain.crawler.document.Pluggable |
---|
init |
Method Detail |
---|
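void addAdditionalField(String fieldName, String fieldValue)
Adds an additional field to the current document. This field will be indexed and stored.
fieldName - The name of the field.
fieldValue - The value of the field.
void setCleanedMetaData(String mCleanedMetaData)
mCleanedMetaData - The cleaned metadata to set.
void setCleanedContent(String cleanedContent)
Sets the content, stripped of formatting information, of the document currently being prepared.
cleanedContent - The cleaned content to set.
void setSummary(String summary)
Sets the summary of the document currently being prepared.
summary - The summary.
void setHeadlines(String headlines)
Sets the headlines found in the document currently being prepared.
headlines - The headlines.
void setTitle(String title)
Sets the title of the document currently being prepared.
title - The title.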
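To illustrate how values might be changed from outside through these setters, here is a minimal sketch. It does not use the real Regain types: `WriteableValues` merely mirrors the six method signatures documented on this page, while `InMemoryValues`, its getters, and the `enrich` helper with its `contentLength` field name are hypothetical and exist only for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Setter sketch. All types are hypothetical stand-ins, not Regain classes. */
final class WriteablePreparatorSketch {

    /** Mirrors the six methods documented for WriteablePreparator. */
    interface WriteableValues {
        void addAdditionalField(String fieldName, String fieldValue);
        void setCleanedContent(String cleanedContent);
        void setCleanedMetaData(String mCleanedMetaData);
        void setHeadlines(String headlines);
        void setSummary(String summary);
        void setTitle(String title);
    }

    /** Simple holder for the values of the document currently being prepared. */
    static final class InMemoryValues implements WriteableValues {
        private final Map<String, String> additionalFields = new LinkedHashMap<>();
        private String cleanedContent;
        private String cleanedMetaData;
        private String headlines;
        private String summary;
        private String title;

        @Override public void addAdditionalField(String fieldName, String fieldValue) {
            additionalFields.put(fieldName, fieldValue);
        }
        @Override public void setCleanedContent(String cleanedContent) { this.cleanedContent = cleanedContent; }
        @Override public void setCleanedMetaData(String mCleanedMetaData) { this.cleanedMetaData = mCleanedMetaData; }
        @Override public void setHeadlines(String headlines) { this.headlines = headlines; }
        @Override public void setSummary(String summary) { this.summary = summary; }
        @Override public void setTitle(String title) { this.title = title; }

        // Hypothetical read accessors so the effect of the setters can be observed.
        String getCleanedContent() { return cleanedContent; }
        String getCleanedMetaData() { return cleanedMetaData; }
        String getHeadlines() { return headlines; }
        String getSummary() { return summary; }
        String getTitle() { return title; }
        Map<String, String> getAdditionalFields() { return additionalFields; }
    }

    /** Hypothetical outside step: fills in a missing title and records the content length. */
    static void enrich(InMemoryValues values) {
        if (values.getTitle() == null || values.getTitle().isEmpty()) {
            values.setTitle("Untitled");
        }
        String content = values.getCleanedContent() == null ? "" : values.getCleanedContent();
        values.addAdditionalField("contentLength", String.valueOf(content.length()));
    }

    public static void main(String[] args) {
        InMemoryValues values = new InMemoryValues();
        values.setCleanedContent("Some extracted text");
        enrich(values);
        System.out.println(values.getTitle() + " " + values.getAdditionalFields());
    }
}
```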