Regain 2.1.0-STABLE API
public interface WriteablePreparator
Prepares a document for indexing. Through this interface, the values held by the preparator can be changed from outside.
Preparation extracts the raw text from a document; in other words, the document is stripped of its formatting information. Specific text parts such as a title or a summary may be extracted as well.
The preparation procedure is as follows:
1. Pluggable.init(PreparatorConfig) is called.
2. Preparator.accepts(RawDocument) is called.
3. If true was returned, the actual preparation of the document is performed:
   - Preparator.prepare(RawDocument) is called. The preparator now extracts all necessary information.
   - The extracted information is read using Preparator.getCleanedContent(), Preparator.getHeadlines(), Preparator.getPath(), Preparator.getSummary() and Preparator.getTitle().
   - Preparator.cleanUp() is called. The preparator should release all information about the current document in order to prepare the next one.
4. Preparator.close() is called.
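The call order described above can be sketched as a small driver loop. This is only an illustration under stated assumptions: `SketchPreparator`, `PlainTextSketchPreparator` and `LifecycleSketch` are hypothetical stand-ins, not Regain classes. The real methods take `PreparatorConfig` and `RawDocument` arguments, which are replaced by plain strings here so that the sketch compiles without Regain. It also assumes that `init` and `close` run once per preparator, while `accepts`, `prepare` and `cleanUp` run once per document.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for Regain's Preparator interface.
// The real methods take PreparatorConfig / RawDocument arguments; plain
// strings are used here so the sketch compiles without Regain.
interface SketchPreparator {
    void init(String config);
    boolean accepts(String url);
    void prepare(String rawContent);
    String getTitle();
    String getCleanedContent();
    void cleanUp();
    void close();
}

// Toy preparator: accepts ".txt" URLs and uses the first line as the title.
class PlainTextSketchPreparator implements SketchPreparator {
    private String title;
    private String cleanedContent;

    public void init(String config) { }

    public boolean accepts(String url) {
        return url.endsWith(".txt");
    }

    public void prepare(String rawContent) {
        String trimmed = rawContent.trim();
        int newline = trimmed.indexOf('\n');
        title = (newline < 0) ? trimmed : trimmed.substring(0, newline);
        cleanedContent = trimmed;
    }

    public String getTitle() { return title; }

    public String getCleanedContent() { return cleanedContent; }

    // Releases everything known about the current document.
    public void cleanUp() {
        title = null;
        cleanedContent = null;
    }

    public void close() { }
}

class LifecycleSketch {
    // Runs the documented call order over {url, rawContent} pairs and
    // returns the titles of the documents that were actually prepared.
    static List<String> indexAll(SketchPreparator preparator, String[][] docs) {
        List<String> titles = new ArrayList<>();
        preparator.init("no-config");              // step 1
        try {
            for (String[] doc : docs) {
                if (!preparator.accepts(doc[0])) { // step 2
                    continue;
                }
                try {
                    preparator.prepare(doc[1]);    // step 3: extract
                    titles.add(preparator.getTitle()); // step 3: read results
                } finally {
                    preparator.cleanUp();          // step 3: release state
                }
            }
        } finally {
            preparator.close();                    // step 4
        }
        return titles;
    }

    public static void main(String[] args) {
        String[][] docs = {
            {"file:///notes.txt", "Meeting notes\nAgenda item one"},
            {"file:///image.png", "binary data"}
        };
        System.out.println(indexAll(new PlainTextSketchPreparator(), docs));
    }
}
```

Placing `cleanUp` in a `finally` block is a design choice of this sketch: it keeps state from one document from leaking into the next even if `prepare` fails.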
 
| Field Summary | 
|---|
| Fields inherited from interface net.sf.regain.crawler.document.Preparator | 
|---|
DEFAULT_BUFFER_SIZE | 
| Method Summary | |
|---|---|
| void | addAdditionalField(String fieldName, String fieldValue) Adds an additional field to the current document. |
| void | setCleanedContent(String cleanedContent) Sets the content, stripped of formatting information, of the document currently being prepared. |
| void | setCleanedMetaData(String mCleanedMetaData) |
| void | setHeadlines(String headlines) Sets the headlines that were found in the document currently being prepared. |
| void | setSummary(String summary) Sets the summary of the document currently being prepared. |
| void | setTitle(String title) Sets the title of the document currently being prepared. |
| Methods inherited from interface net.sf.regain.crawler.document.Preparator | 
|---|
accepts, cleanUp, close, getAdditionalFields, getCleanedContent, getCleanedMetaData, getHeadlines, getPath, getPriority, getSummary, getTitle, prepare, setPriority, setUrlRegex | 
| Methods inherited from interface net.sf.regain.crawler.document.Pluggable | 
|---|
init | 
| Method Detail | 
|---|
void addAdditionalField(String fieldName, String fieldValue)
This field will be indexed and stored.
fieldName - The name of the field.
fieldValue - The value of the field.

void setCleanedMetaData(String mCleanedMetaData)
mCleanedMetaData - The cleaned metadata to set.

void setCleanedContent(String cleanedContent)
cleanedContent - The cleaned content to set.

void setSummary(String summary)
summary - The summary.

void setHeadlines(String headlines)
headlines - The headlines.

void setTitle(String title)
title - The title.
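As an illustration of changing a preparator's values from outside, the following sketch applies overrides through the setters listed above. It rests on several assumptions: `SketchWriteablePreparator`, `InMemoryPreparator` and `OverrideSketch` are hypothetical names, the six mutator signatures are copied from this page, and the getter return types (in particular a `Map<String, String>` for `getAdditionalFields`) are guesses made so that the example is self-contained. The field name `reviewed` is likewise invented for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in mirroring the six methods documented on this page,
// plus three getters whose return types are assumptions of this sketch.
interface SketchWriteablePreparator {
    void addAdditionalField(String fieldName, String fieldValue);
    void setCleanedContent(String cleanedContent);
    void setCleanedMetaData(String mCleanedMetaData);
    void setHeadlines(String headlines);
    void setSummary(String summary);
    void setTitle(String title);

    String getTitle();
    String getSummary();
    Map<String, String> getAdditionalFields();
}

// Minimal in-memory implementation that just stores what it is given.
class InMemoryPreparator implements SketchWriteablePreparator {
    private final Map<String, String> additionalFields = new HashMap<>();
    private String cleanedContent;
    private String cleanedMetaData;
    private String headlines;
    private String summary;
    private String title;

    public void addAdditionalField(String fieldName, String fieldValue) {
        additionalFields.put(fieldName, fieldValue);
    }
    public void setCleanedContent(String cleanedContent) { this.cleanedContent = cleanedContent; }
    public void setCleanedMetaData(String mCleanedMetaData) { this.cleanedMetaData = mCleanedMetaData; }
    public void setHeadlines(String headlines) { this.headlines = headlines; }
    public void setSummary(String summary) { this.summary = summary; }
    public void setTitle(String title) { this.title = title; }

    public String getTitle() { return title; }
    public String getSummary() { return summary; }
    public Map<String, String> getAdditionalFields() { return additionalFields; }
}

class OverrideSketch {
    // Fills in a fallback title when none was extracted and tags the
    // document with an extra field; an existing title is left untouched.
    static void applyOverrides(SketchWriteablePreparator preparator, String fallbackTitle) {
        String title = preparator.getTitle();
        if (title == null || title.trim().isEmpty()) {
            preparator.setTitle(fallbackTitle);
        }
        preparator.addAdditionalField("reviewed", "true");
    }

    public static void main(String[] args) {
        InMemoryPreparator preparator = new InMemoryPreparator();
        preparator.setSummary("Short summary of the document.");
        applyOverrides(preparator, "Untitled document");
        System.out.println(preparator.getTitle());
        System.out.println(preparator.getAdditionalFields());
    }
}
```

In a real crawler such overrides would presumably be applied after `prepare` and before the values are read for indexing, but that ordering is not stated on this page.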