Regain 2.1.0-STABLE API
java.lang.Object
  net.sf.regain.crawler.document.AbstractPreparator
    net.sf.regain.crawler.preparator.AbstractJacobMsOfficePreparator
      net.sf.regain.crawler.preparator.JacobMsPowerPointPreparator
public class JacobMsPowerPointPreparator
Prepares a Microsoft PowerPoint document for indexing using the Jacob API; Jacobgen was used to simplify access.
In the process, the document's raw data is stripped of formatting information and the title is extracted.
Field Summary | |
---|---|
private de.filiadata.lucene.spider.generated.msoffice2000.powerpoint.Application |
mPowerPointApplication
The PowerPoint application. |
private static int |
MSOGROUP
|
Fields inherited from interface net.sf.regain.crawler.document.Preparator |
---|
DEFAULT_BUFFER_SIZE |
Constructor Summary | |
---|---|
JacobMsPowerPointPreparator()
Creates a new instance of JacobMsPowerPointPreparator. |
Method Summary | |
---|---|
void |
close()
Frees all resources reserved by the preparator. |
private void |
extractTextFrom(de.filiadata.lucene.spider.generated.msoffice2000.powerpoint.Shape shape,
StringBuffer contentBuf)
Extracts the text from a PowerPoint shape object and appends it to the StringBuffer. |
void |
init(PreparatorConfig config)
Initializes the preparator. |
void |
prepare(RawDocument rawDocument)
Prepares a document for indexing. |
private String |
removeHyphenation(String text)
RB: Removes hyphenation of the form -\n\r or -\013. |
Methods inherited from class net.sf.regain.crawler.preparator.AbstractJacobMsOfficePreparator |
---|
readProperties |
Methods inherited from class net.sf.regain.crawler.document.AbstractPreparator |
---|
accepts, addAdditionalField, cleanUp, concatenateStringParts, getAdditionalFields, getCleanedContent, getCleanedMetaData, getHeadlines, getPath, getPriority, getSummary, getTitle, setCleanedContent, setCleanedMetaData, setHeadlines, setPath, setPriority, setSummary, setTitle, setUrlRegex |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
private de.filiadata.lucene.spider.generated.msoffice2000.powerpoint.Application mPowerPointApplication
The PowerPoint application. It is null as long as no document has been processed.
private static int MSOGROUP
Constructor Detail |
---|
public JacobMsPowerPointPreparator() throws RegainException
Throws:
RegainException - If creating the preparator failed.
Method Detail |
---|
public void init(PreparatorConfig config) throws RegainException
Specified by:
init in interface Pluggable
Overrides:
init in class AbstractJacobMsOfficePreparator
Parameters:
config - The configuration
Throws:
RegainException - If the configuration has an error.
public void prepare(RawDocument rawDocument) throws RegainException
Parameters:
rawDocument - The document to prepare.
Throws:
RegainException - If the preparation failed.
private void extractTextFrom(de.filiadata.lucene.spider.generated.msoffice2000.powerpoint.Shape shape, StringBuffer contentBuf)
Parameters:
shape - The PowerPoint shape object to search.
contentBuf - The buffer to which any text found is appended.
private String removeHyphenation(String text)
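The method summary describes removeHyphenation only as eliminating the sequences -\n\r and -\013. The following is a minimal sketch of that idea, not the actual Regain implementation. It assumes that \013 is meant as the Java octal escape for character 11 (vertical tab) and that only these two exact sequences are handled; the class name HyphenationSketch is hypothetical.

```java
public class HyphenationSketch {

    /**
     * Removes a hyphen that is directly followed by one of the two
     * line-break sequences named in the method summary, so that the
     * two halves of a split word are joined again.
     * Assumption: no other hyphen or line-break variants are handled.
     */
    static String removeHyphenation(String text) {
        if (text == null) {
            return null;
        }
        // "\013" is a Java octal escape for character 11 (vertical tab).
        return text.replace("-\n\r", "").replace("-\013", "");
    }

    public static void main(String[] args) {
        System.out.println(removeHyphenation("presen-\n\rtation"));
        System.out.println(removeHyphenation("index-\013ing"));
    }
}
```

A hyphen that is not followed by one of these sequences, as in "well-known", is left untouched by this sketch.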
public void close() throws RegainException
Frees all resources reserved by the preparator. Called at the end of the crawler process, after all documents have been processed.
Specified by:
close in interface Preparator
Overrides:
close in class AbstractPreparator
Throws:
RegainException - If freeing the resources failed.
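The three public methods suggest a lifecycle: init once, prepare once per document, close once after the last document. The sketch below illustrates that call order only. It does not use the real Regain types: SketchPreparator, RecordingPreparator, and crawl are hypothetical stand-ins, plain strings replace PreparatorConfig and RawDocument, and calling close in a finally block is a design choice of this example rather than documented crawler behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PreparatorLifecycleSketch {

    /** Hypothetical stand-in for the three lifecycle methods. */
    interface SketchPreparator {
        void init(String config) throws Exception;
        void prepare(String rawDocument) throws Exception;
        void close() throws Exception;
    }

    /** Records each call so the order can be inspected. */
    static class RecordingPreparator implements SketchPreparator {
        final List<String> calls = new ArrayList<String>();

        public void init(String config) {
            calls.add("init");
        }

        public void prepare(String rawDocument) {
            calls.add("prepare:" + rawDocument);
        }

        public void close() {
            calls.add("close");
        }
    }

    /** Drives one preparator through a whole (simulated) crawl. */
    static void crawl(SketchPreparator preparator, List<String> documents)
            throws Exception {
        preparator.init("config");
        try {
            for (String document : documents) {
                preparator.prepare(document);
            }
        } finally {
            // Release resources even if preparing a document failed.
            preparator.close();
        }
    }

    public static void main(String[] args) throws Exception {
        RecordingPreparator preparator = new RecordingPreparator();
        crawl(preparator, Arrays.asList("a.ppt", "b.ppt"));
        System.out.println(preparator.calls);
    }
}
```

Because this preparator keeps a PowerPoint application instance in a field, releasing it exactly once at the end, rather than after each document, matches the description of close.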