net.sf.regain.crawler.preparator
Class AbstractJacobMsOfficePreparator
java.lang.Object
net.sf.regain.crawler.document.AbstractPreparator
net.sf.regain.crawler.preparator.AbstractJacobMsOfficePreparator
- All Implemented Interfaces:
- Pluggable, Preparator, WriteablePreparator
- Direct Known Subclasses:
- JacobMsExcelPreparator, JacobMsPowerPointPreparator, JacobMsWordPreparator
public abstract class AbstractJacobMsOfficePreparator
- extends AbstractPreparator
- Author:
- Tilman Schneider, STZ-IDA an der FH Karlsruhe
Method Summary
void init(PreparatorConfig config)
- Initializes the preparator.
protected void readProperties(com.jacob.com.Dispatch document)
- Reads the configured document properties from an MS Office document.
Methods inherited from class net.sf.regain.crawler.document.AbstractPreparator
accepts, addAdditionalField, cleanUp, close, concatenateStringParts, getAdditionalFields, getCleanedContent, getCleanedMetaData, getHeadlines, getPath, getPriority, getSummary, getTitle, setCleanedContent, setCleanedMetaData, setHeadlines, setPath, setPriority, setSummary, setTitle, setUrlRegex
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
mWantedPropertiesArr
private String[] mWantedPropertiesArr
- The properties that should be extracted.
mPropertyMap
private HashMap<String,com.jacob.com.Variant> mPropertyMap
- Holds the document properties that may be extracted from an MS Office document.
(key: the property name (String), value: the property constant (Variant))
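The two fields above suggest a simple lookup: each wanted property name is resolved to its constant through the map. The following is a minimal sketch of that idea only, not the regain implementation. It assumes Integer as a stand-in for com.jacob.com.Variant so that it runs without the Jacob library; the class name PropertyLookupSketch, both method names, and the property names and numbers are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PropertyLookupSketch {

    /** Hypothetical stand-in for mPropertyMap; Integer replaces com.jacob.com.Variant. */
    public static Map<String, Integer> buildPropertyMap() {
        Map<String, Integer> map = new HashMap<>();
        // Names and numbers are illustrative placeholders, not real Office constants.
        map.put("title", 1);
        map.put("author", 2);
        map.put("keywords", 3);
        return map;
    }

    /** Returns the constants of all wanted names that the map knows, in the wanted order. */
    public static List<Integer> resolve(String[] wanted, Map<String, Integer> propertyMap) {
        List<Integer> constants = new ArrayList<>();
        for (String name : wanted) {
            Integer constant = propertyMap.get(name);
            if (constant != null) {
                // Unknown names are skipped here; rejecting them instead is equally plausible.
                constants.add(constant);
            }
        }
        return constants;
    }

    public static void main(String[] args) {
        String[] wanted = {"author", "unknown", "title"};
        System.out.println(resolve(wanted, buildPropertyMap())); // prints [2, 1]
    }
}
```

Skipping unknown names is a design choice of this sketch. Since init is documented to throw a RegainException if the configuration has an error, the real class may treat an unknown property name as a failure instead.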
AbstractJacobMsOfficePreparator
public AbstractJacobMsOfficePreparator(String[] extensionArr)
throws RegainException
- Creates a new instance of AbstractJacobMsOfficePreparator.
- Parameters:
extensionArr
- The file extensions, one of which a URL must have to be accepted
by this preparator.
- Throws:
RegainException
- If creating the preparator failed.
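To illustrate what "accepted by extension" could mean for extensionArr, here is a minimal sketch under stated assumptions: extensions are given without a leading dot and are compared case-insensitively against the end of the URL. The class name ExtensionCheckSketch and the method hasAcceptedExtension are hypothetical. The inherited method list includes setUrlRegex, so the real preparator may well use a regular expression rather than this plain suffix check.

```java
import java.util.Locale;

public class ExtensionCheckSketch {

    /**
     * Returns true if the URL ends with "." followed by one of the given extensions.
     * Assumption: extensions carry no leading dot and matching ignores case.
     */
    public static boolean hasAcceptedExtension(String url, String[] extensionArr) {
        String lowerUrl = url.toLowerCase(Locale.ROOT);
        for (String extension : extensionArr) {
            if (lowerUrl.endsWith("." + extension.toLowerCase(Locale.ROOT))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] officeExtensions = {"doc", "docx"};
        System.out.println(hasAcceptedExtension("file:///docs/Report.DOC", officeExtensions)); // prints true
        System.out.println(hasAcceptedExtension("file:///docs/report.pdf", officeExtensions)); // prints false
    }
}
```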
init
public void init(PreparatorConfig config)
throws RegainException
- Initializes the preparator.
- Specified by:
init
in interface Pluggable
- Overrides:
init
in class AbstractPreparator
- Parameters:
config
- The configuration.
- Throws:
RegainException
- If the configuration has an error.
readProperties
protected void readProperties(com.jacob.com.Dispatch document)
- Reads the configured document properties from an MS Office document.
- Parameters:
document
- The document to read the properties from.
Regain 2.1.0-STABLE, Copyright (C) 2004-2010 Til Schneider, www.murfman.de, Thomas Tesche, www.clustersystems.info