Regain 2.1.0-STABLE API

net.sf.regain.crawler.preparator
Class AbstractJacobMsOfficePreparator

java.lang.Object
  extended by net.sf.regain.crawler.document.AbstractPreparator
      extended by net.sf.regain.crawler.preparator.AbstractJacobMsOfficePreparator
All Implemented Interfaces:
Pluggable, Preparator, WriteablePreparator
Direct Known Subclasses:
JacobMsExcelPreparator, JacobMsPowerPointPreparator, JacobMsWordPreparator

public abstract class AbstractJacobMsOfficePreparator
extends AbstractPreparator

Author:
Tilman Schneider, STZ-IDA an der FH Karlsruhe

Field Summary
private  HashMap<String,com.jacob.com.Variant> mPropertyMap
          Holds the document properties that may be extracted from an MS Office document.
private  String[] mWantedPropertiesArr
          The properties that should be extracted.
 
Fields inherited from interface net.sf.regain.crawler.document.Preparator
DEFAULT_BUFFER_SIZE
 
Constructor Summary
AbstractJacobMsOfficePreparator(String[] extensionArr)
          Creates a new MS Office preparator for the given file extensions.
 
Method Summary
 void init(PreparatorConfig config)
          Initializes the preparator.
protected  void readProperties(com.jacob.com.Dispatch document)
          Reads the configured document properties from an MS Office document.
 
Methods inherited from class net.sf.regain.crawler.document.AbstractPreparator
accepts, addAdditionalField, cleanUp, close, concatenateStringParts, getAdditionalFields, getCleanedContent, getCleanedMetaData, getHeadlines, getPath, getPriority, getSummary, getTitle, setCleanedContent, setCleanedMetaData, setHeadlines, setPath, setPriority, setSummary, setTitle, setUrlRegex
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface net.sf.regain.crawler.document.Preparator
prepare
 

Field Detail

mWantedPropertiesArr

private String[] mWantedPropertiesArr
The properties that should be extracted.


mPropertyMap

private HashMap<String,com.jacob.com.Variant> mPropertyMap
Holds the document properties that may be extracted from an MS Office document (key: the property name as a String; value: the property constant as a Variant).
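
The name-to-constant lookup that mPropertyMap describes can be illustrated with the minimal sketch below. It is not the Regain implementation: Integer stands in for com.jacob.com.Variant so the example runs without the Jacob library, and the property names and numeric values are assumptions (modelled on Word's built-in document property constants) that should be checked against the Office documentation.

```java
import java.util.HashMap;
import java.util.Map;

class PropertyMapSketch {

    // Hypothetical stand-in for mPropertyMap: property name -> property constant.
    // Integer replaces com.jacob.com.Variant so this compiles without Jacob.
    static Map<String, Integer> buildPropertyMap() {
        Map<String, Integer> map = new HashMap<>();
        // Assumed names and values (modelled on Word's built-in property
        // constants); verify before relying on them.
        map.put("Title", 1);
        map.put("Subject", 2);
        map.put("Author", 3);
        map.put("Keywords", 4);
        map.put("Comments", 5);
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = buildPropertyMap();
        // A lookup returns the constant for a known name and null for an unknown one.
        System.out.println("Author -> " + map.get("Author"));
        System.out.println("Foo -> " + map.get("Foo"));
    }
}
```

A map keyed by name lets a configured list of wanted property names be resolved to constants with one lookup each, and an unknown name is detectable as a null result.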

Constructor Detail

AbstractJacobMsOfficePreparator

public AbstractJacobMsOfficePreparator(String[] extensionArr)
                                throws RegainException
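Creates a new instance of AbstractJacobMsOfficePreparator. Because the class is abstract, this constructor is invoked from the constructors of its subclasses.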

Parameters:
extensionArr - The file extensions handled by this preparator; a URL must have one of these extensions to be accepted.
Throws:
RegainException - If creating the preparator failed.
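
The relationship between the constructor's extensionArr parameter and the subclasses (JacobMsWordPreparator and others) can be sketched roughly as follows. This is an illustration under stated assumptions, not the Regain code: the class names OfficePreparator and WordPreparator, the method acceptsUrl, and the extension list {"doc", "dot"} are all hypothetical, and the real acceptance check lives in the inherited AbstractPreparator.accepts, which may work differently (the class also inherits setUrlRegex, so a regex-based check is possible).

```java
import java.util.Locale;

class ExtensionFilterSketch {

    // Hypothetical stand-in for the extension handling of the abstract base class.
    abstract static class OfficePreparator {
        private final String[] extensionArr;

        // Subclasses pass the extensions they handle, mirroring extensionArr.
        protected OfficePreparator(String[] extensionArr) {
            this.extensionArr = extensionArr.clone();
        }

        // True if the URL ends with "." plus one of the extensions, ignoring case.
        // Query strings and fragments are not handled in this sketch.
        boolean acceptsUrl(String url) {
            String lowerUrl = url.toLowerCase(Locale.ROOT);
            for (String ext : extensionArr) {
                if (lowerUrl.endsWith("." + ext.toLowerCase(Locale.ROOT))) {
                    return true;
                }
            }
            return false;
        }
    }

    // Hypothetical subclass; the real JacobMsWordPreparator's list is not given here.
    static class WordPreparator extends OfficePreparator {
        WordPreparator() {
            super(new String[] {"doc", "dot"});
        }
    }

    public static void main(String[] args) {
        WordPreparator p = new WordPreparator();
        System.out.println(p.acceptsUrl("file:///c:/docs/Report.DOC"));  // true
        System.out.println(p.acceptsUrl("file:///c:/docs/Report.docx")); // false
    }
}
```

Matching on "." plus the extension, rather than on the bare suffix, keeps "Report.docx" from being accepted by a preparator that only lists "doc".
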

Method Detail

init

public void init(PreparatorConfig config)
          throws RegainException
Initializes the preparator.

Specified by:
init in interface Pluggable
Overrides:
init in class AbstractPreparator
Parameters:
config - The configuration.
Throws:
RegainException - If the configuration has an error.
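
One plausible shape for the validation that init performs is shown below: the wanted property names are read from the configuration and any unknown name is reported as a configuration error. This is a hedged sketch only. The comma-separated input format, the method parseWantedProperties, and the ConfigException class (a stand-in for RegainException) are assumptions; the real method reads a PreparatorConfig, whose API is not shown on this page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class InitSketch {

    // Hypothetical stand-in for RegainException so the sketch compiles on its own.
    static class ConfigException extends Exception {
        ConfigException(String msg) {
            super(msg);
        }
    }

    // Splits an assumed comma-separated list of property names and rejects
    // names that are not in the set of known properties.
    static String[] parseWantedProperties(String configValue, Set<String> knownNames)
            throws ConfigException {
        List<String> wanted = new ArrayList<>();
        for (String raw : configValue.split(",")) {
            String name = raw.trim();
            if (name.isEmpty()) {
                continue; // tolerate empty entries such as "Title,,Author"
            }
            if (!knownNames.contains(name)) {
                throw new ConfigException("Unknown document property: " + name);
            }
            wanted.add(name);
        }
        return wanted.toArray(new String[0]);
    }

    public static void main(String[] args) throws ConfigException {
        Set<String> known = Set.of("Title", "Author", "Keywords");
        String[] wanted = parseWantedProperties("Title, Author", known);
        System.out.println(String.join("|", wanted)); // Title|Author
    }
}
```

Failing at initialization time on an unknown name surfaces a typo in the configuration once, instead of silently skipping the property for every document crawled.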

readProperties

protected void readProperties(com.jacob.com.Dispatch document)
Reads the configured document properties from an MS Office document.

Parameters:
document - The document to read the properties from.
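
The overall flow of readProperties can be approximated as: for each wanted property, look up its constant, fetch the value from the document, and keep non-empty values. The sketch below is only an approximation under assumptions. The Jacob Dispatch document is replaced by a plain map from constant to value (documentProperties, hypothetical), and results are returned as a map. The real class may hand them on through inherited methods such as addAdditionalField, but that is an assumption, since this page does not document that method's signature.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ReadPropertiesSketch {

    // wantedProperties mirrors mWantedPropertiesArr, propertyMap mirrors mPropertyMap
    // (with Integer instead of Variant), and documentProperties is a hypothetical
    // stand-in for the values a Dispatch document would return.
    static Map<String, String> readProperties(String[] wantedProperties,
                                              Map<String, Integer> propertyMap,
                                              Map<Integer, String> documentProperties) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String name : wantedProperties) {
            Integer constant = propertyMap.get(name);
            if (constant == null) {
                continue; // unknown property name: skip it
            }
            String value = documentProperties.get(constant);
            if (value != null && !value.isEmpty()) {
                result.put(name, value); // keep only properties that have a value
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> propertyMap = Map.of("Title", 1, "Author", 3);
        Map<Integer, String> doc = Map.of(1, "Quarterly report", 3, "");
        System.out.println(readProperties(new String[] {"Title", "Author"}, propertyMap, doc));
        // prints {Title=Quarterly report}
    }
}
```

Using a LinkedHashMap preserves the configured order of the wanted properties, which keeps the output predictable.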


Regain 2.1.0-STABLE, Copyright (C) 2004-2010 Til Schneider, www.murfman.de, Thomas Tesche, www.clustersystems.info