Regain 2.1.0-STABLE API
java.lang.Object
net.sf.regain.crawler.document.AbstractPreparator
net.sf.regain.crawler.preparator.MessagePreparator
public class MessagePreparator
This class prepares messages (MIME, RFC 822), that is, email messages, for indexing.
The document contains the message text and the file names of the attachments.
Field Summary | |
---|---|
private static org.apache.log4j.Logger |
mLog
The logger for this class |
private static java.util.regex.Pattern |
mURLPattern
Compiled regular expression used to match URLs in the message body. |
Fields inherited from interface net.sf.regain.crawler.document.Preparator |
---|
DEFAULT_BUFFER_SIZE |
Constructor Summary | |
---|---|
MessagePreparator()
Creates a new instance of MessagePreparator. |
Method Summary | |
---|---|
private Collection<String> |
extractURLs(String text)
Extract URLs from text source. |
private javax.mail.Address[] |
fixAddress(javax.mail.internet.AddressException ae,
javax.mail.internet.MimeMessage message,
String headerName)
Repairs address headers that use semicolons rather than commas as separators, which cause an "Illegal semicolon, not in group" AddressException. |
static String |
inputStreamAsString(InputStream stream)
Get the content of an InputStream as String. |
void |
prepare(RawDocument rawDocument)
Prepares the document for indexing. |
private String |
stripNoneWordChars(String uncleanString)
Removes unwanted chars from a given string. |
Methods inherited from class net.sf.regain.crawler.document.AbstractPreparator |
---|
accepts, addAdditionalField, cleanUp, close, concatenateStringParts, getAdditionalFields, getCleanedContent, getCleanedMetaData, getHeadlines, getPath, getPriority, getSummary, getTitle, init, setCleanedContent, setCleanedMetaData, setHeadlines, setPath, setPriority, setSummary, setTitle, setUrlRegex |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
private static org.apache.log4j.Logger mLog
private static java.util.regex.Pattern mURLPattern
Constructor Detail |
---|
public MessagePreparator() throws RegainException
RegainException
- If creating the preparator failed.
Method Detail |
---|
public void prepare(RawDocument rawDocument) throws RegainException
rawDocument
- The document to prepare.
RegainException
- If the preparation fails.
private javax.mail.Address[] fixAddress(javax.mail.internet.AddressException ae, javax.mail.internet.MimeMessage message, String headerName)
ae
- AddressException object
message
- MIME message object
headerName
- Name of header, e.g. To, From, Reply-To
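The documentation does not show how fixAddress repairs the header, so the following is only a minimal sketch of one plausible approach: replace semicolons that act as separators with commas, leaving semicolons inside quoted display names alone. The class and method names are hypothetical, and the sketch assumes the header contains no RFC 822 group syntax (where a semicolon legitimately ends a group) and no backslash-escaped quotes.

```java
final class SemicolonAddressSketch {

    /**
     * Hypothetical helper: replaces every semicolon outside a quoted
     * string with a comma. Not the actual regain implementation.
     */
    static String normalizeSeparators(String headerValue) {
        StringBuilder sb = new StringBuilder(headerValue.length());
        boolean inQuotes = false;
        for (int i = 0; i < headerValue.length(); i++) {
            char c = headerValue.charAt(i);
            if (c == '"') {
                // Toggle on each double quote; escaped quotes are not handled.
                inQuotes = !inQuotes;
            }
            sb.append(c == ';' && !inQuotes ? ',' : c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String raw = "\"Doe; John\" <john@example.com>; jane@example.com";
        System.out.println(normalizeSeparators(raw));
    }
}
```

The normalized string could then be handed to a parser such as `javax.mail.internet.InternetAddress.parse(String)` to obtain the address array.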
private Collection<String> extractURLs(String text)
text
- input string of text or HTML
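As an illustration of regex-based URL extraction from plain text or HTML, here is a small sketch. The pattern is an assumption made for this example (the actual value of mURLPattern is not documented here), and the class and method names are hypothetical. Duplicates are dropped while preserving first-seen order.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UrlExtractionSketch {

    // Assumed pattern: http(s) URLs up to the next whitespace, quote or angle bracket.
    private static final Pattern URL_PATTERN =
        Pattern.compile("https?://[^\\s\"'<>]+", Pattern.CASE_INSENSITIVE);

    /** Returns the distinct URLs found in the text, in order of first appearance. */
    static Collection<String> extractUrls(String text) {
        Collection<String> urls = new LinkedHashSet<>();
        if (text == null) {
            return urls;
        }
        Matcher matcher = URL_PATTERN.matcher(text);
        while (matcher.find()) {
            urls.add(matcher.group());
        }
        return urls;
    }

    public static void main(String[] args) {
        String body = "See <a href=\"http://example.com/a\">this</a> and https://example.org/b?x=1";
        System.out.println(extractUrls(body));
    }
}
```

A pattern this simple will also capture trailing punctuation such as a closing period, so a production pattern would likely need to be stricter.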
private String stripNoneWordChars(String uncleanString)
uncleanString
- The string to clean.
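Which characters count as "unwanted" is not specified in this documentation. The sketch below assumes, based only on the method name, that anything other than letters and digits is replaced; the class name, method name and exact character classes are hypothetical.

```java
final class StripNonWordSketch {

    /**
     * Assumed behaviour: collapse each run of characters that are neither
     * letters nor digits into a single space, then trim the result.
     */
    static String stripNonWordChars(String uncleanString) {
        if (uncleanString == null) {
            return "";
        }
        return uncleanString.replaceAll("[^\\p{L}\\p{N}]+", " ").trim();
    }

    public static void main(String[] args) {
        System.out.println(stripNonWordChars("Re: [regain] Hello, World!"));
    }
}
```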
public static String inputStreamAsString(InputStream stream) throws IOException
stream
- the InputStream
IOException
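For readers unfamiliar with the idiom, reading an InputStream fully into a String can be sketched as follows. This is not the regain source: the class name is hypothetical, and the UTF-8 charset and 8192-byte buffer are assumptions, since the documentation names neither.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class StreamToStringSketch {

    /** Reads the stream to its end and decodes the bytes as UTF-8 (assumed charset). */
    static String inputStreamAsString(InputStream stream) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int read;
        while ((read = stream.read(buffer)) != -1) {
            out.write(buffer, 0, read);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream("Subject: test".getBytes(StandardCharsets.UTF_8));
        System.out.println(inputStreamAsString(in));
    }
}
```

Collecting the raw bytes first and decoding once at the end avoids splitting a multi-byte character across buffer boundaries.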