Importing old messages

mbbrutman

Associate Cat Herder
Staff member
Joined
May 3, 2003
Messages
6,417
Shawn provided me with a Zip file of the Yahoo! group that somebody scraped.

The messages are in "email" format, so they look like full emails with all of the headers. vBulletin has an API for creating threads. Writing some code to import each message and add it to a new thread (or an existing thread if the subject starts with "Re:") should not be too terrible.
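As a rough illustration of the "Re:" rule above, here is a minimal sketch that groups messages into threads by subject. It does not touch the vBulletin API at all; the names `thread_key` and `group_into_threads` and the exact prefix-stripping rule are assumptions made up for this example.

```python
import re
from collections import defaultdict

# Assumed rule: strip any run of leading "Re:" markers, in any letter case.
_REPLY_PREFIX = re.compile(r"^\s*(?:re\s*:\s*)+", re.IGNORECASE)

def thread_key(subject):
    """Return the subject with leading "Re:" markers removed, lowercased."""
    return _REPLY_PREFIX.sub("", subject or "").strip().lower()

def group_into_threads(messages):
    """Group (message_id, subject) pairs by normalized subject, keeping input order."""
    threads = defaultdict(list)
    for message_id, subject in messages:
        threads[thread_key(subject)].append(message_id)
    return dict(threads)
```

Matching on subject alone will merge unrelated threads that happen to share a title, so a real import would probably also want to look at headers such as In-Reply-To where they exist.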

For now I think I want to write a simple script to just get the date, subject, sender and the body text onto a static web page. That's not more than a few hours of work and it would be searchable, but it would not be threaded. Getting things threaded and/or importing into vBulletin is a longer term project.
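A minimal sketch of what such a script might look like, using Python's standard `email` package. The function names `summarize_eml` and `render_message` and the HTML layout are assumptions for illustration, not the script that was actually written.

```python
import html
from email import policy
from email.parser import BytesParser

def summarize_eml(raw_bytes):
    """Parse one raw .EML message and return its date, subject, sender and plain body."""
    msg = BytesParser(policy=policy.default).parsebytes(raw_bytes)
    # Ask only for a text/plain body; get_body returns None if there is none.
    body_part = msg.get_body(preferencelist=("plain",))
    body = body_part.get_content() if body_part is not None else ""
    return {
        "date": str(msg["Date"] or ""),
        "subject": str(msg["Subject"] or ""),
        "sender": str(msg["From"] or ""),
        "body": body,
    }

def render_message(info):
    """Render one message as an HTML fragment, escaping every field."""
    return (
        '<div class="message">\n'
        f"<p><b>{html.escape(info['subject'])}</b><br>\n"
        f"{html.escape(info['sender'])}, {html.escape(info['date'])}</p>\n"
        f"<pre>{html.escape(info['body'])}</pre>\n"
        "</div>\n"
    )
```

Escaping every field keeps stray angle brackets in old messages from breaking the page. Messages with unusual or mislabeled character sets may still need extra handling.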


Thoughts?
 
.EML - nice, human readable text. The filenames are just sequence numbers.
 
This is a great idea, and will be incredibly helpful for future searchers. vBulletin integration will be nice, but at least there will be something to refer to (with a stickied link at the top of the Grid forum?) when digging for old info.

Has a home been found for the files from the group?
 
I've been meaning to get to the message import but I got sidelined by something horrible. Trust me, it's a good excuse. It will happen in the next few weeks.

Hosting the files here is still possible; I just need to see what kind of copyright risk we would be taking on. One thing that works in our favor is that we are a registered 501(c)(3) with a real museum, so we have more latitude to protect and preserve software than I let on. It's the distribution part that we need to be careful about.
 
Some progress:

http://www.brutman.com/RuGRiD/

That directory has 8 HTML files which have 500 messages each. The messages have some light formatting on them.

Known problems/limitations:
  • Many of the messages are "multi-part" and include an HTML version and a plaintext version; the HTML version of those messages is suppressed for now until I can properly sanitize the HTML in them. (It includes things like <head> and <body> tags, which screw up the overall page.)
  • Attached pictures and files are not included yet.
  • This is a prototype. The final location will be on something owned by VCFed.org.
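To illustrate the multi-part handling described in the list above, here is a sketch that sorts a message's parts into plaintext, HTML and attachments, then keeps only the plaintext for the page. The helper names and the placeholder string are assumptions for this example, not what the prototype actually does.

```python
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser

def classify_parts(msg: EmailMessage) -> dict:
    """Sort the leaf parts of a message into plain text, HTML, and attachments."""
    found = {"plain": [], "html": [], "attachments": []}
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no content of their own
        if part.is_attachment():
            found["attachments"].append(part.get_filename() or "(unnamed)")
        elif part.get_content_type() == "text/plain":
            found["plain"].append(part.get_content())
        elif part.get_content_type() == "text/html":
            found["html"].append(part.get_content())
    return found

def body_for_page(msg: EmailMessage) -> str:
    """Prefer plaintext; suppress HTML-only bodies until they can be sanitized."""
    found = classify_parts(msg)
    if found["plain"]:
        return "\n".join(found["plain"])
    if found["html"]:
        return "[HTML-only message suppressed]"  # hypothetical placeholder text
    return ""

def load_eml(raw_bytes: bytes) -> EmailMessage:
    """Parse raw .EML bytes with the modern policy so parts are EmailMessage objects."""
    return BytesParser(policy=policy.default).parsebytes(raw_bytes)
```

Parts without an explicit attachment disposition, such as inline images, are simply ignored here, so a fuller version would need its own rule for those.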

Please have a look and let me know about outright bugs. Things like completely missing message bodies might still be happening. The formatting is rough, but it is as it appears in the originals.

Reading MIME emails has been more challenging than I expected. :)
 
I updated the files again today; the files should be more complete and readable.

Still to come - attachments.
 