Community Discussions and Support
Unix Mailbox Format (mbox) indexing does not work properly?

In your example, sure, that will throw the parser for a loop.  However, if the only two choices are an algorithm that is less than perfect and one that is just plain error-prone, doesn't it make logical sense to use the one that will statistically result in the fewest errors (even if it isn't perfect)?

Finding a rogue "From ???@??? " in the body text of a message is going to be FAR less likely than finding "from today" or "From everyone" or any of the millions of other possible English phrases that currently do result in a corrupted PMG index file being created, as my own experience showed me.

In any case, I know how to fix my own MBX files and get my messages out and into a normal Pegasus Mail folder, so I'm going to do that and then never use the MBX format in Pegasus Mail again.  Until yesterday, I just didn't realize Pegasus was not actually able to properly create a PMG index file from scanning an MBX file, the way Thunderbird is able to create its MSF index file directly from a local folder (mbox file).  It was like someone sucker punched me in the gut, having used Pegasus Mail since 1995 and never realizing this fact.

I hope there is something I am obviously doing wrong, but it appears that Pegasus Mail (v4.41) is not able to properly index an mbox (.MBX) folder.  It seems that the index routine that scans through an mbox file counts any line that begins with "from " (in any case combination of those characters, from what I have been able to replicate in testing) as the start of a new message, not just lines that start with "From ???@???".  Because of that, I have .mbx folders that are "hiding" messages that should show up in the folder, and I also have some messages that do show up but without header information (the raw message view starts with, say, a line that reads "from everything I've heard" or similar).

I also found that in some instances the end of the previous message did not get a CRLF, so the "From ???@???" part wasn't at the beginning of a new line, and that jumbles two messages together into one when viewed in Pegasus.

I understand this format was intended for conversion processes, but it is still useful for long-term storage of messages in a format that other utilities can easily import.  I didn't realize there was a flaw that made it very unwise to use it for anything other than a step in a conversion process to a new e-mail application.  Is that really the case?
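To make the difference concrete, here is a minimal Python sketch of the two matching rules described above. It is only an illustration of the behavior I observed, not Pegasus Mail's actual indexing code; the function names and the sample date line are made up for the example.

```python
# Illustrative only: mimics the two ways of detecting a message start.
# Neither function is taken from Pegasus Mail; both names are hypothetical.

def count_starts_loose(lines):
    """Count lines beginning with "from " in any letter case."""
    return sum(1 for line in lines if line[:5].lower() == "from ")


def count_starts_strict(lines, marker="From ???@??? "):
    """Count only lines beginning with the exact separator marker."""
    return sum(1 for line in lines if line.startswith(marker))


sample = [
    "From ???@??? Mon Jan 01 00:00:00 2007",   # the one real separator
    "Subject: hello",
    "",
    "from everything I've heard, this works.",  # body text
    "From everyone here, thanks.",              # body text
]

print(count_starts_loose(sample))   # 3
print(count_starts_strict(sample))  # 1
```

The loose rule reports three messages where there is only one, which matches the "hidden" and header-less messages I am seeing.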

[quote user="ivorygate"]

I hope there is something I am obviously doing wrong, but it appears that Pegasus Mail (v4.41) is not able to properly index an mbox (.MBX) folder.  It seems that the index routine that scans through an mbox file counts any line that begins with "from " (in any case combination of those characters, from what I have been able to replicate in testing) as the start of a new message, not just lines that start with "From ???@???".  Because of that, I have .mbx folders that are "hiding" messages that should show up in the folder, and I also have some messages that do show up but without header information (the raw message view starts with, say, a line that reads "from everything I've heard" or similar).

I also found that in some instances the end of the previous message did not get a CRLF, so the "From ???@???" part wasn't at the beginning of a new line, and that jumbles two messages together into one when viewed in Pegasus.

I understand this format was intended for conversion processes, but it is still useful for long-term storage of messages in a format that other utilities can easily import.  I didn't realize there was a flaw that made it very unwise to use it for anything other than a step in a conversion process to a new e-mail application.  Is that really the case?

[/quote]

 

And you have found the major problem with the MBOX format.  Pegasus Mail does not munge the "From" in the first position of any line in the body of the message.  If you want to use this as a storage format, you will have to do that yourself while the messages are still in CNM form, before moving them to the MBX-type folders.
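As a rough idea of what that munging looks like, here is a small Python sketch. Traditional mbox writers prefix such body lines with ">"; this version does it for any letter case of "from ", since that is what the indexer is reported to match in this thread. The function name is hypothetical, and it assumes you pass in the body lines only (not the headers).

```python
# Sketch of "From" munging for message BODY lines only.
# munge_from_lines is a made-up helper name, not part of any mail program.

def munge_from_lines(body_lines):
    """Prefix '>' to body lines that start with 'from ' in any case."""
    munged = []
    for line in body_lines:
        if line[:5].lower() == "from ":
            line = ">" + line
        munged.append(line)
    return munged


body = ["Hi,", "From today on I will be away.", "from me"]
print(munge_from_lines(body))
```

The munged lines no longer begin with "from ", so an indexer that looks for that prefix at the start of a line should skip them.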
 

However, a better method than MBX for providing interoperability between Pegasus Mail and other mail applications is to set up Mercury/32 with Pegasus Mail.  You can then use IMAP4 to make the Pegasus Mail folders accessible from any IMAP4 application.  This works quite well for me with Netscape, T-bird, Outlook, OE, Eudora, etc. when testing mail clients.

I understand the theory behind the issue, with the Unix Mailbox Format employed by Pegasus Mail being an mbox variant, but I still don't understand why the generation of the .PMG index file doesn't work properly.  The index file is used by Pegasus and Pegasus alone, so I don't know why the index routine doesn't match only on "From ???@??? " to index a new mail message, instead of every variant of "from " at the beginning of a line.

I would say that, given the current implementation, Pegasus Mail shouldn't even allow .MBX folders to be created and used within the program alongside PMM files; the format should only be offered as a Pegasus Mail folder export option.

In any case, last night I wrote a quick-and-dirty script that adds a space to the front of any "from " at the beginning of a line that isn't a "From ???@??? " match, and then makes sure all "From ???@???" matches start at the beginning of a new line.  I then delete the .PMG index file, start Pegasus back up, and move all of those messages into a standard PMM folder instead.  The lesson is never to use the MBX format in Pegasus, except for one-off exporting purposes.
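For anyone who wants to try the same repair, the two steps can be sketched in Python roughly as follows. This is a reconstruction from the description above, not the original script; the function name is made up, and it assumes the whole MBX file has been read into a string with its CRLF line endings preserved.

```python
import re

# Separator text as it appears in the MBX files discussed in this thread.
SEPARATOR = "From ???@???"


def repair_mbx(text):
    """Hypothetical two-step cleanup of MBX file contents."""
    # Step 1: put a space in front of any line that starts with "from "
    # (any letter case) but is not a real separator line.
    fixed = []
    for line in text.splitlines(keepends=True):
        if line[:5].lower() == "from " and not line.startswith(SEPARATOR):
            line = " " + line
        fixed.append(line)
    text = "".join(fixed)

    # Step 2: if a separator appears in the middle of a line (the previous
    # message did not end with CRLF), move it onto a new line.
    pattern = r"(?<=[^\r\n])" + re.escape(SEPARATOR)
    return re.sub(pattern, "\r\n" + SEPARATOR, text)


broken = (
    "From ???@??? Mon Jan 01 00:00:00 2007\r\n"
    "Subject: a\r\n"
    "\r\n"
    "from today on\r\n"
    "bye.From ???@??? Tue Jan 02 00:00:00 2007\r\n"
    "Subject: b\r\n"
)
print(repair_mbx(broken))
```

Work on a copy of the file. Step 2 will also split a body line that happens to contain the separator text mid-sentence, so this is only suitable as a one-off rescue before moving the messages elsewhere.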

[quote user="ivorygate"]

I understand the theory behind the issue, with the Unix Mailbox Format employed by Pegasus Mail being an mbox variant, but I still don't understand why the generation of the .PMG index file doesn't work properly.  The index file is used by Pegasus and Pegasus alone, so I don't know why the index routine doesn't match only on "From ???@??? " to index a new mail message, instead of every variant of "from " at the beginning of a line.

I would say that, given the current implementation, Pegasus Mail shouldn't even allow .MBX folders to be created and used within the program alongside PMM files; the format should only be offered as a Pegasus Mail folder export option.

In any case, last night I wrote a quick-and-dirty script that adds a space to the front of any "from " at the beginning of a line that isn't a "From ???@??? " match, and then makes sure all "From ???@???" matches start at the beginning of a new line.  I then delete the .PMG index file, start Pegasus back up, and move all of those messages into a standard PMM folder instead.  The lesson is never to use the MBX format in Pegasus, except for one-off exporting purposes.

[/quote]

The indexer is simply looking for the start of a message, a "From" at the start of a line.  Since the actual message bodies are not munged, the indexer simply treats these like any MBOX message.  There is no standard way that an MBOX file will be generated by an application; there are a lot of variations.  People have asked to be allowed to create an MBOX-type file, and that's what they get.  The only way you can ensure that a "from" in the body of a message is not the start of a message is to parse the line starting with "from", try to compare it to every possible From line, and then hope that nobody is sending a message that talks about the various ways it can be done and puts the string at the start of a line of body text.  How would your parser handle

From ???@??? is the way that some MBX mailers delineate the start of a new message.

if this line appeared in an email message?
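To illustrate the kind of comparison involved, here is a Python sketch that accepts a line as a separator only if the "From ???@??? " prefix is followed by something that parses as a date. The date layout used here is an assumption for the example (mailers vary, which is the whole problem), and the function name is made up.

```python
import time

PREFIX = "From ???@??? "
# ASSUMPTION: an asctime-style date follows the prefix. Real MBOX writers
# differ, so a real parser would need to try several layouts.
DATE_FORMAT = "%a %b %d %H:%M:%S %Y"


def looks_like_separator(line):
    """Heuristic check: exact prefix followed by a parseable date."""
    if not line.startswith(PREFIX):
        return False
    try:
        time.strptime(line[len(PREFIX):].strip(), DATE_FORMAT)
    except ValueError:
        return False
    return True


print(looks_like_separator("From ???@??? Mon Jan 01 00:00:00 2007"))
print(looks_like_separator(
    "From ???@??? is the way that some MBX mailer delineate the start "
    "of a new message."
))
```

This rejects the example sentence above, but a message body that quotes a complete separator line, date included, would still be taken for the start of a new message. Without munging the bodies there is no check that is safe in every case.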
