David,
Thank you for your quick response. First let me clarify a couple of things. I didn't say or suggest that PMail is the only mail program to use monolithic folder file systems. I said it is the only PROGRAM that I have that still uses that method. The only email program I have is PMail. I prefer it over all the others I have tried over the years. Also, I have had to reindex PMail folders on numerous occasions, sometimes without success. It is painful. I'm sorry, but I can't agree that any data loss from reindexing is necessary, inevitable, or acceptable. Clearly it is inevitable with the current PMail file system, but I know of no database system that loses data, much less expects to lose data, just from reindexing.
The bottom line on corruption is that any data storage system can be damaged. I have had non-recoverable data loss in my PMail system. However, anecdotal as it may be, in fifteen years of using Sybase SQL databases and ten with MySQL, on both Unix and Windows systems, I have never lost any data due to corruption. Perhaps I am lucky, but I have never even suffered a corrupted MySQL index. Of course I have suffered data loss with other systems. In the mid-1980s I wrote and marketed a database engine for Pascal programmers that created and used dBASE II and III database files, in part because dBASE itself frequently corrupted databases. But dBASE is ancient history and we are way past that horror. Regardless of the system, I believe that the BEST defense is to do regular backups, and while I suppose it is debatable, I think it is easier to back up a single file than thousands of files.
I can't say what most databases do, but I suggested MySQL because it is reliable and supports a large number of data types, including variable-length columns to conserve disk space. The docs say the largest text-blob column type can store 4,294,967,295 bytes; however, that requires a 64-bit system and lots of memory. They say that the practical limits for the rest of us are "around some hundreds of megs per BLOB". Since it supports full-text indexing of text blobs, I believe it can handle the search requirements. It even lets you search and then sort the results by a numeric relevance score, so you can display them the way web search engines do, with the "best" results first. Full-text searching can be done using just the indexes, so it can be fast.
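To make the relevance-ranked search idea concrete, here is a rough sketch in Python. It only builds the SQL text and parameters; it does not connect to a server. The table and column names (messages, folder, subject, body) and the helper build_search are hypothetical, invented purely for illustration, and say nothing about how PMail would actually lay out its data. The %s placeholders assume a driver that uses the "format" parameter style, as the common Python MySQL drivers do.

```python
# Sketch only: the schema below is hypothetical, not PMail's design.
# A FULLTEXT index over subject and body is what lets MySQL answer
# MATCH ... AGAINST queries from the index.
CREATE_TABLE_SQL = """
CREATE TABLE messages (
    id      INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    folder  VARCHAR(255) NOT NULL,
    subject VARCHAR(255) NOT NULL,
    body    LONGTEXT     NOT NULL,
    FULLTEXT KEY ft_subject_body (subject, body)
)
"""


def build_search(terms, limit=50):
    """Return (sql, params) for a full-text search ranked by relevance.

    The MATCH expression appears twice: once in the SELECT list to
    expose the numeric relevance score, and once in WHERE to filter
    out rows that do not match at all.
    """
    match = "MATCH(subject, body) AGAINST (%s IN NATURAL LANGUAGE MODE)"
    sql = (
        "SELECT id, folder, subject, " + match + " AS score "
        "FROM messages WHERE " + match + " "
        "ORDER BY score DESC LIMIT %s"
    )
    # The search terms are passed as parameters, never pasted into the SQL.
    return sql, (terms, terms, int(limit))


if __name__ == "__main__":
    sql, params = build_search("reindex folder", limit=10)
    print(sql)
    print(params)
```

With a real connection, the pair returned by build_search would be handed to the driver's cursor.execute(sql, params), and the rows would come back with the "best" matches first.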
Yes, the biggest problem I have with PMail is the search speed. I have a relatively fast dual-core computer with 2GB of memory and a large and reasonably fast SATA disk, but sometimes when searching my entire email archive I might as well go get a cup of coffee, because it is pointless to stare at the screen hoping it will finish. In contrast, the database applications I develop with MySQL are amazingly fast.
Here is an example. I am in San Francisco, and the slowest computer I have is an older single Celeron box with 256MB of memory running LAMP (Linux/Apache/MySQL/PHP) as a web server, hosted at a server farm in Texas. It runs eight database-driven web sites. On one of those sites I have a web page that produces a report of hourly usage stats for six aircraft over 31 contiguous days. To do that it executes 190 separate SQL queries with table joins, sorting of result tables, etc. against a set of tables, some of which have 50,000 records. True, it isn't a very large database, but a request for that page takes just 735ms round-trip. That includes the time it takes to execute the program that issues the queries, assembles the results, and formats it all as HTML, AND the overhead of sending the request and response over the net. I just tested it using Firefox and the YSlow performance analysis plug-in, and that was the worst time it recorded. I probably could have restructured it to do fewer queries and be even faster, but why bother?
All I am trying to demonstrate here is that MySQL is fast, and I expect that it could produce dramatically fast results searching email messages. Of course, different applications, data sets, schemas, and platforms will perform differently. For instance, using MySQL for Windows with programs built in Borland Delphi, as I do, requires ODBC. Consequently MySQL isn't as fast as Sybase SQL-Anywhere on Windows, because SQL-Anywhere is optimized for ODBC and uses ODBC internally.
Only you can determine the best solution. All I can do is make suggestions based on my experience.
Thanks for listening.
--Richard