New, faster message database needed

NTxLS

posted Sep 13 '07 at 6:08 pm

Again, I'm not a programmer, but I have done some partitioning, and changing the 'block sizes' can be of some use. For system operations and some third-party software, the 'block size' should be rather large to prevent fragmentation, while for e-mail storage it might be better to set the 'block size' to 2KB because messages are small. How about setting up a partition, or drive, for e-mail only, with a smaller (or the smallest) 'block size'? The other partitions or drives could then use a larger size to reduce fragmentation of our larger files. It may be a bad idea, but I just thought of it and plan on testing it for myself, unless anyone has ideas that conflict with it.
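One way to judge the idea before testing it is to estimate the slack space (the unused tail of each file's last cluster) for a few cluster sizes. This is a minimal Python sketch that assumes every file occupies whole clusters and ignores file-system tricks such as NTFS storing very small files inside the MFT; the 1.5KB message size is a hypothetical sample, not a measurement.

```python
import math

def slack_bytes(file_sizes, cluster_size):
    """Total bytes wasted when every file is rounded up to whole clusters."""
    total = 0
    for size in file_sizes:
        # Even an empty or tiny file still occupies one full cluster
        clusters = max(1, math.ceil(size / cluster_size))
        total += clusters * cluster_size - size
    return total

# Hypothetical sample: 1,000 messages of 1.5 KB each, one file per message
messages = [1536] * 1000
for cluster in (2048, 4096, 32768):
    print(f"{cluster:>6}-byte clusters: {slack_bytes(messages, cluster):,} bytes wasted")
```

The smaller the cluster, the less each small file wastes, which is the effect the idea relies on; whether it matters in practice depends on how the mail program lays out its files.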

Thank you for reading my rant,


RichardXeon

posted Sep 2 '07 at 11:36 pm

I have used PMail for many years and use and appreciate many of its fine and innovative features; however, I have always had a problem with its slow search speed. As a programmer and database specialist, I suspect that the primary problem is the use of an outdated and extremely inefficient database for storing messages. PMail is the only application I have that still uses the file system to store data the way PMail does. The huge number of files and the confusing filenames and extensions are awkward and inconvenient. It is extremely slow at searching large folders of messages. It is also slow at deleting messages, because it has to rewrite entire collections of messages, and it fails if there is insufficient disk space to create the new version of the file. Some operations cause data such as annotations, color coding, locking, etc. to be lost. This is totally unnecessary and could be avoided by using a properly designed database.

I realize it is a non-trivial request, but PMail could be orders of magnitude faster and could have numerous additional features if an SQL database were used. MySQL is an open-source, Windows-compatible database that could be used for this purpose. It is very fast, reliable and stable. MySQL supports full-text indexing and blobs for storing text or binary data.

Storing messages in a true SQL database would allow new methods of categorization, including associating a message with multiple folders without duplication. All messages could be stored in a single table, permitting folder-style grouping, sorting and searching as well as full-system searches and any subset in between. This would also make backing up, restoring and transporting the message database faster and simpler.
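The "multiple folders without duplication" idea is the classic many-to-many pattern: messages live once in one table, and a junction table records which folders each message belongs to. This is a minimal sketch under stated assumptions: it uses SQLite through Python's built-in `sqlite3` module rather than MySQL so it runs without a server, and the table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE messages (
    id      INTEGER PRIMARY KEY,
    sender  TEXT NOT NULL,
    subject TEXT,
    body    TEXT
);
CREATE TABLE folders (
    id   INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL
);
-- Junction table: a message can be filed in many folders without being copied
CREATE TABLE message_folders (
    message_id INTEGER NOT NULL REFERENCES messages(id),
    folder_id  INTEGER NOT NULL REFERENCES folders(id),
    PRIMARY KEY (message_id, folder_id)
);
""")

conn.executemany("INSERT INTO folders (name) VALUES (?)", [("Invoices",), ("Clients",)])
cur = conn.execute(
    "INSERT INTO messages (sender, subject, body) VALUES (?, ?, ?)",
    ("alice@example.com", "March invoice", "Please find the invoice attached."))
msg_id = cur.lastrowid

# File the one stored message under both folders
conn.executemany(
    "INSERT INTO message_folders (message_id, folder_id) "
    "SELECT ?, id FROM folders WHERE name = ?",
    [(msg_id, "Invoices"), (msg_id, "Clients")])

def subjects_in(folder):
    """Folder-style view of the single messages table, via the junction table."""
    return conn.execute(
        "SELECT m.subject FROM messages m "
        "JOIN message_folders mf ON mf.message_id = m.id "
        "JOIN folders f ON f.id = mf.folder_id "
        "WHERE f.name = ? ORDER BY m.id", (folder,)).fetchall()

print(subjects_in("Invoices"), subjects_in("Clients"))
```

A full-system search is then just the same query without the folder join.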

I hope implementing a fast database will be considered.

--Richard


David Harris

posted Sep 3 '07 at 3:59 am

This message raises a number of issues. First though, I have to correct some misapprehensions. Pegasus Mail is *far* from the only mail program to use monolithic folder file systems - Outlook Express, Thunderbird, AppleMail and Eudora ALL use very similar folder formats to Pegasus Mail - and indeed, Pegasus Mail's is rather more optimized than some of the others for the majority of operations. Also, the only operation that "causes data such as annotations etc to be lost" is the folder reindexing operation, which is a maintenance tool and necessarily has side-effects: it should almost never have to be used, and when it does, a small amount of data loss is inevitable.

The separate-folder vs database debate regarding mail folders is not new - it has been going on for many, many years. I have consistently come down on the side of using optimized folder formats instead of databases for a number of reasons:

- If a database-based foldering system becomes damaged, there is a significant risk that ALL mail will be lost, and even in the best case a significant amount of mail is almost always lost. Using specific formats limits the damage to a single folder, and often to a single message within that folder. This is the overriding reason why I am reluctant to use database-based folder formats. Seventeen years' experience has shown me that folder damage is inevitable: it cannot be avoided, and hence must be minimized as much as possible.
- Most databases are not well optimized to handle the range of data sizes that mail can have: very few databases cope well with data ranging from 1KB to 100MB or more within the same space.
- Most databases implement large objects as blobs, and in many implementations these cannot be searched. Furthermore, they are very slow to parse and unravel (they have to be copied from the database to a disk where their MIME structure can be unravelled as required).
- Database formats change, and this can create problems for data that might have to span many years. By contrast, folders that I created in Pegasus Mail back in 1990 can still be read and manipulated by current versions of the program, because I have control of the code.

Don't get me wrong here - I'm not saying that database-based foldering doesn't have some advantages, but I am convinced that they are seriously outweighed by the disadvantages.

In this case, though, I believe you are confusing two separate ideas. You are saying that the foldering format is generally inefficient, which it plainly isn't: I believe that what you actually mean is simply that searching is slow, which is a completely different issue, and does not necessarily have anything to do with the foldering format.

Now, I agree that Pegasus Mail's folder searches can be slow in many cases, but there are certain optimizations that vastly improve searching speed - in particular, searching only in headers, and using constraints-only searches. A constraints-only search (see the help on this - basically, you just specify a search based on the constraints fields, using no specific search term) is as fast as a search can possibly go, because it is done entirely in memory on the preloaded index structure.
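The reason a constraints-only search is so fast is that it never touches the message text: it filters a summary that is already in memory. This is a minimal Python sketch of that principle; the `IndexEntry` fields and the `constraints_search` function are hypothetical and do not reflect Pegasus Mail's actual index layout.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class IndexEntry:
    # Hypothetical per-message summary kept in memory (not Pegasus Mail's real format)
    sender: str
    subject: str
    received: date
    unread: bool
    offset: int  # where the full message starts in the folder file

def constraints_search(index, sender_contains=None, since=None, unread_only=False):
    """Filter on the preloaded index only; no message bodies are read from disk."""
    hits = []
    for entry in index:
        if sender_contains and sender_contains.lower() not in entry.sender.lower():
            continue
        if since and entry.received < since:
            continue
        if unread_only and not entry.unread:
            continue
        hits.append(entry)
    return hits

index = [
    IndexEntry("alice@example.com", "Budget", date(2007, 8, 30), True, 0),
    IndexEntry("bob@example.com", "Lunch?", date(2007, 9, 1), False, 2048),
    IndexEntry("alice@example.com", "Re: Budget", date(2007, 9, 2), False, 4096),
]
hits = constraints_search(index, sender_contains="alice", since=date(2007, 9, 1))
print([e.subject for e in hits])
```

A full-text search, by contrast, has to seek to each `offset` and read and decode the message itself, which is where the time goes on large folders.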

That said, it would be useful if general searching could be made faster than it is, and I am working on that idea. What I'll probably end up doing is implementing a keyword database (probably using sqlite, because it's already there) and allowing you to mark certain folders as "indexed". Indexed folders will have the messages added to them analyzed and added to the keyword database, and a new search type will allow you to search on that database, making for much faster simple searching. The general form of the underlying folder structure, however, is very unlikely to change, simply because the way it works now is (in my opinion) the right way for a mail application to operate.
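A keyword database of the kind described can be sketched as an inverted index: each word of a newly filed message is recorded once per message, and a search becomes an indexed lookup instead of a scan through folder files. This is a minimal sketch using Python's `sqlite3` module; the table layout, the `INDEXED_FOLDERS` setting and the crude tokenizer are all hypothetical, not the planned Pegasus Mail design.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE keywords (word TEXT NOT NULL, folder TEXT NOT NULL, msg_id INTEGER NOT NULL)")
conn.execute("CREATE INDEX idx_word ON keywords (word)")

INDEXED_FOLDERS = {"Projects"}  # hypothetical: folders the user has marked as "indexed"

def add_message(folder, msg_id, text):
    """Record each distinct word of a new message, but only for indexed folders."""
    if folder not in INDEXED_FOLDERS:
        return
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    conn.executemany("INSERT INTO keywords VALUES (?, ?, ?)",
                     [(w, folder, msg_id) for w in words])

def keyword_search(word):
    """Look up one word through the index instead of scanning folder files."""
    rows = conn.execute(
        "SELECT folder, msg_id FROM keywords WHERE word = ? ORDER BY msg_id",
        (word.lower(),))
    return rows.fetchall()

add_message("Projects", 1, "Sqlite schema draft attached")
add_message("Projects", 2, "Meeting moved to Friday")
add_message("Junk", 3, "Cheap sqlite hosting")  # folder not indexed, so skipped
print(keyword_search("SQLite"))
```

Because the messages themselves stay in their folder files, damage to the keyword table would only cost the index, which could be rebuilt from the folders.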

Cheers!

-- David --


RichardXeon

posted Sep 3 '07 at 12:44 pm

David,

Thank you for your quick response. First, let me clarify a couple of things. I didn't say or suggest that PMail is the only mail program to use monolithic folder file systems. I said it is the only PROGRAM that I have that still uses that method. The only email program I have is PMail. I prefer it over all the others I have tried over the years. Also, I have had to reindex PMail folders on numerous occasions, sometimes without success. It is painful. I'm sorry, but I can't agree that any data loss from reindexing is necessary, inevitable, or acceptable. Clearly it is inevitable with the current PMail file system, but I know of no database system that loses data, much less expects to lose data, just from reindexing.

The bottom line on corruption is that any data storage system can be damaged. I have had non-recoverable data loss in my PMail system. However, anecdotal as it may be, in fifteen years of using Sybase SQL databases and ten with MySQL, on both Unix and Windows systems, I have never lost any data due to corruption. Perhaps I am lucky but I have never even suffered a corrupted MySQL index. Of course I have suffered data loss with other systems. In the mid-80's I wrote and marketed a database engine for Pascal programmers that created and used dBASE II and III database files, in part because dBASE itself frequently corrupted databases. But dBASE is ancient history and we are way past that horror. Regardless of the system, I believe that the BEST defense is to do regular backups and while I suppose it is debatable, I think it is easier to backup a single file instead of thousands of files.
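With a single-file database, a consistent backup can be taken while the store is in use. This is a minimal sketch using the online backup API of Python's `sqlite3` module (`Connection.backup`, Python 3.7+); SQLite stands in for MySQL here, and the `messages` table and `backup_to` helper are hypothetical.

```python
import os
import sqlite3
import tempfile

# Hypothetical live message store (in memory so the sketch runs anywhere)
live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, subject TEXT)")
live.executemany("INSERT INTO messages (subject) VALUES (?)", [("Hello",), ("Invoice",)])
live.commit()

def backup_to(path):
    """Copy the entire database into the single file at `path`."""
    dest = sqlite3.connect(path)
    with dest:
        live.backup(dest)  # SQLite online backup API, exposed by Python 3.7+
    dest.close()

backup_path = os.path.join(tempfile.mkdtemp(), "mail-backup.db")
backup_to(backup_path)
```

The result is one file to copy, compress or move, which is the convenience argued for above; the trade-off is that the whole archive now shares a single file.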

I can't say what most databases do, but I suggested MySQL because it is reliable and supports a large number of data types, including variable-length columns to conserve disk space. The docs say the largest text/blob column type can store 4,294,967,295 bytes; however, that requires a 64-bit system and lots of memory. They say that the practical limit for the rest of us is "around some hundreds of megs per BLOB". Since it supports full-text indexing of text columns, I believe it can handle the search requirements. It even lets you search and then sort the results by a numeric relevance score, so you can display the results the way web search engines do, with the "best" results first. Full-text searching can be done using just the indexes, so it can be fast.
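Relevance ranking means scoring each match and sorting by that score. This is a deliberately simple Python sketch that scores by how often the query words occur; it is a stand-in to show the idea, not MySQL's actual full-text relevance formula, and the sample messages are made up.

```python
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def ranked_search(messages, query):
    """Return message ids with the highest term-count score first (ties by id)."""
    terms = tokenize(query)
    results = []
    for msg_id, text in messages.items():
        counts = Counter(tokenize(text))
        score = sum(counts[t] for t in terms)
        if score > 0:
            results.append((score, msg_id))
    results.sort(key=lambda pair: (-pair[0], pair[1]))
    return [msg_id for _, msg_id in results]

messages = {
    101: "Budget draft for the March meeting",
    102: "Budget, budget, budget: the March budget is overdue",
    103: "Lunch on Friday?",
}
print(ranked_search(messages, "march budget"))
```

Real engines also weight rare words above common ones and read the counts from an index rather than re-tokenizing every message.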

Yes, the biggest problem I have with PMail is the search speed. I have a relatively fast dual-core computer with 2GB of memory and a large and reasonably fast SATA disk but sometimes when searching my entire email archive I might as well go get a cup of coffee because it is pointless to stare at the screen hoping it will finish. In contrast, I am used to using database applications that I develop using MySQL which are amazingly fast.

Here is an example. I am in San Francisco and the slowest computer I have is an older single Celeron box with 256MB of memory running LAMP (Linux/Apache/MySQL/PHP) as a web server, hosted at a server farm in Texas. It runs eight database driven web sites. On one of those sites I have a web page that produces a report of hourly usage stats for six aircraft over 31 contiguous days. To do that it executes 190 separate SQL queries with table joins, sorting of result tables, etc. against a set of tables some of which have 50,000 records. True, it isn't a very large database but a request for that page takes just 735ms round-trip. That includes the time it takes to execute the program that issues the queries and assembles the results and formats it all with HTML, AND the overhead of sending the request and response over the net. I just tested it using Firefox and the YSlow plug-in performance analysis tool and that was the worst time it recorded. I probably could have restructured it to do fewer queries, and be even faster, but why bother?

All I am trying to demonstrate here is that MySQL is fast, and I expect it could produce dramatically fast results when searching email messages. Of course, different applications, data sets, schemas and platforms will have different performance. For instance, using MySQL on Windows from programs built with Borland Delphi, as I do, requires ODBC. Consequently MySQL isn't as fast there as Sybase SQL-Anywhere, which is optimized for ODBC and uses ODBC internally.

Only you can determine the best solution. All I can do is make suggestions based on my experience.

Thanks for listening.

--Richard


torstenrox

posted Sep 5 '07 at 4:43 pm

One question I have: with the sole exceptions of Sylpheed/Claws (quite spartan for business use), KMail (Linux) and a rather outdated program called AK-Mail (I do not know the Outlook file format), all mail programs use some kind of mbx file (which sometimes means very large files) instead of the simple principle of one mail, one file (sometimes called the mdir format).

Wouldn't that lead to a much more stable and faster solution (although PMail is of course stable and rather fast compared with others)?

Apologies for my poor terminology; I am a user, not a coder.


David Harris

posted Sep 6 '07 at 6:20 am

In its original form, Pegasus Mail used an mdir-like message store. A "Folder" was simply a directory containing files with the extension .CNR, each file being a separate message. You can still see occasional vestiges of this in the program - in rare cases, the program still looks for or uses the old extension to retain backwards compatibility.

The problem with this approach, as I fairly quickly discovered, is that it's very slow and horrendously wasteful. Pegasus Mail is widely used in server-based or multiuser environments, and having a file per message quickly results in huge file systems, which slow the machine down and consume memory and disk space at quite a rate. As well, most operating systems allocate disk space in blocks of 4KB or more, and with most messages being around the 1 - 1.5KB mark in size, a huge amount of disk space gets wasted this way. By way of comparison, a monolithic folder file with 1000 messages will typically only waste around 2KB in total (half a block), where 1000 separate messages will waste around 2MB. Once there are many messages involved in the equation, factors like this start getting very important, even in these days of Terabyte drives.
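The comparison can be reproduced with a few lines of arithmetic. This is a minimal Python sketch assuming 4KB allocation blocks and a hypothetical folder of 1,000 messages between 1,000 and 1,500 bytes; it counts only the unused tail of each file's last block.

```python
import math

def wasted(sizes, block=4096):
    # Each file is rounded up to a whole number of allocation blocks
    return sum(math.ceil(s / block) * block - s for s in sizes)

# Hypothetical folder: 1,000 messages of 1,000 to 1,500 bytes each
sizes = [1000 + (i * 37) % 501 for i in range(1000)]

separate = wasted(sizes)           # one file per message (mdir-style)
monolithic = wasted([sum(sizes)])  # all messages in one folder file
print(f"separate files: {separate:,} bytes wasted")
print(f"one folder file: {monolithic:,} bytes wasted")
```

Because every message here is smaller than one block, each separate file wastes more than half a block, so the per-file total comes out somewhat above the half-block rule of thumb; the monolithic file never wastes more than one block in total.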

The mdir-style approach has some significant advantages to it, though: it's much easier to share folders when they're in this format, and operations like deletion are trivially easy compared with monolithic foldering formats. For this reason, Pegasus Mail uses this mechanism for public folders, but as anyone who has tried it will tell you, once a public folder has more than about 2000 messages in it, it starts getting quite slow to open.
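The difference in deletion cost is easy to see side by side. This is a minimal Python sketch under hypothetical assumptions: the monolithic file separates messages with a single `0x1A` byte (chosen for the sketch, not a statement about the Pegasus Mail format), and the one-file-per-message store uses made-up filenames.

```python
import os
import tempfile

SEP = b"\x1a"  # hypothetical message separator for this sketch

def delete_from_folder_file(path, index_to_delete):
    """Monolithic store: deleting one message means rewriting the whole file."""
    with open(path, "rb") as f:
        messages = f.read().split(SEP)
    del messages[index_to_delete]
    # The new copy is written beside the old one, so free space for a second copy is needed
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    with os.fdopen(fd, "wb") as out:
        out.write(SEP.join(messages))
    os.replace(tmp, path)  # swap the rewritten file in place of the old one

def delete_from_directory(folder_dir, filename):
    """One-file-per-message store: deletion is a single file removal."""
    os.remove(os.path.join(folder_dir, filename))

work = tempfile.mkdtemp()
folder_file = os.path.join(work, "folder.mbx")
with open(folder_file, "wb") as f:
    f.write(SEP.join([b"msg one", b"msg two", b"msg three"]))
delete_from_folder_file(folder_file, 1)

with open(os.path.join(work, "0001.msg"), "wb") as f:  # hypothetical per-message file
    f.write(b"msg one")
delete_from_directory(work, "0001.msg")
```

The rewrite step is also why a monolithic delete can fail when the disk is nearly full, while the per-file delete cannot.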

Cheers!

-- David --


Barius

posted Sep 11 '07 at 1:24 am

I have seen this same topic come up several times (I brought it up myself once). I'm not sure of Dave's background in SQL databases, but like RichardXeon I am an SQL programmer and I can assure you that they are stable and far faster than any filesystem based solution (even flat-file driven forms of SQL like SQLite can be orders of magnitude faster). They also present some very beneficial advancements like the built-in capability of handling multiple parallel requests safely.

Dave's points raise serious issues, though, so I would like to try to break them down a bit:

- Significant risk of losing all data upon corruption: Modern SQL dbs are highly stable, and even many of the lightweight engines (e.g. SQLite) have journaling built in (something Pegasus could really use, as I believe most issues people have are related to system instability). Moreover, creating a backup is typically supported internally by the database engine, and is thus trivially easy to implement (re: the 'dump' command). Also consider that backups can be compressed to make them more size-friendly. Most people I know keep backups of their email folders anyway, so space is rarely the primary consideration these days.

- Most databases are not well optimized for wide variations of data size: This mostly depends on your choice of storage method. However, engines like SQLite were specifically designed for 'general purpose' storage and are quite well optimized for varying data sizes. The fact that emails are generally 1-1.5KB is not really relevant once you have more than a few of them; the database engine takes care of packing them efficiently. It's also interesting to note that while Pegasus seems to have an issue with .pmm files larger than 2GB on Windows, SQLite is capable of storing many GB without a problem.

- Usage of blobs: I'm not sure which databases Dave is referring to. To my knowledge, in the world of SQL (Postgres, MySQL, MSSQL) this statement hasn't been true for quite some time. These days using blobs is a choice the programmer makes at table-creation time. If you want to put an arbitrary amount of text in and get text back out, that's not a problem. Further, depending on the choice of engine (some 'lite' engines will not support this), you can do regular-expression queries in the SQL itself, letting the engine optimize and process them internally. FWIW, SQLite can easily be extended to allow regexp queries, but it would require a custom function that would likely be platform dependent (one implementation for Windoze, another for *nixes).

- DB formats change: Relying on a database engine does mean that you are relying on someone else's product, which can be bad for backwards compatibility. I think Dave is right to think long and hard about this. However, I'm not of the opinion that this outweighs the potential gains. All engines I've ever used have provided a straightforward mechanism to update older formats.
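The journaling and 'dump' points can be illustrated with Python's `sqlite3` module. In this minimal sketch the connection's context manager commits a block that succeeds and rolls back one that raises, which is the all-or-nothing behaviour SQLite's journal also guarantees after a real crash; the failure here is only simulated, and the `messages` table is hypothetical. `Connection.iterdump()` produces a plain-text SQL dump similar to the sqlite3 shell's `.dump`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, subject TEXT)")

with conn:  # commits if the block succeeds
    conn.execute("INSERT INTO messages (subject) VALUES ('Kept')")

try:
    with conn:  # rolls back if the block raises
        conn.execute("INSERT INTO messages (subject) VALUES ('Lost')")
        raise RuntimeError("simulated failure half-way through an operation")
except RuntimeError:
    pass

# Plain-text SQL dump of the whole database, similar to the sqlite3 shell's .dump
dump = "\n".join(conn.iterdump())
print(dump)
```

The dump is ordinary text, so it compresses well and can be reloaded into a newer engine version, which bears on both the backup and the format-change concerns.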

In my opinion, the best reason not to consider a database is simply that it is not human readable (i.e. ASCII). Even so, many SQL engines are Free/free, making the barrier to access low.

* Here's one example of an extended SQLite engine with enhancement for REGEXP (Windoze only): http://sqlite.phxsoftware.com/forums/t/348.aspx
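For comparison, when SQLite is used through Python's `sqlite3` module, the custom function can be written once in Python rather than per platform. SQLite evaluates `X REGEXP Y` by calling a user function `regexp(Y, X)`, so registering one with `create_function` is enough. This is a minimal sketch; note that the matching runs in the Python callback row by row, so the engine cannot use an index for it. The table and sample subjects are hypothetical.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")

def regexp(pattern, value):
    # SQLite rewrites "X REGEXP Y" as regexp(Y, X): the pattern comes first
    return value is not None and re.search(pattern, value) is not None

conn.create_function("regexp", 2, regexp)

conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, subject TEXT)")
conn.executemany("INSERT INTO messages (subject) VALUES (?)",
                 [("Invoice #1042",), ("Re: invoice 77",), ("Lunch?",)])
rows = conn.execute(
    "SELECT id FROM messages WHERE subject REGEXP ? ORDER BY id",
    (r"(?i)invoice\s*#?\d+",)).fetchall()
print(rows)
```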


Nighthawk

posted Sep 11 '07 at 2:50 am

What I like about the current format is that when a single email comes in corrupt, or causes problems with PMail for whatever reason (though these cases are now few and far between), it is easy to delete just that one file in Windoze Explorer, the one that is crashing PMail and causing the problems.

But I will agree that it is slow to search. The way I speed things up is by sorting mail properly as it comes in and keeping to preset sorting criteria (everything in its place). Then, when searching, I limit the search to the correct folders, because I have a better idea where the email should be.

I am looking at my mail folder now: 9 GB (big because of artwork attachments) and 7 years' worth of emails, orders, quotes, etc. I would not want to lose all that if a single database file became corrupt. It's also easier for me to back this up across a couple of DVDs without splitting a single file and then hoping it gets put back together again correctly.

That said, having designed a few databases myself with FileMaker, I can see the argument for both sides of the issue.

NTxLS

posted Sep 13 '07 at 5:52 pm

I'm not involved in programming, but may I jump in here?

Do NOT call this an 'argument'; use the term "discussion". 'Argument' implies total disagreement, whereas 'discussion' means showing each other's points of view and why. In my view it is better to share than to attempt to 'hammer' at these points.

Those of you who favor SQL or any other approach: why not do a little experiment with your method of programming, keeping the files the way they are, and see if you can improve performance along with the safety of the data without changing any part of PMail? If you can, use that method for searching your info and share it with Mr. Harris as examples or demonstrations. You may find that cooperating with each other works much better than rehashing the same points. Plus, David would have more of his valuable time for his projects. Do NOT get me wrong, I'm not attempting to short-circuit this exchange, just making a suggestion.

Thank you for reading my ramblings,
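
The experiment suggested above, faster searching without touching PMail's own files, could start as an external, read-only index. This is a minimal sketch, not a tested tool. It assumes each message is a separate plain-text file (the `.cnm` default extension is an assumption to adjust for your own mail directory; folder files holding many messages would need a real parser) and that decoding as Latin-1 is acceptable. The names `build_index` and `find` are hypothetical. It walks the mail directory, copies the text into a separate SQLite database, re-reads only files whose modification time changed, and drops entries for files that were deleted.

```python
import os
import sqlite3


def build_index(mail_dir, db_path=":memory:", extensions=(".cnm",)):
    """Index message files under mail_dir into a separate SQLite database.

    The mail files are opened read-only; nothing in mail_dir is modified.
    `extensions` must be lower-case; the default is an assumption.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        " path TEXT PRIMARY KEY,"
        " mtime REAL NOT NULL,"
        " content TEXT NOT NULL)"
    )
    seen = set()
    for root, _dirs, names in os.walk(mail_dir):
        for name in names:
            if not name.lower().endswith(extensions):
                continue
            path = os.path.join(root, name)
            seen.add(path)
            mtime = os.path.getmtime(path)
            row = conn.execute(
                "SELECT mtime FROM files WHERE path = ?", (path,)
            ).fetchone()
            if row is not None and row[0] == mtime:
                continue  # unchanged since the last run, skip re-reading
            # Latin-1 decodes any byte sequence, so odd messages never abort the run.
            with open(path, "r", encoding="latin-1") as f:
                content = f.read()
            conn.execute(
                "INSERT OR REPLACE INTO files (path, mtime, content) VALUES (?, ?, ?)",
                (path, mtime, content),
            )
    # Drop index entries for files that no longer exist on disk.
    for (path,) in conn.execute("SELECT path FROM files").fetchall():
        if path not in seen:
            conn.execute("DELETE FROM files WHERE path = ?", (path,))
    conn.commit()
    return conn


def find(conn, text):
    """Return paths of indexed files containing `text` (ASCII case-insensitive)."""
    rows = conn.execute(
        "SELECT path FROM files"
        " WHERE instr(lower(content), lower(?)) > 0 ORDER BY path",
        (text,),
    ).fetchall()
    return [path for (path,) in rows]
```

Because the index lives in its own database file, losing or corrupting it costs nothing: rerunning `build_index` rebuilds it from the untouched mail files, so the one-file-per-message layout and its easy backups stay exactly as they are.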
