For the professional programmers out there ...
I started working on a user file for my telnet BBS. The file is organized as fixed length records, so random access to a specific record number is easy. I would like to be able to handle somewhere between 500 and 1000 records efficiently. (Assume a record is around 128 bytes.)
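With fixed-length records, getting to record N is just a seek to N times the record size. A minimal sketch, assuming the 128-byte record size above (the function names and file handling here are illustrative, not from any existing codebase):

```c
#include <stdio.h>

#define RECORD_SIZE 128L   /* assumed record size from the post */

/* Byte offset of record n in a fixed-length-record file. */
long record_offset(long n) {
    return n * RECORD_SIZE;
}

/* Read record n into buf; returns 0 on success, -1 on failure. */
int read_record(FILE *f, long n, char buf[RECORD_SIZE]) {
    if (fseek(f, record_offset(n), SEEK_SET) != 0) return -1;
    return fread(buf, 1, RECORD_SIZE, f) == RECORD_SIZE ? 0 : -1;
}

/* Write record n from buf; returns 0 on success, -1 on failure. */
int write_record(FILE *f, long n, const char buf[RECORD_SIZE]) {
    if (fseek(f, record_offset(n), SEEK_SET) != 0) return -1;
    return fwrite(buf, 1, RECORD_SIZE, f) == RECORD_SIZE ? 0 : -1;
}
```

One seek plus one read or write per record, regardless of how many records the file holds.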
I am worried about the time it takes to search for a record. For example, when adding a user one must ensure that the name is not already taken. The naive method is to scan the entire file. With a few entries this is not bad, but if the number of entries were in the hundreds the delay would be noticeable, especially on a floppy drive.
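For reference, the naive table scan looks like this. It assumes (hypothetically) that the username is stored NUL-terminated at the start of each 128-byte record; the cost is one read per record, every time:

```c
#include <stdio.h>
#include <string.h>

#define RECORD_SIZE 128   /* assumed record size from the post */

/* Naive duplicate check: read every record until the name matches or
   we hit EOF -- O(n) reads.  Assumes the username is stored
   NUL-terminated at the start of each record (hypothetical layout). */
long find_user(FILE *f, const char *name) {
    char rec[RECORD_SIZE];
    long n = 0;
    fseek(f, 0L, SEEK_SET);
    while (fread(rec, 1, RECORD_SIZE, f) == RECORD_SIZE) {
        if (strcmp(rec, name) == 0)
            return n;          /* found: return the record number */
        n++;
    }
    return -1;                 /* name not taken */
}
```

At 1000 users that is 128 KB read from floppy per lookup, which is exactly the problem.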
One solution is to implement some sort of index that keeps the records in alphabetical order. That would allow a binary search, except that binary-searching the records themselves could perform poorly because of all the seeking back and forth through the file. For a small number of records a table scan makes sense, but for a larger number of records the index starts to pay off. Keeping both the name and the record number in the index would cut down on disk seeks by letting you determine whether a name is in use just by reading the index. Unfortunately, holding that index in memory would take a lot of space if I really did get 500 users.
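A sorted on-disk index of small fixed-size entries can be binary-searched with fseek, so a lookup costs O(log n) seeks and reads instead of n. A sketch under assumed parameters (the 26-byte name limit and entry layout are hypothetical, chosen only to keep entries small):

```c
#include <stdio.h>
#include <string.h>

#define NAME_LEN 26          /* hypothetical max username length */

struct index_entry {         /* one fixed-size entry in the index file */
    char  name[NAME_LEN];    /* NUL-terminated username, sort key */
    short recno;             /* record number in the main user file */
};

/* Binary search a sorted on-disk index of `count` entries.
   Returns the record number, or -1 if the name is free.
   Cost: about log2(count) seeks+reads, ~10 for 1000 users. */
long index_lookup(FILE *idx, long count, const char *name) {
    struct index_entry e;
    long lo = 0, hi = count - 1;
    while (lo <= hi) {
        long mid = lo + (hi - lo) / 2;
        if (fseek(idx, mid * (long)sizeof e, SEEK_SET) != 0) return -1;
        if (fread(&e, sizeof e, 1, idx) != 1) return -1;
        int c = strcmp(name, e.name);
        if (c == 0) return e.recno;
        if (c < 0) hi = mid - 1;
        else       lo = mid + 1;
    }
    return -1;               /* name not in use */
}
```

For 1000 users that is roughly 10 small reads per lookup, and the index file itself is only around 28 KB, so it never has to live in memory.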
I'd really like any solution to not store data in memory, so the index in memory is probably not going to work. Memory is great, but it's not scalable. Even just storing names and record numbers in memory can consume more memory than I want to devote to this function. Having an index purely on disk means 2x the number of seeks and reads, and adding new entries requires rewriting the index file. That is better than rewriting the whole user file, but still expensive when TCP/IP packets are flying in and out on multiple connections.
I've been thinking about hybrid techniques too, such as reorganizing the file into alphabetical order nightly. That would allow a binary search over the bulk of the file, with a separate table scan covering only the users added since the last reorganization.
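The hybrid lookup above can be sketched as one function: binary-search the region that was sorted by the nightly pass, then table-scan just the short unsorted tail of users added since. Same hypothetical record layout as before (NUL-terminated username at the start of each 128-byte record):

```c
#include <stdio.h>
#include <string.h>

#define RECORD_SIZE 128   /* assumed record size from the post */

/* Hybrid lookup: records [0, sorted) were alphabetized by the nightly
   reorganization, so binary-search them; records [sorted, total) were
   appended since and are unsorted, so scan only that short tail.
   Returns the record number, or -1 if the name is free. */
long hybrid_lookup(FILE *f, long sorted, long total, const char *name) {
    char rec[RECORD_SIZE];
    long lo = 0, hi = sorted - 1, n;
    while (lo <= hi) {                      /* sorted region */
        long mid = lo + (hi - lo) / 2;
        if (fseek(f, mid * (long)RECORD_SIZE, SEEK_SET) != 0) return -1;
        if (fread(rec, 1, RECORD_SIZE, f) != RECORD_SIZE) return -1;
        int c = strcmp(name, rec);
        if (c == 0) return mid;
        if (c < 0) hi = mid - 1;
        else       lo = mid + 1;
    }
    for (n = sorted; n < total; n++) {      /* unsorted tail */
        if (fseek(f, n * (long)RECORD_SIZE, SEEK_SET) != 0) return -1;
        if (fread(rec, 1, RECORD_SIZE, f) != RECORD_SIZE) return -1;
        if (strcmp(rec, name) == 0) return n;
    }
    return -1;
}
```

Since only a day's worth of signups sits in the tail, the scan stays short, and no index file or in-memory table is needed at all.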
Thoughts?