Difference between revisions of "MOSS/Project Ideas"

Revision as of 01:14, 5 April 2011

The initial list was generated over the lengthy period of MOSS's initial development. These project ideas mostly cover missing functionality. None of the missing functionality is known to cause gameplay issues, so these did not really seem like bugs. The ideas are grouped by the general type of change required, and ordered roughly by increasing difficulty and roughly by decreasing value.

Bigger ideas, such as entirely new functionality now made possible by the release of the client, could go here too. Be realistic, though. Remember that protocol changes are not to be made lightly, because they cause incompatibility problems. It is probably better to propose large changes in discussion format on the forums. Do not expect someone else to take up and implement your proposal.

The whole idea of this list is to provide some ideas for anyone wishing to dive into MOSS. Sometimes a little direction helps.

Terminology

The terms "frontend" and "backend" are used throughout this document. If the meaning is not clear from context, the terminology is explained in the Developer Notes.

The "vault" is basically the database, though it could be abstractly thought of as a collection of trees of vault nodes. Generally when referring to it as a set of node trees, "vault" is written and when referring to it as something with tables and rows, "DB" is written. The "vault server" is that portion of the backend concerned with handling the Vault* messages and working with the database.

Fix bugs

The bug tracker is here: https://foundry.openuru.org/jira/browse/MOSS

Detail work (little Uru-specific knowledge required)

Try disabling the Nagle algorithm on sockets. This should result in less latency to the client when small messages are involved. Make sure to test the whole game thoroughly.
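A minimal sketch of what this looks like on a POSIX system, using the standard TCP_NODELAY socket option. The helper names disable_nagle() and nagle_disabled() are made up for illustration and are not existing MOSS functions; where MOSS would call them (presumably right after accept()) is an assumption.

```cpp
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

// Disable the Nagle algorithm on a TCP socket. Returns true on success.
// (Hypothetical helper name, not part of MOSS.)
bool disable_nagle(int fd) {
  int on = 1;
  return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on)) == 0;
}

// Read the option back, which is handy when testing the change.
bool nagle_disabled(int fd) {
  int value = 0;
  socklen_t len = sizeof(value);
  if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &value, &len) != 0) {
    return false;
  }
  return value != 0;
}
```

The option only changes when small writes are sent, not what is sent, so any regressions are likely to show up as timing or ordering problems. Hence the advice to test the whole game.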

Figure out and implement a good way to use autoconf to compile in default pathnames. This would apply to the default config file, key file, moss_serv binary, and log/auth/game/file directory locations. The idea is if I say --prefix=/foo the defaults become /foo/etc/moss.cfg and so forth instead of ./etc/moss.cfg. Don't forget autoconf lets you override the etcdir, bindir, etc. separately.

Treat SDL filenames case-insensitively. This is very annoying in Unix, since it requires using readdir() and strcasecmp() instead of just open().

Make it so the game/state directory location can be separately configured. This is so the whole MOSS install can be read-only and just the log directory and game/state moved somewhere read-write. (The easy workaround is to symlink a writable directory to game/state.)

Log the PID of all forked servers in moss.log.

See if we can switch to libevent for inter-thread communication instead of raw signals. This should be fairly portable and take advantage of the different per-OS lightweight signalling mechanisms.

Allow skipping overlong messages (messages MOSS thinks are too big). This requires, upon catching the exception, tossing out the message, and if it has not all been read in yet, keeping track of how much more to toss in the future. The RC4 state must be updated too for the tossed message length.
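To illustrate the bookkeeping, here is a self-contained sketch. The Rc4 struct is textbook RC4, included only so the example compiles on its own; MOSS has its own cipher code. OverlongSkipper and its members are hypothetical names. The sketch assumes the length passed to start() counts only the bytes of the message that have not yet been run through the cipher.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>

// Textbook RC4 (keylen must be non-zero).
struct Rc4 {
  uint8_t s[256];
  uint8_t i = 0, j = 0;
  Rc4(const uint8_t *key, size_t keylen) {
    for (int k = 0; k < 256; k++) s[k] = static_cast<uint8_t>(k);
    uint8_t jj = 0;
    for (int k = 0; k < 256; k++) {
      jj = static_cast<uint8_t>(jj + s[k] + key[k % keylen]);
      std::swap(s[k], s[jj]);
    }
  }
  uint8_t next() {
    i = static_cast<uint8_t>(i + 1);
    j = static_cast<uint8_t>(j + s[i]);
    std::swap(s[i], s[j]);
    return s[static_cast<uint8_t>(s[i] + s[j])];
  }
  void apply(uint8_t *buf, size_t n) {
    for (size_t k = 0; k < n; k++) buf[k] ^= next();
  }
  // Advance the keystream without producing output.
  void skip(size_t n) {
    while (n--) next();
  }
};

// Tracks how much of a rejected message still has to be thrown away.
struct OverlongSkipper {
  size_t remaining = 0;

  // Call when a message with `total` not-yet-decrypted bytes is rejected.
  void start(size_t total) { remaining = total; }

  // Discard up to `avail` buffered bytes, keeping the cipher in step.
  // Returns how many bytes the caller should drop from its buffer.
  size_t consume(Rc4 &cipher, size_t avail) {
    size_t n = std::min(avail, remaining);
    cipher.skip(n);
    remaining -= n;
    return n;
  }
};
```

Each time more data arrives, the read loop would call consume() first and only resume normal message parsing once remaining reaches zero.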

It would be nice to use atomic operations for refcounts instead of having to use (bigger, slower) mutexes for them. This does make MOSS not architecture-independent, which is part of why I have been resisting it. (See also signal handling next.)
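One way to get atomic refcounts without writing per-architecture code is std::atomic, assuming a C++11 compiler is acceptable for MOSS (that is an assumption, not a statement about the current build requirements). The class and method names below are illustrative and do not match MOSS's actual refcounted classes.

```cpp
#include <atomic>

// Hypothetical refcounted base class using an atomic counter instead of a
// mutex-protected one. A new object starts with one reference.
class RefCounted {
 public:
  RefCounted() : refs_(1) {}
  virtual ~RefCounted() {}

  void add_ref() { refs_.fetch_add(1, std::memory_order_relaxed); }

  // Drops one reference. Returns true if this call released the last
  // reference and the object has been deleted.
  bool del_ref() {
    if (refs_.fetch_sub(1, std::memory_order_acq_rel) == 1) {
      delete this;
      return true;
    }
    return false;
  }

 private:
  std::atomic<int> refs_;
};
```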

Signal handlers and the operations to read and reset the values in the main loops should be using atomic operations.
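A small sketch of the read-and-reset pattern, again assuming C++11 std::atomic is available. The names g_pending_signals, on_signal() and take_pending_signals() are made up for this example. The key point is that the main loop uses a single exchange, so a signal arriving between the read and the reset cannot be lost.

```cpp
#include <atomic>
#include <csignal>

// Number of signals seen but not yet handled by the main loop.
// (Hypothetical variable; lock-free atomics may be used in signal handlers.)
static std::atomic<int> g_pending_signals(0);

extern "C" void on_signal(int) {
  g_pending_signals.fetch_add(1, std::memory_order_relaxed);
}

// Called from the main loop: read and reset in one atomic step.
int take_pending_signals() {
  return g_pending_signals.exchange(0, std::memory_order_acq_rel);
}
```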

Catch and handle std::bad_alloc whenever possible.
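As an example of the kind of handling meant here, the sketch below turns an allocation failure into a return value, so the caller can drop one message or connection instead of the whole server going down. grow_buffer() is a hypothetical helper, and where such checks belong in MOSS is left open.

```cpp
#include <cstddef>
#include <new>
#include <vector>

// Try to grow a buffer. On allocation failure the buffer is left as it was
// and false is returned. (Hypothetical helper name, not part of MOSS.)
bool grow_buffer(std::vector<unsigned char> &buf, size_t new_size) {
  try {
    buf.resize(new_size);
    return true;
  } catch (const std::bad_alloc &) {
    return false;  // caller decides what to give up on
  }
}
```

Note that a size larger than the vector's max_size() raises std::length_error rather than std::bad_alloc, so sizes taken from the network should still be range-checked first.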

Fairly contained changes

Make it so game servers can be forked as separate processes. This helps a lot for debugging and also would help for age development, making it easier to change SDL and retest. Otherwise you have to shut down the whole frontend server or wait out the age's linger time. This work was started but not completed, and that code can be found inside FORK_GAME_TOO #ifdefs. If you do this, I suggest you consider separating FORK_GAME_TOO from FORK_ENABLE so that one could have forked game servers and *not* forked auth/file servers.

In the getpublicagelist() SQL function, there is a comment about the 50-hood limit. The current code returns the first 50 hoods. We should return the most recently used 50 hoods. The hard part is knowing which those are. This is mostly if not wholly DB/SQL work.

Age population limits are unimplemented. The value is in the vault, but the server does not look at it.

Figure out how to throw out clones and SDL when a player leaves the age. Please see the "clean up clone, SDL" section in GameServer::conn_shutdown(). Right now this is full of special cases: it handles the Kemo bugs and Quabs specially (a different thing for each). It may be that this is required for best behavior, but I still feel like I've missed something despite heavy scrutiny of the data. This is more of a research project and not much of a code project.

MOUL parsed SDL nodes on the way into the vault. It probably only updated the node with fields having the Dirty bit set. In addition it added timestamps to the fields because the client does not put in the timestamps (reasonably so, as its time may be wrong). MOSS currently saves the SDL wholesale. This basically works, but with the timestamps properly added, the game server can merge the vault, age, and global SDL in any order and without strange rules. Note: if you do this, you also need to write a script or some other way to update all the SDL nodes in existing MOSS DBs.

The kickable SDL filtering algorithm is the same one I put in Alcugs ages ago. It is better than nothing but not as good as Plasma's. You will see more warping of kickables than with Cyan's server. This project is to come up with a better way of deciding what physical SDL to send to all the clients, based on those messages all the clients are sending to the server.

The server is probably supposed to use RelevanceRegion information to avoid sending SDL updates to clients who don't share a relevance region with the originating client. (The idea being to reduce network traffic, since hopefully the client already avoids rendering/physics on things in irrelevant regions.) Things work if the server doesn't filter, so MOSS doesn't. The other reason MOSS doesn't do it is that I do not have strong evidence this supposition is actually correct. Finding out, and implementing it if so, would be a nice project.

Figure out what the server is supposed to do with plSetNetGroupIDMsg and do it. Right now MOSS just drops it to no apparent ill effect. I am deeply suspicious of this message due to seeing it at aurally unpleasant times in UU. (Old, old notes: "may have to do with infinitely looping scraping noises".)

kCli2Auth_SendFriendInviteRequest is not fully supported (it is dropped on the floor).

Bigger changes

The following messages are unimplemented.
- kCli2Auth_VaultSetSeen -- unimplemented in client
- kCli2Auth_ScoreDelete -- never saw one, so couldn't implement it
- kCli2Auth_ScoreSetPoints -- never saw one, so couldn't implement it
- kCli2Auth_ScoreGetRanks -- never saw one, so couldn't implement it
- kCli2Csr_PingRequest -- no recorded Csr messages
- kCli2Csr_RegisterRequest
- kCli2Csr_LoginRequest
- kCsr2Cli_PingReply
- kCsr2Cli_RegisterReply
- kCsr2Cli_LoginReply

Add hood randomization to DRC hoods. Player-created hoods are randomized by the Python in the Nexus when the hood is created. DRC hoods, however, are created by the server. In MOSS's case, they are created entirely by SQL code. So adding DRC hood randomization means noticing when the DB has created a new DRC hood, creating and modifying SDL in the backend, and then finding and modifying the age SDL node for that new hood. All before responding to the client creating the new avatar and thus the new DRC hood.

The MessageQueue class has a concept of priorities. The idea was to enable dropping of lower-priority messages if a queue to a given client starts to back up. (The lowest priority being voice chat, then avatar movements, with world state changes being highest.) The measurement of queue length or throughput necessary for that is not implemented. So, neither is the dropping. (There are the very beginnings of some code for it if DO_PRIORITIES is #defined.) As such, queue users don't do a good job of priority-setting, although VOICE and FRONT are properly used. The idea wasn't to order the queue (except for FRONT), but to skip sending, say, a VOICE message when we encounter one, if the queue congestion measure says to.
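The skip-on-congestion idea can be sketched as follows. Everything here is hypothetical: DroppingQueue is not the real MessageQueue, the priority enum only mirrors the ordering described above, and plain queue length stands in for the congestion measure that still has to be designed.

```cpp
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical priority levels, lowest first.
enum Priority { VOICE = 0, MOVEMENT = 1, STATE = 2 };

struct QueuedMsg {
  Priority prio;
  std::string payload;
};

class DroppingQueue {
 public:
  // The queue counts as congested when it holds more than `congested_at`
  // messages. A real measure might use throughput instead.
  explicit DroppingQueue(size_t congested_at) : congested_at_(congested_at) {}

  void push(Priority p, const std::string &payload) {
    q_.push_back(QueuedMsg{p, payload});
  }

  // Fetch the next message to send. Queue order is preserved; VOICE
  // messages encountered while congested are discarded instead of sent.
  // Returns false when the queue is empty.
  bool next_to_send(QueuedMsg &out) {
    while (!q_.empty()) {
      bool congested = q_.size() > congested_at_;
      QueuedMsg m = q_.front();
      q_.pop_front();
      if (congested && m.prio == VOICE) {
        dropped_++;
        continue;
      }
      out = m;
      return true;
    }
    return false;
  }

  size_t dropped() const { return dropped_; }

 private:
  std::deque<QueuedMsg> q_;
  size_t congested_at_;
  size_t dropped_ = 0;
};
```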

Vault security

The MOSS vault is as insecure as the MOUL vault. Need I say more? Here are some starter thoughts.

- MOSS should parse incoming SDL messages and sanity-check them. See also adding timestamps.
- MOSS could probably verify other blob contents too.
- Anyone can save anything to any node, and can fetch any node if they know its type and ID. We can deny these requests if the user shouldn't have access. Possibly the notifier field can help.
- Much more, I'm sure. Better vault security is possible.

Eliminate excessive byteswapping of opaque server IDs in BackendMessages. This is to say, in the backend server, when reading in the messages, it byte-swaps the IDs to store them, only ever uses them in equality tests with other IDs, and then byte-swaps them back when sending new messages. (On a little-endian box there's no swapping so it does not cost much, but it's the principle of the matter.) Unfortunately this really needs to be tested on a big-endian machine.
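One possible shape for this is to keep the ID as raw bytes and never convert it to an integer at all. The sketch below assumes a 4-byte ID purely for illustration; OpaqueId, read_id() and write_id() are hypothetical names, and the real field sizes are whatever BackendMessages defines.

```cpp
#include <cstring>

// An ID stored exactly as it arrived on the wire. It is only ever copied
// and compared, so it behaves the same on any endianness.
struct OpaqueId {
  unsigned char bytes[4];  // size assumed for this example
};

inline OpaqueId read_id(const unsigned char *wire) {
  OpaqueId id;
  std::memcpy(id.bytes, wire, sizeof(id.bytes));
  return id;
}

inline void write_id(const OpaqueId &id, unsigned char *wire) {
  std::memcpy(wire, id.bytes, sizeof(id.bytes));
}

inline bool operator==(const OpaqueId &a, const OpaqueId &b) {
  return std::memcmp(a.bytes, b.bytes, sizeof(a.bytes)) == 0;
}
```

Because no code path depends on byte order here, the big-endian testing burden should shrink to confirming that nothing else still treats the ID as a number.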

Try changing the File server to do ackless downloads, too. Maybe only if there is only one download at a time, though.

Scalability issues

Performance and scalability are always interesting questions, and frequently ignored ones, because they only matter for larger installations.

These are ideas unimplemented because they only really matter for large shards. Large as in having, let's say, 50+ concurrent busy users. (50 people standing in the city chatting doesn't actually do much.) Maybe MOSS is okay with that number now, in which case "large" is more. It is entirely unknown.

Performance testing. Instrumentation. There should be a load generator to do this. This was not high on the list during development, with a user base of 3.

See the last paragraph in the comments at the top of FileTransaction.h about how multiple file transactions of the same file should share a single opened file.

Allow speculative pre-caching of AgeInfoNodes and PlayerInfoNodes when node changed notifications come from the vault server and are forwarded to any connected client. Cache in the auth server, vault server, or both. The goal is to reduce latency on those requests the client freezes for.

The backend protocol supports game servers requesting batches of GameMgr IDs. However, there is no code on either side to actually do so. The idea for this was that more populated ages, like the city, would benefit from doing batch requests. Every person who links in with a marker game needs a new global game ID, and it would be nice to cut out any extra backend round trips for links, even though this particular round trip does not block the link process (aside from being something the backend is doing when it could be doing something else).
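On the game server side, batching could amount to a small local pool of IDs. The sketch below is only that. GameIdPool is a hypothetical class, the fetch_batch callback stands in for the backend round trip (which in MOSS would be asynchronous), and it assumes a batch is a contiguous range identified by its first ID, which may not match the actual protocol.

```cpp
#include <cstdint>
#include <functional>
#include <utility>

// Hands out IDs from a locally held batch and goes back to the backend
// only when the batch is used up.
class GameIdPool {
 public:
  // fetch_batch(count) is assumed to return the first ID of a fresh,
  // contiguous range of `count` IDs.
  GameIdPool(uint32_t batch_size, std::function<uint32_t(uint32_t)> fetch_batch)
      : batch_size_(batch_size), fetch_(std::move(fetch_batch)) {}

  uint32_t next_id() {
    if (next_ == end_) {  // pool empty: one round trip refills it
      next_ = fetch_(batch_size_);
      end_ = next_ + batch_size_;
    }
    return next_++;
  }

 private:
  uint32_t batch_size_;
  std::function<uint32_t(uint32_t)> fetch_;
  uint32_t next_ = 0;
  uint32_t end_ = 0;
};
```

With a batch size of N, a busy age makes one backend request per N marker-game players instead of one per player.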

Database performance could be measured and the SQL or even tables optimized.

MOSS currently uses a single blocking connection to the DB. Implement and investigate performance of multiple DB connections, or asynchronous ones. Maybe the backend server has to become threaded to gain the best advantage.

In a big shard, we may need to split the backend server work up. The backend protocol is designed to support separating backend work between threads. The "classes" of backend messages in a bitfield are meant as a way to quickly route a backend message to an appropriate backend thread or process.

Frontend/backend interactions

Add encryption to the backend protocol. This itself is not a big deal (though you have to pick a good key negotiation, and do not forget to consider authentication mechanisms), but it does mean all the BackendMessage fill_buffer() code paths would be newly exercised, so a lot of regression testing would be in order.

It would be much easier on the DB if updates to markers are batched by the game server. This especially applies to capturing markers. See the comments in the setmarkerto() and startmarkergame() SQL functions.

The backend protocol allows for frontends hosting ages to specify only a subset of ages that can be hosted there. This is unused right now; the message has a field saying, "this is the length of the following age restriction list" and it is always zero. When there are no restrictions the backend can request any frontend to host any age.
- The need for splitting hostable ages is unclear, but in a very large shard you might, for example, put all city instances, or even only the public one, on a separate server. It would also allow someone to join a shard and host just their own age, or whatever.
- Since the frontend never sends restrictions, the backend currently has no way to store and use them. So there is work on both sides.
- Presumably which ages a frontend server can host would be a configuration option but it could be done instead by restricting the contents of the game/age directory. That would not allow even finer-grained restrictions, but this whole project kind of needs a real use case to flesh out what is needed.

Having one backend TCP connection per frontend server is probably not a good thing, in terms of efficiency, or of scalability with a cluster of machines. We could change it to have one backend connection per frontend process and funnel everything through it. The mostly-implemented but unused and untested MultiWriterMessageQueue was meant, in part, to support this (there are multiple threads writing, but only one thread reading, which writes to the socket).

Then when messages are received in the frontend there needs to be some way of passing each message to the right thread and signaling that thread that something is present to be handled. This is all unimplemented but the format of the backend messages is meant to take care of this case (in particular, the two "server IDs" in the header are for dispatching messages to the correct thread).

However, essentially the same changes would be necessary if the backend server is folded into the main server so there is only one process; then "sending" messages to the backend server is just putting them on the (shared) queue that would have been used to funnel everything through one connection, and "receiving" them is just taking them off that queue since there's no TCP connection to deal with.

New functionality

I am guessing Cyan's server handles the GameMgr stuff with server-side scripting. At the time MOSS was implemented there was no expectation that new game types might be added, so adding in a scripting language was not worth the problems that come with doing so. Also the overall value of scripting is not clear (especially for the cost in server complication). However, it could be done. The project would have to include embedding the scripting and then rewriting stuff using scripts.

Please note, a lot of effort went into getting the Heek right, handling ties, allowing people to accidentally stand up and sit back down in the game, and probably other details I forgot. If you want to script Heek, it really ought to still work as well.

Really adventurous hardcore project: destroy KI hacks. The server would have to be able to analyze all game traffic to decide whether it looks okay. To do it right the server would have to have and understand PRP files and do physics. Also "console" commands have to be wisely filtered. If you're daunted, join the club.

Redesigns

http://assets.openuru.org/wiki/icons/wip_30x30.PNG

Content to be advised.

