Toldain Talks

Because reading me sure beats working!


Toldain started as an Everquest character. I've played him in EQ2, WoW, Vanguard, LOTRO, and Zork Online. And then EVE Online, where I'm 3 million years old, rather than my usual 3000. Currently I'm mostly playing DDO. But I still have fabulous red hair. In RL, I am a software developer who has worked on networked games, but not MMORPGs.

Monday, July 17, 2006

The Selling Station Saga

Recently, SOE has introduced an entirely new mechanism for selling. I was glad that they split the house vault and the selling vault, since I have been keeping tradeskill raw materials and fuel in my house vault, so that I could tradeskill in the facility that my guild has set up, with one of each tradeskill station in a house in South Qeynos.

The split meant that I wouldn't have to trip over one kind of good while dealing with the other, and it increased my storage by a fair margin. But, I pondered, why was this change made? It couldn't have been just to give us more storage.

There were other interesting details, too. If you dragged items into the selling interface too quickly, you got a message saying that a transaction was underway, and the item move did not take effect. You also have to go into the interface to collect your money.

And furthermore, it was now impossible to sell from inventory. I was sad to see that go, since it gave one an opportunity to bypass the broker fee and split the savings between buyer and seller. Why did they do away with that?

My best guess is that the new selling mechanism was intended to address a certain kind of abusive play, which is known as duping. Selling an item involves multiple database transactions. There needs to be one to delete the item from the seller's inventory, and another to add the item to the buyer's inventory. There needs to be a database transaction to delete the gold from the buyer's reserves, and one to add it to the seller's.

Any of these transactions can fail to happen for benign reasons. A lagspike could happen at just the wrong time, or a server or a client could hang or crash. One would really like to not have customers' items or money disappear because of a lagspike, so the order of transactions is probably something like this: add item to buyer's inventory, add money to seller's reserves, subtract item from seller's inventory, subtract money from buyer's reserves. Something like that.
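Here is a minimal Python sketch of that guessed ordering, assuming the four steps run one after another with no rollback. The `Player` class, the `sell` function, and the `fail_at` parameter (the index of the first step that never runs, standing in for a lagspike or crash) are all hypothetical. It shows why crediting before debiting is the safe direction: a failure at any point can leave something extra in the world, but never loses a customer's item or gold.

```python
from dataclasses import dataclass, field

@dataclass
class Player:
    gold: int
    items: list = field(default_factory=list)

def sell(seller, buyer, item, price, fail_at=None):
    """Run the four steps in the guessed order; stop early at step `fail_at`."""
    steps = [
        lambda: buyer.items.append(item),                      # 1. add item to buyer
        lambda: setattr(seller, "gold", seller.gold + price),  # 2. add gold to seller
        lambda: seller.items.remove(item),                     # 3. remove item from seller
        lambda: setattr(buyer, "gold", buyer.gold - price),    # 4. remove gold from buyer
    ]
    for index, step in enumerate(steps):
        if index == fail_at:
            return False  # simulated lagspike or crash: this and later steps never run
        step()
    return True

# Try a failure before each step, plus one clean run (fail_at=None).
for fail_at in [0, 1, 2, 3, None]:
    seller, buyer = Player(gold=0, items=["sword"]), Player(gold=100)
    sell(seller, buyer, "sword", 10, fail_at)
    swords = seller.items.count("sword") + buyer.items.count("sword")
    print(fail_at, "swords:", swords, "total gold:", seller.gold + buyer.gold)
```

Stopping before step 3 (`fail_at=2`) leaves both players holding the sword and 10 extra gold in the world. Nobody loses anything, but that particular partial outcome is exactly what a duper is hoping for.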

Well, transactions can be made to fail for reasons other than benign ones. The bad guys, the dupers, will attempt to split this set of transactions, allowing the first two to happen but stopping the last two. I'm not certain how they might do it, but here's one thought: put an item up for sale in your vault, and get on Ventrilo (or Teamspeak) with your buddy. He gets set to hit the buy button on your item, while you get ready to click the button removing that item from the market. You coordinate both button pushes, hoping to split the transactions and dupe the money, the item, or both.
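That scenario is a classic check-then-act race. Here is a hypothetical Python sketch, assuming an unguarded purchase that checks the listing once and then does its steps with gaps in between (each `yield` marks a gap, such as a network round trip). The names `buy_steps`, `unlist`, and `coordinated_race` are made up for illustration, and the interleaving is scripted rather than timed by hand over Ventrilo.

```python
def buy_steps(listing, buyer, seller, item):
    """Unguarded purchase; each `yield` is a point where another request can run."""
    if item not in listing:      # availability is checked only once, up front
        return
    price = listing[item]
    yield
    buyer["items"].append(item)  # 1. add item to buyer
    yield
    seller["gold"] += price      # 2. add gold to seller
    yield
    listing.pop(item, None)      # 3. remove item from the selling vault
    buyer["gold"] -= price       # 4. remove gold from buyer

def unlist(listing, seller, item):
    """Seller pulls the item off the market and back into their inventory."""
    if item in listing:
        listing.pop(item)
        seller["items"].append(item)

def coordinated_race():
    listing = {"sword": 10}
    seller = {"gold": 0, "items": []}
    buyer = {"gold": 100, "items": []}
    purchase = buy_steps(listing, buyer, seller, "sword")
    next(purchase)                    # buyer clicks: the availability check passes
    unlist(listing, seller, "sword")  # seller's click lands in the gap
    for _ in purchase:                # the rest of the purchase runs to completion
        pass
    return seller, buyer

seller, buyer = coordinated_race()
print(seller)
print(buyer)
```

Both players end up holding the sword, and the seller keeps the gold as well. With real human timing the gap would only be hit some of the time, which, as noted next, is good enough for a duper.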

It's not guaranteed to work every time, perhaps, but it will likely work some of the time, and that's good enough. By the way, I think this is why you couldn't remove an item from your vault while it was for sale. This would have made duping much easier.

If programmers get involved, people who can read the messages (known as packets) sent to and from clients and figure out what they mean, even more exploits may be possible. Potentially, one could even write a "fake" EQ2 client that maliciously mishandled transactions. It might also be possible to take items or gold from another player without compensation. It all depends on the transaction order.

I recall that the introduction of vault selling was accompanied by many issues. There was unplanned downtime, and vault selling got turned off periodically. It seems likely that these problems were related to duping, and that the change to giving you your gold directly, rather than when you entered your house, was also an anti-duping measure.

The new selling interface precludes such duping exploits by bundling all the transactions together and "serializing" all transactions to the selling inventory. What this means is that adding things to or removing things from the selling vault goes through the server, and before such a transaction can take place, the server checks whether any other relevant transactions are currently running. If they are, the transaction is cancelled with no effect, and the player must try again.
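A minimal sketch of that policy in Python, assuming one lock per selling vault and a non-blocking `acquire`, so an overlapping request is cancelled instead of waiting. `Character`, `SellingVault`, and the `"busy"`/`"failed"`/`"ok"` results are hypothetical; the `"busy"` result plays the role of the "transaction underway" message. This only models the serialization. A real server would presumably also wrap the four steps in a single database transaction so that a crash rolls all of them back.

```python
import threading

class Character:
    def __init__(self, gold=0):
        self.gold = gold
        self.inventory = []

class SellingVault:
    """Hypothetical model: every change to a vault goes through one lock."""

    def __init__(self, owner):
        self.owner = owner
        self.listed = {}               # item name -> price
        self._lock = threading.Lock()

    def _run(self, action):
        # Try to take the lock without waiting. If another transaction on
        # this vault is running, cancel this one with no effect.
        if not self._lock.acquire(blocking=False):
            return "busy"
        try:
            return action()
        finally:
            self._lock.release()

    def buy(self, buyer, item):
        def action():
            price = self.listed.get(item)
            if price is None or buyer.gold < price:
                return "failed"
            # All four steps happen under the same lock, as one unit.
            del self.listed[item]
            buyer.inventory.append(item)
            buyer.gold -= price
            self.owner.gold += price
            return "ok"
        return self._run(action)

    def unlist(self, item):
        def action():
            if item not in self.listed:
                return "failed"
            del self.listed[item]
            self.owner.inventory.append(item)
            return "ok"
        return self._run(action)

seller, buyer = Character(), Character(gold=100)
vault = SellingVault(seller)
vault.listed["sword"] = 10
print(vault.buy(buyer, "sword"))  # ok
print(vault.unlist("sword"))      # failed: the sword is already sold
```

Replaying the Ventrilo trick against this model, whichever click grabs the lock first wins and the other either sees `"busy"` or finds the sword already gone, so exactly one sword exists afterward. Cancelling rather than queuing keeps the server from ever waiting on a lock.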

This is the standard method for preserving the integrity of database transactions. Of course, the SOE developers are game developers, not database developers, so they got lessons from the school of hard knocks.

Well, then late last week came the new special selling stations. These new containers can be placed only in the selling vault, and have a physical manifestation in your house, which, once again, allows one to bypass the broker fee. Great!

But in the days afterward, we noticed lots of lagspikes and pauses in the action on our server. At first we assumed this was a problem with our ISP, as it usually is. But then we noticed that other players, from other parts of the country with different ISPs, had the same lagspike at the same time, while Ventrilo wasn't affected. Not a local problem, so probably a server problem.

Then just a couple days ago, one of the daily updates contained this message: "A speculative fix has been made that may address the server lagspikes that have been ocurring recently".

Allow me to translate this from programmer-speak. Generally, this kind of message means, "We've been tearing our hair out for the last 36 hours (or more) trying to figure out what's causing these server pauses, and this is the best we could come up with. We're still not sure that it's really the problem, but we think that at least it won't make things worse. We hope."

Whenever something breaks, any good programmer asks "what just got changed?" In this case, it was the home-selling containers. How could this cause lagspikes on the server? Well, buying stuff from a home-selling container involves more database transactions. Transactions which all must go through the server, and which must be prevented from interfering with one another, via some kind of lockout mechanism.

Now it is only necessary to prevent transactions that involve the same character from interfering with one another, but one could imagine a programmer being proactive and serializing all transactions, regardless of the characters involved. But then, feeling generous, he decides to simply queue up transactions rather than cancel them when they overlap. This seemingly innocent policy decision would create an enormous bottleneck in the server, leading to pauses and poor performance.
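A back-of-the-envelope Python model of that bottleneck, assuming every request arrives at once, each takes a fixed 50 ms (a made-up figure), and requests are handled in arrival order, each starting as soon as every lock it needs is free. The `finish_time_ms` function and the lock-key choices are hypothetical. One global lock with queuing forces every purchase on the server into a single line; per-character locks only make purchases wait when they share a buyer or seller.

```python
from collections import defaultdict

def finish_time_ms(transactions, lock_keys):
    """Time the last transaction finishes, if each one waits for all of its locks."""
    busy_until = defaultdict(int)  # lock key -> time (ms) that lock becomes free
    last_finish = 0
    for tx in transactions:        # handled in arrival order
        keys = lock_keys(tx)
        start = max(busy_until[k] for k in keys)
        finish = start + tx["duration_ms"]
        for k in keys:
            busy_until[k] = finish
        last_finish = max(last_finish, finish)
    return last_finish

def global_lock(tx):
    return ["whole server"]             # every transaction queues on the same lock

def per_character_locks(tx):
    return [tx["buyer"], tx["seller"]]  # only the characters involved

# 100 purchases among 50 unrelated buyer/seller pairs, 50 ms each.
purchases = [
    {"buyer": f"buyer{i % 50}", "seller": f"seller{i % 50}", "duration_ms": 50}
    for i in range(100)
]
print(finish_time_ms(purchases, global_lock))          # 5000
print(finish_time_ms(purchases, per_character_locks))  # 100
```

In the global-lock model the last purchase waits five seconds even though no two pairs of players ever touch the same items, which is the kind of server-wide pause we were seeing.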

Which is my guess for what happened, and a great example of why programming always ends up being much harder than it seems like from the outside.

In any case, I'm very glad to see the new selling stations. It's another item for Carpenters to make, and gives people a reason to have houses and to visit them.