TSWoW, Epoch and Performance: Part 1 - Stability
tldr: Project Epoch’s instability and extreme latency issues were not caused by tswow, TrinityCore or Windows.
Given the recent issues surrounding Project Epoch and the understandable concerns around tswow’s and “stock” TrinityCore’s stability and performance, I’ve decided to start writing a series of blog posts discussing both what I know about their performance characteristics in general, and what I learnt while working with Project Epoch for the past two weeks.
This is mostly intended for other developers familiar with tswow or TrinityCore. Those looking for drama on the internal procedures or how the Ascension move happened should look elsewhere. I’m not part of Project Epoch’s development team and was not involved in the move to Ascension.
Because I want to allow Project Epoch to give their own commentary on the situation, this post is not going to contain a lot of information, mostly just a few assertions that I’ll elaborate on in a future post.
Despite my own personal feelings on Project Ascension, I don’t believe that moving to Ascension is bad for Project Epoch or its players. If getting more time to work on the server means we get more upstream commits, some good can still come out of it.
My relationship to Project Epoch
I’ve been active in the wow private server scene under various different names since around 2008, most recently as IHM or <o>. I was the primary developer of the tswow framework for the first 2-3 years of its public release. Project Epoch was one of the first major projects to adopt tswow back in 2021.
Two weeks ago, I reached out to Project Epoch when it became apparent that they were struggling to get their servers up and running during their launch.
During my time there, I mostly worked on identifying and resolving the database performance issues and on the Linux migration. I did this both under my main account ihm-tswow and the alt account ansbach533 in a poor attempt to retain a level of privacy, but messed up my push configuration and quickly revealed what was going on.
In this position, I had access to Project Epoch’s internal crash logs, performance metrics and discussions, but not to their modules or production servers, and had limited influence on the decisions being made.
Stability Issues
Project Epoch had two primary issues during its launch, both of which were very visible to players. The first was that the server had a database bottleneck which caused extremely slow logins and rollbacks. The second was that the server crashed a lot.
The database issue was identified by the async log that I wrote and was easily addressed. In all cases but one, the crashes were caused by customizations made to the Project Epoch core. The single exception was a crash caused by an incorrectly written livescript.
These issues do not happen on stock installations and standard configurations of tswow or TrinityCore, either on Windows or Linux.
Of course, tswow and TrinityCore can crash and can’t handle infinite amounts of players, but those were not the issues that Project Epoch hit.
Performance Issues
I’m not going to comment in detail on the specific performance characteristics of Project Epoch, as it uses a partition system (what players often refer to as “multithreading”) that I had little involvement in and that is not present in standard tswow or TrinityCore installations.
Project Epoch was able to run 5000 players, mostly in starting zones, with <100ms diffs before crashing due to an unrelated issue, so the real ceiling is not known. This does not necessarily mean that tswow itself can do that, though I personally believe it’s not far off.
Instead, the posts in this series are intended to address the common belief that TrinityCore is less performant than other cores such as AzerothCore and vmangos. They will cover how and where those cores actually differ, and how common performance bottlenecks can be addressed without extensive changes like map partitions, often more easily than many assume.
What’s next?
In the following weeks, I’m going to need to go back and focus a little on our own business. However, we recognize that this represents an important turning point in the scene: Project Ascension is increasingly shaping itself into a monopoly and, unless the current situation is addressed, many modders will keep believing that Ascension’s private core or infrastructure is necessary to keep any project running at scale.
Part of my motivation for assisting Project Epoch was specifically to avoid a move to Ascension, and regardless of the reason, I obviously failed. These posts are the start of the backup plan.
Future Posts
I want to create a series of posts explaining what the actual performance bottlenecks in modern emulators are, how they can be addressed, and to what extent tswow does or does not impact performance and stability. My hope is that by covering these topics, I can raise interest in actually fixing the issues that exist, and dispel common performance myths.
A rough outline of the topics I want to cover is as follows:
- my perspective on what went wrong with Project Epoch’s launch
- tswow’s stability and performance overhead
- netcode, asio and packet buffers
- relocation notifiers, unit fields and batching
- automated bots and what they can and can’t measure
- auras
- threading models
- async chunk updates and partitions
- exploits and item logs
Production Server Workshop
As part of the assistance we provided Project Epoch, a friend with experience in the scene and I prepared a workshop intended to explain what we know about hosting large-scale wow servers in production, especially with limited or inexperienced staff.
We didn’t get the opportunity to actually hold it with Project Epoch, so instead we’re going to do it as a public workshop and adapt it for anyone interested in the topic. We don’t currently have a time set, and would like to hear from the community if this is something they are interested in.
The topics we have planned are:
- Production and Worldserver Security
- Stress Management
- Basic Operational Linux for Beginners (that can be taught to staff)
- Crash fixes and debugging in production
Footnotes
- It was not the stairs that broke the Project Epoch launch on Friday. It was my script that broke logs and caused the server to enter a crash loop. The stairs were fixed very quickly.
- Even as we had differences in opinion, there was nothing close to infighting between me and any internal staff in the development channels.
- It is true that I left the team on Friday over non-technical incidents that I did not condone. The first of them was the public post about the Friday launch attempt, the other was resolved and I won’t discuss it publicly.
- ulmetrs was one of the nicest people I’ve met in the scene.