Blog

  • TokenAndPermUserStore Cache on x64 SQL 2005

    We’ve been having yet another performance issue that has taken a while to fix.  This one isn’t related to code, but it has been a nightmare to solve nonetheless.  Thankfully we have some Microsoft guys on site who were able to quickly diagnose the problem and get us moving along.

    Some background:

    • x64 SQL 2005 SP3
    • x64 Windows 2003 R2
    • 64GB RAM per cluster node
    • 12 HT-enabled cores

    With that, we were noticing a lot of CPU activity even though we really shouldn’t have been.  It was strange: CPU would peak at about 45%, and SQL would appear to become very sluggish.  Running ad hoc queries took forever, and it was causing our app to run slowly too.  In fact, we are normally front-end CPU bound, but when we were having these issues the front end sat at about 25% CPU, SQL was at 45%, and there were no waits on the SQL server.  It was very annoying, as there was no place to point the finger.
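
    For the record, the “no waits” check is easy to repeat; a minimal sketch against the waiting-tasks DMV (the session_id > 50 filter to skip system sessions is a convention, not gospel):

    -- anything actually waiting while CPU runs hot?  (SQL 2005+)
    SELECT session_id, wait_type, wait_duration_ms, blocking_session_id
    FROM sys.dm_os_waiting_tasks
    WHERE session_id > 50;  -- user sessions only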

    As I was digging through things, I remembered that on the last geography we had set MDOP to 1 (see previous post).  As I slowly went through the various SQL configurations, I noticed that it was set to 0 on all of the cluster nodes.  Needless to say, I talked to a few people, and 1 is apparently the default setting during their buildout, so I changed it.
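
    For reference, the change itself is just a couple of sp_configure calls per node; a minimal sketch:

    -- 'max degree of parallelism' is an advanced option, so expose it first
    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    -- limit each query to a single CPU
    EXEC sp_configure 'max degree of parallelism', 1;
    RECONFIGURE;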

    Miraculously, everything was solved!  A quality day’s worth of work!  I ran some validation steps and then handed it off to our offshore team.  The next day, I notice that the results are looking a lot worse than what I was seeing.  I spin up LoadRunner (ugh, another post I should probably write) and run a test.  My results are crap too.  Looking at front-end CPU, I notice it is once again not maxing out, and SQL is running hot again.

    I know how to fix this though, so I go and look at the MDOP setting.  Interesting, it is still set to 1.  Well, maybe something is hung someplace, so I set it to 0 then back to 1.  Run the test, problem solved, move on to something else. 

    Well, this kept happening over the next week, so I sent an email to the local DBAs and the onsite Microsoft DBAs asking if anyone had ever seen MDOP “revert”.  Needless to say, they had never heard of such a thing, but the Microsoft DBA quickly zeroed in on the fact that the plan cache gets dumped when I re-apply that configuration.  Looking at some memory dumps, he quickly jumped on the TokenAndPermUserStore cache.  We then verified it by flushing only that cache and watching the improvements on the site.
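
    For anyone who wants to try that verification themselves, flushing just this one store (rather than nuking the entire plan cache with DBCC FREEPROCCACHE) is a one-liner on SQL 2005 SP2 and later:

    -- flush only the security token cache, leaving compiled plans alone
    DBCC FREESYSTEMCACHE ('TokenAndPermUserStore');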

    The good news is that we are not the only ones having issues with this.  MS has actually been trying to fix this issue since before SP2, yet nothing, not even SP3, has actually fixed it.  However, SP3 finally added a few trace flags that can be used to manually cap the size of this specific cache.  Before I get to telling you how to fix it, here is what is going on.

    On the previous geography our SQL tier looked the same, except that we only had 24GB of RAM on the SQL nodes and our databases were a lot larger because of legacy data.  Therefore, we had memory pressure on the box: SQL wants to load all of the databases into RAM, but it can’t, so the caches are continually going through garbage collection.  On this current project we have more than doubled the amount of RAM, and our databases are tiny in comparison; in fact, all of our databases are effectively in memory.  Therefore, we have no memory pressure, and the caches are never collected.  They keep growing to sizes that make them useless, as SQL spends more CPU time finding the corresponding item (the security token in this case) than it would just recreating it.

    Our SQL nodes are too big.  Who knew that was possible?

    Now, the solution.  Microsoft has tons of articles on this, but the one that describes it best is KB 927396.  The top bullet points explain exactly what we were seeing:

    • Queries that typically run faster take a longer time to finish running.
    • CPU utilization for the SQL Server process is more than usual.
    • When you experience decreased performance when you run an ad hoc query, you view the query from the sys.dm_exec_requests or sys.dm_os_waiting_tasks dynamic management view. However, the query does not appear to be waiting for any resource.
    • The size of the TokenAndPermUserStore cache store grows at a steady rate.
    • The size of the TokenAndPermUserStore cache store is in the order of several hundred megabytes (MB).
    • In some cases, execution of the DBCC FREEPROCCACHE command provides temporary relief.
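
    The last two bullets are easy to watch for yourself; KB 927396 includes a query along these lines against the memory clerks DMV (column names are the SQL 2005 ones):

    -- current size of the TokenAndPermUserStore cache, in KB
    SELECT SUM(single_pages_kb + multi_pages_kb) AS token_cache_kb
    FROM sys.dm_os_memory_clerks
    WHERE name = 'TokenAndPermUserStore';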

    Now, there are multiple ways of fixing it; the right one really depends on how many users are accessing your database.  For us, it is very few, as we only have application accounts.  Here are your options:

    1. Enable trace flag 4618 to set a quota of 1,024 entries.
    2. Enable both trace flag 4618 and 4610 to set a quota of 8,192 entries.
    3. Put a custom quota in the registry and enable trace flag 4621.
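
    For options #1 and #2, the flags go into the SQL Server service startup parameters (one -T entry per flag, e.g. -T4618;-T4610, set via Configuration Manager).  After the service restarts, a quick sanity check that they stuck:

    -- both flags should come back with Status = 1 and Global = 1
    DBCC TRACESTATUS (4618, 4610);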

    We went with option #2, so we are on the built-in 8,192-entry quota rather than a custom one, and guess what, it works!  The downside is that the memory keeps increasing, but we haven’t had a performance issue.  My guess is that the used portion of the store is staying consistently the same size, but memory is not being reclaimed, which looks like a memory leak.  I am going to work with our DBA this next week to validate that assumption (and make sure it will reclaim the memory at some point), so I will keep this post updated based on what we find.

    In addition, I am going to switch MDOP back to 0 and see if SQL isn’t quite as dumb about parallelism as we currently think it is.


  • Another Xbox 360 Dead

    That’s right folks: almost exactly three years after my first Xbox 360 died (from the exact same issue), my replacement has also just bit the dust.  What is hilarious is that it was doing something as complex as streaming a video from my Media Center PC.  Strenuous indeed.

    While my Xbox is technically out of standard warranty, because I am once again getting the three-light red ring of death, my warranty is extended to three years, which means it expires in December of 2010.  Sadly, unlike last time, they don’t send you a box, so I have to find adequate shipping materials to send it to them ASAP.

  • Base Perf Improvements

    Since I keep forgetting to check these base performance assumptions prior to digging in, I thought I would list them here with some description.  Yes, this is mostly for my personal edification, so deal with it.

    1. SQL – These items can be done when you see Buffer Latch waits
      1. Max Degree of Parallelism (MAXDOP) – This setting should be set to 1.  By default it is set to 0, so it needs to be changed.  The setting becomes more important as CPU core counts scale up.  It turns out that SQL is not good at parallelizing T-SQL queries; when left at the default of 0, CPU is artificially inflated by SQL attempting to parallelize everything.  Some additional info can be found here.  This affects all versions of SQL.  Be sure to test first!
      2. T1118 Flag – This is a trace flag that must be added to the SQL startup parameters (-T1118).  It makes SQL allocate full uniform extents instead of single pages from mixed extents, which reduces allocation contention.  When you have a multi-core (processor) machine, you should definitely enable this flag on all instances of SQL.  In addition to this change, you must do the following item to see the benefits.  You can get more information here.
      3. File Partitioning – In addition to the T1118 flag, you also need to break out your data files.  TempDB is the most common bottleneck to do this to.  In our testing it is at least worthwhile to create a number of data files (with autogrowth turned off) equal to 1/2 the number of cores you have; a sketch appears after this list.  This also helps other databases that have high contention.  Log files can also be broken out if lots of log contention is occurring, but in our case it has typically been limited to the data files.
    2. Web/App Server
      1. Wildcarding Disabled – By enabling scriptmap wildcarding you can have pretty URLs that don’t end in an extension without using a rewrite module (not free for IIS6).  However, with wildcarding enabled, every file goes through the ASP.NET handler.  This is obviously a huge CPU burden on the web server, plus an even larger burden if you are backing session state in the database, because the database CPU and IO go through the roof.  Nothing like doing a session lookup just to serve a JPG file!  You can see what not to do here.
      2. Kernel Mode SSL – This gives roughly 10% CPU relief on the web servers.  You can’t do anything with client certificates though, so be sure to test and read all the gotchas.  Find out more here.
      3. Debug = "False" – For ASP.NET applications, <compilation debug="false"/> causes three things to happen.  First, the normal ASP.NET request timeouts apply again (debug="true" effectively disables them).  Second, the temporary ASP.NET dlls are batch compiled and built in release mode; when debug is set to true, each ascx, asax, and aspx is compiled into its own dll in debug mode.  Third, you can’t step line by line through the assemblies, which allows for a large performance increase.
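
    Here is the tempdb file sketch promised above, assuming the 12-core box from the earlier post (so six equally sized data files per the 1/2-cores guideline); the T: path, file names, and sizes are placeholders to adjust for your environment:

    -- one ALTER per additional file; repeat for tempdev3 through tempdev6
    -- FILEGROWTH = 0 turns autogrowth off, so presize the files generously
    ALTER DATABASE tempdb
    ADD FILE (NAME = tempdev2,
              FILENAME = 'T:\tempdb\tempdev2.ndf',
              SIZE = 4096MB,
              FILEGROWTH = 0);
    -- remember: -T1118 only pays off once the files are broken out like this
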
  • Black Logon Screen for RDP

    I get this every once in a while, and it is quite annoying.  Basically, all the colors are messed up when you try to log in via RDP; everything appears black.  You can still see your cursor, so you can log into machines, but it is very difficult when dialog boxes are around.  I found the solution over at Server Intellect and wanted to slap it here so that I have an easy place to find it.

    A black Remote Desktop login screen when connecting to your Windows Server 2003 server is caused by registry entries changed either through minor corruption or user error.  To resolve it, replace the registry entries with the ones in the registry script below.  Simply copy and paste the script into a text file, rename it “rdpreset.reg”, right-click the resulting file, and select Merge.

    Windows Registry Editor Version 5.00

    [HKEY_USERS\.DEFAULT\Control Panel\Colors]
    "ActiveBorder"="212 208 200"
    "ActiveTitle"="10 36 106"
    "AppWorkSpace"="128 128 128"
    "Background"="102 111 116"
    "ButtonAlternateFace"="181 181 181"
    "ButtonDkShadow"="64 64 64"
    "ButtonFace"="212 208 200"
    "ButtonHilight"="255 255 255"
    "ButtonLight"="212 208 200"
    "ButtonShadow"="128 128 128"
    "ButtonText"="0 0 0"
    "GradientActiveTitle"="166 202 240"
    "GradientInactiveTitle"="192 192 192"
    "GrayText"="128 128 128"
    "Hilight"="10 36 106"
    "HilightText"="255 255 255"
    "HotTrackingColor"="0 0 128"
    "InactiveBorder"="212 208 200"
    "InactiveTitle"="128 128 128"
    "InactiveTitleText"="212 208 200"
    "InfoText"="0 0 0"
    "InfoWindow"="255 255 225"
    "Menu"="212 208 200"
    "MenuText"="0 0 0"
    "Scrollbar"="212 208 200"
    "TitleText"="255 255 255"
    "Window"="255 255 255"
    "WindowFrame"="0 0 0"
    "WindowText"="0 0 0"

    This will reset the login screen colors to the Windows Server defaults.

  • More CSS Fun

    As I am sure you gathered from other recent posts, we are using (read: testing) CSS for a lot of stuff.  Well, we found a beautiful new issue when creating and editing projects from the command line.

    When you look at MSDN about adding a filter to a Project, you are presented with the following syntax:

    /Filter "+text1.txt -*.exe -dir1 +...*.jpg"

    Needless to say, that is not correct.  For AddProj the syntax has to be exactly right, while for EditProj the syntax is a lot more lax (the below is the EditProj form).  Regardless, everything inside the quotes above is incorrect.  Instead, a filter should look like the following:

    Filter=+"text.txt" -"*.exe" -"dir1" +"...*.jpg"

    They are either updating the MSDN article or creating a KB article about this.

  • Windows Home Server

    With the release of Power Pack 3 for Windows Home Server, I have been thinking about trying it out.  Having an MSDN subscription and a virtual machine host made this quite a bit easier, as I didn’t need to purchase any additional hardware to use it. 

    Prior to installing WHS, I had a Windows 2008 virtual machine that I had installed all my stuff on, with shares.  It definitely worked, but some of the cool features of WHS kept me wanting to move in that direction: things like automated backups of client machines, a console, better managed and accessible shares, and Media Center tie-ins (I have a Win7 virtual machine as a Media Center and an Xbox 360 as the extender).  All in all, I think it is definitely a move in the right direction, but that doesn’t mean there weren’t issues getting there.  I am going to outline a bunch of them here in the hopes that others can find comfort in the work I have done to resolve them.

    1. Installation – As I mentioned before, I was installing it to a virtual machine.  No big deal, right?  Wrong.  My virtual machine host does not normally have a keyboard, monitor, or mouse hooked up to it; it just sits in the closet chugging along.  The downside is that when I am building a new virtual machine, it had better not need the mouse before I can either connect directly to the machine or install the integration drivers, because in Hyper-V, when you are managing the host over Remote Desktop, you have no mouse in the guest until those drivers are installed.  Well, sadly, with the MSDN version (not sure about pressed CDs), the tab order is broken on the EULA Accept page, so there is no way to accept the EULA and progress through the installation.  Boo.  That means I had to drag a monitor, keyboard, and mouse into the closet to get things working.  It just seems silly that this is an issue for something that is meant to run headless anyways (obviously not during the installation, but still).
    2. Domain Membership – Do not add the WHS machine to a domain.  I know you want to if you are running a domain at home, but don’t, just don’t.  I really wanted to run it as a domain member too, but there are just too many issues and tricks MS has pulled to make it not worth your while: things from the machine powering down every 48 hours when it is a domain member (annoying at best), to the console crashing after manually adding domain accounts to the local Windows Home group.  While I know you can work around most of these things (except for the console crashing), what benefit are you really getting?  Just leave it as a member of a workgroup.
    3. Firewall Rules – Much like domain membership, the best answer here is just to turn off the Windows Firewall on the box.  I am not sure if it was because the install updated itself to Windows 2003 SP2, which enabled the firewall by default, or what, but there was nothing but headaches with the Windows Firewall enabled.  Some of the more quality examples were:
      • The website you are supposed to access resides on ports 55000/56000.  Those ports were not allowed in the rules by default.
      • When setting up the Media Center connector, it uses DCOM to connect.  DCOM typically uses a random list of high ports (1024-65535) to do its bidding.  Granted, you can restrict DCOM to only use a few ports, but even those are not added into the firewall rules.

      Just silly stuff like that, which takes a while to troubleshoot when it doesn’t need to.  Turn it off.

    4. Carbonite Online Backup – This is the current beast I am dealing with, and it is a fun one.  Carbonite works exactly how I want it to, and I have been using it on my Windows 2008 machine to do online backups for over a year now.  It works.  It isn’t fast, but it works.  Moving it to WHS has been a fun test of intelligence.  In my setup I had two drives: one 80GB (the minimum to get it set up, because I just wanted to play at first) and one 800GB.  Everything works great, except for Carbonite.  For the time being, I just wanted to back up the data in some of the shares.  No problem, right?  Just point it to where the share is located on disk (D:\shares\sharename) and we are good to go.  Wrong!  WHS has a special drive configuration that lets you simply add drives of whatever size whenever you want, and it automatically adds that space to your shares.  It does this via junctions, tombstone files, and a service that manages where all the files are stored.  I won’t delve into it here, but you can read all about it yourself.  Basically, the files in your share are elaborate shortcuts to the actual files, which are spread across all your drives, managed by the service, and could move every hour.  So when you point Carbonite to D:\shares\sharename, you are effectively backing up the shortcuts.  Instead, you need to point it to the junction point and all the files there (these are hidden system files) at C:\fs\DE.  As I mentioned, though, these files are always being moved around by the service.  Therefore, the only way for Carbonite to work correctly on a WHS is if you have a single large drive, which is exactly what I am going to do.  As soon as you add another one, files can get moved, and your backups will not be complete.
    5. Protocol Mismatch – For the longest time I was receiving the following error whenever I tried to install the connector on a client machine (Windows 2003, Windows 2008, Win 7):

      Protocol mismatch. This computer uses protocol version 6.0.2030.2, but partner computer [server] uses protocol version 6.0.2030.0. A connection cannot be established

      This was because the http://server:55000/enrollid/id.aspx web service that the connector uses on the WHS machine was returning the incorrect version.  The only fix I found was installing the final build of Power Pack 3; the beta didn’t work, nor did downgrading to a previous Power Pack.

    Hopefully this little guide helps someone else out there.  As I tinker with it some more, I will probably add additional articles, but it is amazing how long it has taken me to get this silly machine up and running, especially when it is supposed to be an appliance.  I know that MS is learning a lot from this, but the polish is still a bit lacking, and this is after three Service Packs (Power Packs).

  • Copy Files Between Untrusted Domains Programmatically

    Normally on a Windows machine, when you attempt to access a share in a different domain, you are prompted for credentials.  However, none of the copy utilities (copy, xcopy, robocopy, etc.) accept credentials on the command line.  Also, as it is an untrusted domain, you simply can’t do a RunAs (you need to manually type in a password for that anyways).

    The answer was so obvious, I can’t believe it didn’t come to me right away: net use, the simplest of commands, is your friend again.  Using the following commands you can establish the connection and then copy between the machines as much as you like via a normal UNC path.

    net use \\[server]\[path] [password] /user:[username]

    copy \\[server]\[path]\[more path]

    net use \\[server]\[path] /delete /y

    By using this syntax there is no mapped drive; there is simply a stored connection with the associated credentials, which lasts for the life of the session or until the delete command is run.

  • PSExec

    On my current project they deploy PSExec to all machines.  The downside?  It is a version from 2004.  Yes, that is right, from over five years ago.  Needless to say, there has been nothing but problems with it.  Granted, PSExec is probably one of the easiest things to upgrade, as you just need to upgrade the copy that you are calling PSExec from; it then copies itself out to the target machine and does its job.  However, if the old version was ever run against a machine, you may be in some weird state like we were, where PSExec wouldn’t start, or it would start and never close, etc.

    Stumbling along the internet, I found someone else who had this issue.  However, his process doesn’t always work, because something may still be holding onto the psexesvc.exe file.  Instead, I recommend using the following batch file, where %1 is the target machine (e.g. \\machinename).

    sc %1 stop psexesvc
    sc %1 delete psexesvc
    del %1\admin$\psexesvc.exe
    del %1\admin$\system32\psexesvc.exe


    It just deletes the service before it tries to delete the files.  Works like a charm.

  • CSS Project Destination Authentication Account

    It has been a long time again, but I have a few updates to put up here.  They may not help a lot of people, but they are good things that I want to remember 🙂

    Commerce Server uses a pretty antiquated system for doing content and code promotions called Commerce Server Staging (CSS).  We have it set up, but want to better automate the deployment of all the CSS projects (jobs).  For the life of me, I could not figure out how to add the destination authentication account programmatically.  Thankfully, it wasn’t something obvious I was missing.  After discussing it with Microsoft for a while, it turns out there are some undocumented switches on CSS.exe.  After you add the project (css.exe addproj), you then need to edit the project with the following syntax:

    css.exe editproj [project] Destination=[server];[username];[password]

    Yay to automation!

    Update:
    Turns out you can do this with AddProj as well, in the same way, using the semicolon-separated items for the destination.

  • Xbox 360 and Netflix

    Netflix streaming straight to the Xbox 360 has got to be the coolest thing ever.  Last weekend we watched two movies that way.  I just wish they would get more movies on the streaming feature.  Screw DVDs moving forward.  It’s all about the streaming media.