In this post I would like to cover the most common WEM issues and related troubleshooting procedures.

Change log:

2020-11-22:  WEM database creation: The target principal name is incorrect.  Cannot generate SSPI context
2020-11-26:  WEM database restore: SQL Server Service Broker for the current database is not enabled
2020-11-27:  WEM database is to big: Database query execution timeout expired
2020-12-07:  WEM admin console fails to connect with error: “Error while connecting to the specified Infrastructure Server”

Symptom: The target principal name is incorrect.  Cannot generate SSPI context.

Error description

During WEM site configuration, database creation wizard returns the following error message when attempting to connect to the SQL server: “The target principal name is incorrect.  Cannot generate SSPI context.”. This can be observed when SQL service is configured to use domain service account.

It can be displayed in ODBC connection test:

ODBC

or during the WEM database creation step:

Database creation log is showing the error

How to verify root cause issue ?

This error is not displayed in the event log. In order to verify if this is permission issue open SQL server log viewer and look for events from source Server.

 

Problem Cause

Explanation is provided by Microsoft in this KB article

If you run the SQL Server service under the LocalSystem account, the SPN is automatically registered and Kerberos authentication interacts successfully with the computer that is running SQL Server. However, if you run the SQL Server service under a domain account or under a local account, the attempt to create the SPN will fail in most cases because the domain account and the local account do not have the right to set their own SPNs. When the SPN creation is not successful, this means that no SPN is set up for the computer that is running SQL Server. If you test by using a domain administrator account as the SQL Server service account, the SPN is successfully created because the domain administrator-level credentials that you must have to create an SPN are present.

Solution

I do prefer to fix this issue with manual creation of SPN. You can find also suggestion to create SPNs dynamically for the SQL Server instances but this can cause some problems in clustered SQL configurations.  Step by step instruction to create SPN is the following:

  1. Open command prompt and list created SPN for SQL server object with the following command
    # 
    setspn -l SQL01 
    #

    SPN for the NetBIOS name of the SQL Server will look like: MSSQLSvc/SQLServerName:1433

    SPN for the FQDN of the SQL Server will look like: MSSQLSvc/SQLServerFQDName:1433

    Result is displayed in the figure below

  2. To change the SQL Server service account from local system to a domain user account remove current SPN from MSSQLSvc/SQLServerName:1433 computer account and add to the domain account. Command is the following:
    # 
    setspn -D MSSQLSvc/SQL01.LAB.citrix24.ctx:1433 SQL01 
    # 
    setspn -A MSSQLSvc/SQL01.LAB.citrix24.ctx:1433 LAB\SQLSrv 
    #

    Result is displayed in the figure below

  3. Verify domain user SPN is registered correctly with the following command
    # 
    setspn -L LAB\SQLsrv 
    #

    Result is displayed in the figure below

  4. Test connectivity to SQL server. Connections to SQL Server should now succeed!

 

Symptom: SQL Server Service Broker for the current database is not enabled.

Error description

Error from source Norskale Broker Service is created in the event log Windows Logs -> Applications and Services Logs -> Norskale Broker Service

How to verify root cause issue ?

Verify if service broker is disabled on on the restored database. Open a new query window in SQL Management Console and execute the following command:

#
SELECT is_broker_enabled FROM sys.databases WHERE name = 'DATABASE NAME';
#

Result is displayed in the figure below. State 0 means service broker is disabled.

 

Problem Cause

When you restore a database from backup, the service broker is disabled by default. Every database has a unique ID used by Service Broker. This ID must be unique across all databases in a SQL instance. If you already have another database with the same unique ID used by service broker then you can create a new unique ID for service broker with below command.

Solution

To enable the service broker on the restored database use the below command.

#
ALTER DATABASE [DATABASE NAME] SET ENABLE_BROKER WITH ROLLBACK IMMEDIATE
#

Result is displayed in the figure below

Problem solved 🙂

Symptom: Database query execution timeout expired.

Error description

During normal operations in WEM Administration Console when you go to menu Administration -> Agents -> Registrations, console is freezed and after app 5 minutes you are flip over to the main screen. At the same time in the right bottom corner small popup is displayed with information: Connection to Infrastructure server has been lost.

How to verify root cause issue ?

You can confirm if an issue exist by checking the following:

  1. Error is displayed in the event log Windows Logs -> Applications and Services Logs -> Norskale Broker Service
  2. Open SQL Management Console and run report Disk Usage by Top Tables
  3. Sort results by number of record

Problem Cause

Database maintenance option is not enabled by default and old records are not removed from the database. As a result table VUEMAgentsLog contains to many records and there is a timeout when broker is trying to display infomration about agent’s connections.  Based on my experience when total number of records is greather than 50 mln WEM broker might experience problems with timeout. In addition when the number of records in VUEMAgentsLog table exceed this treshold you are not able to successfully complete database maintenance task because of the timeout. Classic chicken and egg scenario. You have to pay special attention when you face this problem because database cleanup is very log intensive operation and it could fill up server disk where database log is located.

But why database is growing so fast ?  Answer is simple:

  • One agent is registering 15-30 records per hour, 500 – 800 records per day
  • Registration interval is not configurable

The result of agent registration can be checked in WEM Administration Console, Administration -> Logging

Solution

In order to clean database follow the steps listed below.

  1. Take a backup of WEM database
  2. Check free disk space on disks where database and log files are stored. See recommendations below.
  3. Login to SQL server and run report Disk Usage by Database. Note database size.
  4. On SQL server run the following sql command to delete the agent logs. In my case I decided to leave information from last 30 days.
    # 
    use C24_LAB_WEM 
    # 
    DELETE FROM VUEMAgentsLog WHERE Timestamp < DATEADD(day, -30, GETDATE()) AND Action = 1 
    #

    When query is completed successfully, let the server finish background operation for next 10 min and then run reports:

    a. Report Disk Usage by Database


    b. Report Disk Usage by Top Tables
  5. Open WEM Infrastructure Service Configuration console, goto Database Maintenance tab and check option Enable scheduled database maintenance. Press the button Save Configuration – this will restart broker service ! Task should complete operation in 10-15 minutes. See recommendations below.

    When task is completed check results -> Report Disk Usage by Database.
  6. Decide if you want to shrink database to release allocated space. Follow recomendations from your DB team but from performance point of view it’s good to leave some space for data allocation. See recommendations below

 

Recommendations:

  • In order to avoid problems with database filling up server disks, scheduled database maitenance task must be enabled.
  • Before you run the query to remove records from VUEMAgentsLog make sure you have sufficient free disk space. As a general rule you can assume the following
    – Required space for SQL data file (where tempDB is located) – the whole VUEMAgentsLog table will be loaded into tempDB  – make sure you have free disk space equal the current size of mdf file + free disk space needed for normal SQL server operations. Consult your SQL team to consfigure proper disk size. Database mdf size is shown in step nr 3
    – Required space for SQL log file – TBA
  • Based on used WEM reporting features, configure proper parameters for statistics and system monitoring retention period.
  • If you have multiple WEM brokers, configure this option only on one broker.
  • If you decide to shrink database files leave data file at 1 GB size.
  • Configure monitoring system to raise an alert when database size is greather than 1 GB. Include data and log files in the monitoring

 

Symptom: WEM admin console fails to connect with error: “Error while connecting to the specified Infrastructure Server”

Error description

The WEM Administration Console errors out while connecting to the broker with a generic error: “Error while connecting to the specified Infrastructure Server”.

How to verify root cause issue ?

You can confirm if an issue exist by checking the following:

  1. During connecting to broker administration console an error is displayed as shown in the figure below.
  2. Looking into the the WEM admin console debug logs the following error is reported:Exception -> ConnectToBroker.Run() : System.ArgumentException : Value was invalid.
    Parameter name: sddlForm

Problem Cause

Corrupt entries found in the Name column of the VUEMUsers table in the WEM database. When attempting to read the user statistics, the console cannot translate the corrupted entries as they do not have the correct SID format and thus errors out.

One of the potential causes of the corrupt SID recording is a malfunctioning of the Netlogon service on a server which is mistakenly identified as having opened a session with an incorrect SID. If the issue re-occurs, compare the LastAgentId of the corrupt SID user record with the VUEMAgents table to determine which server the recording is coming from, then restart the Netlogon service there

Solution

In order to fix corrupted SID entries follow the steps listed below.

  1. Take a backup of WEM database
  2. Identify corrupted entries
    On SQL server run the following sql command to see corrupted entries

    #
    Use VDA_LAB_WEM
    #
    SELECT * FROM VUEMUsers WHERE Name LIKE '%\%'
    #

    The result is shown in the figure below

  3. Remove records from the WEM database
    On SQL server run the following sql command to remove records with incorrect entries

    # 
    Use VDA_LAB_WEM 
    #
    DELETE FROM VUEMUserStatistics WHERE UserId IN (SELECT IdUser FROM VUEMUsers WHERE Name LIKE '%\%')
    DELETE FROM VUEMUsers WHERE IdUser  IN (SELECT IdUser FROM VUEMUsers WHERE Name LIKE '%\%')
    #

    The complete query to remove incorrect entries is shown below

    # 
    Use VDA_LAB_WEM 
    # 
    SELECT * FROM VUEMUsers WHERE Name LIKE '%\%' 
    #
    DELETE FROM VUEMUserStatistics WHERE UserId IN (SELECT IdUser FROM VUEMUsers WHERE Name LIKE '%\%')
    DELETE FROM VUEMUsers WHERE IdUser  IN (SELECT IdUser FROM VUEMUsers WHERE Name LIKE '%\%')
    #

Recommendations:

  • If problem with corrupted SID entries is consider to create an job on SQL server or scheduled task on management server to regularly check if corrupted SID does exisit in a WEM database and delete this entry.