Kevin Holman's System Center Blog

UR12 for SCOM 2012 R2 – Step by Step


 

image

KB Article for OpsMgr:  https://support.microsoft.com/en-us/help/3209587/system-center-2012-r2-om-ur12

Download catalog site:  http://www.catalog.update.microsoft.com/Search.aspx?q=3209587

 

 

NOTE:  I get this question every time we release an update rollup:   ALL SCOM Update Rollups are CUMULATIVE.  This means you do not need to apply them in order, you can always just apply the latest update.  If you have deployed SCOM 2012 R2 and never applied an update rollup – you can go straight to the latest one available.  If you applied an older one (such as UR3) you can always go straight to the latest one!

Key Fixes:

  • SCOM 2016 Upgrade failure:  When you try to upgrade System Center 2012 R2 Operations Manager Reporting Server to System Center 2016 Operations Manager reporting server, the upgrade fails for the following configuration:

    • Server A is configured as System Center 2012 R2 Operations Manager including Management Server.
    • Server B is configured as System Center 2012 R2 Operations Manager, including Operations Manager Database (OpsMgrDB), Operations Manager Data Warehouse (OpsMgrDW) and Operations Manager Reporting Server.
    • ( X ) Management Server Upgraded Check – The management server to which this component reports has not been upgraded.
  • Recovery tasks on “Computer Not Reachable” messages in the System Center Operations Manager Monitor generate failed logons for System Center Operations Manager Agents that are not part of the same domain as the Management Groups.
  • Resource pools:  When a Management Server is removed from the All Management Servers Resource Pool, the monitoring host process does not update the Type Space Cache.
  • SHA2 support for certificates:  SHA1 is deprecated for the System Center 2012 R2 Operations Manager Agent and SHA2 is now supported.
  • Override fixes:  Because of incorrect computations of configuration and overrides, some managed entities go into an unmonitored state. This behavior is accompanied by event 1215 errors that are logged in the Operations Manager log.
  • IntelliTrace Profiling workflows fail on certain Windows operating system versions. The workflow cannot resolve Shell32 interface issues correctly.
  • Notifications:  There is a character limitation of 50 characters on the custom fields in the notification subscription criteria. This update increases the size of the limitation to 255 characters.
  • OMS:  You cannot add Windows Client computers for Operational Insights (OMS) monitoring. This update fixes the OMS Managed Computers wizard in the System Center Operations Manager Administration pane to let you search or add Windows Client computers.
  • When you use the Unix Process Monitoring Template wizard to add a new template to monitor processes on UNIX servers, the monitored data is not inserted into the database. This issue occurs until the Monitoring Host is restarted. Additionally, the following is logged in the Operations Manager log file:

    Log Name:      Operations Manager
    Source:        Health Service Modules
    Event ID:      10801
    Level:         Error
    Description:    Discovery data couldn’t be inserted to the database. This could have happened because of one of the following reasons:
          – Discovery data is stale. The discovery data is generated by an MP recently deleted.
          – Database connectivity problems or database running out of space.
          – Discovery data received is not valid.

    Additionally, you may receive the following exception, which causes this issue to occur:

    Exception:

    Exception type:   Microsoft.EnterpriseManagement.Common.DataItemDoesNotExistException
    Message:          ManagedTypeId = ccf81b2f-4b92-bbaf-f53e-d42cd9591c1c
    InnerException:   <none>
    StackTrace (generated):   SP IP Function   000000000EE4EF10 00007FF8789773D5 Microsoft_EnterpriseManagement_DataAccessLayer!Microsoft.EnterpriseManagement.DataAccessLayer.TypeSpaceData.IsDerivedFrom(System.Guid, System.Guid)+0x385

     

New Linux operating systems supported
  • RHEL 7 on Power8 is now supported in System Center 2012 R2 Operations Manager
     
Issues that are fixed in the UNIX and Linux management packs
  • SHA2 support:  SHA1 is deprecated and SHA2 is now supported on the management server that’s used to sign the UNIX/Linux Open Management Infrastructure (OMI) certificate.
  • OMI and FIPS support:  OMI start attempts fail on all FIPS-enabled UNIX/Linux systems. This fix updates the agents to support FIPS-enabled systems.
  • HPUX fix:  The Average Physical disk sec/transfer performance counters are not displayed for Hewlett Packard systems.
  • Solaris 10 fix:  OMI displays incorrect memory information for Solaris 10 systems.
  • SLES fix:  The Network Adapter Performance counter is not displayed for SLES 12 x64 platforms in the console.
  • HPUX fix:  You can’t discover file systems on HPUX 11.31 IA-64 computers that have more than 128 disks. Previously, only 128 VGs were supported. This update extends support to 256 VGs.
  • JBoss fix:  Deep monitoring can’t be started successfully for some JBoss applications because the discovery of the JBoss application server sets the DiskPath for the JBoss server incorrectly. Deep monitoring is not started in JBoss stand-alone mode when a nondefault configuration is used. This update provides additional support for JBoss stand-alone mode.

 

 

 

Let’s get started.

From reading the KB article – the order of operations is:

  1. Install the update rollup package on the following server infrastructure:
    • Management servers
    • Audit Collection servers 
    • Gateway servers
    • Web console server role computers
    • Operations console role computers
    • Reporting
  2. Apply SQL scripts.
  3. Manually import the management packs.
  4. Update Agents
  5. Unix/Linux management packs and agent updates.

 

 

1.  Management Servers

image

It doesn’t matter which management server I start with.  There is no need to begin with whoever holds the “RMSe” role.  I simply make sure I patch only one management server at a time, to allow for agent failover without overloading any single management server.

I can apply this update manually via the MSP files, or I can use Windows Update.  I have 2 management servers, so I will demonstrate both.  I will do the first management server manually.  This management server holds 3 roles, and each must be patched:  Management Server, Web Console, and Console.

The first thing I do when I download the updates from the catalog, is copy the cab files for my language to a single location:

image

 

Then extract the contents:

image

 

Once I have the MSP files, I am ready to start applying the update to each server by role.
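If you prefer to script the extraction and launch, it looks something like the following from an elevated PowerShell or command prompt – note the .cab/.msP filenames shown here are illustrative only; yours will vary by language and architecture:

```powershell
# Extract each downloaded cab to a working folder (filenames are illustrative)
expand.exe -F:* .\KB3209587-AMD64-Server.cab C:\Updates\UR12

# Launch the server MSP update from the elevated prompt
msiexec.exe /update C:\Updates\UR12\KB3209587-AMD64-Server.msp
```

Repeat the msiexec line for the Web Console and Console MSPs on servers that hold those roles.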

***Note:  You MUST log on to each server role as a Local Administrator, SCOM Admin, AND your account must also have System Administrator role to the SQL database instances that host your OpsMgr databases.

 

My first server is a Management Server Role, and the Web Console Role, and has the OpsMgr Console installed, so I copy those update files locally, and execute them per the KB, from an elevated command prompt:

image

 

This launches a quick UI which applies the update.  It will bounce the SCOM services as well.  The update usually does not provide any feedback that it had success or failure. 

 

You can check the application log for the MsiInstaller events to show completion:

Log Name:      Application
Source:        MsiInstaller
Date:          8/31/2016 9:01:13 AM
Event ID:      1036
Description:
Windows Installer installed an update. Product Name: System Center Operations Manager 2012 Server. Product Version: 7.1.10226.0. Product Language: 1033. Manufacturer: Microsoft Corporation. Update Name: System Center 2012 R2 Operations Manager UR11 Update Patch. Installation success or error status: 0.
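You can pull these MsiInstaller events with PowerShell instead of scrolling the event viewer – a quick sketch:

```powershell
# List recent MsiInstaller 1036 events (update installed) from the Application log
Get-WinEvent -FilterHashtable @{ LogName='Application'; ProviderName='MsiInstaller'; Id=1036 } -MaxEvents 10 |
    Select-Object TimeCreated, Message |
    Format-List
```

Look for “Installation success or error status: 0” in the message body.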

 

You can also spot check a couple DLL files for the file version attribute. 
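A quick way to do that spot check in bulk from PowerShell (adjust the path for your environment and install drive):

```powershell
# Spot-check DLL file versions in the SCOM server folder (path is an example)
Get-ChildItem "C:\Program Files\Microsoft System Center 2012 R2\Operations Manager\Server\*.dll" |
    Select-Object Name, @{ n='FileVersion'; e={ $_.VersionInfo.FileVersion } } |
    Sort-Object Name
```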

image

 

Next up – run the Web Console update:

image

 

This runs much faster.   A quick file spot check:

image

 

Lastly – install the console update (make sure your console is closed):

image

 

A quick file spot check:

image

 

 

Additional Management Servers:

image

I now move on to my additional management servers, applying the server update, then the console update and web console update where applicable.

On this next management server, I will use the example of Windows Update as opposed to manually installing the MSP files.  I check online, and make sure that I have configured Windows Update to give me updates for additional products: 

image

 

The applicable updates show up under optional – so I tick the boxes and apply these updates.

image

 

After a reboot – go back and verify the update was a success by spot checking some file versions like we did above.

 

 

Updating ACS (Audit Collection Services)

image

 

 

You only need to update ACS if you have installed this optional role.

On any Audit Collection Collector servers, you should run the update included:

image

 

image

 

A spot check of the files:

image

 

 

 

Updating Gateways:

image

I can use Windows Update or manual installation.

image

The update launches a UI and quickly finishes.

I was prompted for a reboot.

image

 

Then I will spot check the DLL’s:

image

 

I can also spot-check the \AgentManagement folder, and make sure my agent update files are dropped here correctly:

image

***NOTE:  You can delete any older UR update files from the \AgentManagement directories.  The UR’s do not clean these up, and they serve no purpose any longer.

 

I can also apply the GW update via Windows Update:

image

 

 

Reporting Server Role Update

image

I kick off the MSP from an elevated command prompt:

image

 

This runs VERY fast and does not provide any feedback on success or failure.

image

 

 

 

 

 

 

2. Apply the SQL Scripts

In the path on your management servers, where you installed/extracted the update, there are two SQL script files: 

%SystemDrive%\Program Files\Microsoft System Center 2012 R2\Operations Manager\Server\SQL Script for Update Rollups

(note – your path may vary slightly depending on if you have an upgraded environment or clean install)

image

First – let’s run the script to update the OperationsManagerDW (Data Warehouse) database.  Open a SQL management studio query window, connect it to your Operations Manager DataWarehouse database, and then open the script file (UR_Datawarehouse.sql).  Make sure it is pointing to your OperationsManagerDW database, then execute the script.

You should run this script with each UR, even if you ran this on a previous UR.  The script body can change so as a best practice always re-run this.

If you see a warning about line endings, choose Yes to continue.
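If you prefer the command line over SQL Management Studio, the same script can be run with sqlcmd – the server/instance name and path below are placeholders for your environment:

```powershell
# Run the Data Warehouse update script via sqlcmd (trusted connection, -E)
sqlcmd -S "SQLSERVER\INSTANCE" -d OperationsManagerDW -E `
    -i "C:\Program Files\Microsoft System Center 2012 R2\Operations Manager\Server\SQL Script for Update Rollups\UR_Datawarehouse.sql"
```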

image

Click the “Execute” button in SQL mgmt. studio.  The execution could take a considerable amount of time and you might see a spike in processor utilization on your SQL database server during this operation.

You will see the following (or similar) output:   “Commands completed successfully.”

 

 

image

Next – let’s run the script to update the OperationsManager (Operations) database.  Open a SQL management studio query window, connect it to your Operations Manager database, and then open the script file (update_rollup_mom_db.sql).  Make sure it is pointing to your OperationsManager database, then execute the script.

You should run this script with each UR, even if you ran this on a previous UR.  The script body can change so as a best practice always re-run this.

 

image

Click the “Execute” button in SQL mgmt. studio.  The execution could take a considerable amount of time and you might see a spike in processor utilization on your SQL database server during this operation.  

 

I have had customers state this takes from a few minutes to as long as an hour. In MOST cases – you will need to shut down the SDK, Config, and Monitoring Agent (healthservice) on ALL your management servers in order for this to be able to run with success.
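A sketch of stopping and restarting those services with PowerShell – run this on each management server (these are the standard service names for the Data Access, Config, and Monitoring Agent services):

```powershell
# Stop the SCOM services so the script can get exclusive access to the database,
# then start them again after the script completes.
# OMSDK = System Center Data Access, cshost = System Center Management Configuration,
# HealthService = Microsoft Monitoring Agent
$scomServices = 'OMSDK','cshost','HealthService'
Stop-Service -Name $scomServices
# ... run update_rollup_mom_db.sql here ...
Start-Service -Name $scomServices
```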

You will see the following (or similar) output: 

image

or

image

 

IF YOU GET AN ERROR – STOP!  Do not continue.  Try re-running the script several times until it completes without errors.  In a production environment with lots of activity, you will almost certainly have to shut down the services (sdk, config, and healthservice) on your management servers, to break their connection to the databases, to get a successful run.

Technical tidbit:   Even if you previously ran this script in any previous UR deployment, you should run this again in this update, as the script body can change with updated UR’s.

 

 

 

3. Manually import the management packs

image

There are 58 management packs in this update!   Most of these we don’t need – so read carefully.

The path for these is on your management server, after you have installed the “Server” update:

\Program Files\Microsoft System Center 2012 R2\Operations Manager\Server\Management Packs for Update Rollups

However, the majority of them are Advisor/OMS, and language specific.  Only import the ones you need, and that are correct for your language.  I will remove all the MP’s for other languages (keeping only ENU), and I am left with the following:

image

 

What NOT to import:

The Advisor MP’s are only needed if you are using Microsoft Operations Management Suite cloud service (OMS), (Previously known as Advisor, and Operations Insights).

The APM MP’s are only needed if you are using the APM feature in SCOM.

The Alert Attachment and TFS MP bundle is only used for specific scenarios, such as DevOps scenarios where you have integrated APM with TFS, etc.  If you are not currently using these MP’s, there is no need to import or update them.  I’d skip this MP import unless you already have these MP’s present in your environment.

However, the Image and Visualization libraries deal with Dashboard updates, and these always need to be updated.

I import all of these shown without issue.

 

 

 

4.  Update Agents

image

Agents should be placed into pending actions by this update for any agent that was not manually installed (remotely manageable = yes):  

On the management servers where I used Windows Update to apply the patch, their agents did not show up in this list.  Only agents whose management server I patched manually showed up.  FYI – the experience is NOT the same when using Windows Update vs. manual installation.  If yours don’t show up, you can try re-running the update for that management server manually.

image

If your agents are not placed into pending management, this is generally caused by not running the update from an elevated command prompt, or by manually installed agents, which are never placed into pending.

In my case, agents reporting to a management server that was updated via Windows Update were NOT placed into pending – only agents reporting to the management server where I executed the patch manually were.  I re-ran the server MSP file on those management servers from an elevated command prompt, and they all showed up.

You can approve these – which will result in a success message once complete:
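You can also review and approve these from the Operations Manager Shell – a sketch below; the property names shown for the review step may vary slightly by version, so check the output of Get-SCOMPendingManagement first:

```powershell
# Review what is pending, then approve it
Import-Module OperationsManager
Get-SCOMPendingManagement | Format-Table AgentName, AgentPendingActionType, ManagementServerName

# Once you have reviewed the list:
Get-SCOMPendingManagement | Approve-SCOMPendingManagement
```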

image

 

Soon you should start to see PatchList getting filled in from the Agents By Version view under Operations Manager monitoring folder in the console:
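The same PatchList data can be queried from the Operations Manager Shell – a quick sketch:

```powershell
# Check agent versions and applied update rollups (PatchList)
Get-SCOMAgent | Sort-Object DisplayName |
    Select-Object DisplayName, Version, PatchList
```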

image

 

 

 

5.  Update Unix/Linux MPs and Agents

image

The current Linux MP’s can be downloaded from:

https://www.microsoft.com/en-us/download/details.aspx?id=29696

7.5.1068.0 is the SCOM 2012 R2 UR12 release version.  

****Note – take GREAT care when downloading – that you select the correct download for SCOM 2012 R2.  You must scroll down in the list and select the MSI for 2012 R2:

image

 

Download the MSI and run it.  It will extract the MP’s to C:\Program Files (x86)\System Center Management Packs\System Center 2012 R2 Management Packs for Unix and Linux\

Update any MP’s you are already using.   These are mine for RHEL, SUSE, and the Universal Linux libraries. 

image 

 

You will likely observe VERY high CPU utilization on your management servers and database server during and immediately following these MP imports.  Give it plenty of time to complete the import and MPB deployments.

Next – you need to restart the “Microsoft Monitoring Agent” service on any management servers which manage Linux systems.  I don’t know why – but my MP’s never drop/update the UNIX/Linux agent files in the \Program Files\Microsoft System Center 2012 R2\Operations Manager\Server\AgentManagement\UnixAgents\DownloadedKits folder until this service is restarted.
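A one-liner for that restart – run it locally on each management server that manages Linux systems (or wrap it in Invoke-Command for remote servers):

```powershell
# Restart the Microsoft Monitoring Agent service so the UNIX/Linux agent kits
# get dropped into \AgentManagement\UnixAgents\DownloadedKits
Restart-Service -Name HealthService
```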

Next up – you would upgrade your agents on the Unix/Linux monitored agents.  You can now do this straight from the console:

image

 

You can input credentials or use existing RunAs accounts if those have enough rights to perform this action.
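The agent upgrade can also be scripted with the UNIX/Linux cmdlets – treat the parameter usage below as a rough sketch only and verify with Get-Help Update-SCXAgent in your environment, since the credential parameters vary by version:

```powershell
# Upgrade UNIX/Linux agents from the Operations Manager Shell (sketch)
$sshCred = Get-Credential    # privileged account used for the upgrade
Get-SCXAgent | Update-SCXAgent -SshCredential $sshCred
```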

Finally:

image

 

 

 

6.  Update the remaining deployed consoles

image

This is an important step.  I have consoles deployed around my infrastructure – on my Orchestrator server, SCVMM server, on my personal workstation, on the workstations of the other SCOM admins on my team, on a Terminal Server we use as a tools machine, etc.  These should all get the matching update version.

You can use Help > About to bring up a dialog box and check your console version:

image

 

 

 

Review:

image

Now at this point, we would check the OpsMgr event logs on our management servers, check for any new or strange alerts coming in, and ensure that there are no issues after the update.

 

 

Known issues:

See the existing list of known issues documented in the KB article.

1.  Many people are reporting that the SQL script is failing to complete when executed.  You should attempt to run this multiple times until it completes without error.  You might need to stop the Exchange correlation engine, stop all the SCOM services on the management servers, and/or bounce the SQL server services in order to get a successful completion in a busy management group.  The errors reported appear as below:

——————————————————
(1 row(s) affected)
(1 row(s) affected)
Msg 1205, Level 13, State 56, Line 1
Transaction (Process ID 152) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
Msg 3727, Level 16, State 0, Line 1
Could not drop constraint. See previous errors.
——————————————————–


Removing unwanted product connectors in SCOM 2012 R2 and SCOM 2016


 

image

 

There are certain management packs that will use product connectors to insert discovery data into SCOM, or to manage alerts.  Sometimes, you might find that the vendor did not provide a way to remove these product connectors, when you decide to stop using the MP or solution which created them.

Below is a PowerShell script that will allow you to remove them.  It supports inputting the product connector by name, or by a wildcard match using “*”.  The script will output all the matching connectors and allow you to choose to continue if you want them deleted.

***NEVER remove any product connector unless you are absolutely SURE of its origin and that it is one you want to delete.  It is a good idea to back up your databases before deleting connectors.

 

#===================================================
#
# Delete a SCOM product connector in SCOM 2012 R2 and SCOM 2016
#
# v1.0
#
#===================================================

#===================================================
# Constants section - make changes here
$connectorName = "foo.connector*"   #This can be a full name or a partial name match with a wildcard (*) - so take great care here
$servername = "localhost"
#===================================================

$mg = Get-SCOMManagementGroup -ComputerName $servername
$admin = $mg.GetConnectorFrameworkAdministration()
$connectors = $admin.GetMonitoringConnectors()
$subs = $admin.GetConnectorSubscriptions()
$ToBeDeletedNames = @()
$ToBeDeletedList = $admin.GetMonitoringConnectors() | where {$_.Name -like "$connectorName"}
FOREACH ($ToBeDeletedConn in $ToBeDeletedList)
{
    [array]$ToBeDeletedNames += "`n" + $ToBeDeletedConn.Displayname
}
Write-Host "About to delete the following connectors: " $ToBeDeletedNames
Write-Host "Press Y to continue, or any other key to stop"
$response = Read-Host
if ( $response -ne "Y" )
{
    Write-Host "Cancelling"
    exit
}

##########################################################################################
# Delete a connector's Subscriptions
##########################################################################################
function Delete-Subscription([String] $name)
{
    foreach ($testconnector in $connectors)
    {
        # Use -like so the wildcard pattern in $connectorName matches, consistent with Remove-Connector
        if ($testconnector.Name -like $name)
        {
            $connector = Get-SCOMConnector -id $testconnector.id
            write-host "Found match on connector:" $connector.Displayname
            foreach ($sub in $subs)
            {
                if ($sub.MonitoringConnectorId -eq $connector.id)
                {
                    write-host "Deleting subscription:" $sub.DisplayName "with ID" $sub.MonitoringConnectorId
                    $admin.DeleteConnectorSubscription($admin.GetConnectorSubscription($sub.Id))
                }
            }
        }
    }
}

##########################################################################################
# Removes a connector with the specified name.
##########################################################################################
function Remove-Connector([String] $name)
{
    $testConnector = $null
    foreach ($connector in $connectors)
    {
        IF ($connector.Name -like $name)
        {
            $testConnector = Get-SCOMConnector -id $connector.id
            write-host "Found match on connector:" $connector.Displayname
            IF ($testConnector -ne $null)
            {
                IF ($testConnector.Initialized)
                {
                    Write-Host "Found connector is initialized, disconnecting all alerts subscribed to the connector"
                    FOREACH ($alert in $testConnector.GetMonitoringAlerts())
                    {
                        $alert.ConnectorId = $null;
                        $alert.Update("Delete Connector");
                    }
                    Write-Host "Setting connector to UnInitialized"
                    $testConnector.Uninitialize()
                }
                Write-Host "Deleting connector:" $testConnector.DisplayName "with ID" $testConnector.Id
                $connectorIdForTest = $admin.Cleanup($testConnector)
            }
        }
    }
}

write-host "Starting Delete-Subscription function"
Delete-Subscription $connectorName
write-host "Starting Remove-Connector function"
Remove-Connector $connectorName

How to test your ACS filter to ensure it is valid


 

ACS (Audit Collection Services) in SCOM uses a WMI filter to reject certain events from being collected and stored in the Audit database.

This filter supports about 4800 characters, so the filters can get very large and very advanced.  It is important to test these before implementing to ensure you are getting a valid filter. 

 

ACS uses WQL queries.  https://msdn.microsoft.com/en-us/library/aa394606(v=vs.85).aspx

 

I recently had a customer trying to exclude a specific EventId from being collected, but ONLY when a specific parameter was present.

In the ACS event queue, the event parameters are mapped to specific “String” IDs, which don’t match up to the parameter header or number, so we must match on the specific StringID value.  The easiest way to get this is to collect the event, and then search for it in the ACS DB.  In this case, my customer wanted to exclude event ID 4648, but only when String06 = “C:\Windows\System32\svchost.exe”

Seems easy enough?

Here is the first ACS filter we used:   SELECT * FROM AdtsEvent WHERE NOT (EventId=4648 AND String06='C:\Windows\System32\svchost.exe')

 

However, it didn’t work.  We still collected the 4648 event, even with this match in String06. 

image

 

 

One thing to always do before implementing a change in your ACS filter – is to TEST the syntax using WBEMTEST:

 

Open WBEMTEST on the ACS Collector

Connect to root\default

Select “Notification Query” button.

Paste in your exact query you want to use.

image

Hit “Apply”

image

 

 

I reached out to Jimmy Harper on this, as he is an ACS guru.

My rookie mistake.  I forgot that a literal backslash cannot be used as-is in a WMI query – the backslash is the escape character, used to escape other special characters, so it must itself be escaped.  See:  https://msdn.microsoft.com/en-us/library/aa394054(v=vs.85).aspx

So in this case – if we want to use a backslash character, we need to escape it with another backslash.  Here is my new query:

SELECT * FROM AdtsEvent WHERE NOT (EventId=4648 AND String06='C:\\Windows\\System32\\svchost.exe')

I can test this in wbemtest and it works just fine.
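If you build these filters in a script, a small helper keeps the escaping honest – .NET’s String.Replace doubles each backslash for you:

```powershell
# Double each backslash so a Windows path is safe inside a WQL query
$path = 'C:\Windows\System32\svchost.exe'
$wql  = "SELECT * FROM AdtsEvent WHERE NOT (EventId=4648 AND String06='" + $path.Replace('\','\\') + "')"
$wql
# SELECT * FROM AdtsEvent WHERE NOT (EventId=4648 AND String06='C:\\Windows\\System32\\svchost.exe')
```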

UR2 for SCOM 2016 – Step by Step


 

image

 

KB Article for OpsMgr:  https://support.microsoft.com/en-us/help/3209591/update-rollup-2-for-system-center-2016-operations-manager

Download catalog site:  http://www.catalog.update.microsoft.com/Search.aspx?q=3209591

Recommended hotfix page:  https://blogs.technet.microsoft.com/kevinholman/2009/01/27/which-hotfixes-should-i-apply/ 

 

NOTE:  I get this question every time we release an update rollup:   ALL SCOM Update Rollups are CUMULATIVE.  This means you do not need to apply them in order, you can always just apply the latest update.  If you have deployed SCOM 2016 and never applied an update rollup – you can go straight to the latest one available. 

 
 
Key fixes:
  • When you use the Unix Process Monitoring Template wizard (adding a new template) to monitor processes on UNIX servers, the monitored data is not inserted into the database because of the following failure:

    Source: Health Service Modules
    Event ID: 10801
    Description: Discovery data couldn’t be inserted to the database. This could have happened because of one of the following reasons:
    – Discovery data is stale. The discovery data is generated by an MP recently deleted.
    – Database connectivity problems or database running out of space.
    – Discovery data received is not valid.

    The following is the underlying exception that causes this issue:

    Exception type:   Microsoft.EnterpriseManagement.Common.DataItemDoesNotExistException
    Message:          ManagedTypeId = ccf81b2f-4b92-bbaf-f53e-d42cd9591c1c
    InnerException:   <none>
    StackTrace (generated):   SP IP Function   000000000EE4EF10 00007FF8789773D5 Microsoft_EnterpriseManagement_DataAccessLayer!Microsoft.EnterpriseManagement.DataAccessLayer.TypeSpaceData.IsDerivedFrom(System.Guid, System.Guid)+0x385

  • When a management server is removed from the All Management Servers Resource Pool, the monitoring host process does not update the TypeSpaceCache.
  • When alerts are closed from the Alerts view after you run a Search, the closed Alerts still appear in the View when the Search is cleared.
  • When you press Ctrl+C to copy an alert in Operations Manager Alert view and then press Ctrl+V to paste it to Notepad, the Created time is in UTC time, not local time.
  • Groups disappear from Group view after they are added to a Distributed Application.
  • IM notifications from Operations Manager to Skype fail when an incorrect exception causes NullReferenceException in the SipNotificationTransport.Send method.
  • When the maintenance mode option for the dependency monitor is set to “Ignore,” and the group (consisting of the server to which this dependency monitor is targeted) is put in Maintenance mode, the state of the monitor changes to critical and does not ignore maintenance mode.
  • Because of a rare scenario of incorrect computation of configuration and overrides, some managed entities may go into an unmonitored state. This behavior is accompanied by 1215 events that are written to the Operations Manager log.
  • Recovery tasks on “Computer Not Reachable” Operations Manager Monitor generate failed logons on SCOM Agents that are not part of the same domain as the management groups.
  • The ManagementGroupCollectionAlertsCountRule workflow fails and generates a “Power Shell Script failed to run” alert.
  • Get-SCOMGroup cmdlet fails when thousands of groups are created in Operations Manager.
  • Organizational unit properties for computers that are running Windows are not discovered or populated. This discovery is part of the System Center Internal Library MP. After this update, organizational unit properties will be discovered for all computers that are running Windows.
  • When the Operations Manager Health Service agent starts, and the agent is configured for AD integration, if the agent cannot contact Active Directory at all, it immediately goes dormant and stops trying to connect and obtain the policy from Active Directory.

 

Issues that are fixed in the UNIX and Linux management packs
  • SHA1 is deprecated, and SHA256 certificates are now supported on the management server that’s used to sign the Unix/Linux OMI certificate.
  • OMI does not work on Linux servers configured for FIPS compliance.
  • Avg. Physical disk sec/transfer performance counters are not displayed for Hewlett Packard computers.
  • OMI displays incorrect Memory information on Solaris 10 computers.
  • Network adapter performance is not displayed for SLES 12 x64 platform in the Operations Manager Console.
  • Cannot discover file systems on HPUX 11.31 IA-64 computers with more than 128 disks. Previously it supported only 128 VGs. Now support is extended to 256 VGs.
  • Deep monitoring cannot be started successfully on some JBoss applications because the discovery of the JBoss application server sets the Disk Path for the JBoss server incorrectly. Deep monitoring was not being started in JBoss stand-alone mode when a nondefault configuration was used.
 
 
 


Let’s get started.

From reading the KB article – the order of operations is:

  1. Install the update rollup package on the following server infrastructure:
    • Management server or servers
    • Web console server role computers
    • Gateway servers
    • Operations console role computers
  2. Apply SQL scripts.
  3. Manually import the management packs.
  4. Apply agent updates.
  5. Update Nano agents.
  6. Update Unix/Linux management packs and agents.

Perfect!

1.  Management Servers

image

Since there is no RMS anymore, it doesn’t matter which management server I start with.  There is no need to begin with whoever holds the “RMSe” role.  I simply make sure I patch only one management server at a time, to allow for agent failover without overloading any single management server.

I can apply this update manually via the MSP files, or I can use Windows Update.  I have 2 management servers, so I will demonstrate both.  I will do the first management server manually.  This management server holds 3 roles, and each must be patched:  Management Server, Web Console, and Console.

The first thing I do when I download the updates from the catalog, is copy the cab files for my language to a single location, and then extract the contents.

Once I have the MSP files, I am ready to start applying the update to each server by role.

***Note:  You MUST log on to each server role as a Local Administrator, SCOM Admin, AND your account must also have the System Administrator role on the SQL database instances that host your OpsMgr databases.

My first server is a management server, and the web console, and has the OpsMgr console installed, so I copy those update files locally, and execute them per the KB, from an elevated command prompt:

image

This launches a quick UI which applies the update.  It will bounce the SCOM services as well.  The update usually does not provide any feedback that it had success or failure… but I did get:

image

You can check the application log for the MsiInstaller events to show completion:

Log Name:      Application
Source:        MsiInstaller
Event ID:      1036
Computer:      SCOM1.opsmgr.net
Description:
Windows Installer installed an update. Product Name: System Center Operations Manager 2016 Server. Product Version: 7.2.11719.0. Product Language: 1033. Manufacturer: Microsoft Corporation. Update Name: System Center 2016 Operations Manager UR2 Update Patch. Installation success or error status: 0.

     

    You can also spot check a couple of DLL files for the file version attribute. 

    image
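    A PowerShell equivalent of the spot check, assuming a default SCOM 2016 install path (adjust for your environment):

```powershell
# Dump file versions for the server binaries after patching
$serverPath = 'C:\Program Files\Microsoft System Center 2016\Operations Manager\Server'
Get-ChildItem -Path $serverPath -Filter '*.dll' |
    Select-Object Name, @{ Name = 'FileVersion'; Expression = { $_.VersionInfo.FileVersion } } |
    Sort-Object Name
```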

     

    Next up – run the Web Console update:

    image

     

    This runs much faster.   A quick file spot check:

    image

     

    Lastly – install the console update (make sure your console is closed):

    image

     

    A quick file spot check:

    image

     

    Or help/about in the console:

    image

     

     

     

     

     

    Additional Management Servers:

    image75

    Windows Update contains the UR2 patches for SCOM 2016.   For my second Management Server – I will demonstrate that:

    image

     

    Then:

    image

     

     
     

     

    Updating Gateways:

    image

    Generally I can use Windows Update or manual installation.  I will proceed with manual:

    image

     

    The update launches a UI and quickly finishes.

     

    Then I will spot check the DLL’s:

    image

     

    I can also spot-check the \AgentManagement folder, and make sure my agent update files are dropped here correctly:

    image

     

    ***NOTE:  You can delete any older UR update files from the \AgentManagement directories.  The UR’s do not clean these up, and the older files serve no purpose once superseded.
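    If you want to script that cleanup, something like the following works.  The path is an assumption about where your install dropped the files, and the review-before-delete step is deliberate:

```powershell
# Review older UR patch files staged under \AgentManagement before removing them
# The path below is an example - adjust for your gateway or management server install
$agentMgmt = 'C:\Program Files\Microsoft System Center 2016\Operations Manager\Gateway\AgentManagement'
Get-ChildItem -Path $agentMgmt -Recurse -Filter '*.msp' |
    Sort-Object LastWriteTime |
    Select-Object FullName, LastWriteTime
# After reviewing the list, pipe the stale entries to Remove-Item
```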

     

    I can also apply the GW update via Windows Update:

    image

     

     

     

     

     
     
     
     
    2. Apply the SQL Scripts

    image99

     

    In the path on your management servers, where you installed/extracted the update, there is ONE SQL script file: 

    %SystemDrive%\Program Files\Microsoft System Center 2016\Operations Manager\Server\SQL Script for Update Rollups

    (note – your path may vary slightly depending on if you have an upgraded environment or clean install)

     

    Next – let’s run the script to update the OperationsManager (Operations) database.  Open a SQL management studio query window, connect it to your Operations Manager database, and then open the script file (update_rollup_mom_db.sql).  Make sure it is pointing to your OperationsManager database, then execute the script.

    You should run this script with each UR, even if you ran it for a previous UR.  The script body can change between UR’s, so as a best practice, always re-run it.

    image

     

    Click the “Execute” button in SQL Management Studio.  The execution could take a considerable amount of time, and you might see a spike in processor utilization on your SQL database server during this operation.  

    I have had customers state this takes from a few minutes to as long as an hour. In MOST cases – you will need to shut down the SDK, Config, and Monitoring Agent (healthservice) on ALL your management servers in order for this to be able to run with success.

    You will see the following (or similar) output: 

    image10

     

    IF YOU GET AN ERROR – STOP!  Do not continue.  Try re-running the script several times until it completes without errors.  In a production environment with lots of activity, you will almost certainly have to shut down the services (sdk, config, and healthservice) on your management servers, to break their connection to the databases, to get a successful run.

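    If you prefer to script the whole stop/run/start sequence, here is a sketch.  The server names, SQL instance, and script path are examples, and Invoke-Sqlcmd assumes the SqlServer (or SQLPS) module is installed:

```powershell
# Stop the SCOM services on each management server, run the UR script, start them again
# Server names, the SQL instance, and the script path are examples - adjust for your environment
$managementServers = 'SCOM1.opsmgr.net', 'SCOM2.opsmgr.net'
$services = 'OMSDK', 'cshost', 'HealthService'

Invoke-Command -ComputerName $managementServers { Stop-Service -Name $using:services -Force }

Invoke-Sqlcmd -ServerInstance 'SQL1' -Database 'OperationsManager' `
    -InputFile 'C:\Program Files\Microsoft System Center 2016\Operations Manager\Server\SQL Script for Update Rollups\update_rollup_mom_db.sql' `
    -QueryTimeout 3600

Invoke-Command -ComputerName $managementServers { Start-Service -Name $using:services }
```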

     
     
     
     
    3. Manually import the management packs

    image18

     

    There are 30 management packs in this update!   Most of these we don’t need – so read carefully.

     

    The path for these is on your management server, after you have installed the “Server” update:

    \Program Files\Microsoft System Center 2016\Operations Manager\Server\Management Packs for Update Rollups

    However, the majority of them are Advisor/OMS, and language specific.  Only import the ones you need, and that are correct for your language.  

    This is the initial import list: 

    image

    image

     

    What NOT to import:

    The Advisor MP’s are only needed if you are using the Microsoft Operations Management Suite cloud service (previously known as Advisor, and Operations Insights).

    DON’T import ALL the languages – ONLY ENU, or any other languages you might require.

    The Alert Attachment MP update is only needed if you are already using that MP for very specific other MP’s that depend on it (rare).

    The IntelliTrace Profiling MP requires IIS MP’s and is only used if you want this feature in conjunction with APM.

     

    So I remove what I don’t want or need – and I have this:

    image

     

    These import without issue.

     

     

    4.  Update Agents

    image24

    Agents should be placed into pending actions by this update for any agent that was not manually installed (remotely manageable = yes):  

    image

     

    If your agents are not placed into pending management – this is generally caused by not running the update from an elevated command prompt, or by manually installed agents, which are not placed into pending by design.
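    You can also approve the pending updates from the Operations Manager Shell.  A sketch – the action-type filter value is an assumption, so inspect the AgentPendingActionType property in your environment before approving in bulk:

```powershell
# Approve agents that are pending an update (run in the Operations Manager Shell)
# The '*Patch*' match is an assumption - inspect AgentPendingActionType first
Get-SCOMPendingManagement |
    Where-Object { $_.AgentPendingActionType -like '*Patch*' } |
    Approve-SCOMPendingManagement
```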

    You can approve these – which will result in a success message once complete:

    image30

     
     
     
    5.  Update Unix/Linux MPs and Agents

    image36

     

    The “UR2” Linux MP’s and agents have been updated to align with UR2 for SCOM 2016.  You can get them here:

    https://www.microsoft.com/en-us/download/details.aspx?id=29696

    The current version of these MP’s for SCOM 2016 UR2 is 7.6.1072.0 – and includes agents with version 1.6.2-338

     

    Make sure you download the correct version for your SCOM deployment:

    image

     

    Download, extract, and import ONLY the Linux/UNIX MP’s that are relevant to the OS versions that you want to monitor:

    image

     

    This will take a considerable amount of time to import, and consume a lot of CPU on the management servers and SQL server until complete.

    Once it has completed, you will need to restart the Healthservice (Microsoft Monitoring Agent) on each management server, in order to get them to update their agent files at \Program Files\Microsoft System Center 2016\Operations Manager\Server\AgentManagement\UnixAgents
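    A quick way to bounce that service across all management servers (server names are examples):

```powershell
# Restart the Microsoft Monitoring Agent service on each management server
# so the updated UNIX/Linux agent files get re-staged
$managementServers = 'SCOM1.opsmgr.net', 'SCOM2.opsmgr.net'
Invoke-Command -ComputerName $managementServers { Restart-Service -Name HealthService }
```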

    You should see the new files dropped with new timestamps:

    image

     

     

    Now you can deploy the agent updates:

    image

     

    image

     

    Next – decide whether you want to input credentials for the SSH connection and upgrade, or whether you have existing RunAs accounts set up to do the job (Agent Maintenance/SSH Account):

    image

     

    image

     

    image

     

    If you have any issues, make sure your SUDOERS file has the correct information pertaining to agent upgrade:

    https://blogs.technet.microsoft.com/kevinholman/2016/11/11/monitoring-unixlinux-with-opsmgr-2016/

     

     

    6.  Update the remaining deployed consoles

    image

     

    This is an important step.  I have consoles deployed around my infrastructure – on my Orchestrator server, SCVMM server, on my personal workstation, on all the other SCOM admins on my team, on a Terminal Server we use as a tools machine, etc.  These should all get the matching update version.

     

    Review:

    image60

    Now at this point, we would check the OpsMgr event logs on our management servers, check for any new or strange alerts coming in, and ensure that there are no issues after the update.

     

    Known Issues:

     

    1.  The OU property on Windows Computers only works if the computer object is in an actual OU and not the built in “Computers” container.  The problem is in a script in “Microsoft.SystemCenter.WindowsComputerPropertyDiscovery” datasource.  This works fine if the domain computer accounts exist in a custom OU.

    2.  The “PatchList” property on the HealthService doesn’t work.  This was re-written in PowerShell for SCOM 2016, and it looks like we still have a bug in the PatchList script.  Until this is resolved, I’d recommend taking a look at https://blogs.technet.microsoft.com/kevinholman/2017/02/26/scom-agent-version-addendum-management-pack/ because I offer a better solution than patchlist, actually changing the discovered HealthService version property, which shows up in the Agent Managed screen.

    3.  When running the update for ACS – it states the update is running for Operations Manager 2012.  The UI code was not updated as ACS is largely unchanged from SCOM 2012. Not a concern.

    4.  If you use Windows Update to deploy the UR2 packages, you might see Windows Update fall into a loop where it re-applies these patches over and over.  I have seen this on a couple systems now and this is under investigation. 

    SCOM Agent Version Addendum Management Pack


     

    image

     

    One of the pain points in SCOM is keeping your agents up to date with your current UR level.

    What makes this worse, is the view in the SCOM Console for “Agent Managed” where you could actually fix agents with a “repair” does NOT show you the UR level of the agent.

    That has changed with this MP.

    This management pack sample will disable the built in discovery for “Microsoft.SystemCenter.DiscoverHealthServiceProperties” which has a display name of “Discover Health Service Properties”

    It will replace it with a new discovery that attempts to get the Agent Version from a file, that gets updated with every agent update.

     

    ***Note:  This discovery requires PowerShell on the agents in order to work.  If you still have old Windows Server 2003 or 2008 agents and have not installed PowerShell, it will simply not return updated version data for those.  This is not a concern for SCOM 2016, since all SCOM 2016 agents require PowerShell as a minimum requirement.  However, for SCOM 2012 R2 it is possible to have agents without PowerShell.  Those will simply retain their previously discovered version.

    In SCOM 2016, it will change this:

     

    image

     

    to this:

    image

     

    Now, you can sort by this column, and if you find agents that are not up to date, simply multi-select the agents you want to fix, and run a “Repair” against them:

     

    image

     

    If you multi-select, you just need to ensure all agents you are selecting report to the same Management Server, and have “Remotely Manageable” set to Yes.  If you want to change manually installed agents back to Remotely Manageable – see:  https://blogs.technet.microsoft.com/kevinholman/2010/02/20/how-to-get-your-agents-back-to-remotely-manageable-in-opsmgr-2007-r2/

     

    This works in SCOM 2012 as well, with the caveat posted above about PowerShell:

     

    image

     

     

    You can download this addendum MP here:

    https://gallery.technet.microsoft.com/SCOM-Agent-Version-b0bbdfb3

    Recommended registry tweaks for SCOM 2016 management servers


     

    image

    I will start with what people want most – the “list”:

     

    These are the most common changes and settings I recommend to adjust on SCOM management servers. 

    Simply run these from an elevated command prompt on all your management servers.

     

    reg add "HKLM\SYSTEM\CurrentControlSet\services\HealthService\Parameters" /v "State Queue Items" /t REG_DWORD /d 20480 /f
    reg add "HKLM\SYSTEM\CurrentControlSet\services\HealthService\Parameters" /v "Persistence Checkpoint Depth Maximum" /t REG_DWORD /d 104857600 /f
    reg add "HKLM\SOFTWARE\Microsoft\System Center\2010\Common\DAL" /v "DALInitiateClearPool" /t REG_DWORD /d 1 /f
    reg add "HKLM\SOFTWARE\Microsoft\System Center\2010\Common\DAL" /v "DALInitiateClearPoolSeconds" /t REG_DWORD /d 60 /f
    reg add "HKLM\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0" /v "GroupCalcPollingIntervalMilliseconds" /t REG_DWORD /d 900000 /f
    reg add "HKLM\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\Data Warehouse" /v "Command Timeout Seconds" /t REG_DWORD /d 1800 /f
    reg add "HKLM\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\Data Warehouse" /v "Deployment Command Timeout Seconds" /t REG_DWORD /d 86400 /f

     

    I will explain each setting in detail below:

     

    1.  HKLM\SYSTEM\CurrentControlSet\services\HealthService\Parameters\
    REG_DWORD Decimal Value:        State Queue Items = 20480

    SCOM 2016 default existing registry value:   (not present) 

    SCOM 2016 default value in code:   10240

    Description:  This sets the maximum size of the HealthService’s internal state queue.  It should be equal to or larger than the number of monitor based workflows running in a healthservice.  Too small a value, or too many workflows, will cause state change loss.  http://blogs.msdn.com/b/rslaten/archive/2008/08/27/event-5206.aspx

     

    2.  HKLM\SYSTEM\CurrentControlSet\services\HealthService\Parameters\
    REG_DWORD Decimal Value:  Persistence Checkpoint Depth Maximum = 104857600

    SCOM 2016 default existing registry value = 20971520

    Description:  Management Servers that host a large amount of agentless objects, which results in the MS running a large number of workflows: (network/URL/Linux/3rd party/VEEAM)  This is an ESE DB setting which controls how often ESE writes to disk.  A larger value will decrease disk IO caused by the SCOM healthservice but increase ESE recovery time in the case of a healthservice crash.

     

    3.  HKLM\SOFTWARE\Microsoft\System Center\2010\Common\DAL\
    REG_DWORD Decimal Value:
      DALInitiateClearPool = 1
      DALInitiateClearPoolSeconds = 60

    SCOM 2016 existing registry value:   not present

    Description:  This is a critical setting on ALL management servers in ANY management group.  This setting configures the SDK service to attempt a reconnection to SQL server upon disconnection, on a regular basis.  Without these settings, an extended SQL outage can cause a management server to never reconnect back to SQL when SQL comes back online after an outage.   Per:  http://support.microsoft.com/kb/2913046/en-us  All management servers in a management group should get the registry change.

     

    4.  HKLM\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\
    REG_DWORD Decimal Value:       GroupCalcPollingIntervalMilliseconds = 900000

    SCOM 2016 existing registry value:  (not present)

    SCOM 2016 default code value:  30000 (30 seconds)

    Description:  This setting will slow down how often group calculation runs to find changes in group memberships.  Group calculation can be very expensive, especially with a large number of groups, large agent count, or complex group membership expressions.  Slowing this down will help keep groupcalc from consuming all the healthservice and database I/O.  900000 is every 15 minutes.

     

    5.  HKLM\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\Data Warehouse\
    REG_DWORD Decimal Value:    Command Timeout Seconds = 1800

    SCOM 2016 existing registry value:  (not present)

    SCOM 2016 default code value:  600

    Description:  This helps with dataset maintenance as the default timeout of 10 minutes is often too short.  Setting this to a longer value helps reduce the 31552 events you might see with standard database maintenance.  This is a very common issue.   http://blogs.technet.com/b/kevinholman/archive/2010/08/30/the-31552-event-or-why-is-my-data-warehouse-server-consuming-so-much-cpu.aspx  This should be adjusted to however long it takes aggregations or other maintenance to run in your environment.  We need this to complete in less than one hour, so if it takes more than 30 minutes to complete, you really need to investigate why it is so slow, either from too much data or SQL performance issues.

     

    6.  HKLM\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\Data Warehouse\
    REG_DWORD Decimal Value:    Deployment Command Timeout Seconds = 86400

    SCOM 2016 existing registry value:  (not present)

    SCOM 2016 default code value:  10800 (3 hours)

    Description:  This helps with deployment of heavy-handed scripts that are applied during version upgrades and cumulative updates.  Customers often see blocking on the DW database while indexes are created, which prevents the script from deploying within the default of 3 hours.  Setting this value to allow one full day for the script to deploy resolves most customer issues.  It also helps reduce the 31552 events you might see with standard database maintenance after a version upgrade or UR deployment.  This is a very common issue in large environments with very large warehouse databases.

     

     

    Ok, that covers the “standard” stuff.

     

    I will cover one other registry modification that is RARELY needed.  You should ONLY change this one if directed to by Microsoft support.

    WARNING:

    If you make changes to this setting, the same change must be made on ALL management servers, otherwise the resource pools will constantly fail.  All management servers must have identical settings here.  If you add a management server in the future, this setting must be applied immediately if you modified it on other management servers, or you will see your resource pools constantly committing suicide and failing over to other management servers, reinitializing all workflows in a loop.   All the other settings in this article are generally beneficial.  This specific one for PoolManager should receive great scrutiny before changing, due to the risks.  It is NOT included in my reg-add list above for good reason.

     

    HKLM\SYSTEM\CurrentControlSet\services\HealthService\Parameters\PoolManager\
    REG_DWORD Decimal Value:
    PoolLeaseRequestPeriodSeconds = 600
        PoolNetworkLatencySeconds = 120

    SCOM 2016 existing registry value:  not present (must create PoolManager key and both values)  Default code value =  120/30 seconds

    This is VERY RARE to change, and in general I only recommend changing this under advisement from a support case.  The resource pools work quite well on their own, and I have worked with very large environments that did not need these to be modified.  This is more common when you are dealing with a rare condition, such as management group spread across datacenters with high latency links, DR sites, MASSIVE number of workflows running on management servers, etc.

    Management Pack authoring the REALLY fast and easy way, using Silect MP Author and Fragments


     

     

    image             image

    Silect MP Author Professional just added support for Visual studio fragments.  If you didn’t get to attend the webinar on this – here is the recording.

    MP Authoring just got really easy, and really FAST.  Check out the video and see how using MP fragments can take your SCOM environment to a whole new level.

     

     

    Link to recording:  https://youtu.be/E5nnuvPikFw

    Silect MP Author Pro:  http://www.silect.com/mp-author-professional/

    Kevin Holman’s Fragment Library:  https://gallery.technet.microsoft.com/SCOM-Management-Pack-VSAE-2c506737

    Using fragments in Visual Studio (Previous session) recording:  https://youtu.be/9CpUrT983Gc

    Enable proxy as a default setting in SCOM 2016


     

    system_center_operations_manager_replacement_icon_by_flakshack-d5mxgid

     

    The default setting for new SCOM agents is that Agent Proxy is disabled.  You can enable this agent by agent, or for specific agents with script automations.  I find this to be a clumsy task, and more and more management packs require this capability to be enabled, like Active Directory, SharePoint, Exchange, Clustering, Skype, etc.  At some point, it makes a lot more sense to just enable this as a default setting, and that is what I advise my customers.

    Set it, and forget it.  One of the FIRST things I do after installing SCOM.

    (This also works just fine and exactly the same way in SCOM 2012, 2012 SP1, and 2012R2.)

     

    On a SCOM management server:  Open up any PowerShell session (SCOM shell or regular old PowerShell)

    add-pssnapin "Microsoft.EnterpriseManagement.OperationsManager.Client"
    new-managementGroupConnection -ConnectionString:localhost
    set-location "OperationsManagerMonitoring::"
    Set-DefaultSetting -Name HealthService\ProxyingEnabled -Value True

    If you want to use this remotely – change “localhost” above to the FQDN of your SCOM server.

     

    In order to inspect this setting, you can run:

    add-pssnapin "Microsoft.EnterpriseManagement.OperationsManager.Client"
    new-managementGroupConnection -ConnectionString:localhost
    set-location "OperationsManagerMonitoring::"
    Get-DefaultSetting

    Agent Management Pack – Making a SCOM Admin’s life a little easier


     

    This is a little example MP for some things that are possible with SCOM.  It also serves as a good example MP on how to write classes, discoveries, and most importantly many task examples for command line, VBscript, and PowerShell.

    I didn’t write all these – a bunch of ideas came from Jimmy Harper, Matt Taylor, and Tim McFadden and their MP’s.  This was more to combine lots of useful administration in one place.

     

    First – useful discovered properties:

     

    image

     

    image

     

    The “real” agent version

    The UR level of the agent

    Any Management Groups that the agent belongs to.  This is nice to see for old management groups that get left behind.

    A check if PowerShell is installed and what version.  This is important because PowerShell 2.0 is required on all agents if you want to move to SCOM 2016.

    CLR .NET runtime version available to PowerShell

    OS Version and Name

    Primary and Failover management servers.  I am getting this straight from the agent’s config XML file; sometimes agents might not be configured as you think, so this is the authoritative source: what’s in that specific agent’s config.

    Lastly, the default Agent Action account.  Helpful to find any agents where someone installed incorrectly.

     

    Next up – the tasks:

     

    image

     

    One of the problems with tasks, is that they are scoped to a specific class.  Some cool tasks are attached to Windows Computer, some to HealthService, some to specific app classes.  Or – people write tasks and scope to System.Entity.  This places the task in ALL views.  That’s handy, but if everyone did that we’d have an unusable console for tasks.

    Computer Management – duh.

     

    Create Test Event – this task creates event 100 with source TEST in the app event log, and there is a rule in the MP to generate an info alert.  This will let you test end to end agent function, and notifications.

    image

     

    Execute any PowerShell – this task accepts one parameter – “ScriptBody” which allows you to pass any powershell statements and they will execute locally on the agent and return output:

    image

    image

     

    Execute any Service Restart – this will take a servicename as a parameter and restart the service on any agent on demand.  You should NOT use this for the Healthservice – there is a special task for that:

    image

     

    Execute any Software from Share – this task will accept an executable, or a command line including an executable, plus a share path which contains the software, and it will run it locally on the agent.  This is useful to install missing UR updates, or any other software you want deployed.  This will require that “Domain Computers” have read access to the files on the share.

    image

     

    Export Event Log – this task will export any local event log and save the export to a share.  It will require that the “Domain Computers” have write access to the share.

    image

     

    HealthService – Flush – This task will stop the agent service, delete the health service store, cache, and config, and start the service back up, provoking a complete refresh of the agents config, management packs, and ESE database store.

     

    HealthService – Restart – This is a special task which will reliably bounce the HealthService on agents using an “out of band” script process.  Many scripts to bounce the agent service fail because when the service stops, the script to start it back up is destroyed from memory.

     

    Management Group – ADD and Management Group – REMOVE – these are script based tasks to add or remove a management group from an agent

    Ping – (Console Task) – Duh

    Remote Desktop – (Console Task) – Duh

     

    Do you have other useful agent management tasks that you think should be in a pack like this?  Or discovered properties that are useful as well?  I welcome your feedback.

     

     

    Additionally – I have created two versions of this MP.  One with everything above, and one without the “risky” tasks, like exposing the ability to execute any PowerShell, restart any service, and install any software from a share.  If those are things you don’t ever want exposed in your SCOM environment – import the other MP.  You can control who sees which tasks, but by default operators will see tasks.

     

     

    Download the MP here:  https://gallery.technet.microsoft.com/SCOM-Agent-Management-b96680d5

    How does CPU monitoring work in the Windows Server 2016 management pack?


     

    image

     

    First – let me warn you.  The way SCOM monitors Processor time is *incredibly* complicated.  If you don’t like it – there is *NOTHING* wrong with nuking this from orbit (disable via override) and just creating your own very simple consecutive-samples monitor.  That said, while complicated and difficult to understand, it is very powerful and useful, and limits “noise”.

     

    Ok, all warnings aside – lets figure out how this works.

     

    In the Windows Server 2016 OS Management Pack, there is a built in monitor which evaluates the Processor load.  This monitor (Total CPU Utilization Percentage or Microsoft.Windows.Server.10.0.OperatingSystem.TotalCPUUtilization) targets the “Windows Server 2016 Operating System” class.

    It runs every 15 minutes, and evaluates after 3 samples.  The samples are not consecutive samples as the product knowledge states – they are AVERAGE samples. 

    Like previous versions of the CPU monitor, this is often misunderstood.  This monitor does not use a native perfmon module, it runs a PowerShell script.  The script evaluates TWO DIFFERENT perfmon counters:

    Processor Information / % Processor Time / _Total  (default threshold 95)

    System / Processor Queue Length (default threshold 15)

     

    BOTH of these thresholds must be met before we will create a monitor state change/alert.  This means that even if your server is stuck at 100% CPU utilization, it will not generate an alert most of the time. 

    The default threshold of “15” is multiplied by the number of logical CPU’s for the server.  So on a typical VM with 4 virtual CPU’s, this means that the value of SYSTEM\Processor Queue Length must be greater than (15*4) = 60.  Not only that, but the value must be above 60 for the average of any three consecutive samples.  This is incredibly high.
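    A rough sketch of the combined evaluation, to make the logic concrete (this is not the actual product script, just an illustration of the two conditions):

```powershell
# Both conditions must be true across the averaged samples before the monitor trips
$cpuThreshold   = 95
$queueThreshold = 15
$logicalCpus    = (Get-CimInstance Win32_ComputerSystem).NumberOfLogicalProcessors

$cpu   = (Get-Counter '\Processor Information(_Total)\% Processor Time').CounterSamples[0].CookedValue
$queue = (Get-Counter '\System\Processor Queue Length').CounterSamples[0].CookedValue

$unhealthy = ($cpu -ge $cpuThreshold) -and ($queue -gt ($queueThreshold * $logicalCpus))
```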

    What this means, is that it is VERY unlikely this monitor will ever trigger, unless your system is absolutely HAMMERED.  If you like this, great!  If you don’t like this, then you have two options. 

    1)  Re-write your own monitor and make it a very simple consecutive or average samples threshold performance monitor.

    2)  Override the default monitor – but set the “CPU Queue Length” threshold to “zero” as in the picture below:

    image

    This will result in the equation ignoring the CPU queue length requirement, and make the monitor consider “% Processor Time” only.  If you find this is too noisy, you can use the CPU queue length, but use lower value than the default of 15.  Another thing to keep in mind, this is a PowerShell script based monitor, so if you want to run this VERY frequently (the default is every 15 minutes) then consider replacing it with a less impactful native perfmon based monitor.

    The default monitor has a recovery on it – that will output the top consuming processes to health explorer state change context:

    image

    Note – the numbers are not exactly correct – my “ProcessorHog” process was consuming 100% of the CPU…. but this server has 32 cores, so it looks like you need to multiply by the number of cores to understand the ACTUAL utilization consumed by a process.  This is a quirk in how Windows reports per-process CPU, not a SCOM issue.

     

     

     

    Ok – so that covers the basic monitoring of the CPU, from an _Total perspective.

     

    What about monitoring individual *logical processors* like virtual CPU’s or actual cores on physical servers?  Can we do that?

    Yes, yes we can. 

    First – let me start by saying – I DON’T recommend you do this.  In fact, I recommend AGAINST this.  This type of monitoring is INCREDIBLY detailed, and creates a huge instance space in SCOM that will only serve to slow down your environment, console, and increase config and monitoring load.  It should only be leveraged where you have a very specific need to monitor individual logical processing cores for very specific reasons, which should be rare.

    There is a VERY specific scenario where this type of monitoring might be useful…. that is when an individual single threaded process “runs away” on CPU 0, core 0.  This has been seen on Skype servers and will impact server performance.  So if you MUST monitor for this condition, you can consider discovering these individual CPU’s.  I still don’t recommend it and certainly not across the board.

     

    Ok, all warnings aside – lets figure out how this works.

    There is an optional discovery (disabled by default) in the Windows Server 2016 Operating System (Discovery) management pack, to discover individual CPU’s:  “Discover Windows CPUs” (Microsoft.Windows.Server.10.0.CPU.Discovery)  This discovery runs once a day, and calls the Microsoft.Windows.Server.10.0.CPUDiscovery.ModuleType datasource.  This datasource runs a PowerShell script that discovers two object types:

    1.  Microsoft.Windows.Server.10.0.Processor (Windows Server 2016 Processor)

    2.  Microsoft.Windows.Server.10.0.LogicalProcessor (Windows Server 2016 Logical Processor)

    If you enable this discovery – you will discover both types:

     

    Let’s start with “Windows Server 2016 Processor”.  This class discovers actual physical or virtual Processors in sockets, as they are exposed to the OS by physical hardware or the virtualization layer.  See example below:

    Physical server:

    image

    VM guest:

    image

     

    By contrast – the “Windows Server 2016 Logical Processor” class shows instances of physical or virtual “Logical Processors” which will be virtual processors on a VM, and logical CPU’s exposed to the physical layer – either actual cores or hyper-threaded cores:

    image

     

    The former is how all our previous monitoring worked for individual CPU monitoring, which is pretty much worthless.  If we need to monitor cores, we generally don’t care about “sockets”.

    The latter is new for Windows Server 2016 management pack, which actually discovers individual logical CPU’s as seen by the OS.

     

    Now – lets look at the monitoring provided out of the box.

    If you enable the individual CPU discovery, there are three monitors targeting the “Windows Server 2016 Processor” class, one of which is enabled out of the box: “CPU Percentage Utilization”.  It runs every three minutes, evaluates 5 samples, with a threshold of “10”.  It is also a PowerShell script based monitor.

    Comments on above:

    1.  Monitoring for individual “socket” utilization seems really silly to me, and not useful at all.  You probably should not use this.

2.  The default threshold of “10” is WAY too low…. I have no idea why we would use that.

3.  The counter uses the “Processor” perfmon object instead of the newer “Processor Information” object.  The reason this isn’t a simple change is that the “Performance Monitor Instance Name” class property doesn’t match the newer counter’s instance values.
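You can see the instance-name mismatch for yourself by listing both perfmon objects side by side (a quick sketch to run on any Windows server):

```powershell
# Legacy "Processor" instances look like: 0, 1, _Total
(Get-Counter -ListSet 'Processor').PathsWithInstances | Select-Object -First 5

# Newer "Processor Information" instances look like: 0,0  0,1  0,_Total  _Total
(Get-Counter -ListSet 'Processor Information').PathsWithInstances | Select-Object -First 5
```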

    Additionally, there are three rules to collect perfmon data – one of which is enabled.  You should disable this collection rule as well, IF you just HAVE to discover individual CPU’s.

     

Ok, now let’s move on to the Windows Server 2016 Logical Processor class.

This is more useful, as it will monitor individual cores (or virtual CPU’s) to look for runaway single-threaded processes.

    There are three monitors out of the box targeting this class and NONE of these are enabled by default.

    The one for CPU util, Microsoft.Windows.Server.10.0.LogicalProcessor.CPUUtilization is a native perfmon monitor for consecutive samples.  I like this WAY better than complicated and heavy handed script based monitors.

HOWEVER – this will potentially be VERY noisy, as a server will have multiple CPU’s, and each discovered instance can alarm whenever its threshold condition is met.  This means duplication of alerts when a server is heavily utilized.  That said – if only a SINGLE logical processor is spiked while overall CPU utilization is low, this monitor will let you know that is happening.

     

     

     

    Bottom line:

1.  CPU monitoring at the OS level is complex, script-based, and evaluates multiple perf counters before it triggers.  Be aware, and be proactive in managing this.

    2.  The individual CPU’s can be discovered, but I DON’T recommend it as a general rule.

3.  The default rules and monitors enabled for individual CPU monitoring focus on SOCKETS, aren’t very useful, and should be disabled.

    4.  The new Logical Processor class for the Server 2016 MP is more useful as it monitors cores/logical CPU’s, but all monitoring is disabled by default.

    UR13 for SCOM 2012 R2 – Step by Step


    image

    KB Article for OpsMgr:  https://support.microsoft.com/en-us/help/4016125

    Download catalog site:  http://www.catalog.update.microsoft.com/Search.aspx?q=4016125

    Updated UNIX/Linux Management Packs:  https://www.microsoft.com/en-us/download/details.aspx?id=29696

     

     

    NOTE:  I get this question every time we release an update rollup:   ALL SCOM Update Rollups are CUMULATIVE.  This means you do not need to apply them in order, you can always just apply the latest update.  If you have deployed SCOM 2012R2 and never applied an update rollup – you can go straight to the latest one available.  If you applied an older one (such as UR3) you can always go straight to the latest one!

     

    Key Fixes:

    • After you install Update Rollup 11 for System Center 2012 R2 Operations Manager, you cannot access the views and dashboards that are created on the My Workspace tab.
    • When the heartbeat failure monitor is triggered, a “Computer Not Reachable” message is displayed even when the computer is not down.
    • The Get-SCOMOverrideResult PowerShell cmdlet does not return the correct list of effective overrides.
    • When there are thousands of groups in a System Center Operations Manager environment, the cmdlet Get-SCOMGroup -DisplayName “group_name” fails, and you receive the following message:
      • The query processor ran out of internal resources and could not produce a query plan. This is a rare event and only expected for extremely complex queries or queries that reference a very large number of tables or partitions. Please simplify the query. If you believe you have received this message in error, contact Customer Support Services for more information.
    • When you run System Center 2012 R2 Operations Manager in an all-French locale (FRA) environment, the Date column in the Custom Event report appears blank.
    • The Enable deep monitoring using HTTP task in the System Center Operations Manager console does not enable WebSphere deep monitoring on Linux systems.
    • When overriding multiple properties on rules that are created by the Azure Management Pack, duplicate override names are created. This issue causes overrides to be lost.
    • When creating a management pack (MP) on a client that contains a Service Level (SLA) dashboard and Service Level Objectives (SLO), the localized names of objects are not displayed properly if the client’s CurrentCulture settings do not match the CurrentUICulture settings. In cases where the localized settings are English (United Kingdom), ENG, or English (Australia), ENA, there is an issue when the objects are renamed.
    • This update adds support for OpenSSL 1.0.x on AIX computers. With this change, System Center Operations Manager uses OpenSSL 1.0.x as the default minimum version supported on AIX, and OpenSSL 0.9.x is no longer supported.

     

     
     
     
    Let’s get started.

     

    From reading the KB article – the order of operations is:

    1. Install the update rollup package on the following server infrastructure:
      • Management servers
      • Audit Collection servers 
      • Gateway servers
      • Web console server role computers
      • Operations console role computers
      • Reporting
    2. Apply SQL scripts.
    3. Manually import the management packs.
    4. Update Agents
    5. Unix/Linux management packs and agent updates.

     

     

    1.  Management Servers

    image

    It doesn’t matter which management server I start with.  There is no need to begin with the server holding the “RMSe” role.  I simply make sure I only patch one management server at a time to allow for agent failover without overloading any single management server.

    I can apply this update manually via the MSP files, or I can use Windows Update.  I have 2 management servers, so I will demonstrate both.  I will do the first management server manually.  This management server holds 3 roles, and each must be patched:  Management Server, Web Console, and Console.

    The first thing I do when I download the updates from the catalog, is copy the cab files for my language to a single location:

    image

    Then extract the contents:

    image

     

    Once I have the MSP files, I am ready to start applying the update to each server by role.

    ***Note:  You MUST log on to each server role as a Local Administrator, SCOM Admin, AND your account must also have System Administrator role to the SQL database instances that host your OpsMgr databases.

     

    My first server is a Management Server Role, and the Web Console Role, and has the OpsMgr Console installed, so I copy those update files locally, and execute them per the KB, from an elevated command prompt:

    image
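In text form, applying an MSP from an elevated command prompt looks roughly like this — the file name below is illustrative; use the actual MSP file names you extracted from the catalog download:

```powershell
# File name is illustrative - use the MSP extracted from the catalog download
msiexec.exe /update "KB4016125-AMD64-Server.msp" /norestart
```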

     

    This launches a quick UI which applies the update.  It will bounce the SCOM services as well.  The update usually does not provide any feedback that it had success or failure. 

    You *MAY* be prompted for a reboot.  You can click “No” and do a single reboot after fully patching all roles on this server.

     

    You can check the application log for the MsiInstaller events to show completion:

    Log Name:      Application
    Source:        MsiInstaller
    Date:          5/25/2017 9:01:13 AM
    Event ID:      1036
    Description:
    Windows Installer installed an update. Product Name: System Center Operations Manager 2012 Server. Product Version: 7.1.10226.0. Product Language: 1033. Manufacturer: Microsoft Corporation. Update Name: System Center 2012 R2 Operations Manager UR13 Update Patch. Installation success or error status: 0.

     

    You can also spot check a couple DLL files for the file version attribute. 

    image
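A quick way to do the same spot check with PowerShell — the path below is an example; adjust it to your installation directory:

```powershell
# Example path - adjust for your environment
Get-ChildItem 'C:\Program Files\Microsoft System Center 2012 R2\Operations Manager\Server\*.dll' |
    Select-Object Name, @{ n = 'FileVersion'; e = { $_.VersionInfo.FileVersion } } |
    Sort-Object FileVersion -Descending |
    Select-Object -First 10
```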

     

    Next up – run the Web Console update:

    image

     

    This runs much faster.   A quick file spot check:

    image

     

    Lastly – install the console update (make sure your console is closed):

    image

     

    A quick file spot check:

    image


     
     
    Additional Management Servers:

    image

    I now move on to my additional management servers, applying the server update, then the console update and web console update where applicable.

    On this next management server, I will use the example of Windows Update as opposed to manually installing the MSP files.  I check online, and make sure that I have configured Windows Update to give me updates for additional products: 

    image

    The applicable updates show up under optional – so I tick the boxes and apply these updates.

    image

     

    After a reboot – go back and verify the update was a success by spot checking some file versions like we did above.


     
     
     
     
    Updating ACS (Audit Collection Services)

    image

    You would only need to update ACS if you had installed this optional role.

    On any Audit Collection Collector servers, you should run the update included:

    image

    image

    A spot check of the files:

    image


     
     
     
    Updating Gateways:

    image

     

    I can use Windows Update or manual installation.

    image

    The update launches a UI and quickly finishes.

    You MAY be prompted for a reboot.

     

    Then I will spot check the DLL’s:

    image

     

    I can also spot-check the \AgentManagement folder, and make sure my agent update files are dropped here correctly:

    image

    ***NOTE:  You can delete any older UR update files from the \AgentManagement directories.  The UR’s do not clean these up and they provide no purpose for being present any longer.

     

    I can also apply the GW update via Windows Update:

     

     

     


    Reporting Server Role Update

    image

    I kick off the MSP from an elevated command prompt:

    image

     

    This runs VERY fast and does not provide any feedback on success or failure.

    image


     
    NOTE:  There is an RDL file update available to fix a bug in business hours based reporting.  See the KB article for more details.  You can update this RDL optionally if you use that type of reporting and you feel you are impacted.
     
     
     
    2. Apply the SQL Scripts

    In the path on your management servers, where you installed/extracted the update, there are two SQL script files: 

    %SystemDrive%\Program Files\Microsoft System Center 2012 R2\Operations Manager\Server\SQL Script for Update Rollups

    (note – your path may vary slightly depending on if you have an upgraded environment or clean install)

    image

    First – let’s run the script to update the OperationsManagerDW (Data Warehouse) database.  Open a SQL management studio query window, connect it to your Operations Manager DataWarehouse database, and then open the script file (UR_Datawarehouse.sql).  Make sure it is pointing to your OperationsManagerDW database, then execute the script.

    You should run this script with each UR, even if you ran this on a previous UR.  The script body can change so as a best practice always re-run this.
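If you prefer the command line over Management Studio, the same script can be run with sqlcmd — the server name below is an example:

```powershell
# Server\instance name is an example; -E uses integrated (Windows) authentication
sqlcmd -S "SQLDW01" -d "OperationsManagerDW" -E -i "UR_Datawarehouse.sql"
```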

    If you see a warning about line endings, choose Yes to continue.

    image

     

    Click the “Execute” button in SQL mgmt. studio.  The execution could take a considerable amount of time and you might see a spike in processor utilization on your SQL database server during this operation.

    You will see the following (or similar) output:   “Command(s) completed successfully.”

     

     

    image

    Next – let’s run the script to update the OperationsManager (Operations) database.  Open a SQL management studio query window, connect it to your Operations Manager database, and then open the script file (update_rollup_mom_db.sql).  Make sure it is pointing to your OperationsManager database, then execute the script.

    You should run this script with each UR, even if you ran this on a previous UR.  The script body can change so as a best practice always re-run this.

    image

     

    Click the “Execute” button in SQL mgmt. studio.  The execution could take a considerable amount of time and you might see a spike in processor utilization on your SQL database server during this operation.  

    I have had customers state this takes from a few minutes to as long as an hour. In MOST cases – you will need to shut down the SDK, Config, and Monitoring Agent (healthservice) on ALL your management servers in order for this to be able to run with success.

    You will see the following (or similar) output: 

    image

    or

    image

     

    IF YOU GET AN ERROR – STOP!  Do not continue.  Try re-running the script several times until it completes without errors.  In a production environment with lots of activity, you will almost certainly have to shut down the services (sdk, config, and healthservice) on your management servers, to break their connection to the databases, to get a successful run.
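Stopping the services across all management servers can be scripted — the service names below are the actual SCOM service names; the server names are examples:

```powershell
# OMSDK = Data Access service, cshost = Config service, HealthService = Microsoft Monitoring Agent
$managementServers = 'SCOM01', 'SCOM02'   # example server names

Invoke-Command -ComputerName $managementServers -ScriptBlock {
    Stop-Service -Name OMSDK, cshost, HealthService -Force
}

# ...run the SQL script, then restart the services:
Invoke-Command -ComputerName $managementServers -ScriptBlock {
    Start-Service -Name OMSDK, cshost, HealthService
}
```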

    Technical tidbit:   Even if you previously ran this script in any previous UR deployment, you should run this again in this update, as the script body can change with updated UR’s.


     
     
    3. Manually import the management packs

    image

     

    There are 58 management packs in this update!   Most of these we don’t need – so read carefully.

    The path for these is on your management server, after you have installed the “Server” update:

    \Program Files\Microsoft System Center 2012 R2\Operations Manager\Server\Management Packs for Update Rollups

    However, the majority of them are Advisor/OMS, and language specific.  Only import the ones you need, and that are correct for your language.  I will remove all the MP’s for other languages (keeping only ENU), and I am left with the following:

    image

     

    What NOT to import:

    The Advisor MP’s are only needed if you are connecting your on-prem SCOM environment to Microsoft Operations Management Suite cloud service (OMS), (Previously known as Advisor, and Operations Insights).

    The APM MP’s are only needed if you are using the APM feature in SCOM.

    The Alert Attachment and TFS MP bundle is only used for specific scenarios, such as DevOps scenarios where you have integrated APM with TFS, etc.  If you are not currently using these MP’s, there is no need to import or update them.  I’d skip this MP import unless you already have these MP’s present in your environment.

    However, the Image and Visualization libraries deal with Dashboard updates, and these always need to be updated.

    I import all of these shown without issue.

     

     


    4.  Update Agents

    image

    Agents should be placed into pending actions by this update for any agent that was not manually installed (remotely manageable = yes):  

    On the management servers where I used Windows Update to patch, the agents did not show up in this list.  Only agents whose management server I patched manually showed up.  FYI – the experience is NOT the same when using Windows Update vs. manual installation.  If yours don’t show up, you can try re-running the update for that management server manually.

     

    image

     

    If your agents are not placed into pending management, this is generally caused by not running the update from an elevated command prompt, or by having manually installed agents, which are never placed into pending.

    Agents reporting to a management server that was updated via Windows Update did NOT go into pending; only agents reporting to the manually patched management server did.

    I re-ran the server MSP file on those management servers from an elevated command prompt, and they all showed up.

    You can approve these – which will result in a success message once complete:

    image

     

    Soon you should start to see PatchList getting filled in from the Agents By Version view under Operations Manager monitoring folder in the console:

    image
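You can also list agent versions from the OperationsManager PowerShell module (a quick sketch, run from a machine with the console/module installed):

```powershell
Import-Module OperationsManager
Get-SCOMAgent | Sort-Object Version | Select-Object DisplayName, Version | Format-Table -AutoSize
```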


     

    I recommend you consider the following MP, which augments the Agents by Version view so you can see the agent version *number* under Agent Managed in Administration:

    https://blogs.technet.microsoft.com/kevinholman/2017/02/26/scom-agent-version-addendum-management-pack/

     
     
     
    5.  Update Unix/Linux MPs and Agents

    image

    The current Linux MP’s can be downloaded from:

    https://www.microsoft.com/en-us/download/details.aspx?id=29696

    7.5.1070.0 is the SCOM 2012 R2 UR12 release version.  

    ****Note – take GREAT care when downloading – that you select the correct download for SCOM 2012 R2.  You must scroll down in the list and select the MSI for 2012 R2:

    image

     

    Download the MSI and run it.  It will extract the MP’s to C:\Program Files (x86)\System Center Management Packs\System Center 2012 R2 Management Packs for Unix and Linux\

    Update any MP’s you are already using.   These are mine for RHEL, SUSE, and the Universal Linux libraries. 

    image

     

    You will likely observe VERY high CPU utilization of your management servers and database server during and immediately following these MP imports.  Give it plenty of time to complete the process of the import and MPB deployments.

    Next – you need to restart the “Microsoft Monitoring Agent” service on any management servers which manage Linux systems.  I don’t know why – but my MP’s never drop/update the UNIX/Linux agent files in the \Program Files\Microsoft System Center 2012 R2\Operations Manager\Server\AgentManagement\UnixAgents\DownloadedKits folder until this service is restarted.
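The restart can be done remotely for all management servers at once — the server names below are examples:

```powershell
# HealthService is the "Microsoft Monitoring Agent" service; server names are examples
Invoke-Command -ComputerName 'SCOM01', 'SCOM02' -ScriptBlock {
    Restart-Service -Name HealthService
}
```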

     

    Next up – upgrade the agents on your UNIX/Linux monitored systems.  You can now do this straight from the console:

    image

    You can input credentials or use existing RunAs accounts if those have enough rights to perform this action.

    Finally:

    image


     
     
    6.  Update the remaining deployed consoles

    image

     

    This is an important step.  I have consoles deployed around my infrastructure – on my Orchestrator server, SCVMM server, on my personal workstation, on all the other SCOM admins on my team, on a Terminal Server we use as a tools machine, etc.  These should all get the matching update version.

    You can use Help > About to bring up a dialog box to check your console version:

    image


     
     
     
    Review:

    image

    Now at this point, we would check the OpsMgr event logs on our management servers, check for any new or strange alerts coming in, and ensure that there are no issues after the update.


    Known issues:

    See the existing list of known issues documented in the KB article.

    1.  Many people are reporting that the SQL script is failing to complete when executed.  You should attempt to run this multiple times until it completes without error.  You might need to stop the Exchange correlation engine, stop all the SCOM services on the management servers, and/or bounce the SQL server services in order to get a successful completion in a busy management group.  The errors reported appear as below:

    ——————————————————
    (1 row(s) affected)
    (1 row(s) affected)
    Msg 1205, Level 13, State 56, Line 1
    Transaction (Process ID 152) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
    Msg 3727, Level 16, State 0, Line 1
    Could not drop constraint. See previous errors.
    ——————————————————–

    UR3 for SCOM 2016 – Step by Step


     

    image

     

    KB Article for OpsMgr:  https://support.microsoft.com/en-us/help/4016126/update-rollup-3-for-system-center-2016-operations-manager

    Download catalog site:  http://www.catalog.update.microsoft.com/Search.aspx?q=4016126

    Updated UNIX/Linux Management Packs:  https://www.microsoft.com/en-us/download/details.aspx?id=29696

    Recommended hotfix page:  https://blogs.technet.microsoft.com/kevinholman/2009/01/27/which-hotfixes-should-i-apply/

     

    NOTE:  I get this question every time we release an update rollup:   ALL SCOM Update Rollups are CUMULATIVE.  This means you do not need to apply them in order, you can always just apply the latest update.  If you have deployed SCOM 2016 and never applied an update rollup – you can go straight to the latest one available. 

     
     
    Key fixes:
    • The Application Performance Monitoring (APM) feature in the System Center 2016 Operations Manager Agent causes a crash for IIS application pools running under the .NET Framework 2.0 runtime. The Microsoft Monitoring Agent should be updated on all servers that use .NET 2.0 application pools for the APM binaries update to take effect. A restart of the server might be required if APM libraries were in use at the time of the update.
    • Organizational Unit (OU) properties for Active Directory systems are not being discovered or populated.
    • The PatchLevel discovery script was fixed to properly discover patch level.
    • SQL Agent jobs for maintenance schedule use the default database. If the database name is not the default, the job fails.
    • When the heartbeat failure monitor is triggered, a “Computer Not Reachable” message is displayed even when the computer is not down.
    • An execution policy has been added as unrestricted to PowerShell scripts in Inbox management packs.
    • The Microsoft.SystemCenter.Agent.RestartHealthService.HealthServicePerfCounterThreshold recovery task fails to restart the agent, and you receive the following error message:  (LaunchRestartHealthService.ps1 cannot be loaded because the execution of scripts is disabled on this system.)   This issue has been resolved to make the recovery task work whenever the agent is consuming too much resources.
    • The Get-SCOMOverrideResult PowerShell cmdlet doesn’t return the correct list of effective overrides.
    • The Event ID: 26373 event, which happens when there are large amounts of rows returned from an SDK query, has been changed from a “Critical” event to an “Informational” event (because there is nothing you can do about it).
    • When you run System Center 2016 Operations Manager in an all-French locale (FRA) environment, the Date column in the Custom Event report appears blank.
    • The Enable deep monitoring using HTTP task in the System Center Operations Manager console doesn’t enable WebSphere deep monitoring on Linux systems.
    • When overriding multiple properties on rules that are created by the Azure Management Pack, duplicate override names are created. This causes overrides to be lost.
    • When creating a management pack (MP) on a client that contains a Service Level (SLA) dashboard and Service Level Objectives (SLO), the localized names of objects aren’t displayed properly if the client’s CurrentCulture settings don’t match the CurrentUICulture settings. In cases where the localized settings are English (United Kingdom), ENG, or English (Australia), ENA, there’s an issue when the objects are renamed.
    • The UseMIAPI registry subkey prevents collection of processor performance data for RedHat Linux system. Also, custom performance collection rules are also impacted by the UseMIAPI setting.
    • This update adds support for OpenSSL 1.0.x on AIX computers. With this change, System Center Operations Manager uses OpenSSL 1.0.x as the default minimum version supported on AIX, and OpenSSL 0.9.x is no longer supported.

     

     


      Let’s get started.

       

      From reading the KB article – the order of operations is:

      1. Install the update rollup package on the following server infrastructure:
        • Management server or servers
        • Web console server role computers
        • Gateway
        • Operations console role computers
      2. Apply SQL scripts.
      3. Manually import the management packs.
      4. Apply Agent Updates
      5. Update Nano Agents
      6. Update Unix/Linux MP’s and Agents

       

       

      1.  Management Servers

      image_thumb3

      It doesn’t matter which management server I start with.  I simply make sure I only patch one management server at a time to allow for agent failover without overloading any single management server.

      I can apply this update manually via the MSP files, or I can use Windows Update.  I have 2 management servers, so I will demonstrate both.  I will do the first management server manually.  This management server holds 3 roles, and each must be patched:  Management Server, Web Console, and Console.

      The first thing I do when I download the updates from the catalog, is copy the cab files for my language to a single location, and then extract the contents:

      image

       

       

      Once I have the MSP files, I am ready to start applying the update to each server by role.

      ***Note:  You MUST log on to each server role as a Local Administrator, SCOM Admin, AND your account must also have System Administrator role to the SQL database instances that host your OpsMgr databases.

       

      My first server is a management server, and the web console, and has the OpsMgr console installed, so I copy those update files locally, and execute them per the KB, from an elevated command prompt:

      image

       

      This launches a quick UI which applies the update.  It will bounce the SCOM services as well.  The update usually does not provide any feedback about success or failure, but I did get a reboot prompt.  You can choose “No” and then reboot after applying all the SCOM role updates.

      image

       

      You can check the application log for the MsiInstaller events to show completion:

      Log Name:      Application
      Source:        MsiInstaller
      Event ID:      1036
      Computer:      SCOM1.opsmgr.net
      Description:  Windows Installer installed an update. Product Name: System Center Operations Manager 2016 Server. Product Version: 7.2.11719.0. Product Language: 1033. Manufacturer: Microsoft Corporation. Update Name: System Center 2016 Operations Manager Update Rollup 3 Patch. Installation success or error status: 0.

       

      You can also spot check a couple DLL files for the file version attribute. 

      image

       

      Next up – run the Web Console update:

      image

       

      This runs much faster.   A quick file spot check:

      image

       

      Lastly – install the console update (make sure your console is closed):

      image

      A quick file spot check:

      image

       

      Or help/about in the console:

      image

       

       

      Additional Management Servers:

      image75

      Windows Update contains the UR3 patches for SCOM 2016.   For my second Management Server – I will demonstrate that:

      image

       

       

      Updating Gateways:

      image

      Generally I can use Windows Update or manual installation.  I will proceed with manual:

      image

       

      The update launches a UI and quickly finishes.

      Then I will spot check the DLL’s:

      image

       

      I can also spot-check the \AgentManagement folder, and make sure my agent update files are dropped here correctly:

      image

      ***NOTE:  You can delete any older UR update files from the \AgentManagement directories.  The UR’s do not clean these up and they provide no purpose for being present any longer.

       

      I could also apply the GW update via Windows Update:

      image

       
       
       
       
       
      2. Apply the SQL Scripts

      image99

      In the path on your management servers, where you installed/extracted the update, there is ONE SQL script file: 

      %SystemDrive%\Program Files\Microsoft System Center 2016\Operations Manager\Server\SQL Script for Update Rollups

      (note – your path may vary slightly depending on if you have an upgraded environment or clean install)

      Next – let’s run the script to update the OperationsManager (Operations) database.  Open a SQL management studio query window, connect it to your Operations Manager database, and then open the script file (update_rollup_mom_db.sql).  Make sure it is pointing to your OperationsManager database, then execute the script.

      You should run this script with each UR, even if you ran this on a previous UR.  The script body can change so as a best practice always re-run this.

      image

       

      Click the “Execute” button in SQL mgmt. studio.  The execution could take a considerable amount of time and you might see a spike in processor utilization on your SQL database server during this operation.  

      I have had customers state this takes from a few minutes to as long as an hour. In MOST cases – you will need to shut down the SDK, Config, and Monitoring Agent (healthservice) on ALL your management servers in order for this to be able to run with success.

      You will see the following (or similar) output: 

      image10

       

      IF YOU GET AN ERROR – STOP!  Do not continue.  Try re-running the script several times until it completes without errors.  In a production environment with lots of activity, you will almost certainly have to shut down the services (sdk, config, and healthservice) on your management servers, to break their connection to the databases, to get a successful run.

      Technical tidbit:   Even if you previously ran this script in any previous UR deployment, you should run this again in this update, as the script body can change with updated UR’s.

       

       

       
      3. Manually import the management packs

      image18

      There are 33 management packs in this update!   Most of these we don’t need – so read carefully.

      The path for these is on your management server, after you have installed the “Server” update:

      \Program Files\Microsoft System Center 2016\Operations Manager\Server\Management Packs for Update Rollups

      However, the majority of them are Advisor/OMS, and language specific.  Only import the ones you need, and that are correct for your language.  

      This is the initial import list: 

      image

      image

       

      What NOT to import:

      The Advisor MP’s are only needed if you are using Microsoft Operations Management Suite cloud service, (Previously known as Advisor, and Operations Insights).

      DON’T import ALL the languages – ONLY ENU, or any other languages you might require.

      The Alert Attachment MP update is only needed if you are already using that MP for very specific other MP’s that depend on it (rare)

      The IntelliTrace Profiling MP requires IIS MP’s and is only used if you want this feature in conjunction with APM.

       

      So I remove what I don’t want or need – and I have this:

      image

       

      These import without issue.  If the “Install” button is greyed out – this means you have an MP in your import list that is already imported and not updated.  The “Microsoft System Center Advisor Resources (ENU)” MP was causing this for me – since it hasn’t been updated, I simply remove it from the list so I can install.

       

       
       
      4.  Update Agents

      image24

      Agents should be placed into pending actions by this update for any agent that was not manually installed (remotely manageable = yes):  

      image

       

      If your agents are not placed into pending management – this is generally caused by not running the update from an elevated command prompt, or having manually installed agents which will not be placed into pending by design, OR if you use Windows Update to apply the update rollup for the Server role patch.

      You can approve these – which will result in a success message once complete:

      image

       

      You can verify the PatchLevel by going into the console and opening the view at:  Monitoring > Operations Manager > Agent Details > Agents by Version

      image
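
      If you prefer a query over the console view, something along these lines against the OperationsManager database will list each agent and the patches it reports (the MTV_HealthService view and PatchList column are what I use in my lab – verify against your own database before relying on it):

```sql
-- List agents and their installed update rollup patches
-- (run against the OperationsManager database)
SELECT DisplayName, PatchList
FROM MTV_HealthService
ORDER BY DisplayName
```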

       

      I also recommend you take a look at this community MP, which helps see the “REAL” agent number in the “Agent Managed” view console:

      https://blogs.technet.microsoft.com/kevinholman/2017/02/26/scom-agent-version-addendum-management-pack/

       

       
      5.  Update UNIX/Linux MPs and Agents

      image36

       

      The UNIX/Linux MP’s and agents have been updated to align with UR3 for SCOM 2016.  You can get them here:

      https://www.microsoft.com/en-us/download/details.aspx?id=29696

      The current version of these MP’s for SCOM 2016 UR3 is 7.6.1076.0 – and includes agents with version 1.6.2-339

       

      Make sure you download the correct version for your SCOM deployment:

      image

       

      Download, extract, and import ONLY the updated Linux/UNIX MP’s that are relevant to the OS versions that you want to monitor:

      image

       

      This will take a considerable amount of time to import, and consume a lot of CPU on the management servers and SQL server until complete.

      Once it has completed, you will need to restart the Healthservice (Microsoft Monitoring Agent) on each management server, in order to get them to update their agent files at \Program Files\Microsoft System Center 2012 R2\Operations Manager\Server\AgentManagement\UnixAgents
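
      Restarting the service on every management server can be done remotely in a few lines of PowerShell.  A sketch, assuming WinRM is enabled on your management servers and you have admin rights to them:

```powershell
Import-Module OperationsManager

# Restart the Microsoft Monitoring Agent (service name: HealthService)
# on every management server in the management group
Get-SCOMManagementServer | ForEach-Object {
    Invoke-Command -ComputerName $_.DisplayName -ScriptBlock {
        Restart-Service -Name HealthService
    }
}
```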

       

      You should see the new files dropped with new timestamps:

      image

       

      Now you can deploy the agent updates:

      image

      image

       

      Next – you decide if you want to input credentials for the SSH connection and upgrade, or if you have existing RunAs accounts that are set up to do the job (Agent Maintenance/SSH Account)

      image

      image

       

      If you have any issues, make sure your SUDOERS file has the correct information pertaining to agent upgrade:

      https://blogs.technet.microsoft.com/kevinholman/2016/11/11/monitoring-unixlinux-with-opsmgr-2016/

       
       
       
       
      6.  Update the remaining deployed consoles

      image

      This is an important step.  I have consoles deployed around my infrastructure – on my Orchestrator server, SCVMM server, on my personal workstation, on all the other SCOM admins on my team, on a Terminal Server we use as a tools machine, etc.  These should all get the matching update version.

       

       

       

       

      Review:

      image60

      Now at this point, we would check the OpsMgr event logs on our management servers, check for any new or strange alerts coming in, and ensure that there are no issues after the update.

       

      Known Issues:

      None!

      Installing SQL 2016 Always On with Windows Server 2016 Core


      imageimage

       

      This will be a simple walk through of installing two Windows Server 2016 Core servers, then installing SQL 2016, and setting up SQL Always On replication between them.  This is meant for lab testing and getting familiar with the scenario.  This setup is incredibly simple and straightforward, and fast.  You can have this scenario up and running in just a few minutes.

       

      First, deploy two VM’s.  Nothing fancy (2GB RAM, 2 vCPU’s, 1 disk) is fine for a lab deployment.

      I will name mine:  SQLCORE1 and SQLCORE2.

      Install Windows Server 2016, and choose the default option of Windows Server Core (no GUI):

      image

       

      When the install is complete, log in by creating a password.  You are now ready to begin configuration.

      From the command line, run PowerShell.

      We will configure static IP’s and DNS on each server.  Change these to match your lab:

      New-NetIPAddress -InterfaceAlias "Ethernet" -IPAddress 10.10.10.60 -PrefixLength 24 -DefaultGateway 10.10.10.1
      Set-DnsClientServerAddress -InterfaceAlias "Ethernet" -ServerAddresses 10.10.10.10,10.10.10.11

       

      Next – we will join the domain and rename the computer when prompted.  Type “sconfig” and press enter.

      image

       

      From the menu – choose “1”.  Choose Domain, and provide your domain and domain credentials to be able to join.

      When prompted, choose “Yes” to change the computer name.  Provide the new computername you want for your SQL core servers.  Mine are SQLCORE1 and SQLCORE2.

      Reboot when prompted.

      You must log in as the local administrator after the reboot.  Then, type “logoff” and hit Enter.  At the logon screen, hit ESC to get to “Other user” and log in with your domain admin account.

      Add the domain group for your SQL admin’s to the local administrators group at the command prompt:

      net localgroup administrators /add OPSMGR\SQLAdmins

       

      At this point you can log in as one of your SQL Administrator accounts, or continue the installation as your domain admin account.

      Map a drive to your SQL 2016 installation media:

      Net use Y: \\server\software\sql\2016\ENT

       

      Install SQL Server from the command line.  There are two ways to install SQL: from a command line with options, or from an INI file.  The INI file is much more powerful, but to keep things simple we will use the command line here.  This basic install will cover the SQL database engine and the Full-Text service, and set the SQL Agent service to automatic startup.  You will need to change the domain group for your SQL admins, and your SQL service account and password.

      Setup.exe /qs /ACTION=Install /FEATURES=SQLEngine,FullText /INSTANCENAME=MSSQLSERVER /SQLSVCACCOUNT="OPSMGR\sqlsvc" /SQLSVCPASSWORD="password" /SQLSYSADMINACCOUNTS="OPSMGR\sqladmins" /AGTSVCACCOUNT="OPSMGR\sqlsvc" /AGTSVCPASSWORD="password" /AGTSVCSTARTUPTYPE=Automatic /TCPENABLED=1 /IACCEPTSQLSERVERLICENSETERMS

       

      The SQL setup will begin and you will see some UI’s pop up along with progress in the command line window….  when complete you will be returned to a command prompt.

      Now that SQL is installed – reboot each server.

      Log back in with a domain account to continue setup and configuration.

      Next, we will configure the firewall.  We will open the necessary ports for SQL and Always On, and then enable the built in group rules for remote administration.

      Run PowerShell.

      Copy and paste the following to configure the firewall:

      New-NetFirewallRule -Group "Custom SQL" -DisplayName "SQL Default Instance" -Direction Inbound -Protocol TCP -LocalPort 1433 -Action Allow
      New-NetFirewallRule -Group "Custom SQL" -DisplayName "SQL Admin Connection" -Direction Inbound -Protocol TCP -LocalPort 1434 -Action Allow
      New-NetFirewallRule -Group "Custom SQL" -DisplayName "SQL Always On VNN" -Direction Inbound -Protocol TCP -LocalPort 1764 -Action Allow
      New-NetFirewallRule -Group "Custom SQL" -DisplayName "SQL Always On AG Endpoint" -Direction Inbound -Protocol TCP -LocalPort 5022 -Action Allow
      Enable-NetFirewallRule -DisplayGroup "Remote Desktop"
      Enable-NetFirewallRule -DisplayGroup "Remote Event Log Management"
      Enable-NetFirewallRule -DisplayGroup "Remote Service Management"
      Enable-NetFirewallRule -DisplayGroup "File and Printer Sharing"
      Enable-NetFirewallRule -DisplayGroup "Performance Logs and Alerts"
      Enable-NetFirewallRule -DisplayGroup "Remote Volume Management"
      Enable-NetFirewallRule -DisplayGroup "Windows Firewall Remote Management"

       

      Next, we will install the Windows Failover Cluster feature, a prerequisite for SQL Always On.

      Install-WindowsFeature Failover-Clustering –IncludeManagementTools

      image

       

      Next – we will create a cluster.  You can create a simple failover cluster between two nodes in a single line of PowerShell!  You will need to change your cluster name, IP address, and node names to match your configuration.  Only run this on ONE NODE! 

      (This step assumes you are running this as a domain admin, as this will create a computer account in the domain for the virtual cluster computer.  If you do not wish to run this as a domain admin, you must pre-stage that account and assign permissions.  See cluster documentation for this)

      New-Cluster –Name SQLCORECL1 –StaticAddress 10.10.10.62 –Node SQLCORE1,SQLCORE2

      You might see some warnings at this point.  That’s fine – they are likely just because we have a single NIC in each VM, and because we didn’t configure a witness.

      image

       

      Next, we need to enable each server to support SQL Always On.  You will need to provide your SERVERNAME\INSTANCENAME.  If you use the default instance like we did above, input just the servername.  Do this on each node, but change the servername to match the correct node name you are running it on.

      $ServerInstance = 'SQLCORE1'   #format is SERVERNAME\INSTANCENAME, or just the server name for the default instance
      Enable-SqlAlwaysOn -ServerInstance $ServerInstance -Force

       

      Lastly – we need to configure the Always On availability group.  This is easiest done manually, via SQL management studio from a remote tools machine.

      Launch SQL management studio and connect to the SQLCORE1 server:

      image

       

      First, we need to create a “dummy” database ONLY on SQLCORE1 which is required to configure and test Always On.  Go to Databases, right click, and choose New Database.  Name the Database “TESTDB” and click OK.

      image

       

      Before we can use a database in Always On, it must have at least one previous backup.  Right click TESTDB, tasks, Back Up.  Hit OK to accept defaults, and OK when backup is done.

      Now expand “Always On High Availability”, and Right Click “Availability Groups” and choose “New Availability Group Wizard”

      image

       

      Assign an AG name.  This isn’t terribly important.  I will use “SQLCOREAG1” and click Next.

      image

       

      Select your TestDB and click Next.

      image

       

      Add a replica, and choose your other server, SQLCORE2.  Check the boxes next to Auto failover and Synchronous commit on both servers.

      image

       

      On the Listener tab, create an Availability group listener.  I will use “SQLCOREAGL1”.  We will use port 1764 (which we set up a previous Firewall rule for).  You will need to scroll down to the bottom right and click “Add” to add in an IP address, and click Next.

      (This step assumes you are running this as a domain admin, as this will create a computer account in the domain for the virtual availability group listener.  If you do not wish to run this as a domain admin, you must pre-stage that account and assign permissions.  See SQL Always On documentation for this)

      image

       

      Next, you will choose FULL synchronization, and provide a network share to which both servers have read and write access.

      image

       

      This will run the tests:

      image

       

      Click “Finish” and you should have success!

      image

       

      Go into SQL Management studio and look over your configuration:

      image

      Stop Healthservice restarts in SCOM 2016


       

      image

       

      This is probably the single biggest issue I find in 100% of customer environments.

      YOU ARE IMPACTED.  Trust me.

       

      SCOM monitors itself to ensure we aren’t using too much memory, or too many handles for the SCOM processes.  If we detect that the SCOM agent is using an unexpected amount of memory or handles, we will forcibly KILL the agent, and restart it.

      That sounds good right?

      In theory, yes.  In reality, however, this is KILLING your SCOM environment, and you probably aren’t even aware it is happening.

       

      The problem?

      1.  The default thresholds are WAY out of touch with reality.  They were set almost 10 years ago, when systems used a LOT less resources than modern operating systems today.  This is MUCH worse if you choose to MULTIHOME.  Multi-homed agents can use twice as many resources as non-multi-homed agents, and this restart can be issued from EITHER management group, but will affect BOTH.

      2.  We don’t generate an alert when this happens, so you are blind that this is impacting you.

       

      We need to change these in the product.  Until we do, a simple override is the solution.

       

      Why is this so bad?

      This is bad because of two impacts:

      1.  You are hurting your monitored systems by restarting the agent over and over, causing the startup scripts to run in loops and actually consuming additional resources.  You are also going periods of time without any monitoring, because while the agent is killed and restarting, the monitoring workflows are unloaded.

      2.  You are filling SCOM with state change events.  Every time all the monitors initialize, they send an updated “new” state change event upon initialization.  You are hammering SCOM with useless state data.

       

      What can I do about it?

      Well, I am glad you asked!  We simply need to override 4 monitors, to give them realistic agent thresholds, and set them to generate an informational alert.  I will also include a view for these alerts so we can see if anyone is still generating them.  I will wrap all this in a sample management pack for you to download.

       

      In the console, go to Authoring, Monitors, and change scope to “Agent”

      image

       

      We will override each one:

      Private Bytes monitors should be set to a default threshold of 943718400 (900 MB – triple the default of 300 MB)

      Handle Count monitors should be set to 30000  (the default of 6000 is WAY low)

      Override Generate Alert to True (to generate alerts)

      Override Auto-Resolve to False (even though default is false, this must be set, to keep from auto-closing these so you can see them and their repeat count)

      Override Alert severity to Information (to keep from ticketing on these events)

       

       

      Override EACH monitor, “all objects of class” and choose “Agent” class.

      image

       

      NOTE: It is CRITICAL that we choose the “Agent” class for our overrides, because we do not want to impact thresholds already set on Management Servers or Gateways.

       

      This is a good configuration:

      image

      image

      image

      image

       

      Ok – those are much more reasonable defaults.

       

      What else should I do?

      Create an alert view that shows alerts with name “Microsoft.SystemCenter.Agent.%”

      This will show you if you STILL have some agents restarting on a regular basis.  You should review the ones with high repeat counts on a weekly basis, and either adjust their agent-specific thresholds or investigate why they are consuming so much, so often.  An occasional agent restart (one or less per day) is totally fine and probably not worth the time to investigate.
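
      To spot the worst offenders without opening the view, you can also pull the same alerts with PowerShell and sort by repeat count – a sketch:

```powershell
Import-Module OperationsManager

# Find the agent restart alerts and show which computers repeat the most
Get-SCOMAlert -Criteria "Name LIKE 'Microsoft.SystemCenter.Agent.%'" |
    Sort-Object RepeatCount -Descending |
    Select-Object MonitoringObjectDisplayName, Name, RepeatCount -First 20
```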

       

      image

       

      I am including a management pack with these overrides and the alert view, which you can download below if you prefer not to make your own.

       

      Download:

      https://gallery.technet.microsoft.com/SCOM-Agent-Threshold-b96c4d6a

      Don’t forget to License your SCOM 2016 deployments


       

      image

       

      Just like previous versions of Operations Manager, all SCOM deployments are installed as “Evaluation Version”, which is a 180-day trial.  You DON’T want to forget about this and have your production and lab deployments time-bomb on you down the road.

      To see your current license, in PowerShell on a SCOM server:

      Get-SCOMManagementGroup | ft skuforlicense, timeofexpiration -a

      image

       

      In order to set your license – you just need to run the Set-SCOMLicense cmdlet.  This is documented here:

      https://docs.microsoft.com/en-us/powershell/systemcenter/systemcenter2016/operationsmanager/vlatest/set-scomlicense

       

      Two things:

      1.  You need to get your license key, from whomever keeps that information for your company.

      2.  You MUST run this cmdlet in a PowerShell session launched “As an administrator”, as it needs access to write to the registry.

       

      Run this command ONE time on ANY management server…..

      Set-SCOMLicense -ProductId ‘99999-99999-99999-99999-99999’

      …… where you change the example key above to your key.

      You should restart the PowerShell session, then run the command to get the license again.

      image

      (Note:  You might have to restart your management server services or reboot the management server before you see this take effect)


      SCOM 2012 and 2016 Unsealed MP Backup


       

      image

      This is a management pack that I use in every customer environment.  You *NEED* to backup your unsealed MP’s.  This will allow you to quickly recover from a mistake, without having to restore your databases from a backup.  Over the years, I have seen many customers accidentally delete workflows, mess up their RunAs accounts, break AD integration, or break their notifications.  All of these things are stored in unsealed MP’s.  We really need to back them up, with a daily history.  The amount of space needed is very small.

       

      This is an updated version of the community MP from SystemCenterCentral.com written by Neale Brown, Derek Harkin, Pete Zerger and Tommy Gunn, located at:  http://www.systemcentercentral.com/pack-catalog/backup-unsealed-management-packs-opsmgr-2012-edition/

       

      It contains a single rule which now targets the “All Management Servers Resource Pool” which will give this workflow high availability.

      image

       

      The rule runs once per day (24 hours) and executes a PowerShell script.

      You can edit the Write Action configuration for the number of days, and the share location, or local directory:

      image

       

      This will create these directories if they do not exist, either a local path on the management server, or on a share you provide as above.

       

      image

       

      It will log events to the SCOM event log for tracking:

       

      image

      image

       

      This script will run on one of your SCOM management servers, and will execute as the SCOM Management Server Action Account by default.  If you want to specify a specific account, there is a RunAs profile included.  You will need to use an account that has SCOM admin rights to the SDK, and read/write access to the directory or share that you choose.

      image

       

       

      Changes made:

      • Supports multiple management groups exporting to the same share path
      • Adds start and completion logging, with runtime and whoami
      • Makes the SCOM management group SDK connection more reliable, with debug logging
      • Changed the rule target from the RMSe to the All Management Servers Resource Pool, for high availability and future compatibility
      • Minor renames of the MP ID, script, workflows, and modules.  Cleaned up displaystrings.

       

      You can download the MP here:

      https://gallery.technet.microsoft.com/SCOM-2012-and-2016-2ccc45c0

      Document your SCOM RunAs Account and Profiles script


       

      This script will document your SCOM RunAs accounts, and any profiles they are associated to.  It will output as a CSV file. 

      This is handy for collecting data for change management, making sure multiple management groups have the same configuration, and ensuring you have documented accounts prior to a major upgrade.

      The script is based on Dirk Brinkmann's 2012 script - located here:  https://gallery.technet.microsoft.com/Listing-SCOM-2012-R2-24be56b1

       

      Here is a sample of the output:

       

      image

       

      Download here:   

      https://gallery.technet.microsoft.com/Document-SCOM-RunAs-cb64d461

      What SQL maintenance should I perform on my SCOM 2016 databases?


       

      image

       

      ***Note – The products and recommendations have changed over the years, so what applied to previous versions does not really apply today.  Make sure you read the entire article!

       

      The SQL instances and databases deployed to support SCOM, generally fall into one of two categories: 

      1.  The SQL server is managed by a DBA team within the company, and that teams standard will be applied.

      2.  The SCOM team fully owns and supports the SQL servers.

       

      Most SQL DBA's will set up some pretty basic default maintenance on all SQL DB's they support.  This often includes, but is not limited to:

      • CHECKDB  (to look for DB errors and report on them)
      • UPDATE STATISTICS  (to boost query performance)
      • REINDEX  (to rebuild the table indexes to boost performance)
      • BACKUP

      SQL DBA's might schedule these to run via the SQL Agent to execute nightly, weekly, or some combination of the above depending on DB size and requirements.

       

      On the other side of the coin.... in some companies, the SCOM team installs and owns the SQL server.... and they don't do ANY default maintenance to SQL.  Because of this all too common scenario - a focus in SCOM was to make the Ops DB and Data Warehouse DB somewhat self-maintaining.... providing a good level of SQL performance whether or not any default maintenance is being done.

       

      Operational Database:

      Daily jobs that run for the OpsDB:

      • 12:00 AM – Partitioning and Grooming
      • 2:00 AM – Discovery Data Grooming
      • 2:30 AM – Optimize Indexes
      • 4:00 AM – Alert auto-resolution

       

      Reindexing is already taking place against the OperationsManager database for some of the tables (but not all, and this is important to understand!).  This is built into the product.  What we need to ensure - is that any default DBA maintenance tasks are not conflicting with our built-in maintenance, and our built-in schedules:

      There is a rule in OpsMgr that is targeted at the All Management Servers Resource Pool:

      The rule executes the "p_OptimizeIndexes" stored procedure, every day at 2:30AM.

      This rule cannot be changed or modified.  Therefore - we need to ensure there is no other SQL maintenance (including backups) running at 2:30 AM, or performance could be impacted.

      If you want to view the built-in UPDATE STATISTICS and REINDEX jobs history - just run the following queries:

      SELECT TableName,
      OptimizationStartDateTime,
      OptimizationDurationSeconds,
      BeforeAvgFragmentationInPercent,
      AfterAvgFragmentationInPercent,
      OptimizationMethod
      FROM DomainTable dt
      inner join DomainTableIndexOptimizationHistory dti
      on dt.domaintablerowID = dti.domaintableindexrowID
      ORDER BY OptimizationStartDateTime DESC

      SELECT TableName,
      StatisticName,
      UpdateStartDateTime,
      UpdateDurationSeconds
      FROM DomainTable dt
      inner join DomainTableStatisticsUpdateHistory dti
      on dt.domaintablerowID = dti.domaintablerowID
      ORDER BY UpdateStartDateTime DESC

      Take note of the update/optimization duration seconds column.  This will show you how long your maintenance is typically running.  In a healthy environment these should not take very long.

      In general - we would like the "Scan density" to be high (Above 80%), and the "Logical Scan Fragmentation" to be low (below 30%).  What you might find... is that *some* of the tables are more fragmented than others, because our built-in maintenance does not reindex all tables.  Especially tables like the raw perf, event, and localizedtext tables.
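
      To check current fragmentation yourself, a standard DMV query like the following works against the OperationsManager database (SAMPLED mode keeps it reasonably light, but run it off-hours on large databases):

```sql
-- Fragmentation overview for the current database
SELECT OBJECT_NAME(ips.object_id) AS TableName,
       i.name AS IndexName,
       ips.avg_fragmentation_in_percent,
       ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'SAMPLED') ips
JOIN sys.indexes i
  ON ips.object_id = i.object_id AND ips.index_id = i.index_id
WHERE ips.page_count > 100   -- ignore tiny tables
ORDER BY ips.avg_fragmentation_in_percent DESC
```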

       

      This brings us to the new perspectives in SCOM 2016, especially when used with SQL 2016.

       

      In SQL 2016, some changes were made to optimize performance, especially when using new storage subsystems that leverage new disks like SSD.  The net effect of these changes, on SCOM databases, is that they will consume much more space in the database, than when using SQL 2014 and previous.  The reason for this is deeply technical, and I will cover this later.  But what you need to understand as a SCOM owner, is that the sizing guidance will not match up with previous versions of SQL, compared to SQL 2016.  This isn't a bad thing, you just need to make some minor changes to counteract this.

      SCOM inserts performance and event data into the SCOM database via something called BULK INSERT.  When we bulk insert the data, SCOM is designed to use a fairly small batch size by default.  In SQL 2016, this creates lots of unused reserved space in the database, that does not get reused.  If you review a large table query – you will observe this as “unused” space.

      image

      Note in the above graphic – the unused space is over 5 TIMES the space used by actual data!

      image

      If you want to read more about this – my colleague Dirk Brinkmann worked on discovering the root cause of this issue, and has a great deep dive on this:

      https://blogs.technet.microsoft.com/germanageability/2017/07/07/possible-increased-unused-disk-space-when-running-scom-2016-on-sql2016/

      The SQL server team also recently added a blog post describing the issue in depth:

      https://blogs.msdn.microsoft.com/sql_server_team/sql-server-2016-minimal-logging-and-impact-of-the-batchsize-in-bulk-load-operations/

       

       

      Do not despair.  In order to clean up the unused space, a simple Index Rebuild or at a minimum Index Reorganize for each table is all that is needed.  HOWEVER – these perf tables are NOT indexed by default!  This was likely done when SCOM was designed, because these are not static tables, they contain transient data in the OpsDB, that is only held for a short amount of time.  The long term data is moved into the Data Warehouse DB, where it is aggregated into hourly and daily tables – and those are indexed via built in maintenance. 

      To resolve this, and likely improve the performance of SCOM – I recommend that each SCOM customer set up a SQL Agent job that handles index maintenance for the entire OpsDB, once a day.  I’d say given the other schedules, a start time between 3:00 AM and 6:00 AM would likely be a good time for this maintenance.  That lets the built-in maintenance run first, and won’t conflict with too much.  You should try to avoid having anything running at 4:00 AM because of the Alert auto-resolution – we don’t want any blocking going on for that activity.

      There are other performance benefits to reindexing the entire database, as many new visualization tables have been added over time, and these don’t get hit by our built-in maintenance.

       

      A great set of maintenance TSQL scripts for Agent Jobs plan can be found at https://ola.hallengren.com/

      Specifically the index maintenance plan at https://ola.hallengren.com/sql-server-index-and-statistics-maintenance.html

      This is a well thought out maintenance plan that analyzes the tables, and chooses to rebuild or reorganize based on fragmentation thresholds, skipping tables that don't need it at all.  The first time you index the entire DB, it may take a long time.  Once you set this up to run daily, it will mostly be optimizing the daily perf and event tables, each of which contains a single day's worth of data.
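
      As an example, the nightly SQL Agent job step for the OpsDB might call Ola's IndexOptimize procedure along these lines (the parameter names come from his documentation – review them against the version you download before using):

```sql
-- Nightly index maintenance for the SCOM operational database,
-- using Ola Hallengren's IndexOptimize (installed in a utility database such as master)
EXECUTE dbo.IndexOptimize
    @Databases = 'OperationsManager',
    @FragmentationLow = NULL,                  -- skip lightly fragmented indexes
    @FragmentationMedium = 'INDEX_REORGANIZE', -- reorganize between the two thresholds
    @FragmentationHigh = 'INDEX_REBUILD_ONLINE,INDEX_REBUILD_OFFLINE',
    @FragmentationLevel1 = 5,
    @FragmentationLevel2 = 30,
    @UpdateStatistics = 'ALL'
```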

      After a reindex – I have freed up a TON of space.  Here is the same DB:

      image

      Notice the huge decrease in “unused space”.  Additionally, the total space reserved in my perf tables is now consuming less than one fifth the amount of space in the database it was consuming previously.  This leaves you with a smaller footprint, and better performance.  I strongly recommend you set this up or check with your DBA’s to ensure it is happening.

       

       

       

      Data Warehouse Database:

      The data warehouse DB is also self-maintaining.  This is carried out by a rule, "Standard Data Warehouse Data Set maintenance rule", which is targeted to the "Standard Data Set" object type.  This stored procedure is called on the data warehouse every 60 seconds.  It performs many, many tasks, of which index optimization is but one.

      image

      This SP calls the StandardDatasetOptimize stored procedure, which handles any index operations.

      To examine the index and statistics history - run the following query for the Alert, Event, Perf, and State tables:

      select basetablename,
      optimizationstartdatetime,
      optimizationdurationseconds,
      beforeavgfragmentationinpercent,
      afteravgfragmentationinpercent,
      optimizationmethod,
      onlinerebuildlastperformeddatetime
      from StandardDatasetOptimizationHistory sdoh
      inner join StandardDatasetAggregationStorageIndex sdasi
      on sdoh.StandardDatasetAggregationStorageIndexRowId = sdasi.StandardDatasetAggregationStorageIndexRowId
      inner join StandardDatasetAggregationStorage sdas
      on sdasi.StandardDatasetAggregationStorageRowId = sdas.StandardDatasetAggregationStorageRowId
      ORDER BY OptimizationStartDateTime DESC

      In the data warehouse - we can see that all the necessary tables are being updated and reindexed as needed.  When a table is 10% fragmented - we reorganize.  When it is 30% or more, we rebuild the index.

      Since we run our maintenance every 60 seconds, and only execute maintenance when necessary, there is no "set window" where we will run our maintenance jobs.  This means that if a DBA team also sets up a UPDATE STATISTICS or REINDEX job - it can conflict with our jobs and execute concurrently. 

      I will caveat the above statement with findings from the field.  We have some new visualization tables and management-type tables that do not get optimized, and this can lead to degraded performance.  An example of that is http://www.theneverendingjourney.com/scom-2012-poor-performance-executing-sdk-microsoft_systemcenter_visualization_library_getaggregatedperformanceseries/   They found that running Update Statistics every hour was beneficial in reducing the CPU consumption of the warehouse.  If you manage a very large SCOM environment, this might be worth investigating.  I have seen many support cases which resulted in a manual run of Update Statistics to resolve a performance issue.
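
      If you do go down that path, the hourly job step can be as simple as running the built-in procedure against the warehouse – a minimal sketch:

```sql
-- Hourly statistics refresh on the data warehouse
USE OperationsManagerDW;
EXEC sp_updatestats;
```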

      For the above reasons, I would be careful with any maintenance jobs on the Data Warehouse DB beyond a CHECKDB and a good backup schedule – UNLESS you are going to analyze the data, determine which areas aren't getting index maintenance, or determine how out of date your statistics get.  Then ensure any custom maintenance won't conflict with the built-in maintenance.

       

       

      Lastly - I'd like to discuss the recovery model of the SQL databases.  We default to "simple" for all our DB's.  This should be left alone.... unless you have *very* specific reasons to change it.  Some SQL teams automatically assume all databases should be set to the "full" recovery model.  This requires that they back up the transaction logs on a very regular basis, but gives the added advantage of restoring up to the time of the last t-log backup.  For OpsMgr, this is of very little value, as the data changing on an hourly basis is of little value compared to the complexity added by moving from simple to full.  Also, changing to full will mean that your transaction logs will only checkpoint once a t-log backup is performed.  What I have seen is that many companies aren't prepared for the amount of data written to these databases.... and their standard transaction log backups (often hourly) are not frequent enough (or the t-logs BIG enough) to keep them from filling.  The only valid reason to change to FULL, in my opinion, is when you are using an advanced replication strategy, like SQL Always On or log shipping, which requires the full recovery model.  When in doubt - keep it simple.

      P.S....  The Operations Database needs 50% free space at all times.  This is for growth, and for re-index operations to be successful.  This is a general supportability recommendation, but the OpsDB will alert when this falls below 40%. 

      For the Data warehouse.... we do not require the same 50% free space.  This would be a tremendous requirement if we had a multiple-terabyte database!
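To see where you actually stand on free space, a simple per-file report like the following works against either database (shown here against the OpsDB; point it at the warehouse the same way).  File sizes are stored in 8 KB pages, hence the division by 128 to get MB:

```sql
-- Report allocated vs. used space per database file, in MB
USE OperationsManager;
SELECT name,
       size / 128                                          AS SizeMB,
       FILEPROPERTY(name, 'SpaceUsed') / 128               AS UsedMB,
       (size - FILEPROPERTY(name, 'SpaceUsed')) / 128      AS FreeMB
FROM sys.database_files;
```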

      Think of the data warehouse as having 2 stages... a "growth" stage, while it is adding data and not yet grooming much (it hasn't hit the default 400 day retention), and a "maturity" stage, where agent count is steady, MPs are not changing, and grooming is happening because we are at 400 days retention.  During "growth" we need to watch and maintain free space, and monitor for available disk space.  In "maturity" we only need enough free space to handle our index operations.  When you start talking 1 terabyte of data.... a 50% rule would mean 500GB of free space, which is expensive.  If you cannot allocate it.... then just allow auto-grow and monitor the database.... but always plan for it from a volume size perspective.

      For transaction log sizing - we don't have any hard rules.  A good rule of thumb for the OpsDB is ~20% to 50% of the database size.... this all depends on your environment.  For the Data warehouse, it depends on how large the warehouse is - but you will probably find steady state to require somewhere around 10% of the warehouse size or less.  When we are doing any additional grooming of an alert/event/perf storm.... or changing grooming from 400 days to 300 days - this will require a LOT more transaction log space - so keep that in mind as your databases grow.

       

       

       

      Summary (or TL;DR):

       image

       

      1.  Set up a nightly Reindex job on your SCOM Operations Database for best performance and to reduce significant wasted space on disk.

      2.  You can do the same for the DW, but be prepared to put in the work to analyze the benefits if you do.  Running a regular (multiple times a day) Update Statistics has proven helpful to some customers.

      3.  Keep your DB recovery model in SIMPLE mode, unless you are using AlwaysOn replication.

      4.  Ensure you pre-size your databases and logs so they are not always auto-growing, and maintain the free space required to stay supported.
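For point 4, pre-sizing is a one-time ALTER DATABASE per file.  The sketch below uses the default logical file names for the OpsDB (MOM_DATA and MOM_LOG) - verify yours first with `SELECT name FROM sys.database_files`, and pick sizes from your own capacity planning, not the placeholder values shown:

```sql
-- Pre-grow the data and log files to planned sizes so autogrow is the
-- exception, not the rule.  Sizes below are illustrative placeholders.
ALTER DATABASE OperationsManager
  MODIFY FILE (NAME = MOM_DATA, SIZE = 50GB);
ALTER DATABASE OperationsManager
  MODIFY FILE (NAME = MOM_LOG, SIZE = 15GB);
```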

      Reinstalling your SCOM agents with the NOAPM switch


       

      This one comes from collaboration with my colleague Brian Barrington.

      Because of the issues with SCOM 2016 and the default APM modules impacting IIS and SharePoint servers (read more about that issue HERE, HERE, and HERE), Brian was looking for a way to easily remove the APM components from the deployed agents with minimal impact.

      Normally, the guidance would be to uninstall the SCOM agent, then reinstall it from a command line installation using the NOAPM=1 command line parameter.  That could be a challenging task if you have hundreds or thousands of agents!

       

      His idea?  Use my SCOM Agent Tasks MP here:  https://blogs.technet.microsoft.com/kevinholman/2017/05/09/agent-management-pack-making-a-scom-admins-life-a-little-easier/

       

      It has a class property in the state view called “APM Installed” to help you see which agents still have the APM components installed (which are installed by default).

      image

       

      It has a task called “Execute any PowerShell”.

      In the task, override to provide the command you want to run, such as:

      msiexec.exe /fvomus "\\server\share\agents\scom2016\x64\MOMagent.msi" NOAPM=1

      You just need to place the MOMAgent.msi file on a share that your domain computer accounts would have access to.

      image

       

      This performs a lightweight repair/reinstall of the agent, passing only the “NOAPM=1” switch, which leaves all other settings alone and removes only the APM service and components!
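A quick way to spot-check an agent afterwards is to look for the APM service.  This is a hedged sketch - it simply matches on "APM" in the display name, so an empty result means the repair removed the service:

```powershell
# Run on an agent after the repair.  No output means the APM
# service and its components are gone.
Get-Service | Where-Object { $_.DisplayName -like '*APM*' }
```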

      We have gotten good feedback on the success of this process across hundreds of agents in a short time frame.

       

      Removing the APM MP’s

      On another note – if you have no plans to use the APM feature in SCOM – you should consider removing those MPs, which get imported by default.  By default they discover a LOT of instances of sites, services, and classes where APM components are installed on the agents.

      MP’s to remove in SCOM 2016:

      • Microsoft.SystemCenter.DataWarehouse.ApmReports.Library (Operations Manager APM Reports Library)
      • Microsoft.SystemCenter.Apm.Web  (Operations Manager APM Web)
      • Microsoft.SystemCenter.Apm.Wcf  (Operations Manager APM WCF Library)
      • Microsoft.SystemCenter.Apm.NTServices  (Operations Manager APM Windows Services)
      • Microsoft.SystemCenter.Apm.Infrastructure.Monitoring  (Operations Manager APM Infrastructure Monitoring)
      • Microsoft.SystemCenter.Apm.Library (Operations Manager APM Library)
      • Microsoft.SystemCenter.Apm.Infrastructure (Operations Manager APM Infrastructure)

      All of the above can be deleted.  However – in order to delete the Microsoft.SystemCenter.Apm.Infrastructure MP, you will need to remove a RunAs account profile association, then clean up the SecureReference library manually by deleting the reference.

      In the Administration pane > Run As Configuration > Profiles, open the Data Warehouse Account profile.  On the Run As accounts page, remove the association for the Operations Manager APM Data Transfer Service:

      image

      Then – manually export the Microsoft.SystemCenter.SecureReferenceOverride MP, and edit it using your favorite XML editor.  (Make a Backup copy of this FIRST!!!!!)

      Delete the reference to the Microsoft.SystemCenter.Apm.Infrastructure MP.
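For illustration only, the element to remove looks something like the fragment below.  The alias and version shown here are placeholders - match on the ID, and also delete any overrides in the MP that use that same alias, or the MP will fail to reimport:

```xml
<!-- Hypothetical example of the <Reference> to delete from the
     Manifest/References section of the SecureReferenceOverride MP.
     Alias and Version will differ in your export. -->
<Reference Alias="Apm">
  <ID>Microsoft.SystemCenter.Apm.Infrastructure</ID>
  <Version>7.1.10226.0</Version>
  <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
</Reference>
```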

      image

       

      Save this, then reimport the Microsoft.SystemCenter.SecureReferenceOverride MP.

      At this point you can delete the final APM MP – Microsoft.SystemCenter.Apm.Infrastructure (Operations Manager APM Infrastructure)

       

      Deleting that MP with manual edits too scary for you?

      At a bare minimum – if you are not using the APM feature – you should disable the discoveries:

      image

       

      Then run Remove-SCOMDisabledClassInstance in your SCOM Command Shell, which will remove all these discovered instances that you don’t use.
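That cleanup is just the one cmdlet from the Operations Manager shell.  One note from experience: it may need to be run more than once until it reports that no instances were removed:

```powershell
# Remove discovered instances of classes whose discoveries are
# now disabled.  Repeat until no instances remain to remove.
Import-Module OperationsManager
Remove-SCOMDisabledClassInstance
```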

      Updated SQL RunAs Addendum Configuration MPs


       

      imageimage

       

      Just a quick note to let you know I updated these MP’s if you use them:

      https://blogs.technet.microsoft.com/kevinholman/2016/08/25/sql-mp-run-as-accounts-no-longer-required-2/

       

      Updates include:

      1.  Disabled the monitor for the SysAdmin check by default – you will need to enable it if you want to use it.  I have been recommending Low Priv since that is more secure, so this monitor is now disabled by default.

      2.  Updated the Low Priv tasks for SQL 2005 – 2016 to make them more reliable.  If a task encounters an error, it will not complete all the steps for configuring low priv, so it is important to review the task output when you configure your SQL servers the first time.  Changes were made to skip databases in read-only mode in all versions, and to be more reliable for SQL 2005 and SQL 2008.

      3.  Updated version to 7.7.31.0 to align with current shipping SQL MP’s.
