SQL Server 2008 Data collection – description, internals and troubleshooting

SQL Server provides a data collector that you can use to obtain and save data that is gathered from several sources. The data collector enables you to use data collection containers, which enable you to determine the scope and frequency of data collection on a computer that is running SQL Server.

Below you can find a list of articles that you can use for familiarizing yourself with the terminology, some internals of the data collection, security, error handling and design information. Pay attention to Data Collector Architecture and processing, as there is the most vital info on the matter.

Topic Description
Introducing the Data Collector Describes the data collector in general.
Getting Started with the Data Collector Describes the installation requirements for this release of the data collector.
Data Collection Terminology Describes the terminology used for data collection and the data collector.
Data Collector Architecture and Processing Describes the data collector architecture and processing during data collection.
System Data Collection Sets Describes the predefined system data collection sets that are provided with the data collector.
Utility Information Data Collection Set Describes the data collection set that is installed on each instance of SQL Server 2008 R2 that is managed by a utility control point.
Collector Types Describes the generic collector types that are provided with the data collector.
Data Collector Security Describes security for the various data collector components, as well as the data retained in the management data warehouse.
Data Collector Logging Describes the data collector run-time logging mechanism and the information that is provided by the logs that it creates.
Data Collector Error Handling Describes data collector error handling and error recovery.
The Management Data Warehouse Describes the management data warehouse that the data collector uses for storing collected data.
Managing Data Collection How-to Topics Describes using SQL Server Management Studio and Transact-SQL to configure and manage data collection.

 

Now on the matter with troubleshooting…well I have to say troubleshooting Data collection is a bit of pain in the neck. First of all, because the process itself is using dcexec.exe which cannot be debugged and its work is kind of black-box for the dba. Secondly – there are some documented bugs with the collection itself. And the most important (difficult) to me – the log actually does not say much what is the error – yes, it says which job has failed, how it failed and sometimes there is a bit more info. Now with some examples:

  1. Concurrent purging of old(already uploaded) data – well, the purging is used as a second step in the upload part of the collection. Purging is done by executing a stored procedure, which is using some system resources(executables), which are quite unknown and sometimes it leads to deadlocking and eventual failing of one of the purging jobs. The good thing here is that the purging is one and the same for all collections so if one purge has failed and the other succeeded, the work done is the same as if both purge jobs succeeded. Still – there might be an error in your log which can cause a false alarm but still would need time and effort for investigation
  2. Arithmetic overflow error on upload job -> http://support.microsoft.com/kb/975915
  3. Some errors in cache files might cause permanent failure of the upload jobs – step one collects data and stores it in a cache file which is later on uploaded to the MDW database using the dcexec.exe. But if somehow the cache file contains invalid data that cannot be parsed/processed by the upload job, then the whole sequence fails. And the error you get is something like “there were X bytes read, 64 bytes expected”…and that is it…the only workaround here is to either call Microsoft to fix the cache file (eventually) or to delete it and hope the invalid data is not in some of the others as well. I have to say that once it helped me and losing a single cache file was acceptable, having in mind that the upload job did not execute successfully for quite a time.

 

Unfortunately MDW and data collection is something that I cannot see to be as powerful and worth enabling as first – it consumes server resources (MDW database is quite active), it does not provide better functionality than XE, profiler and perfmon. Maybe the strong side is in collecting and storing the info so you can monitor and predict some capacity matters but nothing more. I will be expecting, though, the tool to be improved and to provide wider scope of functionalities.

Leave a comment