Sql Server : Recommendations for Linked Server Performance Improvement
- Posted by Sqltimes
- On November 11, 2013
- 0 Comments
A few days ago, I had a chance to work on something interesting. Linked Servers on steroids. Hundreds of servers interconnect using Linked Servers… it was awesome !! (Complicated mess to clean up, but great to work on)
Linked Servers have been around for several years, so it is not surprising to see the wide spread usage in many production environments. It is the easiest, best, safest and fastest way to JOIN tables on separate SQL Instances (or physical machines). As with anything. there are exceptions, but in general it is preferred method for many situations. When configured correctly, we can make the most out of Linked Server connections. You can also set up Linked Server connection to non-Sql Server instances as well. But for this article, I’m focusing only on SqlServer-to-SqlServer connections.
Even though data is dispersed over different locations, you can JOIN the tables as if they are co-located. With minimal changes to your query you can access data from a different servers. To make things even better, Sql Server’s optimizer is smart enough to come up with a query plan that shows the most optimal way for the storage engine to retrieve data. Following are some of the points to keep in mind to facilitate this:
Benefits
- Distributed Partitioning Views:
- If you have a big tables, you can break it into smaller tables and store each section on a separate instance. Create a umbrella view that joins all these tables using UNION ALL to make it appear as one table located on one database. I would not recommend it, but it is possible, if you need it.
- ACID
- Executing linker server query still follow all the ACID properties on the local server. This is important.
- Offload processing to a different machine
- When you query a table using Linked Server, the overall duration may take a bit longer (depends on many factors, but in general terms), but if duration is not a concern, you can offload the processing overhead to a different machine using Linked Servers. Imagine this, you have a bunch of tables, on the remote server, that you need to query to process and retrieve only a small data set. If these tables are located on a different machine, you do not need to replicate them to your machine to run this query. Just write a stored procedure that does the processing and add it to the remote server. Voila !!. You just need to call that stored procedure from your machine and you get the results without the overhead of processing the underlying tables. Again, I’m not recommending this as a general practice, but it is available if needed.
- Minimal changes to queries.
- Querying a remote table uisng Linked Server is simple. Just add the linked server name & database name to the table name like this. LinkedServer.DatabaseName.dbo.tableName
-
SELECT * FROM LinkedServer.DBADB.dbo.WaitStats
- Flexible and Secure
- Once a linked server connection is configured, you can query it from within your query. No need to re-configure for each query, like OPENROWSET or OPENDATASOURCE.
- You have options to make this linked server connection more secure. Below are the three ways to configure in decreasing order in terms of security.
- Delegation: Authentication is handled by windows domain and is the most secure. But works with windows authentication only.
- Remote Credentials: Allows local logins to be mapped to remote logins. Works with non-Sql Server data sources as well.
- Self-Mapping: Connection attempt to remote server is made using the same local login. Make sure a similar login exists on the remote machine with same password. This option is not recommended.
Best Practices
- Latency: Querying remote tables always takes longer as they are not located locally. Latency is one of the main factors, so where possible, make sure these Sql Server machines are located in the same VLAN to minimize network latency. A few days ago, I read this article comparing the duration it takes to query a table located on different physical locations. Very interesting and educational article.
- Collation compatibility
- Collation compatibility is an important point, often overlooked, that has impact on the overall performance. Sql Server depends on Collation to decide the order of the strings. English alphabet is ordered differently compared to alphabet from other languages. Sql Server depends on the Collation setting to decide the way it processes rows. If the collation of local and remote server are the same, then we avoid a few processing steps and still know with confidence that the data set you receive from remote server is correct. We need to set this collation flag (to TRUE) in Linked Server configuration to let Sql Server know that it can push ORDER BY and WHERE clauses to remote server.
- Sql Server by default takes the local collation setting, so if they do not match we need to set this flag to FALSE to prevent any conflicts. When FALSE, local server will request to transfer all the table data from remote server and then start processing it locally. This is a huge performance hit.
- If you can explicitly set the collation in the query, it helps Sql Server optimizer in making decisions.
-
12
USE master;
EXEC
sp_serveroption
'SEATTLE3'
,
'collation compatible'
,
'true'
;
- Temporary Tables
- If linked server query is taking too long, sometimes, bring the data over to local server into a temporary table and then adding indexes will help improve processing time.
- Stored Procedures
- As explained above, if you can offload processing to remote server and just receive a small data set it will help overall duration. If all the tables you query are located on the same remote server, then write a stored procedure that performs all the processing steps needed and add that as a stored procedure on remote server. You just call that stored procedure from the local server and VOILA !! entire processing is offloaded and performance is improved as Sql Server on the remote server will come up with best execution plan to run the stored procedure.
- STATS
- Query Optimizer depends on table statictis when identifying a optimal query plan. For Linked Server based query, optimizer still tries to retrieve column STATS from the remote server. But if it does not have necessary permissions on remote server table, then it will be a sub-par plan. Gathering STATS from remote server needs permissions to run DBCC SHOW_STATISTICS, which is allowed only for sysadmins, db_owerns and db_ddladmin. I would not recommend granting these levels of permissions on Linked Servers, but it is available if needed. Hopefully, Microsoft will change this requirement in future.
- Refrain from using OUTER JOINs, unless needed. OUTER JOIN will bring back all the data from remote server which causes latency.
- Depending on the permissions level, indexes on remote server are not used, so there is a chance of performing more table scans. So, only query the data you actually and exactly need, nothing more.
- OPENQUERY
- OPENQUERY send the entire remote query to the remote server, enabling the remote server to fully optimize the query for processing. Then brings the data back to local server. Though the second part still adds to latency, but the first part helps significantly. So OPENQUERY is the second best option to query remote tables.
- If you need to query non-SQL Server instance, this option is really good.
Alternative methods
These are some of the alternate methods instead of using Linked Servers.
- Use replication to maintain a local copy of the remote table.
- OPENDATASOURCE
- OPENROWSET
- OPENQUERY: This is a good alternative to Linked Servers.
Hope this helps,
0 Comments