In the world of ever increasing ORMs, it seems that developers tend to forget that in order for you to utilise a technology to its full potential you need to understand how frameworks work internally. While I complain about how many people blog and talk about Data Access, I felt the need to clarify some of the annoyances I saw in various codebases dealing with ADO.NET over the past couple of years.
For the sake of this post I created a simple database as follows to help me better explain my points in some of the samples below.
Encryption
When working with N-tier public applications you usually deploy your application on an App\Web server and a database server (most common scenario). While most of the time your database server is not exposed to the outside world of your application consumers, you assume that an SSL certificate will suffice on the web server to handle secure connections between the client and your application inside the browser or rich client. Recently I worked on a personal project where things were a bit different. My product was actually my SQL Server database which was exposed over the web.
I am not going to go into inner details of how we secured the server from hackers, attacks etc. I am more interested in how easy it is to encrypt information over the wire travelling to and from my db server. What was surprising is that most people I spoke to about this didn’t realise how easy it is to do that. So just like a web app I had a security certificate installed on the database server. Now that is all good until I realised the underestimation of the connection string power among devs. While this is a mild example but really it is easy, the main thing you have to do after making sure the certificate is properly configured is adding an Enrypt=True value pair to the connection string along with some supporting properties.
"DataSource=MyDbServer;Initial Catlogue=MySchool;Integrated Security=SSPI; Network Library=dbmssocn; Connect Timeout=10; TrustServerCertificate=True;Encrypt=True”
That is really it so please when passing credit card numbers to your stored procedures next time please do consider encrypting the traffic to your DB.
Connection Pooling – When I don’t want it
Pooling is a great mechanism for enhancing the performance of our DB connections. Do note that pooling works over the connection Lifetime, hence a connection will be kept alive until its creation time is compared with the current time and then destroyed if the difference exceeds the lifetime. In specific scenarios though I would like to create a “Persistent Connection: One that stays alive until explicitly closed”. For that the I have to disable pooling. Again power to the connection string for this one.
"DataSource=MyDbServer;Initial Catlogue=MySchool;Integrated Security=SSPI; Pooling=False”
Another tip is when your are debugging data access code and have pooling turned on (that is the default behaviour) and you want to make sure that pooling is not affecting your scenario make sure you use the SqlConnection.ClearAllPools or SqlConnection.ClearPool methods.
Load Balancing – Lending a hand to DBAs :)
Still working with connection strings again you may have a failover SQL cluster. While many of devs depend on DBAs to configure the environments properly, we all know how busy those guys are so why not give them a helping hand. One of the properties on connection strings again can do just that.
"DataSource=MyDbServer;Initial Catlogue=MySchool;Integrated Security=SSPI; Failover Partner=MyRescuerDbServer”
MARS – Multiple Active Result Sets
While not often done I had a scenario where I needed to open multiple SqlDataReaders on the same connection. Only problem is when you try to do this you get an exception saying “There is already a datareader associated with the command that must be closed”. But hey, my scenario dictates that I have that flexibility. Well at this time it feels that my post is about connection strings but hey why not.
"DataSource=MyDbServer;Initial Catlogue=MySchool;Integrated Security=SSPI; MultipleActiveResultSets=True”
From this point onwards you’re free to open as many data readers (I believe) as you want on the same SQL connection.
Command Behaviours
Now I am a big fan of defensive coding and policy injection. Command Behaviours for me are like policy injection in that I define my policies on how I want this command to behave in certain events. There is no one set of rules again on how all commands should behave or else we wouldn’t have been provided with this flexibility. So my tip here is do familiarise yourself with properties on the Command Behaviour class. http://msdn.microsoft.com/en-us/library/system.data.commandbehavior.aspx
Command Builders
Ever wondered how typed datasets generated the insert, update, and delete statements for you. Generation Templates? String Builders? Internal Logic? Black Voodoo? Again one of the most missed classes among junior developers are command builders. Now a command builder is not a code generation tool as such but by right it can be used as one. Examine the following code and it’s output.
private static void Main()
{
const string connectionString = "Data Source=localhost;Initial Catalog=School;Integrated Security=True";
var connection = new SqlConnection(connectionString);
using (connection)
{
connection.Open();
SqlCommand command = connection.CreateCommand();
command.CommandType = CommandType.Text;
command.CommandText = "SELECT * FROM dbo.School";
var adapter = new SqlDataAdapter(command);
var commandBuilder = new SqlCommandBuilder(adapter);
Console.WriteLine(commandBuilder.GetInsertCommand().CommandText + Environment.NewLine);
Console.WriteLine(commandBuilder.GetUpdateCommand().CommandText + Environment.NewLine);
Console.WriteLine(commandBuilder.GetDeleteCommand().CommandText + Environment.NewLine);
Console.ReadKey();
}
}
The output:
As shown the command builder generated the select, insert, update and delete commands for me. While the select statement passed in has to be simple and usually dealing with one table, but still the generated code can be used in many of code generation tools I see today plus it’s used by the typed dataset designer.
Asynchronous Data Operations
Now when working with transactions, developers always tend to keep the transactions on the lowest tier possible above their data repositories which is totally correct. On the other hand they tend to work with threading on the highest level away from their data repositories because asynchronous calls with WCF or the background worker are much easier. While the most common guideline for this scenario is IT DEPENDS I still think that when dealing with multiple threads doing various data operations, the most efficient way is to use your ADO.NET objects to handle those. Let’s take the following scenario:
- I am reading some data from my school table.
- I want a file to be created after all the data is read.
- I don’t want the main\UI thread to be affected by the entire procedure and the whole thing is just a straight data dump with no business logic.
To me this is a data operation that I don’t even need to handle outside my repository. Remember repositories sit in your infrastructure tier where you persist and read from databases, file, web services etc..
private static void Main()
{
const string connectionString = "Data Source=XPSEXTREME;Initial Catalog=School;Integrated Security=True;Asynchronous Processing=True";
var connection = new SqlConnection(connectionString);
connection.Open();
SqlCommand command = connection.CreateCommand();
command.CommandType = CommandType.Text;
command.CommandText = "SELECT * FROM dbo.School";
Console.WriteLine("Starting Read:");
command.BeginExecuteReader(new AsyncCallback(WriteToFile), command);
Console.ReadKey();
}
private static void WriteToFile(IAsyncResult ar)
{
var command = (SqlCommand) ar.AsyncState;
SqlDataReader dataReader = command.EndExecuteReader(ar);
command.Connection.Close();
command.Connection.Dispose();
//Write reader content to file
Console.WriteLine("Data is ready to be written to file.");
}
Do note that you have to apply the Asynchronous Processing=True property in the connection string for successful asynchronous calls.
Batch Updates
One of the most annoying things in DataAdapters is that the default behaviour on executing multiple commands is:
- Open Connection.
-
Execute a single command.
- Close Connection.
- Goto 1 :)
A very useful property that again many of us miss is the BatchUpdateSize property on the DataAdapter. To update records at once in the database with a single open connection you define BatchUpdateSize to 10 records at once. The property is set to 1 by default.
var adapter = new SqlDataAdapter(command) {UpdateBatchSize = 10};
Now that’s a neat solution but if you are using SQL 2008 why stop there. By creating the following stored procedure and utilising the new Table Value parameters I can do batch updates even more efficiently.
CREATE TYPE dbo.StudentInput AS TABLE
( StudentId uniqueidentifier, Name nvarchar(100), SchoolId uniqueidentifier )
GO
CREATE PROCEDURE UpdateStudents (@students dbo.StudentInput READONLY)
AS
BEGIN
INSERT INTO dbo.Student(StudentId, Name, SchoolId)
SELECT ns.StudentId, ns.Name, ns.SchoolId FROM @students AS ns;
END
GO
Then I execute my command passing in a datatable parameter with multiple values.
private static void Main()
{
const string connectionString = "Data Source=localhost;Initial Catalog=School;Integrated Security=True";
var connection = new SqlConnection(connectionString);
var schoolGuid = new Guid("c7a92ea0-9fd5-42eb-9949-f16b3644cc6d");
var dataTable = new DataTable();
dataTable.Columns.Add("StudentId", typeof (Guid));
dataTable.Columns.Add("Name", typeof (String));
dataTable.Columns.Add("SchoolId", typeof (Guid));
dataTable.Rows.Add(new object[] {Guid.NewGuid(), "Omar Besiso", schoolGuid});
dataTable.Rows.Add(new object[] {Guid.NewGuid(), "Paul Stovell", schoolGuid});
dataTable.Rows.Add(new object[] {Guid.NewGuid(), "Brenden Crowne", schoolGuid});
using (connection)
{
connection.Open();
SqlCommand command = connection.CreateCommand();
command.CommandType = CommandType.StoredProcedure;
command.CommandText = "dbo.UpdateStudents";
var parameter = new SqlParameter("@students", SqlDbType.Structured) {Value = dataTable};
command.Parameters.Add(parameter);
command.ExecuteNonQuery();
Console.ReadKey();
}
}
DataViews
Ok by now the you must be bored so I will leave you with one last very simple clarification. A data view is a view built on a single datatable showing a subset or all of the tables columns. It’s not a view that you can use to join multiple datatables in the same dataset. This was a common joke at work recently when we were assigned to review some other mobs work so thought I’d share :)
Conclusion
Like anything make sure you know your frameworks internals and the simple\efficient things you can do. While in the world of ORMs most of this post doesn’t bother you much, trust me it will help when your manager comes screaming down your neck “Why is this not working? Why is it slow? etc.”. ADO.NET is a great framework so ride it till the wheels fall off :)