In-Depth

Write .NET Code in SQL Server

You can use VB.NET or C# to write procedural code and create user-defined types and aggregates in SQL Server.

Technology Toolbox: VB.NET, C#, SQL Server "Yukon"

All versions of SQL Server prior to SQL Server "Yukon" provide two ways to write procedural code that executes inside the database. The usual way is to use Transact SQL (T-SQL), a proprietary implementation of a portion of the SQL standard known as persistent stored modules (SQL/PSM). An alternative is to write extended stored procedures in compiled languages such as C++; extended stored procedures employ native calls using the Open Data Services API. Yukon lets you use any .NET-compatible language to program SQL procedural code without needing to know T-SQL. This enhancement offers certain advantages, but using T-SQL code is still optimal in many cases. I'll explore the implementation of .NET language support in Yukon and discuss trade-offs to consider in choosing among the available alternatives.

T-SQL code is highly integrated with the rest of SQL Server. It executes using "native" data types, so data returned from the storage engine doesn't need to undergo expensive type conversions prior to T-SQL's use. This makes T-SQL code highly efficient. On the other hand, it isn't compiled prior to use. Each query or line of T-SQL code runs under an interpreter, making it less than ideal for complex number crunching or other manipulations that benefit from compilation. Since SQL Server 7, stored procedures and user-defined functions (UDFs) aren't precompiled or stored in an intermediate parsed-tree format prior to use, as in earlier versions.

Extended stored procedures are useful for accessing OS-level services such as the file system and the Internet, number crunching, or storing state such as global timers—but they're difficult to write and debug. Database calls that use ODBC or OLE DB APIs require conversion of data types between the database and ODBC/OLE DB and back again. Although extended stored procedures can "join" the existing session for the purpose of being in the same transaction, doing database calls through OLE DB or ODBC requires connecting back to the database. The concept of the "current connection" isn't available when you program with extended stored procedures.

Although a rogue extended stored procedure can cause SQL Server to fail by calling the process exit() function, well-meaning but buggy code cannot cause it to fail by branching to location zero, for example. The reason is that SQL Server wraps the execution of extended stored procedures with structured exception handling. However, pointers to incorrect memory locations can cause an extended stored procedure to scribble on SQL Server's internal buffers, because the extended stored procedure manages memory allocation and access, and it can access memory that SQL Server uses. Use of extended stored procedures is rare, for these reasons.

T-SQL code under Yukon continues to operate mostly as before, providing complete compatibility with existing code and enabling the millions of current T-SQL programmers to continue writing high-performance data-access code for the SQL Server relational engine to execute. T-SQL will probably remain your language of choice if you're one of these programmers.

Leverage .NET Support
Yukon's support for .NET code provides several benefits. .NET classes are compiled; they can be just-in-time compiled into machine instructions at execution time or precompiled into machine code prior to use. Precompilation makes computation-intensive .NET code run faster than the equivalent T-SQL. .NET classes are objects, so you can use object-oriented programming techniques. Finally, using the .NET base class libraries inside SQL Server lets you implement .NET functionality that isn't available in T-SQL, such as cryptography functions.

Running .NET code in SQL Server is safe. When you catalog a .NET assembly in SQL Server, the engine makes sure the code is verifiable and type-safe. .NET code loads into application domains (appdomains)—mini "processes within a process." The .NET execution engine protects SQL Server's internal buffers from executing code, and extensive enhancements to the hosting control APIs in the execution engine prohibit exceptional conditions—such as stack overflow and out-of-memory conditions in .NET code—from interfering with SQL Server operations (see Figure 1). The execution engine controls storage allocation and memory management. This ensures that .NET procedural code never steps on random memory buffers (short of a bug in the engine itself). The execution engine can always dispose of the offending thread or even an appdomain in case of severe programmer error, while SQL Server continues to run without interruption.

Yukon ships with an in-memory .NET data provider to optimize data access from managed procedural code. You can use either .NET types or SQL types with this provider. Some .NET types, such as System.Int32, require no conversion, but others, such as System.Decimal, aren't exact matches. The .NET classes in System.Data.SqlTypes correspond exactly to the SQL Server types. Using these types in .NET procedures requires no type conversion, so execution is faster. Yukon's internal ADO.NET provider—System.Data.SqlServer—also contains optimizations, such as the SqlExecutionContext class, which allows further optimization if the same SQL statement executes more than once.

You no longer need to create the connections explicitly—as you do with extended stored procedures—when you use .NET to write procedures, functions, and triggers in Yukon. Yukon and the Common Language Runtime (CLR) make this possible by introducing the SqlContext class. SqlContext is a helper class that represents the environment of the current session. SqlContext has static methods you can use to get references to the managed representations of SQL Server objects, such as the current connection and transaction (see Table 1). Most of the classes you can reference are familiar if you've used ADO.NET on the client side; others are unique to the in-server programming model.

The SqlTriggerContext, SqlResultSet, and SqlPipe classes might be unfamiliar to you. SqlTriggerContext gives you programmatic access to the current environment within a managed trigger and is somewhat analogous to SqlContext. SqlResultSet is a new class that implements server-side cursor support; it didn't exist in client-side ADO.NET in versions 1.0 and 1.1, but the Whidbey version of SqlClient adds this class too. SqlPipe, which is a server-only class, allows you to send results back to the client, such as errors, messages, and results of a SELECT statement (see Listing 1).

Stored procedures and UDFs have two general uses: They let you write domain-specific, perhaps computation-intensive code, and they let you write data-intensive code. The locality of reference they provide means you don't need to remote possibly large amounts of data from the database to the client or middle tier for the data to be useful. Minimizing database round trips should always be one of your goals if you want your applications to be scalable.

Define Extended Data Types
Yukon also introduces support for SQL-99-like user-defined types. You must define extended data types as .NET classes. You can't define them in T-SQL, although they're accessible in T-SQL stored procedures, UDFs, and other procedural code. These classes (types) can have member functions that are accessible in T-SQL, à la SQL distinct types, and they can have mutator functions that are usable in T-SQL update statements.

Note that SQL Server complex types extend relational data types. You use them most commonly to define new scalar types—such as a ComplexNumber type—that extend the relational type system, not to define "object" data types you might define in an object-oriented database. You can define distinct types based on a single built-in data type, and you also can give Yukon user-defined types multiple storage items (attributes). Such a user-defined data type is considered a complex type in SQL-99. Once you define the new type in the SQL Server catalog, you can use it as a column in a table. You can use variables of the type in stored procedures and use the type's attributes and methods in computed columns and UDFs.

However, Yukon is unaware of the inheritance relationships among types (although you can use inheritance in the implementation) or polymorphic methods, unlike traditional object-oriented systems. For example, you can define a complex user-defined type called Person that contains multiple data values (member variables) and instance methods, and you can define a complex type Author that inherits from Person—but you can't invoke methods of the Person type when using an Author type or cast Author to Person. You can code complex classes such as Person, but a domain-specific scalar type such as ComplexNumber is a more likely implementation.

Yukon also supports user-defined aggregate functions. These types extend the concept of UDFs that return a single value, and you can write them in any .NET language. The SQL specification defines five aggregates that databases must support (MAX, MIN, AVG, SUM, and COUNT). SQL Server implements a superset of the specification, including such aggregates as standard deviation and variance. Yukon's support for .NET languages eliminates the need to wait for the database engine to implement your particular domain-specific aggregates. You can even define user-defined aggregates over user-defined types, such as an aggregate that adds a set of the ComplexNumbers (the data type I described previously).

The Whidbey version of VS.NET makes writing and deploying .NET code for SQL Server as easy as writing an ordinary .NET assembly or an ASP.NET Web application. You start by creating a SQL Server Project; this option is available in Visual Basic Projects, Visual C# Projects, or Visual C++ Projects. (It's called a SQL Server Class Library in C++.) It's likely that other .NET languages will offer equivalent functionality through the Visual Studio ISP program. I mentioned previously that SQL Server assemblies must be completely verifiable. A substantial revision in the managed C++ architecture makes it possible to write 100-percent verifiable managed C++ code in Whidbey.

Create a Database Object
When you choose a SQL Server project, VS.NET prompts you for a SQL Server connection to allow you to deploy and debug the assembly without leaving VS.NET (download the sample code). The starter project contains an AssemblyInfo source file with assembly-level attributes. However, it doesn't include a starter source code template; you use "Add New Item" to choose which of five types of database objects you'll create: stored procedure, user-defined function, trigger, UDF or user-defined aggregate. When you add one of these items, Visual Studio also creates a debug.sql file, which contains the SQL statement VS.NET executes when you choose to debug the assembly.

.NET stored procedures, UDFs, and triggers act almost exactly the same as the T-SQL versions of these objects. You can write user-defined types and user-defined aggregates only in .NET languages. Some SQL Server objects you can create in VS.NET have a corresponding behavior attribute. You use these attributes to define behaviors of the objects inside SQL Server, but the objects also can have attributes that relate specifically to deployment (see Tables 2 and 3).

You also can use the CREATE ASSEMBLY statement to catalog a .NET assembly manually in a Yukon database. This T-SQL data definition language statement defines an assembly to SQL Server and assigns it the symbolic name SomeTypes:

CREATE ASSEMBLY SomeTypes 
   FROM 'c:\types\SomeTypes.dll'
   WITH PERMISSION_SET = SAFE

The preceding statement loads the code from the file and assigns the symbolic name. You can load the code from a network share or from a local file system directory. The file must be a library (DLL) rather than an executable (EXE). No special processing of the code is needed beyond normal compilation; SomeTypes.dll is a normal .NET assembly, and it must contain an assembly manifest. Although a .NET assembly can contain multiple physical files, Yukon doesn't support multifile assemblies currently.

SQL Server verifies that the assembly code is type-safe and validates the code when it's cataloged. This not only saves time—because the runtime usually does this during the JIT process (at first load)—but also ensures that only verifiable code is cataloged into SQL Server. Unverifiable code causes the CREATE ASSEMBLY DDL statement to fail.

The value of the PERMISSION_SET you specify determines what happens during validation. The permission sets enforce semantics with respect to what kind of calls can be made. CREATE ASSEMBLY uses Reflection to ensure that you follow the rules.

Choose a Safety Level
The default permission set is SAFE. SAFE permissions are only allowed to use code that can't be used to compromise security or integrity. Safe code can't access external system resources such as the Registry, network, file system, and environment variables. SAFE code also can't access unmanaged code through COM-callable wrappers or PInvoke. SAFE code can make data-access calls using the current context, but can't access data through the SqlClient or other data providers. An attempt to use forbidden methods within a SAFE assembly results in a security exception.

Specifying EXTERNAL_ACCESS enables code to access external system resources. The Registry, network file system, and environment variables are available through the managed code APIs, but EXTERNAL_ACCESS code can't use COM-callable wrappers or PInvoke. Like SAFE code, EXTERNAL_ACCESS code can't compromise security or integrity.

UNSAFE code isn't restricted in any way, including using unmanaged code. Using UNSAFE could compromise SQL Server, so only users who are members of the sql_admins role can even permit UNSAFE code—though it can be available to other users who've been granted the appropriate permission. Although it seems unwise even to permit UNSAFE code to execute, it's no more unsafe than an extended stored procedure.

.NET procedural code serves four main goals: It provides a safer replacement for extended stored procedures, it's faster for computation-intensive operations, it lets you use the rich .NET Framework class libraries, and it lets you extend the type system. Using a managed environment for database procedures that must extend T-SQL's capabilities in domain-specific ways is a big improvement over using a low-level, type-unsafe language. If you can't wait for Microsoft to produce everything you want as extensions to T-SQL or as system-defined functions, then replacing extended stored procedures with .NET stored procedures is probably appealing. Likewise, processor-intensive calculations that should take place in the server should use compiled .NET code, until T-SQL stored procedures can be compiled to machine code. User-defined types and user-defined aggregates let you extend the type system for special scalar values without petitioning Microsoft to add your domain-specific type to the next version of SQL Server.

You can use the SqlServer managed provider to access data as I showed you, but T-SQL—a data-centric language—is still the preferred API for data-intensive routines. The choice between T-SQL and CLR procedural code is similar to the difference between T-SQL and dynamic SQL, because the query processor can parse and optimize the SQL statements in a stored procedure. You might be tempted to use a single .NET language both inside and outside the database, but—until the ability to evaluate, optimize, and perhaps even preprocess the static and SQL portions of the .NET procedure improves—it's a good choice only if your organization doesn't have much T-SQL expertise or wants to use a single programming language on the client and on the server. If you choose this route, it's still a best practice to get your database administrator involved in evaluating and assisting with the code (especially choosing the safety levels) before it goes into production.

Featured

comments powered by Disqus

Subscribe on YouTube