Practical App

How-To: Streamline SharePoint with RBS

Remote BLOB Storage (RBS) can improve SharePoint functionality and performance.

Organizations of all sizes are using Microsoft SharePoint as a document management and repository system, among other things. As a result, SharePoint is storing a large number of documents, with numbers often reaching into the millions.

SharePoint stores all these documents in the SQL Server database as Binary Large Objects (BLOBs). Like other relational databases, SQL Server wasn't designed to store BLOBs of this magnitude. Consequently, it can choke at various levels. This degrades SharePoint performance and makes database administration painful.

To address this problem, Microsoft introduced a COM-based External BLOB Storage (EBS) provider model in SharePoint 2007. EBS lets you offload BLOBs to outside storage and greatly reduce the database size. This resolved the database size issue, but because EBS was a COM interface rather than a native .NET provider, it carried performance overhead and wasn't a "clean integration" with SharePoint. So, this solution wasn't really considered complete.

In SharePoint 2010 and SQL Server 2008 R2, Microsoft released a native .NET-based Remote BLOB Storage (RBS) interface as a replacement for EBS. RBS also greatly reduces SharePoint 2010 database size. It does so by offloading all BLOBs from the SharePoint content database into user-specified external storage. Stubs and metadata for these BLOBs are still kept in the content database. As a result, SharePoint still treats these BLOBs as part of SharePoint and can access them the same way. Users don't notice any difference because all BLOBs are still logically part of the content database.
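As a rough illustration of the stub idea, here is a minimal Python sketch. It doesn't use the real RBS API. ExternalBlobStore, ContentDatabase and their methods are hypothetical names invented for this example, and an in-memory dictionary stands in for the external storage.

```python
import hashlib


class ExternalBlobStore:
    """Hypothetical stand-in for external storage (a file share, for example)."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        # Use a content hash as the BLOB identifier (an assumption for this sketch).
        blob_id = hashlib.sha256(data).hexdigest()
        self._blobs[blob_id] = data
        return blob_id

    def get(self, blob_id: str) -> bytes:
        return self._blobs[blob_id]


class ContentDatabase:
    """Hypothetical content database that keeps only metadata and a stub per document."""

    def __init__(self, store: ExternalBlobStore):
        self._store = store
        self._rows = {}

    def save(self, name: str, data: bytes) -> None:
        # The bytes go to external storage; only the stub and metadata stay here.
        blob_id = self._store.put(data)
        self._rows[name] = {"size": len(data), "stub": blob_id}

    def open(self, name: str) -> bytes:
        # Callers ask the database as usual; the stub is resolved behind the scenes.
        return self._store.get(self._rows[name]["stub"])

    def row(self, name: str) -> dict:
        return dict(self._rows[name])
```

The caller only ever talks to ContentDatabase, which mirrors the point that users see no difference once the BLOBs have moved.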

Microsoft provides a default implementation called the RBS Filestream. However, this is fairly basic and doesn't provide a lot of flexibility for serious SharePoint users. Although it does offload BLOBs, it doesn't let you specify filters for which BLOBs to offload and which ones to keep in the database. You end up offloading all BLOBs whether you mean to or not. Also, it doesn't let you specify a remote storage location and stores BLOBs only on local storage on the SQL Server machine. It also has no administration or monitoring tools.

If you're a mildly serious SharePoint user, you'll want a lot more control and flexibility over BLOB handling and storage. There are some fairly decent third-party implementations of RBS that work perfectly fine and address SharePoint database size issues. You'll need to ensure any third-party implementation you use is 100 percent native .NET and not a patchwork of Java and .NET, which can cause compatibility issues.

One benefit of the RBS provider model is how it has opened SharePoint for third-party vendors to add more features through their RBS implementations. Anybody implementing an RBS provider has control over where BLOBs are stored. They can use this control to give SharePoint users more BLOB-related features beyond reducing database size and increasing performance. In this regard, there are four main areas of SharePoint enhancements:

  • Reducing storage costs through multitier BLOB storage
  • Archiving and retaining BLOBs for compliance purposes
  • Linking non-SharePoint document libraries to SharePoint
  • Incorporating in-memory BLOB caching

Multitier Storage
One of the main benefits of RBS is the ability to store BLOBs in multiple storage tiers instead of all in one location. This helps to significantly reduce storage costs. Without RBS, all BLOBs are kept in the SQL Server database. Even RBS Filestream uses SQL Server for BLOB storage. This single point of storage is usually something fairly expensive, such as a SAN/NAS disk.

BLOBs usually make up around 90 percent of all data in SharePoint, and you don't need frequent access to all of them. So when all BLOBs are stored in an expensive SAN/NAS disk array despite only requiring infrequent referencing, you end up spending a lot on storage that's inefficiently used.

A more reasonable strategy would be to store only the new and active documents (BLOBs) in expensive storage because users will need to access those more frequently and quickly. You can store the rest of the older and less-frequently used BLOBs in cheaper storage tiers.

One example of multitier storage is where you have new, active documents stored on an expensive SAN disk. Then you have a normal file server as the second tier and cloud storage as the third tier. SAN is the fastest and most expensive. Cloud is the slowest and the cheapest, and a file server is somewhere in between. Now as much as 80 percent to 85 percent of your BLOBs are no longer kept on expensive SAN storage. Those BLOBs don't need frequent access and therefore don't need high-availability, high-performance storage like SAN.

Now that you have multitier storage, the next question is how to determine which BLOBs to store in which storage tiers. You make this decision at two points. First, you determine a BLOB's location when it's first created (or, similarly, when it's first offloaded from the SQL Server database). Then you check periodically to see how old a BLOB has become and how frequently it's used.

This is where some RBS providers give you the option of BLOB filters. There are usually one or more BLOB filters associated with a storage tier. These indicate that only BLOBs matching these filters should be stored on a particular storage tier. All other BLOBs should be evaluated against other storage tiers and stored where the corresponding BLOB filters match.

If a BLOB doesn't match any storage tier's BLOB filters, then you can store it in the SQL Server content database. BLOB filters usually include document name, size, age, author and even content type. You can base these filters on custom meta-tags you've added or on file type.
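The filter evaluation described above can be sketched in a few lines of Python. This is only an illustration, not any vendor's API. The Blob and Tier classes, the choose_tier function, the tier names and the thresholds are all hypothetical. The sketch also assumes that a BLOB matches a tier when any one of that tier's filters matches, and that tiers are evaluated in order.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Blob:
    name: str
    size_bytes: int
    age_days: int
    author: str


@dataclass
class Tier:
    name: str
    filters: List[Callable[[Blob], bool]]

    def accepts(self, blob: Blob) -> bool:
        # Assumption: one matching filter is enough to place a BLOB on this tier.
        return any(f(blob) for f in self.filters)


def choose_tier(blob: Blob, tiers: List[Tier], fallback: str = "content-database") -> str:
    # Evaluate tiers in order; the first tier whose filters match wins.
    for tier in tiers:
        if tier.accepts(blob):
            return tier.name
    # No tier matched, so the BLOB stays in the SQL Server content database.
    return fallback


# Hypothetical tier setup: thresholds are placeholders, not recommendations.
TIERS = [
    Tier("san", [lambda b: b.age_days <= 30]),
    Tier("file-server", [lambda b: b.age_days <= 365]),
    Tier("cloud", [lambda b: b.name.endswith(".pdf"),
                   lambda b: b.size_bytes > 10 * 1024 * 1024]),
]
```

With this setup, a five-day-old document lands on the SAN tier, an old PDF goes to the cloud tier, and a small, old text file that matches nothing stays in the content database.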

Some RBS providers let you change filters on existing BLOB storage. They'll automatically readjust the BLOBs and move them from one storage tier to another based on the new BLOB filters. Other RBS providers only evaluate BLOB filters at the time of creation.

For periodic checking, some RBS providers use age-, version- or usage-based archiving. A SharePoint task runs in the background and inspects all BLOBs for age, version and usage pattern. Although age and version information is kept within the SharePoint content database, usage pattern information is maintained by the RBS provider in separate tables or in a separate database.

If an older BLOB or document suddenly becomes popular and requires more frequent access, some (not all) of these RBS providers will automatically move the BLOB back to more expensive storage based on usage. That way, you won't lose anything on performance. As a rule, the more expensive the storage is, the faster its response time, and the less expensive it is, the slower its response time.
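Here is a minimal sketch of what such a periodic background check might do, assuming the provider tracks each BLOB's age and recent read count. BlobRecord, target_tier, rebalance and every threshold are hypothetical and chosen only for illustration. Real providers differ in what they measure and how.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class BlobRecord:
    name: str
    tier: str
    age_days: int
    reads_last_30_days: int


def target_tier(rec: BlobRecord, hot_reads: int = 20,
                warm_age_days: int = 90, cold_age_days: int = 365) -> str:
    # Heavy recent usage overrides age: popular BLOBs belong on fast storage.
    if rec.reads_last_30_days >= hot_reads:
        return "san"
    if rec.age_days >= cold_age_days:
        return "cloud"
    if rec.age_days >= warm_age_days:
        return "file-server"
    return "san"


def rebalance(records: List[BlobRecord]) -> List[Tuple[str, str, str]]:
    """Return (name, old_tier, new_tier) for every BLOB that had to move."""
    moves = []
    for rec in records:
        new_tier = target_tier(rec)
        if new_tier != rec.tier:
            moves.append((rec.name, rec.tier, new_tier))
            rec.tier = new_tier
    return moves
```

Checking usage before age is the design choice that lets a suddenly popular old document move back up to fast storage.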

Archiving BLOBs
Another important RBS capability is archiving and retaining certain documents and BLOBs in a separate archive for a predetermined period, and then having them automatically deleted from SharePoint.

Archiving BLOBs for compliance ensures they're not accidentally deleted from SharePoint. Many RBS providers don't have BLOB retention. They use the word "archive" to mean moving BLOBs from one storage tier to another. But some RBS providers have BLOB retention features.

In many situations, you'll have documents you must preserve for a specified period of time either for legal or company policy compliance reasons. For example, many industries preserve contracts and agreements for a five-year period. Such compliance is common in financial, insurance and other similar industries. This could, however, apply to almost any company.

In these compliance situations, you must have a way to keep a copy of the document separately. That way, even if it's deleted from SharePoint, you can restore it from the retention archive. Even if a user tries to delete such a document, you have a preserved copy of it for a number of years for legal compliance or company policy reasons.

There can be other compliance requirements that call for automatically deleting certain documents after a certain period of time. For instance, for legal reasons, there may be financial information you don't want to keep beyond a certain period. Those documents can be automatically deleted based on rules you've defined.

Any RBS provider that gives you retention archiving lets you specify separate storage, which is an archive primarily for retaining documents. This archive isn't offline in the literal sense. Rather, it's kept out of all regular SharePoint operations and protected from regular access. Only you as the SharePoint administrator can access such a retention archive.

Now, you don't want to put all your SharePoint documents in a retention archive. That's why a good RBS provider should give you the ability to specify filters for which documents to put in the retention archive. These filters could be based on document name, document extension, document size, content type, user ownership and more.

The idea is to give you real control over specifying which documents you want to archive.
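To make the retention idea concrete, here is a small Python sketch under stated assumptions. RetentionArchive, Document and their methods are hypothetical names, not part of RBS or any product. The sketch assumes a single filter and a single retention period, and that expired copies are purged by a scheduled task.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Callable, List


@dataclass
class Document:
    name: str
    content_type: str
    created: date


class RetentionArchive:
    """Hypothetical retention archive: keeps protected copies until they expire."""

    def __init__(self, matches: Callable[[Document], bool], retention_days: int):
        self._matches = matches
        self._retention = timedelta(days=retention_days)
        self._copies = {}  # name -> (expiry date, data)

    def on_save(self, doc: Document, data: bytes) -> None:
        # Only documents that pass the filter get a retained copy.
        if self._matches(doc):
            self._copies[doc.name] = (doc.created + self._retention, data)

    def holds(self, name: str) -> bool:
        return name in self._copies

    def restore(self, name: str) -> bytes:
        # Works even if the document was deleted from the main store.
        return self._copies[name][1]

    def purge_expired(self, today: date) -> List[str]:
        # Automatic deletion once the retention period has passed.
        expired = [n for n, (expires, _) in self._copies.items() if expires <= today]
        for n in expired:
            del self._copies[n]
        return expired
```

For example, RetentionArchive(lambda d: d.content_type == "Contract", retention_days=5 * 365) would retain only contracts, for roughly the five-year period mentioned earlier.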

Non-SharePoint Document Libraries
Your non-SharePoint document libraries can be of virtually any variety -- from a simple file share to other document-management systems. With an RBS provider, you can link these documents into SharePoint. The SharePoint crawler indexes them, letting you and other SharePoint users search for, locate, open, read and edit any document.

This capability normally doesn't exist in SharePoint. With an RBS provider that supports it, you can use SharePoint to share and access documents that aren't SharePoint documents.

In most midsize and large enterprises, SharePoint isn't the only document-management system. Even if a few departments or divisions move to SharePoint, not everybody will move at once. You'll have a situation where there are documents in various other document-management systems, and the owners of those document-management systems aren't yet ready to move everything to SharePoint.

SharePoint users would love to be able to access those documents from within SharePoint. The alternative is to have your SharePoint users log in to each document-management system separately, search the documents, open them for edit, and independently check them in and out.

A more convenient way is to have everything accessible from within SharePoint. Even if documents are owned by another document-management system, you can access them with SharePoint. RBS providers perform this operation by tricking SharePoint into thinking these external or non-SharePoint document libraries are part of SharePoint.

RBS providers let you specify the locations of these external document libraries. Then they iterate over these documents and create metadata for each document in SharePoint. RBS providers then make SharePoint think these are SharePoint documents, but the actual BLOB document is stored in outside storage.

When the SharePoint crawler indexes all the documents based on keywords, it also finds these documents as normal SharePoint documents (as far as the metadata is concerned). That's why the crawler can index them like all other SharePoint documents.

When SharePoint users search within SharePoint, they find these documents as well. SharePoint users can open the documents for reading or editing, and then check them back in. What they won't realize is that when they check them back in, the updated document is stored back to the original location -- those external or non-SharePoint document libraries.

Any changes are also available to other document-management systems because the RBS provider saved those changes at the source location.
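The link-and-write-back flow can be sketched as follows. This is a simplified illustration, not how any particular RBS provider is implemented. ExternalLibrary, link_library and check_in are hypothetical names, and a dictionary stands in for the external document-management system.

```python
class ExternalLibrary:
    """Hypothetical stand-in for a non-SharePoint document store."""

    def __init__(self, documents):
        self.documents = dict(documents)  # path -> bytes


def link_library(library, source_name):
    """Create one metadata record per external document; no content is copied."""
    return {
        path: {
            "title": path.rsplit("/", 1)[-1],
            "source": source_name,
            "size": len(data),
        }
        for path, data in library.documents.items()
    }


def check_in(library, metadata, path, new_data):
    # The updated bytes go back to the original location, not into SharePoint.
    library.documents[path] = new_data
    # Only the metadata on the SharePoint side is refreshed.
    metadata[path]["size"] = len(new_data)
```

Because check_in writes to the source library, anything else that reads from that library sees the change too.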

A good RBS provider lets you implement pluggable custom storage for these non-SharePoint documents. If you already have a custom document-management system that stores all the documents in an Oracle database and you know the schema of that database, you can implement a custom plug-in for this external storage.

Implementing a custom plug-in usually involves writing some .NET code and registering your .NET assembly with SharePoint Web front-end (WFE) servers. This plug-in gives SharePoint the ability to iterate storage, check out and check in documents, and fetch documents for reading.
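A real plug-in would be a .NET assembly written against your RBS provider's own interfaces, which vary by vendor. Purely to show the shape of the four operations named above, here is a language-neutral sketch in Python. ExternalStoragePlugin, FolderPlugin and all method names are hypothetical, and a plain folder stands in for the custom storage.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Iterator


class ExternalStoragePlugin(ABC):
    """Hypothetical plug-in contract: iterate, fetch, check out, check in."""

    @abstractmethod
    def iterate(self) -> Iterator[str]: ...

    @abstractmethod
    def fetch(self, doc_id: str) -> bytes: ...

    @abstractmethod
    def check_out(self, doc_id: str, user: str) -> bytes: ...

    @abstractmethod
    def check_in(self, doc_id: str, user: str, data: bytes) -> None: ...


class FolderPlugin(ExternalStoragePlugin):
    """Example implementation backed by a folder on disk."""

    def __init__(self, root):
        self._root = Path(root)
        self._locks = {}  # doc_id -> user holding the checkout

    def iterate(self):
        # Yield every file under the root as a relative, slash-separated ID.
        for p in sorted(self._root.rglob("*")):
            if p.is_file():
                yield p.relative_to(self._root).as_posix()

    def fetch(self, doc_id):
        return (self._root / doc_id).read_bytes()

    def check_out(self, doc_id, user):
        holder = self._locks.setdefault(doc_id, user)
        if holder != user:
            raise PermissionError(f"{doc_id} is checked out by {holder}")
        return self.fetch(doc_id)

    def check_in(self, doc_id, user, data):
        if self._locks.get(doc_id) != user:
            raise PermissionError(f"{doc_id} is not checked out by {user}")
        (self._root / doc_id).write_bytes(data)
        del self._locks[doc_id]
```

A plug-in for a database-backed system would implement the same four methods with queries against that system's schema instead of file operations.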

In-Memory BLOB Caching
RBS providers can also incorporate in-memory BLOB caching. Whenever BLOBs are fetched from the SQL Server content database or from external BLOB storage, they're cached in memory on the WFE server. The next time a user wants the same document, it's served from this in-memory cache. This is many times faster than going to the BLOB storage. Caching is made possible by the RBS providers that are controlling the BLOBs.
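As a minimal sketch of the read-through caching idea, the Python below keeps recently fetched BLOBs in memory and evicts the least recently used ones when a byte budget is exceeded. BlobCache and its parameters are hypothetical, and the eviction policy is an assumption for this example. Actual caching products make their own choices here.

```python
from collections import OrderedDict


class BlobCache:
    """Hypothetical read-through cache with least-recently-used eviction."""

    def __init__(self, fetch_from_storage, max_bytes):
        self._fetch = fetch_from_storage  # slow path: database or external storage
        self._max_bytes = max_bytes
        self._items = OrderedDict()       # blob_id -> bytes, oldest first
        self._used = 0
        self.hits = 0
        self.misses = 0

    def get(self, blob_id):
        if blob_id in self._items:
            # Cache hit: mark as most recently used and skip the storage trip.
            self._items.move_to_end(blob_id)
            self.hits += 1
            return self._items[blob_id]
        self.misses += 1
        data = self._fetch(blob_id)
        # BLOBs larger than the whole budget are served but never cached.
        if len(data) <= self._max_bytes:
            self._items[blob_id] = data
            self._used += len(data)
            # Evict least recently used entries until we're back under budget.
            while self._used > self._max_bytes:
                _, evicted = self._items.popitem(last=False)
                self._used -= len(evicted)
        return data
```

Only the first request for a document pays for the trip to storage. Repeat requests are answered from memory until the entry is evicted.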

Once you have in-memory caching plugged in, you can cache BLOBs, SharePoint lists and ViewState, as well as the ASP.NET Session State that SharePoint sometimes uses. Caching lists and BLOBs significantly improves your response time because SharePoint is no longer making these expensive database or data storage trips.

Caching ViewState reduces the payload returned by WFE servers to your browser. This reduces bandwidth consumption and also improves SharePoint response time -- especially if your users are accessing SharePoint across the WAN. Caching Session State lets you replicate Session States and prevent data loss, in addition to improving performance and scalability.

Caching also improves scalability. With caching, performance remains high as you add more users. That's not the case otherwise. SharePoint is a database-intensive application. It makes so many database trips that as you increase the number of users, the database load increases and the database becomes the bottleneck.

SharePoint is a remarkably valuable platform. It's not perfect, but it's certainly open to improvement. RBS was originally meant only to help reduce database size and improve SharePoint performance. However, it has opened the gates for a lot of innovation.
