Search the World Over
How one Solution Provider is using Site Server 3.0’s powerful search capabilities to bring Web content directly and continually to employees.
- By Larry Cooper
- 11/01/1998
Maybe I should start this article with the moral, which
is: Automatically add at least one more week to
your project plan the first time you try this.
Like many consultants and trainers, I have a weakness:
I love playing with new technologies, especially those
put out by Microsoft. Somehow, when that happens, the
MCSE lobe of my brain goes into high gear, and I start
speaking in tongues. Another little problem: I love trying
something no one else has done before, at least no
one in my corner of the world.
Enter: Microsoft Site Server 3.0. Exit: Most of my time
in the month of July.
The idea was simple: Deploy a platform to allow my company,
QuickStart Technologies, to move our knowledge management
plans forward. As a service company, we're constantly
looking for ways to capture our shared experiences and
best practices and to share that knowledge with our entire
staff. Also, management of knowledge at QuickStart was
made more difficult by our inability to search all relevant
sources of information in our enterprise from a single
point. As with most networks, we'd have to look in
various file shares, search Exchange public folders separately,
and search each Web site (internal and external) separately.
In light of those challenges, Site Server 3.0 seemed the
platform to help us meet our business needs.
The project would be deployed in several phases. Phase
one: Deploy the platform (that's MCSE-speak for "set
up the box"). Phase two: Implement a Search Server solution
to catalog the company's known universe (internal
resources, the intranet, public Web sites, and public
folders) and some important sites on the Internet. This
would allow employees to conduct searches on a set of
knowledge specifically pertaining to our business. While
many Internet search engines would give us lots of Internet
content, this would give us the one-stop shopping we wanted.
Naturally, I jumped at the chance to oversee the project.
Site Server 3.0 actually contains more than just search
capabilities. Site Server's core centers on content
publishing and management, search, and content delivery.
(For a description of the individual components, see John
West's article "Broaden Your Sites: The Site Server 3.0 Story"
in this issue.) Ultimately, our goal would be to use the content
management capabilities to capture and categorize content,
use search to give our employees access to all available
content, and then use the content delivery mechanisms
to personalize and monitor the user experience: what
the user actually sees at the site.
How Hard Could Setup Be?
The first thing we did, believe it or not, was to read
the directions. Site Server comes with an excellent online
set of documentation. Without a doubt, the docs saved
me hours of problems. That's because, as with many
of Microsoft's most complex products, Site Server
requires a particular order to its installation. You'll
want to follow the order suggested, partly out of necessity
(one product will look for another), and partly out of
a desire not to start the process over again (trust me
on this one).
The first thing I did was to set up our server. You know,
the standard stuff: Windows NT Server 4.0 with Service
Pack 3. I even gave it a creative NetBIOS nameTribalKnowledgethat
leveraged all 15 characters available. Big mistake, but
more on that later.
I opted for the de facto standard for Web servers: a
stand-alone box. I could easily set up the platform as
a stand-alone entity first, then merge it into the domain
that the server would eventually call home
(more on that later too). Since I'm not a domain
administrator, this was also the least painful route.
Many Microsoft products require large amounts of disk
space, processor power, or memory. Site Server doesn't
discriminate; it likes all three. My server (a Pentium
II 333MHz) currently houses about 128M of RAM (and I'm
in the process of doubling that), and about 6G of storage
space.
Because many of Site Server's complementary programs
(such as IIS 4.0) were going to need to throw things in
the system partition, I opted for a large system partition
(2G) and used the rest of the disk as a single large data
partition (4G). In general, I'd recommend that you
install Site Server in a separate partition from your
\WINNT directory. Site Server likes to store things in
its own directory by default (search catalogs, etc.) and
it's much easier to deal with that issue if you install
Site Server in a data partition.
Microsoft recommends a minimum virtual memory size of
128M. That worked fine for me for a while, about three
days of active searching, to be exact. Then those nasty
little "out of virtual memory" errors sprang
up as I started to implement a search. After that, I more
than doubled the size of virtual memory, to 256M, and
all has been well since. I'd recommend that you double
the minimum requirement if you're doing any memory-
or processor-intensive tasks, like Search or Analysis, with
Site Server.
Installing the Database
The next step involved installing SQL Server 6.5. While
this didn't seem like a necessity at the time (I
could have used a remote SQL Server or local Access database),
I decided to go with experience and avoid the local Access
database. The local SQL Server has also made prototyping
a dream. I can do all my later work with the Analysis
and Membership databases from the confines of a single
server. While the database performance wasn't as
fast as I'd want it to be, I also didn't need
to wait for an act of Congress for my DBA to approve the
addition of a new table. Later on, I moved all my database
content to a separate SQL Server and simply changed the
pointers on my ODBC references.
Installation was easy, since I read the directions a
couple of times. Not only did I need to install SQL using
the Local System Account security context, but I needed
to increase the Master Database Size to 50M and activate
TCP/IP connectivity. SQL Server installed beautifully.
Then came the requisite patches: SQL Server Service Pack
4 (unlike the beta, SP 3 won't do it), and a file
called SQLSERVR.EXE, on the Site Server 3.0 CD. I needed
to replace the file in the MSSQL\BINN (not BIN) directory.
Once again, read all the release notes and documentation!
At this point, all I needed to do was create my data
devices and databases, update the ODBC drivers from Microsoft's
Web site, create data source names (DSNs) that point to
my database, and tune SQL Server according to the documentation.
No headaches there, since everything was fairly self-explanatory.
Honestly, none of it is required for Site Server Search;
but I figured it would be better to have everything in
place for using content management and personalization
down the road.
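If you want a quick sanity check that a DSN actually connects before moving on, a tiny ASP page does the trick. The sketch below is mine, not anything that ships with Site Server; the DSN name, login, and blank password are placeholders for whatever you created in the ODBC applet.

<%@ Language=VBScript %>
<%
' Quick connectivity test against a system DSN created for Site Server.
' "SSAnalysis", "sa", and the blank password are placeholders; substitute
' the DSN and SQL Server login you actually set up.
Dim cn
Set cn = Server.CreateObject("ADODB.Connection")
cn.Open "DSN=SSAnalysis;UID=sa;PWD=;"
Response.Write "Connection to the Analysis DSN succeeded."
cn.Close
Set cn = Nothing
%>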
Then I installed Internet Explorer 4.01 (a prerequisite).
I left out the Active Desktop, since it wasn't necessary.
Figure 1. QuickStart's application of Site Server 3.0 manages the corporate knowledge base on a platform of NT Server and IIS 4.0. It uses the Catalog Build Server to maintain information culled from several resources, including Web sites and network files. Search statistics and user preferences reside in a SQL Server database as an analysis database and personalization database, respectively.
Finally, I installed Option Pack 4.0. The little "helpful
hint" in the documentation mentioned that I
should also install Transaction Server and Index Server,
so who was I to argue? Next time, I might do a little
arguing. Index Server 2.0 is invaluable for much of the
intranet functionality of Site Server 3.0 (specifically
content management), but I was able to disable it from
the Services Control Panel applet. You'll need to
get all the way down in the Search documentation for that
little tidbit. I've gotten much better system performance
on my dedicated Search Server without Index Server.
The next step required a little digging as well. I needed
to activate the FrontPage extensions that
shipped with the NT Option Pack. So, I opened up the FrontPage
Server Administrator and installed the FrontPage Extensions.
Funny, I thought I just did that as part of installing
Option Pack.
Then, I installed Visual InterDev, FrontPage 98, and
the FrontPage 98b patch from Microsoft's Web site.
I also decided to go to the Microsoft site and update
the FrontPage Server extensions. Call me paranoid.
After all this, you're probably thinking, "So
would you go ahead and install Site Server already?"
What's in a Name?
Site Server installation seemed easy. I created two
Administrator-level accounts on the server, one for the Publishing Component
and one for the Search Component. Click Setup, answer
a couple of questions, and you're off! And then
BOOM!
LDAP Error!
If you're unfamiliar with it, LDAP is the Lightweight
Directory Access Protocol, the RFC that will solve all
of your worries. Actually, it's a great protocol
for accessing directory information, particularly the
Site Vocabulary that Site Server uses for knowledge management
and the membership directory components. It's also
the new standard appearing in Windows NT 5.0. In this
case, LDAP was going to help provide personalization for
my search Web pages, so I needed it to be functional from
the start.
Naturally, still being the worrier that I am, I started
to rebuild the server. I figured that it was me, or
something I'd done wrong. BOOM! Still no luck.
At this point, I want to thank the Microsoft PSS engineer
who spent three hours on the phone with me, to no avail,
trying to diagnose the problem. I finally did some digging
and realized I had used the name "INSTRUCTOR" (it's
the MCT in me) during one of my successful installations
earlier that month. It turns out that Site Server doesn't
like a 15-character name, like "TribalKnowledge."
In fact, it made LDAP initialization fail. With a 10-character
name, like "Instructor," no problem. So, some
20 installations later, I changed the rules as well as
the name of my server. Success!
Now all I had to do was configure my server and create
a Web site. No problem, right?
Since I now had a functioning box, it was time to request
the assistance of our IS staff. They "opened the
door" for me and created a machine account in the
domain using Server Manager. After successfully adding
the Site Server machine to its new home domain, I had
our IS staff create an "access" user account
in the domain for Site Server to search resources on my
own internal network. For simplicity's sake, I gave
it the same name and password as the Search Service account
on my local server. Unlike the local account, it didn't
have administrator-level privileges. No sense creating
a back door into the domain.
In Search of...?
Our first step in creating a Search Server solution was
to configure the Catalog Build Server. From the MMC, I
had to go into the properties of the Catalog Build Server.
As I mentioned earlier, I created a content access account
on my home domain. I had to specify the account (DOMAIN\USERNAME)
and password.
The hardest partand the part that required the
most planningwas the definition of our catalogs.
A catalog is simply all the data that you decide to index.
The Catalog Build Server uses a catalog definition by
which to index or "crawl." Ultimately, that
catalog gets handed off to the Search Server
service, which will query it according to user requests.
I decided to split our search into four different catalogs:
Microsoft, External, Internal, and Exchange. I figured
out this trick after creating one large catalog that took
about 14 hours to build each night. Since I'm using
the same server to build the catalog and run the searches,
I needed a more efficient way of searching.
The internal resources catalog was actually the easiest
to build. I started by going through each of our servers,
enumerating shared directories. I then set up a file crawl
in the search for each file share. Our content access
account needed read access to each of those shared directories.
I had two choices: manually give the content access account
read access to each of the directories and their shares
or add the content access account to existing groups that
have read access. Just wait until you ask your IS manager
to have an account added to every single group except
administrators.
Important safety tip:
Treat your content access account password like you would
your Administrator or other BackOffice account passwords.
Keep it a secret! It is, however, very safe if you're
using NTFS security (and I hope you are). The Catalog
Builder not only reads the file, but also reads the ACLs
(permissions) on the file. Your search will never give
anyone improper access to a file if you use Basic or NTLM
authentication on your server. It will check your name
against the ACLs in the search and produce only accessible
references.
Since we host our Internet presence on-site, I also included
our site as a Web crawl in the search. In order to decrease
the time it takes to index our entire site, I increased
the number of documents that the Catalog Build Server
could grab simultaneously. Normally, it's five at
a time. That assumes you're hitting sites over the Internet
(50K to T1 bandwidth), not a 100BaseT connection. In
the Catalog Build Server properties (the server object
itself, not the individual search), I changed the Maximum
Request Frequency on the Timing tab to include *.domain.com
as Unlimited. Now, the nightly crawls on that site are
finished before daybreak on the East Coast.
Our intranet was also a little different in terms of
configuring the crawl. Since I wanted to include the ACLs
on each of our Web pages, I had to do a little creative
configuring. Normally, you'd do a Web crawl on an
HTTP server. In this case, I did a file crawl.
I started by exposing the Inetpub\wwwroot directory as
a hidden sharepoint. I then set up a file crawl for each
subdirectory that housed a virtual Web (\\server\sharepoint$\subweb).
The trick is to set up a virtual mapping that
maps each UNC file name to a location on your Web server.
In the Mappings area (the URL tab in your individual Search
properties), I added \\server\sharepoint$ maps to http://server.
It worked like a charm. Managers get to all the secret
stuff, and no one else is the wiser. At least, not
until this article appears...
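To visualize what that Mappings entry buys you, here's a purely illustrative VBScript fragment showing the rewrite that happens when a file-crawled document comes back in the results. The function is mine, not part of Site Server; the server and share names simply echo the placeholders above.

<%
' Illustration only: the UNC-to-HTTP rewrite the Mappings entry performs
' on results from the hidden-share file crawl. Not Site Server code.
Function MapUncToHttp(uncPath)
    Dim httpPath
    httpPath = Replace(uncPath, "\\server\sharepoint$", "http://server")
    MapUncToHttp = Replace(httpPath, "\", "/")
End Function

Response.Write MapUncToHttp("\\server\sharepoint$\hr\benefits.htm")
' Result: http://server/hr/benefits.htm
%>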
In order to maximize our internal searches, we set the
frequency to one full crawl per night and one incremental
crawl in the afternoon.
10 Things to Remember
- Test everything. From your browser,
enter every base URL you intend to
crawl. If you can't see it in
Internet Explorer, Site Server won't
see it either.
- Crawl internal Web servers using
file crawls. This will preserve any
NTFS security you have in place. Then
map the crawls to HTTP access.
- Plan. Write out everything you plan
to search. Group items into catalogs
based on like content and frequency
of changes.
- Keep your computer names short.
You'll eventually want to run
LDAP for personalization support.
- Make friends with your Exchange
Administrator and DBA. You'll
need their help to get everything
connected and running.
- Read the documentation.
- Limit the crawls on big sites using
site rules. Only crawl those directories
of interest to your audience.
- Back up your search configuration.
Back up your Catalog Builder settings
(it's a task in the MMC) and
each of your Catalog Definitions (also
a task in the MMC).
- Avoid installing Site Server into
the system partition. You may find
your catalogs filling all available
disk space and killing your server.
- Use site indexes when available.
They'll give you better searches
than default or home pages.
Larry Cooper
Searching Exchange
The next step was to create a catalog that would browse
our Exchange public folders on a frequent basis during
the day.
The hardest part of this was the proper configuration
of both Site Server and Exchange servers. Our Content
Access account had to be granted Administrator rights
on the Configuration Object in our Exchange site. Doing
it was easy. Convincing our e-mail admin that it was safe
was entirely different. I also had to go into my Services
Control Panel Applet on my Site Server machine and change
the Site Server Search service to run under that account.
The tricky part was the configuration of my Search host's
Exchange information. The documentation is somewhat lacking
as to the format of the name to use in this dialog. My
success came from typing in the name of the Exchange Server.
Not the UNC name with the NetBIOS backslashes or the fully
qualified domain name (server.domain.com). Took me a few
times to get it right. I entered the same information
for my Outlook Web Access (OWA) server name. The Site
Server Search engine assumes that you're hosting
OWA in the http://server/exchange directory (the default
for Exchange 5.x).
Searching the Rest of the World
From a standpoint of working with all the other sites
on the Internet, I learned a couple of lessons the hard
way.
First, I've found out that I'm a lazy Web surfer.
I type www.domain.com and expect a home page to come up.
Unfortunately, when you're indexing, you may not
want to start at the home page. Instead, find the site
index and have your crawl actually start on the site index
(for example, http://www.domain.com/index.htm). That will
give you 100 percent of the site's information (assuming
the index is up-to-date). Also, don't guess! Use
your browser, and actually type in the entire URL to the
site. Then cut and paste it into your search catalog definition.
Since Web pages can be .htm, .html, .asp, and so forth,
it's important to start off with the correct page
name and extension.
The other thing I learned is that sites can be really big.
Yep... I transgressed the borders of Olympus and tried
to index parts of the Microsoft site. And to my surprise,
it worked very well. Too well, in fact. Because http://www.microsoft.com
is so interwoven, I started getting lots of content that
wasn't in English or that I had no interest in. I'm
sure that many people enjoy Microsoft Golf, but I didn't
need to spend time indexing its product homepage.
To solve this problem, I told my search catalog to crawl
hyperlinks under http://www.microsoft.com and then told it not
to crawl www.microsoft.com. I'll explain.
In your search catalog's properties, you have a
General tab where you set up a site crawl.
It starts on your defined URL and starts crawling. In
this case, I set it to crawl www.microsoft.com/siteserver/default.asp
and www.microsoft.com/exchange/default.asp.
To limit the site crawl, you'll need to go to the
Sites tab and set rules. In my case, I set
up www.microsoft.com as a site, then created a few rules
within the site. The first rule: to avoid crawling www.microsoft.com.
I then set up two rules enabling crawling of the www.microsoft.com/siteserver
and www.microsoft.com/exchange directories. As a final
note, be sure to demote the "avoid crawling"
rule to the bottom of your list. That way, you get the
directories you want and avoid the directories you don't
want.
Once I had gone through all of the trials of building,
testing, and scheduling my catalogs, I had to create an
interface to give my audience access to the data.
While you may opt for using the Knowledge Manager application
and creating pre-built searches (briefs), I decided to
take the grassroots approach and use the sample site that
came with Site Server 3.0. You'll find it in the
Microsoft Site Server\SiteServer\Knowledge\Search directory.
Be sure you grab everything (including subdirectories)
in that sample.
I literally copied every ounce of the code into my Web
server's root directory. I then created a new default.asp
frameset to accommodate my site's banner and the
two sample pages (searchleft.asp and searchright.asp).
Then I was off and running.
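For the curious, the frameset itself is nothing fancy. Here's roughly what my default.asp looks like; banner.asp, the title, and the frame names are my own invention, while searchleft.asp and searchright.asp come straight from the Site Server sample.

<html>
<head><title>QuickStart Knowledge Search</title></head>
<!-- banner.asp and the layout are specific to our site; searchleft.asp
     and searchright.asp are the pages copied from the Site Server sample. -->
<frameset rows="80,*">
  <frame name="banner" src="banner.asp" scrolling="no">
  <frameset cols="35%,*">
    <frame name="criteria" src="searchleft.asp">
    <frame name="results" src="searchright.asp">
  </frameset>
</frameset>
</html>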
I've found that Site Server Search's object
model is extremely easy to learn and follow. If you primarily
concentrate on infrastructure and the systems side of
life, you'll still be able to decipher the VBScript
code used to build these pages. I found it to be a little
easier than using the .ASP pages for Index Server. If
you're new to VBScript, go to www.microsoft.com/train_cert/download/download.htm
and get the free self-paced course, "Essentials of
Visual Basic Scripting Edition 3.0 for Web Site Development."
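To give you a feel for what those sample pages do under the hood, here's a stripped-down query page. Treat it as a sketch from memory of the Search query object rather than gospel: double-check the exact property and column names against the shipped samples and the documentation, and substitute one of your own catalog names for "Internal."

<%@ Language=VBScript %>
<%
' Minimal Search query page -- a sketch, not the shipped sample code.
' "Internal" is one of my catalogs; verify property and column names
' against the samples before relying on them.
Dim q, rs
Set q = Server.CreateObject("MSSearch.Query")
q.Catalog = "Internal"
q.Query = Request.QueryString("q")   ' for example, search.asp?q=backoffice
q.Columns = "DocTitle, DocAddress, Description"
q.MaxRecords = 25
Set rs = q.CreateRecordSet("sequential")
Do While Not rs.EOF
    Response.Write "<a href=""" & rs("DocAddress") & """>" & _
                   rs("DocTitle") & "</a><br>"
    rs.MoveNext
Loop
%>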
The Moral of the Story Is...
As you know if you've been paying much attention
to the industry, "knowledge management" is a
buzzword for the new millennium. And because Site Server
3.0 fits into Microsoft's knowledge management strategy,
it's a product you can expect to hear more and more
about.
As you've seen from my story, the implementation
of Site Server Search is, in theory, a simple proposition.
What it requires, though, is a full understanding of your
environment and a decent amount of planning. (Or, if you're
like me, little or no planning and a decent amount of
time.)
Hence, the moral of my story...