Wednesday, 25 February 2009

Rolling out from the hills...

Sometime last week we rolled out our ID manager application to production. Before the rollout we held a change management meeting with the client to cover how the rollout would happen and which parts of the client's infrastructure would be affected and needed to be included. The standard basic stuff.

We had already prepared documentation of the rollout procedure with an estimated time frame, and based on it we ran a rollout simulation. The simulation went well: we captured some information the client hadn't given us but that we needed for the rollout, and we found some problems with our deployment scenario. But as usual, there is always something on D-day :)

During the rollout, the parts we had simulated ran smoothly. A few minor things happened, but nothing that worried us much. The problems started when we began installing the gateway and password sync on Active Directory. FYI, this was the one part of the rollout we hadn't simulated, because we don't have our own private AD server, so we had only used the AD resource in the UAT environment. It turned out to be a critical point.

As soon as we started the installation we ran into problems, and I mean real problems: the installers showed errors. Installing the gateway and password sync reported that the administration account needed more privileges, which was weird, because both had installed fine in our UAT environment.

It turns out that on Windows Server 2008, because of User Account Control, you have to explicitly run the command prompt as administrator: right-click the command prompt icon and choose the Run as administrator option. We never needed this in our UAT environment because it runs Windows Server 2003.
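If you script your installation, a pre-flight check can catch this before the installer fails halfway. Below is a minimal sketch in Python, assuming Python is available on the server (it is not part of the IDM install); is_elevated is a hypothetical helper name. On Windows it asks shell32's IsUserAnAdmin, which returns false in a prompt that wasn't started with Run as administrator, even when the account itself is an administrator.

```python
import ctypes
import os
import sys


def is_elevated() -> bool:
    """Return True if the current process runs with administrator rights.

    Hypothetical pre-install check. On Windows it calls shell32's
    IsUserAnAdmin(), which under UAC (Vista/Server 2008 and later) is
    False for a prompt not started with "Run as administrator".
    On other platforms it falls back to checking for root (uid 0).
    """
    if sys.platform == "win32":
        try:
            return bool(ctypes.windll.shell32.IsUserAnAdmin())
        except (AttributeError, OSError):
            # Could not query the shell API: treat as not elevated.
            return False
    return hasattr(os, "geteuid") and os.geteuid() == 0
```

Call is_elevated() at the top of the install script and, if it returns False, stop with a message telling the operator to reopen the command prompt with Run as administrator.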

This points to another problem: because the UAT AD environment isn't exactly the same as production, we lacked confidence whenever we faced compatibility/versioning issues.

One other thing worth mentioning is the JMS setup. We're using the WebLogic application server. In our UAT environment we had set up two server instances, but production has only one, so the JNDI name for the JMS had to be set differently.

The biggest problem, which took us days, was reconciliation. After successfully installing and setting up the new resources, uploading our code, and executing the clean-up script, we were finally at the last three steps of our rollout plan. Reconciliation was the step that took the longest, since we were reconciling around 20,000 users.

The first reconciliation failed after a couple of minutes because the error log went over the 100-error threshold. The error message we received was "ADsOpenObject(): 0X8007203A: , , The server is not operational." When we ran the test resource connection, it turned out the connection to the AD server was intermittent: the domain controller we were using was in a different geographical location from our Identity Manager server. That shouldn't have been a problem, but we decided to switch to a domain controller located in the same site as our Identity Manager server and restart the reconciliation.
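Why an intermittent link is so deadly for a big run: the error count grows with the number of users processed, not with time. The sketch below is not Identity Manager's code; reconcile, process, max_errors and ReconciliationAborted are hypothetical names that only model the abort-on-threshold behavior described above. With 20,000 users, a failure rate of just 1% already means around 200 errors, twice the limit.

```python
from typing import Callable, Iterable


class ReconciliationAborted(Exception):
    """Raised once the error count passes the threshold."""


def reconcile(users: Iterable[str],
              process: Callable[[str], None],
              max_errors: int = 100) -> int:
    """Run `process` for every user and return how many succeeded.

    Each failure is counted; as soon as the count goes over `max_errors`,
    the whole run is abandoned, no matter how many users are left.
    """
    succeeded = 0
    errors = 0
    for user in users:
        try:
            process(user)
            succeeded += 1
        except ConnectionError as exc:
            # Stands in for a transient failure such as
            # "ADsOpenObject(): 0X8007203A: The server is not operational".
            errors += 1
            if errors > max_errors:
                raise ReconciliationAborted(
                    f"aborted after {errors} errors (limit {max_errors})") from exc
    return succeeded
```

Exactly 100 errors still lets the run finish; the 101st aborts it. That is why moving to a domain controller on the same site, which removes the intermittent failures at the source, was a better fix than simply retrying.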

The painful thing about reconciliation is that it needs at least one successful full reconciliation before it can do an incremental one. After waiting a couple of hours, the reconciliation failed at around 50%, and this time it took the AD service down with it. Now we were really in trouble. We tried other workarounds, like using a bulk action and loading from file, but those didn't succeed either. We also couldn't confidently blame the AD side, since that environment was a bit different from our UAT, and because we were using IDM 6.0 SP1, which had no official support for Windows Server 2008. But suggesting an upgrade at a time like this would have been suicide, and it would have taken even longer to implement. So we were stuck.
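The full-before-incremental rule makes sense if you think of incremental reconciliation as a diff against a stored baseline. A minimal sketch, assuming the baseline is a snapshot mapping account IDs to a fingerprint of their attributes that is saved only when a full run completes; plan_run and incremental_changes are hypothetical names, not IDM's API.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical baseline: account id -> fingerprint (e.g. hash) of its attributes.
Snapshot = Dict[str, str]


def plan_run(baseline: Optional[Snapshot]) -> str:
    """Incremental work is a diff against the baseline, so without a
    completed full run (no saved baseline) only a full run is possible."""
    return "full" if baseline is None else "incremental"


def incremental_changes(baseline: Snapshot,
                        current: Snapshot) -> Tuple[List[str], List[str], List[str]]:
    """Return (added, removed, modified) account ids relative to the baseline."""
    added = sorted(current.keys() - baseline.keys())
    removed = sorted(baseline.keys() - current.keys())
    modified = sorted(k for k in current.keys() & baseline.keys()
                      if current[k] != baseline[k])
    return added, removed, modified
```

Under this model, a full run that dies at 50% never saves a baseline, so plan_run keeps returning "full" and every new attempt has to start over with all 20,000 users.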

After many long nights, a successful test with Windows 2008 in the UAT environment, and a support ticket, we finally managed to complete a full reconciliation, but only after we restarted the gateway server.

To sum up, these are the details you need to look at in your rollout procedure:

- The exact server version of the resource: is it still supported, and does the internet report a lot of problems with that version?

- Don't install the gateway service and password sync on a domain controller, because when you need to restart the server you'll face a lot of problems, both technical and bureaucratic.

- Do a simulation so you find out whether there's data you still need from the client.

- List all of the XML objects you will upload to the server, and make sure they won't overwrite an existing configuration on the server, such as a Resource, config, etc.

- Prepare an alternative method in case your data loading process fails.
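For the XML object list, a small script can do the comparison for you. This is a minimal sketch under a few assumptions: each export file either wraps its objects in a Waveset root element or holds a single object as the root, each object carries a name attribute, and you already have the (type, name) pairs that exist on the production server. PROTECTED_TYPES, list_objects and overwrite_risks are hypothetical names.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, List, Set, Tuple

# Object types whose server copy should not be replaced blindly.
# Assumption: adjust this set to the object types your deployment cares about.
PROTECTED_TYPES = {"Resource", "Configuration"}


def list_objects(xml_text: str) -> List[Tuple[str, str]]:
    """Return (element type, name attribute) for each object in one XML file."""
    root = ET.fromstring(xml_text)
    # Assumption: several objects are wrapped in a <Waveset> root element;
    # otherwise the root element is itself the single object.
    objects = list(root) if root.tag == "Waveset" else [root]
    return [(obj.tag, obj.get("name", "")) for obj in objects]


def overwrite_risks(files: Iterable[Tuple[str, str]],
                    on_server: Set[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Return (filename, type, name) for protected objects that already exist
    on the server and would be overwritten by the upload."""
    risks = []
    for filename, text in files:
        for obj_type, name in list_objects(text):
            if obj_type in PROTECTED_TYPES and (obj_type, name) in on_server:
                risks.append((filename, obj_type, name))
    return risks
```

To run it over a folder of exports, build the input with something like [(p.name, p.read_text()) for p in Path("upload").glob("*.xml")], using Path from pathlib; the "upload" folder name is just an example.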

Okay, it's time for some rest :)
