Dr. Strangeprinter or: How I Learned to Start Worrying and Hate the Cluster

Tuesday Feb 12, 2013


A few weeks back I was onsite working with a printer/MFP manufacturer on a set of strange printer issues. I won't go into too much detail about the problems (something for another day); suffice it to say corrupted printer drivers on the server were making their way down to the workstations and creating havoc. Users lost the ability to print, re-install a driver, or even delete drivers from the workstation. It started with just a handful of systems, but the problem seemed to be escalating at an exponential rate and the complete impact could have been the entire set of 1100 printers and 10,000 users.

The solution was far from simple, due to locked files, limited security, traditional spooler.exe dumbness and so on. We had a significant issue on our hands with no easy solution in sight. After days of discussion and monitoring of the ever-increasing problem, it was determined that we would try to clean up the print cluster. This was an active/passive cluster and a production environment, with no test environment for us to beat up on. We started by attempting to add a new driver in hopes of moving some of the print queues to it and allowing this to propagate to the workstations. We’d had driver installation issues on this box before and were not convinced this would work, but it was worth a shot given the situation. The result: a CPU spike on spooler.exe and, ultimately, a forced move to the other cluster node. An unscheduled 20 minute outage…ouch. The larger issue: all of the problems migrated with us to the other node. It was clear that the cluster had significant issues and would need to be rebuilt.

How would we get the 1100 queues onto a new cluster? Print Migrator is a great idea with one major drawback: it pulls the drivers with it, and the drivers were our core issue.

Next option, script out all the printer and port info and run this against the new cluster using prnport.vbs, prnmngr.vbs, and prncnfg.vbs. One problem, these tools don't work against a cluster. Ugh.

LAST OPTION… DITCH THE CLUSTER.

Clusters are great, unless you have something better: virtualization. Sure, it's a departure from what the client was using before, but a virtualized, non-clustered instance of Server 2008 R2 gives us access to a vast array of code and scripts that we can run to automate things. Also, the active/passive cluster we had was really only useful if we had hardware issues. With virtualization we can snapshot, test new drivers, roll back and, best of all, clone. If there is a hardware issue, there are various tools we can use to move the VM to another piece of hardware, and we can use VM load balancing to handle spikes or an increase in processor requirements.

So it was agreed: ditch the sinking ship that was the cluster in favour of a standalone server on a VM. Oh, did I mention it was Wednesday afternoon, and that anything major would best be done over the weekend, as this server wasn’t going to last much longer? Add to that, it’s basically a 24/7 operation with critical print needs. Could it get worse? Of course: the datacentre was moving that weekend (nice timing), so our only option was to implement Friday at 5. I love a deadline...decisions are made fast when up against a time constraint.

The last major hurdle: how to script out the current data so we wouldn't have to recreate 1100 print queues manually. A simple export from Print Management Console gave us all the printers with name, share, comments, and location, along with the current driver they were using.

Next, pull a similar report listing the ports and which printers were bound to each. The good news here: the port name was the same as the IP address. I’m not sure what I would have done if this wasn't the case -- maybe a registry dump.

Next, a few hours with Excel doing some vlookups, string concatenation, list filtering + some major validation drove out a solid list of printer name, IP, location, comments, and current driver.
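
For anyone who would rather skip the Excel step, here is a rough Python sketch of the same join and validation. The column names and sample rows are made up for illustration (the real Print Management export headers may differ), and it leans on the port name being an IP address, as it was here.

```python
import csv
import io
import ipaddress

# Hypothetical exports; real column headers and values will differ.
PRINTERS_CSV = """Printer Name,Share Name,Driver Name,Location,Comments
FIN-01,FIN-01,Old PCL6 Driver,Bldg A 2nd floor,Finance
LAB-07,LAB-07,Old PS Driver,Bldg C lab,Lab colour
"""

PORTS_CSV = """Port Name,Printer Name
10.1.2.15,FIN-01
LAB7_PORT,LAB-07
"""


def is_ip(text):
    """Return True when the text parses as an IPv4 address."""
    try:
        ipaddress.IPv4Address(text)
        return True
    except ValueError:
        return False


def merge(printers, ports):
    """Join printers to ports by printer name (the VLOOKUP step).

    Returns (merged rows with an "IP" column, names that need manual review).
    """
    port_by_printer = {p["Printer Name"]: p["Port Name"].strip() for p in ports}
    merged, problems = [], []
    for row in printers:
        port = port_by_printer.get(row["Printer Name"], "")
        if is_ip(port):
            merged.append({**row, "IP": port})
        else:
            problems.append(row["Printer Name"])
    return merged, problems


printers = list(csv.DictReader(io.StringIO(PRINTERS_CSV)))
ports = list(csv.DictReader(io.StringIO(PORTS_CSV)))
merged, problems = merge(printers, ports)
print(len(merged), "queues ready,", len(problems), "need review:", problems)
```

Anything that lands in the review list (a missing port, or a port name that isn't an IP) still needs a human to look at it, which is roughly what the "major validation" amounted to.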

[Image: cluster.png]

Next, install the drivers we actually want to use and update the Excel file using another vlookup replacing the current driver with the name of the new driver(s) we wanted to use.
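
The driver swap is just a lookup table. A minimal sketch of the idea in Python, with invented driver names standing in for the real ones, that also flags any queue whose old driver has no replacement yet:

```python
# Hypothetical old-to-new driver names; substitute the real ones.
DRIVER_MAP = {
    "Old PCL6 Driver": "New Universal PCL6",
    "Old PS Driver": "New Universal PS",
}

# Hypothetical queue rows, shaped like the cleaned-up spreadsheet.
QUEUES = [
    {"Printer Name": "FIN-01", "Driver Name": "Old PCL6 Driver"},
    {"Printer Name": "LAB-07", "Driver Name": "Old PS Driver"},
    {"Printer Name": "PLT-02", "Driver Name": "Ancient Plotter Driver"},
]


def remap_drivers(queues, driver_map):
    """Return (queues with new driver names, old drivers with no mapping)."""
    updated, unmapped = [], set()
    for queue in queues:
        new_driver = driver_map.get(queue["Driver Name"])
        if new_driver is None:
            unmapped.add(queue["Driver Name"])
        else:
            updated.append({**queue, "Driver Name": new_driver})
    return updated, unmapped


updated, unmapped = remap_drivers(QUEUES, DRIVER_MAP)
print(len(updated), "queues remapped; unmapped drivers:", sorted(unmapped))
```

Catching the unmapped drivers before anything runs matters here, since the whole point of the exercise was to keep the old drivers off the new server.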

Lastly, turn all this into 3 scripts.

  • Create the ports using the IP
  • Create the printer and bind it to the port
  • Share the printer, add the location/comments.
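
The actual scripts aren't reproduced here, but the three steps above can be sketched by generating one command per row for prnport.vbs, prnmngr.vbs and prncnfg.vbs, which do work against a standalone server. Treat this as an illustration under assumptions: the rows and field names are invented, raw printing on TCP 9100 is a guess, and the commands are meant to be run from the folder that holds the scripts.

```python
# Hypothetical rows; in practice these come from the cleaned-up spreadsheet.
QUEUES = [
    {"name": "FIN-01", "share": "FIN-01", "ip": "10.1.2.15",
     "driver": "New Universal PCL6", "location": "Bldg A 2nd floor",
     "comments": "Finance"},
    {"name": "FIN-02", "share": "FIN-02", "ip": "10.1.2.15",
     "driver": "New Universal PS", "location": "Bldg A 2nd floor",
     "comments": "Finance PS"},
    {"name": "LAB-07", "share": "LAB-07", "ip": "10.1.9.40",
     "driver": "New Universal PS", "location": "Bldg C lab",
     "comments": "Lab colour"},
]


def quote(value):
    """Wrap a value in double quotes; embedded quotes are rejected, not escaped."""
    if '"' in value:
        raise ValueError(f"embedded quote in {value!r}")
    return f'"{value}"'


def port_commands(queues):
    """Script 1: one TCP/IP port per unique IP (several queues can share a port)."""
    ips = sorted({row["ip"] for row in queues})
    # Assumption: raw printing on port 9100; the port is named after the IP.
    return [f"cscript //nologo prnport.vbs -a -r {ip} -h {ip} -o raw -n 9100"
            for ip in ips]


def printer_commands(queues):
    """Script 2: create each printer with its driver and bind it to its port."""
    return [
        f"cscript //nologo prnmngr.vbs -a -p {quote(row['name'])} "
        f"-m {quote(row['driver'])} -r {row['ip']}"
        for row in queues
    ]


def share_commands(queues):
    """Script 3: share each printer and set its location and comments."""
    return [
        f"cscript //nologo prncnfg.vbs -t -p {quote(row['name'])} "
        f"-h {quote(row['share'])} +shared "
        f"-l {quote(row['location'])} -m {quote(row['comments'])}"
        for row in queues
    ]


for command in port_commands(QUEUES) + printer_commands(QUEUES) + share_commands(QUEUES):
    print(command)
```

Deduplicating the IPs in the first step is why there can be fewer ports than printers. Writing each list out to its own .cmd file gives three scripts that run in order.
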
EXECUTE...<ELAPSE 4 HOURS>...951 PORTS AND 1186 PRINTERS CREATED WITHOUT USER INTERVENTION. WHO LOVES SCRIPTING...I LOVE SCRIPTING!

Okay, so the new server is ready but I don't want to touch 10,000 PCs and recreate connections to the printers. In theory, if the print server matched the cluster name the client workstations should just re-attach to the printer share (named the same as before) and not know any better.

Our 5pm emergency scheduled outage was upon us. We downed the cluster nodes, removed the cluster and all its child objects from Active Directory, changed the IP and name of our new server to match those of the cluster, and rebooted the new box. One last step: re-publish all of the new queues to AD.
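
Re-publishing can be scripted in the same style. This is only a sketch, assuming prncnfg.vbs and its +published switch; it isn't necessarily how the queues were re-published that night, and the printer names are placeholders.

```python
def publish_commands(printer_names):
    """Build one prncnfg.vbs call per queue to publish it in Active Directory."""
    return [
        f'cscript //nologo prncnfg.vbs -t -p "{name}" +published'
        for name in printer_names
    ]


# Hypothetical queue names.
for command in publish_commands(["FIN-01", "LAB-07"]):
    print(command)
```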

Presto. After careful testing and confirmation that existing clients (XP/Win7, 32/64-bit) would simply connect, get the new driver and print without issue, it was 6:15. Off to the pub for a celebration pint!

We weren’t out of the woods yet: what about scale? What happens at 8am on Monday when the bulk of the 10,000 users connect and print? The weekend would be a good test, as many departments run around the clock. No panics on Saturday, or Sunday…Fast forward to Monday. No printer support calls at all.

I’D CALL THAT ONE IN THE WIN COLUMN!

Next steps: a new test environment, and a new production server to move some of the load to.

In the end, I won’t install another print cluster. They seem like a good idea, but when push comes to shove you want something flexible that you can script, because I wasn’t about to recreate 1100 queues manually!

Special thanks go out to JB & BP (you know who you are).