Tuesday, April 21, 2009

Microsoft Mythbusters Top 10 VMware Myths

I just read through some articles and watched the video from Microsoft about the top 10 VMware myths today, and I would like to share my unbiased thoughts on the details Microsoft published.

The guys talked about Live Migration coming to Hyper-V in the next release. As a customer, I have always believed that a software provider should only make commitments to customers when the product is ready, not keep saying WE WILL BE READY IN THE NEXT RELEASE. That only tells me the product is not ready today; what if the next release is two years away? It means the product is not what you promised your customers. Please take note that VMware already supports Storage VMotion in the current version, which is another step beyond Live Migration.

Microsoft's cluster file system today is far behind if we compare it with the fault tolerance in vSphere. It only matches what VMware has done in the past instead of bringing new technology into the product. They should be more innovative and come up with something VMware doesn't have.

Hyper-V is not as scalable as VMware? That could be right, depending on how you want to compare them. In a virtual infrastructure, HA and DRS are both important pieces of a production environment, delivering load balancing and high availability. ESX 3.5 supports up to 32 hosts per cluster, which I think Hyper-V cannot match at all. Hyper-V is far behind on the technology customers are demanding. It may win over the crowd that plans to virtualize development, test and training environments, but not mission-critical production systems. You can easily run 1,000 web servers in the front end without HA or clustering today; with a load balancer on the market that automatically redirects traffic, even if 100 of your 1,000 servers go down, your site stays reachable. There is nothing to be proud of in telling customers how many Hyper-V virtual machines are currently running the Microsoft website. Running a web server in a virtual machine today is very common and no big deal at all.

On reliability, if we compare system uptime between Linux and Windows, which machine do we reboot and patch most often? I think you and I both have the right answer in mind. Even if VMware uses a similar amount of resources to run ESX, do remember that VMware manages resources smartly. It will maximize the utilization of the hardware you invested in and deliver a higher consolidation ratio. Of course, it also provides the flexibility to reserve the right amount of resources when required.

Hyper-V has the advantage of running on any hardware you like. That is something VMware does not provide, as we are required to follow VMware's HCL for each version of ESX we deploy. But in most cases, will we actually run a production virtual infrastructure that serves the business on a custom-built server, or on a mixture of parts from multiple vendors that were never fully tested for compatibility? Most customers today buy servers from Dell, HP, IBM and the like, which deliver the best compatibility across motherboard, memory, CPU, storage and so on, certified and tested before they are sold. That has significantly improved server life span and productivity. Therefore, I conclude that this Hyper-V advantage does not really matter to me when selecting a hypervisor for our data center.

Management wise, I think they are trying to oversell System Center, which is a big step toward locking customers into their so-called SCCM for everything in your environment. Why would you need to run a pure Microsoft stack in your organization when there are plenty of products available that are more reliable, cheaper and more efficient? In our environment, we try to avoid running Microsoft as much as possible due to the costly licensing terms they apply to customers. We run 85% of our systems on Linux today, and I should say SCCM is not the right tool to manage my physical or virtual environment. If I had to choose between Altiris and SCCM, Altiris would be the better choice for me. SCCM makes more sense for a pure Microsoft platform environment.

There are certainly more comments I could add about the video I watched, but I am just too tired to write everything up here. Those of you who read this should form your own opinions. My thoughts here are meant for sharing and are geared toward my environment; you may think differently, as the environment you run may be different. Enjoy the video.

Offline VM migration auto-converts RDM to VMDK format

Due to some reconfiguration work we performed on our virtual infrastructure, we had to relocate some VMs to a different datastore. The VMs that needed to be moved had Raw Device Mappings (RDMs) attached. Previously I thought an offline storage migration would not move the RDM over to the datastore, since an RDM refers to a raw device on the SAN storage. I had actually planned to convert the RDM to a VMDK by manually transferring the files I needed from the RDM to a new virtual disk I created.


In a test yesterday, we found that an offline VM migration automatically converts an RDM attached to the virtual machine into VMDK format when the destination datastore is selected in offline mode. This really surprised me and actually simplified my work, as I no longer need to manually transfer files from the RDM to a new virtual disk.

Oracle to buy Sun Microsystems

Breaking news today shocked most IT folks: Oracle announced it will buy Sun Microsystems for 9.50 a share. I am really shocked to see this, as just last week we were still talking about the called-off deal that would have merged Sun Microsystems with IBM, which might have ended with IBM monopolizing part of the high-end computing market. Oracle could be making a good move by taking over Sun. This may directly or indirectly affect many areas related to virtualization, Java and MySQL, which are Sun Microsystems' strengths. There is a chance to reduce Oracle's dependency on Red Hat Enterprise Linux and move over to Sun Solaris or OpenSolaris. This is more positive than the takeover previously proposed by IBM.

Oracle is no longer just a software company. With the purchase of Sun Microsystems, they will be able to deliver hardware, platform, software, database, virtualization and more. I think they will pursue a more aggressive strategy to move into the virtualization and integration market, which may end up putting them in competition with partners such as VMware. This will definitely heat up the virtualization and hypervisor competition.

6-core limitation per socket for vSphere Enterprise

Under the new licensing model for VMware vSphere 4, it is clear that you may need to pay an additional USD 620 per socket to entitle yourself to Enterprise Plus, which comes with 12 cores per socket, host profiles, distributed switch and so on.

Existing Enterprise users will no longer be entitled to everything as they were in the past, due to the new scheme VMware will apply. There is a clause stated in the official documentation released by VMware:

"vSphere Enterprise is available for USD$2,875 per one processor with up to six cores for use on a server with up to 256GB of memory."


This clearly states that six cores per socket is the maximum you can go, whether you are an existing or a new Enterprise customer. Here is the concern: six-core CPUs are on the market now, and soon we will see 8-core and 12-core parts too. As hardware technology improves and provides more cores per CPU, we will end up paying additional charges just to stay entitled to the same features under this licensing model.
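To illustrate the concern, here is a quick back-of-the-envelope sketch using the list prices quoted above (USD 2,875 per Enterprise socket, USD 620 extra per socket for Enterprise Plus); the helper function and per-host totals are my own arithmetic, not a VMware tool:

```python
# Illustrative per-host vSphere 4 licensing cost, based on the figures above.
# Enterprise: USD 2,875 per socket, up to 6 cores per socket.
# Enterprise Plus: an extra USD 620 per socket, up to 12 cores per socket.

ENTERPRISE_PER_SOCKET = 2875
PLUS_UPGRADE_PER_SOCKET = 620

def license_cost(sockets, cores_per_socket):
    """Return the per-host license cost, picking the cheapest valid edition."""
    if cores_per_socket <= 6:
        return sockets * ENTERPRISE_PER_SOCKET
    if cores_per_socket <= 12:
        return sockets * (ENTERPRISE_PER_SOCKET + PLUS_UPGRADE_PER_SOCKET)
    raise ValueError("no edition covers more than 12 cores per socket")

# A 2-socket host with 6-core CPUs fits plain Enterprise...
print(license_cost(2, 6))   # 5750
# ...but the same 2-socket host with 8-core CPUs forces the Plus upgrade.
print(license_cost(2, 8))   # 6990
```

The point the sketch makes: the workload has not changed, yet swapping in higher-core CPUs alone pushes the host into the more expensive tier.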

From a personal point of view, providing an alternative licensing model for new features should be acceptable, but it shouldn't fix a limit on the number of cores allowed per CPU socket license. A customer may end up paying more not for new features they really need, but purely because of the maximum number of cores allowed per socket. I hope VMware will reconsider the clause they included in the release.

Friday, March 6, 2009

Calculation of Max LUN Supported in ESX Server

I found my ESX servers could not discover the 65th LUN that I tried to present to them, so I logged a support call and am still pending a reply from VMware. Besides that, I found another interesting article with the details below.

Article copied from VMware

In Multipathing Configurations the Number of Paths Per LUN Is Inconsistent
The hpsa driver in ESX Server might reduce the number of supportable LUNs below the expected maximum limit of 256 when the controller is used in multipath configurations. In multipath configurations, if all four paths are configured, the total supportable LUNs is reduced to 64. In certain multipath configurations, because each target path consumes an available LUN slot, the total number of supportable LUNs might be reduced to 60.



Workaround
Reduce the number of LUNs on a server until the product of LUNs and paths is less than 256 (LUNs * Number of paths < 256), and if necessary, reduce the LUN count depending on use of multipath until each LUN has the expected number of paths.
The following example shows a configuration with the maximum supportable LUNs presented to an ESX Server installation on four paths, providing all LUNs with the expected number of usable paths:
Path 1: 63 LUNs seen through this path; Total LUN count (63 + 1 path) is less than 256
Path 2: 63 LUNs seen through this path; Total LUN count (63 + 63 + 2 paths) is less than 256
Path 3: 63 LUNs seen through this path; Total LUN count (63 + 63 + 63 + 3 paths) is less than 256
Path 4: 63 LUNs seen through this path; Total LUN count (63 + 63 + 63 + 63 + 4 paths) = 256

If I use the formula above to calculate my environment, then yes, I am at the full limit of 256 LUNs. I have 2 ESX servers which only have 2 HBA connections, and they had no problem presenting more than 67 physical LUNs until now. What I did was remove 2 HBA connections from each of my ESX servers and run a rescan, and I found the LUNs presented as I expected. Again, I have not confirmed this solution yet and will do another round of confirmation with the VMware engineer.
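The workaround budget above (LUNs × paths, with each target path also consuming one LUN slot, must stay within 256) can be sketched as a quick check. These helper functions are my own illustration, not a VMware utility:

```python
# Sketch of the supportability budget from the VMware note above.
# Each target path consumes one available LUN slot, so the constraint is:
#   luns_per_path * paths + paths <= 256

def max_luns_per_path(paths, limit=256):
    """Largest number of LUNs visible per path that stays within the budget."""
    return (limit - paths) // paths

def within_budget(luns_per_path, paths, limit=256):
    """True if the configuration fits inside the 256-slot limit."""
    return luns_per_path * paths + paths <= limit

# Reproduces the four-path example: 63 LUNs per path exactly fills 256 slots.
print(max_luns_per_path(4))    # 63
print(within_budget(63, 4))    # True
print(within_budget(64, 4))    # False
```

With only two paths the same budget allows 127 LUNs per path, which matches why my two-HBA hosts could see far more than 64 LUNs.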

Manually committing snapshot delta files to the vmdk flat file

I had a tough time this week dealing with a snapshot issue on one of the VMs. The VM contained an important snapshot previously taken for system restoration. When I browsed the snapshot manager from vCenter, the system showed my VM running without any snapshots. That was the kick-start of my problem and an exciting journey until I managed to recover it this morning.

I SSHed to the ESX host and browsed to the specified datastore, and I found the snapshot files ending with the .vmsn extension were present in the correct location. No matter how many times I tried and rebooted my VirtualCenter, the snapshots were still not visible in the snapshot manager.

I read through some articles and forums which suggested cloning the snapshot using the vmkfstools -i option, but that did not succeed in my case. I continued my research and found a useful blog post from Oliver O'Boyle, a blogger who had experienced a similar issue previously.

After reading his article, which explained the chain between the CID and the parent CID, I was able to resolve my issue. I found that the root cause on my VM was a combination of the snapshot problem and vmdk descriptor file corruption. For the snapshot issue, we can create a new snapshot and then select delete all snapshots afterward, which should force the delta files to be committed into the flat vmdk files. On one of the virtual hard disks we had difficulty, as the ESX server forced the virtual HDD to be detached from the VM. The root cause of that was a missing entry in the parent file, which should be VMxxxx.vmdk.

During this troubleshooting, you should ensure that the delta files and flat files are always retained and never overwritten. There are 2 delta files, ending with VMxxxxx-000001.vmdk and VMxxxxx-000001-delta.vmdk, and your flat file should end with VMxxxxx-flat.vmdk. The first thing I did was make sure the virtual disk could be re-attached to the VM. I manually created a new vmdk descriptor file following Oliver O'Boyle's guide, copied over the parent CID and the required virtual disk values, and manually configured the link between the .vmdk descriptor and the flat file. After that, I was able to attach the virtual disk back to the VM from VirtualCenter. Please take note that vCenter will not see the flat file as an attachable virtual disk, as it recognizes virtual disks based on the .vmdk descriptor. I recommend keeping the .vmdk and flat file within the same datastore, although you can relocate the vmdk files to a different datastore if you wish.
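As background on the CID chain described above, a vmdk descriptor file looks roughly like the fragment below. The CID values and file name here are made up for illustration; the key points are that the parentCID of a snapshot descriptor must match the CID of its parent disk, and the extent line is what links the descriptor to the actual flat (or delta) file:

```
# Disk DescriptorFile (illustrative fragment, values are made up)
version=1
CID=fffffffe
parentCID=ffffffff      # ffffffff means no parent, i.e. this is the base disk
createType="vmfs"

# Extent description: the descriptor points at the data file here
RW 41943040 VMFS "VMxxxxx-flat.vmdk"
```

A snapshot descriptor such as VMxxxxx-000001.vmdk would instead carry a parentFileNameHint pointing at VMxxxxx.vmdk and a parentCID equal to that parent's CID; if either is wrong or missing, the chain breaks and the disk cannot attach.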

Once the virtual disk had been attached to the VM, I booted it up immediately. Log in to the system and make sure everything is normal and functioning correctly. The data it contained at this point was not the latest data I needed, as a result of the missing snapshot that was never committed by the system. I then took a new snapshot of the entire VM. Once that was done, the datastore viewed over SSH showed plenty of delta files and newly created VMDK files ending with VMxxxxxxx-000003.vmdk and so on.

Here are the steps taken to commit the snapshots manually:

  1. Power off the VM
  2. Right-click the VM in vCenter, select edit settings, and select the virtual disk you are trying to recover. The system will show which vmdk file the virtual disk is pointing to
  3. Write down the file names and go back to your SSH session
  4. Rename the VMDK and delta files that you previously retained from the original snapshot you are recovering to the FILE NAMES you copied in step 2
  5. Open the snapshot manager for the VM and select the delete all snapshots option. This process will take time, depending on the size of the delta files to be committed
  6. It may appear stuck at 95% or time out, but the system will still continue committing the delta files back into the flat files. In my case, it took more than 2 hours to delete the snapshot
  7. I noticed the ESX server load and disk activity increase on the performance chart
  8. Once it completes, all the delta files will be deleted and everything should be back to normal
  9. Power on the VM and double-check all the data and mount points; I found the system was back to normal

Sunday, January 11, 2009

Fault tolerance on Windows 2008

Fault tolerance was recently added as an optional feature in Windows 2008, providing capabilities similar to the Tandem NonStop concept to minimize system downtime from minutes to seconds. As we know, this technology is coming down the road from VMware in the next release, ESX 4. I am not sure whether Microsoft will consider offering this in virtual environments rather than physical ones, as more users today are interested in virtual versus physical servers. Let's see what comes next from Microsoft; on the innovation front, I am glad to see this from them.

Windows 7 available for public beta

Windows 7 is now available as a public beta. You can obtain a copy through MSDN access or https://msdn.microsoft.com/en-gb/subscriptions/securedownloads/default.aspx

I just installed it in a VM on my ESX servers and found the performance good compared to Vista. I will spend more time reimaging my current workstation from Vista to Windows 7 for further testing. I hope Microsoft has learned from the Vista disaster and really delivers a usable version of Windows with this release.

Balancing insourcing and outsourcing for IT services

Recent big news, and the hottest topic in the financial and IT markets globally, is the Satyam Computers financial scandal, similar to Enron previously. It has had a significant impact on clients who use Satyam's services for their day-to-day operations. Personally, I see the impact on companies using Satyam as their IT outsourcing vendor potentially resulting in significant cost increases over their original plans. Now is the time for companies to reconsider the balance between outsourcing and insourcing. This topic is nothing new; some always claim outsourcing is better and some claim insourcing is better.

My point here is that both are the same: at the end of the day, the CIO or CEO is only interested in managing operating expenses against the results to be achieved. The problem of choosing between outsourcing and insourcing is purely a people issue. IT is a business that generates revenue, and to stop thinking of IT as a COST will help the company and the business grow faster than the competition. No successful company today fails to rely on its IT infrastructure to simplify and automate business processes with 24x7x365 availability. The Satyam case has alerted management to rethink the best option for running IT services for the business. I have always believed that a balance between insourcing and outsourcing helps the operation more than relying fully on a third-party vendor.

Saturday, January 3, 2009

Microsoft will lay off 15,000 employees

In the latest news I watched today, Microsoft will lay off 15,000 employees worldwide. It would be considered a big "present" to the employees at the start of 2009. According to the source, 17% of the layoffs will hit the MSN team. From a personal point of view, cash, stock price and revenue issues at Microsoft forced such a decision, and at this time it could make the situation in the US even worse.
 