Archive for Tayori Limited

Dark site operations with AutoSys

Posted in AutoSys on 6 April 2009 by Hendry Taylor

It has been a little while since the last blog and I have been debating what topic I should write about. The other night this topic suddenly came to mind, and I think there are a number of reasons for that. Some of the reasons are as follows:

  • Not sure how many sites are doing it
  • Not sure how many sites are contemplating it
  • Not sure how important it is

    I will talk about some concepts, which might or might not be supported by software. I am not seeking to promote any specific software but rather a concept. What one does need to bear in mind is that even though most or all of the software required to achieve the unattended operation of AutoSys is available from CA, other vendors do also have products in that space. The one really big advantage of using CA software is the close integration between the CA products.

    For clarity I will share my idea of the meaning of Dark site operations. Dark site operations is/was also known as lights out operation. This means that the idea is to have as much of your data center operations automated as possible to enable your data center to operate with no or little human intervention. The initial term of lights out operation came from the idea of having your data center operations automated to such a degree that you turn off the lights and lock the data center without having to return. We all know that this is a rather ambitious desire, but I believe that we can get very close.

    I will focus on achieving an unattended AutoSys operation, not your entire data center. To achieve the unattended operation of AutoSys I have broken it into several sections below.

    Job Failure monitoring

    The first function or section we need to consider is Job Failure monitoring. This is possibly one of the easier tasks to do. Firstly we need to have a systems or network monitoring solution like CA Unicenter NSM or Netcool to which we can send the SNMP traps for the job failures. We also need to have a helpdesk system such as CA Service Desk or Remedy. Again, the advantage of the CA solutions here is the out-of-the-box integration between them. When there is a job failure we need the systems monitoring tool to be notified. Ideally at that point you want to enrich the data with information such as re-run instructions, call-out procedures and helpdesk ticket queues. Once we have enriched the data we want to automatically raise an issue on the helpdesk system and have it assigned to the correct queue for resolution. The helpdesk system will naturally take care of the escalation and resolution SLA monitoring. Once the relevant support team has fixed the error that caused the job to fail, we need a mechanism for restarting the job. The ideal solution would be a process automation tool: the automated process is triggered, acquires approval for the restart of the job and, once approval has been obtained, automatically issues the FORCE_STARTJOB to re-run the job. The circle needs to be closed with the resolution of the helpdesk ticket when the job completes successfully. The process should also allow for a job not being re-run but the next job being force started, or the job just being changed to a success status.
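
    The restart part of that loop can be sketched in a few lines of Python. This is only an illustration under assumptions: the FailureTicket class, the RUNBOOK lookup, the queue names and the job name are all hypothetical stand-ins for whatever your helpdesk and process automation tools provide. The sendevent commands are built as lists and are not executed here.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FailureTicket:
    """Hypothetical stand-in for a helpdesk ticket raised for a failed job."""
    job_name: str
    queue: str
    rerun_notes: str
    status: str = "open"
    approved: bool = False
    commands: List[List[str]] = field(default_factory=list)


# Hypothetical enrichment data: owning queue and re-run notes per job.
RUNBOOK: Dict[str, Dict[str, str]] = {
    "fin_eod_load": {"queue": "finance-support", "rerun": "Re-run after fixing the input file"},
}


def raise_ticket(job_name: str) -> FailureTicket:
    """Enrich a job-failure alert and open a ticket in the owning queue."""
    info = RUNBOOK.get(job_name, {"queue": "autosys-support", "rerun": "No runbook entry"})
    return FailureTicket(job_name=job_name, queue=info["queue"], rerun_notes=info["rerun"])


def restart_command(ticket: FailureTicket, action: str = "rerun", next_job: str = "") -> List[str]:
    """Build (but do not run) the sendevent command for an approved ticket."""
    if not ticket.approved:
        raise PermissionError("restart has not been approved")
    if action == "rerun":
        cmd = ["sendevent", "-E", "FORCE_STARTJOB", "-J", ticket.job_name]
    elif action == "start_next":
        if not next_job:
            raise ValueError("next_job is required for start_next")
        cmd = ["sendevent", "-E", "FORCE_STARTJOB", "-J", next_job]
    elif action == "mark_success":
        cmd = ["sendevent", "-E", "CHANGE_STATUS", "-s", "SUCCESS", "-J", ticket.job_name]
    else:
        raise ValueError(f"unknown action: {action}")
    ticket.commands.append(cmd)  # keep an audit trail on the ticket
    return cmd


def close_on_success(ticket: FailureTicket, job_status: str) -> None:
    """Resolve the ticket once the job reports SUCCESS."""
    if job_status == "SUCCESS":
        ticket.status = "resolved"
```

    The three actions mirror the three outcomes described above: re-run the job, force start the next job, or change the job to a success status. Refusing to build a command until the ticket is approved keeps the approval step from being bypassed.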

    AutoSys Maintenance

    Now we need to consider the maintenance of AutoSys itself. Firstly there are the DBMAINT tasks that need to be performed; by default they are run automatically by AutoSys. Because DBMAINT is run internally to AutoSys, there is no notification of any failure other than log entries. So the first thing that should be done is to split the DBMAINT script into a number of AutoSys jobs so that, if there is a failure, the normal job failure process takes over and someone is notified to resolve the problem. Some additional tasks should also be added to the maintenance jobs, such as the clean_files utility and the chase utility. There are also some log files that need to be archived and/or deleted, and a script should be developed to do this.
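
    As a rough sketch of what splitting the maintenance into jobs could look like, the Python below generates JIL for a chain of command jobs, each waiting on the success of the previous one and raising an alarm on failure. The job names, script paths, machine and owner in the usage example are hypothetical; the actual steps depend on what your DBMAINT script does.

```python
from typing import List, Tuple


def maintenance_jil(steps: List[Tuple[str, str]], machine: str, owner: str) -> str:
    """Return JIL defining one command job per step, chained by success conditions."""
    blocks = []
    previous = None
    for name, command in steps:
        lines = [
            f"insert_job: {name}",
            "job_type: c",
            f"command: {command}",
            f"machine: {machine}",
            f"owner: {owner}",
            "alarm_if_fail: 1",
        ]
        if previous is not None:
            # Each step only starts once the step before it has succeeded.
            lines.append(f"condition: success({previous})")
        blocks.append("\n".join(lines))
        previous = name
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    # Hypothetical step names and scripts, for illustration only.
    steps = [
        ("maint_db_tasks", "/opt/autosys_maint/db_tasks.sh"),
        ("maint_clean_files", "/opt/autosys_maint/run_clean_files.sh"),
        ("maint_chase", "/opt/autosys_maint/run_chase.sh"),
        ("maint_archive_logs", "/opt/autosys_maint/archive_logs.sh"),
    ]
    print(maintenance_jil(steps, machine="schedhost01", owner="autosys@schedhost01"))
```

    Because every step is an ordinary job, a failure in any one of them goes through the same failure monitoring and ticketing process as a business job.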

    AutoSys error monitoring

    Firstly, AutoSys-related errors that are sent via SNMP traps to the systems management tool need an automated process similar to the one that exists for job failures. Where possible we can create automated recovery processes as the resolution. An example would be if your AutoSys environment is running in HA and there is a failover. When the failing component is repaired you want an automated process to run autobcp to re-synchronise your databases and start AutoSys back up. The AutoSys log files should also be monitored for error or failure messages that do not generate SNMP traps. Something else that you might want to monitor and alert on is any agent machine going offline, as that could cause delays to the batch.
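
    Monitoring the log files for messages that do not generate traps can start out as a simple pattern scan. The sketch below is a minimal example; the two default patterns and the idea of an ignore list are assumptions, and you would replace them with the message texts your own logs actually contain.

```python
import re
from typing import Iterable, List, Optional, Pattern

# Assumed patterns; tune these to the messages seen in your own logs.
DEFAULT_PATTERNS: List[Pattern[str]] = [
    re.compile(r"\berror\b", re.IGNORECASE),
    re.compile(r"\bfail(ed|ure)?\b", re.IGNORECASE),
]


def find_alert_lines(
    lines: Iterable[str],
    patterns: Optional[List[Pattern[str]]] = None,
    ignore: Iterable[str] = (),
) -> List[str]:
    """Return log lines matching any pattern, skipping lines with an ignored substring."""
    active = DEFAULT_PATTERNS if patterns is None else patterns
    ignored = tuple(ignore)
    hits = []
    for line in lines:
        text = line.rstrip("\n")
        if any(token in text for token in ignored):
            continue
        if any(p.search(text) for p in active):
            hits.append(text)
    return hits
```

    The returned lines could then be forwarded to the systems management tool so that they follow the same ticketing path as the SNMP traps.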

    AutoSys performance monitoring

    You need to have some performance metrics gathered from AutoSys and automated trending done to create alerts if the performance goes outside of your acceptable range. The performance monitoring I am talking about here is Average latency, un-processed queue length etc. In most instances you can have some automated data gathering when performance is outside of the accepted boundaries which will assist the person who would get the ticket assigned. In fewer cases there might be some automated mitigation processes that could be run.
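
    A very small example of the alerting side is shown below: it keeps a rolling window of latency samples and returns an alert message when the average goes above a threshold. How the latency samples are collected is left out, and the window size and threshold are assumptions you would set from your own acceptable range.

```python
from collections import deque
from statistics import mean
from typing import Deque, Optional


class LatencyMonitor:
    """Rolling-average check over the most recent latency samples."""

    def __init__(self, window: int, max_average_seconds: float) -> None:
        self.window = window
        self.samples: Deque[float] = deque(maxlen=window)
        self.max_average_seconds = max_average_seconds

    def add(self, latency_seconds: float) -> Optional[str]:
        """Record a sample; return an alert message if the rolling average is too high."""
        self.samples.append(latency_seconds)
        if len(self.samples) < self.window:
            return None  # not enough data for a meaningful average yet
        average = mean(self.samples)
        if average > self.max_average_seconds:
            return (
                f"average latency {average:.1f}s exceeds "
                f"{self.max_average_seconds:.1f}s"
            )
        return None
```

    Using an average over a window rather than a single sample avoids raising a ticket for one slow event while still catching a sustained slowdown.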

    Automated SLA monitoring

    For the SLA monitoring you would use a tool like JAWS which not only does the SLA monitoring and reporting, but can also generate alerts for SLA breaches or even possible SLA breaches. The alerts can be sent to an SNMP manager and the whole automated ticketing process can be utilised.
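
    To illustrate what a "possible SLA breach" means, here is a simplified projection in Python. It is not how JAWS works internally; it is just a sketch that assumes you know the average runtimes of the jobs still to run in sequence, and the 15 minute warning margin is an arbitrary default.

```python
from datetime import datetime, timedelta
from typing import Iterable


def sla_status(
    now: datetime,
    deadline: datetime,
    remaining_runtimes: Iterable[timedelta],
    warning_margin: timedelta = timedelta(minutes=15),
) -> str:
    """Classify a batch as 'ok', 'at_risk' or 'breach' from its projected end time."""
    projected_end = now + sum(remaining_runtimes, timedelta())
    if projected_end > deadline:
        return "breach"
    if projected_end > deadline - warning_margin:
        return "at_risk"
    return "ok"
```

    An "at_risk" or "breach" result is the point at which an alert would be sent to the SNMP manager and the ticketing process would take over.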

    Automated reporting

    The Business Objects reporting server provided with AutoSys R11 allows you to schedule the running and delivery of defined reports. Reports can be published to a website or to a SharePoint server. Alternatively, reports can be saved to a central location or emailed to a user or distribution list. Access can be granted to users who require reports so that they may run them as and when required.

    Automated job promotion between environments

    The next big hurdle is automating the promotion of jobs from one environment to the next. Normally there would be 3 or 4 environments that jobs need to migrate between. The typical environments would be Test or Dev, UAT or Staging, and Production. There might be an integration testing environment between Test/Dev and UAT/Staging. The ideal way to automate the promotion process is through a business or IT process automation tool like the CA IT Process Automation Manager (ITPAM). Using an IT process automation tool means that the process is more structured and there is an audit trail of each promotion. Ideally you would want to include version control for the JIL files so that you can roll back to any known working version of the JIL. If you do not have an IT process automation tool then it can be done using a script and AutoSys jobs. I have actually implemented such a system, which I hope to migrate to an IT process automation tool in the not too distant future.
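
    One piece of a scripted promotion is rewriting the JIL for the target environment. The sketch below is not the system mentioned above; it assumes a hypothetical naming convention in which job names carry an environment prefix such as DEV_ or UAT_, and it takes an explicit mapping of source machines to target machines.

```python
import re
from typing import Dict

# JIL attributes whose values contain job names in this sketch.
JOB_NAME_ATTRIBUTES = ("insert_job", "box_name", "condition")


def promote_jil(
    jil_text: str,
    source_prefix: str,
    target_prefix: str,
    machine_map: Dict[str, str],
) -> str:
    """Rewrite job-name prefixes and machine names in JIL for the target environment."""
    prefix_pattern = re.compile(r"\b" + re.escape(source_prefix))
    promoted = []
    for line in jil_text.splitlines():
        key, sep, value = line.partition(":")
        attribute = key.strip()
        if sep and attribute in JOB_NAME_ATTRIBUTES:
            value = prefix_pattern.sub(target_prefix, value)
        elif sep and attribute == "machine":
            name = value.strip()
            if name not in machine_map:
                # Fail loudly rather than promote a job onto the wrong machine.
                raise KeyError(f"no target machine mapped for {name}")
            value = " " + machine_map[name]
        promoted.append(key + sep + value)
    return "\n".join(promoted) + "\n"
```

    Command lines are deliberately left untouched, since environment-specific paths and arguments usually need their own handling. Committing both the source and the promoted JIL to version control before loading it gives you the rollback point mentioned above.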

    Automated take-on

    This section follows along similar lines to the Automated job promotion section above. When I talk about automated take-on, I am referring to new applications being added to an AutoSys environment. Here again I would suggest that an IT process automation tool is the best way to achieve this. Some of the tasks you would need to automate here would be defining the new machines to AutoSys via JIL, adding Windows functional accounts to autosys_secure, and defining all the EEM policies required for the new application.

    Automated security management

    This follows on from the automated take-on section. Here we want to automate all the processes around EEM policy changes. An example would be a user moving to another department and therefore working with a different set of AutoSys jobs: the EEM policy needs to change so that the user is removed from the old jobs and added to the new ones.
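
    The department move example boils down to a set difference. The sketch below only works out which access should be removed and which added; it does not call EEM, and the department names and job groups are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical mapping of departments to the job groups their users work with.
DEPARTMENT_JOB_GROUPS: Dict[str, Set[str]] = {
    "finance": {"fin_eod", "fin_reports", "shared_feeds"},
    "risk": {"risk_calc", "shared_feeds"},
}


def access_changes(old_department: str, new_department: str) -> Dict[str, List[str]]:
    """Return the job groups to remove and to add when a user changes department."""
    old_groups = DEPARTMENT_JOB_GROUPS[old_department]
    new_groups = DEPARTMENT_JOB_GROUPS[new_department]
    return {
        "remove": sorted(old_groups - new_groups),
        "add": sorted(new_groups - old_groups),
    }
```

    Access that both departments share is left alone, so the user never loses access they are still entitled to. The resulting change list is what would go through the approval step before the policy is updated.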

    Conclusion

    All of the above automation processes should include audit trails, automated approval systems and reporting. If we were to achieve all of the above we would not need any AutoSys BAU staff, only AutoSys support staff for when there is a problem with AutoSys itself. The reason for not needing BAU staff is that job failures would automatically be routed to and resolved by the relevant support teams, and all other BAU work would be automated. Your BAU staff can then start working with the users to optimise their batch processing and take full advantage of the abilities of AutoSys.

    I do not know of anyone who has gone the whole way, but some environments are certainly on their way and have achieved a fair amount of automation.

    Evolution of Scheduling

    Posted in AutoSys on 12 March 2009 by Hendry Taylor

    For this blog entry I have to thank Jonathan Mcalroy from Citi Group who wrote it. I have not modified it as I did not have anything to add. I think it is a very insightful look at scheduling, and instead of merely stating what is good or what is bad it focuses on what we should be doing and what we should be striving for. After all, should we not be evolving instead of complaining? Making it better or using it smarter is ultimately the better option.

    I was asked recently why we should move to r11 and what the long term benefits will be. It’s a good question and I should have had a quick answer ready. I didn’t, but fortunately for me it was an email exchange so I had time to properly formulate an answer. As it’s the anniversary of Darwin’s birthday, it’s timely and ironic that we can maintain our evolution with a little intelligent design.

    For the reply, I could have rattled off a few paragraphs about the wonders of extended calendars and the marvels of look back dependencies. But I knew this questioner would push deeper and ask what benefits they would have. To answer the question therefore, I had to show the progression that all schedulers have been making in the last few decades so that I could then show the benefits that will follow from further advances.

    An analogy I like to make is to imagine a line representing the business logic in a batch. If the green segment represents the logic held in the scheduler, then the blue portion is the logic held in the script executed by the scheduler. In the bad old days of Cron and Windows Task Scheduler, very little of the logic was in the scheduler. The vast majority of it was defined within the script that was getting executed.

    image001

    As the early versions of AutoSys, TWS and Control-M came along, the amount of logic that could be included in the scheduler increased as it became possible to reuse the same job in different places and have conditions to define their execution.

    image002

    As the schedulers have become more complex the number of job types has increased meaning that developers have had to do less developing in order to perform regular tasks. The 4.5 version of AutoSys has 3 (really just two) job types, whereas r11 has over a dozen and the ability to define new jobs. ITPAM has over a hundred possible ‘actions’ and new actions can be defined and made available to every other user.

    The benefit of the migration of logic is that once it’s in a scheduler it’s very easy to re-use it in other places. Whether it’s a job that FTP’s a file or a job that executes a stored procedure, if it’s centrally managed it only requires definition once. This means that the pieces of logic they represent can be reused by multiple applications.

    This means that instead of your batch just containing your logic, it can now contain lots of different groups’ logic in the right places. Your DB calls are managed by the DBAs, the Grid calls are managed by the Grid team, the Business Objects feed is managed by the Business Objects team, and so on.

    image003

    The ultimate goal is that a batch is reduced to a number of parameters that call pre-existing jobs to perform a required task.

    In parallel to this line of evolution, the scalability of the scheduler has also had to evolve to keep up and handle all of these extra responsibilities. I remember using AutoSys 3.4 at Lehmans (RIP) and being shocked to hear that some sites ran two eventors in order to handle the workload. Fast forward a couple of years to the first time we ran 4.0 and wondered what EVENT_HDLR_ERROR meant.

    In order to maintain evolution, the possibilities must always race ahead of the requirements. Every time we tell a developer, “No, AutoSys can’t do that” or “Hmm, you’ll have to script that”, evolution is delayed. What we should be saying is: “Yes, the ABC team does that” or even, “Yes, here’s a wrapper script”. I’m not suggesting for a second that we start taking ownership of the business processes, but we should give our infrastructure colleagues the opportunity to get ahead of the curve and keep growing.

    So in summary what does r11 give us? It might not seem like much but with a few new job types, the odd blob here and the WebService there, we can keep the line moving to the right and grow the expertise of our users and free up their resources.

    “Ignorance more frequently begets confidence than does knowledge: it is those who know little, not those who know much, who so positively assert that this or that problem will never be solved by science.”

    Charles Darwin

    The Descent of Man

    Tayori Limited

    Cost of change

    Posted in AutoSys on 28 February 2009 by Hendry Taylor

    Well it is time for my second blog entry. This time around, I have chosen to discuss some of the hidden costs of changing from one product to another that pretty much delivers the same functionality and results.

    Firstly a cost of change analogy:

    Let us take a simple home scenario of changing a car to help understand some of the hidden costs we do not always think about.

    So you decide that you want to trade in your large gas guzzler for a more economical car; no harm in that. You start by determining whether the numbers add up if you buy a smaller car, which is more economical than a larger car. You choose a manual transmission instead of an automatic transmission because it should also be more economical. The figures add up as follows:

  • Lower monthly car instalments
  • Lower monthly fuel bills
  • Lower insurance premiums

    Based on this financial analysis you go ahead and change your car.

    Now the hidden costs start rearing their ugly heads. Your wife, your son and your daughter need to go for additional driving lessons as they have only ever driven automatic transmission cars and do not know how to drive a manual transmission car. Then there are a few mishaps that result in damage to the car, but no one is injured. This means claims against your insurance, which means you lose your no claim discount. Then, because you have made claims, your premiums go up. While your family is learning to drive properly with the manual transmission they actually use more fuel than your old automatic transmission car did. Then along comes your first family holiday with the new car and you find that, because the car is smaller, you cannot fit all the luggage into it. So you either need to buy a trailer or rent one, again an additional cost that would not have been incurred with the old larger car.

    Net result, after a year your new cheaper, more economical car has actually cost you more.

    Now let’s move on to the technology related discussion.

    Decisions about which technology is deployed to solve a particular problem are always taken at a higher level. The biggest contributing factors are always cost and ROI. The cost of ownership and ROI are very often presented by the vendor and, although fairly accurate, omit a couple of factors. Here are some factors often overlooked when migrating from one technology to another. These are factors that either I or friends of mine have witnessed; this list is by no means exhaustive or complete.

  • One of the most commonly overlooked factors is the cost of retraining the users on the new technology.
  • On a regular basis all the peripheral scripts and tools that have been developed around the existing technology need to be reworked or replaced.
  • If we focus on workload automation, very often some of the batch streams need to be redesigned because the new technology does something differently.
  • More often than not there is a great deal of work that needs to be redone with regard to integrating the new technology into other areas of IT.

    All of the above points can be quantified in monetary terms, as time and lost productivity for the organisation.

    There are also some soft losses when switching technologies, and these include:

  • Relationships that have been built with the existing vendor. This sometimes results in additional benefits such as discounts or even early access to new releases.
  • The support staff going from experts to novices.
  • Familiarity moving to apprehension or uncertainty.

    In conclusion, I am in no way saying that we should not change technology. I just want to ensure that we see the entire picture and know the true cost of the change. There are times when the newer technology is the better option even if it costs slightly more to change. I guess my ultimate goal is that we make informed decisions based on factual data and not decisions based on emotions.

    To R11 or Not to R11

    Posted in AutoSys on 16 February 2009 by Hendry Taylor

    For a number of people the decision to upgrade to AutoSys R11 or not is a weighty one. I do not intend to make light of it or suggest I have all the answers, but I will try to shed some light on what considerations should be taken into account when trying to decide. I will also suggest some reasons why you should upgrade to AutoSys R11 and some reasons why you might delay your upgrade to AutoSys R11. I will also share my view as to which release of AutoSys R11 you should consider going live with.

    In my opinion it is a bit of a no-brainer. You should move to AutoSys R11; it is just a question of when and how. Please bear in mind that these are my personal opinions based on experience and not necessarily the view of CA. CA can provide you with best practice advice, if needed, and I can obviously provide you with best practice advice based on my experience, which may differ from CA’s.

    What to consider:

  • License implications of DB clients required for AutoSys 4.5

    With AutoSys 4.5 each agent requires a DB client for your DB of choice and that can incur a considerable cost, unless you have an enterprise license with unlimited DB clients. AutoSys R11 does not require these DB clients and can thus save you a considerable amount in costs. Not needing DB clients also simplifies the agent deployment to the agent machines.

  • AutoSys User Interface requirements or preferences

    If you have a preference for one of the AutoSys 4.5 GUI tools you might consider remaining on AutoSys 4.5 because with AutoSys R11 you only have WCC as the user interface. I personally find WCC far superior to the old AutoSys 4.5 GUI tools. This is especially true if you are looking at WCC R11.1 SP1 which I would suggest is the best version to deploy. WCC R11.1 SP1 includes useful tools such as the APP EDITOR which is like a web based version of JOB VISION.

  • Security requirements

    With AutoSys 4.5 you had the choice of Native security or eTrust Access Control (EAC), and with AutoSys R11 you have the choice between Native security and eTrust Entitlements Management (EEM). Firstly it is a no-brainer that you should utilise either EAC or EEM, as without it you do not really have much in the way of security and no granular security. Without either EAC or EEM you will also not be able to achieve SOX compliance if that is a requirement. The differences between EAC and EEM are quite marked. EEM has a less complicated architecture and, when combined with AutoSys R11, all security checks are done via the AS-server. This simplifies the overall architecture and reduces points of failure. EEM also has built-in HA, which EAC does not.

    Why you should upgrade:

    I have mentioned some reasons why one should upgrade to AutoSys R11 but I will elaborate here. For me the foremost reason for moving to AutoSys R11 is performance. I have seen a marked performance improvement between AutoSys 4.5 and AutoSys R11. Based on performance monitoring I have done, the average lag time in AutoSys 4.5 is at best 7 seconds, whereas the average lag I am observing with AutoSys R11 is under 1 second. There are many factors that can negatively affect performance.

    Another major reason to upgrade to AutoSys R11 is its new architecture, namely: no requirement for a DB client, the introduction of the AS-server, and the instance independent agent. The introduction of the AS-server means DB clients are no longer required on the agents and we have a single point of interaction with EEM. It also means that, because we can have multiple AS-servers, we can spread the workload around. The instance independent agent means we have one agent on a machine, regardless of how many instances might use it. The AS-server list is passed to the agent by the scheduler when a job is started. This means there is no need to configure the agent.

    EEM is also a really good reason to move to AutoSys R11. A single EEM instance can be used to provide security for AutoSys R11, WCC R11 and JAWS. This simplifies the overall architecture of the environment and also means you only need one integration point to your LDAP, AD or Siteminder. EEM also has built-in HA capabilities, which is very important when the business, or parts of it, rely on AutoSys to ensure timely batch completion.

    Why you might consider delaying your upgrade:

    You might consider remaining on AutoSys 4.5 if you have particular skills in AutoSys 4.5 or you have a large number of users that would need to be retrained on WCC from either the Motif GUIs or the Web Interface. Having said that, I do not feel that those two reasons should warrant staying on AutoSys 4.5, as they are both easy to overcome. The only valid reason I can really think of to remain on AutoSys 4.5 is if the performance you are getting is acceptable and you do not have the technical skills, or resources, available to do an upgrade. The lack of technical skills will only be an issue if you do not have the funds to get a consultant who has worked with AutoSys R11 to do the upgrade for you.

    Conclusion:

    Well, I hope that I have provided you with sufficient information to make an informed decision. I am always available should you want to discuss or bounce ideas off me.

    Hendry Taylor

    Tayori Limited