Case study: Cyfronet / PLGrid

About PLGrid


PLGrid is a consortium of the 5 largest academic computer centres in Poland. The consortium's goal is to support Polish scientists by providing free computing power and storage resources through the nationwide PLGrid Infrastructure. Apart from these resources, it also equips scientists with teamwork tools for teleconferencing, project management and wikis to make their collaboration more efficient. The consortium is led by Cyfronet, which is also a FedSM EU project partner. Currently the PLGrid Infrastructure provides 35000 cores and 2 PB of storage space for more than 3000 scientists. It will soon grow, enabling Polish scientists to use a total of 75000 cores, 20 PB of storage and cloud PaaS resources.

What were the main motivators for considering IT service management (ITSM)?

Our 'ITSM experience' started around 2 years before the start of the FedSM project. It was driven by several factors. First of all, more and more Polish scientists wanted to use our Infrastructure, so we had many people asking for resources but no uniform way to describe our services and the terms on which we offered them. We needed to find an appropriate approach to manage a significantly bigger workload on operations staff while at the same time being able to offer our customers guarantees. We started to encounter users asking whether they would be able to finish the computational part of their research by a specific date, which was a new phenomenon for computer centres serving the academic community. It became clear that something had to be done to streamline operations and reach the requested service quality levels.

So this was the user-driven motivation. On the other side, we had to consider upcoming changes driven by the provider's needs. We were, and still are, an infrastructure providing services in a federated manner, and it is a big challenge to coordinate such provisioning. The federated nature of the infrastructure influences the way in which a service is designed and how it is provided. In a federation all members may contribute to service delivery. Thus, in designing a service, all federation members need to agree on the conditions while still complying with their local policies. This makes management of federated IT infrastructure hard. To convince the PLGrid federation members, we had to adopt a sound approach to driving changes in operations.

How did you approach the adoption of ITSM?

So after we discovered, and convinced our partners, that going beyond a "best effort" level of support could only be done by adopting some IT management standard, we started looking. We had heard of ITSM as an idea but didn't have much experience with it. The quest to discover ITSM began. And obviously back then, when you said 'ITSM' out loud, or entered it in Google, the term you got back was ITIL.

So we tried it. We bought ITIL books, we read them, we discussed what we had read and ... got stuck. The amount of information we learned from the books was impressive, but we didn't really know what to start with. After a couple of weeks it was possible to plan improvements to some operational processes, but it was hard to get an overall idea of how to get further than "best effort". So our overall impression was that we had done a lot of reading but were still not sure what was important in the PLGrid case.

But to be fair, after our first ITSM journey with ITIL we did set up the first prototype of the PLGrid grant system - a uniform way to negotiate, allocate and account for PLGrid resources. After this achievement we knew that ITIL was too 'big' for us, so we started to look around for something more suitable. It didn't take long before we encountered an opportunity to participate in the FedSM project.

What were your main reasons for choosing FitSM as your guiding ITSM framework?

Definitely the guidance provided in the FitSM standard. When we started adapting it, we knew what steps needed to be taken first because the standard told us! The way FitSM processes are described is very helpful for people introducing it, as it clearly shows how to do it. There are guides, templates, examples, really everything you need. I have already mentioned ITIL being too complex for our Infrastructure, so obviously the lightweight nature of FitSM is a very important advantage as well. So is the fact that it supports federated infrastructures. There are also trainings, which were essential for enhancing the competence of our staff. And everything mentioned so far - FitSM documentation, guides, templates, trainings - is free, a very important factor for a non-profit organization. We started adapting the FitSM standard as part of the FedSM project, but we are positive that we will continue developing it in PLGrid, as it is a very efficient way to implement ITSM in our infrastructure.

What were the main successes you achieved by implementing FitSM?

Probably understanding what a service is, and that it should be user-driven. Before the ITSM era in PLGrid we had an academic, not a business, approach: "we have some computing power, you (scientists) want some computing power, let's see what we can do with it". Of course, defining services should not be an end in itself or be done just to say "we are advanced in ITSM".

But without realizing what your service is, and that its delivery should be planned according to users' needs, it is very hard to start managing service provisioning effectively. Thanks to the FitSM standard, we now know that. We also changed our operations to make them as user-friendly as possible. Earlier we didn't treat this factor as something we should consider when planning and developing our operational procedures. Along with changing the procedures we also started to replace our user-facing tools. It is a big step for us.

What would you consider as the main challenges you had to face when trying to implement FitSM?

The first and most important challenge was convincing our top management and partners that we should come back to ITSM with the FitSM standard, that we needed their commitment to do so, and that the future improvements achieved through FitSM would be worth their time and our common work.

The second challenge appeared right after we heard the magical 'Yes', at the very beginning of implementing FitSM. It was mapping ITSM terms onto the already existing PLGrid environment. It took some time to get "a full picture". I remember discussions about how many services we have in PLGrid. What is the main PLGrid service? That was when we began to understand what a service is. We ended up with four services, while at the beginning we thought we had a dozen at each PLGrid site.

And finally, or constantly if you prefer, the very laborious part is the implementation of the standard in our operational tools. Sometimes a process has to run for a year before we see what needs to be automated. Sometimes the assumptions we made at the beginning turn out to be less accurate than we thought they would be. This requires a lot of work, but in the end, as we continually make improvements and start to see from the comments of our users and customers, it is all worth it.


Looking back: What worked well? What would you have done differently?

Everything worked well!! But more honestly... because of all the help provided by the standard, we knew from the very beginning what to do, step by step. First: self-assessment, so we knew our strengths and weaknesses and where to start. Second: we knew which processes to start with (naturally SLM and SPM), and we had documentation, templates and 'first steps' documents to set up each process. After a while we did a self-assessment again and then moved on to the next processes and the activities connected with them. In this way we could build our initial ITSM environment.

Somewhere in between were trainings providing new knowledge to a wider group of our employees. Those done by external experts proved to people that ITSM is an important concept. After the FitSM foundation training we noticed that we started to speak the same language about ITSM, our discussions began to converge, and more people could work on evolving our service management system.

Our PLGrid Helpdesk is a big success thanks to the guidance from FitSM. It became simpler and more user-friendly, and it supports the ISRM and PM processes, so operating it runs smoothly. Maybe not perfectly, but definitely better than with the old Helpdesk tool.

What would we have done differently? Maybe not differently, but we wish we had realized sooner what services we are offering to our customers. This would have saved a lot of the work done at the beginning of the project.

How did your organisation change during the process of implementing FitSM?

The people changed. They understood that there are in fact benefits to introducing an ITSM standard, namely FitSM, in PLGrid.

There was initial resistance from technical staff to the introduction of "bureaucracy". They liked to have their work done as usual. Now site administrators find it convenient to have a defined way to introduce changes, rather than ad hoc requests from individuals coming and asking "hey, you need to install this and this for me". The positive effect is noticeable among people.

What are your plans for the future?

We will keep on developing ITSM in PLGrid. We have big changes coming. After replacing our current Helpdesk portal, we will introduce 2 more user-facing tools: the PLGrid User Portal and the Resource Allocation Platform, which will support our processes and procedures compliant with the FitSM standard. Not for the sake of the standard, obviously, but for the sake of our customers and users. At the same time we plan to continuously improve our employees' qualifications through their participation in advanced and expert level trainings.
