We recently caught up with Craig Easley, president of the Carrier Ethernet Academy and board member of the MEF, at CTIA 2010 in Las Vegas. Craig was at the conference to provide training for Ethernet mobile backhaul operators’ engineering staff as they prepared to roll out large-scale 3G and LTE services (watch a video replay of this 30-minute class). One focus of the session was the challenges providers face when implementing Y.1731 Service Operations, Administration & Maintenance (SOAM) to monitor and manage Quality of Service (QoS) in these performance-critical, all-packet backhaul networks. Following is a short dialog between Craig and Patrick Ostiguy, President & CEO of Accedian Networks, which provides service assurance equipment for leading backhaul deployments by both mobile operators and wholesale backhaul providers.
[CE] Craig Easley, President, Carrier Ethernet Academy
[PO] Patrick Ostiguy, President & CEO, Accedian Networks
[CE] If you receive a new software release from your favorite switch vendor that supports continuity check messages (CCMs), they can be configured to run just end-to-end, so that only two points in the network actually send and receive the CCMs, or you can configure them to be processed by each intermediate maintenance point as well to provide complete path-performance information.
If you do that at a high enough level of granularity, it’s possible to flood the processing capacity of the equipment in the middle in such a way that you get a false positive: an indication of a problem that isn’t there. Your data traffic isn’t being processed and handled by every one of those interim points, but the continuity check messages are. They are read, time-stamped and then forwarded along. If an intermediate switch is overrun with time-stamping continuity check messages, the accuracy will be off and the operator might think there’s a latency problem – when in fact the end-to-end latency of the actual data is within spec.
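The false positive Craig describes can be illustrated with a toy model. This is only a sketch: the link delays, per-hop time-stamping delays and the SLA threshold are hypothetical numbers chosen for illustration, not measurements from any real network or device.

```python
# Toy model (all numbers hypothetical): data frames see only link delay,
# while OAM frames also wait to be time-stamped at each intermediate hop.

def data_latency_us(link_delays_us):
    """End-to-end latency of ordinary data traffic."""
    return sum(link_delays_us)

def oam_measured_latency_us(link_delays_us, stamp_delay_us):
    """Latency reported by OAM frames processed at every intermediate hop."""
    intermediate_hops = len(link_delays_us) - 1
    return sum(link_delays_us) + intermediate_hops * stamp_delay_us

links = [200, 300, 250]   # three links, so two intermediate switches
sla_us = 1000             # assumed latency threshold

for label, stamp_delay in (("lightly loaded", 20), ("overrun", 200)):
    measured = oam_measured_latency_us(links, stamp_delay)
    alarm = measured > sla_us
    print(f"{label}: data={data_latency_us(links)} us, "
          f"OAM-measured={measured} us, alarm={alarm}")
```

In this sketch the data path stays at 750 µs in both cases, but the overrun case reports 1150 µs and raises an alarm even though the actual traffic is within the assumed threshold.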
[PO] The sheer number of OAM sessions that mobile backhaul providers face when implementing these standards is creating a very interesting challenge right now. These providers deliver 3 to 4 classes of service per tower and want an OAM Performance Monitoring (PM) session for each of those classes, in addition to a CCM session going to each of those towers every second.
Considering that 200+ of those towers can be served by a single Mobile Switching Center (MSC), the number of OAM sessions you are dealing with grows rapidly. Converging at the MSC, you can easily have 1000+ sessions that have to be terminated at a critical aggregation and hand-off location – typically served by a 10 GigE link. That is extremely dense, even for the “big iron” switch-routers that are out there from the big vendors.
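The arithmetic behind the “1000+ sessions” figure can be sketched as follows. It assumes exactly one PM session per class of service and one CCM session per tower, as described above; the function name is hypothetical.

```python
def oam_sessions(towers, classes_per_tower):
    """Sessions terminating at the MSC: one PM session per class of
    service plus one CCM session, for every tower."""
    pm_sessions = towers * classes_per_tower
    ccm_sessions = towers
    return pm_sessions + ccm_sessions

towers = 200
for classes in (3, 4):
    total = oam_sessions(towers, classes)
    print(f"{towers} towers x ({classes} PM + 1 CCM) = {total} sessions")
# 200 towers x (3 PM + 1 CCM) = 800 sessions
# 200 towers x (4 PM + 1 CCM) = 1000 sessions
```

With more than 200 towers, or 4 classes of service on each, the total passes 1000 sessions at a single MSC.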
[CE] This is something that people are just starting to wrestle with: essentially, all of the big equipment manufacturers are releasing support for OAM – some are releasing new hardware to go along with it, but most are just doing it in software. And if you only have OAM capability in software, there is only a certain amount of compute power in the switches that are already deployed. So the good news is: it’s a software upgrade, you don’t have to deploy anything new. The bad news is: you may be pouring a little bit “too much sand in the bucket” and exhausting the capability of the switch.
[PO] As Craig suggests, this level of processing cannot be done by software running on the routers’ existing cards. In that context, our customers have asked us to develop a product that alleviates this problem with a pure hardware-based design that is independent of traffic load and can therefore handle thousands of OAM sessions, while offering microsecond-precision one-way measurements. The beauty of this new product, the MetroNODE 10GE, is that because it is Y.1731 standards-based, it allows the operator to test performance to each and every tower, whether the cell site employs dedicated hardware such as NIDs, or uses cell-site routers or base stations supporting this OAM standard.
[CE] Agreed. More and more people, I believe, will deploy special-purpose network interface devices like the Accedian units to make sure that they get accurate SLA data coming back from the network, especially in mission-critical and ‘zero tolerance for error’ latency environments like mobile backhaul.