Posts Tagged ‘Maintenance’

Scaling 3G & LTE: SOAM Issues

Wednesday, April 14th, 2010
RSS Feed Subscribe to EtherNEWS Bookmark and Share

We recently caught up with Craig Easley, president of the Carrier Ethernet Academy and board member of the MEF, at CTIA 2010 in Las Vegas. Craig was at the conference to train the engineering staff of Ethernet mobile backhaul operators as they prepared to roll out large-scale 3G & LTE services (watch a video replay of this 30-minute class). One focus of the session was the challenges providers face when implementing Y.1731 Service Operations, Administration & Maintenance (SOAM) to monitor and manage Quality of Service (QoS) in these performance-critical, all-packet backhaul networks. Following is a short dialog between Craig and Patrick Ostiguy, President & CEO of Accedian Networks, which provides service assurance equipment for leading backhaul deployments run by both mobile operators and wholesale backhaul providers.

Speakers:


[CE] Craig Easley, President, Carrier Ethernet Academy

[PO] Patrick Ostiguy, President & CEO, Accedian Networks

[CE] If you receive a new software release from your favorite switch vendor that supports connectivity check messages (CCMs), you can configure them to run end-to-end only, so that just two points in the network actually send and receive the CCMs, or you can configure them to be processed by each intermediate maintenance entity as well, to provide complete path-performance information.

If you do that at a fine enough level of granularity, it's possible to flood the processors of the equipment in the middle to the point that you get a false positive: it looks like you have a problem. Your data traffic isn't being processed in software at every one of those interim points, but the connectivity check messages are. They are being read, time-stamped and then forwarded along. And if an intermediate switch is overrun with time-stamping connectivity check messages, the accuracy will be off and the operator might think there's a latency problem, when in fact the end-to-end latency of the actual data itself is within spec.

[PO] The sheer number of OAM sessions that mobile backhaul faces when implementing these standards is creating a very interesting challenge right now. These providers deliver 3 to 4 classes of service per tower and also want an OAM Performance Monitoring (PM) session for each of those classes, in addition to a continuity check message (CCM) session running to each of those towers every second.

Considering that a single Mobile Switching Center (MSC) can serve 200+ of those towers, the number of OAM sessions you are dealing with multiplies rapidly. Converging at the MSC, you can easily have 1000+ sessions that have to be terminated at a critical aggregation and hand-off location, typically served by a 10 GigE link. That is extremely dense, even for the “big iron” switch-routers that are out there from the big vendors.

[CE] This is something that people are just starting to wrestle with: essentially all of the big equipment manufacturers are releasing support for OAM. Some are releasing new hardware to go along with it, but most are just doing it in software. And if you only have OAM capability in software, there is only a certain amount of compute power in those switches that are already deployed. So the good news is: it's a software upgrade, you don't have to deploy anything new. The bad news is: you may be pouring a little bit “too much sand in the bucket” and exhausting the capability of the switch.

[PO] As Craig suggests, this level of processing cannot be done by software running on the routers' existing cards. In that context our customers asked us to develop a product that alleviates this problem with a purely hardware-based design that is independent of traffic load and can therefore handle thousands of OAM sessions, while offering one-way measurements with microsecond precision. The beauty of this new product, the MetroNODE 10GE, is that because it is based on the Y.1731 standard, it allows the operator to test performance to each and every tower, whether the cell site employs dedicated hardware such as NIDs or uses cell-site routers or base stations supporting this OAM standard.

[CE] Agreed. More and more people, I believe, will deploy special-purpose network interface devices like the Accedian units to make sure they get accurate SLA data back from the network, especially in mission-critical, ‘zero tolerance for error’ latency environments like mobile backhaul.
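The session arithmetic in the exchange above is easy to check. A minimal sketch, assuming exactly one PM session per class of service per tower plus one CCM session per tower; the function name `msc_oam_sessions` is hypothetical:

```python
# Hypothetical sizing helper: count the Y.1731 sessions terminating at one MSC.
# Assumes one PM session per class of service per tower plus one CCM session
# per tower, as described in the interview; real designs may differ.


def msc_oam_sessions(towers: int, classes_of_service: int, ccm_per_tower: int = 1) -> int:
    """Return the number of OAM sessions converging at the MSC."""
    if towers < 0 or classes_of_service < 0 or ccm_per_tower < 0:
        raise ValueError("counts must be non-negative")
    pm_sessions = towers * classes_of_service  # one PM session per class per tower
    ccm_sessions = towers * ccm_per_tower      # one CCM session per tower
    return pm_sessions + ccm_sessions


if __name__ == "__main__":
    for cos in (3, 4):
        print(f"200 towers, {cos} classes: {msc_oam_sessions(200, cos)} sessions")
```

At 200 towers the count already lands between 800 and 1,000 sessions, before any growth in tower count, which is why the MSC hand-off point is described as extremely dense.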

You can watch a more detailed overview at Accedian.com/10, or watch the CTIA training session by Craig Easley at Accedian.com/cea-ctia.



Accedian Introduces 10GbE Packet Performance Node

Tuesday, March 23rd, 2010

New unit assures 10 GbE performance, scales to 1,000s of Y.1731 sessions for wireless backhaul monitoring.

Las Vegas; March 23rd, 2010 – Accedian Networks™, a leading provider of Packet Performance Assurance™ solutions for telecom, cable and wireless communications providers, today introduced the MetroNODE 10GE™ packet performance node. Featuring a hardware-based, ultra-low-latency architecture, the 10GE delivers highly scalable performance monitoring for critical 10 Gigabit Ethernet applications. Addressing a critical need in 3G & 4G (LTE & WiMAX) backhaul networks, the 10GE can establish and maintain thousands of Y.1731 sessions at the Mobile Switching Center (MSC), providing comprehensive Ethernet Operations, Administration & Maintenance (OAM) coverage unachievable using today's switches or routers.

The MetroNODE 10GE is a unique networking product, reflecting input from leading service providers seeking an enhanced alternative to traditional network elements. With an initial feature set optimized for mobile backhaul applications, the 10GE unit’s ultra-precise OAM capabilities easily scale to the large number of sessions required to monitor and maintain 3G & 4G service deployments.

To guarantee Service Level Agreements (SLAs) for a wide variety of real-time communication and data services, backhaul connections maintain different service classes for high, medium and low priority traffic. Used to monitor connectivity and performance for each service class between the MSC and each cell site, Y.1731 sessions converging at the MSC quickly scale into the thousands as operators light up hundreds of towers in a metro region.

Existing routers with software-based OAM implementations can incur processing delays that result in nonsensical latency and jitter measurements, often several times higher than the true values. This lack of precision under real-world conditions leads to false alarms and inconsistent or incomplete monitoring visibility. By contrast, the 10GE unit features a dedicated-silicon, all-hardware architecture capable of processing thousands of flows in parallel with microsecond precision. This technology is scaled from Accedian's well-known MetroNID® units, which are widely deployed to establish OAM and monitor performance at cell sites worldwide.
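Why variable software processing hurts jitter readings so badly can be seen with a toy example. This sketch takes frame delay variation (FDV) as the absolute difference between consecutive delay samples, which is one common way to compute it; the path delays and CPU-wait values are hypothetical, not measured data:

```python
# Illustrative only: how a variable software-queue wait inflates measured jitter.
# All delay values below are hypothetical.


def frame_delay_variation(delays_ms: list[float]) -> list[float]:
    """Absolute difference between consecutive frame delays."""
    return [abs(b - a) for a, b in zip(delays_ms, delays_ms[1:])]


def max_fdv(delays_ms: list[float]) -> float:
    variations = frame_delay_variation(delays_ms)
    return max(variations) if variations else 0.0


# The network path delays every frame by the same 3.0 ms...
path_delay = [3.0, 3.0, 3.0, 3.0, 3.0]
# ...but a software OAM implementation adds a load-dependent queue wait.
cpu_wait = [0.1, 6.0, 0.5, 12.0, 2.0]

hardware_stamped = path_delay
software_stamped = [p + w for p, w in zip(path_delay, cpu_wait)]

print("hardware-stamped max FDV (ms):", max_fdv(hardware_stamped))
print("software-stamped max FDV (ms):", max_fdv(software_stamped))
```

The path itself is perfectly steady, yet the software-stamped samples report more than 11 ms of jitter, all of it coming from the processing queue rather than the network.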

“Mobile operators no longer have to maintain networks with sparse, inaccurate OAM measurements as they move from field trials to full-scale 3G & LTE deployments,” explained Patrick Ostiguy, President of Accedian Networks. “Hundreds of operators count on our solutions to assure critical applications – we engineered the MetroNODE 10GE™ to exceed their requirements and expectations. By using the 10GE to deploy service with confidence, they can overcome shortcomings in what they now consider legacy technology.”

A video overview of Accedian Networks' MetroNODE 10GE packet performance node is posted on the EtherNEWS industry blog at Accedian.com/blog and at Accedian.com.

Accedian is currently exhibiting at CTIA in Las Vegas (booth 6565); Mr. Ostiguy will address the challenges of 3G & LTE deployment on the panel “Engineering Mobile Backhaul” at 12:20pm on Tuesday, March 23rd.



LTE & 3G False Alarms

Thursday, February 25th, 2010

Capacity and next-generation mobile services (3G & 4G/LTE) seem to be constantly under scrutiny. Ever since the iPhone came on the scene and sucked the lifeblood out of AT&T's backhaul network, we constantly hear about impending doom, the bandwidth desert we're all facing ahead. This has been labeled “The Capacity Crisis”; here's an example of one of a gazillion articles harping on the uncertainty of our mobile broadband future. Sound a bit like the swine flu? Whatever happened to that?

One thing you learn working with real operators doing real deployments is that:

  1. backhaul capacity is something they are dealing with (don't lose too much sleep);
  2. there are bigger issues: real deployment challenges to figure out first.

And field trials for 3G & 4G are full of such examples. No one is having trouble getting bandwidth to the cell site; no magic formula is required for that. Simply put, if fiber is laid or a good microwave connection is set up, the capacity is there, pretty much on tap. The issues that operators are stumbling over have more to do with the operational nuts and bolts. A lot of new technologies are being put through their paces at the same time, and some that work great in the lab seem to be falling short in the field.

Ethernet OAM: Lies, Lies & More Lies

One of the key technologies almost every operator is counting on is Y.1731, the popular Ethernet operations, administration and maintenance (OAM) standard for connectivity fault management (CFM) and performance monitoring (PM). Y.1731 is a must, and for good reason: it's the only standards-based QoS monitoring method available to assure that Ethernet latency, jitter, frame loss and availability meet the demanding targets required for packet backhaul. It works in multi-vendor networks; it works in multi-operator networks (great for using, and keeping tabs on, wholesale backhaul carriers). Every network element maker selling into backhaul has it in their products, and they're all tuned up and ready to go. Are they?
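For readers new to Y.1731 performance monitoring: two-way frame delay is computed from four timestamps carried in a Delay Measurement Message (DMM) and its reply (DMR), using (RxTimeb − TxTimeStampf) − (TxTimeStampb − RxTimeStampf). The sketch below applies that formula; the class and field names are illustrative and all timestamp values are hypothetical. Note what the formula cancels: only the time the responder holds the frame between its own two timestamps. Any wait before a frame is stamped on receipt is counted as network delay.

```python
from dataclasses import dataclass


@dataclass
class DmmExchange:
    """Timestamps (microseconds) from one hypothetical DMM/DMR exchange."""
    tx_timestamp_f: int  # initiator sends DMM (initiator's clock)
    rx_timestamp_f: int  # responder stamps DMM on receipt (responder's clock)
    tx_timestamp_b: int  # responder stamps DMR on transmit (responder's clock)
    rx_time_b: int       # initiator receives DMR (initiator's clock)

    def two_way_delay_us(self) -> int:
        # Round trip seen by the initiator, minus the time the responder held
        # the frame between its two timestamps. Each difference uses a single
        # clock, so the two ends do not need synchronized time-of-day.
        round_trip = self.rx_time_b - self.tx_timestamp_f
        responder_hold = self.tx_timestamp_b - self.rx_timestamp_f
        return round_trip - responder_hold


# Hypothetical path: 3,000 us each way, so the true round trip is 6,000 us.
# Hardware stamping: DMM stamped on arrival, held 500 us, DMR sent.
hardware = DmmExchange(tx_timestamp_f=1_000, rx_timestamp_f=50_000,
                       tx_timestamp_b=50_500, rx_time_b=7_500)

# Software stamping: the DMM waits 2,000 us in a CPU queue *before* it is
# stamped, so that wait falls outside the cancelled interval.
software = DmmExchange(tx_timestamp_f=1_000, rx_timestamp_f=52_000,
                       tx_timestamp_b=52_500, rx_time_b=9_500)

print("hardware-stamped delay (us):", hardware.two_way_delay_us())
print("software-stamped delay (us):", software.two_way_delay_us())
```

The formula itself is sound; the 2 ms error in the second case comes entirely from where the timestamp is taken.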

A recent field trial in a 3G deployment in North America went into crisis mode when one leading mobile operator turned on OAM PM to verify latency over their backhaul provider’s network.  The one-way latency target (and SLA) from mobile switching center (MSC) to tower was set at 5ms.  Y.1731 measured 20ms.  The mobile operator freaked.  The backhaul carrier claimed 3ms.  What was up?

Using an alternative test method transparent to OAM processing, the mobile operator confirmed the 3ms, giving both carriers another problem to solve: why were the OAM measurements in error by more than 300%? The first step was to turn off OAM at all intermediate nodes in the network; suddenly Y.1731 PM measurements said 3ms. They turned it back on: 20ms. It's important to point out here that the delay only affected OAM traffic; real traffic was unaffected and was meeting spec the whole time! With the problem isolated to OAM processing itself, they had run into something most network element vendors knew full well might turn up, but had hoped would go unnoticed.
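The figures from this trial can be checked in a few lines. A minimal sketch using only the numbers quoted above (3 ms reference, 20 ms OAM reading, 5 ms one-way SLA); the helper names are hypothetical:

```python
SLA_ONE_WAY_MS = 5.0  # one-way target from the field trial described above


def error_percent(measured_ms: float, reference_ms: float) -> float:
    """How far an OAM reading overstates a reference measurement, in percent."""
    return (measured_ms - reference_ms) / reference_ms * 100.0


def breaches_sla(delay_ms: float, sla_ms: float = SLA_ONE_WAY_MS) -> bool:
    return delay_ms > sla_ms


oam_reading_ms = 20.0  # Y.1731 PM with OAM processed at intermediate nodes
reference_ms = 3.0     # independent test method, transparent to OAM processing

print(f"OAM overstates delay by {error_percent(oam_reading_ms, reference_ms):.0f}%")
print("OAM reading breaches SLA:", breaches_sla(oam_reading_ms))
print("Reference breaches SLA:", breaches_sla(reference_ms))
```

“More than 300%” is, if anything, conservative: 20 ms against a true 3 ms overstates the delay by roughly 567%, and turns a link comfortably inside its SLA into an apparent breach.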


The problem? Most switches and routers claim to offer the full Y.1731 feature set, but none of this was thought out when the products were originally architected. When Y.1731 became a must-have for backhaul, the features were typically shoe-horned in with a software patch. Running delay-sensitive monitoring features in software is a big faux pas, because shared CPU time in the network element is a poor place to do anything time-critical. These CPUs are busy doing more important things (like routing and switching functions) most of the time, pushing OAM into background processing queues. When traffic is at its peak, the network elements are heavily taxed, so just when you need performance measurements the most, they turn out the least accurate of all.
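To see why accuracy degrades exactly when load peaks, consider a textbook M/M/1 queue as a stand-in for a CPU that services OAM frames in the background. This is a deliberate simplification, not a model of any vendor's scheduler, and the 0.5 ms service time and 3 ms path delay are hypothetical. The mean queueing wait is W_q = ρS / (1 − ρ), where ρ is CPU utilization and S the mean service time, so it grows without bound as ρ approaches 1:

```python
# Simplified model: mean M/M/1 queueing wait added to an OAM frame that is
# handled by a shared CPU. Parameter values are hypothetical.


def mm1_mean_wait_ms(utilization: float, service_time_ms: float) -> float:
    """Mean wait before service in an M/M/1 queue: rho * S / (1 - rho)."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1) for a stable queue")
    return utilization * service_time_ms / (1.0 - utilization)


SERVICE_TIME_MS = 0.5     # hypothetical mean CPU time per queued task
TRUE_PATH_DELAY_MS = 3.0  # hypothetical real one-way delay

for load in (0.2, 0.5, 0.8, 0.9, 0.95):
    wait = mm1_mean_wait_ms(load, SERVICE_TIME_MS)
    print(f"CPU load {load:.0%}: OAM frame waits ~{wait:.2f} ms, "
          f"reported delay ~{TRUE_PATH_DELAY_MS + wait:.2f} ms")
```

At 20% load the added wait is a small fraction of a millisecond; at 95% it reaches about 9.5 ms at a single node, before several intermediate nodes in series are counted.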


Scary stuff.  In this case, every latency alarm the operators saw wasn’t an indication of network performance issues, but of CPU processing restrictions.  Not a very useful alert.

There are, of course, ways to fix this situation, and these two operators came to their own conclusions and had things humming a little while later. OAM can certainly work in large-scale, multi-provider deployments, and can assure critical services. It just takes a few tricks and some solid, hardware-based OAM devices to help things out.


This gets especially critical when you consider the OAM flows hitting the MSC: expect thousands at a time as CFM and PM sessions for 3 service classes from, say, 250 towers converge at a single router.
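Session count is only half the load; the transmission interval sets how many OAM frames per second converge on the MSC. A rough sketch, assuming each session delivers one frame to the MSC per interval (replies and other OAM traffic would add to this) and using the standard CCM transmission periods from 3.33 ms to 1 s; all names are hypothetical:

```python
# Hypothetical sizing sketch: OAM frames per second arriving at one MSC router.
# Assumes every session delivers one frame to the MSC per transmission interval;
# PM replies (e.g. DMR) and other OAM traffic would come on top of this.

TOWERS = 250
SERVICE_CLASSES = 3
SESSIONS_PER_TOWER = SERVICE_CLASSES + 1  # one PM session per class plus one CCM session

# CCM transmission periods in seconds, from the fastest (3.33 ms) up to 1 s.
INTERVALS_S = {"3.33 ms": 1 / 300, "10 ms": 0.01, "100 ms": 0.1, "1 s": 1.0}


def frames_per_second(sessions: int, interval_s: float) -> float:
    return sessions / interval_s


sessions = TOWERS * SESSIONS_PER_TOWER  # 1,000 sessions
for label, interval in INTERVALS_S.items():
    rate = frames_per_second(sessions, interval)
    print(f"{label:>7} interval: {rate:>9,.0f} frames/s at the MSC")
```

At a one-second interval the 1,000 sessions amount to 1,000 frames per second; at the 3.33 ms interval the same sessions would need roughly 300,000 frames per second, each one timestamped precisely.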

We’ve been getting a lot of calls in the middle of the night recently, and things can always be worked out.  Let’s just say none of these calls are about ‘The Capacity Crisis’.  That’s for the media to worry about.

