These are unedited transcripts and may contain errors.

Routing Working Group session, 4 May 2011 at 2 p.m.

CHAIR: Welcome back. It's 2 p.m., so we are going to start. This session is the Routing Working Group; if you wanted to be in the Cooperation Working Group, that takes place next door in the St. John's room. I am the co-chair, together with Rob Evans, who is sitting right here and will be speaking soon.

A few things before we start. A word of thanks to Emile Abban for being the scribe for this session and the Jabber transcriber.

One request: whenever you go up to the microphone to ask or comment on anything, please state your name so that the people attending remotely can better follow what's going on and who is saying what.

In good old Routing Working Group tradition, the agenda is pretty packed, so we hope at least not to overrun.

The two seven-minute talks in there will take place in reverse order. That's not important; I hope you'll all stay. We'll start with the IPv6 routing recommendations document, and Rob is going to talk about that.

ROB EVANS: Afternoon. You may or may not remember that we have had this document that's gone through the Working Group a couple of times. It started off when we were taking the routing recommendations out of the v6 allocation policy. Some people still wanted a document that pointed out that aggregation was good but that there are reasons you might want to advertise more specifics. So we have RIPE 399 for the general case, possibly with some v4 bias, and then this document, which is supposed to cover v6 specifically.

So, it's done a couple of rounds. It's coauthored by Phillip Smith and myself. We maybe haven't pushed it as hard as we should have. We had quite a few good comments after the last revision was sent out at the end of February. So, the general gist of the document is that aggregation is good. We all know that. A couple of edits removed the mention of AS path limit, because that's dead, nobody uses that. We added NOPEER, not because many people use it but because it is at least a standard. And also, quite importantly, what we missed out from the previous revision, and which should have been in, is a recommendation to register all your routes in your favourite routing registry, such as the RIPE Database, or to make sure they are covered.

What we'd like to do now is publish this, get it out as a RIPE document. Are there any more comments from the room on what the document says at the moment, what it should say, what it shouldn't say?
Speak now or forever hold your peace. Excellent.

In that case we can get on with the rest of the session and the first main talk is by Marco Canini.

MARCO CANINI: Hello everybody. I am Marco Canini. I am a researcher, and it's my great pleasure to be here today to talk to you about our system for online testing of BGP. This is joint work with a few collaborators. So let's get started.

Is it hard to crash the Internet? Well, let me just give you an example with software faults in interdomain routers. We have two different router types, a green one and a red one, and as you may expect, something is going to go bad soon. This actually happened a couple of years ago, when a small Internet service provider advertised a few BGP updates which contained this AS4_PATH attribute. This was a protocol-compliant but confusing message, and router type B simply decided to report an exception by, well, resetting the session. This by itself would not be a big deal if the Internet were composed of only two routers, but what went wrong at large is that a number of unaffected routers effectively multicast this announcement, and a number of affected routers decided to reset the session upon encountering the confusing message. Then the sessions would get re-established, the RIB would be refreshed, and it would still contain the confusing announcement and cause repeated service disruptions, which led to routing instabilities and eventually also to networks that became unreachable.

So, if we step back a little, from a general standpoint BGP is not always reliable, and I think we all know that. Looking at an analogy with the area of distributed systems, we can observe that BGP, as a distributed system, has a behaviour that is determined by the aggregate result of interleaved actions of multiple routers that operate in a federated, heterogeneous and failure-prone environment. So the main issue here is the difficulty of reasoning about corner cases and combinations, and this difficulty is shared by the network operators and software designers, who resort to designing a network by implementing local behaviour and configuration in each router.

So things can go pretty wrong even though, by itself, the local configuration appears to be correct.

My agenda here today is to describe our system for online testing of BGP, but I do have a disclaimer: this is still research work, so please do not expect it to be an immediate solution that can work today for your networks. My hope is that it will eventually be a tool that is helpful for this community. I solicit your feedback, and while I am giving you this presentation, I would like you to have these two questions at the back of your minds: Which faults would you look for if you had such a system? And what would convince you to deploy our system in your network? And hopefully we can also have a discussion later on, in the coffee break maybe.

So, BGP is not always reliable, but DiCE comes to the rescue. DiCE is the name of our system. The key idea is that we want to automatically explore the system behaviour to detect potential faults. We can actually decompose this key idea into three simple steps:

The first is to create a snapshot of a BGP neighbourhood and this snapshot is isolated from the production environment and becomes our testing platform. Next we subject a router BGP process to many inputs that can systematically exercise possible router actions. And for each of these inputs, third, we check in the snapshot whether the execution misbehaves, whether the BGP deviates from its desired behaviour.

The conclusion here is that if there is an error in the snapshot, then we can gather that as evidence of possible future behaviour of the production system and perhaps preventative measures can be taken. I will now detail each of these three steps.

So, first, the BGP snapshot. The idea is to isolate the testing from the production environment. At some point in this BGP topology, the router on the left, which we call the explorer, is going to want to start the exploration, so the first thing it does is take a local checkpoint of its current state and configuration. What I mean by this is, if you picture this green box as your router, there is going to be a BGP process and some other production resources, such as the FIB and the sockets that maintain the BGP sessions. So what we want to do is fork and obtain a clone of this BGP process inside the router, and disconnect it from the production environment so that we can test its behaviour in isolation. Then the router is going to send an update that contains a special, reserved IP prefix and some custom attribute to make all the magic happen, and the other routers that encounter this message are going to take a local checkpoint. In the end, these local checkpoints will start new BGP sessions among themselves, and this realises our testing platform, where the cloned BGP processes have new sockets and independent BGP sessions. The observation here is that, because BGP is a federated system, we want each router to keep its own checkpoint locally, so that the private state and the configuration stay within the boundaries of the AS, and the idea is that the ASes will collaborate to detect potential faults.

So, now that we have a snapshot to test upon, we want to explore the behaviour. To do so, the idea is that DiCE will make a copy of the cloned BGP process and inject an input that will cause a certain router action to be exercised, and in the end we go into the snapshot and observe whether there is an error. I'll come to that in a few slides. This operation can be done systematically, so we can make a second copy and input a different router action, and so on, until ideally we exhaustively test all the router actions; more practically, we are happy to test even just the most relevant router actions. And if an error is found, well, that's good, because we know it in advance.

Internally, DiCE uses a path-exploring engine. What we use is a software testing technique called concolic execution, which is able to systematically exercise the code paths of the BGP process.

As I was saying, we want to explore behaviour, so we need to understand what drives behaviour. In the first place, there is the code and the current configuration, which determine what the router will do. On top of this, there are inputs. So the path-exploring engine is going to learn from the code itself what the possible inputs are that can drive and exercise the behaviour of the routers, and it will use this information to generate inputs that steer the execution through these code paths. By inputs I mean a variety of sources, such as messages, configuration changes, failures, timeouts and random choices. To give you a better example, let's take a BGP update. What we do with this is mark two specific regions of the message as what we call symbolic inputs; the symbolic inputs mark the regions of bytes that the path-exploring engine can arbitrarily change to explore the code paths. Another example is configuration changes. Here what we look at is the route ranking, so our symbolic input becomes: which one is the most preferred route? This reflects a policy change that affects the route selection process and therefore drives a different behaviour on the router.

Finally, as we are now able to explore the behaviour, we also want to detect faults. To do this, we check for violations of properties that capture the desired behaviour of BGP. Here too, let me give you an example that connects back to the initial example of the session resets. This property, which we call harmful global events, works as follows: in the snapshot, there would be this ambiguous but valid message that propagates and reaches routers that are affected by the software issue, which will reset the session. At this point, what we want to do is execute a function that acts upon the local router state and returns 0 or 1, depending on whether there is an error. Then we aggregate all these results and check whether the error count is above a given threshold. If so, we log the inputs that have a harmful global behaviour.
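A minimal sketch of the harmful-global-events check described above, not DiCE's actual code. The `RouterState` dictionary, the `session_resets` counter and the function names are all hypothetical; the sketch only shows the shape of the check: a local function returning 0 or 1 per router, an aggregate count, and a threshold.

```python
from typing import Callable, Dict, List

# Hypothetical local view of one router inside the snapshot.
RouterState = Dict[str, int]


def session_reset_check(state: RouterState) -> int:
    """Local check: return 1 if this router reset any BGP session, else 0."""
    return 1 if state.get("session_resets", 0) > 0 else 0


def is_harmful_global_event(
    states: List[RouterState],
    local_check: Callable[[RouterState], int],
    threshold: int,
) -> bool:
    """Aggregate the per-router results and compare the count with a threshold."""
    error_count = sum(local_check(state) for state in states)
    return error_count > threshold


# Three routers in a toy snapshot; two of them reset a session.
snapshot = [
    {"session_resets": 2},
    {"session_resets": 0},
    {"session_resets": 1},
]
harmful = is_harmful_global_event(snapshot, session_reset_check, threshold=1)
```

With a threshold of 1, the two affected routers in this toy snapshot are enough for the input to be logged as harmful; raising the threshold to 2 would not flag it.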

We have also thought of other properties. One is to detect policy-induced misconvergence. The other is origin misconfiguration, but do not think of this as a replacement for BGPSEC or as a security solution; it's something that would be helpful to capture mistakes in the filtering configuration. In this property, the idea is that we want to check whether a customer or a provider can take a router, sorry, a route for a prefix that it does not own. So inside the snapshot, we want to check that if the customer announces a prefix D that it does not own, it is not able to pollute the routing table of its provider. In the end, DiCE would report the list of prefixes that can actually leak, if there are configuration mistakes.
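A minimal sketch of the origin-misconfiguration idea, under stated assumptions: the provider's customer filter is modelled as a plain list of allowed prefixes, and a prefix "leaks" if that filter would accept it even though the customer does not own it. The function names and the example prefixes (documentation ranges) are hypothetical and do not come from DiCE.

```python
import ipaddress
from typing import List


def covered(prefix: str, allowed: List[str]) -> bool:
    """True if `prefix` equals or is contained in one of the `allowed` prefixes."""
    net = ipaddress.ip_network(prefix)
    for entry in allowed:
        candidate = ipaddress.ip_network(entry)
        # subnet_of() requires both networks to be the same IP version.
        if net.version == candidate.version and net.subnet_of(candidate):
            return True
    return False


def leakable_prefixes(
    candidates: List[str],
    customer_owned: List[str],
    provider_filter: List[str],
) -> List[str]:
    """Prefixes the provider would accept although the customer does not own them."""
    return [
        p
        for p in candidates
        if covered(p, provider_filter) and not covered(p, customer_owned)
    ]


owned = ["192.0.2.0/24"]
# The second filter entry is a deliberate configuration mistake.
provider_filter = ["192.0.2.0/24", "198.51.100.0/24"]
tested = ["192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24"]
leaks = leakable_prefixes(tested, owned, provider_filter)
```

Here the only reported leak is the prefix that the mistaken filter entry lets through; the third prefix is rejected by the filter and the first is legitimately owned.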

As I was saying, though, in BGP, because of its federated nature, it's important that we keep the confidential information within each AS. We classify the information that can leak into two distinct kinds. One is the potential router behaviour, because we exercise all these code paths, but we argue that the common code paths at least have already been exposed, because BGP is a long-running system, and we don't feel that having such a mechanism would make reverse engineering any easier than it already is today.

The second class of information, which is more important, is the private state or configuration. Here we can use randomisation to hide this information, or we can avoid using inputs that are driven by such confidential data, so that it cannot leak; it's guaranteed that it cannot leak. There are also other countermeasures that we can think about, such as rate limiting the information or not using certain explorers altogether. Lastly, what we consider as an extra possibility is to realise the property checks in an anonymous way by using secure multiparty computation, so that there is no need for a trusted third party.

Regarding the implementation details, we integrated DiCE in BIRD, which is an open source router coded in C. What I want to point out is that concolic execution needs to instrument the code in order to track the symbolic inputs as they execute, but we are careful to use the instrumentation only for testing, so there is only a negligible impact on the production environment.

We evaluated DiCE using a 48-core machine, on top of which we ran multiple BIRD instances, and we installed and checked the properties that I have already discussed, plus the policy conflict one which, as I said, I will not have time to discuss.

This is the evaluation topology that we used. It is an interesting topology because it contains all the kinds of commercial links in use today, at least the most common ones. We sprinkled some routers that are affected by the AS4_PATH issue around the topology. We loaded, from a RIB dump, 300,000 BGP prefixes in use today, and we used a real-time trace to do our evaluation. We also configured the policy in each AS and the customer route filtering according to best practices, although we introduced some mistakes in order to see whether DiCE was able to find them. And we placed this topology inside a network emulator to add some Internet-like conditions regarding latency and link capacities.

So, we ran a number of microbenchmarks to get an idea of what the overhead would be in terms of CPU, memory and bandwidth. For CPU, we observed that if we ran a stress test during the reload, the overhead is 8 percent, but using the trace of 50 minutes of real-time updates, which is a more realistic condition, we only observe a negligible impact. For memory, each cloned process has on average a 37 percent overhead, although our code is clearly research-grade, so optimisations are possible. And for bandwidth, we observed an average utilisation of 8 kilobits per second during the exploratory messaging.

The results are that DiCE was able to detect the session reset and the origin misconfiguration problems present in our topology. It managed to explore all the code paths in the update message handlers inside BIRD, and it ran across an Internet-like testbed in four minutes on average. Given that some of the problems in BGP may be latent and stay around for months, we think that such a testing time should be appropriate for our tool.

Coming to deployment options, we have thought about two. One is of course to convince one of Cisco, Juniper or Huawei to integrate DiCE. This may not be the easiest way, of course, but if at least one of these vendors were to consider this option, maybe that could create enough traction that everybody would consider adopting it. A more incremental option would instead be to deploy DiCE and BIRD on a server within each autonomous system. The server could potentially run multiple router instances, which would be configured with the current AS policy and fed with the live BGP data, and then each AS would connect its own server to the DiCE servers in the neighbour ASes.

We also thought a little bit about incentives, because at this point you might be wondering which incentives you would have. First of all, the Internet is a shared infrastructure and our overarching goal is to keep it running reliably, but there is also some sugar to it, in that an ISP would benefit from being the target of an exploration because it would be able to learn about its own faults, faults that are present in its own network. And an upstream ISP could instead incentivise its customers to serve as explorers, under the logic that with fewer faults there are also lower operational costs.

So, to wrap up: we have this prototype of an online testing system for BGP. Would you be interested in trying out our prototype, or do you have suggestions for any properties that you may want us to check?

Thank you for your attention and thank you for giving me this opportunity to present here. Do you have any questions?

AUDIENCE SPEAKER: David Friedman from Claranet. This looks wonderful. Have you had any interaction with the IETF at all?

SPEAKER: No, not as of now.

AUDIENCE SPEAKER: I'd just suggest looking into the Global Routing Operations Working Group, simply because there are some initiatives going on at the moment to monitor BGP inside a domain, something called the BGP Monitoring Protocol, and it may be that this approach has some value. Just a thought.

CHAIR: Anybody else? In that case, Marco, thank you very much for presenting here.

(Applause)

CHAIR: Next up we have Alex.

ALEXANDRU STEFANESCU: Hello, I am going to deliver a short presentation on the effects of RPKI scenarios.

I am only going to give you a brief overview of our approach, our tools and what we hope to achieve.

Basically, we want to study the effects of BGP security deployment scenarios. That is, we choose a set of ASes which we assert deploy various security additions, and we want to measure how secure the Internet becomes, how many secure routes are formed. What we ultimately hope to achieve is to find a way to start securing ASes for maximum benefit, but to get to this, we first have to understand the relationship between the number of secured ASes and the number of secured routes. For example, should we start with stub ASes or with large tier 1 ASes? In this case we should start with the larger ASes, but how many of them should implement security solutions in order to have a strong impact?

Otherwise, should we start with content delivery networks? They carry a lot of traffic, and it would be interesting to see exactly what their impact is.

Of course, this is useful for finding out exactly how many ASes we should secure in order to have a certain percentage of secure routes.

So, our approach is to come up with a simulator in which we can easily implement various security solutions. We can practically emulate any security proposal, although we focus now on route origin validation. It is important to mention that we do not perform crypto computations. This is not a benchmark of the computations required for security solutions; we just emulate these. The motto is: abstract what you can but keep the simulator running in real time, even if it's scaled, so it's not an event simulator, and emphasise using real-world data. The simulator should be able to work with the current topology of the Internet taken from CAIDA, so basically...

The result of our abstraction is a model of the BGP protocol in which there is no practical network modelling. An AS corresponds to one node. There is no internal BGP modelling. But we have all the standard features of the BGP protocol, including even route flap dampening.

Our security model is implemented by simply tagging the announcements sent between the ASes. These ASes are assigned various security policies individually. One of the most important is to favour secure routes when a tie is encountered, but others are possible, like a paranoid policy, where we prefer a secure route if it exists even when an unsecured route is shorter than the secure one, or simply ignoring the security aspect.
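A minimal sketch of how the three policy classes mentioned above could rank tagged routes, assuming a heavily simplified decision process in which only AS path length and a boolean `secure` tag matter. The `Route` class and the policy names `ignore`, `tie_break` and `prefer_secure` are hypothetical labels, not identifiers from the simulator.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Route:
    as_path: Tuple[int, ...]
    secure: bool  # tag carried by the announcement


# Each policy is a sort key; the route with the smallest key wins.
POLICY_KEYS: Dict[str, Callable[[Route], tuple]] = {
    # Security tag is not considered at all.
    "ignore": lambda r: (len(r.as_path),),
    # Shortest path first; the tag only breaks ties.
    "tie_break": lambda r: (len(r.as_path), not r.secure),
    # A secure route wins even if an unsecured one is shorter.
    "prefer_secure": lambda r: (not r.secure, len(r.as_path)),
}


def best_route(routes: List[Route], policy: str) -> Route:
    """Pick the best route under the given (hypothetical) policy name."""
    return min(routes, key=POLICY_KEYS[policy])


short_unsecured = Route(as_path=(64501, 64502), secure=False)
long_secure = Route(as_path=(64503, 64504, 64505), secure=True)
equal_secure = Route(as_path=(64506, 64507), secure=True)

tie_choice = best_route([short_unsecured, equal_secure], "tie_break")
paranoid_choice = best_route([short_unsecured, long_secure], "prefer_secure")
plain_choice = best_route([short_unsecured, long_secure], "tie_break")
```

The tie-breaking policy only changes the outcome when path lengths are equal, whereas the stricter policy accepts a longer path in exchange for the secure tag.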

Our software is an enhanced version of the pre-existing simulator from earlier. It runs on a cluster at the university. Each AS has its own Java thread, so each node in the cluster should be able to handle a few thousand ASes, and we can easily tweak the security policy deployed on each AS.

So basically our testing scenario goes like this: we assign security policies of different types in different distributions, we propagate one or a few prefixes, and in the end we count the number of validated routes in all the routing tables of the ASes.
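A toy sketch of such a scenario, under strong assumptions that are not from the talk: the topology is an undirected graph, "largest" means highest degree, each AS simply adopts the first shortest path found by breadth-first search, and a route counts as secure only if every AS on it is secured. All names are hypothetical, and business relationships and real BGP policy are ignored.

```python
from collections import deque
from typing import Dict, Set

Graph = Dict[int, Set[int]]


def secure_top_fraction(graph: Graph, fraction: float) -> Set[int]:
    """Secure the given fraction of ASes, largest degree first."""
    ranked = sorted(graph, key=lambda asn: (-len(graph[asn]), asn))
    return set(ranked[: round(len(ranked) * fraction)])


def count_secure_routes(graph: Graph, origin: int, secured: Set[int]) -> int:
    """Propagate one prefix from `origin`; count ASes holding a fully secured route."""
    path_secure = {origin: origin in secured}
    queue = deque([origin])
    while queue:
        current = queue.popleft()
        for neighbour in sorted(graph[current]):
            if neighbour not in path_secure:
                # The route stays secure only if the new hop is secured too.
                path_secure[neighbour] = path_secure[current] and neighbour in secured
                queue.append(neighbour)
    return sum(1 for asn, ok in path_secure.items() if ok and asn != origin)


# Five-AS toy topology: AS 1 is the hub, AS 5 hangs off AS 2.
topology: Graph = {1: {2, 3, 4}, 2: {1, 5}, 3: {1}, 4: {1}, 5: {2}}

secured_40 = secure_top_fraction(topology, 0.4)
routes_40 = count_secure_routes(topology, origin=1, secured=secured_40)
routes_all = count_secure_routes(topology, origin=1, secured=set(topology))
```

Sweeping `fraction` from 0 to 1 and plotting the resulting count is the kind of curve the scenario is after, although on a real topology the selection rule and the definition of a secure route would need to follow the actual simulator.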

However, this simulation is affected by a number of factors. For instance, what is the influence of the topology? We could, for instance, test on topologies from previous years or from today, or we could use synthetic topologies, ring or muster, it doesn't matter. What's the effect of the different types of security policies, the ones I mentioned earlier? How about their distributions? What if we secure 10% of the largest ASes, 20%, 30%? And what if we start securing by geographic regions? Start with the RIPE ASes? Large, small, anything you can imagine.

So, regarding our envisioned results: first of all, we are practically continuing the work of Sharon Goldberg, which actually focuses more on path validation than route origin, and her group is modelling economic incentives rather than the effects of different classes of security policies deployed on ASes; and the work of Jennifer Rexford, which is more theoretical in nature and kind of has opposing results.

In the end, our output is basically a more detailed simulation than before, and a guide to a favourable turnover for investments in BGP security. It's important to mention that our results are valid overall, not on a per-AS basis. I mean, we will show a general trend, not specific policies.

So, for future ideas, we will need your help and we are open to your suggestions. Also, we are planning to take into consideration the dynamic aspect of our simulator. For now we are ignoring the time aspect. So we could, for instance, study whether a secure route propagates better, or faster, than an unsecured route under certain conditions.

Here are our references. The first is the master's thesis describing the simulator, and the final two are the previous works I have referred to.

Thank you for your attention. Are there any questions?

CHAIR: Any questions for Alex? As research progresses we will encourage you to report back on findings. Thank you very much.

(Applause)

ADRIANA SZEKERES: I will present my work at NLnet Labs as a Master's student from the university, my work on AS-level multipath routing.

First of all, I will make a short introduction to AS level multipath routing. Then I will motivate our work. Then I will present our work and some preliminary results.

So, I will begin with a short introduction to multipath routing and the problem it tries to solve. Consider these four ASes and the path that S learned to D. Basically, suppose that D announced a prefix and S learned a path to D through BGP. Now, when the link to D fails, S will be disconnected from D. Fortunately, the Internet topology is not a tree, and multihoming is a well-known technique, so there is a chance that D will have a second provider, okay, and that we will also know a second path to D.

So, when the link PD fails, P will start dropping packets to D because it doesn't know of a second path to D, and it will keep dropping packets until A switches to the second path, after it gets a withdrawal from P.

So basically, this is the problem that we have: the loss of connectivity that appears during BGP convergence. This problem would have been solved if P had known about the second path to D; it would then immediately reroute packets to the second path. So this is the main idea of AS multipath routing: to supply multiple paths, and also to find the path most disjoint from the one currently used by BGP.
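A small sketch of the local choice at P in this example, assuming paths are plain lists of AS names and links are unordered pairs. The node names follow the talk; the helper names are hypothetical. It only models P's own decision once the link to D is down and says nothing about what other ASes do with the packets or about loop avoidance.

```python
from typing import FrozenSet, List, Optional, Set

Link = FrozenSet[str]


def links_of(path: List[str]) -> List[Link]:
    """Unordered links traversed by an AS-level path."""
    return [frozenset(pair) for pair in zip(path, path[1:])]


def choose_path(paths: List[List[str]], failed: Set[Link]) -> Optional[List[str]]:
    """Return the first known path that avoids every failed link, else None."""
    for path in paths:
        if not any(link in failed for link in links_of(path)):
            return path
    return None  # no usable path: packets are dropped until BGP converges


failed_links = {frozenset(("P", "D"))}

# Plain BGP: P knows only its best path, so it has nothing to fall back on.
single_path_result = choose_path([["P", "D"]], failed_links)

# With a pre-advertised second path, P can switch immediately.
multipath_result = choose_path(
    [["P", "D"], ["P", "A", "P2", "D"]], failed_links
)
```

In the single-path case the function returns `None`, which corresponds to the packet loss during convergence; with the second path installed beforehand, P has an alternative as soon as the failure is detected.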

So, the conclusion is that it would be nice to have multipath routing, but it comes at some cost. To see what cost I am talking about: well, the number of ASes has doubled since 2004, so BGP faces scalability and stability problems. Many proposals have been needed to solve the scalability issues, such as MRAI timers, so this is a well-known issue of BGP which cannot be ignored. Because of these issues, there is a reticence towards new BGP enhancements. Many proposals have been submitted but few of them have been accepted for implementation in BGP. So the conclusion is that every new BGP addition should be carefully studied and tested before actually being implemented in BGP.

So basically, this is the motivation of our work: these multipath methods are relatively new and haven't been studied very much, and studying them is what we want to do.

So, we want to study the promises of the recently proposed AS multipath methods. So far, we have implemented two of the proposed methods, resilient BGP (RBGP) and multiprocess BGP (MPBGP). The main idea of resilient BGP is to advertise a failover path to the AS through which the AS is routing. And the idea behind multiprocess BGP is to run multiple instances of BGP, which try to find disjoint paths. So these are two different approaches to solving the problem.

So, we focused on studying the impact of these methods on BGP's scalability and then the degree of fault tolerance, that is, whether these methods actually achieve the results they promised or not.

To run these experiments, we used the BGP simulator my colleague talked about, because it is a large-scale simulator and it can simulate more than 30,000 ASes. As he already told you, it runs on a DAS4 cluster with 8 cores per node, and it is relatively fast, but it has memory requirements. As topologies, we used the topologies from CAIDA from 2005 to 2010, which model the current Internet topology very accurately.

So I will present some preliminary results. This graph shows the impact on BGP scalability. This red line is plain BGP: we advertised a prefix in the topology and then counted the number of updates. If you look at the topology from 2005, we got over 120,000 messages, okay. And these ones are from RBGP and MPBGP. Now, I personally can't quantify these numbers, because I don't know if 280,000 messages are a lot or not to advertise a prefix, but these results can be further interpreted by those who have a feeling for whether this increase in messages is a lot or is acceptable.

And after we studied the impact on BGP scalability, we wanted to look at the quality of the alternative paths that are found by the two methods, so basically we wanted to see how node-disjoint the alternate paths are from the current path that is used by BGP.

So basically, if you look at this graph, on the X axis we have the maximum node disjointedness over all the alternate paths that the method finds, that is, the node disjointedness from the current path that is used by BGP. We can observe that MPBGP finds more paths, so basically all the ASes have at least one alternate path, which is 1-node disjoint or 2-node disjoint or 3-node and so on. But for RBGP, we can see that most ASes, 92% of ASes, don't have an alternate path. We cannot conclude from this that RBGP is worse than MPBGP, because even though it finds fewer paths, they are strategically placed: RBGP only advertises a failover path to the AS through which it is routing. So it's normal to find a smaller number of paths, but it could actually achieve more fault tolerance. So we want to further study the routing tables and see if the fault tolerance is really achieved by RBGP. I don't know if it's clear.
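A small sketch of one way to compute the metric on the X axis. The talk does not define node disjointedness precisely, so this assumes it is the number of intermediate ASes on an alternate path that do not appear on the current BGP path; the function names and the example paths are hypothetical.

```python
from typing import List


def node_disjointness(primary: List[str], alternate: List[str]) -> int:
    """Count intermediate ASes of `alternate` that are not on `primary`.

    Source and destination (first and last element) are excluded,
    since both paths necessarily share them.
    """
    primary_nodes = set(primary[1:-1])
    return len(set(alternate[1:-1]) - primary_nodes)


def max_node_disjointness(primary: List[str], alternates: List[List[str]]) -> int:
    """Best (largest) disjointness over all alternates; 0 if there are none."""
    return max((node_disjointness(primary, alt) for alt in alternates), default=0)


current_path = ["S", "A", "P", "D"]
found_alternates = [
    ["S", "A", "P2", "D"],  # shares A with the current path
    ["S", "B", "P2", "D"],  # shares no intermediate AS
]
best = max_node_disjointness(current_path, found_alternates)
```

Under this definition, an AS with no alternate path at all and an AS whose only alternate reuses every intermediate node both score 0, so a real analysis would likely track the two cases separately.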

So, we will further study this; it is not final. We will look further into the routing tables.

And these are the results for the 2009 topology, and we see that they are not really different from the previous ones. So basically the topology does not influence the methods much.

In conclusion, we believe that our work will be of importance to those who develop and design BGP, if they want to take these multipath routing additions into consideration, because we'll give insight into their impact on scalability and into their efficiency, that is, whether they really achieve the fault tolerance.

And for future work, we will further analyse the routing tables, as I said, to draw a final conclusion, and we will add more methods to the analysis to complete the view over these methods, because they are relatively new, so...

Thank you. If you have any questions.

CHAIR: Thank you. Any questions?

AUDIENCE SPEAKER: Wilifred Woeber, and when it comes to more recent BGP developments, what some people call these days a silver surfer. If you go back to one of the early slides where you had the alternative path, there was this node A and the P and the P2 and that sort of thing. You said if node P knew about the alternative path through P2, it could do more clever things than drop the packet. I got that one. But with regular BGP processing logic, if P returns the packet back to A, A would still see P as the best path unless there is a routing-layer update about connectivity or the drop of connectivity. So, my question is: what sort of modifications to the behaviour of the routing protocol would be necessary in node A to make this work?

ADRIANA SZEKERES: For RBGP for example, node A would also advertise this path to P. So P will be aware of both the routes.

AUDIENCE SPEAKER: True, I got that one. But what I do not get is how would A find out that the alternative path is the better one before receiving the route update from P telling A that the segment from P to D is no longer available?

ADRIANA SZEKERES: Well, it advertises the second path, and not after P sends the withdrawal to A. So it's previously, it's previously...

AUDIENCE SPEAKER: I am referring exactly to that time period where you said A would still ship packets to P, and P would have to drop the packets because it lost the path to D, and at the same time it would generate a route, a topology update message, to be sent back to A. And what I understood was that your approach tries to sort of do more clever things in this little time frame...

ADRIANA SZEKERES: This is a simple situation, but more complicated situations can appear. I only gave an example.

AUDIENCE SPEAKER: Sure, but I would like to find out what changes to the routing protocol or the forwarding logic would be necessary in A for it to be sort of more liquid, to be faster and do the clever thing before it actually gets the route, the topology update, from P?

ADRIANA SZEKERES: I don't really know exactly what you are referring to, but A will advertise the path before the failure happens. So...

CHAIR: I think Wilifred is talking about the forwarding of the packets in the moment of instability.

AUDIENCE SPEAKER: Geoff Huston. Can I help here just a tiny bit? The presentation says: assuming that there is a mechanism that allows alternate paths to work in BGP, what do the updates look like? Now, you are taking an example and going, "Wow, this alternate path mechanism in BGP really looks ropey." I agree, but that's not what the presentation is about. There is this whole other thing about BGP best path, you know, how good withdrawals are, etc., which is a massive other piece of work. This work simply assumes that that all happens, which is fine.

AUDIENCE SPEAKER: Blake Lillis, two quick questions for you. One is: I assume this is purely a routing simulator, you don't have actual packet forwarding running across this, so you haven't been able to test the actual convergence time of a flow?

ADRIANA SZEKERES: And it's also only simulated at the AS level, not routers.

AUDIENCE SPEAKER: And the second question was about being able to guarantee a loop-free topology, but I think it pretty much goes back to the discussion of a moment ago.

The RBGP in particular looks a lot like the work that's gone on recently to provide fast reroute to link-state protocols like OSPF and IS-IS. I am not sure if you are familiar with that work at the moment.

ADRIANA SZEKERES: No. That's from 2010 or?

AUDIENCE SPEAKER: Yeah, exactly.

ADRIANA SZEKERES: I am not familiar with all the methods.

AUDIENCE SPEAKER: It's very similar to that, sort of keeping a backup path waiting in the wings, so if the primary one goes away I already have this one almost in the FIB instead, where the decision is made locally and where it guarantees a loop-free topology.

CHAIR: Thank you very much for presenting.

(Applause)

We now have Alex Band from the RIPE NCC.

ALEX BAND: Hi, my name is Alex Band. I am the product manager at the RIPE NCC, and one of the projects I look after is resource certification, everyone's favourite topic nowadays.

We launched this on 1 January 2011, and what is it that we did? We did a hosted platform. We started really small, really basic, and we did a limited amount of address space. We certify provider aggregatable space and IPv6 allocated by the RIR; that's the only space we are doing. And within this hosted platform, all you need is an LIR portal account, and every RIPE NCC member has one. You log in and you can just do one click and enable your certificate authority, and all resources eligible for certification will be listed on the certificate. Now, after that, the only other thing you have to do, what certification allows you to do, is create so-called route origin authorisations. So, you say I authorise this autonomous system to announce these prefixes. That's it. And the only influence that the RIR has in this is that only the registered holder of an IP address can create such a ROA. That's the power of this system.

Now, let me show you how that works, to make it a little bit more tangible. You click add ROA specification. You fill in the AS number, for example 64511, call it my upstream AS, just give it a unique name. You drag one of the prefixes in; you can optionally make it smaller. If you try to fill in a different prefix, it will throw you an error message. Then you have an option called maximum length. Now, this is something different than what you are used to with, for example, route objects in the RIPE database. Maximum length actually allows the autonomous system to deaggregate the prefix up to the point you specify. So if I were to leave this particular field blank, then the AS is only authorised to announce a /22 and nothing else. So, they are not authorised to announce anything more specific.

However, if you would, for example, fill in /24, then the AS could announce /22, /23 and /24.
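The maximum length behaviour described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the RIPE NCC implementation: the function name `allowed_by_max_length` and the 10.0.0.0/22 example prefix are hypothetical, and only the prefix side of a ROA is checked (the AS number is ignored).

```python
import ipaddress


def allowed_by_max_length(roa_prefix, max_length, announced_prefix):
    """Return True if announced_prefix fits the ROA prefix and its maximum length.

    Hypothetical helper for illustration. max_length=None models the field
    being left blank, in which case only the exact ROA prefix length is allowed.
    """
    roa_net = ipaddress.ip_network(roa_prefix)
    announced = ipaddress.ip_network(announced_prefix)
    if max_length is None:
        max_length = roa_net.prefixlen
    # An IPv4 ROA never covers an IPv6 announcement, and vice versa.
    if announced.version != roa_net.version:
        return False
    return announced.subnet_of(roa_net) and announced.prefixlen <= max_length


# Blank maximum length: only the /22 itself is authorised.
print(allowed_by_max_length("10.0.0.0/22", None, "10.0.0.0/22"))
print(allowed_by_max_length("10.0.0.0/22", None, "10.0.1.0/24"))
# Maximum length /24: the /22, its /23s and its /24s are authorised.
print(allowed_by_max_length("10.0.0.0/22", 24, "10.0.1.0/24"))
```

With the field blank, a /24 inside the /22 is rejected; with a maximum length of 24 it is accepted, and anything longer than /24 is still rejected.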

So I fill in /24. It also does v6, so I'll drag one of those in there, and then you can enter a start and an end date. So, you could add that if you want. If you leave this blank, then the ROA itself will just match the validity time of the certificate. And the certificate is automatically renewed; it's just part of the membership process. So long as you are a RIPE NCC member you'll just have the certificate; it's as simple as that. Lastly, the only thing you have to do is click add ROA. All of the crypto work, all of the storage, everything is done automatically by the system. So, the publication of the ROA is also done automatically. This ROA is now available within the ROA repository, and anyone can now use a validation tool to grab this repository and validate whether a particular route announcement has a valid ROA attached to it. That's all there is to it.

Now, a couple of people have done this in the four months that we have been running. Actually we have 168 thousand /24 prefixes, so the equivalent. It ranges from /24s to /10s; we have all kinds of prefixes in there. But, and this is something like two and a half /8s for those of you who can do quick maths, we also have 8 thousand 400 /32 IPv6 prefixes in there. And then you can start having a look at the data. And that's kind of fun. Because people are just experimenting with this now. We are just having a look: okay, what can I do with this system? What options does it give me? How can I use this to potentially make routing decisions? Nobody is actually doing this out there in the wild as far as I know. I talked to a lot of LIRs who participated in this programme, who have some ideas about it, who are interested in it. And a lot of them are just having a look at what benefit it could offer them, you know.
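As a rough check of the quick maths: a /8 contains 2^(24-8) = 65,536 /24s, so, taking the 168 thousand figure at face value:

```latex
\frac{168\,000}{2^{24-8}} = \frac{168\,000}{65\,536} \approx 2.56 \text{ /8 equivalents}
```

which agrees with the "two and a half /8s" quoted.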

So, you have varying degrees of detail. Some LIRs today just have a couple of prefixes, you know, only their IPv6 prefix, for example, and none of their IPv4. Some have ROAs for only their own autonomous system. But maybe they announced the prefixes from another AS as well, but they didn't add any ROAs for them yet, because they thought, I'll get to them later. That's tricky. Because once you start doing this, and once others would consider actually making routing decisions based on this, you actually need to make sure that once you start creating ROAs you do it consistently, you do it for everything. Because if you don't authorise another AS, well, it doesn't have a valid ROA attached to it. What that means is that, well, essentially the announcement is going to be seen as invalid. So if you are going to start, and you actually want to use this, you know, in a production environment, you have to be careful that you do all of it.

So, also the maximum length field that I talked about earlier, some people just want to allow themselves the freedom to deaggregate, so they fill in /24 or /32 everywhere for every ROA they create. Some are really strict so they leave the maximum length field blank, they only want to authorise that particular prefix and nothing else. And some simply misunderstand the purpose. The result of that, if you take the entire ROA repository now, actually hundreds and hundreds of prefixes would be regarded as invalid, because people misunderstand the maximum length feature or they simply decided that that's how they want to implement it. But if you start comparing this data in a repository to real world routing, actually quite a number of prefixes would be regarded as invalid.

These are the different implementations, and this is actually a question for you. We implemented maximum length exactly according to the Internet draft in the IETF, so it's an optional blank field that you can fill in if you want. However, LACNIC has made it a mandatory field. You have to fill in a maximum length; you have no option to leave it blank. APNIC and AfriNIC, the implementation that, you know, Geoff Huston has done, they have a default /32 for IPv4 and a default /128 for IPv6, by default allowing maximum deaggregation. Is that a good idea? What is a sensible default? That is a question that I would like to ask you.

So, what I want to do, after talking to many of you in the community, is actually build a system that checks the ROA repository against the RIS route collectors live. So, once you have created a ROA, you are made aware within the system of the effect of your choices, because for many people this is not immediately apparent. So, I would consider triggering an alert for any more specific announced from an authorised AS, or any more specifics announced from a different AS. That means one of two things: you forgot to create a ROA or somebody is trying to do a hijack. That could be an option. Or, for example, a prefix for which you created a ROA is no longer announced for X amount of time. Because especially in this stage of the game, we are just running now and people maybe add ROAs and add some information, but you may walk away and forget that you have done this, that you have implemented this. So, if you get an alert, like an email or something else, that, you know, you are no longer announcing a particular prefix, that is something that you could be reminded of. You see a lot of route objects in the RIPE database as well. Lots of people add a route object; not very many people update it or do their housekeeping. Because to me, the goal to achieve is to get the data that we have now to be as accurate and as reliable as possible, so it's not a lot of data, but the data that we have is actually kind of good.
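The three alert conditions mentioned here could be sketched roughly as follows. This is a hypothetical illustration, not the system being proposed: the function `roa_alerts`, the message texts and the example prefixes are all assumptions, the input is a single snapshot of (prefix, origin AS) pairs rather than a live RIS feed, and the "X amount of time" window is reduced to "not seen in this snapshot".

```python
import ipaddress


def roa_alerts(roa_prefix, roa_asn, max_length, seen_routes):
    """Compare one ROA with observed announcements and return alert strings.

    seen_routes is an iterable of (prefix, origin_asn) pairs, standing in
    for what route collectors see. All names here are illustrative.
    """
    roa_net = ipaddress.ip_network(roa_prefix)
    limit = roa_net.prefixlen if max_length is None else max_length
    alerts = []
    authorised_seen = False
    for prefix, origin in seen_routes:
        net = ipaddress.ip_network(prefix)
        if net.version != roa_net.version or not net.subnet_of(roa_net):
            continue  # announcement is unrelated to this ROA
        if origin != roa_asn:
            # Either a ROA was forgotten or somebody is attempting a hijack.
            alerts.append(f"{prefix} announced by AS{origin}: not authorised")
        elif net.prefixlen > limit:
            # Authorised AS, but more specific than the maximum length allows.
            alerts.append(f"{prefix} from AS{origin} exceeds maximum length /{limit}")
        else:
            authorised_seen = True
    if not authorised_seen:
        alerts.append(f"{roa_prefix} is not announced by AS{roa_asn}")
    return alerts


seen = [("10.0.0.0/22", 64511), ("10.0.1.0/24", 64511), ("10.0.2.0/24", 64496)]
for alert in roa_alerts("10.0.0.0/22", 64511, None, seen):
    print(alert)
```

A real service would need to track how long a prefix has been missing before alerting, which this sketch leaves out.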

Another thing that we could do is actually suggest ROAs based on what route collectors see. It's something that we could do. That's one of the things that we are working on.

The other thing that the guys in business applications are furiously coding on is implementing the up/down protocol. It should be done real soon: next week-ish, two weeks from now, something like that. This allows you to run your own certificate authority, because within the hosted system, we are actually the holders of your private key and any security expert would say, well, is that a good idea? Maybe not. But for a lot of you, running your own certificate authority means dealing with keys, dealing with rollovers, dealing with potentially hardware security modules, things like that. We have actually taken great care in providing a solid infrastructure, so many people will want to choose to rely on the RIPE NCC's infrastructure, if you wish to do so. But it's not really fair to say, okay, I am not going to offer you an alternative, right? If you would want to run your own CA right now, you can't. You simply can't. But that will change, like I said, in a week or two.

Also this is a requirement for ARIN to launch, because their Advisory Council said well, if you don't have that as an alternative, we don't really think it's a good idea to launch. What we also want to do is release client software. There are different implementations, different open source implementations and we want to release a client of our own. This is open source BSD licence, so you can install the software, it securely interfaces with the RIPE NCC, so you can run your own certificate authority, maybe script against it, do all kinds of things. So, we are running a pilot of that and we are trying to get in touch with a couple of you to see what it would be like to run your own certificate authority and what features you would need in order for it to be useful for your business.

So if you would want to participate, please let us know. So maybe I can set up something.

Also, validation. The RIPE NCC validation tool is really, really basic. It's Java, it's command line. You can point it at a repository, it can give you a little bit of text output, it can give you a comma-separated file. That's really all it does. We would like to expand that with different functionality like background caching, giving it a nice web-based user interface, and some scripting support, so you can script against it using Perl or Python, expose an API and, lastly, implement RPKI router support, so creating the ability for RPKI-enabled routers to actually talk to the validation tool. The way RPKI-RTR works is a router simply asks a question: I see this announcement from this AS, is that valid? And it will ask that question to such a validation service. And the validation service will answer, like, yes, this is valid. No, this is invalid. Or, I don't know, there is no ROA for this; it simply doesn't exist. So those are the three states that it can have, and based on that you can create route maps, set up preferences, set up flags or prefix filters, things like that. Also this we will open source under the BSD licence. So, if you have input on this, if you want to help with this, if you want to participate in this, please let us know.
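The three answers described here (valid, invalid, no ROA) can be illustrated with a small Python sketch. It is not the RIPE NCC validation tool and does not speak the RPKI router protocol; the function `validate_origin`, the tuple layout of a ROA and the example data are assumptions made for illustration only.

```python
import ipaddress


def validate_origin(roas, route_prefix, origin_asn):
    """Classify an announcement as 'valid', 'invalid' or 'unknown'.

    roas is a list of (prefix, asn, max_length) tuples, where max_length
    may be None for a blank field. Hypothetical data model for illustration.
    """
    route = ipaddress.ip_network(route_prefix)
    covered = False
    for prefix, asn, max_length in roas:
        net = ipaddress.ip_network(prefix)
        if net.version != route.version or not route.subnet_of(net):
            continue  # this ROA says nothing about the announced prefix
        covered = True
        limit = net.prefixlen if max_length is None else max_length
        if asn == origin_asn and route.prefixlen <= limit:
            return "valid"
    # Covered by some ROA but matched by none: invalid. No ROA at all: unknown.
    return "invalid" if covered else "unknown"


roas = [("10.0.0.0/22", 64511, 24)]
print(validate_origin(roas, "10.0.1.0/24", 64511))   # authorised AS and length
print(validate_origin(roas, "10.0.1.0/24", 64496))   # covered, wrong origin AS
print(validate_origin(roas, "192.0.2.0/24", 64496))  # no ROA covers this prefix
```

The result is only an input to local policy: what to do with an invalid or unknown route (prefer, depref, filter) is left to the operator, as the talk stresses.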

The kind people from Cisco actually contacted us and they said, well, we are going to release RPKI-capable routers in Q4 of 2011 for IOS, and for IOS XR in Q1 of 2012, and since we are going to do that, it would be nice if the validation tool you have actually works properly with it. So, could you have a look that what you build actually verifies against the code that we have? So we have a couple of Cisco 7200s running to actually make this work. We know that Juniper is working on an implementation as well. So this is something that is going to be, if you want it to be, an end-to-end solution: you can create a certificate, set up ROAs, you can choose to validate against them and base routing decisions on the information that is available to you. So everything drives preferences. Nobody is forcing you to do anything. It's all about routing preferences.

All the information you can find on ripe.net/certification: information, announcements, updates, things like that. So, have a look if you want. And if you have any questions, please let me know.

AUDIENCE SPEAKER: First of all, you asked one question and you can get an answer right from the horse's mouth. The proper default for max length is, in your case, blank. What we want to have as the default, to keep the routing table small, is to have the maximum aggregate and nothing else as the default. And allowing everything down to 128 length in IPv6... Geoff, what kind of grass have you been smoking?

ALEX BAND: Geoff is not getting up. You have a perfectly reasonable explanation for that and I'd love you to share it.

AUDIENCE SPEAKER: Or this is the reference for changing the bit that I think you invented to turn it into max length?

GEOFF HUSTON: No, I didn't invent that bit, but the original model, this is Geoff Huston from APNIC, the original model that we thought of here was one of: by default, the prefix owner is not the route originator; that's the authority that's been handed over. And what we thought by default was appropriate was that the prefix owner gives the maximum permission to the route originator. Do what you need to make this route visible. If the prefix owner wishes to put additional constraints on it, then they have to move away from a default in the UI. That was all it was about. It's a relatively minor thing given that the tool is there either way. All we are arguing about is a default setting, but we thought it was kind of better to allow the route originator the maximum ability to make the route work, as distinct from constraining the route originator. It's a style thing. That's all.

ALEX BAND: The potential problem with the choice that we made is that people may go well I don't really know what this option is, but since it's optional, I am just going to leave it blank and they may not realise that the more specifics that they thought they could announce, are not really working. So, that's the potential problem with the choice that we have made.

I think we can alleviate that with the notification system that I talked about. So, you are actually alerted, like, do you realise that this is what you have done? And then people can change it if they unintentionally did that.

AUDIENCE SPEAKER: The right default is maximum aggregate, and if I want my network provider to do more specifics, it doesn't do harm if I actually give him the explicit allowance.

AUDIENCE SPEAKER: Just on this, it's James Blessings from Limelight. The issue is partially that the minute you start having a router making a decision on a ROA, is that if you don't have a /32, /128 available and you are, say, advertising a blackhole community on that particular route because you want your upstream to blackhole it, they are not going to accept it because there is a ROA that exists but it's invalid for that particular request. So that's actually one case where having the /32, /128 is actually a good choice. Personally I would leave it blank as it is now.

AUDIENCE SPEAKER: Well, okay, let me see. Beyond an answer  well, okay, a comment: I find it confusing that you seem to call a CA implementation that you will be offering to be client software.

ALEX BAND: That's terminology, yeah, you are right.

AUDIENCE SPEAKER: I actually guess that, well, okay, I would probably look into relying party functionality when I am talking about or hearing someone talking about client software.

ALEX BAND: You are absolutely right.

AUDIENCE SPEAKER: And, well, okay, please get the up/down stuff out. If you had something where clients could connect there, that would be nice. High priority on doing another router RPKI implementation isn't really that important.

ALEX BAND: No, but getting up/down done is something business applications is currently working on 24/7 with all the resources we have available for certification. So once that's done, once we have released that, then we are moving on to other stuff, but up/down is all the way at the top of our list and we started on the 3rd January, on a Monday, coding furiously.

AUDIENCE SPEAKER: Sandy Murphy, SPARTA. I'll preface this by saying I am not really certain I know enough about the RIPE web interface to be making a statement, so this is honest to gosh really a curiosity question. The RPKI goal from the beginning was to say that the prefix holder is the one who has the authority to say who is allowed to originate a route to the prefix. I notice that your web interface picture started off with picking what AS number you are, and proceeded from there.

ALEX BAND: No, no, it's not what AS number you are. Which AS number you authorise to announce that particular prefix.

AUDIENCE SPEAKER: Okay. So you start off with signing on with your authorisation for the prefix holder. So the maintainer idea... okay, thank you.

ALEX BAND: You log in with your LIR portal credentials and what you'll see on your certificate are all of the prefixes that are registered to your LIR and that is the only thing 

AUDIENCE SPEAKER: Thank you.

AUDIENCE SPEAKER: Well, okay, comment there and that may be interesting for Sandy, well, okay, actually, what you are allowed to put in as a ROA in this interface is kind of limiting of what can be done in a real ROA like, okay, in the real ROA, there is no real limit to the number of prefixes to list. Here there is a fuzzy limit, it's not quite clear whether it's 4 or 5 or 6 prefixes that you can actually put there.

ALEX BAND: That's the entire picture. The resources that you see on the right, these are the prefixes that belong to the LIR that is logged in. You can see it at the top there; it's sort of for training purposes, the test LIR that we have.

CHAIR: We need to move on, we already are running a little bit late and to be fair to everyone. So thank you very much, Alex, for presenting that.
(Applause)

RANDY BUSH: We have talked a lot about the current state, RPKI and so on; this is the next stage after origin validation. As I said on Monday morning, there is this gang of people from government, academia, industry, vendors, operators, working on the next stage which, let's call it BGPsec, and let's assume we have the RPKI as described. Let's assume ROAs are a given, so that I can say, oh, let this prefix be announced by this AS. Let's assume that we can get it down to a router so the router can make decisions based on the authority we have given it. So, the router can actually get this; you can actually say, hey, show me this prefix. Oops, AS 3130 announced that, that's invalid. AS 4128 announced that, that's valid, and I can use, for instance, this is a Cisco, so I can use route maps, etc., to make decisions based on it. There is a gap. The gap is that that origin in the routing announcement was not cryptographically signed. Anybody could have forged it. Okay, there was no cryptographic assurance of the announcement, and also, by the way, the AS path could have easily been forged, so the origin announcement was correct, but somebody is trying to steal traffic.

So essentially, origin validation is weak because it only stops accidental misconfiguration. This is wonderfully useful. Most of the accidents we have on the Internet today that cause YouTube to relocate to Pakistan and not because of the low salaries there, are accidental announcements. But also, a malicious router could announce and forge and that's a serious problem.

Now, I can't... there have been attempts to say let's take routing policy, in other words, that Elmar is connected to Nigel and that's in the RPSL, and so we know if I see that AS path from Nigel via Elmar, that it's valid. The problem is that policy on the Internet changes every 36 milliseconds, and no mean that as a joke. New customers come up, people switch providers, links go down, etc. So policy changes continually on the Internet. We can't have a database of that. In fact we developed a protocol to distribute the policy changes. It's called BGP. So, I can't know whether Mary wanted to announce the prefix to Bob, but I can test formally that Mary did announce the prefix to Bob. Okay. So the name of the game in BGPsec is to validate that the protocol has not been gamed. It's not about the intent of the people. I just want to know that nobody is violating the protocol, and I am going to waste your time taking you through a simple example, which is path shortening. This is the normal path: B announces her prefix to W, who announces it to X, who announces it to A. The traffic should flow this way, but Z announces to X that Z can get directly to B, a false path. So X sends traffic that way, Z skims off the dollars, although I guess if it's dollars, it's zee, not zed, and the dollars never get over to B, okay. How do we deal with that?

We are going to use something called forward path signing, which is: AS hop N signs cryptographically that it is sending an announcement to the next one, N + 1, by AS number; in other words, it puts the forward AS number in the packet, in the announcement. So, essentially, W, when it announced to X, or pardon me, sorry, when B announced to W, it's signed with B's key that B is sending it to W. So, Z cannot forge that B sent it to him because there is nothing that says B signed forward to Z. So Z can't forge the thing, so this announcement can never be verified, so the dollars, appropriately formed, go to Bush.

So, this is a modification to the BGP protocol; that's what's being proposed. We are talking about something that's going to roll out in three to five years, not today. It's assumed that the consenting routers will use BGP capability exchange to agree to run it between them.

Note, when these signatures, etc., get in there, the BGP updates are going to get bigger; it turns out not too bad. What most of us didn't know is that under the covers the BGP spec says that no announcement can be more than 4K bytes long. That's got to be removed, and in fact you'll see the proposal that's already in the Working Group of the IETF to make it 64K. And if the BGPsec capability is not agreed, in other words I am talking to a router that's an old router, an innocent, then don't pass any of this glorp to it.

This is a new BGP attribute. It's transitive, optional, blah blah blah, and I'll show you pictures of the attribute in a second. There is another attack that I want to prevent, which is called the replay attack. Which is, R0 used to be connected to R1. R1 passed to R3 the path 10, hey, I can get you to R0, talk to me, and R0 switches providers. R1 can play at R3 the old announcement, a replay. Okay. And this is an attack, a traffic attack, a traffic attractor to bring it here. What can you do to prevent that? What you can do is, when the origin R0 signs the packet over to here, that signature, that signed packet, has an expiration. So it signs with a short signature lifetime. By short I mean a matter of days, whatever. The origin reannounces the prefix well within that lifetime. If you look at the proposed protocol specifications, you'll see that it should be like on the order of a third of the lifetime just in case one of them was missed, right, you don't want the prefix to die. This we call beaconing; it's suggested to be days but it could be hours for truly critical infrastructure, and yes, I know we think our infrastructure is critical and nobody else's is, but, you know, people will start applying route flap damping to this one if you abuse it.
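The timing relationship described here, a short signature lifetime with the origin re-announcing at roughly a third of it, can be sketched as below. The function names and the three-day lifetime are assumptions chosen for illustration; the talk only says "a matter of days" and "on the order of a third of the lifetime".

```python
from datetime import datetime, timedelta, timezone


def beacon_interval(lifetime):
    """Re-announce at about a third of the lifetime, so that one or two
    missed beacons do not let the prefix expire."""
    return lifetime / 3


def is_expired(signed_at, lifetime, now):
    """A replayed announcement is rejected once its signature has expired."""
    return now >= signed_at + lifetime


lifetime = timedelta(days=3)  # assumed value for illustration
signed_at = datetime(2011, 5, 4, tzinfo=timezone.utc)

print(beacon_interval(lifetime))
# A replay of the old announcement four days later no longer verifies.
print(is_expired(signed_at, lifetime, signed_at + timedelta(days=4)))
```

The trade-off is visible in the two functions: a shorter lifetime narrows the replay window but raises the beaconing rate, which is the churn the speaker warns about.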

Per router keys. So, if one of my routers in my AS, you know, if the Amsterdam router is compromised, and somebody grabs the private key from it, I don't want my entire network compromised so we'll have keys per router. That implies a more complex certificate and key distribution mechanism. It's really still the RPKI. What's cute is a router could internally generate a key pair, its private and public keys, and send the certificate request up to the RPKI with the public key, and the private key of the router would never have left the router, which is really kind of cool. But, I doubt if people are going to go with this, but it could be.

So, if you want one key per AS, that's fine too, just take a router's key and share it all around your AS.

So, this enhances the key structure. So an AS generates keys for all the routers that encode the AS number and the router ID in some fashion. All that matters is that each router has its own.

So, what's the packet look like? This is the originator. This is a normal announcement. It contains the prefix and the AS number who is originating it. There is a new optional transitive attribute that contains a reference to the router certificate, the forward signature, hey, I am sending this from AS0 to AS1, and this whole thing is hashed, signed with an expiration date by the key of this router, and the signature is added to the end of the packet. The next router gets it, router AS1. Now, notice that this router said it was sending it to AS1, so AS1 better bloody well be the one here. And AS1's cryptographically verifiable router certificate is going to be referenced here, and AS1 signs over its forward data that it's sending it to AS2, and it signs over the signature from AS0. The signing over of the signature is the equivalent of signing over that whole thing, but why bother rehashing, just sign over the signature. So we now have a chain of forward signatures; only the origin beacons, everything else just quietly goes along for the ride.
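The chain of forward signatures can be illustrated with a toy model in Python. Loud caveats: this is not the proposed wire format; the proposal uses public-key signatures tied to router certificates in the RPKI, whereas this sketch substitutes HMAC with a shared key table (`KEYS`, hypothetical) purely because it is available in the standard library; expiration is left out; and the AS names B, W, X, Z are taken from the path-shortening example earlier in the talk.

```python
import hashlib
import hmac

# Hypothetical per-AS secrets standing in for router private keys.
# Real BGPsec would use asymmetric keys and RPKI router certificates.
KEYS = {"B": b"key-B", "W": b"key-W", "X": b"key-X", "Z": b"key-Z"}


def sign_hop(signer, next_as, prefix, prev_sig):
    """signer attests that it is sending the announcement to next_as.

    The origin (no previous signature) signs over the prefix; every later
    hop signs over the previous signature instead of rehashing everything.
    """
    covered = prev_sig if prev_sig else prefix.encode()
    message = covered + b"|" + signer.encode() + b">" + next_as.encode()
    return hmac.new(KEYS[signer], message, hashlib.sha256).digest()


def announce(prefix, path, receiver):
    """Build the signature chain for path (origin first) as sent to receiver."""
    sigs, prev = [], b""
    for signer, next_as in zip(path, path[1:] + [receiver]):
        prev = sign_hop(signer, next_as, prefix, prev)
        sigs.append(prev)
    return sigs


def verify(prefix, path, receiver, sigs):
    """Check that every hop really signed forward to the next hop."""
    if len(sigs) != len(path):
        return False
    prev = b""
    for signer, next_as, sig in zip(path, path[1:] + [receiver], sigs):
        if not hmac.compare_digest(sign_hop(signer, next_as, prefix, prev), sig):
            return False
        prev = sig
    return True


prefix = "10.0.0.0/22"
legit = announce(prefix, ["B", "W"], "X")
print(verify(prefix, ["B", "W"], "X", legit))

# Z tries to shorten the path: it reuses B's signature (made out to W)
# and adds its own hop, claiming the path B -> Z.
forged = [legit[0], sign_hop("Z", "X", prefix, legit[0])]
print(verify(prefix, ["B", "Z"], "X", forged))
```

The forged path fails because B's signature names W as the next hop, and Z has nothing in which B signed forward to Z.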

So, this only occurs at the provider's edge. It doesn't happen internally. It doesn't affect your IGP. You probably carry it in your iBGP, especially if you want to get to route reflectors, because you are going to want it at route reflectors.

So, interprovider peerings might or might not be BGPsec speakers. As I said earlier, if my peer is not a BGPsec speaker, I don't send all the signature glorp, I have to send the whole stuff. What happens is there will be islands of this as it deploys. Totally different topology than origin validation.

What's cute is an end site. It has the border router and it has two upstreams. Joe trusts the upstreams to validate announcements so the upstreams don't send signature glorp to Joe. Joe wants to protect his prefixes, though, so he has a per-router key and he is in the RPKI, and all he has to do is sign the announcement, which requires very little crypto, very little space, etc.; that's all he has to do to protect his prefixes. He receives good stuff, and that means that he can run this on current hardware. Okay. So, no hardware upgrade needed for the 84 percent of ASes which are stubs.

It's meant to be incrementally deployed. It does not require flag day, etc. No increase of operator data exposure. I don't want anybody to know who my customers are, etc. This is one of the things that sadly has hindered Internet routing registry, real deployment and real usability is that for it to be really used in routing, you had to disclose too much.

Confederations: Even though they are logically an eBGP border, you don't want that going outside, so you don't sign across your internal confederation. Route reflectors need to carry the information: don't sign.

All that's signed under the current definition is the prefix, the NLRI, and the AS path. Other attributes like communities, etc., we do not understand how they affect security, and so we are not going to sign what we, you know, don't have a firm grasp of the semantics of. For example, no export: it's just supposed to go to the next hop, which I presume is at least covered with MD5, so why are we signing it? Why would we sign it? The proposals as they exist today in the IETF are utterly unoptimized, for instance, and we want to get the semantics and the security properties straight. So, for instance, if you prepend your AS 100 times, we just didn't go into that. Probably what will happen is there will be a counter and you just sign over the counter. It's assuming that optimisation can be done later as the protocol heads towards real standardization. It uses the RPKI, probably an enhanced protocol where something like the RPKI router protocol is enhanced; it's going to have to give the certificates to the router, which it doesn't today. We assume origin validation so that we don't have to throw the ROA in the packet, where you just have the router signature over the AS.

It's just another BGP decision, just like origin validation. It comes up with a validity state and you put it in your route map. It doesn't drop the prefix automatically or add the prefix automatically. The example I continually like to give is my friend Steve Bellovin, in fact one of the developers of this, a security researcher; he may only want to collect the invalid routes because he is doing security research. So it's all local policy and the vendors of course will give you lots of knobs to play with.

There are consequences. New hardware. This is going to drive up, first of all, memory requirements, because the AS information in your router is bigger, and you are going to have to put all the certificate gloop in there too. The certificate gloop, as things start, is going to be relatively larger. As things deploy more, it will be, percentage-wise, less of the problem. But it's going to be years before it gets to be a problem. This is from NIST, the US National Institute of Science and Technology... Standards and Technology, thank you. There we go.

So anyway, they did a nice thing, they are cooperating on this project and this is what, presuming all sorts of stuff and I have a full presentation on it for those people who want it, bludgeoned to detail. This is what it looks like given a guess on deployment scenarios as far as rib memory increase goes. Not to panic, it starts going up in 10 years, 15 years.

Pardon me, six years. Okay.

No PDU packing. I currently can have multiple prefixes in the same BGP announcement because they all have the identical attributes. That wouldn't work here because they would hit the next router, and that next router would apply policy to those prefixes and split them up differently. Okay. So, packing doesn't happen. We have done some measurements: turning packing off has less than a 50% increase in full transfer on funky little routers. So, shouldn't be a major influence.
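The difference between packed and unpacked updates can be sketched as follows. This is a simplified model, not BGP encoding: attributes are reduced to an AS path tuple, and the function names and example routes are hypothetical.

```python
from collections import defaultdict


def packed_updates(routes):
    """Today's behaviour: prefixes with identical attributes share one update."""
    groups = defaultdict(list)
    for prefix, attrs in routes:
        groups[attrs].append(prefix)
    return [(attrs, prefixes) for attrs, prefixes in groups.items()]


def unpacked_updates(routes):
    """Without packing: one update per prefix, since each prefix would
    carry its own signature chain."""
    return [(attrs, [prefix]) for prefix, attrs in routes]


# Attributes are modelled as just an AS path tuple (illustrative only).
routes = [
    ("10.0.0.0/24", (64496, 64511)),
    ("10.0.1.0/24", (64496, 64511)),
    ("10.0.2.0/24", (64497, 64511)),
]
print(len(packed_updates(routes)), "updates with packing")
print(len(unpacked_updates(routes)), "updates without packing")
```

The message count grows from one per distinct attribute set to one per prefix, which is the overhead behind the measurement quoted in the talk.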

Here is one that bites it. AS-transparent route servers, right, so what they really are is an AS in the path, except they are not seen. So it's actually the correct security property for them not to be able to be there and do what they are doing, when you think of the semantics. But the problem really is, if there is an AS-transparent route server, so I am peering with both Lorenzo and Derek, to whom do I forward sign? Lorenzo or Eric? So...

Proxy aggregation doesn't work. That's okay. It's not used.

To be noted: This does not lock the data plane. In other words, packets aren't forced to follow the BGP path. And in fact, as I reported a couple of years ago here, we did some measurements that show that 70% of the ASes in the default-free zone actually aren't default free. They have default. Measured and shown to have default. So, that's life in the big city. Maybe some years from now we'll be able to lock the data plane to the control plane, but if we haven't got a locked down control plane, what would be the fun of that? So right now, we are still trying to lock down the control plane.

For those worried about black helicopters, this work is sponsored in part by the Department of Homeland Security. The people who take your scissors away, and we are working on turning those scissors into ploughshares.

And that's it.
Questions?

CHAIR: We are a little bit over time but if there are questions, I think we should take them. I mean it's not like it's the first time with the Routing Working Group. No questions? Okay.

In that case and just, does anyone have any things that they would like to mention? If not 

RANDY BUSH: Yes, I do. I believe that you saw our presentation in Rome on route flap damping and how in fact, with a significant change of parameters, it can actually be useful. So, I suggest that Christian and Philip, etc., etc., have another bash at the route flap damping document, and I would advocate that we change the number, but maybe we should do it in PDF instead of text, and XML, but anyway, to update the route flap damping recommendation to say: you could use it if you set the parameters this way.

CHAIR: Okay. I'll make sure to reconvene the authors and put that in. Thank you for that presentation and thanks for this one as well. Thank you everyone for coming. Keep watching the mailing list which is how we work in between meetings and other than that, we'll see you in Vienna in October. Thank you very much.
(Applause)