Archives

These are unedited transcripts and may contain errors.


DNS Working Group session on 5 May, 2011, at 11 a.m.:

CHAIR: Good morning. This is the DNS Working Group session, so if you want to hear about something else, go to another place. Before we start, I just want to say the usual warnings: note that this session is recorded and stenographed and all that jazz, so don't say things you don't want them to know. The other thing is: please, when you go to the mic, state your name for the people on... what is that? Whatever, on that, so everyone can hear who is saying what. And my name is Jaap, so I should follow the protocol myself; I am one of the three co-chairs. The other one you had a chance to see yesterday, Peter Koch, and Jim Reid is making himself useful by closing the doors and keeping check of the time.

I am actually inviting Wouter to the stage.

WOUTER WIJNGAARDS: So, I am here to talk about plans for NSD 4, which is a new version of NSD, an authoritative DNS server. I am gladly loaning slides from Jaap. So what am I going to talk about? I will start with the general characteristics that NSD is showing, after which I will show the version progress that NSD has had and the various things that it does, because there are still people using our previous versions, and that sort of explains what is going on in the long-term view.

Then I will explain the plans, the preliminary plans, that we have for building the next version, and some initial speed results at the end of that.

So, what is NSD about? It's about doing one thing and doing it well, and the idea is to have an authoritative DNS server. It was geared towards being used at root servers, K-root by RIPE initially, and TLDs, and not doing other stuff like recursive service, caching, and a lot of other things; it's not doing any of those. The idea is to have no features that aren't germane to the task at hand, no other features that could be there, and there is significant pushback on having new features in this software. It is simple and, as a result of that, has a small code base that is easily auditable, and hopefully high speed at the same time as well. I see somebody standing at the mic. Oh, right.

NSD is supposed to have just enough documentation, because we assume that our users are technically competent and know what they are doing, because they are running stuff like root servers and TLDs, so they do know what they are doing because they want to keep their stuff running. So there is no need for extremely user-friendly stuff, no wizards with next buttons and stuff like that, right?

One of the goals as well is software independence: to provide a measure of diversity in the landscape of DNS servers as they are serving zones, so that a flaw in one of the implementations would not, you know, affect the service of the DNS, because there is another implementation, without saying which one could have trouble, of course. Towards this goal, we are trying to have software independence and just having, you know, a bigger diversity, so enlarging the ecosystem. And because of that, it has to be resilient against high loads, because when times are troubling loads are going up, and it has to keep operating and keep the users experiencing their DNS zones; usually they don't experience anything about DNS, but they have some experience that the DNS is facilitating.

It all started with 1.0; it was just that, a server that didn't really know what it was doing. Well, it was doing something useful, but it was ignorant about the DNS protocol, serving precompiled answers stored in memory; imagine something like a set of bytes that is returned when another set comes in, and not really much more to it, which is, of course, small and fast. The user interface is spartan, meaning no configuration whatsoever, because it just serves the data. And obviously implementing the basic DNS, not DNSSEC: DNS RFCs 1034 and 1035 and the updates on those. No zone transfer stuff or that sort of thing going on. And because this is IPv6 week, I should note to you that version 1.2 in 2003 supported IPv6, so this is our IPv6 deployment plan, and it happened in 2003.

Obviously, DNSSEC came along, which was more difficult to support than the IPv6, although IPv6 took some work. DNSSEC took a change in the internal database structure, and this was needed because DNSSEC has these NSEC records, and if you know about NSEC records: for a non-existent domain you give an NSEC record, something with a signature on it, so the receiving end knows that it didn't really exist, and for this you need stuff like knowing which is the closest encloser... I see everybody falling asleep at this term. You need to know much more, and so NSD 2.0 became much more knowledgeable about DNS in this way, and the internal database structures changed as well, with more knowledge about it in there, while still being speedy. It implemented what became RFCs 4033, 4034 and 4035; well, they didn't exist at this time, right. It was, you know, slightly less precompilation, because obviously it needed to fetch the right NSECs to include and that sort of thing. It gained an AXFR module, so it could serve full zone transfers as well, and the ability to do the TSIG that we all know, and it had to have a configuration file. That is maybe a bit of a big term for a file with a list of zones in it: very, very simple, as was needed to do the task at hand, and people have been using that and some of them still are. It does the job.

After that, there was simply a desire for support of more of the DNS RFCs that were coming out, like IXFR and using NOTIFY; that was NSD 3, and this, again, you know, increased complexity in NSD, as it gained support to keep track of the timers in the SOA record, to accept and send notifies, and to accept IXFRs to update the zone, so it could much more rapidly update zones; as zones are growing bigger across the world, doing partial transfers is much more efficient. And support for full DNSSEC, which I mean here to include NSEC3 support, which needs even more knowledge about DNS: knowing how to hash things into NSEC3 hashes, and finding them, and doing the NSEC3 stuff, which is generally there. Again increasing, as well, is more DNS metadata support; some of that was already supported a while before, but now supporting things like SHA in the TSIG, and the DNAME record, which is another thing where you have to understand what it means to be able to serve the customer with it.
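To make that NSEC3 hashing a bit more concrete, here is a minimal sketch of the hashed-owner-name computation from RFC 5155. This is not NSD's code, just the bare algorithm; the salt and iteration count are made-up example values, and the base32hex encoding needs Python 3.10 or later.

```python
import base64
import hashlib

def name_to_wire(name: str) -> bytes:
    """Canonical (lowercase) wire format of a domain name."""
    wire = b""
    for label in name.rstrip(".").split("."):
        lbl = label.lower().encode("ascii")
        wire += bytes([len(lbl)]) + lbl
    return wire + b"\x00"  # the root label terminates the name

def nsec3_hash(name: str, salt: bytes, iterations: int) -> str:
    """Hashed owner name per RFC 5155: iterated, salted SHA-1."""
    digest = name_to_wire(name)
    for _ in range(iterations + 1):  # one initial hash plus 'iterations' re-hashes
        digest = hashlib.sha1(digest + salt).digest()
    return base64.b32hexencode(digest).decode("ascii").lower()

# Example values; a real zone publishes its salt and iterations in NSEC3PARAM.
print(nsec3_hash("example.nl", salt=bytes.fromhex("aabb"), iterations=10))
```

A server has to locate the NSEC3 record covering the hash of every name it denies, which is also why NSD 4 wants to keep these hashes precomputed, as comes up a little later in the talk.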

So this is providing more support for the DNS, let's say for the variations that exist in the DNS for various purposes, but complicating the code base ever so slightly more; still, it's doing the job at hand with the minimum stuff that you need for that.

And so we go for the new NSD 4, which is so good it has a vapourware logo, and we have plans that span the universe. We have requests: some people want to run lots of zones. TLDs usually have this one really big zone, and although NSD doesn't have a problem with lots of data, lots of zones was something that was not part of NSD 3, so we intend to support this, and to be able to support this we will need a zone configuration template thing. We intend to speed up the server a bit by changing the internal database again to accommodate stuff; we have actually already implemented that, and that is the speed test you will hear about later.

And we will try to have even more preprocessing, going back a little towards NSD 1, by storing NSEC3 hashes that are already computed, so they don't need to be computed again after a zone transfer. So this is again going to grow the internal complexity of NSD, which is something that we are trying to avoid: we don't want to grow the internal complexity, but we want to support more features.

And this complexity is moving. NSD 4 is going to have the same, say, user-facing behaviour (well, if you are running DNS servers, the resolvers querying you are your users); the user-facing side is staying the same, answering queries. But the compiling system behind the part that answers queries, that is going to get all the complexity in this case, to handle more zones and all the dynamic behaviour that we want to have: for example, adding new zones and removing zones, so changing the configuration without restarting the server, which is an often-heard point, that people didn't want to stop and start to reread the config file. And to be able to reload just one zone, so you can load one zone file and not recompile all the other zones at the same time; so more dynamic changes, so that changes to DNS information are more easily picked up in a dynamic way. And to be able to have some sort of control: the idea is to have an SSL channel to send these commands, which will help the user by hiding the complexity, even though that is not exactly our goal, because the user is smart, but that will just be there to command it anyway, so we will do that.
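As a sketch of what talking to such an SSL control channel could look like from a client, here is a minimal example in Python. The command name, port and certificate file names are placeholders, not NSD 4's actual interface, which was still being designed at this point.

```python
import socket
import ssl

CONTROL_HOST, CONTROL_PORT = "127.0.0.1", 8952  # placeholder endpoint

# Both sides authenticate with certificates rather than passwords.
ctx = ssl.create_default_context(cafile="nsd_server.pem")
ctx.load_cert_chain(certfile="nsd_control.pem", keyfile="nsd_control.key")
ctx.check_hostname = False  # a local control socket, authenticated by cert alone

with socket.create_connection((CONTROL_HOST, CONTROL_PORT)) as raw_sock:
    with ctx.wrap_socket(raw_sock) as tls:
        # Hypothetical command: add a zone without restarting the server.
        tls.sendall(b"addzone example.nl standard-pattern\n")
        print(tls.recv(4096).decode())
```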

While we are at it, we might improve the TCP support, as DNSSEC is proving to cause some TCP usage; we want to improve the TCP support so it's more resilient against high load on that front. The idea is that although we are going to make this a new version, it should not hamper the original target audience, so they can continue to use NSD for their purposes, and because these people are generally very cautious in maintaining their important servers, NSD 3 might need some longer support while we continue to create NSD 4 and roll that out. The idea is to have this non-vapourware somewhere at the end of 2011, obviously spurred on by the name server from the .cz people that is coming out; but that is simply the timing that we have. If you have any wishes about what would be useful in a DNS server for you, just walk up to me; I will be here the rest of the week as well, and I will gladly hear what it is you want, as long as it's not dynamic update.

So, this is what the architecture looks like. I will go over this really quickly; it's a bit too detailed. NSD 3 had this as well, but it has changed. On the left-hand side you can see it's forking: if you know NSD, it operates with a big database in memory on the internal side and forks different processes that don't communicate, called the children here, that talk to port 53 and make your customers happy, and it has this big database that is using copy-on-write; if you know the forking behaviour, this is what it's using. What has changed is everything on the right-hand side of this picture, where there is still this process doing zone transfers, but it's talking a lot more and doing a lot more tasks, as is this reload process, and they basically manage the list of zones and the configuration file, which they reread because you are changing your configuration, adding and removing zones. So this is the part that is getting the complexity, even though this picture looks less complex than the NSD 3 picture, if you remember that, which contained lots of additional programmes which are no longer needed.

So the implementation of this is going to be set in milestones, where every milestone is a working server which you can access from a repository which is available, and you could use this right now if you wanted to and check out a working server with some of the features implemented, obviously, as they slowly start to get there. With every milestone we are planning a test phase, running the regression tests we have built up for NSD in the past, as well as doing portability tests. We try to run portability tests on the hardware that we can get our hands on, which means we have Linux in some versions, Fedora and that sort of thing, as well as NetBSD and OpenBSD; I tried to test it on MINIX but it doesn't have enough system calls. So, versions of various kinds and various hardware platforms, including Intel; we have a fairly broad portability suite that we are supporting. At the end of the project we are planning a test phase before moving it into production, and a beta test where you could use the new software and try it out; if you want to do that, contact me, because you could also use one of the intermediate milestones to test it while it's being worked on.

So obviously, I am already working on the project, and this is the result of some speed tests, which I will try to explain. This is a sort of summary; it's not really that the speed test is meant to test the end result. It's just that one of the first steps was to change this internal database structure to change the speed, and I had to measure it. NSD 3 and 4 I want to compare. There is BIND, one of the latest releases, to provide a comparison, but you probably know it already, and an echo daemon on the right, which doesn't do anything; it just sends back a reply, so you know what the system is capable of. The metric is QPS at a 95% response rate. The blue one is something like a root with 500 delegations, a small zone; L1 is like a TLD with a million delegations; and L2 is hosting 100,000 zones with records in them, so it's like a level-2 zone in the DNS tree. The idea is that this is the speed, the number of queries per second, at which you still get 95% of the queries answered. This is obviously not what you want in real life, you want 99%, but this is a speed comparison; the real number will just be a bit lower than this, the performance slightly more conservative. What you can see is that NSD is faster at the root than at TLDs, because the database is smaller: better caching in the processor cache and that sort of stuff. And NSD 4 is improving by about 20%, but the new tree structure is good for the records you have in a TLD, giving about 40 or 50% improvement. And once it's in the internal tree, NSD doesn't really care how many zones are there; the problems were in the zone compiler system. So you can see the performance was pretty good for hosting 100,000 zones, even though the start-up was not so nice; this is what the new memory structures in NSD 4 are showing us as well. There is a very small difference, though, between one zone and many, so it's probably something to do with L2 caches or that sort of thing in the system. This is using just one CPU on our standard test lab. And there are the results of BIND as well, to compare. NSD 4 is going to be there, and faster; you have already got these results even though NSD 4 isn't finished, which is obviously a nice thing, and we have a logo as well.
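How to read that "QPS at 95%" figure, as a toy calculation; the numbers below are invented, only the definition of the metric comes from the talk.

```python
# Sweep the offered load upwards, measure the fraction of queries answered
# at each step, and report the highest load still answering at least 95%.
samples = [  # (offered queries per second, fraction answered) -- made up
    (20_000, 1.000),
    (40_000, 0.999),
    (60_000, 0.990),
    (80_000, 0.955),
    (100_000, 0.810),
]

qps_at_95 = max(qps for qps, answered in samples if answered >= 0.95)
print(qps_at_95)  # 80000
```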

So, I will ask for questions now; I would like to hear what you have to say.

CHAIR: Do we see any questions? Having this name on the slide, it's bad for me to ask questions.

PETER KOCH: One thing I like about NSD is that it is not yet suffering from this creeping featuritis, and you know where to deliver the compliments and all that. So, is there a list of explicit non-goals for NSD 4, things that will never, ever make it into that release? I am thinking of, for example, the dangerous idea of using the name server for signing; can we have a guarantee that that will never happen? Or what is the price tag for that?

WOUTER WIJNGAARDS: There has been some... let me put it like this: NSD is not a signer, so it's not going to do that. Because it's not a signer, it's not particularly useful to have dynamic update; even though, with all this dynamicity that we are supporting, dynamic update wouldn't be that hard to do, it just doesn't make sense to do it, because you need a signer with that and it's not a signer. At the same time, people want to have signers because they want to host zones that are signed, so there is going to be some way for signers to feed in data, probably by IXFR, which is a nice standard protocol for sending data, and I think that would be the way to go.

OLAF KOLKMAN: As an addition to this; my name is Olaf, director of NLnet Labs. What we are trying to do with the software, specifically with Unbound and with NSD but also with OpenDNSSEC, in which we are participating, is targeting software to specific functions. Signing and dynamic update are not authoritative name server core functionality, so to speak; they are a provisioning function. So in the case that you want to do dynamic provisioning, the way I would envision that is to have a system, an OpenDNSSEC-based system, and OpenDNSSEC will get there at some point, that will accept dynamic updates and then serve those out to the NSD server.

We are trying to keep NSD very simple. On the other hand, we would like to see NSD used, and that is a very tough call, because there are environments out there that do not use NSD because it's too hard to use. If you have 50,000 zones and you have to restart and recompile your data set every time you add a new zone, that doesn't work at some point. That is the kind of environment that we are targeting with NSD 4: make sure you can add zones easily and support the dynamicity that is out there in the marketplace today. But we are trying to do that in the most minimalistic way, to keep the functionality bloat as minimal as possible.

CHAIR: Before we get into philosophical ideas...

AUDIENCE SPEAKER: Sorry for making this a marketing talk. That is not the intention.

SHANE KERR: This is Shane Kerr, ISC. I had a quick question: I seem to remember someone submitted some patches for NSD 3 to support SQL. Is that correct? Did I see that on the mailing list?

WOUTER WIJNGAARDS: I don't recall that.

SHANE KERR: Maybe I am hallucinating. So it's not in the plan, then.

SHANE KERR: My question is about external contributions: do you plan on having any kind of plugins or hooks or anything like that, or is this NSD and this is what it does?

WOUTER WIJNGAARDS: There used to be, in NSD 2.0, but it got removed because nobody used it; it was obviously a feature that was not needed, so it was removed, and it doesn't seem to be on the cards. And in the spirit of this week: yes, we plan to support IPv6 in NSD 4. Thanks.

CHAIR: Thank you, Wouter.

(Applause)

Next we go to Dave Knight, and he is going to tell you all you want to know about ip6.arpa and in-addr.arpa.

DAVE KNIGHT: Good morning, my name is Dave Knight, I work in the DNS group at ICANN, and I am going to talk this morning about changes that have happened recently in the ip6.arpa and in-addr.arpa zones. There are three main changes which have happened over the past year and a bit: there is a new IANA zone management system where changes to the zones are made; the authority servers for both zones have changed to a new set of servers operated by the RIRs and ICANN, which are described in RFC 5855; and both zones have been signed in the last year.

The state just over a year ago: for ip6.arpa, the zone was authored at ICANN, I think as a hand-edited file, with authority service provided by a subset of the RIRs and ICANN. in-addr.arpa was authored at ARIN, and the authority service for that zone was some of the root servers, I think all except J. The current status: zone management has moved into the IANA using a new system which has been developed over the last couple of years. This allows the RIRs to make updates directly for their delegations, and they do that by sending XML forms to a RESTful HTTPS service; they are issued certificates by ICANN which allow them to change their delegations.
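The shape of such an update, as a hedged sketch: the client authenticates with its ICANN-issued certificate and posts an XML form over HTTPS. The endpoint URL, the payload schema and the file names here are invented for illustration; the real IANA interface is not described in this transcript.

```python
import requests

# Hypothetical delegation-update form; the real XML schema is an assumption.
delegation_xml = """<?xml version="1.0"?>
<request>
  <zone>16.172.in-addr.arpa</zone>
  <nserver>ns.example.net</nserver>
</request>"""

resp = requests.post(
    "https://zone-mgmt.example.net/delegations",   # placeholder endpoint
    data=delegation_xml,
    headers={"Content-Type": "application/xml"},
    cert=("rir-client.pem", "rir-client.key"),     # ICANN-issued client cert
)
resp.raise_for_status()
print(resp.status_code)
```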

A new set of authority servers has been created; for each of the zones there is a separate set of servers, operated by all of the RIRs and ICANN, and they are named A through F in the newly created ip6-servers.arpa and in-addr-servers.arpa zones. All four of these zones, so ip6.arpa and in-addr.arpa and their respective server zones, have been signed during the past year. This is done by our signing infrastructure at ICANN; they are all signed with NSEC, a 2048-bit KSK and a 1024-bit ZSK, and the KSK gets an annual rollover to keep our procedures current and to test our procedures.

The transition over the past year went as follows: the zone management system started development in early 2009. That was designed and implemented at ICANN with the cooperation of the RIRs throughout. The system was ready early last year, testing went on throughout the year, and I think the first RIRs began using it in March of this year.

ip6.arpa was already being authored at ICANN, so the transition to the new system was relatively straightforward compared to the changes for in-addr.arpa. We started signing the zone prior to it being moved into the new management system, in April of last year; in September the DS RRset was added to the ARPA zone, and at the end of last year it was redelegated to the new authority servers. in-addr.arpa's transition was a bit more involved because it had to move from ARIN to ICANN, and throughout that transition a blog was maintained at ICANN, the in-addr.arpa transition blog on icann.org, as each of the changes was happening. That started in January. In the first instance, ICANN began pulling the zone from ARIN and redistributing it to the new servers; then during February that was switched around, so the authoritative version of the zone was being created at ICANN, and at that time we signed the zone and redelegated it to the new servers, and then in March the DS RRset was pushed into the ARPA zone. At ICANN we operate the B servers for both zones. That is dedicated hardware for each of those, located in our Los Angeles and Virginia facilities, anycast between those locations, and this is the query load we see now. I think this graph is from yesterday; for ip6.arpa we are seeing around 600 queries per second, and for in-addr.arpa around 4,000.

Any questions?

CHAIR: There is a question.

JIM REID: Some random guy off the street. Dave, I am a bit surprised it's just the reverse IP zones that are moving. What about shifting the whole of .arpa off the root servers?

DAVE KNIGHT: I have no opinion on that.

CHAIR: Is there a discussion about that?

DAVE KNIGHT: Not that I am aware of. I guess that is something for the IAB to talk about.

CHAIR: OK. Do I see any more questions? No. In that case, thank you very much.

(Applause)

The next will be...

JOHN DICKINSON: I am going to talk today about a name server control protocol. This is who we are. I think everyone in this room would agree that DNS needs high availability, and good practice suggests that you should use more than one name server implementation, which is why NSD was talked about. How many people in the room actually do this, or are prepared to say? So it's probably under half, and I suspect one of the reasons is the complexity of running two different name servers at the same time, because all of the tools are different for each name server, and I think most ISPs are using some kind of proprietary solution; I know of at least one that is using CFengine, somebody was talking about... there is a lot of that out there, tying things together.

So three years ago, the IETF decided in a Working Group that there was a clear need for a common DNS name server management and control system, and they wrote this requirements draft, which, I think, Peter just turned into an RFC yesterday.

PETER KOCH: The night before.

JOHN DICKINSON: Three years to get the draft done. At the same time as the requirements draft came out, myself and some colleagues wrote an NSCP draft, a name server control protocol that would meet all the requirements in the requirements draft, and that can be found at this URL. NSCP is intended to be a single cross-platform, cross-implementation control protocol for name servers, so that you would be able to manage NSD with it, and BIND, whatever name server you wanted.

The '00 draft came out in 2008, the '02 draft in March 2011; just an acknowledgment of the other authors there, including Morris. The '00 draft covered the data model as well as its transport layer and modelling language, but in order to concentrate on the data model we removed the transport layer and modelling language from the latest draft. We do intend to re-add them once we have some more feedback on the data model, once we have a firmer data model; it's just to simplify the discussion on the list, of which there has been very little.

The '02 data model is targeted at a minimal data model for a DNSSEC-enabled authoritative server. It doesn't have every BIND feature in it; it's not the BIND reference manual printed up in RFC format. We haven't looked at or thought about resolvers yet, but we intend to, and we also intend it to be able to gather statistics as well.

Thinking about deploying such a system: well, it's likely to rely on agents running on name servers, to sort of wrap the name server, but we do hope to see NSCP built into the name server itself; we are talking to ISC about BIND 10, and I am going to talk at lunchtime about NSD 4.

This is the data model as it stands. You obviously have the concept of a server, and under that are views, which I always consider as virtual servers; others... there seems to be some debate, it depends who you ask as to exactly what a view is. And there are zones, and peers, which are any other system that the name server will talk to: a master or secondary, a resolver, a stub, whatever. So peers are referenced by name, by access control entries in the ACLs, and also referenced by the zones to say who to send notifies to or IXFR from. And the DNSSEC policy for the zone is taken straight from OpenDNSSEC, so it's basically the KASP part of OpenDNSSEC, and we plan to build mappings from that data model to NSD, to BIND and to PowerDNS.
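Rendered very loosely as code, the hierarchy described here looks something like the sketch below. The field names are assumptions made for the illustration; the authoritative definition is the '02 draft itself.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Peer:
    """Any other system the name server talks to: master, secondary, stub..."""
    name: str
    address: str
    tsig_key: Optional[str] = None  # referenced from ACL entries

@dataclass
class Zone:
    name: str
    masters: List[str] = field(default_factory=list)  # peer names to IXFR from
    notify: List[str] = field(default_factory=list)   # peer names to notify
    dnssec_policy: Optional[str] = None  # a KASP policy name, as in OpenDNSSEC

@dataclass
class View:
    """Roughly a 'virtual server'; the debated concept mentioned above."""
    name: str
    zones: List[Zone] = field(default_factory=list)

@dataclass
class Server:
    views: List[View] = field(default_factory=list)
    peers: List[Peer] = field(default_factory=list)
```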

We do not intend, although you may stand up at the end and tell me I am wrong, to include any support for zone content management. It's my opinion that zones are a version control or database issue, and that NSCP should send the name server a URL telling it where it can check a zone out from, if there's a different version, for example; or it should be able to just configure the SOA to allow dynamic updates to start on a master. But that should be the limit of the RR data going across the control protocol. I know a lot of people want the control protocol to do zone content management, so we will have to argue that out on the list.

So, we are very keen to receive some feedback on the data model. Does it provide the minimum you need to configure an authoritative server for use in your organisations? And if you have comments on it, could you send them to the IETF DNSOP Working Group list, or you can send them directly to me if you want to.

The '00 draft suggested using NETCONF as the control channel, as well as the transport and manipulation layer for the data model, and that was written in a formal modelling language, YANG. It's readable; messages are in XML and contained within a very simple RPC layer, with get- and set-type RPC calls. It's extensible; it has this concept of capabilities, so when you connect to a NETCONF server, it says hello and tells you all the things it knows about. The base data model would be one of those capabilities; extensions for some of the extra features in BIND would be another capability, for example.
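A small sketch of that capability exchange, using ncclient, a Python NETCONF client. The host name and credentials are placeholders, and a name server actually speaking the NSCP data model is the assumption here.

```python
from ncclient import manager

with manager.connect(host="ns1.example.net", port=830,
                     username="nsadmin", hostkey_verify=False) as m:
    # The server lists everything it supports in its <hello> capabilities.
    for capability in m.server_capabilities:
        print(capability)

    # Fetch the running configuration, which would carry the NSCP data model.
    running = m.get_config(source="running")
    print(running)
```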

So, we will be interested to hear if anyone has any comments on whether NETCONF is suitable or not. You can respond on the list or directly to me. The draft as it stands is concerned with a single name server, but in reality you are likely to want to manage groups of name servers: for example, you might want to manage all the servers in your anycast cloud, or all your secondaries, or all the servers that serve example.com. What we really need is some feedback on potential use cases from operators and name server implementers, and we really need to understand the requirements and issues that affect you when you are using it in your organisation. I come from a ccTLD background, so I can cover that, but unless some ISPs come forward with use cases they are going to get a ccTLD-designed control protocol, which I am sure won't necessarily be suitable.

I have asked OARC if they would be prepared to host a database of these use cases, and Wayne sent a message to the OARC list last night or this morning asking if members felt that was appropriate, and I am waiting to hear back about that. Again, if you have use cases and want to share them, you can send them to me and tell me how private or confidential they are or are not.

That is the end of the draft, and again, comments to the list or comments to me. Any questions? There must be one question, surely.

CHAIR: We always have Jim Reid for questions.

JIM REID: John, great stuff. I wonder, though, if you have had much input from registrars about their requirements for a control protocol, because I would imagine they have a specific set of requirements that certainly don't fit into the conventional ISP model, which might be more consistent with resolver capability, pretty much dealing with one large zone. What about the case of somebody with 100,000...

JOHN DICKINSON: There haven't been any comments about the draft yet of any technical nature. When you talk to people they always say they want this system, they want a control protocol for the name servers they have, but getting comments has been proving difficult, which is why I am going around the presentation circuit giving this presentation. Is that someone at the back wanting to ask a question, or is he just standing up and having a walk?

AUDIENCE SPEAKER: Erik Kline, Google. Forgive me for not having read any of the specs, but a general question: is there a notion... suppose I have a large farm of name servers, a very large geographically diverse farm, and I want to push complex zone updates; is there some way to stage things so that I can make sure everything is transferred successfully and verified, and then very quickly send around an "OK, make this the live version" sort of thing?

JOHN DICKINSON: There is nothing in the draft about how it's implemented. We do have a plan for how we'd implement it, and if you have a need for that, then we can certainly do it, and I think NETCONF is extensible enough that it would be possible to address that issue, yeah.

CHAIR: I do remember from the NETCONF description that it was indeed very suitable for having such a model, for staging stuff and doing a final commit.

JOHN DICKINSON: Yes, NETCONF, if you have used Junos, will seem very familiar; it has commit and rollback and that kind of thing available in it. Or rather, commit with rollback.

CHAIR: Any more questions? Thanks a lot.

CHAIR: Sander.



SANDER DEGEN: I am from TNO, a Dutch research institute, and half a year ago we did some research into how DNS clients behave; the results of that study I'd like to present here. I have to say it was sponsored by SIDN. I have 30 slides and 15 minutes, so I will try to hurry up.

DNS traffic: if you look at the picture, you see you have clients, resolvers and authoritative servers, and there is a lot of traffic going on; there are caches that affect how the traffic goes, and if you look at DNSSEC, it's going to make a change, and we were curious what that change would be.

Two characteristics of DNSSEC: there is more data, of course; you have the signatures in the packets, so the packets will be larger. That can cause fragmented packets, which can cause problems, either because there is a router in between that blocks them or filters them out, or because it generates more TCP traffic where that becomes necessary. And on the other hand, you can have validation problems, which will cause messages to be retransmitted by the validating resolver, and that will cause retries.

So, our key question was: how will DNSSEC affect client querying? If you are going to make a test environment, then of course you will look at ServFail messages and how a client handles those; at no replies, because if the packet was blocked, what will the response be; and at large packets, because a packet over 512 bytes is larger than normal. And we were thinking: if you are going to build a test setup anyway, why not also check out an NXDOMAIN, a refused, normal responses and partial responses, that is, an answer without an answer section, and how those are treated.
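A sketch of the kind of probe involved, using the dnspython library: send a query with the DO bit set and look at the rcode, the TC bit and the response size. The query name matches the talk's example; the resolver address is a placeholder.

```python
import dns.flags
import dns.message
import dns.query
import dns.rcode

query = dns.message.make_query("servfail.dnslab.nl", "A", want_dnssec=True)
response = dns.query.udp(query, "192.0.2.53", timeout=2)  # placeholder resolver

print("rcode:    ", dns.rcode.to_text(response.rcode()))
print("truncated:", bool(response.flags & dns.flags.TC))
print("size:     ", len(response.to_wire()), "octets")
```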

So the experiment: this is the setup. We had a test setup with a client, a monitoring station, the resolver, and the controlled DNS server, all based on VirtualBox, testing with different operating systems, Windows, Ubuntu and Mac OS, and a lot of different browsers; for us the browsers are the most common clients. We were using ldns-testns for the controlled DNS server, which was really, really helpful. For the installation we tried to keep everything as much as possible on defaults.

How we did the test: we had a page with links on a website, and we clicked each link, and the link caused a DNS lookup; if I say link, you should think of servfail.dnslab.nl, and that was served by the ldns-testns server. We also did a test to check the operating system by doing a ping command from the command line, and we were mostly curious about the number of requests that were sent and the delays between the requests. If you look at the results: there are going to be a lot of slides. On the left side there are the browsers, and then the operating system and resolver. We didn't do much with the other side; we pretended that the resolver had all the answers, and we were mostly curious how the operating system behaved towards the resolver. This was the result for most, well, for all of the browsers that we tested; they all gave the same result, both on Windows 7 and XP, and this is the case for a valid lookup. It's nothing out of the ordinary: the browser does a request to the operating system, "I want an IP address for this domain name", Windows sends it to the resolver, the resolver gives an answer, and the browser passes it on. A truncated message causes a TCP connection, in blue; nothing weird: request, resolver returns a truncated message, and a TCP connection is started. In the case of no response, you see that the operating system does a two-second delay, and in total it will send four queries to the resolver; that is OK. And ServFails, partial answers, refused and NXDOMAIN answers are all treated as one packet, and nothing strange.

We tested with Safari on Mac OS. A valid lookup: nothing strange. Truncated: as usual. One extra lookup by the operating system. Server failures, no response, or refused: there are three extra retries and seven-and-a-half-second delays.

Then the last one, Firefox on Ubuntu 10.04. The usual, the usual; NXDOMAIN, there are two retries, the same as on Mac OS, except there are another two: if you test it from the ping command you will see only two, and if you test it from the browser you will see two extra retries, so our assumption is that the browser is causing another DNS lookup. There are no delays in the retries here. No response: five-second delays, just slightly different from Mac OS, and again the double amount through the browser interface. And for server failures and refused, there are four retries and an additional four retries, so eight total, without any delays. That is the story for one single primary name server and only IPv4; if you add IPv6 as well, it doubles the packets. If you add a secondary name server, it doubles the packets again. Furthermore, there is no caching at all by default, and ServFails are never cached, so the combination, Ubuntu and Firefox, makes me a little bit sad, and this can become a common issue with DNSSEC, because you tend to get more ServFail replies.

This is a screen showing, you probably can't see it, a tcpdump of the ServFail spam, and you see that it's 16 queries in 0.14 seconds, so it's faster than you'd expect.

But that was all tested in the lab, so you are never sure if it's also going to present itself in real life, and for this we had data, DNS data, over one million packets, provided by a Dutch ISP, and we had a graduate student go through it all, and he can confirm that the ServFail spam does appear in practice as well. It seems to be from Linux clients, because of the TTL, but we can't really be sure; it's just an indication. We also saw that the way the resolvers handle, for example, ServFail messages is also interesting, because BIND will retry every query that gets a ServFail from an authoritative server, so if you get, well, 16 or 32 ServFail requests from clients, BIND will double that. For Unbound it's quite different: if it gets one ServFail answer from an authoritative server, it will retry four times, so five total, and after that the client's queries won't amplify the behaviour.
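The arithmetic behind that amplification, using only the numbers from the talk:

```python
# One failing lookup through the Ubuntu/Firefox stack becomes a burst of
# client queries; what the authoritative server then sees depends on the
# resolver in the middle.
client_queries = 16      # the ServFail spam observed from one failed lookup
bind_factor = 2          # BIND re-asks upstream for each ServFail it receives
unbound_total = 1 + 4    # unbound tries once, retries four times, then keeps state

print("via BIND:   ", client_queries * bind_factor)  # 32 queries upstream
print("via unbound:", unbound_total)                 # 5, regardless of client spam
```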

I will skip the timeouts and the refused results, but it's similar.

The past, present and future: in the past, we did, besides this, an analysis of the code to see if we could find the cause of the behaviour. We did an analysis of how the behaviour affects the total traffic, the traffic flows I showed you in one of the first slides: how does it affect the traffic to the root servers, to the average TLD server. What we are doing now is a DNS health monitor study; we are thinking about creating a DNS health monitor to have an idea about the state of the DNS in different areas. And we are planning to do a study similar to the client behaviour one, but focused on mobile clients, because with mobile traffic the network isn't as capable as a fixed network, and it is still growing, so we are curious how mobile clients will behave.

That is the presentation. Any questions?

AUDIENCE SPEAKER: (name unclear) from a Dutch ISP. I wondered, was IPv6 enabled on...

SANDER DEGEN: We tested with several combinations, and the way we tested it was with IPv6 enabled, so you saw double the packets.

AUDIENCE SPEAKER: Not only double, but for every search list entry you get another try on v6 and v4?

SANDER DEGEN: Yes, you get one v6 and one v4 retry, yes.

AUDIENCE SPEAKER: Roy Arends, Nominet. Can you go two slides back, please? On BIND 9 and Unbound, the second sentence, timeout: one request; if it times out, it won't get a response, so how is it cached?

SANDER DEGEN: It's a good question. What I meant is: the client does a request for a domain that doesn't exist, or that does exist but the authoritative server doesn't give a response. While the resolver retries it six times, it will hold all the clients' requests until it establishes that there is not going to be a response, and then it gives a response to the client, so the client's queries won't get sent on to the server any more.

AUDIENCE SPEAKER: State is being kept, that is what you meant, I think.

SANDER DEGEN: Yes.

SHANE KERR: Shane Kerr, ISC. You said you did an analysis of glibc; what was the result of that?

SANDER DEGEN: We can discuss that later; it would take too long here.

RICHARD BARNES, BBN: I was wondering, since you were starting to look at DNSSEC, if you looked at all at how these clients react to invalid signatures: the records come back, all the protocol stuff works, but the signatures aren't valid?

SANDER DEGEN: Yes, that was something really interesting for us as well, but we couldn't take it into the scope of this research; it's one of the things we would like to see in the next study.

CHAIR: I have got one last question. You say you looked at the code of glibc, but did you look at one particular implementation, or did you go out and compare?

SANDER DEGEN: No, we looked at one of the most recent versions of glibc.

CHAIR: Of glibc?

SANDER DEGEN: Yes.

CHAIR: Versions differ, and not everybody has the same...

SANDER DEGEN: We focused on the glibc that was used in the Ubuntu and Firefox setup, and specifically that version was analysed.

CHAIR: OK. Thanks. Well, thanks, Sander.

(Applause)

Try to speed it up.

MASATO MINDA: Hi, I am Masato.

MASATO MINDA: This is the second time I have attended a RIPE meeting; I love RIPE meetings. But a note: my main job is research, but much of the time I am battling with PowerPoint; for about ten years I liked PowerPoint, but today I hate PowerPoint. My English is very, very poor and broken, sorry.

I will talk about the DNS traffic changes caused by DNSSEC in JP. We started the DNSSEC service in January this year; the JP domain name does DNSSEC now. Before January we had some events in the run-up to DNSSEC, until the end of September, covering networks and software. On October 17 we started signing the JP zone, with NSEC3 opt-out. On October 29 we published a new key for a rollover; the rollover was on November 3. On December 10, the DS went into the root zone. October 17 and 29 are the important events in this presentation. This graph shows the DNS responses; the scale is packet size. Red is normal responses and green is NXDOMAIN; the sum of these is 100 percent. Before signing, the traffic distribution is between 80 and 200 octets, with a peak at 110 octets of about 3%. After signing, the traffic distribution spreads from 80 to 700 octets. There are three peaks: 110, 360 and 610 octets. The reasons: the peak at 110 is from implementations, like BIND before 9.3, whose behaviour includes no EDNS0; if the response has the TC bit, they retry over TCP. The peaks around 360 and 610 come from queries with the DO bit, implementations with DNSSEC capability; the peak around 360 has one NSEC3, the peak around 610 has two NSEC3s.

As an aside, the JP zone has three delegations signed for some test use.

This shows the number of DNSSEC-ready resolvers. It increases little by little.

This is a graph of the number of TCP connections, on a timescale from October 11 to November 10 of 2010. Before the zone was signed, TCP connections were nearly zero. After signing, there were about 10 TCP connections per second; after October 29 and the rollover, about 20 per second. We had not published the new ZSK and the NSEC3 parameters at this moment.

These graphs show changes to the DNSSEC parameters. Before October 29 there were peaks around 360 and 610, but after October 29 the peak around 360 is more significant. TCP queries increase as the packet size distribution widens; there are still a few environments which have the 512-octet limitation in UDP.

By the way, what is this spike? The spike is around 430. Under this peak, this spike is seen every time; its value is 427... sorry, 427 octets is very common. This is the answer to a query for the IP addresses of JP's name servers, for example a.dns.jp or d.dns.jp or many other combinations; it is 427 octets. Many resolvers send this query, hence the spike. And can you see this one? Its value is 438 octets, and 438 minus 427 is eleven; eleven is the size of EDNS0, so the 427-octet response clearly has no EDNS0. Here is the response size of DNSKEY: 1,203 octets. The response for the JP DNSKEY carries the ZSKs, the KSK and one RRSIG by the ZSK. We will use the double-signing KSK rollover, but in that setting the DNSKEY response size is 1,769 octets, and 1,769 octets is too big for the traditional MTU. So we will decrease the response size of DNSKEY: first, we remove the RRSIG by the ZSK on the DNSKEY RRset; second, we decrease the ZSK part of the DNSKEY set.
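That eleven-octet difference is exactly the wire size of a minimal EDNS0 OPT pseudo-record, which can be checked from the DNS wire format:

```python
# A minimal OPT RR in the additional section, field by field.
OPT_RR_SIZE = (
    1   # owner name: the root, a single zero byte
  + 2   # TYPE = OPT (41)
  + 2   # CLASS field, reused as the requestor's UDP payload size
  + 4   # TTL field, reused for extended RCODE, version and the DO flag
  + 2   # RDLENGTH = 0, no options attached
)
assert OPT_RR_SIZE == 11
print(427 + OPT_RR_SIZE)  # 438: the same response with EDNS0 added
```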

That is all.

CHAIR: Thank you. Are there any questions? I don't see anybody. I guess we will learn, after you change the sizes, what the results will be.

MASATO MINDA: About 1,400 to 1,300, maybe.

(Applause)

CHAIR: And next, Jakob Schlyter about HSMs. It's incredible: for a long time you don't hear anything about HSMs, and suddenly they are everywhere. I wonder why.

JAKOB SCHLYTER: So, I briefly mentioned this at the meeting yesterday when talking about OpenDNSSEC, and I was asked by the chairs to fill up some more of the Working Group's time with it. We, as in OpenDNSSEC and .se, had an HSM review performed by a consulting company called Certezza at the end of last year, and this is a short summary of the topics covered in that report. The report is covered by a Creative Commons licence, if you want to use it in various ways.

So, we did this for two reasons. The first was to help TLD operators and DNS operators in their choice of HSM, so everyone doesn't have to evaluate all those different vendors themselves; but also to encourage new product development. This is, at least within the DNS community, a rather new area. We have no common view on requirements; we have, maybe, other requirements than the traditional PKI area has had so far.

The scope was limited to network-based HSMs that are certified for FIPS 140-2 Level 3 or better, which is an American certification indicating certain levels of security features, covering both the actual product and its development. We also wanted decent performance, for some value of decent; in this case, about 1,000 signatures per second with RSA 1024-bit keys. We thought that was a reasonable number.

We also required a PKCS#11 interface for connecting to OpenDNSSEC.

I apologise that some characters cannot be displayed correctly on the screen. The vendors that participated in the review were AEP, quite well known in this area, especially for the use of the Keyper for root signing; SafeNet, whose Luna has been with us for a long time in different versions; Thales nShield, a UK-based company, now French-ish; and Utimaco from Germany.

And the test setup was basically a separate test LAN; the test software used was ods-hsmspeed and a PKCS#11 testing tool which is now in the OpenDNSSEC source repository. If you want to look at the code and see what it did, you can take a look there.
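The measurement itself is conceptually simple; here is a back-of-the-envelope version using a software key from the Python cryptography library as a stand-in, since the real tool, ods-hsmspeed, drives the device through PKCS#11 instead.

```python
import time
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa

key = rsa.generate_private_key(public_exponent=65537, key_size=1024)
payload = b"\x00" * 256          # stand-in for RRset data to be signed

N = 2000
start = time.perf_counter()
for _ in range(N):
    key.sign(payload, padding.PKCS1v15(), hashes.SHA256())
elapsed = time.perf_counter() - start
print(f"{N / elapsed:.0f} signatures per second")
```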

The common features that were up for review, or actually the common features that these boxes have: they have some sort of UI; they don't have any web GUI, fortunately; this is serious stuff, as someone said. You can do some sort of remote administration via a command-line interface. They have more or less extensive support for role-based administration: security officers or crypto officers, as they are called, versus operators, the commoners that can turn on and enable the box. All of them use some sort of smart card authentication for managing the box, either locally or remotely via some sort of remote device.

I will not go into the detailed results, but this is the type of result you can find in the report if you are interested. This is just a view of the supported algorithms, and you can see it's a pretty full matrix; everyone supports everything, more or less, especially the things applicable to DNSSEC, which right now is only RSA. The hashing functions, at least in OpenDNSSEC's case, are performed on the host itself, so the HSM is not used for that.

Performance varies a lot between the different HSMs tested, and it sometimes varies depending on the licence you load onto the actual box; for example, the Keyper comes in two different versions, and whether you pay X or some number of times X, you get different performance. It's the same hardware, of course, but it's quite reasonable. The SafeNet Luna is the one with the best performance, and we had to tweak the number of threads and whatnot to get this type of performance; this is not what you would get if you just took your simple zone signer and ran it.

AUDIENCE SPEAKER: What are the units?

JAKOB SCHLYTER: Per second, not per day. To compare: a normal smart card, or the small USB type of device that we used at .se for quite some time, performs about one to two signatures per second, and the now-infamous Oracle card does 13,000 seconds per...

AUDIENCE SPEAKER: Not seconds per.

JAKOB SCHLYTER: You do not want to use 4K keys. One thing which was not tested, which could have been and perhaps should have been, is other key sizes. Some of these machines are very specifically tuned to 1024 and 2048 bits, as you can see, but there are other issues.

We looked at different types of communications, backup, synchronization; at security features, not only certification levels but also how you authenticate, whether you have M-of-N support, requiring a certain number of operators, as for the root; whether you can run virtual hosts inside the HSM, which gives some sort of security domains; and at some usability issues as well. And this is actually where the fun starts: among these features you can find the differences between the products. If you want to read the report, you can find it here; maybe you want to take a picture. It is also on the OpenDNSSEC website. We hope that this could help the Working Group members and others to actually choose an HSM, if they find one useful or require one, so they don't have to test all of them. I know some of you have tested multiple HSMs; if you have any more results or feedback from such tests, it would be interesting to hear them as well.

Thank you.

CHAIR: Thank you.

(Applause)

AUDIENCE SPEAKER: Why should I invest in protecting my key more than I invest in protecting my zone file?

JAKOB SCHLYTER: Blame shift. I would also say it's not only about protecting your keys; it's also about protecting your processes and operators and moving risk away from operations. I mean, if you protect the keys, of course, this is still part of the risk analysis you have to do for your whole operation, and in some cases it makes sense to use an HSM, depending on how you build your systems. In .se we think it's a vital component for protecting the infrastructure, but I could also see people ending up with "we don't need no HSM, we just keep keys on disk".

AUDIENCE SPEAKER: If I am signing a zone and I make a mistake in the zone file, will the HSM correct that mistake, or sign it?

JAKOB SCHLYTER: It will sign it, of course; it's garbage in, garbage out. But if your keys are not protected and there is an undetected compromise, someone could issue false signatures for quite some time before you roll your keys, so there are certain scenarios where it makes good sense to have an HSM. I know there are people who don't agree.

AUDIENCE SPEAKER: John Dickinson. I did a similar study when I was at Nominet, looking at HSMs, and one of the really big differences between them that I found was the quality of the documentation, and it struck me that, you know, if an organisation can't write decent documentation, then I don't trust them to write decent security code. Did your study include the quality of the documentation or anything?

JAKOB SCHLYTER: There are two parts to the documentation. There is the documentation required for the certification, and that is usually pretty good; that is where I would start if I wanted to see what a box is actually doing, that is, the FIPS security profile. The end-user documentation varies; there are some parts about that in the report. I would advise people to ask the vendors for the documentation, as PDFs, before they buy; I usually do that for all our products.

CHAIR: I have to stop it here because we are going out of time. Very quick question there.

AUDIENCE SPEAKER: (name unclear). Just in regard to performance, how do you think SoftHSM on a decent-sized server would do?

JAKOB SCHLYTER: Quite good.

AUDIENCE SPEAKER: Any guess? What is the highest number?

JAKOB SCHLYTER: Depending on the number of cores and the size of the propeller, between 1,000 and 2,000, I think; but both the HSMs and SoftHSM and what have you are good enough.

CHAIR: Thank you.

(Applause)

CHAIR: We are running late, but this is what stands between you and the lunch.

WILFRIED WOEBER: I am trying to be very brief and concise because we have only a couple of minutes left. There is one important statement I want to make right at the beginning: this is not intended to be, or going to become, an F-root bashing exercise. This is just a snapshot which I thought might be interesting to some of you. It's a combination of me trying to do a bit of scratching at the technology underneath the hood and having a look at the results of the Atlas probes. This might have been better placed in the MAT Working Group, but as that runs in parallel with my own Database Working Group later today, and it involves a little bit of DNS stuff, I thought I might submit it to Peter and get his feedback, and we came up with this. The root server itself is not going to do any dancing, nor would I, but let's just get into the thing.

I started to regularly have a look at some of the pictures that the Atlas probe gives me. This probe is located pretty much at the centre of the network operation centre of our little national research network, and as I'm a person who triggers on patterns, I am an optical type, I singled out this picture that you can see close to the bottom of the screen here, and it looks a little bit weird, doesn't it? So I did a little bit of scratching and tried to understand what this picture actually means, because, as usual with many of the measurements, it's not so much the data that you get or looking at the graph; it's more the issue of how to interpret the thing.

So, this is... sorry, I pushed the wrong button. This is the picture I'm seeing from a vantage point which is close to a pretty decently connected network, and you can see that the steady-state RTT on the left-hand side of the graph, for access to one of the instances of F-root, is slightly less than 200 milliseconds, which is not really good, but at the same time not a problem, because given the DNS machinery, and given the fact that there are other root servers, other instances of root servers around, a decent DNS environment would simply do the right thing and not query that machine. So again, this is not bashing, not an operational issue; this is just trying to understand what is going on. And the thing that goes on here, and that triggered me, is that at some point in time there was a flip to an even bigger RTT, which is close to 400 milliseconds. So the first question I started to ask myself: is there any point in even trying to reach an instance of a root name server at an RTT of 400 milliseconds? It probably doesn't make sense. But then it fell back, to more than the steady-state thing but to less than 400 milliseconds, and, OK, I didn't catch it at that point in time, so I was not able to actually come up with a traceroute and do the proper investigation. But then the more interesting thing was that the RTT dropped, for slightly more than a week, to a very reasonable RTT in the range of, whatever it is, below 50 milliseconds, 27 or 30-something, and then it went back to the thing that I was used to. And then I had a look at the other probe that I am hosting, which is hanging off a DSL line of a mass-market DSL service provider, and it turned out that the usual RTT there is in the same range, but this guy is seeing a pretty stable situation.

So I did a little bit of investigation, and for some of these periods I found out that we actually took a route to an instance which is somewhere in the woods. Not that I would want to poke fun at Venezuela or any other place, but we were taking a path from central Europe across the European national research aggregate, across Internet2 in the United States, going from California to Panama City and then from Panama City further south, to somewhere in Venezuela. At that point in time I guessed we would go on to Sao Paulo, but we probably hit something located in Venezuela. The question is: there is a direct path from the European research network community to Latin America, so why didn't we use this path, which should have been faster?

So this is sort of the environment. The arrows here are just pointing at the interesting points in time, and the last thing I tried to find out was, actually, who is hiding an instance of this particular root server from us, which would give us some 25 milliseconds. Some of the investigation, and some of the explanations I received during this week, is that this is sort of an interaction between the announcements of differently sized address blocks for the various instances where F-root name servers live. ISC announces a /23 address block and also a more specific. It looks like some of the parties hosting a local, or locally accessible, instance actually leaked the more specific announcement, which they shouldn't have done, and distributed it more or less across the globe.
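Why a leaked more-specific captures the traffic is just longest-prefix matching, sketched here with Python's ipaddress module; 192.5.5.241 is f.root-servers.net, and the two prefixes stand in for the covering block and the leaked more-specific.

```python
import ipaddress

target = ipaddress.ip_address("192.5.5.241")   # f.root-servers.net
routes = [
    ipaddress.ip_network("192.5.4.0/23"),      # the block announced everywhere
    ipaddress.ip_network("192.5.5.0/24"),      # meant to stay local, but leaked
]

# Best-path selection prefers the most specific matching prefix.
best = max((r for r in routes if target in r), key=lambda r: r.prefixlen)
print(best)  # 192.5.5.0/24 -- traffic follows the leak, however far away it went
```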

And so, the open questions, some of which are already mostly answered: has anyone else seen similar behaviour with instances of root name servers? What is the proper approach to actually try to understand what is going on, and to debug and trace and understand the situation? Because in my particular case, I think it's quite a complicated thing, from the user's end, to debug such behaviour in an anycast environment, and this might not be restricted to DNS servers but applies to any anycast service. And to round it up before we go to the questions: since yesterday I can see an instance of this particular root name server at well below 10 milliseconds RTT, because there was obviously another change in provisioning, another change in routing, and this instance is pretty close by; it's about 240 kilometres direct distance from Vienna to where it is located.

AUDIENCE SPEAKER: A couple of things. What you are seeing is normal behaviour, as you'd see from any multihomed layout with multiple exit points, which is basically one way of looking at anycast. In your case, with the research networks, we did try to get GÉANT to carry F-root at more than one point, and it turned out, for some reason, I don't know why, that for their control system it was too complicated for them to do this; so we tried, but we didn't succeed. And so what happens is that the whole set of networks that depends on GÉANT to access the Internet, or part of the Internet, is dependent on what GÉANT picks up around the world, which is not directly under our control. What we do about it is rapid detection of events like these where there is a leak. If you have access to the routers, or you call them up, you can determine where the leak is being generated by looking at the AS path, because not only do we use the same prefix everywhere, we also keep a combination of two things in the BGP AS path: the origin AS is always the same, and the next AS you see in the path is an AS that is unique to each location where F-root hosts a local node. So if you look at the next hop from the first one, you can identify where you are getting these prefixes from, and if it doesn't make sense, then that is already a lot of information for us to go and fix it as quickly as possible.
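A sketch of that trick in code: the origin AS is constant for the F-root prefix, and the AS adjacent to it names the hosting site. AS 3557 is ISC's F-root origin; the adjacency table and the sample path are invented for the example.

```python
F_ROOT_ORIGIN = 3557                 # ISC's origin AS for F-root

SITE_BY_ADJACENT_AS = {              # hypothetical per-node adjacency table
    64600: "Vienna node",
    64601: "Caracas node",
}

def f_root_site(as_path: str) -> str:
    hops = [int(asn) for asn in as_path.split()]
    if len(hops) < 2 or hops[-1] != F_ROOT_ORIGIN:
        return "not an F-root announcement"
    # The AS just before the origin is unique per hosting location.
    return SITE_BY_ADJACENT_AS.get(hops[-2], f"unknown site (AS{hops[-2]})")

print(f_root_site("1853 1239 64601 3557"))  # -> Caracas node
```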

AUDIENCE SPEAKER: When did you actually have these problems?

WILFRIED WOEBER: Well, problems... this behaviour started, as you can see in the graph; that is the calendar week, so this is pretty recent.

AUDIENCE SPEAKER: 2011?

WILFRIED WOEBER: Two or three weeks ago.

AUDIENCE SPEAKER: So I saw some problems in late 2009 with GÉANT and F-root, v4 going to Latin America and v6 over to Hong Kong, I think, and I notified both ISC and GÉANT, and GÉANT fixed the issue fairly quickly. Basically, if you notify the appropriate people, it won't take that long; they will know how to debug it.

AUDIENCE SPEAKER: Wolfgang, RIPE NCC, operator of K-root. Just in terms of K-root, we had a similar strategy, where we used more and less specific prefixes depending on our node capacities. We actually just recently decided to get rid of this and have one consistent prefix announcement size, because our experience was that it produced more problems for us, and was harder to troubleshoot, than it actually solved. And the same goes, in terms of anycast, for a lot of the other measures, AS path prepending etc.; it gets very, very hard to control if you have a lot of locations from which you do this, because the side effects, basically, you cannot estimate them before you do it, and in retrospect it's always hard to figure out what the effects were.

CHAIR: Last quick question.

AUDIENCE SPEAKER: Robert, RIPE NCC; I am one of the people who is supposed to be responsible for this Atlas thingy. So I am in a difficult situation, because we have so many good ideas that we want to do with Atlas, and so little time to do them, because people like you... we appreciate your questions, but you do come up with these "do more, do more" requests. So, one of the things we really want to do is some kind of tie-in between traceroute measurements and RTT measurements, so that, for example, if we observe a sudden change, we can say what is going on, what the change in the path is; maybe even a warning system based on that, and that kind of stuff. This is in the making and we want to do it. You are not the only one to say this would be very useful: my probe could do this immediately and automatically if something changes, so I don't have to go to my equipment, and the whole benefit of having such a probe is that it can see the things that make sense. So we believe this is the way forward, and we are going to do it.

CHAIR: Thank you. I cannot avoid one comment: looking at ICMP is interesting, but you only measure when the light is on; unless somebody is home and you get an answer, you don't see anything. The only thing between you and the lunch now is any other business. Good. There is no other business.

(Applause)

Thanks to the staff for supporting this meeting, as well.

LIVE CAPTIONING BY AOIFE DOWNES RPR

DOYLE COURT REPORTERS LTD, DUBLIN IRELAND.

WWW.DCR.IE


