Archives

These are unedited transcripts and may contain errors.


DNS WORKING GROUP, 4 MAY 2011 AT 11:00:

CHAIR: So good morning. This is the first session of the Domain Name System Working Group here at RIPE. Yes, some people are taking this last chance to escape, please do so. We are going to have a bit of a packed agenda this morning. First some administrative matters. I will introduce myself after talking. Then we'll have a report from the RIPE NCC on the recent events and developments and future plans, an IETF report done by myself. We'll get an overview of the latest developments. We'll have a live demo. Brett Carr from Nominet will give us another update, this time on developments in the second level domains underneath, and we'll have a talk from Andrei about DNS anomaly detection. So that's the agenda for today. We'll have another session tomorrow morning. We'll reshuffle that, so we'll not go through it now. As promised, I should introduce myself and the co-chairs. I'm Peter Koch, one of the three co-chairs of the DNS Working Group.
And so if you have any suggestions on what the working group should actually do, any feedback on the presentations or whatever happens inside the working group, or what you think should or should not happen, please talk to us or send an email. So with that, I would like to invite Wolfgang... sorry, it has been a long week. I'd like to invite Wolfgang to the stage to give the update.

Wolfgang: Okay. Good morning everybody. I work for the RIPE NCC as DNS services manager. As stated by Peter, I'd like to give you a brief update on what we're currently working on, future plans and statistics. First, this is probably known to a lot of you, but there's some new stuff in there: the services that the DNS department is carrying out for the RIPE community and the RIPE NCC. First and foremost, we operate reverse DNS for the /8s in IPv4 and the IPv6 space that we allocate out from. We also do the same thing as a secondary for the other RIRs, and we operate K-root, one of the 13 DNS root servers. New here is F-reverse. This is a system that I'll go into in more detail. There's also going to be a presentation in the DNS Working Group session tomorrow by Dave Knight that is more detailed about this overall system. We also do secondary DNS service for ccTLDs from developing countries. For the ENUM Working Group here at the RIPE meeting, we operate e164.arpa.

In terms of projects, one of the major projects we've been busy with is the anycast cluster. At the moment this consists of two deployments, one in London and one at AMS-IX in Amsterdam, and it's already carrying a variety of critical zones, amongst them the F-reverse zones in-addr.arpa and ip6.arpa. It also carries all of our primary zones and acts as a secondary for the other RIRs. F-reverse specifically is only carrying in-addr.arpa and ip6.arpa. That has been specified in the IETF in a best current practice document, and at the moment those systems are operated by the RIRs and ICANN under the IANA functions contract. On our provisioning system: we rolled out a new provisioning system in January and it has been in production since then. The main thing to point out is that we changed from periodic provisioning to realtime provisioning using dynamic updates. We intend to extend the provisioning features, that was the main goal, so that we can provision ERX space like any other space. We also want to support zones that are smaller than a /24. I'll give more details on that in just a bit. And we also noticed that the current delegation checker that we're using for the provisioning is quite complicated to use, and our main goal there is to simplify it down so that it only performs checks that are vital to the zone's operation.

Well, now, one of the worst topics in my presentations. I'll start out with the ugly here. The ugly is DNSSEC: we encountered a problem. On the 15th of February we had an outage in e164.arpa. Our signing system missed generating the signature over the DNSSEC key set, which renders the zone useless in terms of DNSSEC. We did some analysis with our vendor, and unfortunately they could not reproduce the problem with the data we provided. They eventually concluded that it was probably due to high load on the system during the rollover. That brings me to the bad. On the 14th of April we had another one; it affected ripe.net and one of our reverse zones. The problem here is it was the exact same system, but it didn't have any high system load. However, that brings me to the good. We could provide the vendor this time with enough data to reproduce it in the lab, and we are at the moment awaiting a bugfix release which we're going to put into production before our next rollover. What those outages got us to think of is that we're not the only entity that has had those problems, and we'll probably not be the only one that has them in the future. As a matter of fact, a lot of code that is used for DNSSEC is not mature enough, and engineering folks have a hard time blindly trusting it. So we started a discussion at the meeting in San Francisco to ask other parties for their input into what a safeguard that we could put in place, before something like that goes into production systems, should actually do and should cover. We had a lot of input on that from large registries, amongst others SIDN, AFNIC and DENIC, and initial work has started at NLnet Labs: they're working on tools that would allow something you could consider a proxy service, which takes in a zone on one end and will only provide it on the other end if it validated against a certain set of trust anchors that you had to specify beforehand.
So you could use that and validate the zone against the trust anchors you know are in your parent zone or published on the website. What I would like to point out here: there is a mailing list provided by NLnet Labs, and anybody in here who has thoughts on this or would like to get involved, I encourage you to join this mailing list and give your input.
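The safeguard idea described here — hold a zone back unless its DNSSEC material checks out — can be illustrated with a deliberately simplified structural check. This is an invented sketch, not the NLnet Labs tool; a real safeguard would cryptographically verify the RRSIGs against the configured trust anchors, not just look for their presence:

```python
def has_keyset_signature(zone_text):
    """Structural safety check on a presentation-format zone: any zone
    that contains DNSKEY records must also contain at least one RRSIG
    covering the DNSKEY RRset (the exact failure in the e164.arpa
    outage was a missing key-set signature). Shape of the check only;
    no cryptographic validation is performed here."""
    has_dnskey = has_dnskey_sig = False
    for line in zone_text.splitlines():
        fields = line.split()
        if "DNSKEY" not in fields:
            continue
        if "RRSIG" in fields:   # e.g. "... IN RRSIG DNSKEY 8 2 3600 ..."
            has_dnskey_sig = True
        else:                   # a plain DNSKEY record
            has_dnskey = True
    return (not has_dnskey) or has_dnskey_sig
```

A publishing proxy built on this idea would refuse to hand the zone to the name servers when the check fails, instead of serving a zone that validators will reject.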

Moving on, better parts of DNSSEC for us. Since the root zone was signed last year, there has been a lot of progress in TLDs being signed. From the domains that we have... my laser pointer broke down. From the domains, the blue part on the left-hand side here shows which ones we can provision, so that's the vast majority. We recently enabled our IPv4 reverse zones in the in-addr.arpa parent. On the right-hand side are the only zones, or parent zones more specifically, where we do not know when we will be able to put trust anchors into them. However, we have a few that have announced some plans to us. How do I go on? So this represents the same thing, but this is all prediction: how it's going to look by the end of this year. By the end of this year we expect to have only three zones left that we cannot provision our DS records into yet, and we believe that that's actually huge progress in less than a year since the root zone was signed.

Our regular statistics on our reverse zones: there's still a steady increase here. Unfortunately we couldn't carry on this trend, but at least we're not decreasing, and the total is about 450 delegations in our zones.

Our regular information about K-root: K-root operations are stable. There are no major changes in the traffic pattern. We peak at 25,000 queries a second and operate that with 18 anycast instances. The one thing to mention here is that with the introduction of F-reverse and the others, the in-addr.arpa zone was moved off and onto the new systems; you can see that change there. I have some statistics on that. For in-addr.arpa we peak at about 6,000 queries a second, and we operate that from two instances, London and Amsterdam. For IPv6 we have a peak at 1,200 queries a second. One thing to note: there is a huge difference in the diurnal pattern between IPv4 and IPv6 that we have not analysed yet, but I would be curious if you have theories on why that is the case.

Going on, statistics about IPv6: again, similar to the delegation and DS records, we see an uptake there. Some of us might not like the speed, but at least there is an uptake, and that is a good sign. We are at about 500 queries a second at the moment. TCP queries at K-root: there was a concern that they might put a strain on the root server system. This chart shows that this is not really a problem for us. We have 40 TCP queries a second, and this peak was when the signed root zone was rolled out. Since then we're stable, so there's nothing to worry about here.

Going on to the future plans: a high priority for us is that we need to extend our analytics. We need to be able to get a clearer picture of what happens with the query streams against our systems; that includes K-root, F-reverse and AS112. We want to run an Apache Hadoop based cluster to do parallel computation on those traffic streams. To give you an idea of the amounts of data: for K-root alone we are talking about 2.1 terabytes of compressed data per month that we have to deal with. We have to do some development around the reading of pcap files, and we want to make that available on RIPE Labs. For the anycast cluster that I mentioned at the beginning, there are two major systems that we're going to migrate into it. One is ns.ripe.net, which hosts reverse zones as a secondary; it's carrying quite a number of zones at the moment, about 4,500. The more complicated one is the ccTLD service. There are a lot of dependencies there, a lot of communication involved, but we're confident we'll be able to finish this migration by year's end.
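To give a flavour of the pcap handling such an analysis pipeline starts from, here is a minimal classic-pcap record walker. This is an illustrative sketch only — the actual cluster work described is Hadoop-based — and it assumes plain, uncompressed libpcap capture files:

```python
import struct

def pcap_packet_lengths(path):
    """Yield the captured length of each packet in a classic pcap file.
    Handles the standard libpcap format: a 24-byte global header
    (magic 0xa1b2c3d4, possibly byte-swapped) followed by 16-byte
    per-record headers and the packet payloads."""
    with open(path, "rb") as f:
        magic = f.read(4)
        # little-endian files start with bytes d4 c3 b2 a1 on disk
        endian = "<" if magic == b"\xd4\xc3\xb2\xa1" else ">"
        f.read(20)                # skip the rest of the global header
        while True:
            rec = f.read(16)      # ts_sec, ts_usec, incl_len, orig_len
            if len(rec) < 16:
                break
            _, _, incl_len, _ = struct.unpack(endian + "IIII", rec)
            yield incl_len
            f.seek(incl_len, 1)   # skip the packet payload itself
```

At 2.1 TB of compressed data a month, a single-threaded reader like this is exactly what you would parallelise across a cluster: split the capture files, run the per-file extraction as map tasks, and aggregate the counters afterwards.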

I also mentioned this briefly: zones that are smaller than a /24 we currently provision manually, using the specification in RFC 2317, and we want to move to automatic provisioning. Our thinking is to use the notation here, where we will support a dash notation with which you can specify your range size in the domain object.

For the dash notation there are a few changes necessary, and a proposal is pending with the Database Working Group. If you have input on that and cannot be at that working group, I'm happy to convey it; otherwise I would invite you to give it right there. That proposal consists of two things. One is that we want to get rid of the existing dash notation in the third octet. This notation is quite confusing to people. What happens is that when you submit a domain object like that into the database, it will not be stored under that name: it will be expanded to a hundred separate objects, and if you want to modify them the next time, you have to send a hundred distinct updates. Another issue is that this notation cannot be supported with our current DNSSEC provisioning: you can submit DS records but not public keys, and the DS record's hash covers the delegation name, so it would be different for every one of the hundred objects. So simply put, we cannot support DNSSEC with it. Our suggestion was to deprecate and remove this feature, because it's also not widely used.

The second part is to allow such a dash notation with a slightly different implementation: we're not going to expand the dash notation. The object is just stored under this name, and the provisioning system will understand that this is a zone smaller than a /24 and will go ahead and provision it accordingly.
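The naming for such sub-/24 delegations can be sketched as follows. The dash label style matches what is discussed here; note that RFC 2317's own examples use a slash separator, and the exact RIPE Database syntax was still under discussion at the time, so this is illustrative:

```python
def dash_reverse_zone_name(prefix):
    """Derive a dash-notation reverse zone name for a sub-/24
    delegation, e.g. 192.0.2.0/25 -> '0-127.2.0.192.in-addr.arpa'.
    Illustrative only; the database proposal's final syntax may differ."""
    addr, plen = prefix.split("/")
    plen = int(plen)
    if not 24 < plen < 32:
        raise ValueError("only sub-/24, multi-address ranges apply")
    o1, o2, o3, o4 = (int(x) for x in addr.split("."))
    size = 2 ** (32 - plen)       # addresses covered by the prefix
    first = o4 - (o4 % size)      # align start to the prefix boundary
    last = first + size - 1
    return "%d-%d.%d.%d.%d.in-addr.arpa" % (first, last, o3, o2, o1)
```

Because the object is stored under this single name rather than expanded, one update touches one object, and a DS record hashed over this one owner name stays stable — which is exactly why the non-expanding variant can support DNSSEC where the old expansion could not.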

And the last part of my presentation: I want to give another update from the Database Working Group, on their action point on the clean-up of forward domain data. There have been 33 ccTLDs using the RIPE Database for provisioning, and the Database Working Group decided, I believe together with the DNS Working Group, that those should be removed. There have been 40 of them removed already. Most of them didn't actually use it anyway. And that leaves, at the moment, three which are actively working on getting their data out of the RIPE Database. Eventually, once they have moved out, we'll deprecate and remove the syntax. And with that I conclude my update, and I would be happy to take your questions.

AUDIENCE SPEAKER: Wolfgang, DNS Working Group something or other. A question really for the working group rather than yourself. Because of that DNSSEC failure that you had, I wonder if there's a need for better mechanisms or processes for reporting validation failures back to the zone owners or signers. I wonder if you have given any thought to that, and is this something the working group should consider doing work on?

SPEAKER: We had the same thought, of course, thinking of the e164.arpa outage and outages that could, in the future, as we see more validation, cause problems in terms of people not being able to communicate the problem to us. We have not come up with a good solution to that just yet, and I would be very happy to get input from the DNS Working Group, but also from the community outside of the RIPE community, on what they think we should do about this.

AUDIENCE SPEAKER: Thank you.

CHAIR: Any other questions? I have a question. You mentioned the transition of the reverse space, in-addr.arpa, from the root servers to dedicated systems. Did you do, or have the chance to do, any analysis on the query counts? I think I saw some patterns on your slide there. But was the reduction of the queries at the root name servers consistent with the queries that showed up at the...

SPEAKER: Yes. I cannot speak for all root servers, but from the ones that we know, it is consistent. However, because the RIRs are operating this, some of the letters for the new F-reverse, A-reverse and so on have different query loads at the moment. I'm not sure what those numbers are. Looking at K and F-reverse, the drops and such were consistent. We didn't see more traffic, nor did we see traffic disappear.

CHAIR: Thank you. Any other questions? Okay. Thanks, Wolfgang, for the report.

(Applause.)

CHAIR: Okay. So I'm trying to give you a bit of an IETF update, as we usually do, basically and mostly covering the previous IETF that took place in Prague earlier this year, involving lots of beer and good food and, of course, the domain name system.

So, first of all, there are a couple of DNS related working groups in the IETF. By any chance, are any of you following any of these working groups? Wow, cool. So there's DNS extensions, which is... I'm only talking to those of you who didn't raise your arms. The DNS extensions working group is mainly concerned with the further development and maintenance of the protocol; the DNS operations working group focuses on developing best practices and looking at DNS use in other protocols; ENUM, which this community has a separate working group for; and a new working group, or rather a newish one, called DANE, DNS Based Authentication of Named Entities. We'll get into this in a bit more detail. Of course there are more IETF working groups, and since the DNS is so cheap, everybody's using it, so some of the working groups appear to be less DNS related, but then the Devil is in the detail. There is the BEHAVE working group, which was set up to deal with, or tame, the NATs and is now dealing with v4-to-v6 migration, involving DNS64 for example, so there are documents in there. I'll not go into these. There's the multiple interfaces working group, which, as the name suggests, deals with systems that have multiple interfaces, multiple connections to the Internet, not being single-homed these days. Of course the DNS is of concern there, because usually we have a configuration for the local resolver to have a domain name and for a stub resolver to have a recursive server to query, but we are usually assuming a unified name space for this, and this might not be true in a multi-access environment with private addresses. There's some interesting stuff going on there.

Finally, PROVREG is closed, but the mailing list has been kept open. There was some recent activity there on EPP extensions for landrush and sunrise, and this is linked to ICANN opening up the gTLD space for new TLDs and people trying to get the big bucks.

Let's look at the individual working groups. DNSEXT had a small agenda, more or less two documents on it. One was about IDN variants, and the working group has been looking at requirements for some time. The problem has been: how do you translate the layer 9 requirement, "something should lead to the same user experience", into anything that would be close to a technical criterion? There has been some progress on the requirements document there, but all in all, people are more focused on their particular solution than on getting a set of requirements independent of those solutions, and it's a challenging task.

The second document on the agenda was a draft on resolver optimisation. The suggestion is that there is current practice today that should be codified, which includes aggressive negative caching: we could synthesize NXDOMAINs from responses, we could deduce that nothing underneath a name exists. This is not really in line with how the resolution algorithm works, but it might have some advantages. It also deals with suggestions to increase credibility so that resolvers can learn when delegations change. It's about local optimisation, making the resolvers more intelligent, versus how much of this should be in a standard or current practice document, versus "oh, by the way, we're violating the architecture here", because there's a widespread way the algorithm works and we're doing optimisation on top of it.
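The aggressive negative caching idea — answering NXDOMAIN for names below a name already known not to exist — can be sketched with a toy cache lookup. This is a simplified model of the concept, not the draft's actual algorithm; a real resolver would only do this for denials it could verify, e.g. DNSSEC-validated ones:

```python
def synthesize_nxdomain(qname, nxdomain_cache):
    """If we hold a cached NXDOMAIN for some name, deduce that nothing
    underneath it exists and answer NXDOMAIN for any subdomain without
    asking upstream. Names are absolute (dot-terminated) strings here
    for simplicity."""
    labels = qname.rstrip(".").split(".")
    # Walk up the tree: a cached NXDOMAIN at qname or any ancestor
    # covers the query.
    for i in range(len(labels)):
        ancestor = ".".join(labels[i:]) + "."
        if ancestor in nxdomain_cache:
            return True
    return False
```

This is exactly the tension the draft raises: the classic resolution algorithm would still send the query upstream, while the optimised resolver short-circuits it locally.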

The DNS operations working group is dealing with the revision of RFC 4641, the DNSSEC operational practices document. It's in working group last call for another two weeks. If you're a DNS operator dealing with DNSSEC, you should look into this and give feedback to the list or to the editors, who are in the room.

So, a success: we have an RFC out. As of last night, RFC 6168 has been published. It gives requirements for management of name servers for the DNS. In tomorrow's session, there will be a presentation on proposed follow-up work, and we can get into more detail then.

Also, the AS112 documents are mostly done, but there's new work knocking on the door, with people proposing to do AS112-like things.

ENUM, that's an easy one. The final round of documents has been published, and the working group is about to be closed. There will be a very detailed presentation by the current acting chair of the working group in tomorrow's session of the RIPE ENUM Working Group, so I'll refrain from any detail here.

Finally there's DANE. This is the working group dealing with certificates in the DNS. We've heard about SSHFP, SSH fingerprints in the DNS: have them in the DNS, signed, so you can trust them. The basic idea in DANE is: let's do the same for TLS certificates. Let's put them in and sign them so we can trust them. That's one of the fishy issues here: how much trust is justified by a DNSSEC signature over something in the DNS? Because, remember, DNSSEC was invented to protect the transport, not to increase the credibility of the data that is already in there. The working group has a couple of action items. There were intense discussions, heated debates, in Prague. One of the action items is to think about integrating this easy idea of putting certificates in the DNS into the TLS processing chain. It's a familiar scheme: there will be a deployment curve, so there will be migration and coexistence, which is a theme that has been dealt with in the RIPE community for different protocols. Currently the working group is dealing with a use cases document, which has been in working group last call for two days, that will actually address these questions: now I have found a certificate in the DNS, but the server has also presented me some certificate; which of these should take precedence? How should I deal with conflicts? What's the actual idea behind this? Can the DNS be used to constrain the number or shape of certificate authorities that are supposed to, or allowed to, certify anything for this particular server? There's work on the protocol, which involves designing a record type and gory details on the wire level. A very active working group, very many participants, a diverse audience: the killer application for DNSSEC that we have all been looking for for a long time.
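The data DANE would publish is conceptually simple. As a hedged sketch — the record type was still being designed at this point, so the field names and layout here are illustrative, not the final protocol — the certificate association data for a "hash of the certificate" matching type is just a digest over the DER-encoded certificate:

```python
import hashlib

def cert_association_data(der_cert, matching_type=1):
    """Compute DANE-style certificate association data.
    matching_type 1: SHA-256 digest of the certificate (hex);
    matching_type 0: the full certificate itself (hex).
    Usage/selector semantics are omitted in this sketch."""
    if matching_type == 1:
        return hashlib.sha256(der_cert).hexdigest()
    if matching_type == 0:
        return der_cert.hex()
    raise ValueError("unsupported matching type")
```

A validating client would compare this published value against the certificate the TLS server actually presents, which is precisely where the precedence and conflict questions above come in.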

That's basically the report. Any questions?

AUDIENCE SPEAKER: Jim Reid again. There were some mumblings about having some kind of bar BoF or some kind of BoF on DNS APIs. Do you know if that actually took place, and if anything significant came out of it?

SPEAKER: Attempts to standardize APIs, application programming interfaces, do have a history in the IETF, and the IETF is kind of averse to taking on work on this, because they usually point to entities like whoever does POSIX these days. So yes, there was a bar BoF of sorts, an informal gathering, but so far this group hasn't come up with an Internet draft or anything nailed down. But maybe Jaap has more details.

AUDIENCE SPEAKER: It was invitation only and very formal and I wasn't invited. So I don't know what happened there.

SPEAKER: Thank you. Jakob Schlyter.

Jakob Schlyter: My name is Jakob Schlyter. I work on OpenDNSSEC, and I'm here today to give you a short status update on what we've been up to for some time and our plans for the future. Anyone using OpenDNSSEC, by the way? Should be about 200 people. Yes, thank you.

OpenDNSSEC is what we would say, or at least hope and plan to be, a turnkey solution for DNSSEC. It's a signer only; it doesn't contain a name server in the usual sense. It contains some components that speak DNS, but you wouldn't use it as a name server. The key features, as many of you know: it's policy driven, with configuration derived from a policy description of key parameters. We have support for PKCS#11. We have no dependency on OpenSSL or other crypto libraries. We are able to share keys between zones: if you have a large number of zones, like 50,000 or so, maybe you want to share keys and pool them among the zones. And we do currently scale to about 50,000 zones. These are all features that we believe only exist in OpenDNSSEC currently. This is, of course, not a competition, but you know.

These are the original authors, and the current work has been mostly centred between .se, NLnet Labs and Nominet. We are currently in a phase where we are trying to get the organisation into better shape, mostly to give better support. To do this we created a company, a not-for-profit company. We will be able to provide long-term support for development, secure funding, and provide training classes; some of you have already been attending our classes. We have new classes scheduled for later this year, May and June. There will also be some consulting services. There could be profit, but it has to go back into the company itself, due to this type of construction.

The owners, which are currently in the process of joining, are CIRA, Nominet, .se and SIDN. If you're interested in contributing resources or money or whatnot, please get in touch.

We have an architecture board; we had our first meeting yesterday. These are the people who will set the long-term goals for OpenDNSSEC and the people you can talk to if you have any ideas or more strategic technical issues. Some of them are actually here, like the first one.

So we have released a number of smaller versions, beginning last February. We started off with quite a number of dependencies and a Python-based signer. We moved on to 1.1 later: we increased performance, we store key metadata, and we supported EPP. We said goodbye to Python earlier this year; we have a new signer written in C with all zones in memory, which is good and bad. And we have a new release scheduled for any day now, which has a multi-threaded signer engine with vastly increased performance if you have large zones, like .uk or .se or whatever, where you want a lot of threads signing simultaneously when you do a complete resign. Also, some HSMs, like the Oracle one, only get reasonable performance if you have multiple threads. In the Oracle case you need about 100 threads to get something out of the HSM.
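Why threads help with a network HSM can be sketched abstractly: each signing call spends most of its time waiting on the device, so keeping many requests in flight keeps the HSM's internal pipeline full. In this sketch, `hsm_sign` is a hypothetical stand-in for the real PKCS#11 signing call, and the thread count of 100 simply echoes the Oracle figure quoted above:

```python
from concurrent.futures import ThreadPoolExecutor

def sign_rrsets(rrsets, hsm_sign, threads=100):
    """Sign a batch of RRsets using a pool of worker threads.
    Because hsm_sign is I/O-bound (network round trip to the HSM),
    throughput scales with concurrent requests, not CPU cores.
    Results come back in input order."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(hsm_sign, rrsets))
```

For an in-memory software signer the same pool would buy little, which is consistent with the talk's point: the thread count that makes sense depends heavily on the HSM behind the signer.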

Later this year we have new releases scheduled with support for input and output adapters, which means you can have your zone data in MySQL; this is all pluggable and extendable for your pleasure.

Later this year we will refactor the enforcer, the component of OpenDNSSEC that calculates all the keys to be used and creates keys. It will support algorithm rollover, support transitions between NSEC and NSEC3, and give more performance if you have a large number of zones, like large DNS operators do. I should also give a small heads-up on SoftHSM: we released a new version this morning, and there will be continuous work on this as well. As I mentioned, we have training classes. They are free to attend; that is, it's free to attend but you pay your own travel. We hold them in Stockholm, and we have them scheduled for May and June and September, October, November. They are two days each, and you'll be very welcome to join us.

We will also provide the lab server setup we use in these training classes. We'll reproduce it in Amazon EC2, and the images will be available for free. So if you want to continue training back home, or train your staff or colleagues or whatever, you can grab them and run a couple of them, if they can keep the data for you; I'm not sure.

We have quite a number of users. Top level domains are using OpenDNSSEC in production. ICANN is using OpenDNSSEC for all their zones except the root, and SURFnet are using it. We also have large operators, and also registrars looking into using OpenDNSSEC, which we think is great.

One last thing: together with .se, we had a consultancy write a summary report on network based HSMs. This report was published in December, and it's available at the link on this slide. It compares the network HSMs available today, the AEP Keyper, SafeNet and the CryptoServer. They all have their pros and cons, and this is a pretty good summary. If you are looking into buying an HSM, I would recommend you read this report before you start your own tests, because it could save you some time.

Any questions? And if you don't want to ask them in public, feel free to grab me after the meeting and I'll try to answer them as I can.

CHAIR: I guess that was exhaustive and complete.

Jakob Schlyter: We even have time left.

CHAIR: You're five minutes ahead of schedule. Bad you. Thank you.

(Applause.)

CHAIR: Next on the agenda is Shane Kerr who will give us a live demo of BIND 10, so fasten your seat belts.



Shane Kerr: I am Shane Kerr from ISC. I'm the BIND 10 program manager. I sent Peter a request to do a presentation here, and he said no slideware, please. So hopefully we'll manage to get a demo going here.

I swear I tried this.

AUDIENCE SPEAKER: Yeah, yeah, yeah.

AUDIENCE SPEAKER: Sing a song.

Shane Kerr: It's the demo effect. Oh, no, say it ain't so. Nothing?

AUDIENCE SPEAKER: Just tell us what you see on screen.

(Applause.)



Shane Kerr: We actually did try this beforehand. This sucks.

It's a good thing we have an extra five minutes, right? We'll try this with SSH. Don't try this at home, kids.

CHAIR: Your reverse mapping is not working maybe.

Shane Kerr: Crap. Maybe we should try someone else and sort it out over the break. I apologise, everyone.

(Applause.)

AUDIENCE SPEAKER: That's what you get for being courageous.

(Applause.)

Shane Kerr: Turning to our regularly scheduled program. I've been working on a project called BIND 10 for the past couple of years, and I thought it was time to show some of what we have going on. Here's my presentation. I'll be uploading the slides later. As you've seen, we've already had some problems.

So the first thing about BIND 10 is that, unlike our previous versions, all the source code is in our public Git repository, and you can download the latest version at any time. There's nothing super special about that: if you've ever used Git, you'll be familiar with how this stuff works. You use git clone and it does the right thing. There we go. It takes like 20 seconds or so normally, assuming my v6 is working. You can download the latest versions from this morning. I don't recommend this for normal use; we do snapshot releases every six weeks, but if you want the latest and greatest, any changes that happened in the last six weeks, this is the right way to do it. Here we go. Anyway, we don't have to wait for that to finish.

Right, so it's the normal thing that you would expect from any modern software: configure, make, make install. I've already done that in advance, and the reason I did that is that it takes quite a while to compile; the compilation takes 20 minutes, something like that, which I don't have time for.

So I'll just start up the screen here so we can start the server. I want to start the server... there we go. I'll do it without the... oh, shoot. I'll do it without verbose mode. You can tell this is a true live demo. Waiting. Waiting. The server starts, pretty much as you'd expect. It runs and gives a bunch of debug information. We're working on integrating our logging subsystem; currently the logging gets printed to standard out. We have a nice new logging system which has unique identifiers, which will make debugging easier in the future, and translation into other languages easier. Right now it's just printing to the screen. On the left, bind10: that's the main process that wraps everything. What this is all about is... let's see. So you can see we've got a number of different processes running. I've mentioned this in a few presentations about BIND 10 in the past: it works like Postfix, a set of cooperating processes. The first one is the bind10 process; that's actually a Python program that runs and controls the whole system. And these other processes are actually performing the functionality, so you can guess what they are from the names. We prepended each process name with b10-. We run these multiple processes for a number of reasons: one is scalability, another is that it gives us fault isolation, and the reason we want fault isolation is so that, if one of the processes dies for whatever reason... let's try killing our outbound transfer process. There it is. We'll make sure it's really, really dead. There we go. You can see the master process detects that it died, tells you how it exited and restarts it. If there's a bug in the outbound transfer, or it gets killed for whatever reason, or maybe a hardware error, it will stop that process and not take the entire system down. I've seen systems with kernel bugs where you have processes lock up; in this case part of the system's functionality will be affected, but not the whole thing.

We set it up so that if you kill a process in rapid succession, where before we restarted instantly, there's a backoff algorithm for it. When you start it up, it does the normal things: version.bind, the record with all the developers who worked on it, and things like that. Right. And I want to show you... well, okay, you can see what happens if you don't have any zone data: you do a normal query and get REFUSED, like you'd expect. If you shut down the server, I'll use Ctrl-C to do that, it shuts down all the supporting processes. As an administrator you don't have to worry about cleaning up all these things; the server will handle all the management of processes and things like that. It may seem scary and weird for people used to running a single process, but it's actually okay, quite normal. That's just the server starting and stopping.
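The supervision behaviour demonstrated here — restart a child when it dies, but back off when it dies in rapid succession — can be sketched like this. The structure and policy values are illustrative only, not BIND 10's actual implementation (which does this from the bind10 master process):

```python
import subprocess
import time

def supervise(cmd, restarts, rapid_window=10.0, backoff=0.0):
    """Toy supervisor: start a child process, wait for it to die,
    record how it exited, and restart it. If it has died several
    times within a short window, sleep before restarting instead of
    respawning instantly. A real supervisor loops forever; the
    'restarts' parameter just bounds this demo."""
    exit_codes, deaths = [], []
    for _ in range(restarts):
        child = subprocess.Popen(cmd)
        exit_codes.append(child.wait())            # detect the death
        now = time.monotonic()
        deaths = [t for t in deaths if now - t < rapid_window] + [now]
        if len(deaths) >= 3 and backoff:           # rapid succession
            time.sleep(backoff)                    # back off first
    return exit_codes
```

The payoff is the fault isolation described above: a crash in one component (say, outbound transfers) costs only that component's availability for a moment, not the whole server.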

So, yeah, I put up a little demo zone. It's not there yet. You can see the zone loader off to the side: we have a little program that loads the file. We have a number of different backends that we can run. The server runs out of an SQL database, so we need to read the zone file and put it into the database before the server can use it. If I do the query again, you see we have data for that. And I'll show you my nice little...

Right. So, yeah, I'll give you a quick view of the database. If you're familiar with SQLite it's nothing weird, nothing unusual, a pretty standard set up. It's kind of what you'd expect. We've got data in here. It does mean it's a little bit slow if you're using the SQLite back end, but you get fast start up times.
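The idea of answering queries straight out of an SQL database, rather than from a zone file parsed at startup, can be shown with a toy example. The table layout below is invented for illustration; it is not BIND 10's actual sqlite3 schema.

```python
import sqlite3

# Toy zone store: resource records live in a database table, so the server
# can start instantly and answer from the database at query time.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE records
                (name TEXT, ttl INTEGER, rrtype TEXT, rdata TEXT)""")
conn.executemany("INSERT INTO records VALUES (?, ?, ?, ?)", [
    ("example.com.", 3600, "SOA",
     "ns1.example.com. hostmaster.example.com. 1 7200 900 1209600 300"),
    ("example.com.", 3600, "NS", "ns1.example.com."),
    ("www.example.com.", 300, "A", "192.0.2.1"),
])
conn.commit()

def lookup(name, rrtype):
    """Answer a query straight out of the database."""
    cur = conn.execute(
        "SELECT ttl, rdata FROM records WHERE name=? AND rrtype=?",
        (name, rrtype))
    return cur.fetchall()

print(lookup("www.example.com.", "A"))   # [(300, '192.0.2.1')]
```

As noted in the talk, this trades some per-query speed for fast startup: there is no zone-file parse when the server comes up, only when the loader imports the file into the database.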

I guess I might as well show this one as well.

We also support sending signed data out as well. This is NSEC3, I think. Yeah, I don't know. Anyway, the server also supports NSEC. At the time it was the only SQL-backed DNSSEC server. Right now you can use PowerDNSSEC for that also. I think they're equal in terms of stability and reliability now, because they've just released.

So I guess that's most of what I wanted to show you, just the server starting and stopping. One last thing before I'm done here. This is BIND 10 running. It's nothing super fancy, it's just a name server from one point of view.

We also have  okay, so we also tried to change the way the configuration works a little bit, so we have a little config program that you use to change the system settings. So instead of changing configuration files, you can actually  we've got tab completion and stuff like that. You can see all the different configuration operations possible on the server. These are all queried from the server at run time, so configuration programs don't need to know a lot about what the server is doing. It allows us to extend the server later without having to update the command programs, and it's easy to write a program, find out what options are possible, and not change it every time the server changes, or know a lot about what's happening behind the scenes.
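The self-describing configuration idea can be sketched briefly: the client asks the server what options exist rather than hard-coding them, so the tool keeps working when the server grows new settings. The module and option names below are invented for the example; in BIND 10 the specification would arrive over the command channel at run time.

```python
# Hypothetical configuration specification, as it might be fetched from a
# running server. The client never hard-codes these names.
SERVER_SPEC = {
    "Auth": {"database_file": str,
             "listen_port": int},
    "Xfrout": {"transfers_out": int},
}

def complete(path_prefix):
    """Tab-completion style lookup: list the items under a config path."""
    node = SERVER_SPEC
    for part in [p for p in path_prefix.split("/") if p]:
        node = node[part]
    return sorted(node)

print(complete(""))        # ['Auth', 'Xfrout']
print(complete("Auth"))    # ['database_file', 'listen_port']
```

Because `complete` walks whatever specification the server handed over, adding a new module or option on the server side needs no change to the config client.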

I guess that's about it then.

CHAIR: Thanks very much, Shane. And thanks for trying hard.

We have time for one or one and a half questions, anybody.

I have a question for you. Who would be the brave souls putting this into production at the moment?

Shane Kerr: Right now we have it in production at ISC only for BIND 10 itself. So the bind10.isc.org server is served by BIND 10. Several of the developers also run it for their own domains. I've been talking to the operations team this week, or the managers here, and I'm going to try to convince them to run it on AS112. Hopefully that will happen in the next few weeks. Also, on the recursive side, we're going to be setting up an open resolver using BIND 10 as a resolver.

CHAIR: Okay. Thanks very much.

(Applause.)


CHAIR: So we have two more presentations before the lunch break.

Brett Carr is on his way up to give us an update on the DNSSEC status.

Brett Carr: I work for Nominet. This morning I want to give you an update on what Nominet have been up to in relation to DNSSEC, and a few plans we have for the future as well. I'll talk a little bit about .uk DNSSEC, how it was signed and its current status; some issues in 2010 with .uk; the deployment we're going through currently with SLD DNSSEC; and a service we have planned for the future. So, .uk was actually signed in March 2010, just over a year ago. We use OpenDNSSEC running on 64-bit CentOS, using hardware HSMs. We run three sets of identical hardware and software, split across two different locations. We didn't strictly need to deploy it for .uk, but we decided to do that so we were consistent across all our infrastructure, all our zones, because the SLDs need to be.

We have the ZSK rolled every six months; that's done automatically. The KSK is rolled every three years. And we have a low TTL on the DNSKEYs, so if there's any issue in the future we can recover rapidly and the time out is fairly quick, an hour.
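The arithmetic behind that low DNSKEY TTL is worth making explicit: once a problem key is withdrawn, a validator can keep failing only until its cached copy of the DNSKEY RRset expires, so the worst-case residual outage equals the TTL. A small sketch of that reasoning:

```python
# Why the low DNSKEY TTL bounds the damage: after the zone is fixed, a
# validator keeps using its cached (bad) key only until the TTL runs out.
DNSKEY_TTL = 3600   # one hour, as described in the talk

def worst_case_outage(ttl_seconds, cached_at, fixed_at):
    """Seconds of residual breakage for a validator that cached the key
    at `cached_at`, once the zone is repaired at `fixed_at`."""
    expires = cached_at + ttl_seconds
    return max(0, expires - fixed_at)

# Validator cached the key just before the fix: a full hour of breakage.
assert worst_case_outage(DNSKEY_TTL, cached_at=0, fixed_at=0) == 3600
# Cached half an hour before the fix: half an hour of breakage remains.
assert worst_case_outage(DNSKEY_TTL, cached_at=0, fixed_at=1800) == 1800
```

This is exactly the "outage window of up to an hour" Nominet mentions later in the talk.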

So, as some of you may know already, we had a slight hiccup in our DNSSEC system in September 2010. In fact, we had several problems occur all at the same time, which didn't help. You can insure against lots of things, but you don't expect lightning to strike several times at once.

So first of all we had an HSM hardware failure, which caused an OS panic. It happened on a Friday night, about 8:00 at night. The operating system rebooted and we started to look at it. Following the reboot the HSM was locked. This was something we put into the design originally for security purposes: if the system reboots, the HSM gets locked and the keys are unavailable for use. One of the problems we had, though, because of the outage, was that this kind of got lost somewhere, and we spent a fair amount of time trying to figure out why the signing box wouldn't sign the zone, and it was because the HSM was locked. A word of caution: the error message that comes up isn't "the HSM is locked", you get some bizarre error.

When we realised that was the case, the security procedures we have involve a security officer unlocking the HSM, and that's senior management, and we didn't want to call them out on a Friday night. And the system is designed in such a way that the signatures are quite long-lived and the .uk zone doesn't get many updates, so there wasn't any urgency to fix it, so we decided to sleep on it and get the relevant people together the next day. We got together the next day and decided that, since we didn't know what state the system was in or what the issue was, we would fail over to our backup system off site. We went through various procedures and checks to confirm the system was ready to sign the zone, and everything looked fine. What we didn't see was a strange sort of bug, or interdependency, in the way we sync the OpenDNSSEC config from one server to another, and that caused a stale configuration file. OpenDNSSEC writes configuration on the fly to tell the signer engine which keys to use, and one of those files was left behind, and that caused us to start using an old key. So effectively we did a rollover, instantaneously. That caused a problem for anybody who had the keys cached, because it happened very suddenly and they wouldn't be able to validate the zone until the cache timed out. We couldn't do anything about it except publicise it to as many people as possible: if you're having issues, then flush your cache. At the moment not that many people are doing validation, but as time goes on  one of the reasons to publicise it is that we've learned from it, and when DNSSEC is more widely deployed, fewer of these issues will occur.

So we looked at what happened and how we could learn from it, and what we came up with was several things we've changed in our systems. We don't lock the HSMs anymore. This is a trade off between security and operational agility: all of our servers are in locked data centres where only Nominet staff have access, so physical security isn't that much of a concern really. We've changed the model so that the HSMs don't get locked. And we've greatly improved the checking procedures, with much more checking, to the point where we would easily spot the error we made last time.

And we've also reduced the TTL on the DNSKEYs, so if this outage happens again we've only got an outage window of up to an hour.

So moving on, we're in the middle of deploying DNSSEC on the second level domains. You can see all the levels there that Nominet serve. Those zones are very large, most of them, and dynamically updated on a constant basis, all the time. This is quite a challenge, because we need to keep signing continuously. So to do this we've been testing for quite a long time and are now deploying BIND 9.7.3. This is simple to deploy: use dnssec-keygen to generate the key, add the timing, add these two items to your zone configuration, tell it how long you want the signatures to be valid for, and it will go and sign the zone, and those updates get signed as they come in.
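The "updates get signed as they come in" point is the key to making DNSSEC work on a huge, constantly changing zone: only the changed names are re-signed, never the whole zone. Here is a toy model of that idea, where a hash stands in for a real RRSIG; it illustrates the incremental-signing concept, not BIND's actual implementation.

```python
import hashlib

# Toy zone: name -> rdata, plus a parallel table of "signatures"
# (a SHA-256 hash standing in for an RRSIG).
zone = {"a.example.uk.": "192.0.2.1", "b.example.uk.": "192.0.2.2"}
sigs = {name: hashlib.sha256(rdata.encode()).hexdigest()
        for name, rdata in zone.items()}

def dynamic_update(name, rdata):
    """Apply one dynamic update and re-sign just the affected name."""
    zone[name] = rdata
    sigs[name] = hashlib.sha256(rdata.encode()).hexdigest()
    return sigs[name]

before = dict(sigs)
dynamic_update("a.example.uk.", "192.0.2.99")
assert sigs["a.example.uk."] != before["a.example.uk."]  # changed name re-signed
assert sigs["b.example.uk."] == before["b.example.uk."]  # untouched name left alone
```

For a zone with millions of names and a steady stream of updates, this per-name incremental work is what keeps signing feasible, compared with periodically re-signing the whole zone from scratch.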

So the current status is that me.uk has been signed for a week and a half, and co.uk, which is our biggest zone, was signed last week and is currently validatable. One more thing: we're only using a single key in our SLD zones. Most of you are aware of the usual split where one key is used to sign the other; we're only using a single key, and that's because we're the parent and we don't have the administrative overhead.

Speaking of rollovers, we're breaking the model here as well. We've decided not to do scheduled rollovers. We have a set of criteria, and when we want to do a rollover we'll do one. We won't schedule them or announce them; we don't need to announce them. The general Internet shouldn't notice if a rollover is done properly.

We expect to be able to accept DS records for signed zones from the registrars who have indicated they're capable of doing so from the 18th of May. I'm not sure how many there will be, but we have some interest and we are talking to some registrars.

So the last thing to talk about: Nominet have been working on DNSSEC for quite a long time and we're approaching the end of deploying it on our infrastructure. But we actually really, really want DNSSEC to go further and push it into the rest of the Internet. One way we thought we could do that is to operate a signing service. So basically what we're going to do is: a registrar gives us an unsigned zone, we'll have some infrastructure in our data centre that signs that zone, we'll send a notify to them and they can pick it up from us, and for any zones under .uk we'll add the DS record to the relevant zone. Initially we'll only be doing it for .uk, but we're planning on doing it for all domains, and we'll hand the DS record back for the registrar to put in the relevant zone.

That's it. If anybody has any questions or comments, I'll be happy to attempt to answer them.

CHAIR: Thanks Brett. Questions?

AUDIENCE SPEAKER: Can you go back a couple of slides. John Dickinson.

Just about your no-scheduled-rollovers point, just to clarify: is that because you want people to use the root key rather than using RFC 5011?

SPEAKER: Can you say that again?

AUDIENCE SPEAKER: You're doing no scheduled rollovers because you want people to use the root key, not because you're doing RFC 5011?

SPEAKER: We want people to use the root key, definitely, and we don't see any reason to do a rollover if there's no reason. It just adds complexity.

AUDIENCE SPEAKER: You're not doing RFC 5011?

SPEAKER: No.

AUDIENCE SPEAKER: That could be another reason for people not to need to notice.

SPEAKER: Right.

AUDIENCE SPEAKER: If I understand correctly  my name is Jim Reid, of the DNS working group or something. If I understand correctly, sch.uk is a mini registry of UK schools. What if the Department of Education decided they didn't want to use NSEC3 but wanted to use DNSSEC-bis; what impact would that have on your architecture and processes?

SPEAKER: That's simple for us to do. We can sign any of those zones with NSEC or NSEC3.

AUDIENCE SPEAKER: But this idea of a single key

SPEAKER: It's a single key per zone.

AUDIENCE SPEAKER: Okay.

SPEAKER: The single key  we're not using a KSK and a ZSK, we're using a single key for each zone. If anybody wants to ask any questions outside, or any registrars from the UK, I'll be happy to talk to anybody any time.

CHAIR: Thank you so much and congratulations.

(Applause.)

CHAIR: It brings us to the final presentation for this working group session. More to come tomorrow morning. This is brought to us by Ondrej Sury.



SPEAKER: This is work we have done in our labs. We thought about what we could do to detect anomalies in DNS which are not obvious from DSC. The people working on this project read several papers, and we chose this one, which is based on statistical methods. If you really want to know what's behind it, just go read the paper. We modified the method to work with DNS traffic, because the original is just for IP networks.

The steps used in this method are that each packet is assigned to something called a sketch, using a universal hash family, and the sketches are aggregated by time at various levels; you can fine-tune that. With some statistical modelling, using the gamma distribution and other statistical machinery, the distances between the sketches and the reference parameters are computed. If the distance is over a threshold, it's called an anomaly. Then you repeat with a different hash and intersect the results, and what's left is what's interesting.
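The steps above can be sketched in miniature. This is a much-simplified stand-in for the real pipeline: keys (source IPs or query names) are hashed into a small number of buckets, bucket counts in a time window are compared against a reference window, and the whole thing is repeated with a different hash so that only keys flagged both times survive the intersection. The real method models the buckets with a gamma distribution; a plain count ratio stands in for that here, and all names and thresholds are invented for the example.

```python
import hashlib

N_BUCKETS = 8

def bucket(key, salt):
    """Assign a key to a sketch bucket via a salted hash."""
    h = hashlib.sha256((salt + key).encode()).hexdigest()
    return int(h, 16) % N_BUCKETS

def flagged_keys(window, reference, salt, threshold=4.0):
    """Keys that land in buckets whose count exceeds threshold x reference."""
    counts, ref = [0] * N_BUCKETS, [1] * N_BUCKETS   # +1 avoids division by zero
    for k in reference:
        ref[bucket(k, salt)] += 1
    for k in window:
        counts[bucket(k, salt)] += 1
    hot = {b for b in range(N_BUCKETS) if counts[b] / ref[b] > threshold}
    return {k for k in window if bucket(k, salt) in hot}

reference = [f"host{i}.example." for i in range(40)]   # normal traffic
window = reference + ["spike.example."] * 200          # one name floods the window

# Repeat with a different hash (salt) and intersect, as the method describes.
suspects = flagged_keys(window, reference, "salt-a") & \
           flagged_keys(window, reference, "salt-b")
print(suspects)
```

Innocent keys that happen to share a bucket with the flooding key under one hash are usually cleared by the intersection, because they are unlikely to collide with it under the second hash as well; that is the point of repeating with a different hash.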

We implemented two policies. One is SrcIP, where only the source IP is used as a connection identifier, because the destination IP and port show no variability. The second one is QNAME, where the connection identifier is the query name. We will probably implement more policies in the future. So we evaluated 12 hours of traffic from one of our DNS servers. We filtered it to only contain DNS queries, and the evaluation was done on ten-minute time windows with no overlapping. We found some results which can be split into single spikes and repeated spikes, and I will show you the visualisation of those results. This is the AV home page. The scale is logarithmic and the policy is QNAME. This was found in only one of the ten-minute windows and was not repeated in the 12 hours. We don't really know what happened, but there were spikes of around 50 queries per second, which you can't really see from other data.

Then  this is also repeated spikes. This was analysed as the FTP mirror of several distributions. It was found by both policies, and the green bar is a single IP address which created something like 200 queries per second at one point in time. The IP address was also found by the second policy, source IP address.

Then we found something that looked like a spider bot for pictures. This is the first line. Well, it can be seen in the DSC results as well, so we just checked the DSC results, so we know there's something there. This is the start of the spider bot and this is the end, and what can be seen in the DSC is that it's a regular pattern. When we analysed this further, the queries came in alphabetical order. Then there was some Polish IP address that did a name server scan. This could be seen in the DSC. This is the graph when it starts and when it ends. You can see it's quite intensive.

And this is what I'd call the unknown. We didn't know what it was. It was a single IP address in Ukraine, but it was querying for a single A record, and it was a single spike of 200 queries per second. This is more interesting, because it couldn't be seen in the DSC; it's the red thing at the bottom.

There's more to do and we will work on it further. We want to implement more policies and intersect the results between the policies, so we detect less noise; if you need to analyse all the results, then you will probably not do it at all. We also want to implement different methods, not just the statistical one, and test with more data. It could also be interesting to analyse resolver data, because we analysed just the authoritative server and we don't have access to the records of a resolver, so we will have to get some data somewhere.

And we would also like to use the results, so it's not just work for work's sake. We think it could be used by computer security incident response teams, and also for monitoring failures in DNS caches and other things.

Okay, that's it, I have finished. Do you have any questions? If you have any questions about the statistics, probably don't ask me, read the paper. If you want access to the source code or the draft report, send me an email.

AUDIENCE SPEAKER: I just wanted to clarify and understand the method here. You showed these graphs for a few different query names where you were observing spikes. Were those specific query names that you started off looking for, or did the algorithm automatically detect that spikes were occurring?

SPEAKER: The algorithm did it automatically.

CHAIR: Any further questions? Doesn't look like it. Thank you very much, Andrei.

(Applause.)

CHAIR: This brings us to the end of today's session. Don't forget to return tomorrow morning. I would like to thank the presenters of today, my fellow working group chairs, and the NCC staff for AV support, scribing and watching the Jabber channel. Tomorrow morning we'll have, well, I guess another mixture of vendor information: a presentation about NSD 4, some other reports, this time from Japan and from ICANN, and again something about anomalies or DNS behaviour observed in the wild. And not to forget, during the meeting I had people looking at the action items and missing the updates. If you're interested in the action items remaining for this working group, please have a look at the website; they should be updated later this afternoon. A status update was sent to the mailing list, and those of you interested in the working group and not yet subscribed to the mailing list, please do so. Go to the working group website, subscribe to the list and contribute to the discussion.

Thanks.