Preemptive Change Notification for TRRP

One problem with using the DNS to distribute routes is that failover to an alternate Egress Tunnel Router (ETR) can be slow during an outage. Preemptive Change Notification (PCN) changes that dynamic by notifying recent requestors whenever a DNS Route entry changes.

PCN Type 1: Notification of change by the DNS server

The authoritative DNS Route server remembers all IP addresses which requested each Route entry during the entry's TTL. Immediately upon the Route entry's change, the Route Server sends a UDP message to port XX of the requestor. The first byte of the message is 0x01. The remaining bytes are the DNS query which should immediately expire.
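The PCN type 1 layout described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and because the destination port is still unassigned (XX), it is passed in as a parameter rather than guessed.

```python
PCN_TYPE_CHANGE = 0x01  # first byte of a PCN type 1 message


def build_pcn1(dns_query: bytes) -> bytes:
    """Prefix the raw DNS query with the PCN type 1 marker byte."""
    return bytes([PCN_TYPE_CHANGE]) + dns_query


def parse_pcn1(message: bytes) -> bytes:
    """Return the DNS query carried by a PCN type 1 message."""
    if len(message) < 2 or message[0] != PCN_TYPE_CHANGE:
        raise ValueError("not a PCN type 1 message")
    return message[1:]


def send_pcn1(sock, requestor_ip: str, pcn_port: int, dns_query: bytes) -> None:
    """Send the notification over an already-open UDP socket.

    pcn_port is a placeholder for the unassigned port XX.
    """
    sock.sendto(build_pcn1(dns_query), (requestor_ip, pcn_port))
```

The Route server would call send_pcn1 once for every requestor it remembered for the changed entry.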

If the destination resolver happens to be an ITR and the ITR supports PCN, the ITR should expire the specified DNS name and immediately re-request it. It should rate-limit such expire/re-requests to no more than once every 5 seconds for a given DNS name.
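One way an ITR might enforce the 5-second limit is a small per-name timestamp table. The sketch below is an assumption about how that could look, not part of the proposal; the class name and the name normalization (case-folding and stripping a trailing dot) are illustrative choices.

```python
import time

MIN_INTERVAL = 5.0  # seconds between expire/re-request cycles per DNS name


class RerequestLimiter:
    """Tracks when each DNS name was last expired and re-requested."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._last = {}  # normalized DNS name -> time of last accepted cycle

    def allow(self, name: str) -> bool:
        """Return True if the name may be expired and re-requested now."""
        key = name.lower().rstrip(".")
        now = self._clock()
        last = self._last.get(key)
        if last is not None and now - last < MIN_INTERVAL:
            return False  # too soon; ignore this PCN message
        self._last[key] = now
        return True
```

An ITR would call allow() when a PCN type 1 message arrives and silently drop the message whenever it returns False.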

Note that the PCN message does not contain a new map. The ITR must re-request the map entry through the normal DNS lookup process. The ITR can't authenticate this message from the DNS route server, so it must avoid taking any unreasonable action in response.

The DNS query may be a Netmask query. If the ITR does not understand Netmask or Zone Transfer queries, it should ignore the report.

Note that at least some of the destinations for the PCN1 message will be plain old DNS resolvers incapable of telling the associated ITR that a change has occurred. That ITR must rely on the DNS TTL, host unreachable messages and PCN type 2 messages to determine that a change has occurred.

PCN Type 2: Notification of change by the ETR

The smart ETR receives a packet for which it is authoritative, but it does not currently know a route to that destination. This can happen when the ETR is still online but the customer's link to the ETR's ISP is down.

Instead of placing the packet on the wire where it will either fall into another ITR or generate a host unreachable message, the ETR returns it to the ITR. The ETR sends a UDP message to port XX of the ITR. The first byte of the message is 0x02. The remaining bytes are as much of the GRE-encapsulated packet as will fit in 500 bytes.
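A minimal sketch of the PCN type 2 message body follows. It assumes the 500-byte limit applies to the returned packet bytes and not to the whole message including the type byte; the text leaves this open, and the function name is hypothetical.

```python
PCN_TYPE_RETURNED = 0x02   # first byte of a PCN type 2 message
MAX_RETURNED_BYTES = 500   # assumption: limit covers the packet bytes only


def build_pcn2(gre_packet: bytes) -> bytes:
    """Prefix the marker byte and truncate the GRE-encapsulated packet."""
    return bytes([PCN_TYPE_RETURNED]) + gre_packet[:MAX_RETURNED_BYTES]
```

Under this reading, a 1500-byte packet yields a 501-byte UDP payload, while a short packet is returned whole.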

When the ITR receives such a message, it should verify its validity by sending a GRE-encapsulated ICMP echo-request back through the ETR. If the ITR gets the same result (a PCN type 2 message containing the echo-request), it should disqualify that ETR from acting as a gateway to the destination IP address, and expire and relookup the cached routes. This expire/relookup should function like the "host unreachable" condition described for a regular TRRP ITR except that only the route associated with the particular destination IP address should be affected. All other cached routes which use that ETR should remain in full force.
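The verify-then-disqualify behaviour can be sketched as a small state table keyed on the (ETR, destination) pair. Everything here is an assumption made for illustration: the class name, the use of an 8-byte random token as the echo-request payload, and matching the token by substring search in the returned bytes. Timeouts for unanswered probes are left out.

```python
import os


class Pcn2Verifier:
    """Decides how an ITR reacts to PCN type 2 messages."""

    def __init__(self):
        self._pending = {}         # (etr, dest) -> probe token in flight
        self.disqualified = set()  # (etr, dest) pairs no longer usable

    def handle(self, etr: str, dest: str, returned: bytes):
        """Return ("probe", token) or ("disqualify", None)."""
        key = (etr, dest)
        token = self._pending.get(key)
        if token is not None and token in returned:
            # Our own echo-request came back, so the report is confirmed.
            del self._pending[key]
            self.disqualified.add(key)
            return ("disqualify", None)
        if token is None:
            token = os.urandom(8)
            self._pending[key] = token
        # Caller sends a GRE-encapsulated echo-request carrying the token.
        return ("probe", token)

    def usable(self, etr: str, dest: str) -> bool:
        """Other destinations behind the same ETR remain usable."""
        return (etr, dest) not in self.disqualified
```

On "disqualify" the ITR would expire and re-look-up only the route for that destination IP address.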

Note that dumb GRE ETRs will not be capable of generating this message. Only next-generation ETRs can do this.

PCN Type 3: Notification of extended validity by the DNS server

The authoritative DNS Route server remembers all IP addresses which requested each Route entry during the entry's TTL. No less than 2 seconds before the TTL would expire for an entry which was sent, the DNS server sends a UDP message to port XX of the requestor. The first byte of the message is 0x03. The next 4 bytes are an unsigned integer in network byte order which specifies the number of seconds since the route entry was last changed. The remaining bytes are the DNS route entry whose TTL should be refreshed to its maximum value.
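The PCN type 3 layout (one type byte, a 4-byte unsigned integer in network byte order, then the route entry) maps directly onto Python's struct module. The function names below are hypothetical and the route entry is treated as opaque bytes.

```python
import struct

PCN_TYPE_EXTEND = 0x03
_HEADER = struct.Struct("!BI")  # type byte + 32-bit unsigned, network order


def build_pcn3(seconds_since_change: int, route_entry: bytes) -> bytes:
    """Pack the header and append the route entry bytes."""
    return _HEADER.pack(PCN_TYPE_EXTEND, seconds_since_change) + route_entry


def parse_pcn3(message: bytes):
    """Return (seconds_since_change, route_entry) from a type 3 message."""
    if len(message) < _HEADER.size or message[0] != PCN_TYPE_EXTEND:
        raise ValueError("not a PCN type 3 message")
    _, seconds = _HEADER.unpack_from(message)
    return seconds, message[_HEADER.size:]
```

For example, an entry last changed 3600 seconds ago is encoded with the header bytes 03 00 00 0e 10.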

If the destination resolver happens to be an ITR and the ITR supports PCN, the ITR should start the TTL clock over from the originally received TTL time for the specified DNS name.

Note that the PCN message does not contain a new TTL. The ITR should use the TTL it received during the original lookup. The ITR can't authenticate this message from the DNS route server, so it must avoid taking any unreasonable action in response.
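On the ITR side, restarting the TTL clock only requires that the cache remember the TTL from the original lookup. A minimal sketch, with a hypothetical class name and an injectable clock so the behaviour is easy to check:

```python
import time


class CachedRoute:
    """A cached route entry that remembers its originally received TTL."""

    def __init__(self, original_ttl: float, clock=time.monotonic):
        self._clock = clock
        self.original_ttl = original_ttl
        self.expires_at = clock() + original_ttl

    def refresh(self) -> None:
        """Restart the TTL clock after a PCN type 3 message.

        The message carries no TTL, so the original value is reused.
        """
        self.expires_at = self._clock() + self.original_ttl

    def expired(self) -> bool:
        return self._clock() >= self.expires_at
```

An entry with a 30-second TTL that is refreshed at second 28 would then stay valid until second 58.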

If the DNS Route server receives a destination unreachable message in response to an extended validity message, it should cease sending further PCN messages to that destination for 10 minutes and should "forget" about any route entries recently requested by that IP address.
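The server-side bookkeeping for this back-off could look roughly like the following. The class and method names are hypothetical; the sketch only shows the two effects named above, muting the requestor for 10 minutes and forgetting its recently requested entries.

```python
import time

SUPPRESS_SECONDS = 600  # 10 minutes


class RequestorTracker:
    """Remembers which route entries each requestor IP has asked for."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}      # requestor IP -> set of route entry names
        self._muted_until = {}  # requestor IP -> time PCN may resume

    def record_request(self, ip: str, entry: str) -> None:
        self._entries.setdefault(ip, set()).add(entry)

    def on_unreachable(self, ip: str) -> None:
        """Handle a destination unreachable reply to a PCN message."""
        self._entries.pop(ip, None)  # "forget" the recent requests
        self._muted_until[ip] = self._clock() + SUPPRESS_SECONDS

    def may_notify(self, ip: str) -> bool:
        return self._clock() >= self._muted_until.get(ip, float("-inf"))

    def entries_for(self, ip: str) -> set:
        return set(self._entries.get(ip, ()))
```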

The point of PCN3 is to allow the DNS server to keep feeding knowledge to the ITR about short-lived route entries (such as those for a mobile device) that haven't yet changed. Where used, this feed should halve the traffic overhead needed to keep a mapping in the ITR's cache and should likewise reduce the traffic consumed by the DNS route server.

For future improvement:

How can the ITR specify in the DNS query that it supports PCN? If the ITR can declare its support then the DNS server can offer a longer TTL automatically.