<div dir="ltr"><div>Hi Job,</div><div>Thank you for the detailed investigation and the explanation.</div><div>I will make sure to pass it to my colleagues who are on neither of these two mailing lists and also let them know of your availability as suggested.</div><div>Warm regards,</div><div><br></div><div><div><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div>==============================<br>Cedrick Adrien MBEYET<br></div><div>Ebene Cybercity, Mauritius <br></div><div>+230 5851 7674<br><br>+++ Never give up, Keep moving forward +++<br></div></div></div></div></div></div></div></div></div></div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Jan 2, 2023 at 8:42 PM Job Snijders <<a href="mailto:job@fastly.com">job@fastly.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Dear all,<br>
<br>
I took a look at what might have transpired. It appears there was an<br>
internally-inconsistent RRDP publication. Similar to the RSYNC protocol,<br>
the RRDP protocol does not offer any assurances about internal<br>
consistency. In this message I offer a step-by-step explanation and at<br>
the end of the email I theorize on how this could've happened.<br>
<br>
Impact:<br>
=======<br>
<br>
The problem revolves around a 'top level' manifest [1] which contained<br>
references to files which were not yet available via RRDP. The<br>
K1eJenypZMPIt_e92qek2jSpj4A.mft manifest referencing non-existing files<br>
negatively impacted about 77.33% of ROAs subordinate to the Afrinic trust<br>
anchor. Depending on the RRDP refetch timers of a validator, the impact<br>
may have lasted anywhere between 1 and 60 minutes.<br>
<br>
This impacted all RFC-compliant validators; the event was 'timing<br>
dependent' rather than 'implementation dependent': connecting at the<br>
wrong time caused problems.<br>
<br>
Step by step replay:<br>
====================<br>
<br>
A validator fetching Afrinic's RRDP Notification file at<br>
2023-01-01T03:21:51Z might have fetched a notification XML file which<br>
contained a listing of deltas up until serial 58617 (in the RRDP session<br>
ID 11218e02-4ae9-4c95-a8fa-49df27f15272).<br>
<br>
<a href="https://rrdp.afrinic.net/11218e02-4ae9-4c95-a8fa-49df27f15272/58616/snapshot.xml" rel="noreferrer" target="_blank">https://rrdp.afrinic.net/11218e02-4ae9-4c95-a8fa-49df27f15272/58616/snapshot.xml</a><br>
<a href="https://rrdp.afrinic.net/11218e02-4ae9-4c95-a8fa-49df27f15272/58617/delta.xml" rel="noreferrer" target="_blank">https://rrdp.afrinic.net/11218e02-4ae9-4c95-a8fa-49df27f15272/58617/delta.xml</a><br>
<br>
The SHA256 hash of "K1eJenypZMPIt_e92qek2jSpj4A.mft" at serial 58616 was<br>
435c65e0f7bc43eaea3234b3ad08b849735c1899c8e218ff2395d37cad720493, and<br>
the manifestNumber was 13F1.<br>
<br>
At rrdp_serial 58616 / K1eJenypZMPIt_e92qek2jSpj4A.mft / manifestNumber<br>
13F1, the listed SHA256 hash of "vY7ReUeW-s0Fq4qzboGCgYmDQXg.cer" was<br>
1768a7544c15081ddcd358a78b915a7221f3aee6cebb196a743b89a834364ca4. And<br>
indeed, if one downloads the above mentioned "snapshot.xml" file and<br>
unpacks the RRDP XML one will find a file by that name which matches<br>
that digest. The state at RRDP serial 58616 was internally consistent.<br>
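<br>
The digest check described above can be sketched in a few lines of Python. This is<br>
only an illustration, assuming the snapshot layout defined in RFC 8182; the<br>
function name snapshot_digests is made up for this example:<br>
<pre>
```python
import base64
import hashlib
import xml.etree.ElementTree as ET

# RRDP XML namespace as defined in RFC 8182
RRDP_NS = "{http://www.ripe.net/rpki/rrdp}"


def snapshot_digests(snapshot_xml: bytes) -> dict:
    """Map each published URI in an RRDP snapshot to the SHA-256 of its object."""
    root = ET.fromstring(snapshot_xml)
    digests = {}
    for elem in root.iter(RRDP_NS + "publish"):
        # The element text is the Base64-encoded object; b64decode ignores
        # the surrounding whitespace and line breaks by default.
        raw = base64.b64decode(elem.text or "")
        digests[elem.get("uri")] = hashlib.sha256(raw).hexdigest()
    return digests
```
</pre>
Feeding the 58616 "snapshot.xml" to this function and looking up the rsync URI of<br>
"vY7ReUeW-s0Fq4qzboGCgYmDQXg.cer" should return the digest listed on manifest 13F1.<br>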
<br>
Now, let's unpack the RRDP Delta which would bring the RRDP session to<br>
serial 58617. The delta file contains 4 &lt;publish/&gt; elements:<br>
<br>
  58617 <a href="http://rpki.afrinic.net/repository/afrinic/K1eJenypZMPIt_e92qek2jSpj4A.crl" rel="noreferrer" target="_blank">rpki.afrinic.net/repository/afrinic/K1eJenypZMPIt_e92qek2jSpj4A.crl</a> (a4f73c2009f4095970f0f7cb4bb938eb03ff71e35925cd8bca39a64330f935c1 replaces 502d94adf603c4451a912828dfe9d7a46ebf45ec20f901381618fc71323da927)<br>
<br>
  58617 <a href="http://rpki.afrinic.net/repository/afrinic/K1eJenypZMPIt_e92qek2jSpj4A.mft" rel="noreferrer" target="_blank">rpki.afrinic.net/repository/afrinic/K1eJenypZMPIt_e92qek2jSpj4A.mft</a> (e745ccf5741fbe65c2e2b78a74ba3be4a82c9fd5330544e16332e725861f66e5 replaces 435c65e0f7bc43eaea3234b3ad08b849735c1899c8e218ff2395d37cad720493)<br>
<br>
  58617 <a href="http://rpki.afrinic.net/repository/member_repository/F36D8ADD/99DB6EFC6AC711EBB90AF548F8AEA228/JrOnWLLY0r61xvaBylvZJYx593c.crl" rel="noreferrer" target="_blank">rpki.afrinic.net/repository/member_repository/F36D8ADD/99DB6EFC6AC711EBB90AF548F8AEA228/JrOnWLLY0r61xvaBylvZJYx593c.crl</a> (331a8991ca11ccd9bbf30e89e8e35d3b6ee0a18c23cca1289dfcf07bdee3d05f replaces 5a7399b06a692dd76e3b94fa52112c12f483db1499e5c899ff27b57952e48635)<br>
<br>
  58617 <a href="http://rpki.afrinic.net/repository/member_repository/F36D8ADD/99DB6EFC6AC711EBB90AF548F8AEA228/JrOnWLLY0r61xvaBylvZJYx593c.mft" rel="noreferrer" target="_blank">rpki.afrinic.net/repository/member_repository/F36D8ADD/99DB6EFC6AC711EBB90AF548F8AEA228/JrOnWLLY0r61xvaBylvZJYx593c.mft</a> (cf22f16de6695f8509a6590f710778cc61a1bbdf1c11ae150dcfff1910032cae replaces 29219ecb0f79922d6f1e5d4b3d4305333d32f33720cf13ae17d84dd2fcdf2ff0)<br>
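<br>
A listing like the one above can be derived from the publish elements of a delta<br>
file. The following is a rough sketch, assuming the RFC 8182 delta layout in which<br>
the "hash" attribute names the object being replaced; delta_publishes is a<br>
hypothetical helper name:<br>
<pre>
```python
import base64
import hashlib
import xml.etree.ElementTree as ET

# RRDP XML namespace as defined in RFC 8182
RRDP_NS = "{http://www.ripe.net/rpki/rrdp}"


def delta_publishes(delta_xml: bytes) -> list:
    """Return (uri, new_sha256, replaced_sha256) for each publish element."""
    root = ET.fromstring(delta_xml)
    changes = []
    for elem in root.iter(RRDP_NS + "publish"):
        raw = base64.b64decode(elem.text or "")
        new_hash = hashlib.sha256(raw).hexdigest()
        # "hash" is only present when the publish overwrites an existing
        # object; for a brand-new object elem.get() returns None.
        changes.append((elem.get("uri"), new_hash, elem.get("hash")))
    return changes
```
</pre>
Withdraw elements are deliberately ignored here, as serial 58617 only contained<br>
publish elements.<br>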
<br>
Let's focus on K1eJenypZMPIt_e92qek2jSpj4A.mft. The eContent of the<br>
manifest files whose SHA256 digests are<br>
435c65e0f7bc43eaea3234b3ad08b849735c1899c8e218ff2395d37cad720493 and<br>
e745ccf5741fbe65c2e2b78a74ba3be4a82c9fd5330544e16332e725861f66e5 decodes<br>
as follows:<br>
<br>
K1eJenypZMPIt_e92qek2jSpj4A.mft @ 13F1: <a href="https://sobornost.net/~job/manifest-13F1.txt" rel="noreferrer" target="_blank">https://sobornost.net/~job/manifest-13F1.txt</a><br>
K1eJenypZMPIt_e92qek2jSpj4A.mft @ 13F2: <a href="https://sobornost.net/~job/manifest-13F2.txt" rel="noreferrer" target="_blank">https://sobornost.net/~job/manifest-13F2.txt</a><br>
<br>
Thus, we conclude:<br>
  RRDP serial 58616 contained a manifest with number 13F1<br>
  RRDP serial 58617 contained a manifest with number 13F2<br>
<br>
Both 13F1 and 13F2 are signed by the proper keys, but manifestNumber<br>
13F2 is higher than 13F1; thus 13F2 is the manifest that must be used.<br>
<br>
Manifest 13F2 references a new version of "vY7ReUeW-s0Fq4qzboGCgYmDQXg.cer"<br>
by hash 8aa55347427b75faa64fdfd212ca013957f785e18ce887bbe56d0ae20552e66c,<br>
however, at RRDP serial 58617 the delta XML does *NOT* contain any new<br>
version of "vY7ReUeW-s0Fq4qzboGCgYmDQXg.cer"!<br>
<br>
In fact, an update for "vY7ReUeW-s0Fq4qzboGCgYmDQXg.cer" only became<br>
visible at a later point in time: at RRDP serial 58618. Looking at<br>
<a href="https://rrdp.afrinic.net/11218e02-4ae9-4c95-a8fa-49df27f15272/58618/delta.xml" rel="noreferrer" target="_blank">https://rrdp.afrinic.net/11218e02-4ae9-4c95-a8fa-49df27f15272/58618/delta.xml</a><br>
we finally see a version of "vY7ReUeW-s0Fq4qzboGCgYmDQXg.cer" which<br>
matches the hash on the manifest that was published at serial 58617.<br>
<br>
In other words, AFRINIC published an RRDP delta (and snapshot) which were<br>
cryptographically valid, but internally inconsistent.<br>
<br>
Researchers can see this themselves if they analyse:<br>
<a href="https://rrdp.afrinic.net/11218e02-4ae9-4c95-a8fa-49df27f15272/58617/snapshot.xml" rel="noreferrer" target="_blank">https://rrdp.afrinic.net/11218e02-4ae9-4c95-a8fa-49df27f15272/58617/snapshot.xml</a><br>
The version of "K1eJenypZMPIt_e92qek2jSpj4A.mft" inside the 58617<br>
snapshot points to "vY7ReUeW-s0Fq4qzboGCgYmDQXg.cer" expecting a file<br>
with sha256 message digest <br>
8aa55347427b75faa64fdfd212ca013957f785e18ce887bbe56d0ae20552e66c<br>
but the hash of "vY7ReUeW-s0Fq4qzboGCgYmDQXg.cer" actually is<br>
1768a7544c15081ddcd358a78b915a7221f3aee6cebb196a743b89a834364ca4<br>
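<br>
This mismatch boils down to a simple comparison. The sketch below assumes the<br>
manifest's fileList has already been decoded into a name-to-digest mapping (for<br>
example with a tool such as rpki-client); it does not parse the CMS object itself,<br>
and manifest_mismatches is a hypothetical name:<br>
<pre>
```python
def manifest_mismatches(file_list: dict, published: dict) -> dict:
    """Compare a manifest fileList against the objects actually published.

    Both arguments map file names to hex SHA-256 digests. The result maps
    each missing or differing name to (expected_digest, actual_digest),
    where actual_digest is None if the object is not published at all.
    """
    problems = {}
    for name, expected in file_list.items():
        actual = published.get(name)
        if actual is None or actual.lower() != expected.lower():
            problems[name] = (expected, actual)
    return problems


# Digests quoted in this message for RRDP serial 58617
CER = "vY7ReUeW-s0Fq4qzboGCgYmDQXg.cer"
file_list = {
    CER: "8aa55347427b75faa64fdfd212ca013957f785e18ce887bbe56d0ae20552e66c",
}
published = {
    CER: "1768a7544c15081ddcd358a78b915a7221f3aee6cebb196a743b89a834364ca4",
}
problems = manifest_mismatches(file_list, published)
```
</pre>
With the two digests quoted above, the .cer entry is reported as a mismatch; an<br>
empty result would indicate that the manifest and the published objects agree.<br>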
<br>
As per RFC 9286, the above scenario is considered a "publisher error"<br>
or a "substitution attack" (RPs can't know the difference between<br>
publisher errors and attacks); the RP is expected to proceed with the<br>
process described in Section 6.6 of RFC 9286.<br>
<br>
serial 58616 was good<br>
serial 58617 was bad<br>
serial 58618 was good<br>
<br>
While the issue was 'rectified' in the next publication, any clients<br>
that latched on to 58617 might take between 1 and 60 minutes to return<br>
for new data; completely unaware that the contents of the 58617 update<br>
were cryptographically valid, but logically mostly broken.<br>
<br>
How can this happen?<br>
====================<br>
<br>
This type of internal inconsistency could arise from deployment<br>
scenarios in which the RRDP XML files are synthesized from a bare<br>
directory on the filesystem - without additional context about internal<br>
consistency (e.g. when exactly the Signer software has written a<br>
coherent state to the filesystem, and it is safe to transform the files<br>
into RRDP).<br>
<br>
Software like <a href="https://github.com/NLnetLabs/rrdpit" rel="noreferrer" target="_blank">https://github.com/NLnetLabs/rrdpit</a> inherently is unaware<br>
whether the Signer software has finished writing to the filesystem (or<br>
still is 'half way' in the writing process). This means that a tool like<br>
"rrdpit" MUST only be invoked when the signer software is completely<br>
finished.<br>
<br>
Generating RRDP XML files while the Signer software still is 'half way'<br>
done writing can result in accidentally smearing out what should've<br>
been the contents of a single RRDP XML Delta file, across multiple RRDP<br>
delta files.<br>
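<br>
One way to give the conversion step a coherent view is an atomic hand-off between<br>
the signer and the RRDP generator. The sketch below is purely hypothetical and is<br>
not a feature of "rrdpit" or of any signer: it assumes the signer writes each<br>
complete state into its own directory and that a POSIX filesystem is used. The<br>
names publish_generation and resolve_generation are invented for this example:<br>
<pre>
```python
import os


def publish_generation(generation_dir: str, current_link: str) -> None:
    """Atomically repoint current_link at a fully written generation_dir."""
    tmp_link = current_link + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)  # clear a leftover from an interrupted run
    os.symlink(os.path.abspath(generation_dir), tmp_link)
    # rename(2) swaps the link in a single step: a reader sees either the
    # old or the new generation, never a mixture (POSIX semantics assumed).
    os.replace(tmp_link, current_link)


def resolve_generation(current_link: str) -> str:
    """Resolve the link once, so that one RRDP run reads one generation."""
    return os.path.realpath(current_link)
```
</pre>
In such a setup the signer would call publish_generation only after it has<br>
finished writing, and the conversion tool would be pointed at the directory<br>
returned by resolve_generation rather than at a directory that is still changing.<br>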
<br>
Why am I suspecting that a tool like "rrdpit" is used?<br>
======================================================<br>
<br>
The AfriNIC RRDP snapshots contain unexpected files, such as<br>
"rsync://<a href="http://rpki.afrinic.net/repository/AfriNIC-simple.tal" rel="noreferrer" target="_blank">rpki.afrinic.net/repository/AfriNIC-simple.tal</a>"; the signer<br>
implementations I am aware of would not include .tal files in the RRDP<br>
feed. This leads me to believe that a<br>
non-atomic/fragile-to-inconsistency process is used to convert a<br>
(rsync?) directory to RRDP files.<br>
<br>
Is "rrdpit" bad?<br>
================<br>
<br>
No. It is a very useful utility (I myself have used it in various lab<br>
tests), but needs to be handled with care: the utility is not aware of<br>
internal inconsistencies and cannot compensate for internal<br>
inconsistencies. The "rrdpit" utility is not appropriate for all<br>
deployment scenarios: it probably is best to use the native RRDP<br>
functionality of a Signer!<br>
<br>
How to avoid this?<br>
==================<br>
<br>
If AFRINIC is using the "<a href="http://rpki.net" rel="noreferrer" target="_blank">rpki.net</a>" (or a derivative) signer software,<br>
they might benefit most from using the embedded RRDP functionality of the<br>
"<a href="http://rpki.net" rel="noreferrer" target="_blank">rpki.net</a>" software stack.<br>
<br>
If AfriNIC does not want to expose a webserver on the signer machine<br>
itself, they can simply rsync the ready-made RRDP XML files (produced by<br>
"<a href="http://rpki.net" rel="noreferrer" target="_blank">rpki.net</a>") to a webserver (this approach contrasts with rsyncing the<br>
rsync files and using "rrdpit" - or equivalent tooling).<br>
<br>
Conclusion<br>
==========<br>
<br>
For a brief period of time AFRINIC published a set of RRDP files that<br>
led to an inconsistent state, resulting in the temporary loss of 77% of<br>
ROAs.<br>
<br>
I don't know the internals of AFRINIC's setup, so the above could all<br>
be a fitting - but wrong - theory. I am speculating with the public<br>
information available to me.<br>
<br>
I'm available for any questions, or to advise on this matter and review<br>
the current process workflow.<br>
<br>
Kind regards,<br>
<br>
Job<br>
<br>
[1]: <a href="https://console.rpki-client.org/rpki.afrinic.net/repository/afrinic/K1eJenypZMPIt_e92qek2jSpj4A.mft.html" rel="noreferrer" target="_blank">https://console.rpki-client.org/rpki.afrinic.net/repository/afrinic/K1eJenypZMPIt_e92qek2jSpj4A.mft.html</a><br>
<br>
On Sat, Dec 31, 2022 at 07:40:54PM -0800, Randy Bush wrote:<br>
> From: PacketVis <<a href="mailto:notifications@packetvis.com" target="_blank">notifications@packetvis.com</a>><br>
> Subject: bgp ta-malfunction - low severity - PacketVis<br>
> <br>
> Possible TA malfunction: 77.33% of the ROAs disappeared from AFRINIC.<br>
> <br>
> See more details about the event:<br>
> <a href="https://packetvis.com/#/bgp/event/2a35a5824772ae3b651293ec5d9b6367-37572a3c-b445-4075-9741-a419b516ca36/6d742c0ae811df9c41ab427a8ac09e07a93388c7" rel="noreferrer" target="_blank">https://packetvis.com/#/bgp/event/2a35a5824772ae3b651293ec5d9b6367-37572a3c-b445-4075-9741-a419b516ca36/6d742c0ae811df9c41ab427a8ac09e07a93388c7</a><br>
> <br>
> _______________________________________________<br>
> Sidrops mailing list<br>
> <a href="mailto:Sidrops@ietf.org" target="_blank">Sidrops@ietf.org</a><br>
> <a href="https://www.ietf.org/mailman/listinfo/sidrops" rel="noreferrer" target="_blank">https://www.ietf.org/mailman/listinfo/sidrops</a><br>
<br>
_______________________________________________<br>
afnog mailing list<br>
<a href="https://www.afnog.org/mailman/listinfo/afnog" rel="noreferrer" target="_blank">https://www.afnog.org/mailman/listinfo/afnog</a><br>
</blockquote></div>