
Want to try Warp? We just enabled the beta for you


Tomorrow is Thanksgiving in the United States. It’s a holiday for getting together with family, characterized by turkey dinner and whatever it is that happens in American football. While celebrating with family is great, if you use a computer for your main line of work, sometimes the conversation turns to how to set up the home Wi-Fi or whether Russia can really use Facebook to hack the US election. Just in case you’re a geek who finds yourself in that position this week, we wanted to give you something to play with. To that end, we’re opening the Warp beta to all Cloudflare users. Feel free to tell your family there’s been an important technical development you need to attend to immediately and enjoy!

Hello Warp! Getting Started

Warp allows you to expose a locally running web server to the internet without having to open up ports in the firewall or even needing a public IP address. Warp connects a web server directly to the Cloudflare network where Cloudflare acts as your web server’s network gateway. Every request reaching your origin must travel to the Cloudflare network where you can apply rate limits, access policies and authentication before the request hits your origin. Plus, because your origin is never exposed directly to the internet, attackers can’t bypass protections to reach your origin.

Warp is really easy to get started with. If you use homebrew (we also have packages for Linux and Windows) you can do:

$ brew install cloudflare/cloudflare/warp
$ cloudflare-warp login
$ cloudflare-warp --hostname warp.example.com --hello-world

In this example, replace example.com with the domain you selected when you ran the login command. The warp.example.com subdomain doesn’t need to exist yet in DNS; Warp will automatically add it for you.

That last command spins up a web server on your machine serving the hello warp world webpage. Then Warp starts up an encrypted virtual tunnel from that web server to the Cloudflare edge. When you visit warp.example.com (or whatever domain you chose), your request first hits a Cloudflare data center, then is routed back to your locally running hello world web server on your machine.
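Once the tunnel is up, you can sanity-check it from any other machine with an ordinary HTTP request (substitute whatever hostname you chose for warp.example.com):

$ curl -I https://warp.example.com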

If someone far away visits warp.example.com, they connect to the Cloudflare data center closest to them, and then are routed to the Cloudflare data center your Warp instance is connected to, and then over the Warp tunnel back to your web server. If you want to make that connection between Cloudflare data centers really fast, enable Argo, which bypasses internet latencies and network congestion on optimized routes linking the Cloudflare data centers.

To point Warp at a real web server you are running instead of the hello world web server, replace the --hello-world flag with the location of your locally running server:

$ cloudflare-warp --hostname warp.example.com http://localhost:8080

Using Warp for Load Balancing

Let’s say you have multiple instances of your application running and you want to balance load between them or always route to the closest one for any given visitor. As you spin up Warp, you can register the origins behind Warp with a load balancer. For example, I can run this on two different servers (e.g. one on a container in ECS and one on a container in GKE):

$ cloudflare-warp --hostname warp.example.com --lb-pool origin-pool-1 http://localhost:8080

And connections to warp.example.com will be routed seamlessly between the two servers. You can do this with an existing origin pool or a brand new one. If you visit the load balancing dashboard you will see the new pool created with your origins in it, or the origins added to an existing pool.

You can also set up a health check so that if one origin goes offline, it automatically gets deregistered from the load balancer pool and requests are only routed to origins that are still online.
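Health checks are configured as monitors, either from the Load Balancing dashboard or via the Cloudflare API. As a rough sketch of what the API call can look like - the endpoint, field names and values below are written from memory and may differ from the current API, so treat them as assumptions and check the Load Balancing API documentation:

$ curl -X POST "https://api.cloudflare.com/client/v4/user/load_balancers/monitors" \
    -H "X-Auth-Email: you@example.com" \
    -H "X-Auth-Key: YOUR_API_KEY" \
    -H "Content-Type: application/json" \
    --data '{"type":"http","method":"GET","path":"/healthz","expected_codes":"200","interval":60,"timeout":5,"retries":2,"description":"origin-pool-1 health check"}'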

Automating Warp with Docker

You can add Warp to your Dockerfile so that as containers spin up or as you autoscale, containers automatically register themselves with Warp to connect to Cloudflare. This acts as a kind of service discovery.

A reference Dockerfile is available here.
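If you just want a feel for its shape, here is a minimal, hypothetical sketch - the base image, package filename and the idea of baking the hostname into the image are all assumptions; the linked reference Dockerfile is the authoritative version:

# Hypothetical sketch only - see the reference Dockerfile linked above.
FROM debian:stretch-slim

# Install a locally downloaded Warp package (filename is an assumption; use the package for your platform).
COPY cloudflare-warp-stable-linux-amd64.deb /tmp/
RUN dpkg -i /tmp/cloudflare-warp-stable-linux-amd64.deb && rm /tmp/cloudflare-warp-stable-linux-amd64.deb

# The certificate created by `cloudflare-warp login` must be made available to the container,
# for example by mounting it under /etc/cloudflare-warp/.

# On start-up, register this container with the origin pool and point Warp at wherever the app listens.
CMD ["cloudflare-warp", "--hostname", "warp.example.com", "--lb-pool", "origin-pool-1", "http://localhost:8080"]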

Requiring User Authentication

If you use Warp to expose dashboards, staging sites and other internal tools that you don’t want to be available to everyone, we have a new product in beta that allows you to quickly put up a login page in front of your Warp tunnel.

To get started, go to the Access tab in the Cloudflare dashboard.

There you can define which users should be able to log in to use your applications. For example, if I wanted to limit access to warp.example.com to just people who work at Cloudflare, I could create a policy that only allows Cloudflare employees to log in.

Enjoy!

Enjoy the Warp beta! (But don't wander too deep into the Warp tunnel and forget to enjoy time with your family.) The whole Warp team is following this thread for comments, ideas, feedback and show and tell. We’re excited to see what you build.


The New DDoS Landscape


News outlets and blogs will frequently compare DDoS attacks by the volume of traffic that a victim receives. Surely this makes some sense, right? The greater the volume of traffic a victim receives, the harder the attack is to mitigate - right?

At least, this is how things used to work. An attacker would gain capacity and then use that capacity to launch an attack. With enough capacity, an attack would overwhelm the victim's network hardware with junk traffic such that it could no longer serve legitimate requests. If your web traffic is served by a server with a 100 Gbps port and someone sends you 200 Gbps, your network will be saturated and the website will be unavailable.

Recently, this dynamic has shifted as attackers have gotten far more sophisticated. The practical realities of the modern Internet have increased the amount of effort required to clog up the network capacity of a DDoS victim - attackers have noticed this and are now choosing to perform attacks higher up the network stack.

In recent months, Cloudflare has seen a dramatic reduction in simple attempts to flood our network with junk traffic. Whilst we continue to see large network-level attacks, in excess of 300 and 400 Gbps, network-level attacks in general have become far less common (the largest recent attack was just over 0.5 Tbps). This has been especially true since the end of September, when we made official a policy of never removing a customer from our network merely for receiving a DDoS attack that's too big, including customers on our free plan.

Far from attackers simply closing shop, we see a trend whereby attackers are moving to more advanced application-layer attack strategies. This trend is not only seen in metrics from our automated attack mitigation systems, but has also been the experience of our frontline customer support engineers. Whilst we continue to see very large network level attacks, note that they are occurring less frequently since the introduction of Unmetered Mitigation:

[Chart: frequency of large network-level (SYN flood) attacks against the Cloudflare network over time]

To understand how the landscape has made such a dramatic shift, we must first understand how DDoS attacks are performed.

Performing a DDoS

The first thing you need before you can carry out a DDoS Attack is capacity. You need the network resources to be able to overwhelm your victim.

To build up capacity, attackers have a few mechanisms at their disposal; three such examples are Botnets, IoT Devices and DNS Amplification:

Botnets

Computer viruses are deployed for multiple reasons: for example, they can harvest private information from users, or blackmail users into paying money to get their precious files back. Another use of computer viruses is building capacity to perform DDoS attacks.

A Botnet is a network of infected computers that are centrally controlled by an attacker; these zombie computers can then be used to send spam emails or perform DDoS attacks.

Consumers have access to faster Internet than ever before. In November 2016, the average UK broadband upload speed reached 4.3 Mbps - this means a botnet which has infected a little under 2,400 computers can launch an attack of around 10 Gbps (roughly 2,400 × 4.3 Mbps ≈ 10 Gbps). Such capacity is more than enough to saturate the end networks that power most websites online.

On August 17th, 2017, multiple networks online were subject to significant attacks from a botnet known as WireX. Researchers from Akamai, Cloudflare, Flashpoint, Google, Oracle Dyn, RiskIQ, Team Cymru and other organisations cooperated to combat this botnet - eventually leading to hundreds of Android apps being removed and a process being started to remove the malware-ridden apps from all devices.

IoT Devices

More and more of our everyday appliances are being embedded with Internet connectivity. Like other types of technology, they can be taken over with malware and controlled to launch large-scale DDoS attacks.

Towards the end of last year, we began to see Internet-connected cameras start to launch large DDoS attacks. Video cameras were advantageous to attackers because they need to be connected to networks with enough bandwidth to stream video.

Mirai was one such botnet, which targeted Internet-connected cameras and Internet routers. It would start by logging into the web dashboard of a device using a table of 60 default usernames and passwords, then install malware on the device.

Where users set their own passwords instead of leaving the default, other pieces of malware can use Dictionary Attacks to repeatedly guess simple user-configured passwords, using a list of common passwords like the one shown below. I have self-censored some of the passwords; apparently users can be in a fairly angry state of mind when setting them:

[Image: list of common passwords used in dictionary attacks, partially censored]

Passwords aside, back in May, I blogged specifically about some examples of security risks we are starting to see which are specific to IoT devices: IoT Security Anti-Patterns.

DNS Amplification

DNS is the phonebook of the Internet; in order to reach this site, your local computer used DNS to look up which IP address would serve traffic for blog.cloudflare.com. I can perform this DNS query from my command line using dig A blog.cloudflare.com:

[Screenshot: output of dig A blog.cloudflare.com]

Firstly, notice that the response is pretty big - certainly bigger than the question we asked.

DNS is built on a transport protocol called UDP. When using UDP it's easy to forge the source of a query, as UDP doesn't require a handshake before a response is sent.

Due to these two factors, someone is able to make a DNS query on behalf of someone else: a relatively small DNS query can result in a much larger response being sent somewhere else.
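You can see the imbalance for yourself with dig: the statistics footer it prints includes a ";; MSG SIZE rcvd:" line showing how many bytes came back for a query that was only a few dozen bytes long. Exact sizes will vary, and record types that return a lot of data (ANY, TXT, DNSKEY) inflate the response further, which is one reason many resolvers now restrict ANY queries:

$ dig A blog.cloudflare.com +stats
$ dig TXT cloudflare.com +stats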

There are open DNS resolvers online that will take a request from anyone and send the response to whatever address the query claims to come from. In most cases, DNS resolvers should not be exposed openly to the internet (most are open due to configuration mistakes). When intentionally exposing DNS resolvers online, security steps should be taken - StrongArm have a primer on securing open DNS resolvers on their blog.

Let's use a hypothetical to illustrate this point. Imagine that you wrote to a mail order retailer requesting a catalogue (if you still know what one is). You'd send a relatively short postcard with your contact information and your request - you'd then get back quite a big catalogue. Now imagine you did the same, but sent hundreds of these postcards and instead included the address of someone else. Assuming the retailer was obliging in sending such a vast number of catalogues, your friend could wake up one day with their front door blocked by catalogues.

In 2013, we blogged about how one DNS amplification attack we faced almost broke the internet. However, aside from one exceptionally large attack in recent times (occurring just after we launched Unmetered DDoS Mitigation), DNS amplification attacks have generally made up a low proportion of the attacks we see:

[Chart: DNS amplification attacks as a proportion of attacks seen over time]

Whilst we're seeing fewer of these attacks, you can find a more detailed overview on our learning centre: DNS Amplification Attack

DDoS Mitigation: The Old Priorities

Using the capacity an attacker has built up, they can send junk traffic to a web property. This is referred to as a Layer 3/4 attack. This kind of attack primarily seeks to saturate the network capacity of the victim.

Above all, mitigating these attacks requires capacity. If you get an attack of 600 Gbps and you only have 10 Gbps of capacity you either need to pay an intermediary network to filter traffic for you or have your network go offline due to the force of the attack.

As a network, Cloudflare works by passing a customer's traffic through our network; in doing so, we are able to apply performance optimisations and security filtering to the traffic we see. One such security filter is removing junk traffic associated with Layer 3/4 DDoS attacks.

Cloudflare's network was built when large-scale DDoS attacks were becoming a reality. Huge network capacity, spread out over the world in many different data centres, makes it easier to absorb large attacks. We currently have over 15 Tbps of capacity, and this figure is growing fast.


Preventing DDoS attacks needs a little more sophistication than just capacity though. While traditional Content Delivery Networks are built using Unicast technology, Cloudflare's network is built using an Anycast design.

In essence, this means that network traffic is routed to the nearest available Point-of-Presence and it is not possible for an attacker to override this routing behaviour - the routing is effectively performed using BGP, the routing protocol of the Internet.

Unicast networks will frequently use technology like DNS to steer traffic to nearby data centres. This routing can easily be overridden by an attacker, allowing them to force attack traffic to a single data centre. This is not possible with Cloudflare's Anycast network, meaning we maintain full control of how our traffic is routed (provided intermediary ISPs respect our routes). With this network design, we have the ability to rapidly update routing decisions, even against ISPs which ordinarily do not respect cache expiration times (TTLs) for DNS records.

Cloudflare's network also maintains an Open Peering Policy; we are open to interconnecting our network with any other network without cost. This means we tend to eliminate intermediary networks between other networks and our own. When we are under attack, we usually have a very short network path from the attacker to us - meaning there are no intermediary networks to suffer collateral damage.

The New Landscape

I started this blog post with a chart which demonstrates the frequency of a type of network-layer attack known as a SYN Flood against the Cloudflare network. You'll notice how the largest attacks are further spaced out over the past few months:

[Chart: SYN flood attack frequency against the Cloudflare network]

This trend does not hold when compared to a graph of Application Layer DDoS attacks, which we continue to see coming in:

[Chart: Application Layer DDoS attacks over time]

The chart above has an important caveat: an Application Layer attack is only counted as such when we determine that the traffic is an attack. Application Layer (Layer 7) attacks are far harder to distinguish from real traffic than Layer 3/4 attacks; they effectively resemble normal web requests rather than junk traffic.

Attackers can order their Botnets to perform attacks against websites using "Headless Browsers" which have no user interface. Such Headless Browsers work exactly like normal browsers, except that they are controlled programmatically instead of being controlled via a window on a user's screen.

Botnets can use Headless Browsers to make HTTP requests that load and behave just like ordinary web requests. Because this can be done programmatically, attackers can order their bots to repeat these HTTP requests rapidly - consuming the entire capacity of a website and taking it offline for ordinary visitors.

This is a non-trivial problem to solve. At Cloudflare, we have specific services like Gatebot which identify DDoS attacks by picking up on anomalies in network traffic. We have tooling like "I'm Under Attack Mode" to analyse traffic to ensure the visitor is human. This is, however, only part of the story.

A $20/month server running a resource intense e-commerce platform may not be able to cope with any more than a dozen concurrent HTTP requests before being unable to serve any more traffic.

An attack which can take down a small e-commerce site will likely not even be a drop in the ocean for Cloudflare's network, which sees around 10% of Internet requests online.

The chart below outlines DDoS attacks per day against Cloudflare customers, but it is important to bear in mind that it only includes what we define as an attack. In recent times, Cloudflare has built specific products to help customers define what they think an attack looks like and how much traffic they feel they should cope with.

[Chart: DDoS attacks per day against Cloudflare customers]

A Web Developer's Guide to Defeating an Application Layer DDoS Attack

One of the reasons why Application Layer DDoS attacks are so attractive is the uneven balance between the computational overhead required for someone to request a web page and the computational cost of serving one. Serving a dynamic website requires all kinds of operations: fetching information from a database, firing off API requests to separate services, rendering a page, writing log lines and potentially even pushing data onto a message queue.

Fundamentally, there are two ways of dealing with this problem:

  • making the balance between requester and server less asymmetric, by making it easier to serve web requests

  • limiting requests which are so excessive that they are blatantly abusive

It remains critical that you have a high-capacity DDoS mitigation network in front of your web application; one of the reasons why Application Layer attacks are increasingly attractive to attackers is that networks have become good at mitigating volumetric attacks at the network layer.

Cloudflare has found that when performing Application Layer attacks, attackers will sometimes pick cryptographic ciphers that are the hardest for servers to compute and are not usually accelerated in hardware. In other words, attackers will try to consume more of your server's resources by using the fact that you offer encrypted HTTPS connections against you. Having a proxy in front of your web traffic has the added benefit that the proxy establishes a brand new secure connection to your origin web server - effectively meaning you don't have to worry about Presentation Layer attacks.

Additionally, offloading services which don't have custom application logic (like DNS) to managed providers can help ensure you have less surface area to worry about at the Application Layer.

Aggressive Caching

One of the ways to make it easier to serve web requests is to use some form of caching. There are multiple forms of caching; however, here I'm going to be talking about how you enable caching for HTTP requests.

Suppose you're using a CMS (Content Management System) to update your blog; the vast majority of visitors will see a page identical to the one served to every other visitor. It is only when a visitor logs in or leaves a comment that they will see a page that's dynamic and unlike every other page that's been rendered.

Despite the vast majority of HTTP requests to specific URLs being identical, your CMS has to regenerate the page for every single request as if it were brand new. Application Layer DDoS attacks exploit this as a form of amplification to make their attacks more brutal.

Caching proxies like NGINX and services like Cloudflare allow you to specify that until a user has a browser cookie that de-anonymises them, content can be served from cache. Alongside performance benefits, these configuration changes can prevent the most crude Application Layer DDoS Attacks.
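As an illustration of that idea, here is a minimal NGINX sketch - the cookie name, cache path, sizes and timeouts are assumptions you would adapt to your own CMS. Requests carrying a session cookie bypass the cache and are never stored; everything else can be served from cache:

# Sketch only: adapt the cookie name, cache sizing and validity to your application.
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=anon_cache:10m max_size=1g inactive=10m;

server {
    listen 80;

    location / {
        proxy_cache anon_cache;
        proxy_cache_key "$scheme$host$request_uri";
        proxy_cache_valid 200 301 302 5m;

        # Visitors with a (hypothetical) "session" cookie skip the cache entirely.
        proxy_cache_bypass $cookie_session;
        proxy_no_cache $cookie_session;

        proxy_pass http://127.0.0.1:8080;
    }
}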

For further information on this, you can consult the NGINX guide to caching, or alternatively see my blog post on caching anonymous page views.

Rate Limiting

Caching isn't enough; HTTP requests that change state, like POST, PUT and DELETE, are not safe to cache - so making these requests can bypass the caching efforts used to prevent Application Layer DDoS attacks. Additionally, attackers can attempt to vary URLs to bypass advanced caching behaviour.

Software exists for web servers to be able to perform rate limiting before anything hits dynamic logic; examples of such tools include Fail2Ban and Apache mod_ratelimit.
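As one further illustration of server-side rate limiting (using NGINX's limit_req module rather than the tools named above), the sketch below rejects clients that exceed a per-IP request rate before the request ever reaches dynamic logic - the rate, burst and zone size are placeholder values:

# Sketch only: rate, burst and zone size are placeholders to tune for your application.
limit_req_zone $binary_remote_addr zone=per_ip:10m rate=10r/s;

server {
    listen 80;

    location / {
        limit_req zone=per_ip burst=20 nodelay;
        limit_req_status 429;   # return 429 so the block response can be cached at the edge
        proxy_pass http://127.0.0.1:8080;
    }
}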

If you do rate limiting on your server itself, be sure to configure your edge network to cache the rate-limit block pages, so that attackers cannot continue Application Layer attacks once blocked. This can be done by caching responses with a 429 status code against a Custom Cache Key based on the client's IP address.

Services like Cloudflare offer Rate Limiting at their edge network; for example Cloudflare Rate Limiting.

Conclusion

As the capacity of networks like Cloudflare continue to grow, attackers move from attempting DDoS attacks at the network layer to performing DDoS attacks targeted at applications themselves.

For applications to be resilient to DDoS attacks, it is no longer enough to use a large network. A large network must be complemented with tooling that is able to filter malicious Application Layer attack traffic, even when attackers are able to make such attacks look near-legitimate.

Cloudflare Apps Platform Update: November Edition


Since our last newsletter, dozens of developers like you have reached out with ideas for new kinds of apps that weren’t yet possible. These are some of my favorite conversations because they help us find out which features should be prioritized. With your guidance, we’ve spent this month meticulously converting our supply of Halloween candy into those ideas. Let’s dive in and see what’s new!

💸 Paid App Product Enhancements

We’ve made it easier to upsell premium features with product-specific options. Customers can try out exclusive features before making a purchase, on any site, even without a Cloudflare account! Here’s an example of Lead Box using product-specific radio buttons:

Previewing premium features in Lead Box

In this example, a customer can choose to see the newsletter option after choosing the "Pro" plan. Developers can now update the Live Preview in response to this choice. We’ve added a new "_product" keyword for this event. Here’s a snippet showing how Lead Box handles a customer changing products without refreshing the page:

{
  "preview": {
    "handlers": [
      {
        "options": ["_default"],
        "execute": "INSTALL_SCOPE.setOptions(INSTALL_OPTIONS)"
      },
      {
        "options": ["_product"],
        "execute": "INSTALL_SCOPE.setProduct(INSTALL_PRODUCT)"
      }
    ]
  }
}
let options = INSTALL_OPTIONS
let product = INSTALL_PRODUCT

function renderApp () {/*...*/}

window.INSTALL_SCOPE = {
  setOptions (nextOptions) {
    options = nextOptions
    renderApp()
  },
  setProduct (nextProduct) {
    product = nextProduct
    renderApp()
  }
}

🗣 Comments & Ratings

Our previous newsletter included two of our most requested features: customer feedback and install metrics. Together these features have helped developers reach out to their customers and track down issues. Customers can now share their feedback publicly with comments and ratings:

Revealing older comments.

Comments for previous releases are initially hidden to emphasize the most recent feedback. As customers send in new feedback, previous ratings will have less of an impact on your app’s sentiment. Apps that score well with customers will also gradually gain visibility in the public listing!

🚦 Managing DNS via Apps

We’ve saved the best for last! App developers can now manage a customer’s DNS records. The simplest way to define a DNS record is directly in your app’s install.json file. This, for example, would allow a customer to create a CNAME to send traffic to your domain and insert an A record on their root domain:

{
  "resources": [/*...*/],
  "hooks": [/*...*/],
  "options": {
    "properties": {
      "subdomain": {
        "type": "string"
      }
    }
  },
  "dns": [
    {
      "type": "CNAME",
      "name": "{{subdomain}}",
      "content": "shops.myservice.com"
    },
    {
      "type": "A",
      "content": "1.2.3.4",
      "ttl": 60,
      "proxied": true
    }
  ]
}

The customer can then confirm your changes before completing their installation.

Requesting permission to access a customer’s email address and DNS entries.

DNS records make it possible to add new records to a customer’s account for your email services, blogging platforms, customer management systems, and much, much more!

⚙ Other Platform Improvements

We’ve made hundreds of changes since our last newsletter, some more visible than others. Here’s a quick recap of some of our favorites:

  • New Cloudflare customers are onboarded with apps after registration
  • Updated docs on “item add” event
  • Developers can now optionally link to their public GitHub repository
  • A new input type: Numbers with units!

Thank you 🦃

In the spirit of Thanksgiving, we raise a gravy boat to everyone who made this winter a little warmer. To all the Cloudflarians and developers who sent in feedback, we say thank you!

Reach out at @CloudflareApps and let us know what you’d like to see next!!!

Until next time! ⛄️️

— Teffen

Up to $100k in Cloud Credits
Cloudflare Apps makes it easy to get what you build installed, but building a great app also requires great infrastructure. We’ve partnered with Google to offer app developers like you up to $100,000 in free Google Cloud credits to support the apps you build.
Check it out ›

What I learned at my first Cloudflare Retreat

For the last seven years, Cloudflare has taken the entire company off site for a few days at the end of the year for a company retreat. Back in 2010, this meant five people from the San Francisco office. This November, we had 453 employees from our San Francisco, Singapore, London, Champaign (Illinois), New York City, Washington (DC), and Austin (Texas) offices spend time together in Monterey, California.

Knowing that so many teammates would be coming in from all over the world, we used the days leading up to the retreat to hold global team meetings, conduct a session of our home-grown Making Great Managers workshop, and brought in Valerie Aurora from Frame Shift Consulting to lead Ally Skills workshops for the entire company.

On Thursday, buses departed from Cloudflare headquarters and took us all down to Monterey. Our CEO, Matthew Prince, delivered opening remarks over lunch. During his talk, we learned about the imminent acquisition of Neumob, heard his thoughts about growing pains and how to successfully scale, and were reminded that we are at our best when we are inclusive of everyone. We reflected on how far we’ve come and got an inspiring glimpse of where we are headed. I think we were all amazed to see how big the company had become when we were all gathered together in one place.


We spent the next few hours focused on our professional development with a few Harvard Business School professors. (Our founders, Matthew Prince and Michelle Zatlyn, met at HBS and actually started Cloudflare as a class project!). Four professors from Harvard Business School led the group through case studies around negotiation skills, using jazz as a metaphor for creative and innovative organizations, and successful business models.

Michelle brought the HBS professors on stage at dinner, and we heard some unique tidbits about the company’s history and early days. We learned that Matthew had a ton of business ideas in school, but it wasn’t until he pitched Cloudflare to Michelle that she felt it was “a business she’d be proud to work on.” We also learned Professor Tom Eisenmann became their advisor while on a class trip to Silicon Valley in January 2009. Matthew and Michelle cornered him at the bar in the Sheraton in Palo Alto and wrote the business plan on the back of a napkin. Tom already had a full roster of students to advise but agreed to help them on one condition: they had to enter HBS’ business plan competition. Well, they did, and as Tom predicted, they won!

While we ate, Shawn Vanderhoven from The Wiseman Group led us in a discussion around leaders who amplify their teams to ultimate performance. Festivities continued late into the night with groups huddling around fire pits, enjoying a cool evening on terraces and conversing at the hotel bar.

After breakfast the next morning, the group was given four options for activities: biking, hiking, kayaking or visiting the Monterey Bay Aquarium. It was such a treat to be outside doing something active and interesting with coworkers - many from other teams and other offices.


Three weeks later, I'm still buzzing. The retreat fed the mind, but also the soul. The insights learned were super valuable, and the tools we were given will absolutely help us operate at a higher level. But I also really appreciate that we prioritized connecting with each other. That's often undervalued (if not completely overlooked), especially in companies at a similar size and stage.

It's easy to see why this is the case. Many companies at our scale might view the activities as frivolous, think they're just too big to make something like this happen, or worry that they can't keep operations running smoothly with the entire company away at the same time or for that long. And while I hear these concerns, I'm grateful we were able to figure it out. Because you can't put a price on strengthening the ties that bind an organization.

That isn't to say that there wasn't a fair amount of pre-planning involved. Our SRE and Customer Support teams all took shifts to ensure our network operations were running smoothly and that our customers received the same high level of support they are used to. It took coordination, planning, and a great work ethic, but we proved this type of event is quite achievable.

When Matthew talks about what he looks for when hiring people, he looks for curiosity and empathy. The company retreat captured that—we were able to get away from our ‘regular’ jobs and learn a few new things. And by having these shared experiences with our coworkers, we now know each other better and are more empathic. This retreat was a great way to realize those values.

The goal wasn't really to "disconnect”; the train still needs to keep moving, after all. But the change in scenery (and its beauty), and the emphasis on shared values, where we came from and where we're headed, made it possible to connect in meaningful ways.

We are already planning for next year, and have a few ideas on how to make it even better. Since we are an engineering-driven company, we’ll be sure to have some thought leadership activity around how to solve the biggest problems facing the Internet. We are thinking we’ll have everyone answer at least one support ticket before they leave :) We loved the beautiful setting in Monterey - and may even make it longer! I simply cannot wait!!

If this sounds like somewhere you’d want to work, check out our jobs page. We are hiring in all of our offices around the world.

Introducing the Cloudflare Warp Ingress Controller for Kubernetes


It’s ironic that the one thing most programmers would really rather not have to spend time dealing with is... a computer. When you write code it’s written in your head, transferred to a screen with your fingers and then it has to be run. On. A. Computer. Ugh.

Of course, code has to be run and typed on a computer so programmers spend hours configuring and optimizing shells, window managers, editors, build systems, IDEs, compilation times and more so they can minimize the friction all those things introduce. Optimizing your editor’s macros, fonts or colors is a battle to find the most efficient path to go from idea to running code.

[CC BY 2.0 image by Yutaka Tsutano]

Once the developer is master of their own universe they can write code at the speed of their mind. But when it comes to putting their code into production (which necessarily requires running their programs on machines that they don’t control) things inevitably go wrong. Production machines are never the same as developer machines.

If you’re not a developer, here’s an analogy. Imagine carefully writing an essay on a subject dear to your heart and then publishing it only to be told “unfortunately, the word ‘the’ is not available in the version of English the publisher uses and so your essay is unreadable”. That’s the sort of problem developers face when putting their code into production.

Over time different technologies have tried to deal with this problem: dual booting, different sorts of isolation (e.g. virtualenv, chroot), totally static binaries, virtual machines running on a developer desktop, elastic computing resources in clouds, and more recently containers.

Ultimately, using containers is all about a developer being able to say “it ran on my machine” and be sure that it’ll run in production, because fighting incompatibilities between operating systems, libraries and runtimes that differ from development to production is a waste of time (in particular developer brain time).

[CC BY 2.0 image by Jumilla]

In parallel, the rise of microservices is also a push to optimize developer brain time. The reality is that we all have limited brain power and ability to comprehend the complex systems that we build in their entirety and so we break them down into small parts that we can understand and test: functions, modules and services.

A microservice with a well-defined API and related tests running in a container is the ultimate developer fantasy. An entire program, known to operate correctly, that runs on their machine and in production.

Of course, no silver lining is without its cloud and containers beget a coordination problem: how do all these little programs find each other, scale, handle failure, log messages, communicate and remain secure. The answer, of course, is a coordination system like Kubernetes.

Kubernetes completes the developer fantasy by allowing them to write and deploy a service and have it take part in a whole.

Sadly, these little programs have one last hurdle before they turn into useful Internet services: they have to be connected to the brutish outside world. Services must be safely and scalably exposed to the Internet.

Recently, Cloudflare introduced a new service that can be used to connect a web server to Cloudflare without needing to have a public IP address for it. That service, Cloudflare Warp, maintains a connection from the server into the Cloudflare network. The server is then only exposed to the Internet through Cloudflare with no way for attackers to reach the server directly.

That means that any connection to it is protected and accelerated by Cloudflare’s service.

Cloudflare Warp Ingress Controller and StackPointCloud

Today, we are extending Warp’s reach by announcing the Cloudflare Warp Ingress Controller for Kubernetes (it’s an open source project and can be found here). We worked closely with the team at StackPointCloud to integrate Warp, Kubernetes and their universal control plane for Kubernetes.


Within Kubernetes, creating an Ingress with the annotation kubernetes.io/ingress.class: cloudflare-warp will automatically create secure Warp tunnels to Cloudflare for any service using that Ingress. The entire lifecycle of these tunnels is transparently managed by the ingress controller, making it trivially easy to expose Kubernetes-managed services securely via Cloudflare Warp.
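As a sketch, an Ingress that routes warp.example.com through a Warp tunnel to a Service named web on port 80 might look like the following (the hostname, Service name and port are placeholders):

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: warp-ingress
  annotations:
    kubernetes.io/ingress.class: cloudflare-warp
spec:
  rules:
  - host: warp.example.com
    http:
      paths:
      - backend:
          serviceName: web
          servicePort: 80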

The Warp Ingress Controller is responsible for finding Warp-enabled services and registering them with Cloudflare using the hostname(s) specified in the Ingress resource. It is added to a Kubernetes cluster by creating a file called warp-controller.yaml with the content below:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
  creationTimestamp: null
  generation: 1
  labels:
    run: warp-controller
  name: warp-controller
spec:
  replicas: 1
  selector:
    matchLabels:
      run: warp-controller
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        run: warp-controller
    spec:
      containers:
      - command:
        - /warp-controller
        - -v=6
        image: quay.io/stackpoint/warp-controller:beta
        imagePullPolicy: Always
        name: warp-controller
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - name: cloudflare-warp-cert
          mountPath: /etc/cloudflare-warp
          readOnly: true
      volumes:
        - name: cloudflare-warp-cert
          secret:
            secretName: cloudflare-warp-cert
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30

The full documentation is here and shows how to get up and running with Kubernetes and Cloudflare Warp on StackPointCloud, Google GKE, Amazon EKS or even minikube.

One Click with StackPointCloud

Within StackPointCloud adding the Cloudflare Warp Ingress Controller requires just a single click. And one more click and you've deployed a Kubernetes cluster.

The connection between the Kubernetes cluster and Cloudflare is made using a TLS tunnel ensuring that all communication between the cluster and the outside world is secure.

Once connected, the cluster and its services benefit from Cloudflare's DDoS protection, WAF, global load balancing, health checks and huge global network.

The combination of Kubernetes and Cloudflare makes managing, scaling, accelerating and protecting Internet facing services simple and fast.

Make SSL boring again


It may (or may not!) come as a surprise, but a few months ago we migrated Cloudflare’s edge SSL connection termination stack to use BoringSSL: Google's crypto and SSL implementation that started as a fork of OpenSSL.

[Embedded tweet from Cloudflare's CTO]

We dedicated several months of work to make this happen without negative impact on customer traffic. We had a few bumps along the way, and had to overcome some challenges, but we ended up in a better place than we were a few months ago.

TLS 1.3

We have already blogged extensively about TLS 1.3. Our original TLS 1.3 stack required our main SSL termination software (which was based on OpenSSL) to hand off TCP connections to a separate system based on our fork of Go's crypto/tls standard library, which was specifically developed to only handle TLS 1.3 connections. This proved handy as an experiment that we could roll out to our client base in relative safety.

However, over time, this separate system started to make our lives more complicated: most of our SSL-related business logic needed to be duplicated in the new system, which caused a few subtle bugs to pop up and made it harder to roll out new features such as Client Auth to all our clients.

As it happens, BoringSSL has supported TLS 1.3 for quite a long time (it was one of the first open source SSL implementations to work on this feature), so now all of our edge SSL traffic (including TLS 1.3 connections) is handled by the same system, with no duplication, no added complexity, and no increased latency. Yay!

Fancy new crypto, part 1: X25519 for TLS 1.2 (and earlier)

When establishing an SSL connection, client and server will negotiate connection-specific secret keys that will then be used to encrypt the application traffic. There are a few different methods for doing this, the most popular one being ECDH (Elliptic Curve Diffie–Hellman). Long story short, this depends on an elliptic curve being negotiated between client and server.

For the longest time the only widely supported curves available were the ones defined by NIST, until Daniel J. Bernstein proposed Curve25519 (X25519 is the mechanism used for ECDH based on Curve25519), which has quickly gained popularity and is now the default choice of many popular browsers (including Chrome).

This was already supported for TLS 1.3 connections, and with BoringSSL we are now able to support key negotiation based on X25519 at our edge for TLS 1.2 (and earlier) connections as well.
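If you want to check this from the command line, a reasonably recent OpenSSL (1.1.0 or newer) lets you restrict a TLS 1.2 handshake to X25519 and then inspect the "Server Temp Key" line of the output to see which curve was negotiated. Flag names vary between OpenSSL versions, so treat this as a sketch:

$ openssl s_client -connect blog.cloudflare.com:443 -tls1_2 -curves X25519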

X25519 is now the second most popular elliptic curve algorithm that is being used on our network:

[Chart: elliptic curve usage at the Cloudflare edge]

Fancy new crypto, part 2: RSA-PSS for TLS 1.2

Another one of the changes introduced by TLS 1.3 is the adoption of the PSS padding scheme for RSA signatures (RSASSA-PSS). For all TLS 1.3 connections, this replaces RSASSA-PKCS1-v1.5, which is more fragile and has historically been prone to security vulnerabilities.

RSA PKCS#1 v1.5 has been known to be vulnerable to chosen ciphertext attacks since Bleichenbacher’s CRYPTO '98 paper, which showed SSL/TLS to be vulnerable to this kind of attack as well.

The attacker exploits an “oracle”, in this case a TLS server that allows them to determine whether a given ciphertext has been correctly padded under the rules of PKCS1-v1.5 or not. For example, if the server returns a different error for correct padding vs. incorrect padding, that information can be used as an oracle (this is how Bleichenbacher broke SSLv3 in 1998). If incorrect padding causes the handshake to take a measurably different amount of time compared to correct padding, that’s called a timing oracle.

If an attacker has access to an oracle, it can take as little as 15,000 messages to gain enough information to perform an RSA secret-key operation without possessing the secret key. This is enough for the attacker to either decrypt a ciphertext encrypted with RSA, or to forge a signature. Forging a signature allows the attacker to hijack TLS connections, and decrypting a ciphertext allows the attacker to decrypt any connection that does not use forward secrecy.

Since then, SSL/TLS implementations have adopted mitigations to prevent these attacks, but they are tricky to get right, as the recently published F5 vulnerability shows.

With the switch to BoringSSL we made RSA PSS available to TLS 1.2 connections as well. This is already supported "in the wild", and is the preferred scheme by modern browsers like Chrome when dealing with RSA server certificates.

The dark side of the moon

Besides all these new exciting features that we are now offering to all our clients, BoringSSL also has a few internal features that end users won't notice, but that made our life so much easier.

Some of our SSL features required special patches that we maintained in our internal OpenSSL fork, however BoringSSL provides replacements for these features (and more!) out of the box.

Some examples include:

  • its private key callback support, which we now use to implement Keyless SSL;
  • its asynchronous session lookup callback, which we use to support distributed session ID caches (for session resumption with clients that, for whatever reason, don't support session tickets);
  • its equal-preference cipher grouping, which allows us to offer ChaCha20-Poly1305 ciphers alongside AES GCM ones and let clients decide which they prefer;
  • its "select_certificate" callback, which we use for inspecting and logging ClientHellos, and for dynamically enabling features depending on the user’s configuration (we were previously using the “cert_cb” callback for the latter, which is also supported by OpenSSL, but we ran into some limitations, like the fact that you can’t dynamically change the supported protocol versions with it, or the fact that it is not executed during session resumption).

The case of the missing OCSP

Apart from adding new features, the BoringSSL developers have also been busy working on removing features that most people don't care about, to make the codebase lighter and easier to maintain. For the most part this worked out very well: a huge amount of code has been removed from BoringSSL without anyone noticing.

However one of the features that also got the axe was OCSP. We relied heavily on this feature at our edge to offer OCSP stapling to all clients automatically. So in order to avoid losing this functionality we spent a few weeks working on a replacement, and, surprise! we ended up with a far more reliable OCSP pipeline than when we started. You can read more about the work we did in this blog post.

ChaCha20-Poly1305 draft

Another feature that was removed was support for the legacy ChaCha20-Poly1305 ciphers (not to be confused with the ciphers standardized in RFC7905). These ciphers were deployed by some browsers before the standardization process finished and ended up being incompatible with the standard ciphers later ratified.

We looked at our metrics and realized that a significant percentage of clients still relied on this feature. These are typically older mobile clients that don't have AES hardware offloading, and that didn't get software updates adding the newer ChaCha20 ciphers.

[Chart: legacy ChaCha20-Poly1305 cipher usage]

We decided to add support for these ciphers back to our own internal BoringSSL fork so that those older clients could still take advantage of them. We will keep monitoring our metrics and decide whether to remove them once the usage drops significantly.

Slow Base64: veni, vidi, vici

One somewhat annoying problem we noticed during a test deployment, was an increase in the startup time of our NGINX instances. Armed with perf and flamegraphs we looked into what was going on and realized the CPU was spending a ridiculous amount of time in BoringSSL’s base64 decoder.

It turns out that we were loading CA trusted certificates from disk (in PEM format, which uses base64) over and over again in different parts of our NGINX configuration, and because of a change in BoringSSL that was intended to make the base64 decoder constant-time, but also made it several times slower than the decoder in OpenSSL, our startup times suffered.

Of course the astute reader might ask, why were you loading those certificates from disk multiple times in the first place? And indeed there was no particular reason, other than the fact that the problem went unnoticed until it actually became a problem. So we fixed our configuration to only load the certificates from disk in the configuration sections where they are actually needed, and lived happily ever after.

Conclusion

Despite a few hiccups, this whole process turned out to be fairly smooth, thanks in part to the rock-solid stability of the BoringSSL codebase, not to mention its extensive documentation. Not only did we end up with a much better and more easily maintainable system than we had before, but we also managed to contribute a little back to the open-source community.

As a final note we’d like to thank the BoringSSL developers for the great work they poured into the project and for the help they provided us along the way.

CAA of the Wild: Supporting a New Standard


One thing we take pride in at Cloudflare is embracing new protocols and standards that help make the Internet faster and safer. Sometimes this means that we’ll launch support for experimental features or standards still under active development, as we did with TLS 1.3. Due to the not-quite-final nature of some of these features, we limit the availability at the onset to only the most ardent users so we can observe how these cutting-edge features behave in the wild. Some of our observations have helped the community propose revisions to the corresponding RFCs.

We began supporting the DNS Certification Authority Authorization (CAA) Resource Record in June behind a beta flag. Our goal in doing so was to see how the presence of these records would affect SSL certificate issuance by publicly-trusted certification authorities. We also wanted to do so in advance of the 8 September 2017 enforcement date for mandatory CAA checking at certificate issuance time, without introducing a new and externally unproven behavior to millions of Cloudflare customers at once. This beta period has provided invaluable insight as to how CAA records have changed and will continue to change the commercial public-key infrastructure (PKI) ecosystem.

As of today, we’ve removed this beta flag and all users are welcome to add CAA records as they see fit—without having to first contact support. Note that if you’ve got Universal SSL enabled, we’ll automatically augment your CAA records to allow issuance from our CA partners; if you’d like to disable Universal SSL and provide your own certificates, you’re welcome to do that too.
Below are some additional details on CAA, the purpose of this record type, and how its use has evolved since it was first introduced. If you’d rather just jump to the details of our implementation, click here and we’ll take you to the relevant section of the post.

The Publicly-Trusted PKI Ecosystem — Abridged

Before diving into CAA it’s helpful to understand the purpose of a public key infrastructure (PKI). Quite simply, PKI is a framework that’s used to secure communications between parties over an insecure public network. In “web PKI”, the PKI system that’s used to secure communications between your web browser and this blog (for example), the TLS protocol is used with SSL certificates and private keys to protect against eavesdropping and tampering.

While TLS handles the sanctity of the connection, ensuring that nobody can snoop on or mess with HTTPS requests, how does your browser know it’s talking to the actual owner of blog.cloudflare.com and not some imposter? Anyone with access to OpenSSL or a similar tool can generate a certificate purporting to be valid for this hostname, but fortunately your browser only trusts certificates issued (or “signed”) by certain well-known parties.


These well-known parties are known as certification authorities (CAs). The private and public key that form the certificate for blog.cloudflare.com were generated on Cloudflare hardware, but the stamp of approval—the signature—was placed on the certificate by a CA. When your browser receives this “leaf” certificate, it follows the issuer all the way to a “root” that it trusts, validating the signatures along the way and deciding whether to accept the certificate as valid for the requested hostname.

Before placing this stamp of approval, CAs are supposed to take steps to ensure that the certificate requester can demonstrate control over the hostname. (As you’ll learn below, this is not always the case, and is one of the reasons that CAA was introduced.)

Anthropogenic Threats

Given that people are imperfect beings, prone to making mistakes or poor judgement calls, it should come as a surprise to no one that the PKI ecosystem has a fairly blemished track record when it comes to maintaining trust. Clients, CAs, servers, and certificate requesters are all created or operated by people who have made mistakes.

[Image: Jurassic Park. 1993, Steven Spielberg (Film), Universal Pictures.]

Client providers have been known to add compromising certificates to the local trust store or install software to intercept secure connections; servers have been demonstrated to leak private keys or be unable to properly rotate session ticket keys; CAs have knowingly mis-issued certificates or failed to validate hostname ownership or control reliably; and individuals requesting certificates include phishers creating convincing imposter versions of popular domains and obtaining valid and trusted certificates. No party in this ecosystem is completely without blame in contributing to a diminished sense of trust.

Of these many problems (most of which have already been addressed or are in the process of being resolved), perhaps the most unsettling is the willful mis-issuance of certificates by trusted CAs. Knowingly issuing certificates to parties who haven't demonstrated ownership, or issuance outside the parameters defined by the CA/Browser Forum (a voluntary and democratic governing body for publicly-trusted certificates) by CAs who have certificates present in trust stores severely undermines the value of that trust store, all certificates issued by that CA, and the publicly-trusted PKI ecosystem as a whole.

Solving One Problem...

To help reduce the risk of future mis-issuance by publicly trusted CAs, a new DNS resource record was proposed by those same CAs: the Certification Authority Authorization (CAA) Resource Record.

The general idea is that the owner of any given domain (e.g., example.com) would add CAA records at their authoritative DNS provider, specifying one or more CAs who are authorized to issue certificates for their domain.

RFC6844 currently specifies three property tags for CAA records: issue, issuewild, and iodef.

  • The issue property tag specifies CAs who are authorized to issue certificates for a domain. For example, the record example.com. CAA 0 issue "certification-authority.net" allows the "Certification Authority" CA to issue certificates for example.com.
  • The issuewild property tag specifies CAs that are only allowed to issue certificates that specify a wildcard domain. E.g., the record example.com. CAA 0 issuewild "certification-authority.net" only allows the "Certification Authority" CA to issue certificates containing wildcard domains, such as *.example.com.
  • The iodef property tag specifies a means of reporting certificate issue requests or cases of certificate issuance for the corresponding domain that violate the security policy of the issuer or the domain name holder. E.g., the record example.com. CAA 0 iodef "mailto:example@example.com" instructs the issuing CA to send violation reports via email to the address provided at the attempted time of issuance.

CAA records with the issue and issuewild tags are additive; if more than one is returned in response to a DNS query for a given hostname, the CAs specified in all returned records are considered authorized.
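Put together, a zone that authorizes two CAs for ordinary certificates, restricts wildcard issuance to one of them, and asks for violation reports might contain records like these (letsencrypt.org and digicert.com are shown purely as examples of real issuer strings; use the values published by the CAs you actually use, and your own reporting address):

example.com. CAA 0 issue "letsencrypt.org"
example.com. CAA 0 issue "digicert.com"
example.com. CAA 0 issuewild "digicert.com"
example.com. CAA 0 iodef "mailto:example@example.com"

With these records in place, a publicly-trusted CA other than the two listed should refuse to issue for example.com.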

If the authoritative DNS provider does not yet support CAA records, or none are present in the zone file, the issuing CA is still authorized to issue, largely preserving the issuance behavior that existed before CAA records became an adopted standard.

As of 8 September 2017, all publicly-trusted CAs are now required to check CAA at issuance time for all certificates issued, thereby enabling certificate requestors (domain owners) to dictate which CAs can issue certificates for their domain.

... with More Problems.

RFC6844 specifies a very curious CAA record processing algorithm:

The search for a CAA record climbs the DNS name tree from the
   specified label up to but not including the DNS root '.'.

   Given a request for a specific domain X, or a request for a wildcard
   domain *.X, the relevant record set R(X) is determined as follows:

   Let CAA(X) be the record set returned in response to performing a CAA
   record query on the label X, P(X) be the DNS label immediately above
   X in the DNS hierarchy, and A(X) be the target of a CNAME or DNAME
   alias record specified at the label X.

   o  If CAA(X) is not empty, R(X) = CAA(X), otherwise

   o  If A(X) is not null, and R(A(X)) is not empty, then R(X) =
      R(A(X)), otherwise

   o  If X is not a top-level domain, then R(X) = R(P(X)), otherwise

   o  R(X) is empty.

While the above algorithm is not easily understood at first, the example immediately following it is much easier to comprehend:

For example, if a certificate is requested for X.Y.Z the issuer will
   search for the relevant CAA record set in the following order:

      X.Y.Z

      Alias (X.Y.Z)

      Y.Z

      Alias (Y.Z)

      Z

      Alias (Z)

      Return Empty

In plain English, this means that if the owner of example.com requests a certificate for test.blog.example.com, the issuing CA must

  1. Query for a CAA record at test.blog.example.com.. If a CAA record exists for this hostname, the issuing CA stops checking for CAA records and issues accordingly. If no CAA record exists for this hostname and this hostname exists as an A or AAAA record, the CA then moves up the DNS tree to the next highest label.
  2. Query for a CAA record at blog.example.com.. Just like the first check, if no CAA record exists for this hostname and this hostname exists as an A or AAAA record, the CA then continues traversing the DNS tree.
  3. Query for a CAA record at example.com.
  4. Query for a CAA record at com.

At the end of the last step, the issuing CA has climbed the entire DNS tree (excluding the root) checking for CAA records. This functionality allows a domain owner to create CAA records at the root of their domain and have those records apply to any and all subdomains.
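
Before getting to the CNAME wrinkle, the plain tree climb is easy to sketch in code. The following is a simplified illustration, not any CA's actual implementation: record data is passed in as a plain dictionary rather than fetched over DNS, and CNAME/DNAME handling is deliberately left out.

# Simplified sketch of the RFC 6844 tree climb (CNAME/DNAME handling omitted).
# `records` stands in for real DNS data: a dict mapping a hostname to its CAA records.

def relevant_caa_set(hostname, records):
    labels = hostname.rstrip(".").split(".")
    # Climb from the full hostname up to and including the TLD, but not the root.
    while labels:
        name = ".".join(labels)
        caa = records.get(name, [])
        if caa:
            return name, caa   # the first non-empty CAA record set wins
        labels = labels[1:]    # e.g. test.blog.example.com -> blog.example.com
    return None, []            # no CAA records anywhere: issuance is not restricted

records = {"example.com": ['0 issue "certification-authority.net"']}
print(relevant_caa_set("test.blog.example.com", records))
# -> ('example.com', ['0 issue "certification-authority.net"'])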

However, the CAA record processing algorithm has an additional check if the hostname exists as a CNAME (or DNAME) record. In this case, the issuing CA must also check the target of the CNAME record. Revisiting the example above for test.blog.example.com. where this hostname exists as a CNAME record, the issuing CA must

  1. Query for a CAA record at test.blog.example.com.. Since test.blog.example.com. exists as the CNAME test.blog.example.com. CNAME test.blog-provider.net., the issuing CA must next check the target for the presence of a CAA record before climbing the DNS tree.
  2. Query for a CAA record at test.blog-provider.net.

The issuing CA in this example is only at step two in the CAA processing algorithm and it has already come across two separate issues.

First, the issuing CA has checked the hostname requested on the certificate (test.blog.example.com.) and, since that hostname exists as a CNAME record, it has also checked the target of that record (test.blog-provider.net.). However, if test.blog-provider.net. itself is also a CNAME record, the CAA record processing algorithm states that the issuing CA must check the target of that CNAME as well.

In this case it is fairly simple to create a CNAME loop (or very long CNAME chain) either via an accidental misconfiguration or with malicious intent to prevent the issuing CA from completing the CAA check.

Second, example.com and blog-provider.net might not be owned and operated by the same entity or even exist in the same network. The RFC authors appear to be operating under the assumption that CNAME records are still used as they were in November 1987:

A CNAME RR identifies its owner name as an alias, and specifies the corresponding canonical name in the RDATA section of the RR.

This may have been true thirty years ago, but consider the number and prevalence of SaaS providers in 2017: who is truly authoritative for the content addressed by a DNS record, the content creator (and likely domain name owner using a CNAME) or the SaaS provider to whom the content creator subscribes?

Many, if not all, major service or application providers include clauses in their terms of service regarding content users add to their accounts. Intellectual property must be user-generated, properly licensed, or otherwise lawfully obtained. Because content uploaded by users is difficult to control, liability limitations, indemnification, and service termination are all commonly invoked when the intellectual property added to the service is owned by another party, is unlawful, or falls outside the definition of acceptable use. From a content perspective, within the scope of a provider's terms of service, it's difficult to consider the service provider canonical for the content hosted there.

Using a real-world example, there are currently 401,716 CNAME records in the zone files for domains on Cloudflare whose target is ghs.google.com. This hostname is given to subscribers of Google's Blogger service to use as the target of a CNAME so that a Blogger subscriber may use their own domain name in front of Google's service. With a single CAA record, Google could dictate that nearly half a million blogs with a vanity domain name may have certificates issued by only one CA, or by no CA at all, unless each of those hostnames created its own CAA records to allow issuance.

Even without following CNAMEs to their targets, the DNS tree-climbing algorithm has problems of its own. Operators of top-level domains may decide that only certain issuing CAs are trustworthy, or that certain CAs are advantageous for business, and create CAA records allowing only those CAs to issue. We already see this in action today with the pseudo-top-level domain nl.eu.org, which has the CAA record nl.eu.org. CAA 0 issue "letsencrypt.org"; this allows issuance only through Let's Encrypt for any subdomain of nl.eu.org that does not publish CAA records of its own.

The authors of RFC6844 were also unwilling to secure their record and its use from any potential man-in-the-middle attacks—

Use of DNSSEC to authenticate CAA RRs is strongly RECOMMENDED but not required.

Without DNSSEC, DNS responses are returned to the requestor in plain-text. Anyone in a privileged network position (a recursive DNS provider or ISP) could alter the response to a CAA query to allow or deny issuance as desired. As only about 820,000 .com domain names out of the more than 130 million registered .com domain names are secured with DNSSEC, perhaps the low adoption rates influenced the decision to not make DNSSEC with CAA mandatory.

Moving beyond the RFC, the CA/Browser Forum's Baseline Requirements (BR) attempt to clarify the behavior a CA should follow when checking for CAA.

CAs are permitted to treat a record lookup failure as permission to issue if:
- the failure is outside the CA's infrastructure;
- the lookup has been retried at least once; and
- the domain's zone does not have a DNSSEC validation chain to the ICANN root

Effectively, this means that if the CAA response is a SERVFAIL or REFUSED, or the query times out, the CA is permitted to issue, regardless of whether or not a CAA record exists, provided the query fails more than once while attempting to issue the certificate. However, multiple CAs have told us that DNS lookup failures will prevent issuance regardless of the above conditions in the BR. In that case, something as benign as a transient network error could result in a denial of issuance, or worse, any recursor that doesn't understand a CAA query could prevent issuance. We actually saw this happen with Comcast's resolvers and reported the bug to their DNS provider.

There's an additional security gap in that neither the RFC nor the BR indicates where the issuing CA should query for CAA records. Within the current standards it is acceptable to query any DNS recursor for these records, not just the authoritative DNS provider for a domain. For example, an issuing CA could query Google's Public DNS, or a DNS recursor provided by their ISP, for these responses. A compromised DNS recursor, or one run by a rogue operator, could then alter the responses, either denying issuance or allowing issuance by a CA not approved by the domain owner. To close this gap, the RFC and BR should be amended so that an issuing CA must always query these records at the authoritative provider.

CAA and Cloudflare

As of today, CAA records are no longer in beta and all customers are able to add CAA records for their zones. This can be done in the DNS tab of the Cloudflare dashboard or via our API.

When creating CAA records in the dashboard, an additional modal appears to help clarify the different CAA tag options and to format the record correctly, since an incorrectly formatted record would prevent every CA from being able to issue a certificate.

Cloudflare is in a unique position: we can complete SSL validation for a domain and have certificates issued on the domain owner's behalf. This is how we automatically provision Universal SSL certificates, for free, for every domain active on Cloudflare.

Cloudflare partners with multiple CAs to issue certificates for our managed SSL products: Universal SSL, Dedicated Certificates, and SSL for SaaS. Since CAA record checking is now mandatory for publicly-trusted CAs, Cloudflare automatically adds the requisite CAA records to a zone whenever a user adds one or more CAA records in the Cloudflare dashboard, so that our partner CAs can continue to issue certificates for each of our SSL products.

Some site owners may want to manage their own SSL certificates in order to be compliant with their own standard operating procedures or policies. Alternatively, domain owners may only want to trust specific CAs that are not the CAs Cloudflare currently partners with to issue Universal SSL certificates. In the latter case, users now have the ability to disable Universal SSL.


When Universal SSL is disabled, the CAA records added to allow our partner CAs to issue certificates are deleted and all Universal SSL certificates available for the zone are removed from our edge. Dedicated certificates, custom certificates, as well as SSL for SaaS certificates are all individually managed by each customer and can be added or removed as needed.

A Bright and Hopefully Not-Too-Distant Future

CAA as it exists today does very little to reduce the attack surface around certificate issuance, while making it more difficult for well-intentioned parties to participate. That said, CAA is a young standard, only recently adopted by the web PKI community, and many individuals and organizations are actively working to address the gaps in the current RFC and to make the overall CAA experience better for both certificate requesters and CAs.

To start, an errata report exists to clarify the CAA record processing algorithm and to reduce the degree to which the targets of CNAME records must be checked. Similarly, the DNS tree-climbing behavior of the CAA record processing algorithm is still up for debate. There are also active discussions around implementation issues, such as the recognition that some authoritative DNS servers incorrectly sign empty responses to CAA queries when DNSSEC is enabled for a zone, and how to handle these cases in a way that would still allow CAs to issue. Proposals exist suggesting new tags or property fields in CAA records, such as a CA requiring an account number in the property value for issue tags, or only allowing specific validation types (e.g., DV, OV, or EV). The Electronic Frontier Foundation (EFF) has expressed interest in hardening CAs against non-cryptographic attacks, particularly with a focus on the domain validation process for obtaining certificates. While not directly pertaining to CAA, any such hardening might increase reliance on CAA or obviate the need for it altogether.

CC BY-NC 2.5 image by xkcd.com

As with all internet standards, none are perfect and all are far from permanent—CAA included. Being in the position to implement new standards at scale, seeing what effect adoption of those standards has, and working with the Internet community to address any issues or gaps is a privilege and allows us to live up to our mission of building a better Internet.

We're thrilled to be involved in efforts to make CAA (and any other standard) better for anyone and everyone.

On the Leading Edge - Cloudflare named a leader in The Forrester Wave: DDoS Mitigation Solutions


Cloudflare has been recognized as a leader in the “Forrester Wave™: DDoS Mitigation Solutions, Q4 2017.”

The DDoS landscape continues to evolve. The increase in sophistication, frequency, and range of targets of DDoS attacks has placed greater demands on DDoS providers, many of which were evaluated in the report.

This year, Cloudflare received the highest scores possible in 15 criteria, including:

  • Length of Implementation
  • Layers 3 and 4 Attacks Mitigation
  • DNS Attack Mitigation
  • IoT Botnets
  • Multi-Vector Attacks
  • Filtering Deployment
  • Secure Socket Layer Investigation
  • Mitigation Capacity
  • Pricing Model

We believe that Cloudflare’s position as a leader in the report stems from the following:

  • An architecture designed to address high-volume attacks. This post written in October 2016 provides some insight into how Cloudflare’s architecture scales to meet the most advanced DDoS attacks differently than legacy scrubbing centers.

  • In September 2017, due to the size and effectiveness of our network, we announced the elimination of “surge pricing” commonly found in other DDoS vendors by offering unmetered mitigation. Regardless of what Cloudflare plan a customer is on—Free, Pro, Business, or Enterprise—we will never terminate a customer or charge more based on the size of an attack.

  • Because we protect over 7 million Internet properties, we have a unique view into the types of attacks launched across the Internet, especially harder-to-deflect Layer 7 application attacks. This allows us to reduce the amount of manual intervention and use automated mitigations to more quickly detect and block attacks.

  • Our DDoS mitigation solution helps protect customers by integrating with not only a stack of other security features, such as SSL and WAF, but also with a full suite of performance features. With a highly scalable network of over 118 data centers, Cloudflare can both accelerate legitimate traffic and block malicious DDoS traffic.

This combination of scale, ease-of-use through automatic mitigations, and integration with performance solutions continues to advance our mission to help build a better Internet.

At Cloudflare, our mission is to help build a better Internet - one that is performant, secure and reliable for all. We do this through a combination of scale, ease of use, and data-driven insights that enable us to deliver automatic mitigation. It is because of this focus and these types of innovation that we were able to offer unmetered DDoS mitigation at no additional cost to all of our customers this year. We are honored to be recognized as a leader in the Forrester Wave™: DDoS Mitigation Solutions, Q4 2017 report.

To check out the full report, download your complimentary copy here.



Building a new IMDB: Internet Mince Pie Database


Mince Pies CC-BY-SA 2.0 image by Phil! Gold

Since joining Cloudflare I’ve always known that as we grew, incredible things would be possible. It’s been a long held ambition to work in an organisation with the scale to answer a very controversial and difficult question. To do so would require a collection of individuals with a depth of experience, passion, dedication & above all collaborative spirit.

As Cloudflare’s London office has grown in the last 4 years I believe 2017 is the year we reach the tipping point where this is possible. A paradigm-shift in the type of challenges Cloudflare is able to tackle. We could finally sample every commercially available mince pie in existence before the 1st of December. In doing so, we would know conclusively which mince pie we should all be buying over Christmas to share with our friends & families.

What is a mince pie?

For the uninitiated, a Mince Pie is “a sweet pie of British origin, filled with a mixture of dried fruits and spices called mincemeat, that is traditionally served during the Christmas season in the English-speaking world.” - Wikipedia for Mince Pie

The original Mince Pie was typically filled with a mixture of minced meat, suet and a variety of fruits and spices like cinnamon, cloves and nutmeg. Today, many mince pies are vegetarian-friendly, containing no meat or suet. They are churned out by large commercial operations and 200-year-old, family-run bakeries alike to feed hungry Brits at Christmas. Some factories peak at more than 27pps (pies per second).

Mince Pie Sketch

Review Methodology

Early on, we settled on 4 key metrics to score each pie against, each on a scale from 1-10. When reviewing anything with a scientific approach, consistency is key. Much like a well-made pastry case.

What does and does not constitute a pie?

Very quickly we realised that we needed some hard rules on what counted as a pie. For example, we had some "Frangipane Mince Pies" from Marks & Spencer which caused a lot of controversy: these do not have a top, but instead cover the mince with a baked frangipane.

Although these rule-breaking pies were not included in our leaderboard, they're definitely worth a try... one reviewer described the filling as “inoffensive” and another left these comments:

"ZERO air gap! How do you solve the problem of an air gap in your pie? Fill it with some delicious tasty frangipane, that's how. The crunchy almonds on the top really cut through the softness with some texture, too. Most excellent."

Tom Arnfeld, Systems Engineer

Pastry / Filling Ratio: The ratio is really key to a good mince pie, but it is also possible for other aspects of the pie to be bad while the ratio itself is excellent. To be clear (there was confusion and debate internally), a score of 5/10 for ratio would mean the ratio was average in quality. It does not directly measure the ratio itself. A 10/10 would have the perfect ratio of pastry to mince. Air gap was also a consideration, and the detailed comments each reviewer made on each pie often explain this.

Pastry: The right pastry needs to be not too thick, crispy but still have chew, and be moist but not soggy. It should hold the filling without it spilling out.

Filling: The filling itself is probably the thing most pies were scored harshly on. A good filling has a variety of fruits and textures and possibly even some other flavours such as brandy.

Overall: We left it to each reviewer to add an overall score judging the entire pie.

Results

30 types of mince pie, 68 reviews, 18 hungry Cloudflare staff.

Top 5 Pies

After collecting all of the reviews together, here’s our top 5 pies.

1. Heston from Waitrose Spiced Shortcrust Mince Pies Lemon

£0.75p per pie

Look, I'm going to get straight to it: if you are a mince pie traditionalist, these are not the mince pies for you. HOWEVER, for me they were a revelation. They push boundaries, break all the rules but somehow retain the essence of a great mince pie.

Pastry: TWO DIFFERENT TYPES for the base and the topping. Some people have an issue with the pastry 'rubble' on top. I found it DELIGHTFUL. 9/10

Filling: Not only was there excellent standard mincemeat, but also there was a thin layer of lemon jelly in there as well. SPLENDID. Strong nutmeg and clove themes throughout.

Extra marks for the surreal box art. ("Ceci n'est pas une pie")

Sam Howson, Support Engineer

2. Dunns Traditional Deep-Fill Pastry Mince Pies

£1.35p per pie

Pastry / Filling Ratio: Absolutely no visible air gap. TFL could learn from this. Actually fulfilling the promise of deep filled for the first time ever. 10/10

Pastry: Really great. Well constructed, cooked & even all the way around. We've said this a lot but probably "needs more butter" 8/10

Filling: Ironically the reason I like this is also the reason why it's not getting a 10. The citrus - it's so good to have some citrus in there - sorely lacking in lots of other fillings. But it's just too much - it's really the only discernible flavour. Some booze wouldn't go amiss here. 8/10

Overall: This competition might be Dunn Dunn Dunn. Spectacular. 8.5

Simon Moore, Lead Customer Support Engineer

3. Fortnum & Mason

£1.83p per pie

These were by far the most controversial, from our reviewers' point of view.

Pastry / Filling Ratio: About right... a well filled pie with little air gap and a pastry that wasn't so thick that it would diminish the filling. 9/10

Pastry: The weakest part. Buttery and soft, lovely all over... except that the base was also like this and needed to be a bit crisper and less soft. 7/10

Filling: Subtle flavours, good spices, a hint of Christmas warmth. Could be improved with a little more aftertaste like the K&C ones had, but this is a good filling and with lots of filling, very tasty. 8/10

Overall: A pretty good pie, one of the best yet but the pastry not being crisp and solid underneath makes it fall apart in your hands a little. 8/10

David Kitchen, Engineering Manager

For contrast, one anomalous reviewer wrote;

Pastry / Filling Ratio: EXCELLENT. The pie is deep, and with little air gap. Other reviews mention an air gap however, so QC is clearly not a priority in spite of its cost. 9/10

Pastry: Just fine. Kind of dry, not very buttery. Sugar was nice on top, but really nothing to write home about. Structurally this pie was terrible as the lid lifted clean off when I tried to get it out of its tin. 4/10

Filling: The worst part of this pie, by a massive margin. The mince was almost a purée, with no discernible fruit textures. Very lightly spiced, again with no flavour components being distinguishable. Lack of texture here made it feel like a mince pie for the elderly or small children, as after the pastry had disintegrated, there was nothing to chew. 3/10

Overall: Decadently priced farce of a pie filled with a disappointment flavoured mincemeat. Might as well have been packaged in a Londis box. As my girlfriend said "That sounds like it needs covering in cream and then putting in the bin". I wholeheartedly agree. 3/10

Bhavin Tailor, Support Engineer

4. Marks & Spencer Standard Mince Pies (red box)

(price unknown)

Pastry / Filling Ratio: Filled to the brim. Excellent! 9/10

Pastry: Buttery, right thickness, just lovely. Although a bit too much on the sweet side. 7/10

Filling: Great! Nice flavours, still some texture, nothing overwhelms the other flavours. Just wonderful. 8/10

Overall: Great pastry, could eat it all day. 8/10

Tom Strickx, Network Automation Engineer

5. Marks & Spencer Extra Special Mince Pies

£0.33p per pie

Pastry / Filling Ratio: Massive air gap. Probably has more air than mince, which is very disappointing. 5/10

Pastry: Buttery, but a bit too thick. Feels a bit heavy. 7/10

Filling: Very noticeable brandy smell, luckily not as present in the taste. Subtle brandy flavour, but doesn't overwhelm the actual mince. A+ 9/10

Overall: Great flavour, love the touch of brandy, unfortunately a bit let down by the filling ratio, and the heaviness of the crust. 7/10

Tom Strickx, Network Automation Engineer

Other Entrants

While on our quest to try every pie on the market, we encountered some great ones that are worth a mention. Of aldi mince pies we've tried, a single Marks & Spencer pie asda best taste of all, but one had the lidlest per-pie cost.

Greggs

£0.25p per pie

Pastry / Filling Ratio: There's no easy way of saying this, there's more pastry here than there are Greggs branches in Coventry. The only way I would score this lower was if there was no mince at all and I was just eating a solid puck of pastry. 1/10

Pastry: Overcooked, brittle and really just miserable. I'm giving it a point only because it exists. 1/10

Filling: Sweet & Bland. 1/10

Overall: Horrible. 1/10

Simon Moore, Lead Customer Support Engineer

Bigger than the standard supermarket ones, which is definitely nice.

Pastry / Filling Ratio: I think kids these day call it "dat gap". Unfortunately in this case, it's not a good thing.

Pastry: Buttery, crumbly, good thickness, bit bland. 6/10

Filling: Bit bland as well, no specific highlights or notes of flavour. 6/10

Overall: Pretty bland, but not too shabby.

Tom Strickx, Network Automation Engineer

Costco

£0.44p per pie

Pastry/Filling Ratio: 9/10 To solve the problem of the 'air gap', Costco decided to top the mince pie with sponge cake. This technically makes the pastry/filling ratio near-perfect since the tiny amount of pastry on the outside matches the tiny amount of filling. Well played, Costco.

Pastry: 3/10 The pastry itself is acceptable but minimal - the majority of the cake is sponge. Yes, I said cake - this is no pie.

Filling: 1/10 They decided to fill it one currant high. And not even like a currant standing proud like the California Raisins, this is a teensy portion. I'm pretty sure the icing on top is thicker than the filling.

Overall: 2/10 This is not a mince pie, it's clearly a sugar cake that has some regulation-mandated minimum amount of mincemeat content to call it a mince pie. Making the 'pies' huge doesn't compensate for anything. Poor showing, Costco.

Chris Branch, Systems Engineer

Mr Kipling

£0.25p per pie (from Tesco)

Pastry / Filling Ratio: 5/10

Pastry: More salty than buttery. 4/10

Filling: Drabness cloaked in excessive sweetness. 4/10

Mr Kipling purports his products to be "exceedingly good" in his television advertisements, but this pie did not lend support to that claim.

David Wragg, Systems Engineer

Aldi (Cognac Steeped)

£0.38p per pie

Pastry / Filling Ratio: This pie has more air than Michael Jordan 4/10

Pastry: Good but far too thick on the lid. 6/10

Filling: I can detect the booze but it's just not really adding anything. There's nothing of distinction here. 5/10

Overall: 4/10 The ratio and the lid spoil what would otherwise be a serviceable mince pie.

Simon Moore, Lead Customer Support Engineer

Jimmy’s Home-made Pies

With so many mince pies moving through the office on a daily basis, one of our resident staff bakers decided to bake some of his own to add into the mix. Jimmy Crutchfield (Systems Reliability Engineer) brought in 12 lovingly made pies for us all to try...

Jimmy Alpha

Pastry / Filling Ratio: Nice and deep, but a little too moist. Surprising, given that the mincemeat was apparently shop-bought – you'd think it would have the right consistency.

Pastry: The pastry was pretty well cooked, and not too dry. I think it could do with a bit more butter though. 8/10

Filling: The added apple bits introduced some delightful new texture. 6/10

Overall: Little in the way of decoration on the top, though bonus points for the lovingly home-made look. Pretty excited about the next version. 9/10

Tom Arnfeld, Systems Engineer

A couple of weeks later, Jimmy tried his hand at making a second batch, too!

Pastry / Filling Ratio: Small air gap. 8/10

Pastry: Buttery & crumbly, very good. 9/10

Filling: Just a little too tart. 8/10

Overall: I'd be happy if I'd paid for a box of them. 9/10

Michael Daly, Systems Reliability Engineering Manager

Falling by the wayside

There are too many pies and reviews to mention in full detail, but here’s a full list of the other pies we haven’t mentioned, sorted by their rating.

  • Sainsbury’s Bakery (fresh)
  • Carluccio’s
  • Coco di Mama Mini Mince Pies
  • Riverford Farm Shop Classic
  • Tesco Standard
  • Marks & Spencer Lattice-top
  • Marks & Spencer All Butter
  • Tesco Finest (with Cognac)
  • Aldi Sloe Gin Mince Tarts
  • Waitrose All Butter
  • Gail’s
  • LIDL Favorina
  • Co-op Irresistible
  • Sainsbury’s Deep Filled
  • LIDL Brandy Butter
  • Aldi Almond Mince Tarts

With so many reviews from so many staff, we’d like to thank everyone that took part in our quest! Alex Palaistras, Bhavin Tailor, Chris Branch, David Kitchen, David Wragg, Etienne Labaume, John Graham-Cumming, Jimmy Crutchfield, Lorenz Bauer, Matthew Bullock, Michael Daly, Sam Howson, Scott Pearson, Simon Moore, Sophie Bush, Tim Ruffles, Tom Arnfeld, Tom Strickx.

If you want to join a passionate, dedicated, talented and mince pie-filled team - we’re hiring!

The end of the road for Server: cloudflare-nginx


Six years ago when I joined Cloudflare the company had a capital F, about 20 employees, and a software stack that was mostly NGINX, PHP and PowerDNS (there was even a little Apache). Today, things are quite different.

CC BY-SA 2.0 image by Randy Merrill

The F got lowercased, there are now more than 500 people and the software stack has changed radically. PowerDNS is gone and has been replaced with our own DNS server, RRDNS, written in Go. The PHP code that used to handle the business logic of dealing with our customers’ HTTP requests is now Lua code, Apache is long gone and new technologies like Railgun, Warp, Argo and Tiered Cache have been added to our ‘edge’ stack.

And yet our servers still identify themselves in HTTP responses with

Server: cloudflare-nginx

Of course, NGINX is still a part of our stack, but the code that handles HTTP requests goes well beyond the capabilities of NGINX alone. It’s also not hard to imagine a time where the role of NGINX diminishes further. We currently run four instances of NGINX on each edge machine (one for SSL, one for non-SSL, one for caching and one for connections between data centers). We used to have a fifth, but it’s been deprecated, and we are planning to merge the SSL and non-SSL instances.

As we have done with other bits of software (such as the KyotoTycoon distributed key-value store or PowerDNS) we’re quite likely to write our own caching or web serving code at some point. The time may come when we no longer use NGINX for caching, for example. And so, now is a good time to switch away from Server: cloudflare-nginx.

We like to write our own when the cost of customizing or configuring existing open source software becomes too high. For example, we switched away from PowerDNS because it was becoming too complicated to implement all the logic we need for the services we provide.

Over the next month we will be transitioning to simply:

Server: cloudflare

If you have software that looks for cloudflare-nginx in the Server header, it’s time to update it.

We’ve worked closely with companies that rely on the Server header to determine whether a website, application or API uses Cloudflare, so that their software or services are updated in time. We’ll be rolling out this change in stages between December 18, 2017 and January 15, 2018. Between those dates, Cloudflare-powered HTTP responses may contain either Server: cloudflare-nginx or Server: cloudflare.
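
If your own tooling does this kind of check, a tolerant test that accepts both the old and the new value might look like the following sketch (it assumes the Python requests library, and the URL is just a placeholder):

# Sketch: detect a Cloudflare-served response without pinning the exact
# "cloudflare-nginx" string. Assumes the requests library is installed;
# the URL is a placeholder.
import requests

def served_by_cloudflare(url):
    response = requests.get(url, timeout=10)
    server = response.headers.get("Server", "").lower()
    # Matches both "cloudflare-nginx" (old) and "cloudflare" (new).
    return server.startswith("cloudflare")

print(served_by_cloudflare("https://www.example.com/"))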

McAllen, Texas: Cloudflare opens 119th Data Center just north of the Mexico border


Five key facts to know about McAllen, Texas

  • McAllen, Texas is on the southern tip of the Rio Grande Valley
  • The city is named after John McAllen, who provided land in 1904 to bring the St. Louis, Brownsville & Mexico Railway into the area
  • McAllen, Texas is known as the City of Palms
  • The border between Mexico and the USA is less than nine miles away from the data center
  • McAllen, Texas is where Cloudflare has placed its 119th data center

Second datacenter in Texas; first on the border with Mexico

While McAllen is close to the Mexican border, its importance goes well beyond that simple fact. The city is halfway between Dallas, Texas (where Cloudflare has an existing datacenter) and Mexico City, the capital and center of Mexico. This means that any Cloudflare traffic delivered into Mexico is better served from McAllen. Removing 500 miles from the latency equation is a good thing: 500 miles equates to around 12 milliseconds of round-trip latency, and when a connection operates as a secure connection (as all connections should), there can be many round trips before the first page starts showing up. Improving latency is key, even in a 0-RTT environment.

Image courtesy of gcmap service

However, it gets better! A significant amount of Mexican Cloudflare traffic is delivered to ISPs and telcos that are just south of the Mexican border, so McAllen improves their performance even more. Cloudflare chose the McAllen Data Center in order to provide those ISPs and telcos with a local interconnect point.

Talking of interconnection - what’s needed is a solid IXP footprint

As astute readers of the Cloudflare blog know, the Cloudflare network interconnects to a large number of Internet Exchanges globally. Why should that be any different in McAllen, Texas? It’s not. As of last week, there is a brand new Internet Exchange (IX) in McAllen, Texas.

Image, with permission, from Joel Pacheco’s Facebook page

MEX-IX is that new IX and it provides a whole new way to interconnect with Mexican carriers, many of which are present in McAllen already. Cloudflare will enable peering on that IX as quickly as we can.

Next up, we go south!

Cloudflare has plenty of datacenter presence in South America; however, Panama hosts the only datacenter we operate within Central America. That means we still have work to do in Belize, Costa Rica, El Salvador, Guatemala, Honduras, and Nicaragua.

But there’s one more place we need to deploy into in order to move the Mexican story forward and that’s Mexico City. More about that in a later blog.

Cloudflare will continue to build new datacenters, including the ones south of the border, and ones around the globe. If you enjoy the idea of helping build one of the world's largest networks, come join our team!

The FCC Wants to Kill Net Neutrality - Use Battle for the Net on Cloudflare Apps to Fight Back


TL;DR - Net neutrality is under attack. There's an app on Cloudflare Apps that empowers site owners to host a popup on their sites, encouraging users to contact their congresspeople to fight back. Everyone should be doing this right now, before the December 14th FCC vote.

Use Battle for the Net to Call your Congressperson »

Attend Cloudflare's Save the Internet! Net Neutrality Call-A-Thon »

The Federal Communications Commission (FCC) has scheduled a vote to kill its net neutrality rules this Thursday, December 14th. Unfortunately, the expectation is that the FCC will vote to repeal its net neutrality rules. Read about this on Business Insider, Bloomberg, or TechCrunch.

Net neutrality is the principle that networks should not discriminate against content that passes through them. The FCC’s net neutrality rules protect the Internet, users, and companies from abusive behavior by the largest Internet Service Providers (ISPs). Without net neutrality rules in place, ISPs may be able to legally create a "pay to play" system and charge websites to provide content to their customers more quickly. This will create a disadvantage for startups, bloggers, and everyone else who cannot afford to pay fees for their websites to offer faster service.

Cloudflare founders and employees strongly believe in the principle of network neutrality. Cloudflare co-founder and COO Michelle Zatlyn sat on the FCC's Open Internet Advisory Committee, which guided the FCC to vote to preserve net neutrality in 2015. Cloudflare co-founder and CEO Matthew Prince and other employees have written four blog posts (1, 2, 3, 4) describing Cloudflare’s views on net neutrality.

I am extremely disappointed that net neutrality is under threat. I am extremely grateful that I work at a company made of people who are fighting for it and I'm hopeful our community (you) can make a difference.

Read, watch, listen, and learn much more about net neutrality and its importance below.

For now, here is my favorite video explanation of net neutrality, by John Oliver, host of Last Week Tonight on HBO.


Battle for the Net

Because net neutrality is under attack, the Battle for the Net app is once again live on Cloudflare Apps. The app can be used to “Break the Internet” for the two days before the FCC’s vote, as part of an internet-wide protest.

This app allows site owners to add a pop-up to their sites that will directly connect users to their respective US congresspeople so they may articulate their stance for net neutrality.


On a site that uses the Battle for the Net app, users are greeted with a pop-up which briefly explains that net neutrality is under attack, displays a countdown to the day and time (Thursday, December 14th) the FCC will vote to kill the net neutrality rules, and provides an entry field for the user to enter their phone number.


When a user enters their phone number and clicks the "CALL CONGRESS" button, they'll immediately receive an automated phone call from Battle for the Net. The recording instructs the user to enter their zip code, so they'll be connected to their specific congressperson.

Users may select the option to become a daily caller by pressing 1. This will initiate a process where users will receive calls at the same time each day, connecting them to their congresspeople.

To make one-time calls, users can just stay on the line. The recording delivers a recommended script to inform the congressperson on the line that the user supports net neutrality and wants the congressperson to contact FCC Chairman Pai and oppose the repeal.

Here's the written script:

Be polite, introduce yourself, and say: “I support the existing Title II net neutrality rules and I would like you to publicly oppose the FCC’s plan to repeal them.”

When done with the first call, users may press * to be directed to another call to their next congressperson.

I live in San Francisco, so my first call was directed to Representative Nancy Pelosi. My second call was directed to Senator Dianne Feinstein.


On Cloudflare Apps, you can preview Battle for the Net and see how it'd look on a site. Cloudflare users can install the app on their site with the click of a button. You can see how a user can enter a phone number into the pop-up. Users are given share links on their screens, so they may share the action they take on Facebook or Twitter. They are also given the option to donate to the cause.

Though the pop-up covers a significant portion of the page, it can be easily discarded by clicking the "x" in the upper right corner.

Use Battle for the Net to Call your Congressperson »

Learn more about Battle for the Net and the Break the Internet protest on their site.


Are you in San Francisco? Attend Cloudflare’s Save the Internet! Net Neutrality Call-A-Thon

Join us at Cloudflare (or remotely) in the fight for net neutrality

The event will kick off with an introduction to net neutrality and why we think it's important. We'll preview the Battle for the Net app and use it to find our local representatives and call and tweet at them, letting them know we want them to take a stand for net neutrality.

Pizza will be provided for callers. Bring your own cell phone.

Tuesday, December 12th: 12:00pm-1:00pm

Cloudflare San Francisco
101 Townsend Street
San Francisco, CA 94107

Register Here »

If you can't make it on-site, join the effort by calling through the app, remotely.

Further reading about net neutrality

Key moments for Net Neutrality

  • The 1990s: The issue of net neutrality has been discussed between users and service providers since the early days of the Internet in the 1990s. There were no clear legal protections requiring net neutrality until 2015.

  • 2015: The FCC classified broadband as a Title II communication service with providers being "common carriers", not "information providers", under former Chairman Tom Wheeler. Adoption of this notion would reclassify Internet service from one of information to one of telecommunications and ensure net neutrality. Wheeler stated, "This is no more a plan to regulate the Internet than the First Amendment is a plan to regulate free speech. They both stand for the same concept." Read more about the 2015 rules in this Newsweek article.

  • January, 2017: Ajit Pai was named the new Chairman of the FCC by President Trump. Pai opposed the 2015 Open Internet Order, which protects and promotes the open Internet, and stated that he planned to modernize FCC policies. Read about Pai's October Senate confirmation in this TechCrunch article.

  • April/May, 2017: Pai proposed that the net neutrality rules be rolled back and that service providers instead voluntarily commit to net neutrality principles. On May 18, 2017, the FCC voted to move forward with the proposal. Over 1,000 companies and investors signed an open letter opposing the proposal, and millions of public pro-net-neutrality comments were submitted on the FCC website. Read more about the vote in this CNN article.

  • June/July, 2017: The Battle for the Net coalition created a Day of Action during which over 50,000 websites participated in the largest online protest in history. Internet companies, CEOs, politicians, and users spoke out in support of net neutrality.

  • Now: It's time to make a statement again. Use Battle for the Net to contact your congressperson and find other ways to make a statement to protect net neutrality.

Here are five more great videos, explaining net neutrality and why it's important.

  1. Video created by Elisa Solinas, Senior Creative at Vimeo, explaining why net neutrality is so important for video makers, viewers, and all-around Internet video lovers.

  2. Video of Julia Reda, a Member of the European (EU) Parliament from Germany, discussing the importance of net neutrality and new EU legislation designed to reinforce the principle.

  3. Video of Bernie Sanders, US Senator for Vermont, explaining how it's imperative that we have net neutrality.

  4. Video of FCC Commissioner Mignon Clyburn speaking about how "net neutrality is doomed if we're silent."

  5. Video of John Oliver, host of Last Week Tonight on HBO, giving a second commentary on net neutrality.

The argument against net neutrality

Watch or read an interview with Ajit Pai by PBS about why he wants to do away with net neutrality.

More about the FCC

  • The FCC is a 5-member Commission, made up of three Republicans, including Pai, and two Democrats, including Clyburn.

  • Its purpose is to regulate interstate and international communications by radio, television, wire, satellite, and cable in all 50 states, the District of Columbia and U.S. territories.

  • It's an independent agency of the US government, overseen by Congress. The Commission is responsible for implementing and enforcing America’s communications law and regulations. It has over 1,700 employees and a budget of almost $400 Million.

Read more on the FCC website.

Read more on the FCC Wikipedia page.

Conclusion

I was encouraged, for a moment, when I read an article, claiming that the FCC would not be voting on the repeal of net neutrality in November. But now net neutrality is under attack again, just one month later. We need to fight for it. Join the fight.

Call your Congressperson »

Attend Cloudflare's Save the Internet! Net Neutrality Call-A-Thon »

Why Some Phishing Emails Are Mysteriously Disappearing


Phishing is the absolute worst.

Unfortunately, sometimes phishing campaigns use Cloudflare for the very convenient, free DNS. To be clear: there’s a difference between a compromised server being leveraged to send phishing emails and an intentionally malicious website dedicated to this type of activity. The latter clearly violates our terms of service.

In the past, our Trust and Safety team would kick these intentional phishers off the platform, but now we have a new trick up our sleeve and a way for their malicious emails to mysteriously disappear into the ether.

Background: How Email Works

SMTP - the protocol used for sending email - was finalized in 1982, when the online community was still small. Many of its members knew and trusted each other, and so the protocol was built entirely on trust. In an SMTP message, the MAIL FROM field can be arbitrarily defined. That means you could send an email from any email address, even one you don’t own.

This is great for phishers, and bad for everyone else.

The solution to prevent email spoofing was to create the Sender Policy Framework (SPF). SPF allows the domain owner to specify which servers are allowed to send email from that domain. That policy is stored in a DNS TXT record like this one from cloudflare.com:

$ dig cloudflare.com txt
"v=spf1 ip4:199.15.212.0/22 ip4:173.245.48.0/20 include:_spf.google.com include:spf1.mcsv.net include:spf.mandrillapp.com include:mail.zendesk.com include:customeriomail.com include:stspg-customer.com -all"

This says that email clients should only accept cloudflare.com emails if they come from an IP in the ranges 199.15.212.0/22 or 173.245.48.0/20, or in one of the IP ranges found in the SPF records of the other domains listed. So if a receiving email server gets an email from someone@cloudflare.com sent by the server at 185.12.80.67, it checks the SPF records of all the allowed domains until it finds that 185.12.80.67 is allowed, because 185.12.80.0/22 is listed in mail.zendesk.com’s SPF record:

$ dig txt mail.zendesk.com
"v=spf1 ip4:192.161.144.0/20 ip4:185.12.80.0/22 ip4:96.46.150.192/27 ip4:174.137.46.0/24 ip4:188.172.128.0/20 ip4:216.198.0.0/18 ~all"

Additional methods for securing email were created after SPF. SPF only validates the email sender but doesn’t do anything about verifying the content of the email. (While SMTP can be sent over an encrypted connection, SMTP is notoriously easy to downgrade to plaintext with a man in the middle attack.)

To verify the content, domain owners can sign email messages using DKIM. The email sender includes the message signature in an email header called DKIM-Signature and stores the key in a DNS TXT record.

$ dig txt smtpapi._domainkey.cloudflare.com
"k=rsa\; t=s\; p=MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQDPtW5iwpXVPiH5FzJ7Nrl8USzuY9zqqzjE0D1r04xDN6qwziDnmgcFNNfMewVKN2D1O+2J9N14hRprzByFwfQW76yojh54Xu3uSbQ3JP0A7k8o8GutRF8zbFUA8n0ZH2y0cIEjMliXY4W4LwPA7m4q0ObmvSjhd63O9d8z1XkUBwIDAQAB"

There’s one more mechanism for controlling email spoofing: DMARC. DMARC sets the overarching email policy, indicates what to do if the policies are not met, and sets a reporting email address for logging invalid mail attempts. Cloudflare’s DMARC record says that non-complying emails should be sent to junk mail (quarantine), that 100% of messages are subject to filtering, and that reports should be sent to the two email addresses below.

$ dig txt _dmarc.cloudflare.com
"v=DMARC1\; p=quarantine\; pct=100\; rua=mailto:rua@cloudflare.com, mailto:gjqhulld@ag.dmarcian.com"

When an email server receives an email from someone@cloudflare.com, it first checks SPF, DKIM and DMARC records to know whether the email is valid, and if not, how to route it.

Stopping Phishy Behavior

For known phishing campaigns using the Cloudflare platform for evil, we have a DNS trick for getting their phishing campaigns to stop. If you remember, there are three DNS records required for sending email: SPF, DKIM and DMARC. The last one is the one that defines the overarching email policy for the domain.

What we do is rewrite the DMARC record so that the overarching email policy instructs email clients to reject all emails from that sender. We also remove the other DNS record types used for sending email.

"v=DMARC1; p=reject"

When an email client receives a phishing email, the corresponding DNS records instruct the client not to accept the email and the phishing email is not delivered.
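
To make the idea concrete, here is a simplified sketch of the rewrite logic (not Cloudflare's actual code, and the stored record values below are invented for illustration): at answer time, email-related TXT queries for a flagged zone get an empty or overridden response, while everything else is served exactly as stored.

# Simplified sketch of on-the-fly answering for a zone flagged as a phishing source.
# Not Cloudflare's actual implementation; the stored records below are made up.

FLAGGED_ZONES = {"astronautrentals.com"}

def answer_txt_query(qname, zone, stored_records):
    qname = qname.rstrip(".")
    if zone in FLAGGED_ZONES:
        if qname == f"_dmarc.{zone}":
            # Override the stored DMARC policy: tell receivers to reject everything.
            return ['"v=DMARC1; p=reject"']
        if qname == zone or "_domainkey" in qname:
            # SPF lives at the zone apex and DKIM keys live under _domainkey: answer empty.
            return []
    # Any other record (or any unflagged zone) is served exactly as stored.
    return stored_records.get(qname, [])

stored = {
    "astronautrentals.com": ['"v=spf1 ip4:203.0.113.0/24 -all"'],
    "_dmarc.astronautrentals.com": ['"v=DMARC1; p=none"'],
}
print(answer_txt_query("_dmarc.astronautrentals.com", "astronautrentals.com", stored))
# -> ['"v=DMARC1; p=reject"']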

You can see it in action on our fake phish domain, astronautrentals.com.

astronautrentals.com is configured with an SPF record, a DKIM record, and a DMARC record with a policy to accept all email.


However, because it is a known (fake) phishing domain, when you query DNS for these records, SPF will be missing:

$ dig astronautrentals.com txt
astronautrentals.com.    3600    IN  SOA art.ns.cloudflare.com. dns.cloudflare.com. 2026351035 10000 2400 604800 3600

DKIM will be missing:

$ dig _domainkey.astronautrentals.com txt
astronautrentals.com.    3600    IN  SOA art.ns.cloudflare.com. dns.cloudflare.com. 2026351035 10000 2400 604800 3600

And DMARC policy will be rewritten to reject all emails:

$ dig _dmarc.astronautrentals.com txt
"v=DMARC1\; p=reject"

If we try to send an email from @astronautrentals.com, the email never reaches the recipient because the receiving client sees the DMARC policy and rejects the email.

This DMARC alteration happens on the fly (it's a computation we do at the moment we answer the DNS query), so the original DNS records are still shown to the domain owner in the Cloudflare DNS editor. This adds some mystery to why the phish attempts are failing to send.

Using DNS To Combat Phishing

Phishing is the absolute worst, and the problem is that it sometimes succeeds. Last year Verizon reported that 30% of phishing emails are opened, and 13% of those opened end with the receiver clicking on the phishing link.

Keeping people safe on the internet means decreasing the number of successful phishing attempts. We're glad to be able to fight phish using the DNS.

The Curious Case of Caching CSRF Tokens


It is now commonly accepted as fact that web performance is critical for business. Slower sites can affect conversion rates on e-commerce stores, they can affect your sign-up rate on your SaaS service and lower the readership of your content.

In the run-up to Thanksgiving and Black Friday, e-commerce sites turned to services like Cloudflare to help optimise their performance and withstand the traffic spikes of the shopping season.


In preparation, an e-commerce customer joined Cloudflare on the 9th November, a few weeks before the shopping season. Instead of joining via our Enterprise plan, they were a self-serve customer who signed-up by subscribing to our Business plan online and switching their nameservers over to us.

Their site was running Magento, a notably slow e-commerce platform - filled with lots of interesting PHP, with a considerable amount of soft code in XML. Running version 1.9, the platform was somewhat outdated (Magento was totally rewritten in version 2.0 and subsequent releases).

Despite the somewhat dated technology, the e-commerce site was "good enough" for this customer and had done its job for many years.

They were the first to notice an interesting technical issue surrounding how performance and security can often feel at odds with each other. Although they were the first to highlight this issue, in the run-up to Black Friday we ultimately saw around a dozen customers on Magento 1.8/1.9 have similar issues.

Initial Optimisations

After signing-up for Cloudflare, the site owners attempted to make some changes to ensure their site was loading quickly.

The website developers had already ensured the site was loading over HTTPS; in doing so, they were able to serve the site over the new HTTP/2 protocol, and they made some changes to ensure the site was optimised for HTTP/2 (for details, see our blog post on HTTP/2 For Web Developers).

At Cloudflare, we've taken steps to ensure that there isn't a latency overhead for establishing a secure TLS connection; we use a number of optimisations to achieve this.

Additionally, they had enabled HTTP/2 Server Push to ensure critical CSS/JS assets could be pushed to clients when they made their first request. Without Server Push, a client has to download the HTML response, interpret it and then work out assets it needs to download.

Big images were lazy loaded, only downloading when they needed to be seen by users. Additionally, they had enabled a Cloudflare feature called Polish. With this enabled, Cloudflare dynamically works out whether it's faster to serve an image in WebP (a new image format developed by Google) or whether it's faster to serve it in a different format.

These optimisations did make some improvement to performance, but their site was still slow.

Respect The TTFB

In web performance, there are a few different things which can affect the response times - I've crudely summarised them into the following three categories:

  • Connection & Request Time - Before a request can be sent off for a website to load something, a few things need to happen: DNS queries, a TCP handshake to establish the connection with the web server and a TLS handshake to establish a secure connection
  • Page Render - A dynamic site needs to query databases, call APIs, write logs, render views, etc before a response can be made to a client
  • Response Speed - Downloading the response from the web server, browser-side rendering of the HTML and pulling the other resources linked in the HTML

The e-commerce site had taken steps to improve their Response Speed by enabling HTTP/2 and performing other on-site optimisations. They had also optimised their Connection & Request Time by using a CDN service like Cloudflare to provide fast DNS and reduce latency when optimising TLS/TCP connections.

However, they now realised the critical step they needed to optimise was around the Page Render that would happen on their web server.

By looking at a Waterfall View of how their site loaded (similar to the one below) they could see the main constraint.

Example Waterfall view from WebSiteOptimization.com

On the initial request, you can see the green "Time to First Byte" view taking a very long time.

Many browsers have tools for viewing Waterfall Charts like the one above, Google provide some excellent documentation for Chrome on doing this: Get Started with Analyzing Network Performance in Chrome DevTools. You can also generate these graphs fairly easily from site speed test tools like WebPageTest.org.

Time to First Byte itself is an often misunderstood metric and often can't be attributed to a single fault. For example, using a CDN service like Cloudflare may increase TTFB by a few milliseconds, but do so to the benefit of the overall load time. This can be because the CDN is adding additional compression functionality to speed up the response, or simply because it has to establish a connection back to the origin web server (which isn't visible to the client).

There are instances where it is important to debug why TTFB is a problem. Here, the e-commerce platform was taking upwards of 3 seconds just to generate the HTML response, so it was clear the constraint was the server-side Page Render.

When the web server was generating dynamic content, it was having to query databases and perform logic before a request could be served. In most instances (e.g. a product page), the page would be identical for every request. It would only be when someone added something to their shopping cart that the site would really become dynamic.

Enabling Cookie-Based Caching

Before someone logs into the Magento admin panel or adds something to their shopping cart, the page view is anonymous and will be served up identically to every visitor. It is only when an anonymous visitor logs in or adds something to their shopping cart that they will see a page that's dynamic and unlike every other page that's been rendered.

It is therefore possible to cache those anonymous requests so that Magento on an origin server doesn't need to constantly regenerate the HTML.

Cloudflare users on our Business Plan are able to cache anonymous page views when using Magento via our Bypass Cache on Cookie functionality. This allows static HTML to be cached at our edge, with no need for it to be regenerated from request to request.

This provides a huge performance boost for the first few page visits of a visitor, and allows them still to interact with the dynamic site when they need to. Additionally, it helps keep load down on the origin server in the event of traffic spikes, sparing precious server CPU time for those who need it to complete dynamic actions like paying for an order.

Here's an example of how this can be configured in Cloudflare using the Page Rules functionality:

[Screenshot: Page Rule configuration]

The Page Rule configuration above instructs Cloudflare to "Cache Everything" (including HTML), but bypass the cache when it sees a request which contains any of the cookies external_no_cache, PHPSESSID or adminhtml. The final Edge Cache TTL setting instructs Cloudflare to keep HTML files in cache for a month; this is necessary as Magento by default uses headers to discourage caching.

The site administrator configured their site to work something like this:

  1. On the first request, the user is anonymous and their request indistinguishable from any other - their page can be served from the Cloudflare cache
  2. When the customer adds something to their shopping cart, they do that via a POST request - as methods like POST, PUT and DELETE are intended to change a resource, they bypass the Cloudflare cache
  3. On the POST request to add something to their shopping cart, Magento will set a cookie called external_no_cache
  4. As the site owner has configured Cloudflare to bypass the cache when we see a request containing the external_no_cache cookie, all subsequent requests go directly to the origin

This behaviour can be summarised in the following crude diagram:

[Diagram: cookie-based caching flow]
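If you prefer code to crude diagrams, the same decision logic can be sketched in a few lines of TypeScript. This is purely illustrative: the cookie names match the Page Rule above, but the structure is a simplification of mine, not how Cloudflare's edge actually implements Page Rules.

// Illustrative sketch only - not Cloudflare's actual Page Rules implementation.
const BYPASS_COOKIES = ["external_no_cache", "PHPSESSID", "adminhtml"];
const UNSAFE_METHODS = ["POST", "PUT", "PATCH", "DELETE"];

function shouldBypassCache(method: string, cookieHeader: string | null): boolean {
  // Methods intended to change state are never served from cache.
  if (UNSAFE_METHODS.includes(method.toUpperCase())) {
    return true;
  }
  // Any of the configured cookies marks the visitor as no longer anonymous.
  const cookieNames = (cookieHeader ?? "")
    .split(";")
    .map((cookie) => cookie.trim().split("=")[0]);
  return cookieNames.some((name) => BYPASS_COOKIES.includes(name));
}

// An anonymous GET can come from cache; a visitor with a cart cannot.
console.log(shouldBypassCache("GET", null));                                 // false
console.log(shouldBypassCache("GET", "external_no_cache=1; frontend=abc"));  // true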

The site administrators initially enabled this configuration on a subdomain for testing purposes, but noticed something rather strange. When they would add something to the cart on their test site, the cart would show up empty. If they then tried again to add something to the cart, the item would be added successfully.

The customer reported one additional, interesting piece of information - when they tried to mimic this cookie-based caching behaviour internally using Varnish, they faced the exact same issue.

In essence, the Add to Cart functionality would fail, but only on the first request. This was indeed odd behaviour, and the customer reached out to Cloudflare Support.

Debugging

The customer wrote in just as our Singapore office was finishing up its afternoon, and the ticket was initially triaged by a Support Engineer in that office.

The Support Agent evaluated what the problem was and initially identified that if the frontend cookie was missing, the Add to Cart functionality would fail.

No matter which page you access on Magento, it will attempt to set a frontend cookie, even if it doesn't add an external_no_cache cookie.

When Cloudflare caches static content, the default behaviour is to strip away any cookies coming from the server if the file is going to end up in cache - this is a security safeguard to prevent customers accidentally caching private session cookies. This applies when a cached response contains a Set-Cookie header, but does not apply when the cookie is set via JavaScript - in order to allow functionality like Google Analytics to work.

They had identified that the caching logic at our network edge was working fine, but for whatever reason Magento would refuse to add something to a shopping cart without a valid frontend cookie. Why was this?

As Singapore handed their shift work over to London, the Support Engineer working on this ticket decided to escalate it up to me. This was largely because, towards the end of last year, I had owned the re-pricing of this feature (which opened it up to our self-service Business plan users, instead of being Enterprise-only). That said, I had not touched Magento in many years; even when I was working in digital agencies I wasn't the most enthusiastic to build on it.

The Support Agent provided some internal comments that described the issue in detail and their own debugging steps, with an effective "TL;DR" summary:

[Screenshot: the Support Engineer's internal TL;DR summary]

Debugging these kinds of customer issues is not as simple as putting breakpoints into a codebase. For our Support Engineers, the customer's origin server often acts as a black box, there can be many moving parts, and they of course have to manage the expectations of a real customer at the other end. This kind of problem-solving fun is one of the reasons I still like answering customer support tickets when I get a chance.

Before attempting to debug anything, I double-checked that the Support Agent was correct that nothing had gone wrong on our end - I trusted their judgement, and no other customers were reporting that their caching functionality had broken, but it is always best to cross-check manual debugging work. I ran some checks to ensure that there were no regressions in the Lua codebase that controls our caching logic:

  • Checked that there were no changes to this logic in our internal code repository
  • Checked that the automated tests were still in place and built successfully
  • Ran checks on production to verify that caching behaviour still worked as normal

As Cloudflare has customers across so many platforms, I also checked to ensure that there were no breaking changes in the Magento codebase that could cause this bug. Occasionally our customers accidentally come across unreported bugs in CMS platforms. This, fortunately, was not one of those instances.

The next step was to attempt to replicate the issue locally, away from the customer's site. I spun up a vanilla instance of Magento 1.9 and set it up with an identical Cloudflare configuration. The experiment was successful and I was able to replicate the customer's issue.

I had an instinctive feeling that it was the Cross Site Request Forgery protection functionality that was at fault here, and I started tweaking my own test Magento installation to see if this was the case.

Cross Site Request Forgery attacks work by exploiting the fact that one site on the internet can get a client to make requests to another site.

For example, suppose you have an online bank account with the ability to send money to other accounts. Once logged in, there is a form to send money which uses the following HTML:

<form action="https://example.com/send-money">
Account Name:
<input type="text" name="account_name" />
Amount:
<input type="text" name="amount" />
<input type="submit" />
</form>

After logging in and doing your transactions, you don't log-out of the website - but you simply navigate elsewhere online. Whilst browsing around you come across a button on a website that contains the text "Click me! Why not?". You click the button, and £10,000 goes from your bank account to mine.

This happens because the button you clicked was connected to an endpoint on the banking website, and contained hidden fields instructing it to send me £10,000 of your cash:

<form action="https://example.com/send-money">
<input type="hidden" name="account_name" value="Junade Ali" />
<input type="hidden" name="amount" value="10,000" />
<input type="submit" value="Click me! Why not?" />
</form>

In order to prevent these attacks, CSRF Tokens are inserted as hidden fields into web forms:

<form action="https://example.com/send-money">
Account Name:
<input type="text" name="account_name" />
Amount:
<input type="text" name="amount" />
<input type="hidden" name="csrf_protection" value="hunter2" />
<input type="submit" />
</form>

A cookie containing a random session identifier is first set on the client's computer. When a form is served to the client, a CSRF token is generated from that cookie. The server then checks that the CSRF token submitted in the HTML form actually matches the session cookie, and if it doesn't, blocks the request.
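As a concrete illustration, here is a minimal TypeScript sketch of that flow using an HMAC over the session identifier. The helper names and the use of Node's crypto module are my own choices for the example; Magento's real implementation differs in detail, but the principle of tying the token to the session cookie is the same.

import { createHmac, timingSafeEqual } from "crypto";

// Illustrative only - not Magento's actual CSRF implementation.
const SERVER_SECRET = "replace-with-a-real-secret";

// Derive a CSRF token from the visitor's session identifier (the cookie value).
function csrfTokenFor(sessionId: string): string {
  return createHmac("sha256", SERVER_SECRET).update(sessionId).digest("hex");
}

// On form submission, recompute the token from the session cookie and compare.
function isValidCsrfToken(sessionId: string | undefined, submitted: string): boolean {
  if (!sessionId) {
    // No session cookie (e.g. it was stripped before the page entered cache):
    // the token can never validate, which is exactly the failure described above.
    return false;
  }
  const expected = Buffer.from(csrfTokenFor(sessionId));
  const received = Buffer.from(submitted);
  return expected.length === received.length && timingSafeEqual(expected, received);
}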

In this instance, as there was no session cookie ever set (Cloudflare would strip it out before it entered cache), the POST request to the Add to Cart functionality could never verify the CSRF token and the request would fail.

Due to CSRF vulnerabilities, Magento applied CSRF protection to all forms; this broke Full Page Cache implementations in Magento 1.8.x/1.9.x. You can find all the details in the SUPEE-6285 patch documentation from Magento.

Caching Content with CSRF Protected Forms

To validate that CSRF Tokens were definitely at fault here, I completely disabled CSRF Protection in Magento. Obviously you should never do this in production; I found it slightly odd that there was even a UI toggle for it!

Another method, created in the Magento community, was an extension to disable CSRF Protection just for the Add to Cart functionality (Inovarti_FixAddToCartMage18), on the argument that CSRF risks are far reduced when we're talking about Add to Cart functionality. This is still not ideal; we should have CSRF Protection on every form that performs an action which changes site state.

There is, however, a third way. I did some digging and identified a Magento plugin that effectively uses JavaScript to inject a dynamic CSRF token the moment a user clicks the Add to Cart button, just before the request is actually submitted. There's quite a lengthy GitHub thread which outlines this issue and references the Pull Requests which fixed this behaviour in the Magento Turpentine plugin. I won't repeat the set-up instructions here, but they can be found in an article I've written on the Cloudflare Knowledge Base: Caching Static HTML with Magento (version 1 & 2).

Effectively what happens here is that the dynamic CSRF token is only injected into the web page the moment it's needed. This is the behaviour implemented in other e-commerce platforms and in Magento 2.0+, allowing Full Page Caching to be implemented quite easily. We had to recommend this plugin as it wouldn't be practical for the site owner to simply update to Magento 2.
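The browser-side half of that approach looks roughly like the following sketch. The /ajax/csrf-token endpoint, element id and field name are hypothetical placeholders of mine, not the actual paths used by the Turpentine plugin; the point is simply that the token is fetched and injected only when the visitor actually submits the form.

// Illustrative sketch: fetch a fresh CSRF token just before the form is submitted.
// The endpoint "/ajax/csrf-token" and the element id are hypothetical.
async function submitAddToCart(form: HTMLFormElement): Promise<void> {
  const response = await fetch("/ajax/csrf-token", { credentials: "same-origin" });
  const { token } = await response.json();

  // Inject the dynamic token into the (cached, otherwise static) form.
  const field = document.createElement("input");
  field.type = "hidden";
  field.name = "form_key"; // Magento commonly calls this field "form_key"
  field.value = token;
  form.appendChild(field);

  form.submit(); // The POST now carries a token tied to this visitor's session
}

document.querySelector<HTMLFormElement>("#add-to-cart-form")
  ?.addEventListener("submit", (event) => {
    event.preventDefault();
    void submitAddToCart(event.target as HTMLFormElement);
  });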

One thing to be wary of when exposing CSRF tokens via an AJAX endpoint is JSON Hijacking. There are some tips on how you can prevent this in the OWASP AJAX Security Cheat Sheet. Iain Collins has a Medium post with further discussion on the security merits of CSRF Tokens via AJAX (that said, however you're performing CSRF prevention, Same Origin Policies and HTTPOnly cookies FTW!).

There is an even cooler way you can do this using Cloudflare's Edge Workers offering. Soon this will allow you to run JavaScript at our Edge network, and you can use that to dynamically insert CSRF tokens into cached content (and then perform cryptographic validation of the CSRF token either at our Edge or at the origin itself using a shared secret).
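As a rough sketch of what that could look like, here is a hypothetical Worker using the service-worker style fetch event API. The __CSRF_TOKEN__ placeholder, the random token scheme and the function names are assumptions of mine for illustration, not a published Cloudflare recipe.

// Hypothetical Worker: serve cached HTML but stamp a per-visitor CSRF token into it.
addEventListener("fetch", (event: FetchEvent) => {
  event.respondWith(handleRequest(event.request));
});

async function handleRequest(request: Request): Promise<Response> {
  const response = await fetch(request); // may be answered from cache
  const contentType = response.headers.get("Content-Type") ?? "";
  if (!contentType.includes("text/html")) {
    return response;
  }

  // Derive a per-visitor token; a real deployment would use an HMAC keyed with
  // a secret shared with the origin, so the origin can validate the token.
  const bytes = crypto.getRandomValues(new Uint8Array(16));
  const token = Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");

  const html = await response.text();
  const body = html.replace("__CSRF_TOKEN__", token); // placeholder baked into the cached page
  return new Response(body, { status: response.status, headers: response.headers });
}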

But this has been a problem since 2015?

Another interesting observation is that the Magento patch which caused this interesting behaviour had been around since July 7, 2015. Why did our Support Team only see this issue in the run-up to Black Friday in 2017? What's more, we ultimately saw around a dozen support tickets about this exact issue on Magento 1.8/1.9 over the course of 6 weeks.

When an Enterprise customer ordinarily joins Cloudflare, there is a named Solutions Engineer who gets them up and running and ensures there is no pain; however, when you sign up online with a credit card, you forgo this privilege.

Last year, we released Bypass Cache on Cookie to self-serve users when a lot of e-commerce customers were in their Christmas/New Year release freeze and not making changes to their websites. Since then, there had been no major shopping events, and most of the sites enabling this feature were newly built websites using Magento 2, where this wasn't an issue.

In the run-up to Black Friday, performance and coping under load became a key consideration for developers working on legacy e-commerce websites, and they turned to Cloudflare. Given the large but steady influx of e-commerce websites joining Cloudflare, even the low overall percentage of those on Magento 1.8/1.9 became noticeable.

Conclusion

Caching anonymous page views is an important, and in some cases essential, mechanism to dramatically improve site performance and substantially reduce origin load, especially during traffic spikes. Whilst aggressively caching content for anonymous users, you can still bypass the cache and allow users to access the dynamic functionality your site has to offer.

When you need to insert dynamic state into cached content, JavaScript offers a nice compromise. It allows us to cache HTML for anonymous page visits, but insert state when users interact in a certain way, in essence defusing the conflict between performance and security. In the future you'll be able to run this JavaScript logic at our network edge using Cloudflare Edge Workers.

It also remains important to respect the RESTful properties of HTTP: ensure GET, OPTIONS and HEAD requests remain safe, and use POST, PUT, PATCH and DELETE for actions which change state.

If you're interested in debugging interesting technical problems on a network that sees around 10% of global internet traffic, we're hiring for Support Engineers in San Francisco, London, Austin and Singapore.

There’s Always Cache in the Banana Stand


We’re happy to announce that we now support all HTTP Cache-Control response directives. This puts powerful control in the hands of you, the people running origin servers around the world. We believe we have the strongest support for Internet standard cache-control directives of any large scale cache on the Internet.

Documentation on Cache-Control is available here.

Cloudflare runs a Content Distribution Network (CDN) across our globally distributed network edge. Our CDN works by caching our customers’ web content at over 119 data centers around the world and serving that content to the visitors nearest to each of our network locations. In turn, our customers’ websites and applications are much faster, more available, and more secure for their end users.

A CDN’s fundamental working principle is simple: storing stuff closer to where it’s needed means it will get to its ultimate destination faster. And, serving something from more places means it’s more reliably available.


To use a simple banana analogy: say you want a banana. You go to your local fruit stand to pick up a bunch to feed your inner monkey. You expect the store to have bananas in stock, which would satisfy your request instantly. But, what if they’re out of stock? Or what if all of the bananas are old and stale? Then, the store might need to place an order with the banana warehouse. That order might take some time to fill, time you would spend waiting in the store for the banana delivery to arrive. But you don’t want bananas that badly; you’ll probably just walk out and figure out some other way to get your tropical fix.

Now, what if we think about the same scenario in the context of an Internet request? Instead of bananas, you are interested in the latest banana meme. You go to bananameme.com, which sits behind Cloudflare’s edge network, and you get served your meme faster!

Of course, there’s a catch. A CDN in-between your server (the “origin” of your content) and your visitor (the “eyeball” in network engineer slang) might cache content that is out-of-date or incorrect. There are two ways to manage this:

1) the origin should give the best instructions it can on when to treat content as stale.

2) the origin can tell the edge when it has made a change to content that makes content stale.

Cache-Control headers allow servers and administrators to give explicit instructions to the edge on how to handle content.

Challenges of Storing Ephemeral Content (or: No Stale Bananas)

When using an edge cache like Cloudflare in-between your origin and visitors, the origin server no longer has direct control over the cached assets being served. Internet standards allow for the origin to emit Cache-Control headers with each response it serves. These headers give intermediate and browser caches fine-grained instruction over how content should be cached.

The current RFC covering these directives (and HTTP caching in general) is RFC 7234. It’s worth a skim if you’re into this kind of stuff. The relevant section on Response Cache-Control is laid out in section 5.2.2 of that document. In addition, some interesting extensions to the core directives were defined in RFC 5861, covering how caches should behave when origins are unreachable or in the process of being revalidated against.

To put this in terms of bananas:

George Michael sells bananas at a small stand. He receives a shipment of bananas for resale from Anthony’s Banana Company (ABC) on Monday. Anthony’s Banana Company serves as the origin for bananas for stores spread across the country. ABC is keenly interested in protecting their brand; they want people to associate them with only the freshest, perfectly ripe bananas with no stale or spoiled fruit to their name.

To ensure freshness, ABC provides explicit instructions to its vendors and eaters of its bananas. Bananas can’t be held longer than 3 days before sale to prevent overripening/staleness. Past 3 days, if a customer tries to buy a banana, George Michael must call ABC to revalidate that the bananas are fresh. If ABC can’t be reached, the bananas must not be sold.

To put this in terms of banana meme SVGs:

Kari uses Cloudflare to cache banana meme SVGs at edge locations around the world to reduce visitor latency. Banana memes should only be cached for up to 3 days to prevent the memes from going stale. Past 3 days, if a visitor requests https://bananameme.com/, Cloudflare must make a revalidation request to the bananameme.com origin. If the request to origin fails, Cloudflare must serve the visitor an error page instead of their zesty meme.

If only ABC and Kari had strong support for Cache-Control response headers!

If they did, they could serve their banana related assets with the following header:

Cache-Control: public, max-age=259200, proxy-revalidate

Public means this banana is allowed to be served from an edge cache. Max-age=259200 means it can stay in cache for up to 3 days (3 days * 24 hours * 60 minutes * 60 seconds = 259200). Proxy-revalidate means the edge cache must revalidate the content with the origin when that expiration time is up, no exceptions.
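To make this concrete, here is a minimal Node/TypeScript sketch of an origin that serves its banana assets with exactly that header. The port, content type and SVG body are arbitrary choices for the example, not anything prescribed by Cloudflare.

import { createServer } from "http";

// Minimal illustrative origin: everything it serves may be cached at the edge
// for 3 days, after which the edge must revalidate with this server.
const CACHE_CONTROL = "public, max-age=259200, proxy-revalidate";

createServer((req, res) => {
  res.setHeader("Cache-Control", CACHE_CONTROL);
  res.setHeader("Content-Type", "image/svg+xml");
  res.end("<svg xmlns='http://www.w3.org/2000/svg'><!-- zesty banana meme --></svg>");
}).listen(8080, () => {
  console.log("banana meme origin listening on :8080");
});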

For a full list of supported directives and a lot more examples (but no more bananas), check out the documentation in our Help Center.


Inside the infamous Mirai IoT Botnet: A Retrospective Analysis


This is a guest post by Elie Bursztein who writes about security and anti-abuse research. It was first published on his blog and has been lightly edited.

This post provides a retrospective analysis of Mirai — the infamous Internet-of-Things botnet that took down major websites via massive distributed denial-of-service using hundreds of thousands of compromised Internet-Of-Things devices. This research was conducted by a team of researchers from Cloudflare (Jaime Cochran, Nick Sullivan), Georgia Tech, Google, Akamai, the University of Illinois, the University of Michigan, and Merit Network and resulted in a paper published at USENIX Security 2017.


At its peak in September 2016, Mirai temporarily crippled several high-profile services such as OVH, Dyn, and Krebs on Security via massive distributed denial of service (DDoS) attacks. OVH reported that these attacks exceeded 1 Tbps, the largest on public record.

What’s remarkable about these record-breaking attacks is they were carried out via small, innocuous Internet-of-Things (IoT) devices like home routers, air-quality monitors, and personal surveillance cameras. At its peak, Mirai infected over 600,000 vulnerable IoT devices, according to our measurements.

[Figure: timeline of major Mirai events - this blog post follows this timeline]

  • Mirai Genesis: Discusses Mirai’s early days and provides a brief technical overview of how Mirai works and propagates.
  • Krebs on Security attack: Recounts how Mirai briefly silenced Brian Krebs website.
  • OVH DDoS attack: Examines the Mirai author’s attempt to take down one of the world’s largest hosting providers.
  • The rise of copycats: Covers the Mirai code release and how multiple hacking groups ended up reusing the code. This section also describes the techniques we used to track down the many variants of Mirai that arose after the release. Finally, it discusses the targets and the motive behind each major variant.
  • Mirai's takedown of the Internet: Tells the insider story behind the Dyn attack, including the fact that the major sites (e.g., Amazon) taken down were just massive collateral damage.
  • Mirai’s attempted takedown of an entire country: Looks at the multiple attacks carried out against Lonestar, Liberia’s largest operator.
  • Deutsche Telekom goes dark: Discusses how the addition of a router exploit to one of the Mirai variants brought a major German Internet provider to its knees.
  • Mirai original author outed?: Details Brian Krebs’ in-depth investigation into uncovering Mirai’s author.
  • Deutsche Telekom attacker arrested: Recounts the arrest of the hacker who took down Deutsche Telekom and what we learned from his trial.

Mirai Genesis


The first public report of Mirai in late August 2016 generated little notice, and Mirai mostly remained in the shadows until mid-September. At that time, it was propelled into the spotlight when it was used to carry out massive DDoS attacks against Krebs on Security, the blog of a well-known security journalist, and OVH, one of the largest web hosting providers in the world.

[Figure: Mirai's early growth, as seen in our telnet scan telemetry]

While the world did not learn about Mirai until the end of August, our telemetry reveals that it became active on August 1st, when the infection started out from a single bulletproof hosting IP. From there, Mirai spread quickly, doubling in size every 76 minutes in those early hours.

By the end of its first day, Mirai had infected over 65,000 IoT devices. By its second day, Mirai already accounted for half of all Internet telnet scans observed by our collective set of honeypots, as shown in the figure above. At its peak in November 2016, Mirai had infected over 600,000 IoT devices.

[Chart: device types identified from service banners]

Retroactively looking at the infected devices' service banners using Censys' Internet-wide scanning reveals that most of the devices appear to be routers and cameras, as reported in the chart above. Each type of banner is counted separately because the identification process differed for each, so a given device might be counted multiple times. Mirai also actively removed banner identification, which partially explains why we were unable to identify most of the devices.

Before delving further into Mirai’s story, let’s briefly look at how Mirai works, specifically how it propagates and its offensive capabilities.

How Mirai works

At its core, Mirai is a self-propagating worm, that is, it’s a malicious program that replicates itself by finding, attacking and infecting vulnerable IoT devices. It is also considered a botnet because the infected devices are controlled via a central set of command and control (C&C) servers. These servers tell the infected devices which sites to attack next. Overall, Mirai is made of two key components: a replication module and an attack module.

Replication module

[Diagram: Mirai's replication module]

The replication module is responsible for growing the botnet size by enslaving as many vulnerable IoT devices as possible. It accomplishes this by (randomly) scanning the entire Internet for viable targets and attacking. Once it compromises a vulnerable device, the module reports it to the C&C servers so it can be infected with the latest Mirai payload, as the diagram above illustrates.

To compromise devices, the initial version of Mirai relied exclusively on a fixed set of 64 well-known default login/password combinations commonly used by IoT devices. While this attack was very low tech, it proved extremely effective and led to the compromise of over 600,000 devices. For more information about DDoS techniques, read this Cloudflare primer.

Attack module


The attack module is responsible for carrying out DDoS attacks against the targets specified by the C&C servers. This module implements most of the common DDoS techniques, such as HTTP flooding, UDP flooding, and all TCP flooding options. This wide range of methods allowed Mirai to perform volumetric attacks, application-layer attacks, and TCP state-exhaustion attacks.

Krebs on Security attack

[Chart: DDoS attacks against Krebs on Security, July 2012 to September 2016]

Krebs on Security is Brian Krebs’ blog. Krebs is a widely known independent journalist who specializes in cyber-crime. Given Brian’s line of work, his blog has unsurprisingly been targeted by many DDoS attacks launched by the cyber-criminals he exposes. According to his telemetry (thanks for sharing, Brian!), his blog suffered 269 DDoS attacks between July 2012 and September 2016. As seen in the chart above, the Mirai assault was by far the largest, topping out at 623 Gbps.

[Chart: geolocation of devices involved in the Krebs on Security attack]

Looking at the geolocation of the IPs that targeted Brian’s site reveals that a disproportionate number of the devices involved in the attack came from South America and South-east Asia. As reported in the chart above, Brazil, Vietnam and Colombia appear to be the main sources of compromised devices.

One dire consequence of this massive attack against Krebs was that Akamai, the CDN service that provided Brian’s DDoS protection, had to withdraw its support. This forced Brian to move his site to Project Shield. As he discussed in depth in a blog post, this incident highlights how DDoS attacks have become a common and cheap way to censor people.

OVH attack


Brian was not Mirai’s first high-profile victim. A few days before he was struck, Mirai attacked OVH, one of the largest European hosting providers. According to their official numbers, OVH hosts roughly 18 million applications for over one million clients, Wikileaks being one of their most famous and controversial.

We know little about that attack as OVH did not participate in our joint study. As a result, the best information about it comes from a blog post OVH released after the event. From this post, it seems that the attack lasted about a week and involved large, intermittent bursts of DDoS traffic that targeted one undisclosed OVH customer.


Octave Klaba, OVH’s founder, reported on Twitter that the attacks were targeting Minecraft servers. As we will see through this post, Mirai has been extensively used in gamer wars and is likely the reason why it was created in the first place.

According to OVH telemetry, the attack peaked at 1 Tbps and was carried out using 145,000 IoT devices. While the number of IoT devices is consistent with what we observed, the reported attack volume is significantly higher than what we observed in other attacks. For example, as mentioned earlier, the attack on Brian topped out at 623 Gbps.

Regardless of the exact size, the Mirai attacks are clearly the largest ever recorded. They dwarf the previous public record holder, an attack against Cloudflare that topped out at ~400 Gbps.

The rise of copycats

[Screenshot: Anna-senpai's retirement post on the hacking forum]

In an unexpected development, on September 30, 2016, Anna-senpai, Mirai’s alleged author, released the Mirai source code via an infamous hacking forum. He also wrote a forum post, shown in the screenshot above, announcing his retirement.

This code release sparked a proliferation of copycat hackers who started to run their own Mirai botnets. From that point forward, the Mirai attacks were not tied to a single actor or infrastructure but to multiple groups, which made attributing the attacks and discerning the motive behind them significantly harder.

Clustering Mirai infrastructure

To keep up with the proliferation of Mirai variants and track the various hacking groups behind them, we turned to infrastructure clustering. Reverse engineering all the Mirai versions we could find allowed us to extract the IP addresses and domains used as C&C by the various hacking groups that ran their own Mirai variants. In total, we recovered two IP addresses and 66 distinct domains.
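The grouping step itself is conceptually simple: two C&C indicators belong to the same cluster if some binary or DNS resolution links them. A much-simplified TypeScript sketch of that idea (a basic union-find over shared infrastructure; the data shape is invented for illustration and far cruder than the DNS expansion used in the actual study) might look like this:

// Simplified illustration of grouping C&C indicators that share infrastructure.
// Each observation links indicators (domains/IPs) seen together in one binary
// or one DNS resolution; connected indicators end up in the same cluster.
type Observation = string[]; // e.g. ["cnc-a.example", "203.0.113.7"]

function clusterInfrastructure(observations: Observation[]): string[][] {
  const parent = new Map<string, string>();
  const find = (x: string): string => {
    if (!parent.has(x)) parent.set(x, x);
    const p = parent.get(x)!;
    if (p === x) return x;
    const root = find(p);
    parent.set(x, root); // path compression
    return root;
  };
  const union = (a: string, b: string) => parent.set(find(a), find(b));

  for (const obs of observations) {
    for (let i = 1; i < obs.length; i++) union(obs[0], obs[i]);
  }

  const clusters = new Map<string, string[]>();
  for (const key of parent.keys()) {
    const root = find(key);
    clusters.set(root, [...(clusters.get(root) ?? []), key]);
  }
  return [...clusters.values()];
}

// Two binaries sharing an IP collapse into one cluster; the third stays separate.
console.log(clusterInfrastructure([
  ["cnc-a.example", "203.0.113.7"],
  ["cnc-b.example", "203.0.113.7"],
  ["cnc-c.example", "198.51.100.9"],
]));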

[Figure: the six largest Mirai C&C clusters]

Applying DNS expansion to the extracted domains and clustering them led us to identify 33 independent C&C clusters that had no shared infrastructure. The smallest of these clusters used a single IP as C&C; the largest sported 112 domains and 92 IP addresses. The figure above depicts the six largest clusters we found.

These top clusters used very different naming schemes for their domain names: for example, “cluster 23” favors domains related to animals such as 33kitensspecial.pw, while “cluster 1” has many domains related to e-currencies such as walletzone.ru. The existence of many distinct infrastructures with different characteristics confirms that multiple groups ran Mirai independently after the source code was leaked.

Clusters over time

Looking at how many DNS lookups were made to their respective C&C infrastructures allowed us to reconstruct the timeline of each individual cluster and estimate its relative size. This accounting is possible because each bot must regularly perform a DNS lookup to know which IP address its C&C domain resolves to.

[Chart: DNS lookups over time for the largest Mirai clusters]

The chart above reports the number of DNS lookups over time for some of the largest clusters. It highlights the fact that many were active at the same time. Having multiple variants active simultaneously once again emphasizes that multiple actors with different motives were competing to infect vulnerable IoT devices to carry out their DDoS attacks.

[Graph: number of infected IoT devices over time, per variant]

Plotting all the variants in the graph clearly shows that the number of IoT devices infected by each variant differed widely. As the graph above reveals, while there were many Mirai variants, very few succeeded at growing a botnet large enough to take down major websites.

From cluster to motive



Notable clusters

Cluster   Notes
6         Attacked Dyn and gaming related targets
1         Original botnet. Attacked Krebs and OVH
2         Attacked Lonestar Cell


Looking at which sites were targeted by the largest clusters illuminates the specific motives behind those variants. For instance, as reported in the table above, the original Mirai botnet (cluster 1) targeted OVH and Krebs, whereas Mirai’s largest instance (cluster 6) targeted DYN and other gaming-related sites. Mirai’s third largest variant (cluster 2), in contrast, went after African telecom operators, as recounted later in this post.

Target                 Attacks   Clusters                Notes
Lonestar Cell          616       2                       Liberian telecom targeted by 102 reflection attacks
Sky Network            318       15, 26, 6               Brazilian Minecraft servers hosted in Psychz Networks data centers
104.85.165.1           192       1, 2, 6, 8, 11, 15 ...  Unknown router in Akamai's network
feseli.com             157       7                       Russian cooking blog
Minomortaruolo.it      157       7                       Italian politician site
Voxility hosted C2     106       1, 2, 6, 7, 15 ...      Known decoy target
Tuidang websites       100       -                       HTTP attacks on two Chinese political dissidence sites
execrypt.com           96        -                       Binary obfuscation service
Auktionshilfe.info     85        2, 13                   Russian auction site
houtai.longqikeji.com  85        25                      SYN attacks on a former game commerce site
Runescape              73        -                       World 26th of a popular online game
184.84.240.54          72        1, 10, 11, 15 ...       Unknown target hosted at Akamai
antiddos.solutions     71        -                       AntiDDoS service offered at react.su

Looking at the most attacked services across all Mirai variants reveals the following:

  1. Booter services monetized Mirai: The wide diversity of targets shows that booter services ran at least some of the largest clusters. A booter service is a service provided by cyber criminals that offers on-demand DDoS attack capabilities to paying customers.
  2. There are fewer actors than clusters: Some clusters have strong overlapping targets, which tends to indicate that they were run by the same actors. For example, clusters 15, 26, and 6 were used to target specific Minecraft servers.


Mirai’s takedown of the Internet

On October 21, 2016, a Mirai attack targeted the popular DNS provider Dyn. This event prevented Internet users from accessing many popular websites, including AirBnB, Amazon, Github, HBO, Netflix, Paypal, Reddit, and Twitter, by disrupting the Dyn name-resolution service.

We believe this attack was not meant to “take down the Internet,” as it was painted by the press, but rather was linked to a larger set of attacks against gaming platforms.

We reached this conclusion by looking at the other targets of the Dyn variant (cluster 6): they are all gaming related. This is also consistent with the OVH attack, which, as discussed earlier, was targeted because it hosted specific game servers. As sad as it seems, all the prominent sites affected by the Dyn attack were apparently just spectacular collateral damage in a war between gamers.

Mirai’s attempted takedown of an entire country’s network

Inside the infamous Mirai IoT Botnet: A Retrospective Analysis

Lonestar Cell, one of the largest Liberian telecom operators, started to be targeted by Mirai on October 31. Over the next few months, it suffered 616 attacks, the most of any Mirai victim.

The fact that the Mirai cluster responsible for these attacks shares no infrastructure with the original Mirai or the Dyn variant indicates that they were orchestrated by a different actor than the original author.

A few weeks after our study was published, this assessment was confirmed when the author of one of the most aggressive Mirai variants confessed during his trial that he was paid to take down Lonestar. He acknowledged that an unnamed Liberian ISP paid him $10,000 to take out its competitors. This validated that our clustering approach is able to accurately track and attribute Mirai’s attacks.

Deutsche Telekom going dark

On November 26, 2016, one of the largest German Internet providers, Deutsche Telekom, suffered a massive outage after 900,000 of its routers were knocked offline.

Inside the infamous Mirai IoT Botnet: A Retrospective Analysis

Ironically, this outage was not due to yet another Mirai DDoS attack but instead due to a particularly innovative and buggy version of Mirai that knocked these devices offline while attempting to compromise them. This variant also affected thousands of TalkTalk routers.

What allowed this variant to infect so many routers was the addition to its replication module of a router exploit targeting the CPE WAN Management Protocol (CWMP). CWMP is an HTTP-based protocol used by many Internet providers to auto-configure and remotely manage home routers, modems, and other customer-premises equipment (CPE).

Besides its scale, this incident is significant because it demonstrates how the weaponization of more complex IoT vulnerabilities can lead to very potent botnets. We hope the Deutsche Telekom event acts as a wake-up call and a push toward making IoT auto-update mandatory. This is much needed to curb the significant risk posed by vulnerable IoT devices, given the poor track record of Internet users manually patching them.

Mirai original author outed?

In the months following his website being taken offline, Brian Krebs devoted hundreds of hours to investigating Anna-Senpai, the infamous Mirai author. In early January 2017, Brian announced that he believed Anna-senpai to be Paras Jha, a Rutgers student who apparently had been involved in previous game-hacking related schemes. Brian also identified Josiah White as a person of interest. After being outed, Paras Jha, Josiah White, and another individual were questioned by authorities and pleaded guilty in federal court to a variety of charges, some related to their activity around Mirai.

Deutsche Telekom attacker arrested

In February 2017, Daniel Kaye (aka BestBuy), the author of the Mirai botnet variant that brought down Deutsche Telekom, was arrested at Luton airport. Prior to Mirai, the 29-year-old British citizen was infamous for selling his hacking services on various dark web markets.

In July 2017, a few months after being extradited to Germany, Daniel Kaye pleaded guilty and was given a suspended sentence of one and a half years. During the trial, Daniel admitted that he never intended for the routers to cease functioning; he only wanted to silently control them so he could use them as part of a DDoS botnet to increase his firepower. As discussed earlier, he also confessed to being paid to take down Lonestar.

In August 2017, Daniel was extradited back to the UK to face extortion charges after attempting to blackmail Lloyds and Barclays banks. According to press reports, he asked Lloyds to pay about £75,000 in bitcoin for the attack to be called off.

Takeaways

The prevalence of insecure IoT devices on the Internet makes it very likely that, for the foreseeable future, they will be the main source of DDoS attacks.

Mirai and subsequent IoT botnets can be averted if IoT vendors start to follow basic security best practices. In particular, we recommend that the following should be required of all IoT device makers:

  • Eliminate default credentials: This will prevent hackers from constructing a credential master list that allows them to compromise a myriad of devices, as Mirai did.
  • Make auto-patching mandatory: IoT devices are meant to be “set and forget,” which makes manual patching unlikely. Having them auto-patch is the only reasonable option to ensure that no widespread vulnerability like the Deutsche Telekom one can be exploited to take down a large chunk of the Internet.
  • Implement rate limiting: Enforcing login rate limiting to prevent brute-force attacks is a good way to mitigate the tendency of people to use weak passwords. Another alternative would be a captcha or a proof of work. A rough sketch of a simple login rate limiter is shown below.
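As a generic illustration of that last point (this is a sketch of mine, not firmware from any particular vendor), a fixed-window limiter on login attempts per source address takes only a few lines:

// Generic illustration: allow at most 5 login attempts per source IP per minute.
const WINDOW_MS = 60_000;
const MAX_ATTEMPTS = 5;

const attempts = new Map<string, { windowStart: number; count: number }>();

function allowLoginAttempt(sourceIp: string, now = Date.now()): boolean {
  const entry = attempts.get(sourceIp);
  if (!entry || now - entry.windowStart >= WINDOW_MS) {
    attempts.set(sourceIp, { windowStart: now, count: 1 });
    return true;
  }
  entry.count += 1;
  return entry.count <= MAX_ATTEMPTS;
}

// The 6th rapid attempt from the same address is rejected.
for (let i = 1; i <= 6; i++) {
  console.log(i, allowLoginAttempt("198.51.100.23"));
}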

Thank you for reading this post until the end!

The Athenian Project: Helping Protect Elections


From cyberattacks on election infrastructure, to attempted hacking of voting machines, to attacks on campaign websites, the last few years have brought us unprecedented attempts to use online vulnerabilities to affect elections both in the United States and abroad. In the United States, the Department of Homeland Security reported that individuals tried to hack voter registration files or public election sites in 21 states prior to the 2016 elections. In Europe, hackers targeted not only the campaign of Emmanuel Macron in France, but government election infrastructure in the Czech Republic and Montenegro.

Cyber attack is only one of the many online challenges facing election officials. Unpredictable website traffic patterns are another. Voter registration websites see a flood of legitimate traffic as registration deadlines approach. Election websites must integrate reported results and stay online notwithstanding notoriously hard-to-model election day loads.

We at Cloudflare have seen many election-related cyber challenges firsthand. In the 2016 U.S. presidential campaign, Cloudflare protected most of the major presidential campaign websites from cyberattack, including the Trump/Pence campaign website, the website for the campaign of Senator Bernie Sanders, and websites for 14 of the 15 leading candidates from the two major parties. We have also protected election websites in countries like Peru and Ecuador.

Although election officials have worked hard to address the security and reliability of election websites, as well as other election infrastructure, budget constraints can limit the ability of governments to access the technology and resources needed to defend against attacks and maintain an online presence. Election officials trying to secure election infrastructure should not have to face a Hobson’s choice of deciding what infrastructure to protect with limited available resources.

The Athenian Project

Since 2014, Cloudflare has protected at-risk public interest websites that might be subject to cyberattack for free through Project Galileo. As part of Project Galileo, we have supported a variety of non-governmental election efforts helping to ensure that individuals have an opportunity to participate in their democracies. This support included protection of Electionland, a project to track and cover voting problems during the 2016 election across the country and in real-time.

When Project Galileo began, we did not anticipate that government websites in the United States might be similarly vulnerable because of resourcing concerns. The past few years have taught us otherwise. We at Cloudflare believe that the integrity of elections should not depend on whether state and local governments have sufficient resources to protect digital infrastructure from cyber attack and keep it online.

The common mission of those working on elections is to preserve citizen confidence in the democratic process and enhance voter participation in elections [1]. To protect voters’ voices, election websites and infrastructure must be stable and secure. Prior to an election, websites provide critical information to the public such as registration requirements, voting locations and sample ballots. After an election, websites provide election results to citizens.

The institutions in which we place our trust must have the tools to protect themselves. Voter registration websites must stay online before a registration deadline, making it possible for voters who want to register to do so. Election websites should be available on election day notwithstanding increased traffic. Voters should have confidence that officials are doing everything they can to safeguard the integrity of election and voter data, and that election results will be available online.

That is why today, we are launching the Athenian Project, which builds on our work in Project Galileo. The Athenian Project is designed to protect state and local government websites tied to elections and voter data from cyberattack, and keep them online.

U.S. state and local governments can participate in the Athenian Project if their websites meet the following criteria:

  1. The website is managed and owned by a state, county, or municipal government; and
  2. The website is related to
  • The administration of elections, including the provision of information related to voting and polling places; or
  • Voter data, including voter registration or verification; or
  • The reporting of election results.

For websites that meet these criteria, Cloudflare will extend its highest level of protection for free.

We recognize that different government actors may have different challenges. We therefore intend to work directly with relevant state and municipal officials to address each site’s needs.

Protecting our Elections

In the last few months, we have been talking to a number of government officials about how we can help protect their elections. Today, we are proud to report that we helped the State of Alabama protect its website during its special general election for the U.S. Senate on Tuesday.

“In this year’s historic Senate Special election, it was crucial that our website be able to handle spikes in traffic and remain online in the event of attack,” said Jim Purcell, Acting Secretary of Information Technology for the State of Alabama. “It is very important to our state government and democracy as a whole that voters and the public be able to access registrar, election information, and election results. Cloudflare proved to be an excellent partner, helping us achieve this goal.”

By allowing voters to exercise their rights to register to vote, speak, and access information, the Internet can and should play a helpful role in democracy. Democracies depend on voters’ voices being enabled, not silenced. Helping to provide state and local governments the tools they need to keep websites online and secure from attack as they hold and report on elections restores the Internet’s promise and serves Cloudflare’s mission of helping to build a better Internet.

[1] State of New York Board of Elections mission statement.

Highlights from Cloudflare's Weekend at YHack


Along with four other Cloudflare colleagues, I traveled to New Haven, CT last weekend to support 1,000+ college students from Yale and beyond at the YHack hackathon.

Throughout the weekend-long event, student attendees were supported by mentors, workshops, entertaining performances, tons of food and caffeine breaks, and a lot of air-mattresses and sleeping bags. Their purpose was to create projects to solve real world problems and to learn and have fun in the process.

How Cloudflare contributed

Cloudflare sponsored YHack. Our team of five wanted to support, educate, and positively impact the experience and learning of these college students. Here are some ways we engaged with students.


1. Mentoring

Our team of five mentors from three different teams and two different Cloudflare locations (San Francisco and Austin) was available at the Cloudflare table or via Slack for almost every hour of the event. There were a few hours in the early morning when all of us were asleep, I'm sure, but we were available to help otherwise.

2. Providing challenges

Cloudflare submitted two challenges to the student attendees, encouraging them to protect and improve the performance of their projects and/or create an opportunity for exposure to 6 million+ potential users of their apps.

Challenge 1: Put Cloudflare in front of your hackathon project

Challenge 2: Make a Cloudflare App

Prizes were awarded to all teams which completed these challenges - ten teams in total. The team judged to have created the best Cloudflare app won free admission to Cloudflare's 2018 Internet Summit, as well as several swag items and introductions to Cloudflare teams offering internships next summer.

3. Distributing swag & cookies

Over 3,000 swag items (t-shirts, laptop stickers, and laptop camera covers) were distributed to almost every attendee. When the Cloudflare team noticed student attendees were going without snacks after dinner, I made a trip to a local grocery store and bought hundreds of cookies to distribute as well. After about an hour, none remained.


4. Sponsoring

As with most any hackathon, costs for food, A/V, venue rental, and other expenses are considerable. Cloudflare decided to financially support YHack to help offset some of these costs and be sure student participants were able to attend, learn, and build meaningful projects without having to pay admission fees.

Results of the hackathon

In groups of up to four, students teamed up to create projects. There was no set project theme, so students could focus on the subject areas they were most passionate about. Judging criteria were "practical and useful", "polish", "technical difficulty", and "creativity", and several sponsors, such as Intuit, Cloudflare, JetBlue, and Yale's Poynter Fellowship in Journalism, submitted suggested challenges for teams to work on.

I saw a lot of cool projects that could help college students save money on food or other expenses, projects that could help identify fake news sources, and apps that would work great on Cloudflare Apps.

There were dozens of great completed projects by the end of the Hackathon. I'd like to highlight a few great ones which used Cloudflare best.

The Winner: TL;DR


Akrama Mirza, a second year student at University of Waterloo, created an app for Cloudflare Apps which allows a website owner to automatically generate a TL;DR summary of the content on their pages and place it anywhere on their site. This app could be used to give a TL;DR summary of a blog post, article, report, etc.

Here you can see how TL;DR would display on TechCrunch's site.
[Screenshot: TL;DR summary displayed on TechCrunch's site]

The TL;DR app won the Cloudflare challenge to "make a Cloudflare App". As a result, Akrama has been invited to attend Cloudflare's 2018 Internet Summit in San Francisco. I've also introduced him to some internal teams, so he may explore internship opportunities with Cloudflare next summer.

Read more about TL;DR on Akrama's Devpost page.

Other great projects which used Cloudflare

Of the completed projects which put Cloudflare in front of their projects, there were three I most wanted to feature.

K2


"Bringing Wall Street to Main Street through an accessible trading and back-testing platform."

K2 is a comprehensive backtesting platform for currency data, specializing in cryptocurrency and offering users the opportunity to create trading algorithms and simulate them in real-time.

The K2 team sought to equalize the playing field by enabling the general public to develop and test trading strategies. Users may specify a trading interval, time frame, and currency symbol, and the K2 backend will visualize the cumulative returns and generate financial metrics.

Read more about K2 on the team's Devpost page or website.

Money Moves


Money Moves analyzes data about financial advisors and their attributes, and uses unsupervised deep learning algorithms to predict whether a given financial advisor is likely to be beneficial or detrimental to an investor's financial standing. Ultimately, Money Moves will help investors select the best possible financial advisor in their region.

The Money Moves team liked that they could easily use Cloudflare to protect and improve the performance of their site. One of the team members, Muyiwa Olaniyan, already used Cloudflare to run his personal server at home, so the team decided to use Cloudflare from the start.

Read more about Money Moves on the team's Devpost page or website.

Mad Invest


"Smart Bitcoin Investing Chatbot"

When the Facebook Messenger chatbot is asked whether it is a good time to invest in Bitcoin, it observes current market trends and Mad Invest's model predicts how likely the price is to go up or down. A conclusion as to whether it would be a good time to invest or sell is drawn and delivered to users by message.

Moving forward, the team intends to improve their model's predictions by developing a way to analyze the Chinese market, which represents 70% of Bitcoin traffic.

When telling me about their experience using Cloudflare and how it saved them time, the team delivered my favorite quote from the whole weekend. “We spent half an hour setting up Let's Encrypt for the SSL and we realized we could just put Cloudflare in front of it.” Exactly.

Read more about Mad Invest on the team's Devpost page.

Final thoughts:

I was honored to be part of a team that supported so many awesome students in the development of their projects at YHack. I was pleased to hear that many attendees had already heard of or used Cloudflare before, and many teams used Cloudflare to protect and improve the performance of their projects and develop new apps. I look forward to being involved in many more events in 2018.

2018 and the Internet: our predictions


At the end of 2016, I wrote a blog post with seven predictions for 2017. Let’s start by reviewing how I did.

[Image: "Didn't he do well" - public domain image by Michael Sharpe]

I’ll score myself with two points for being correct, one point for mostly right and zero for wrong. That’ll give me a maximum possible score of fourteen. Here goes...

2017-1: 1Tbps DDoS attacks will become the baseline for ‘massive attacks’

This turned out to be true, but mostly because massive attacks went away as Layer 3 and Layer 4 DDoS mitigation services got good at filtering out high bandwidth and high packet rates. Over the year we saw many DDoS attacks in the 100s of Gbps (up to 0.5 Tbps), and then in September we announced Unmetered Mitigation. Almost immediately we saw attackers stop bothering to hit Cloudflare-protected sites with large DDoS attacks.

So, I’ll be generous and give myself one point.

2017-2: The Internet will get faster yet again as protocols like QUIC become more prevalent

Well, yes and no. QUIC has become more prevalent as Google has widely deployed it in the Chrome browser and it accounts for about 7% of Internet traffic. At the same time the protocol is working its way through the IETF standardization process and has yet to be deployed widely outside Google.

So, I’ll award myself one point for this as QUIC did progress but didn’t get as far as I thought.

2017-3: IPv6 will become the defacto for mobile networks and IPv4-only fixed networks will be looked upon as old fashioned

IPv6 continued to grow throughout 2017 and seems to be on a pretty steady upward trajectory, although it's not yet deployed on a quarter of the top 25,000 web sites. Note the large jump in IPv6 support that occurred in the middle of 2016, when Cloudflare enabled it by default for all our customers.

The Internet Society reported that mobile networks that switch to IPv6 see 70-95% of their traffic use IPv6. Google reports that traffic from Verizon is now 90% IPv6 and T-Mobile is turning off IPv4 completely.

Here I’ll award myself two points.

2017-4: A SHA-1 collision will be announced

That happened on 23 February 2017 with the announcement of an efficient way to generate colliding PDF documents. It’s so efficient that here are two PDFs containing the old and new Cloudflare logos. I generated these two PDFs using a web site that takes two JPEGs, embeds them in two PDFs and makes them collide. It does this instantly.

They have the same SHA-1 hash:

$ shasum *.pdf
e1964edb8bcafc43de6d1d99240e80dfc710fbe1  a.pdf
e1964edb8bcafc43de6d1d99240e80dfc710fbe1  b.pdf

But different SHA-256 hash:

$ shasum -a256 *.pdf
8e984df6f4a63cee798f9f6bab938308ebad8adf67daba349ec856aad07b6406  a.pdf
f20f44527f039371f0aa51bc9f68789262416c5f2f9cefc6ff0451de8378f909  b.pdf

So, two points for getting that right (and thanks, Nick Sullivan, for suggesting it and making me look smart).

2017-5: Layer 7 attacks will rise but Layer 6 won’t be far behind

The one constant of 2017 in terms of DDoS was the prevalence of Layer 7 attacks. Even as attackers decided that large scale Layer 3 and 4 DDoS attacks were being mitigated easily and hence stopped performing them so frequently, Layer 7 attacks continued apace, with attacks in the 100s of krps commonplace.

Awarding myself one point because Layer 6 attacks didn’t materialize as much as predicted.

2017-6: Mobile traffic will account for 60% of all Internet traffic by the end of the year

Ericsson reported mid-year that mobile data traffic was continuing to grow strongly, rising 70% between Q1 2016 and Q1 2017. Stats show that while mobile traffic continued to increase its share of Internet traffic and passed 50% in 2017, it didn't reach 60%.

Zero points for me.

2017-7: The security of DNS will be taken seriously

This has definitely happened. The 2016 Dyn DNS attack was a wake up call that often overlooked infrastructure was at risk of DDoS attack. In April 2017 Wired reported that hackers took over 36 Brazilian banking web sites by hijacking DNS registration, in June Mozilla and ICANN proposed encrypting DNS by sending it over HTTPS, and the IETF now has a working group on what's being called DoH.

DNSSEC deployment continued with SecSpider showing steady, continuous growth during 2017.

So, two points for me.

Overall, I scored myself a total of 9 out of 14, or 64% right. With that success rate in mind here are my predictions for 2018.

2018 Predictions

2018-1: By the end of 2018 more than 50% of HTTPS connections will happen over TLS 1.3

The rollout of TLS 1.3 has been stalled because of the difficulty of getting it working correctly in the heterogeneous Internet environment. Although Cloudflare has had TLS 1.3 in production and available to all customers for over a year, only 0.2% of our traffic currently uses that version.

Given the state of standardization of TLS 1.3 today we believe that major browser vendors will enable TLS 1.3 during 2018 and by the end of the year more than 50% of HTTPS connections will be using the latest, most secure version of TLS.
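As a rough way to see which version a given server negotiates with your client, here's a minimal Go sketch. It assumes a Go toolchain recent enough to expose the tls.VersionTLS13 constant; the standard library did not ship TLS 1.3 at the time this was written, so treat it as illustrative rather than a measurement tool.

package main

import (
	"crypto/tls"
	"fmt"
	"log"
)

func main() {
	// Connect to a host and report which TLS version was negotiated.
	conn, err := tls.Dial("tcp", "cloudflare.com:443", &tls.Config{})
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	state := conn.ConnectionState()
	if state.Version == tls.VersionTLS13 {
		fmt.Println("negotiated TLS 1.3")
	} else {
		fmt.Printf("negotiated an older version: 0x%04x\n", state.Version)
	}
}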

2018-2: Vendor lock-in with Cloud Computing vendors becomes dominant worry for enterprises

In Mary Meeker's 2017 Internet Trends report she gives statistics (slide 183) on the top three concerns of users of cloud computing. These show a striking shift from worries primarily about security and cost to worries about vendor lock-in and compliance. Cloudflare believes that vendor lock-in will become the top concern of users of cloud computing in 2018 and that multi-cloud strategies will become common.

BillForward is already taking a multi-cloud approach with Cloudflare moving traffic dynamically between cloud computing providers. Alongside vendor lock-in, users will name data portability between clouds as a top concern.

2018-3: Deep learning hype will subside as self-driving cars don't become a reality but AI/ML salaries will remain high

Self-driving cars won't become available in 2018, but AI/ML will remain red hot as every technology company tries to hire appropriate engineering staff and finds it can't. At the same time, deep learning techniques will be applied across companies and industries as it becomes clear that they are not limited to game playing, classification, or translation tasks.

Expect unexpected applications of techniques that are already in use in Silicon Valley as they spread to the rest of the world. Don't be surprised if there's talk of AI/ML-managed traffic control for highways, for example. Anywhere there's a heuristic we'll see AI/ML applied.

But it’ll take another couple of years for AI/ML to really have profound effects. By 2020 the talent pool will have greatly increased and manufacturers such as Qualcomm, nVidia and Intel will have followed Google’s lead and produced specialized chipsets designed for deep learning and other ML techniques.

2018-4: k8s becomes the dominant platform for cloud computing

A corollary to users’ concerns about cloud vendor lock-in and the need for multi-cloud capability is that an orchestration framework will dominate. We believe that Kubernetes will be that dominant platform and that large cloud vendors will work to ensure compatibility across implementations at the demand of customers.

We are currently in the infancy of k8s deployment, with the major cloud computing vendors deploying incompatible versions. We believe that customer demand for portability will cause cloud computing vendors to ensure compatibility.

2018-5: Quantum resistant crypto will be widely deployed in machine-to-machine links across the internet

During 2017 Cloudflare experimented with, and open sourced, quantum-resistant cryptography as part of our implementation of TLS 1.3. Today there is a threat to the security of Internet protocols from quantum computers, and although the threat has not been realized, cryptographers are working on cryptographic schemes that will resist attacks from quantum computers when they arrive.

We predict that quantum-resistant cryptography will become widespread in links between machines and data centers especially where the connections being encrypted cross the public Internet. We don’t predict that quantum-resistant cryptography will be widespread in browsers, however.

2018-6: Mobile traffic will account for 60% of all Internet traffic by the end of the year

Based on the continued trend upwards in mobile traffic I’m predicting that 2018 (instead of 2017) will be the year mobile traffic shoots past 60% of overall Internet traffic. Fingers crossed.

2018-7: Stable BTC/USD exchanges will emerge as others die off from security-based Darwinism

The meteoric rise in the Bitcoin/USD exchange rate has been accompanied by a drumbeat of stories about stolen Bitcoins and failing exchanges. We believe that in 2018 the entire Bitcoin ecosystem will stabilize.

This will partly be through security-based Darwinism as trust in exchanges and wallets that have security problems plummets and those that survive have developed the scale and security to cope with the explosion in Bitcoin transactions and attacks on their services.

Technical reading from the Cloudflare blog for the holidays

During 2017 Cloudflare published 172 blog posts (including this one). If you need a distraction from the holiday festivities, here are some highlights from the year.

CC BY 2.0 image by perzon seo

The WireX Botnet: How Industry Collaboration Disrupted a DDoS Attack

We worked closely with companies across the industry to track and take down the Android WireX Botnet. This blog post goes into detail about how that botnet operated, how it was distributed and how it was taken down.

Randomness 101: LavaRand in Production

The wall of Lava Lamps in the San Francisco office is used to feed entropy into random number generators across our network. This blog post explains how.

ARM Takes Wing: Qualcomm vs. Intel CPU comparison

Our data centers around the world all contain Intel-based servers, but we're interested in ARM-based servers because of the potential cost/power savings. This blog post took a look at the relative performance of Intel processors and Qualcomm's latest server offering.

How to Monkey Patch the Linux Kernel

One engineer wanted to combine the Dvorak and QWERTY keyboard layouts and did so by patching the Linux kernel using SystemTap. This blog explains how and why. Where there's a will, there's a way.

Introducing Cloudflare Workers: Run JavaScript Service Workers at the Edge

Traditionally, the Cloudflare network has been configurable by our users, but not programmable. In September, we introduced Cloudflare Workers which allows users to write JavaScript code that runs on our edge worldwide. This blog post explains why we chose JavaScript and how it works.

CC BY 2.0 image by Peter Werkman

Geo Key Manager: How It Works

Our Geo Key Manager gives customers granular control over the location of their private keys on the Cloudflare network. This blog post explains the mathematics that makes this possible.

SIDH in Go for quantum-resistant TLS 1.3

Quantum-resistant cryptography isn't an academic fantasy. We implemented the SIDH scheme as part of our Go implementation of TLS 1.3 and open sourced it.

The Languages Which Almost Became CSS

This blog post recounts the history of CSS and the languages that might have been CSS.

Perfect locality and three epic SystemTap scripts

In an ongoing effort to understand the performance of NGINX under heavy load on our machines (and wring out the greatest number of requests/core), we used SystemTap to experiment with different queuing models.

How we built rate limiting capable of scaling to millions of domains

We rolled out a rate limiting feature that allows our customers to control the maximum number of HTTP requests per second/minute/hour that their servers receive. This blog post explains how we made that operate efficiently at our scale.

CC BY 2.0 image by Han Cheng Yeh

Reflections on reflection (attacks)

We deal with a new DDoS attack every few minutes and in this blog post we took a close look at reflection attacks and revealed statistics on the types of reflection-based DDoS attacks we see.

On the dangers of Intel's frequency scaling

Intel processors support special AVX-512 instructions that provide 512-bit wide SIMD operations to speed up certain calculations. However, these instructions have a downside: when they are used, the CPU base frequency is scaled down, slowing down other instructions. This blog post explores that problem.

How Cloudflare analyzes 1M DNS queries per second

This blog post details how we handle logging information for 1M DNS queries per second using a custom pipeline, ClickHouse and Grafana (via a connector we open sourced) to build real time dashboards.

AES-CBC is going the way of the dodo

CBC-mode cipher suites have been declining for some time because of padding oracle-based attacks. In this blog we demonstrate that AES-CBC has now largely been replaced by ChaCha20-Poly1305.
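For a sense of what the replacement looks like in code, here's a minimal Go sketch of AEAD encryption with ChaCha20-Poly1305 using the golang.org/x/crypto/chacha20poly1305 package. The randomly generated key and nonce are placeholders for illustration; a real system must never reuse a nonce with the same key.

package main

import (
	"crypto/rand"
	"fmt"
	"log"

	"golang.org/x/crypto/chacha20poly1305"
)

func main() {
	// 32-byte key for ChaCha20-Poly1305.
	key := make([]byte, chacha20poly1305.KeySize)
	if _, err := rand.Read(key); err != nil {
		log.Fatal(err)
	}
	aead, err := chacha20poly1305.New(key)
	if err != nil {
		log.Fatal(err)
	}

	// 12-byte nonce; must be unique for every message encrypted under this key.
	nonce := make([]byte, aead.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		log.Fatal(err)
	}

	plaintext := []byte("hello, AEAD")
	ciphertext := aead.Seal(nil, nonce, plaintext, nil)

	decrypted, err := aead.Open(nil, nonce, ciphertext, nil)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%s\n", decrypted)
}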

CC BY-SA 2.0 image by Christine

How we made our DNS stack 3x faster

We answer around 1 million authoritative DNS queries per second using a custom software stack. Responding to those queries as quickly as possible is why Cloudflare is the fastest authoritative DNS provider on the Internet. This blog post details how we made our stack even faster.

Quantifying the Impact of "Cloudbleed"

On February 18 a serious security bug was reported to Cloudflare. Five days later we released details of the problem and six days after that we posted this analysis of the impact.

LuaJIT Hacking: Getting next() out of the NYI list

We make extensive use of LuaJIT when processing our customers' traffic and making it faster is a key goal. In the past, we've sponsored the project and everyone benefits from those contributions. This blog post examines getting one specific function JITted correctly for additional speed.

Privacy Pass: The Math

The Privacy Pass project provides a zero-knowledge way of proving to a service like Cloudflare that you've previously passed a challenge, without revealing who you are. This detailed blog post explains the mathematics behind authenticating a user without knowing their identity.

How and why the leap second affected Cloudflare DNS

The year started with a bang for some engineers at Cloudflare when we ran into a bug in our custom DNS server, RRDNS, caused by the introduction of a leap second at midnight UTC on January 1, 2017. This blog explains the error and why it happened.
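As an illustration of the general class of bug involved (a sketch, not RRDNS's actual code): measuring elapsed time by subtracting wall-clock readings can yield a negative duration when the clock steps backwards, and code that assumes a positive value, such as a call to rand.Int63n, then panics. Go 1.9's monotonic clock support makes time.Since immune to this.

package main

import (
	"fmt"
	"math/rand"
	"time"
)

func main() {
	start := time.Now()
	// ... do some work ...
	elapsed := time.Since(start) // uses the monotonic clock in Go 1.9+, so never negative

	// Before monotonic clocks, a backwards step (for example around a leap
	// second) could make elapsed negative and the call below would panic.
	if elapsed <= 0 {
		elapsed = time.Nanosecond // defensive guard
	}
	jitter := rand.Int63n(int64(elapsed)) // panics if its argument is <= 0
	fmt.Println("jitter:", time.Duration(jitter))
}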

There's no leap second this year.
