FaceTime is Apple’s video and audio calling service. Like iMessage, FaceTime calls use the Apple Push Notification service (APNs) to establish an initial connection to the user’s registered devices. The audio/video contents of FaceTime calls are protected by end-to-end encryption, so no one but the sender and receiver can access them. Apple can’t decrypt the data.
The initial FaceTime connection is made through an Apple server infrastructure that relays data packets between the users’ registered devices. Using APNs notifications and Session Traversal Utilities for NAT (STUN) messages over the relayed connection, the devices verify their identity certificates and establish a shared secret for each session. The shared secret is used to derive session keys for media channels streamed using the Secure Real-time Transport Protocol (SRTP). SRTP packets are encrypted using AES-256 in Counter Mode and authenticated with HMAC-SHA1. After the initial connection and security setup, FaceTime uses STUN and Interactive Connectivity Establishment (ICE) to establish a peer-to-peer connection between devices, if possible.
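The step from a per-session shared secret to separate media keys can be illustrated with a minimal sketch. It is not Apple's implementation: the key-derivation function FaceTime uses is not described here, so HKDF (RFC 5869) is swapped in as a generic stand-in, and the names `derive_media_keys`, `auth_tag`, `verify_tag`, the labels, and the session identifier are hypothetical. The 10-byte tag follows the SRTP default of an 80-bit authentication tag. AES-256 Counter Mode encryption itself is omitted because it isn't available in the Python standard library.

```python
import hashlib
import hmac


def hkdf_sha256(secret: bytes, salt: bytes, info: bytes, length: int) -> bytes:
    """HKDF (RFC 5869) extract-then-expand with HMAC-SHA256.

    Assumption: a generic KDF used for illustration only.
    """
    prk = hmac.new(salt, secret, hashlib.sha256).digest()  # extract
    okm, block, counter = b"", b"", 1
    while len(okm) < length:  # expand
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_media_keys(shared_secret: bytes, session_id: bytes) -> tuple[bytes, bytes]:
    """Derive independent encryption and authentication keys for one session.

    The labels and session_id are hypothetical; they only show that each
    purpose and each session gets its own key.
    """
    enc_key = hkdf_sha256(shared_secret, session_id, b"media-encryption", 32)       # AES-256 key size
    auth_key = hkdf_sha256(shared_secret, session_id, b"media-authentication", 20)  # HMAC-SHA1 key size
    return enc_key, auth_key


def auth_tag(auth_key: bytes, packet: bytes, tag_len: int = 10) -> bytes:
    """Truncated HMAC-SHA1 tag over a packet (simplified: real SRTP also covers the rollover counter)."""
    return hmac.new(auth_key, packet, hashlib.sha1).digest()[:tag_len]


def verify_tag(auth_key: bytes, packet: bytes, tag: bytes) -> bool:
    """Constant-time comparison of the expected and received tags."""
    return hmac.compare_digest(auth_tag(auth_key, packet, len(tag)), tag)
```

In this sketch, two sessions that share the same secret but use different session identifiers end up with unrelated keys, and a packet altered in transit fails `verify_tag`.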
Group FaceTime extends FaceTime to support up to 33 concurrent participants. As with classic one-to-one FaceTime, calls are end-to-end encrypted among the invited participants’ devices. Even though Group FaceTime reuses much of the infrastructure and design of one-to-one FaceTime, these group calls feature a key-establishment mechanism built on top of the authenticity provided by Apple Identity Service (IDS). This protocol provides forward secrecy, meaning that the compromise of a user’s device won’t leak the contents of past calls. Session keys are wrapped using AES-SIV and are distributed among participants using an ECIES construction with ephemeral P-256 ECDH keys.
When a new phone number or email address is added to an ongoing Group FaceTime call, active devices establish new media keys and never share previously used keys with the newly invited devices.
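The rekeying rule for newly invited participants can be modeled with a short sketch. It is a simplified, hypothetical model and not the Group FaceTime protocol: the `GroupCall` class, its method names, and the epoch counter are assumptions, keys are handed out in memory rather than wrapped and sent to each device, and a single object stands in for what the active devices do among themselves.

```python
import secrets


class GroupCall:
    """Toy model of media-key rotation when membership changes (hypothetical)."""

    def __init__(self) -> None:
        self.members: set[str] = set()
        self.epoch = 0
        self.media_key = secrets.token_bytes(32)
        # Records which (epoch, key) pairs each member has ever received.
        self.delivered: dict[str, list[tuple[int, bytes]]] = {}

    def _distribute(self) -> None:
        # Hand the current key to every current member only.
        for member in self.members:
            self.delivered.setdefault(member, []).append((self.epoch, self.media_key))

    def add_member(self, member: str) -> None:
        # Rotate first, then distribute: the new member only ever sees
        # the fresh key, never a key from an earlier epoch.
        self.members.add(member)
        self.epoch += 1
        self.media_key = secrets.token_bytes(32)
        self._distribute()
```

For example, after `alice` and `bob` are on a call and `carol` is added, `delivered["carol"]` contains only the key created at the moment she joined, while the keys that protected the earlier part of the call remain known only to `alice` and `bob`.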