Building an End-to-End Encrypted Messenger
I was watching a Numberphile video about how WhatsApp’s encryption works, and something clicked. I had always been curious about end-to-end encryption: the idea that I could send a message and only the recipient could read it, even if the server was compromised. But watching those animations of the Double Ratchet algorithm, seeing how ephemeral keys are exchanged and how perfect forward secrecy actually works… I had to try building it myself.
The videos that got me hooked:
- How WhatsApp discovered a huge encryption loophole
- The mathematics behind RSA encryption
- How Signal encryption works
I spent weeks researching. I read the Signal Protocol specification, watched conference talks, dug into academic papers. I figured I understood the theory well enough. With Flutter’s cross-platform capabilities, I thought I could build something similar in a few weeks.
I was wrong. It took six months, three database switches, two push notification systems, one failed native code experiment, and more changes of direction than I care to admit. But I learned more than I ever expected.
Chapter 1: The Naive Beginning
I scaffolded a Flutter app with a FastAPI backend, chose SQLite for the local database (lightweight, simple), and sketched out the chat UI. The plan was straightforward:
- Build the UI
- Add WebSockets for real-time messaging
- Sprinkle in some encryption
- Done
I had never heard of the Signal Protocol. I thought encryption meant “HTTPS to the server, AES on the database.” That first week, I built a working chat demo: messages flowing between devices in real time. I felt unstoppable.
Then a friend asked: “So the server can read all the messages?”
Chapter 2: The Encryption Rabbit Hole
End-to-end encryption meant the server should be a dumb pipe. Messages should be encrypted on the sender’s device and only decryptable on the recipient’s. The server would store gibberish.
I discovered the Signal Protocol: the same encryption used by WhatsApp and Signal themselves. It uses something called X3DH (Extended Triple Diffie-Hellman) and a Double Ratchet algorithm that rotates keys with every message.
flowchart TD
A[📱 Building a Signal-Protocol Messenger<br/><i>This story: the journey</i>] --> B[🔐 X3DH Deep Dive]
A --> C[⚙️ Double Ratchet Deep Dive]
B --> D[Initial Key Exchange<br/>Establishing the shared secret]
C --> E[Continuous Encryption<br/>Rotating keys per message]
D --> F[Combined: Signal Protocol]
E --> F
This post tells the story of building the messenger. The linked posts above dive deep into the cryptography.
The theory was elegant. The implementation was brutal.
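Before the war stories, here is what “rotating keys with every message” means mechanically. This is a minimal stdlib sketch of a symmetric KDF chain, not the real libsignal implementation; the `0x01`/`0x02` domain-separation bytes are illustrative of how Double Ratchet-style chains split one chain key into a message key and the next chain key.

```python
import hmac
import hashlib

def ratchet_step(chain_key: bytes) -> tuple[bytes, bytes]:
    """Advance the symmetric ratchet one step.

    Derives a one-time message key plus the next chain key from the
    current chain key. Old chain keys are deleted after use, so
    compromising today's key never reveals yesterday's messages
    (forward secrecy).
    """
    message_key = hmac.new(chain_key, b"\x01", hashlib.sha256).digest()
    next_chain_key = hmac.new(chain_key, b"\x02", hashlib.sha256).digest()
    return message_key, next_chain_key

# Each message gets a fresh key derived from an evolving chain.
chain = hashlib.sha256(b"shared secret from X3DH").digest()
keys = []
for _ in range(3):
    mk, chain = ratchet_step(mk_chain := chain)  # advance per message
    keys.append(mk)

assert len(set(keys)) == 3  # every message key is unique
```

The real protocol layers a Diffie-Hellman ratchet on top of this symmetric chain, but the one-way derivation above is the core of why “decryption failed” errors are so unforgiving: if two parties’ chains drift by even one step, nothing lines up.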
The libsignal Dart package existed, but it was FFI bindings to Rust code. FFI is finicky: you need to manage memory correctly, handle async callbacks in a sync context, and pray the platform channels don’t break. My first attempt crashed on message decryption. Every. Single. Time.
My first breakthrough came when I realized I was generating new identity keys on every app restart. Messages encrypted yesterday couldn’t be decrypted today because the keys changed. I had to build a secure storage system using flutter_secure_storage to persist the Signal Protocol’s identity keys, 400 one-time prekeys, and session records.
The debugging was maddening. Signal Protocol errors give you nothing just “decryption failed.” I added logging, checked byte arrays, compared base64 encodings. I learned more about elliptic curve cryptography than I ever wanted to know.
Here’s what the key persistence looks like:
// signal_service.dart: the breakthrough that made decryption work
// across app restarts
Future<void> _loadOrGenerateIdentityKeyPair() async {
final stored = await _ekv.read(_kIdentityKeyPair);
if (stored != null) {
// Reload persisted identity key pair from secure storage
_identityKeyPair = IdentityKeyPair.deserialize(
bytes: base64Decode(stored),
);
} else {
// First run: generate a fresh Curve25519 identity key pair
_identityKeyPair = IdentityKeyPair.generate();
await _ekv.write(
_kIdentityKeyPair,
base64Encode(_identityKeyPair!.serialize()),
);
}
}
The EkvService is a thin wrapper around flutter_secure_storage. Without this persistence layer, every app restart generated new keys, rendering previous messages unreadable. The Signal Protocol setup requires 400 one-time prekeys, a signed prekey, and optionally a Kyber-1024 post-quantum prekey, all persisted via the device’s secure storage (the Keychain on iOS, the Keystore on Android).
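One-time prekeys get consumed every time a new peer starts an X3DH session, so the pool has to be topped back up. A sketch of the counting logic, with the replenishment threshold assumed for illustration (only the 400-key pool size comes from the actual setup):

```python
TARGET_PREKEYS = 400          # pool size used by the app
REPLENISH_THRESHOLD = 100     # assumed trigger point, illustrative only

def prekeys_to_generate(remaining_on_server: int) -> int:
    """Return how many fresh one-time prekeys to upload.

    Each new X3DH session consumes one prekey from the server-side
    pool. When the pool drops below the threshold, refill back to
    the target; otherwise do nothing.
    """
    if remaining_on_server >= REPLENISH_THRESHOLD:
        return 0
    return TARGET_PREKEYS - remaining_on_server

assert prekeys_to_generate(400) == 0    # pool full, nothing to do
assert prekeys_to_generate(60) == 340   # below threshold, top up
```

The important property is that replenishment is idempotent and cheap to check, so the client can run it opportunistically on app start or after sync.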
Chapter 3: The Database Dance (And Countless Other Reverts)
If you look at my development history, you’ll see a pattern: I made a choice, committed to it, then reversed course days later when reality hit. I lost count of how many times I changed my approach.
Remember how I chose SQLite for being “simple”? That changed. Twice.
I got ambitious. SQLite felt too “mobile” for a serious backend. I switched to PostgreSQL for its power, added Kafka for message queuing, S3 for media storage. I dockerized everything, wrote Terraform configs, felt like a real backend engineer.
Two weeks later, I switched back. PostgreSQL + Kafka was overkill. For a solo project with maybe 100 users max, it added deployment complexity without real benefits. I realized the Signal Protocol was doing the heavy lifting on-device. The server just needed to route encrypted blobs.
I simplified. SQLite on the client, MongoDB on the server (easier schema flexibility), Redis for message queuing (simpler than Kafka), and direct S3 uploads. The architecture got leaner, and I could focus on the hard problems.
But the real complexity was connection management. Mobile networks are flaky: users switch from WiFi to cellular, go through tunnels, put the app in the background. I needed a WebSocket layer that survives all of this, plus a server that can handle horizontal scaling.
# websocket_manager.py: connection management with Redis pub/sub
# for cross-instance delivery when scaling beyond one server
from typing import Dict

from fastapi import WebSocket
class ConnectionManager:
def __init__(self):
self.active_connections: Dict[str, WebSocket] = {}
async def send_personal_message(self, message: dict, user_id: str) -> bool:
# Try local connection first
if user_id in self.active_connections:
return await self._send_local(message, user_id)
# User not on this instance: publish to Redis for other servers
redis = _get_redis()
if redis.available:
is_online = await redis.is_user_online(user_id)
if is_online:
await redis.publish_to_user(user_id, {
"_action": "deliver",
"_target_user": user_id,
"payload": message,
})
return True
return False
async def handle_redis_message(self, data: dict):
# Handle messages from other server instances via Redis pub/sub
action = data.get("_action")
if action == "deliver":
target_user = data.get("_target_user")
payload = data.get("payload", {})
if target_user and target_user in self.active_connections:
await self._send_local(payload, target_user)
The Redis layer solves a critical scaling problem: when deploying to Cloud Run, requests can hit different instances. Redis tracks which instance has which user, and the pub/sub mechanism routes messages to the right server.
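The routing decision is easier to see with a toy in-memory model. The `FakeBroker` and `Instance` classes below are illustrative stand-ins for Redis and the Cloud Run instances, not the app’s actual code:

```python
from typing import Callable, Dict

class FakeBroker:
    """Stand-in for Redis: tracks presence and routes to subscribers."""
    def __init__(self):
        self.online: Dict[str, str] = {}            # user_id -> instance name
        self.handlers: Dict[str, Callable] = {}     # instance name -> callback

    def publish_to_user(self, user_id: str, payload: dict) -> bool:
        instance = self.online.get(user_id)
        if instance is None:
            return False                            # user offline everywhere
        self.handlers[instance](payload)
        return True

class Instance:
    """One server process with its own local WebSocket table."""
    def __init__(self, name: str, broker: FakeBroker):
        self.name, self.broker, self.delivered = name, broker, []
        broker.handlers[name] = self.on_broker_message

    def connect(self, user_id: str):
        self.broker.online[user_id] = self.name     # register presence

    def send(self, user_id: str, message: dict) -> bool:
        # Deliver locally if this instance owns the socket,
        # otherwise route through the broker to whichever does.
        if self.broker.online.get(user_id) == self.name:
            self.delivered.append((user_id, message))
            return True
        return self.broker.publish_to_user(user_id, message)

    def on_broker_message(self, payload: dict):
        self.delivered.append(("via-broker", payload))

broker = FakeBroker()
a, b = Instance("a", broker), Instance("b", broker)
b.connect("alice")                       # alice's socket lives on instance b
assert a.send("alice", {"t": "hi"})      # a has no socket, routes via broker
assert b.delivered == [("via-broker", {"t": "hi"})]
```

The same shape appears in the real `ConnectionManager` above: local table first, then presence lookup, then pub/sub, with a `False` return falling through to the offline queue.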
// ws_service.dart: exponential backoff reconnection with lifecycle awareness
void _scheduleReconnect() {
if (_intentionalClose) return;
final delay = Duration(
seconds: (_reconnectBase.inSeconds * (1 << _reconnectAttempts))
.clamp(1, _reconnectCap.inSeconds)
.toInt(), // num.clamp returns num; Duration needs an int
);
_reconnectAttempts++;
_reconnectTimer?.cancel();
_reconnectTimer = Timer(delay, () async {
if (!_intentionalClose) await _openConnection();
});
}
@override
void didChangeAppLifecycleState(AppLifecycleState state) {
if (state == AppLifecycleState.resumed) {
_reconnectIfAuthenticated();
} else if (state == AppLifecycleState.paused) {
// Close WebSocket in background so server TTL expires
// and triggers FCM push for new messages
_backgroundPaused = true;
unawaited(_closeConnection());
}
}
The 10-second ping keeps the connection alive in the foreground. When backgrounded, we intentionally close the connection so the server’s heartbeat monitor marks us offline; this triggers silent push notifications to wake the background isolate for message processing.
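The backoff arithmetic itself fits in a few lines. A Python sketch of the same logic, with the base and cap values assumed as defaults rather than taken from the app:

```python
def backoff_seconds(attempt: int, base: int = 1, cap: int = 30) -> int:
    """Delay before reconnect attempt N: base * 2^N, clamped to [1, cap].

    Doubling keeps a flaky network from being hammered with retries,
    the cap keeps the wait bounded, and the attempt counter resets to
    zero once a connection succeeds.
    """
    return max(1, min(base * (1 << attempt), cap))

# With base=1s and cap=30s, the schedule plateaus at the cap.
assert [backoff_seconds(n) for n in range(7)] == [1, 2, 4, 8, 16, 30, 30]
```

Some implementations also add random jitter so that thousands of clients reconnecting after an outage don’t all retry in lockstep; that refinement is omitted here.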
This wasn’t the only reversal. My development logs read like a confession of indecision: native code experiments, push notification system switches, state management migrations. Every time I thought I had it figured out, I learned something new and had to change direction.
Chapter 4: The Native Code Experiment
The Signal Protocol worked… mostly. But I kept hitting edge cases with the Dart FFI bindings. Sometimes the Rust library would panic on malformed input. Other times, memory management issues caused mysterious crashes on iOS.
I decided to write native Signal Protocol implementations for Android (Kotlin) and iOS (Swift), then call them through Flutter’s method channels. This would give me more control, better debugging, and native performance.
I spent three weeks on this. I wrote Kotlin code for Android’s Keystore, Swift code for iOS Keychain, built method channel bridges, handled async callbacks.
Then I deleted it all.
The native implementations worked, but they were twice the code, twice the bugs, and ten times the maintenance burden. The FFI approach, despite its quirks, was actually more reliable once I understood the memory model. I learned when to quit.
Chapter 5: Adding Voice and Video
Text messaging was working. Time for calls.
I chose LiveKit, an open-source WebRTC SFU (Selective Forwarding Unit). WebRTC is its own beast: STUN servers, TURN servers, ICE candidates, SDP offers and answers. I thought the hard part would be the media negotiation.
I was wrong. The hard part was integrating with iOS CallKit.
When someone calls you on iOS, the system expects a native incoming call UI immediately. But LiveKit needs a token to join the room, which requires an async server request. There’s a race condition: CallKit demands you show UI now, but you don’t have the connection details yet.
The solution was a two-phase approach: show the CallKit UI immediately using the caller info from the push notification, then fetch the LiveKit token in the background while the user is deciding to accept or reject. If they accept, the token is ready. If they reject, you throw it away.
Here’s the CallKit integration that handles the race condition:
// call_service.dart: showing CallKit UI while fetching LiveKit token
Future<void> _handleIncomingCall(WsCallInviteEvent event) async {
// Prevent duplicate CallKit UIs (from FCM background handler)
final kv = Get.find<KvService>();
final activeCallRoom = await kv.read('callkit_active_room');
if (activeCallRoom == event.roomName) return;
await kv.write('callkit_active_room', event.roomName);
// Store for use in accept callback
_pendingCallerName = event.callerName;
_pendingCallerId = event.callerId;
// Show CallKit UI immediately with caller info from push payload
final callkitParams = CallKitParams(
id: const Uuid().v4(),
nameCaller: event.callerName,
type: event.callType == 'video' ? 1 : 0,
headers: {
'roomName': event.roomName,
'callerId': event.callerId,
},
);
await FlutterCallkitIncoming.showCallkitIncoming(callkitParams);
}
void _onCallkitAccept(Map<String, dynamic> body) {
// Delegate to background service to fetch LiveKit token
// while keeping CallKit UI responsive
final service = FlutterBackgroundService();
service.invoke('accept_call', {
'room_name': _activeIncomingRoomName,
'caller_id': _pendingCallerId,
'call_type': _pendingCallType,
});
}
The background service handles the /calls/accept POST and /calls/token fetch, then invokes back to the UI with the token. This keeps the main thread free for CallKit responsiveness.
On the server side, LiveKit token generation was straightforward but the call state management wasn’t:
# calls_router.py: LiveKit token generation and Redis call state
from datetime import timedelta
def _generate_livekit_token(room_name: str, participant_identity: str,
participant_name: str) -> str:
from livekit.api import AccessToken, VideoGrants
token = (
AccessToken(api_key=settings.LIVEKIT_API_KEY,
api_secret=settings.LIVEKIT_API_SECRET)
.with_identity(participant_identity)
.with_name(participant_name)
.with_grants(VideoGrants(room_join=True, room=room_name))
.with_ttl(timedelta(hours=24))
)
return token.to_jwt()
# Redis tracks active calls to prevent duplicates and handle recovery
@router.get("/calls/active")
async def get_active_call(current_user=Depends(get_current_user)):
state = await redis_service.get_user_active_call(current_user.id)
if not state:
return {"active": False}
return {
"active": True,
"room_name": state.get("room_name", ""),
"caller_id": state.get("caller_id", ""),
"status": state.get("status", ""),
}
The Redis call state has a 5-minute TTL: if both parties disconnect without properly ending the call, the state expires automatically. This prevents “zombie” call rooms that linger forever.
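The TTL behavior can be modeled without Redis at all. This sketch injects a clock function so expiry is testable without sleeping; the real code leans on Redis EXPIRE instead, and the class and method names here are illustrative:

```python
import time

class CallStateStore:
    """In-memory stand-in for the Redis call state with a TTL."""

    def __init__(self, ttl_seconds: float = 300.0, clock=time.monotonic):
        self.ttl, self.clock = ttl_seconds, clock
        self._state = {}  # user_id -> (expires_at, state dict)

    def set_active_call(self, user_id: str, state: dict):
        self._state[user_id] = (self.clock() + self.ttl, state)

    def get_active_call(self, user_id: str):
        entry = self._state.get(user_id)
        if entry is None:
            return None
        expires_at, state = entry
        if self.clock() >= expires_at:
            del self._state[user_id]  # zombie call: expire lazily
            return None
        return state

# Fake clock we can advance by hand
now = [0.0]
store = CallStateStore(ttl_seconds=300.0, clock=lambda: now[0])
store.set_active_call("alice", {"room_name": "r1", "status": "ringing"})
assert store.get_active_call("alice")["room_name"] == "r1"
now[0] = 301.0                         # five minutes pass, nobody hung up
assert store.get_active_call("alice") is None
```

Pushing expiry into the store rather than the call logic means a crashed client never needs to clean up after itself, which is exactly the failure mode the 5-minute TTL guards against.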
I learned that WebRTC needs camera and microphone permissions before you even start negotiating. The timing matters: ask too early and users get annoyed; ask too late and the call connection times out.
Chapter 6: The Push Notification Evolution
Push notifications seem simple. They’re not.
I started with Firebase Cloud Messaging (FCM). It worked great on Android, but iOS needed APNs (Apple Push Notification service) certificates, provisioning profiles, and entitlements. Plus, FCM on iOS is just a wrapper around APNs, adding extra complexity for no benefit.
So I tried OneSignal, a third-party service that promised to handle both FCM and APNs with one SDK. It worked, but I didn’t love depending on another service for core functionality.
Finally, I went native. Direct FCM for Android, direct APNs for iOS. The code was more verbose, but I controlled everything. I learned about VoIP pushes on iOS (special push type that can wake your app immediately), notification service extensions (for decrypting messages in the background), and the nightmare that is iOS background execution limits.
Three complete switches to get something that should be simple. That’s the mobile development experience in a nutshell.
The real complexity was background message decryption. When a silent push arrives, the app has about 30 seconds to fetch pending messages from the server, decrypt them using the Signal Protocol, and show a notification. The catch: Signal keys are in secure storage, and background isolates have limited access to Flutter plugins.
// background_message_handler.dart: decrypting messages in a background isolate
Future<void> handleBackgroundMessage(RemoteMessage message) async {
// Bootstrap all services manually: GetX doesn't auto-init in isolates
Get.put(EkvService(), permanent: true); // Secure storage
Get.put(KvService(), permanent: true); // Plain key-value flags
Get.put(ApiService(), permanent: true); // HTTP client (service name assumed)
Get.put(SignalService(), permanent: true); // Signal Protocol keys
Get.put(SignalCryptoService(), permanent: true); // Encryption/decryption
final kv = Get.find<KvService>();
final api = Get.find<ApiService>();
final crypto = Get.find<SignalCryptoService>();
final signalService = Get.find<SignalService>();
// Must initialize Signal before any crypto operations
await signalService.initialize();
// Prevent foreground from decrypting simultaneously (ratchet corruption)
await kv.write('bg_decrypt_processing', DateTime.now().millisecondsSinceEpoch.toString());
try {
// Fetch pending messages from server via HTTP (WebSocket is dead in background)
final pending = await api.fetchPendingMessages();
for (final msg in pending) {
// Decrypt using Signal Protocol Double Ratchet
final plaintext = await crypto.decrypt(
senderId: msg.senderId,
ciphertext: msg.ciphertext,
messageType: msg.signalMessageType,
);
// Show local notification with decrypted content
await showNotification(
title: msg.senderName,
body: plaintext,
);
}
} finally {
await kv.delete('bg_decrypt_processing');
}
}
The background handler sets a flag so the foreground knows decryption is happening. Without this, both isolates could try to advance the Double Ratchet simultaneously, corrupting the session state and making future messages unreadable.
Chapter 7: Group Chats and Sender Keys
One-to-one messaging was stable. But group chats broke everything.
In the Signal Protocol, group encryption uses “sender keys”: each member generates a key for the group, encrypts messages with it, and distributes the key to other members. If you have 500 members, you need to handle 500 key distributions, rotations when someone leaves, and recovery when keys are lost.
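A toy sketch of that bookkeeping (random tokens stand in for real sender keys, and the class is purely illustrative) shows why membership changes are the expensive part:

```python
import secrets

class GroupSenderKeys:
    """Toy bookkeeping for per-member group sender keys.

    Each member encrypts with their own sender key, distributed to
    everyone else. When someone leaves, remaining members must rotate
    their keys so the departed member can't read new messages.
    """

    def __init__(self):
        self.keys = {}  # member_id -> current sender key

    def join(self, member_id: str):
        self.keys[member_id] = secrets.token_bytes(32)

    def leave(self, member_id: str):
        self.keys.pop(member_id, None)
        # Everyone still in the group rotates and must redistribute.
        for member in self.keys:
            self.keys[member] = secrets.token_bytes(32)

    def distributions_needed(self) -> int:
        # Each of N members sends their key to the other N - 1.
        n = len(self.keys)
        return n * (n - 1)

group = GroupSenderKeys()
for m in ("alice", "bob", "carol"):
    group.join(m)
assert group.distributions_needed() == 6  # 3 members x 2 recipients each
before = dict(group.keys)
group.leave("carol")
assert all(group.keys[m] != before[m] for m in group.keys)  # all rotated
```

The quadratic `distributions_needed` cost is the trade-off sender keys make: message encryption is done once per message instead of once per recipient, but joins and leaves trigger a burst of key distribution traffic.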
I started with a planning phase, sketching out the architecture. The backend needed new endpoints: distribute sender keys, request missing keys, handle group member changes. I used a fan-out approach: when someone sends a group message, the server fans it out to all members’ message queues.
The UI was complex: group creation, member management, admin controls, leave/delete options. I also migrated from Provider to GetX for state management during this phase, which meant refactoring half the app.
The hardest bug took days to find. Group messages were failing to decrypt intermittently. After exhausting every other possibility, I realized the cipher state was getting corrupted on retry attempts. The fix was to always recreate the cipher state fresh, never reuse it after a failure. Simple in hindsight, maddening in the moment.
Here’s how sender keys are distributed to group members:
// signal_crypto_service.dart: sender key distribution for groups
Future<void> distributeSenderKey(String groupId) async {
await _signal.initialize();
// Generate a new sender key for this group
final senderKey = SenderKeyDistributionMessage.create(
groupId: groupId,
senderId: _signal.identityPublicKeyBase64!,
distributionId: const Uuid().v4(),
);
// Persist to secure storage for future group messages
await _ekv.write(
'sender_key_$groupId',
base64Encode(senderKey.serialize()),
);
// Upload public portion to server for other members to fetch
await _senderKeysApi.distributeSenderKeys(
groupId,
SenderKeyDistributionUploadRequest(
distributionId: senderKey.distributionId,
chainKey: base64Encode(senderKey.chainKey),
signingKeyPublic: base64Encode(senderKey.signingKeyPublic),
),
);
}
Future<String> decryptGroupMessage(
String groupId,
String senderId,
String ciphertext,
) async {
// Fetch sender's distribution if we don't have it
final session = await _getOrCreateSenderKeySession(groupId, senderId);
// Critical: recreate cipher state on every decrypt attempt
// Reusing state after a failure corrupts the ratchet
final cipher = GroupCipher(session);
final plaintext = cipher.decrypt(base64Decode(ciphertext));
return utf8.decode(plaintext);
}
Group encryption uses a separate “sender key” ratchet from the one-to-one Double Ratchet. Each group member generates a sender key, distributes it to others, and encrypts group messages with it. The server fans out encrypted messages to all members’ queues.
Here’s the server-side fan-out logic for group messages:
# ws/handlers.py: group message fan-out with per-recipient encryption
async def _handle_group_message(message: dict, user_id: str, username: str):
group_id = message.get("group_id")
client_message_id = message.get("client_message_id")
encrypted_contents = message.get("encrypted_contents", {})
signal_message_types = message.get("signal_message_types", {})
# Verify membership
membership = await GroupMember.find_one(
GroupMember.group_id == group_id,
GroupMember.user_id == user_id,
)
if not membership:
await manager.send_personal_message(
{"type": "error", "detail": "Not a member of this group"},
user_id,
)
return
# Deduplicate using Redis SET NX
if client_message_id:
is_new = await redis_service.check_and_set_dedup(
client_message_id, ttl_seconds=600
)
if not is_new:
return # Already processed
# Fan-out: create separate Message records for each recipient
# Each gets their own encrypted_content (encrypted with their sender key)
pending_deliver = []
for rcpt_id, enc_content in encrypted_contents.items():
if rcpt_id == user_id:
continue # Don't send to self
msg = Message(
id=str(uuid.uuid4()),
sender_id=user_id,
recipient_id=rcpt_id, # Individual recipient
group_id=group_id,
encrypted_content=enc_content, # Encrypted for this recipient only
signal_message_type=signal_message_types.get(rcpt_id, 0),
status=STATUS_SENT,
)
await msg.insert()
notification = {
"type": "new_group_message",
"sender_id": user_id,
"group_id": group_id,
"message": {
"id": msg.id,
"encrypted_content": enc_content,
# ... other fields
},
}
pending_deliver.append((rcpt_id, msg.id, notification))
# Deliver to each recipient (WebSocket if online, Redis queue if offline)
for rcpt_id, msg_id, notification in pending_deliver:
await deliver_to_recipient(rcpt_id, msg_id, user_id, notification)
The key insight: the server stores N copies of a group message (one per recipient), each encrypted with that recipient’s individual sender key. This is how the Signal Protocol handles groups: the server never sees plaintext, and each recipient decrypts with their own key chain.
Chapter 8: The Polish Phase
The core features worked. Now came the polish that separates a prototype from a product.
For performance, I added a two-tier storage system: Redis for hot messages (fast retrieval), MongoDB for cold storage (persistent). Messages move from Redis to MongoDB after delivery.
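A minimal sketch of the tiering idea, with plain dicts standing in for Redis and MongoDB (the class and method names are illustrative, not the app’s actual API):

```python
class TieredMessageStore:
    """Toy read-through tiering: hot dict plays Redis, cold plays MongoDB.

    New messages land in the hot tier for fast retrieval; after
    delivery they are demoted to cold storage, and reads fall back
    there automatically.
    """

    def __init__(self):
        self.hot = {}   # message_id -> message (pending delivery)
        self.cold = {}  # message_id -> message (delivered history)

    def store(self, msg_id: str, message: dict):
        self.hot[msg_id] = message

    def mark_delivered(self, msg_id: str):
        message = self.hot.pop(msg_id, None)
        if message is not None:
            self.cold[msg_id] = message  # demote to persistent tier

    def get(self, msg_id: str):
        # Read-through: check the fast tier first, then fall back.
        return self.hot.get(msg_id) or self.cold.get(msg_id)

store = TieredMessageStore()
store.store("m1", {"ciphertext": "..."})
assert "m1" in store.hot and "m1" not in store.cold
store.mark_delivered("m1")
assert "m1" in store.cold and store.get("m1") is not None
```

The payoff is that the hot tier stays small (only undelivered messages), which is what keeps Redis memory usage bounded while MongoDB absorbs the unbounded history.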
Media handling required image compression (don’t upload 4MB photos from modern phones), BlurHash placeholders (show a blurred preview while the real image loads), and local caching (don’t re-download the same image every time you open the chat).
Users expect to reply from the notification shade without opening the app. Implementing this on Android required background message decryption, which requires the Signal Protocol keys, which requires the secure storage, which isn’t available in background isolates…
The solution was a hybrid: decrypt in the background using a limited key cache, or defer to the main app if keys aren’t available.
Security paranoia set in. If a device is jailbroken or rooted, the secure storage might be compromised. I added checks using root_jailbreak_sniffer to warn users.
Sometimes, despite everything, decryption fails. Keys get out of sync, messages arrive out of order, ratchets get stuck. I added system messages to inform users when this happens, and a “Reset Session” feature to force a fresh X3DH exchange.
What I Learned
Six months of development, and I’m still not “done.” But I’ve learned lessons I couldn’t have gotten any other way:
Start simple, then optimize. My PostgreSQL + Kafka phase was premature optimization. The “boring” stack (SQLite, MongoDB, Redis) handled everything I needed.
Security is a system property. It’s not just about the encryption algorithm. It’s about key storage, prekey management, server-side logic, push notification handling, and UI that doesn’t leak metadata. Every piece matters.
Mobile development is 20% code, 80% edge cases. Background execution, permission handling, lifecycle management, battery optimization dialogs: these take more time than the core features.
When to quit. The native code experiment taught me that “more control” isn’t always better. Sometimes you accept the limitations of a library and work within them.
Documentation is a love letter to your future self. I wrote a comprehensive README with architecture diagrams after I had to relearn how my own system worked following a two-week break. Those Mermaid diagrams now save me hours of mental reconstruction.
Where Things Stand Today
After six months of research, false starts, and direction changes, the app finally works. Here’s what I built:
Messaging: End-to-end encrypted chat using the Signal Protocol with X3DH key exchange and the Double Ratchet for perfect forward secrecy. The server stores only encrypted blobs; it can’t read a single message.
Calling: Voice and video calls via LiveKit with WebRTC. Native CallKit integration on iOS means incoming calls feel like real phone calls.
Group chats: Fan-out encryption to all group members with sender keys.
Media sharing: Compressed images and videos with BlurHash placeholders and local caching.
Push notifications: Silent FCM/APNs pushes that wake the app, decrypt messages in the background, and show proper previews.
Multi-language support: Telugu, Hindi, and English localization (added way later than it should have been).
The Signal Protocol implementation handles 400 one-time prekeys per device, Kyber-1024 for post-quantum protection, and automatic replenishment when keys run low. The WebSocket service has exponential backoff reconnection that survives network switches and backgrounding.
Is it production-ready? Not quite. I still need audit logging, better error recovery for edge cases, and probably a real security audit of my Signal Protocol implementation. The group chat feature works but could be more robust. Disappearing messages and multi-device support are on the roadmap.
But as a learning project? It’s the most valuable thing I’ve ever built. I went from watching Numberphile videos to understanding elliptic curve cryptography, FFI bindings, WebRTC signaling, and the chaos of mobile background execution. The development history, with all those changes and reversions, tells the real story of learning through failure.
Thanks for reading. Now go build something that scares you a little.