
What It Takes to Make a Chat

Paul Dmitriev
24 Mar 2020

It’s hard to imagine a modern dating startup without two things: a chat and a mobile app. It comes as no surprise then, that when developing the Zin safe dating service, we needed to create a chat. Not just any chat: a highly secure one with an eye-popping message delivery speed. The project assumed a rapid growth of the user pool and focused primarily on security, which is crucial given the latest series of hacking attacks on popular dating services.

In this series of articles (yes, there is far more to share than fits into a single one, so buckle up!), I will tell you how we solved this problem. No, it’s not “rocket science.” However, the number of different aspects and specific details we had to consider was huge. This component was expected to be the most difficult in the project, and I’ll try to highlight the lessons we learned from developing it.

Tool selection 

Wise people say: before implementing something, look at the ready-made SaaS solutions. That is exactly what we did. It turned out that there are dozens of “turnkey chats” out there, but most of them had a rather painful pricing policy, which is not what a fast-growing (as we’d hoped) startup needs. Of course, most ready-made solutions have some kind of trial version or “start-up” pricing plan, but these are pretty small and would probably be used up before the product starts paying for itself.

Here are some of the solutions we were considering.

     One of the most popular “turnkey chat” solution providers that is actively promoting itself in the industry communities. The free plan is limited to 25 connections, and you’re welcome to negotiate the cost of more connections with the company’s sales managers.

     The famous provider of voice calls and SMS APIs also offers a solution for creating a chat. Their free plan allows serving up to 200 users; then, the price starts at 3 cents per user and decreases as the user pool grows.

     A pretty advanced solution that offers support for additional features like channels. After the trial period expires, the minimal plan costs $500 and allows for serving up to 1,250 connections at a time.

We dropped self-hosted solutions right away. We wanted to remove as much information from our servers as possible, so we liked the idea we came up with during a discussion: to make the chat serverless.

Lesson 1. Know your ready-made PaaS solutions, but know that often your own solutions will be more cost-effective.

How does one protect users from potential hacking that leads to message leaks? Store as few messages online as possible. The safest information is no information. We decided to delete messages from the server immediately after they had been received, and to use a ready-made cloud solution as the temporary repository.

This is the moment when someone from the back row will yell “End-to-end encryption!” and recall the Signal protocol. Of course, this solution would be even better, but we dropped it. It would require investing a lot in cryptography, and we needed to build an MVP, so we chose a balance between security on the one hand and financial and time costs on the other.

Obviously, the first thing we looked at was what Amazon had to offer us. Turns out, not that much. Sure, the wisest (and most impressively bearded) of developers can assemble a chat engine from its building blocks by writing a dozen or two lambda functions. I even remember seeing a tutorial on YouTube that was a couple of hours long.

Luckily for us (as it seemed at the time), Google already had an almost perfect tool for such tasks: Firebase. See for yourself: it’s cross-platform, there are SDKs for many languages, and it has pretty nice (especially for Google) documentation, affordable pricing, flexible (even too flexible) security policies, easy authentication, and additional services that were very useful to us.

Side note: Firebase Cloud Firestore SDK

The synergy of engineers and project managers at Google led to the birth of a truly convenient solution for mobile data storage. Of course, Firestore is not perfect, and it has a lot of restrictions, but it is very convenient for simple tasks like saving your application data and synchronizing it across platforms. You could say that Firestore is a large hierarchical repository with flexible access settings. At the top level, it is just a collection of documents with fields of different types, and each document can in turn contain sub-collections of documents. Your application can write and read data anywhere in this hierarchy if the current security policy allows it.

The ability to “subscribe” to any collection and receive information about its changes in near real time is convenient, and later you will see that it is easy to do. Generally speaking, the entire Cloud Firestore SDK is written in such a way as to hide its online nature from the user as much as possible. In fact, you work with it as with standard storage, and it takes care of all the offline processing, synchronization, etc. In most cases this is convenient, but sometimes it is troublesome. We will discuss it in one of the following articles of the series.

Designing

After we had developed several prototypes, we decided to focus on the following:

  • For each user of our application, we would create a document in Firebase containing a sub-collection of messages. Sending a message comes down to writing a document to the sub-collection of the corresponding user.
  • After sign-in or registration, each client would subscribe to their messages sub-collection, receive a message and process it.
  • Using security policies, we would grant any user the rights to write to any sub-collection of other users, but only the owner has the reading rights. This is the moment when the flexible system of access control in Firebase starts justifying itself.

Now, it’s time for a small quiz: taking into account that our project had to be cross-platform from the very beginning, what did we have to do first? Who said, “Hold a gazillion meetings?” You’re right, but let’s pretend we live in a perfect world. So, first, we needed documentation on data formats so that all three parties (the back end, iOS and Android) could communicate in the same language, i.e., in the same format. 

Jumping ahead, I’d like to say that it didn’t help much (reading documentation is for chickens), but it still made it easier for us to move forward. Here is a shortened example of the messages our mobile clients exchange.

{
    "timestamp": "message timestamp in UTC",
    "type": "message",
    "sender": {
        "id": "id of user",
        "name": "username, selected by user",
        "profile_image": "user's avatar url"
    },
    "payload": {
        "text": "message's text",
        "attachments": []
    }
}

I will not dig deep into the details. Everything is more complicated there since we need to support a lot of things, from sending pictures to self-destructing messages. So, I showed you the simplest option – plain text. As you can see, we had to be a bit “verbose,” so that the message would contain all the information necessary for the recipient (e.g., the structure of the sender field). Since it was obvious that we would have different messages even at the initial stage, we thought about the type field in advance and decided that the payload would be changing depending on this type. I can’t tell you how many times this saved us later on.
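To make the role of the type field more concrete, here is a minimal sketch of how a client might decode the format above, picking the payload shape based on type. All type names (ChatMessage, Sender, TextPayload, MessageBody) are hypothetical and not taken from our code base, and the attachments array is assumed to hold strings purely for illustration.

```swift
import Foundation

// Hypothetical model of the "sender" object from the sample message.
struct Sender: Codable, Equatable {
    let id: String
    let name: String
    let profileImage: String

    enum CodingKeys: String, CodingKey {
        case id, name
        case profileImage = "profile_image"
    }
}

// Payload of a plain text message. The element type of "attachments"
// is an assumption; the sample only shows an empty array.
struct TextPayload: Codable, Equatable {
    let text: String
    let attachments: [String]
}

// One case per known message type, plus a fallback for unknown ones.
enum MessageBody: Equatable {
    case text(TextPayload)
    case unsupported(type: String)
}

struct ChatMessage: Decodable {
    let timestamp: String
    let sender: Sender
    let body: MessageBody

    enum CodingKeys: String, CodingKey {
        case timestamp, type, sender, payload
    }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        timestamp = try container.decode(String.self, forKey: .timestamp)
        sender = try container.decode(Sender.self, forKey: .sender)
        // The "type" field decides how "payload" is interpreted.
        let type = try container.decode(String.self, forKey: .type)
        switch type {
        case "message":
            let payload = try container.decode(TextPayload.self, forKey: .payload)
            body = .text(payload)
        default:
            body = .unsupported(type: type)
        }
    }
}

func decodeMessage(_ json: String) throws -> ChatMessage {
    return try JSONDecoder().decode(ChatMessage.self, from: Data(json.utf8))
}
```

With this shape, supporting a new message type means adding one enum case and one switch branch, while older clients fall back to unsupported instead of failing to decode.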

Lesson 2. In a cross-platform team, you need to start with detailed specifications.

After that, we were able to get down to development, and we started with the…

Firebase security policy

Google’s solution offers a very flexible system for setting access levels. In essence, the set of rules is a script that runs for each request and determines whether the operation is allowed. Google’s documentation covers it in detail, but in this case, the code is worth a thousand words:

function isAuthenticated() {
    return (request.auth != null && request.auth.uid != null);
}

function isOwner(uid) {
    return request.auth.uid == uid;
}

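function isAuthor() {
    // Must match the sender ID field of the stored message document
    // (the shortened sample above names it "sender.id").
    return resource.data.sender.user_id == request.auth.uid;
}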

service cloud.firestore {
    match /databases/{database}/documents {
        match /users/{userId} {
            allow read, write: if isAuthenticated() && isOwner(userId);
            match /messages/{msgId} {
                allow create: if isAuthenticated();
                allow read: if isAuthenticated() && isOwner(userId);
                allow update: if isAuthenticated() && isOwner(userId);
                allow delete: if isAuthenticated() && (isOwner(userId) || isAuthor());
            }
        }
    }
}

 

To begin, there are three helper functions that simplify the building of a core policy.

  • isAuthenticated verifies that the user is authenticated (a necessary condition for us). As you can see from the code, the built-in request property stores information about the request to the repository.

  • isOwner verifies that the “author” of the current request matches the specified ID. In our case, this function is a part of the predicate, “I am the recipient of this message.”

  • isAuthor verifies that the client executing the request is the author of the message. Please note that the built-in resource.data property is the document whose access is being checked. This allows checking document fields for decision making.

After that, everything is pretty straightforward. Each user has access to the /users/{id} document if their identifier matches the ID, or, in other words, everyone gets full access to their document. By the way, this was also useful for us to store additional information. 

Things get more complicated with /users/{id}/messages/ sub-collection:

  • Any authenticated user can send messages to other users (create)
  • Only recipients are allowed to read and update messages
  • Recipients and senders can delete a message (the “oops, that was a mistake, I have to delete it right now” function was vital)

 

Operation | Authenticated user | Recipient (message owner) | Sender
Create    | ✔️                 |                           |
Read      |                    | ✔️                        |
Update    |                    | ✔️                        |
Delete    |                    | ✔️                        | ✔️
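The same access matrix can be written down as a small pure function, which is handy for unit-testing the intended behavior separately from the rules themselves. This is only a sketch that mirrors the policy above; MessageOperation, MessageAccess and isAllowed are hypothetical names, and Firebase does not evaluate anything like this on the client.

```swift
// Operations on a document in /users/{userId}/messages.
enum MessageOperation { case create, read, update, delete }

struct MessageAccess {
    let requesterId: String?  // nil means the request is unauthenticated
    let recipientId: String   // the {userId} who owns the sub-collection
    let senderId: String?     // author stored in the document, if it exists yet
}

func isAllowed(_ operation: MessageOperation, _ access: MessageAccess) -> Bool {
    // Mirrors isAuthenticated(): every operation needs a signed-in user.
    guard let requester = access.requesterId else { return false }
    let isOwner = requester == access.recipientId   // mirrors isOwner(userId)
    let isAuthor = requester == access.senderId     // mirrors isAuthor()
    switch operation {
    case .create:
        return true
    case .read, .update:
        return isOwner
    case .delete:
        return isOwner || isAuthor
    }
}
```

For example, a sender can delete a message they wrote but cannot read it back from the recipient's sub-collection.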

 

After that, we could safely proceed to the next stage, which was…

Authentication

Initially, the plan was to use two cloud services, AWS and Firebase, at the same time, and authentication was going to be the responsibility of AWS Cognito. We even implemented it, although later it was removed in favor of our own server. Anyway, this is a story for another time.

Of course, Firebase has its own back end for user authentication, but for now, it’s too simple and doesn’t work well for passwordless sign-in using one-time codes received via email. Plus, at one point, we planned not to require the user’s email for registration, which also didn’t fit into the basic Firebase scenarios. That’s why we did the authentication ourselves. Generally, it was a trivial task that each of us had solved many times. Everything is traditional here: sessions, their identifiers, tokens.

Then, we realized that you need to sign in to Firebase, too, and do this alongside signing in to our application. Fortunately, Google anticipated a scenario like this, and we used the Custom Authentication System without any problems. It all works really simply: we use the Firebase Admin SDK on the server to create a custom token that we pass to the client to sign in to Firebase.

This is easily done:

1. Download the JSON file with admin keys for your Firebase application. Be careful: just like the One Ring, it gives your application access to Firebase, bypassing all the rules and security policies.

2. Indicate where this file is located through the environment variable:

export GOOGLE_APPLICATION_CREDENTIALS="/home/user/Downloads/service-account-file.json"

3. Now, you can create a token:

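import firebase_admin
from firebase_admin import auth

# With no arguments, initialize_app() uses the service-account file
# referenced by GOOGLE_APPLICATION_CREDENTIALS.
default_app = firebase_admin.initialize_app()
print(default_app.name)  # "[DEFAULT]"

custom_token = auth.create_custom_token(uid)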

The main thing here is to make sure the user ID matches the one in the main database of our server. Using this token on the client is also simple.

 

import FirebaseAuth
import RxSwift

func logIn(customToken: String) -> Completable {
    return Completable.create { observer in
        Auth.auth().signIn(withCustomToken: customToken) { _, error in
            if let error = error {
                observer(.error(error))
                Logger.shared.log(error, withPrefix: "Login error")
            } else {
                observer(.completed)
            }
        }
        return Disposables.create()
    }
}

 

Everything is pretty straightforward. The only trick is that we used RxSwift and wrapped the asynchronous login process and potential error handling in a Completable.

I don’t want to overload the article with too much code, so I decided to show some functions as signatures only. To sign in and register with our own API, we used the following two functions:

func login(email: String, code: String) -> Observable<OwnAuthResult>

and 

func register(email: String, name: String) -> Observable<OwnAuthResult>

The returned OwnAuthResult structure simply contains data from our server, including the generated token for signing in to Firebase in the firebaseToken field.

Now, we can easily make a common function that processes both scenarios. I intentionally removed the project-specific code not related to authentication (logs, analytics, and other things you can find in every project).

private func handleSignup(via: Observable<OwnAuthResult>) -> Observable<Void> {
    return via
        .flatMapLatest { [weak self] result -> Observable<Void> in
            guard let strongSelf = self else {
                return .never()
            }
            return strongSelf.firebaseAuth.logIn(customToken: result.firebaseToken)
                .andThen(.just(()))
        }
}

The only thing I’m going to explain here is that we converted Completable to Observable<Void> using the .andThen() method. In this case, the type returned by the function could also be replaced with Completable, but that is currently in our backlog, waiting to be refactored.

An existing user sign-in looks like this:

handleSignup(via: login(email: email, code: code))

Registration looks like this:

handleSignup(via: register(email: email, name: name))

Lesson 3. Custom authentication in Firebase is a flexible mechanism that allows integrating it with any of your solutions.

Finally, we are ready for the most interesting thing: chat.

Sending messages

Note: I’m just going to skip the part about converting our message structures into the [String: Any] dictionaries that Firebase prefers to work with. Anyone who’s ever worked with Codable knows this process well.
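For readers who have not done this conversion before, one common approach is to encode the value to JSON and read the result back as a dictionary. The sketch below is an assumption about how a method like toJSONDictionary() could be written, not our actual implementation; DemoMessage is a hypothetical stand-in for the real message type.

```swift
import Foundation

extension Encodable {
    /// Encodes the value to JSON and re-reads it as a dictionary.
    /// Returns nil if encoding fails or the top-level JSON value is not an object.
    func toJSONDictionary() -> [String: Any]? {
        guard let data = try? JSONEncoder().encode(self),
              let object = try? JSONSerialization.jsonObject(with: data) else {
            return nil
        }
        return object as? [String: Any]
    }
}

// Hypothetical, heavily simplified message type used only for the demo.
struct DemoMessage: Encodable {
    let type: String
    let text: String
}
```

The optional return type is what forces the caller to decide what to do when serialization fails, even if that is expected to be rare.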

So, let’s get right down to the function that sends messages to Firebase:

func send(message: ExchangeMessage, toUser userId: String, withDocId docId: String) -> Completable {
    return Completable.create { [weak self] completable in
        if let strongSelf = self, let payload = message.toJSONDictionary() {
            strongSelf.database.document("/users/\(userId)/messages/\(docId)").setData(payload) { err in
                if let error = err {
                    completable(.error(error))
                } else {
                    completable(.completed)
                }
            }
        } else {
            Logger.shared.log("Can't generate payload", level: .critical)
            completable(.completed)
        }
        return Disposables.create()
    }
}

 

Note that serialization is performed by calling the .toJSONDictionary() method, which returns an optional. It’s highly unlikely for a message to fail to serialize, so we don’t treat that case as an error and only log it just in case (this came in handy a couple of times during debugging). Also, we pass the identifier of the future message separately since it’s generated by the persistence layer (we want the same message IDs both in the local database and in Firebase), which I intentionally left out of this article.

We only need to save a message to the database, get its ID (the saving method to the database will return it) and end with calling “send” via the time-tested RxSwift “glue.”

Lesson 4. Reactive functional programming is an excellent way to describe complex asynchronous interactions. However, the learning curve is pretty steep, since there are many peculiarities you need to understand and “feel”.

Receiving messages

This is a little more complicated, but just a tiny bit. First, we need a function that can “listen” to the desired Firebase collection.

private func listAppends(collection name: String) -> Observable<(String, FirestoreData)> {
    return Observable.create { [database] observer in
        var listener: ListenerRegistration?
        let cancel = Disposables.create {
            listener?.remove()
        }
       
        listener = database.collection(name).addSnapshotListener { querySnapshot, err in
            if let err = err {
                Logger.shared.log(err, withPrefix: "FirestoreDataFetcher")
                observer.onError(err)
            } else {
                if cancel.isDisposed {
                    return
                }
                if let changes = querySnapshot?.documentChanges {
                    changes.forEach { change in
                        if change.type == .added {
                            observer.onNext((change.document.documentID, change.document.data()))
                        }
                    }
                }
            }
        }

        return cancel
    }
}

 

I think the function needs no explanation. We set the .addSnapshotListener() handler and never forget to remove it when our Observable is disposed. Perhaps it’s only worth noting that we filter for additions to the collection, since Firebase also calls this handler when records are modified or deleted, and that’s not what we need.
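The filtering step can be shown in isolation. The sketch below uses hypothetical stand-in types (ChangeType, Change) rather than the Firestore SDK classes, so it runs without Firebase; only the idea of keeping “added” changes is taken from the function above.

```swift
// Stand-ins for the SDK's document change types; not the real Firestore classes.
enum ChangeType { case added, modified, removed }

struct Change {
    let type: ChangeType
    let documentID: String
    let data: [String: String]
}

/// Keeps only newly added documents and returns (id, data) pairs,
/// the same shape the listener above pushes to its observer.
func appendedDocuments(in changes: [Change]) -> [(String, [String: String])] {
    return changes
        .filter { $0.type == .added }
        .map { ($0.documentID, $0.data) }
}
```

Without this filter, deleting a message from the server right after receiving it would show up in the listener again as a removal.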

Since we have to consider user login/registration, we will call listAppends as follows:

func getCollectionAppends(collection name: String) -> Observable<(String, FirestoreData)> {
    return authManager.isLoggedIn.asObservable()
        .filter { $0 }
        .take(1)
        .flatMapLatest { [weak self] _ in
            return self?.listAppends(collection: name) ?? .never()
        }
}

Now, we need to process the results with the following scheme:

private let userId = BehaviorSubject<String?>(value: nil)

private(set) lazy var messages: Observable<(String, ExchangeMessage)> = {
    return userId.asObservable()
        .distinctUntilChanged()
        .flatMapLatest { [weak self] userId -> Observable<(String, FirestoreData)> in
            guard let strongSelf = self, let thisUserId = userId  else {
                return .empty()
            }


            return strongSelf.getCollectionAppends(collection: "/users/\(thisUserId)/messages")
                .catchError { error in
                    Logger.shared.log(error)
                    return .empty()
                }
        }
        .compactMap(deserializeMessage)
        .share()
}()

 

It works like this: when we need to start “listening” to messages, we assign the ID of the current user to userId, and when we need to stop, we send nil there. Everything else should be clear from the code. That’s why I love RxSwift (and FRP as a whole): for clarity.

So, we received a message, but it requires some actions. We need to save it locally, delete it from the server and notify the UI that the user was blessed with a message.
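The order of these three steps matters: if the server copy were deleted before the local save succeeded, a failed save would lose the message. Here is a simplified, synchronous sketch of that ordering. The real chain is asynchronous and built with RxSwift; the names LocalStore, RemoteInbox, MessageReceiver and the in-memory stand-ins are hypothetical and exist only for illustration.

```swift
struct IncomingMessage {
    let id: String
    let text: String
}

protocol LocalStore { func save(_ message: IncomingMessage) throws }
protocol RemoteInbox { func delete(messageId: String) throws }

final class MessageReceiver {
    private let store: LocalStore
    private let inbox: RemoteInbox
    private let notifyUI: (IncomingMessage) -> Void

    init(store: LocalStore, inbox: RemoteInbox,
         notifyUI: @escaping (IncomingMessage) -> Void) {
        self.store = store
        self.inbox = inbox
        self.notifyUI = notifyUI
    }

    func handle(_ message: IncomingMessage) throws {
        try store.save(message)                  // 1. persist locally first
        try inbox.delete(messageId: message.id)  // 2. only then drop the server copy
        notifyUI(message)                        // 3. finally tell the UI
    }
}

// In-memory stand-ins so the sketch runs without Firebase or a database.
final class MemoryStore: LocalStore {
    private(set) var savedIds: [String] = []
    func save(_ message: IncomingMessage) throws { savedIds.append(message.id) }
}

final class MemoryInbox: RemoteInbox {
    private(set) var pendingIds: Set<String>
    init(pendingIds: Set<String>) { self.pendingIds = pendingIds }
    func delete(messageId: String) throws { _ = pendingIds.remove(messageId) }
}
```

If save throws, handle stops before the delete, so the message stays on the server and can be picked up again later.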

You probably know exactly how we create this chain of asynchronous calls in RxSwift, so I will just provide a ready-made delete function, in case someone decides to reuse a part of our code.

func deleteMessage(withId msgId: String, atUser userId: String) -> Completable {
    return Completable.create { [weak self] completable in
        self?.database.document("/users/\(userId)/messages/\(msgId)").delete { err in
            if let error = err {
                completable(.error(error))
            } else {
                completable(.completed)
            }
        }
        return Disposables.create()
    }
}

 

Lesson 5. A “spike” is always great, but the real project will be ten times more difficult, and “tiny details” will take up more time than what is considered the “key functionality.”

Sending and receiving messages was the easiest part of all that we had to do. In the next parts of this series, I will tell you more about our chat’s components: delivery and persistence notifications, connection with the UI, backup service in iCloud, push notifications. We will also talk about the problems created by the “serverless” architecture and our solutions. Stay tuned!