What It Takes to Make a Chat

Paul Dmitriev
24 Mar 2020

It’s hard to imagine a modern dating startup without two things: a mobile app and a chat within that app. So it was only natural that, while developing the Zin safe dating service, we needed to build a chat. And not just any chat: a highly secure one with eye-popping message delivery speed. The project anticipated rapid growth of the user base and focused primarily on security, which is crucial given the recent series of data leaks at popular dating services.

Are you a startup facing similar challenges? Book a call with us to discuss your development strategy.


    In this series of articles (yes, what I want to share will definitely not fit into a single article, so buckle up!), I will tell you how we solved this problem. Of course, our solution isn’t “rocket science.” However, the number of aspects and details we had to consider was huge. Unsurprisingly, this component became the most difficult part of the project, and I’ll try to highlight the lessons we learned while developing it.

    Tool selection 

    Wise people say that before implementing something, you should look at the ready-made SaaS solutions, so that’s where we started. It turned out there are dozens of “instant chat” solutions out there, but most of them had rather steep pricing, which is not what a fast-growing (as we hoped) startup needs. Of course, most ready-made solutions offer trial versions or “startup” pricing plans, but these are quite limited and will probably run out before the product reaches break-even.

    Here are some of the solutions we were considering (prices/plans are valid as I’m writing this review, but of course they can change in the future).

    • Sendbird. One of the most popular “chat-as-a-service” providers, actively promoting itself in developer communities. The free plan is limited to 25 connections, and for more connections you’re welcome to negotiate the price with the company’s sales managers.
    • Twilio. The well-known provider of voice call and SMS APIs also offers a solution for building chats. Their free plan serves up to 200 users; beyond that, the price starts at 3 cents per user and decreases as the user pool grows.
    • Stream. A pretty advanced solution that offers support for additional features like channels. After the trial period expires, the minimal plan costs $500 and allows for serving up to 1,250 connections at a time.

    I should also mention that from the start we ruled out self-hosted open-source solutions, because we wanted to keep as little information on our servers as possible. In the end, we settled on an idea that came up during one of our discussions: to make the chat serverless.

    Lesson 1. Know the ready-made SaaS solutions, but remember that your own solution will often be more cost-effective.

    How do you protect users from a potential hack that leaks their messages? Store as few messages online as possible: the safest information is no information. We decided to delete messages from the server immediately after they were received and to use a ready-made cloud solution as temporary storage.

    This is the moment when someone in the back row will mention end-to-end encryption and bring up the Signal protocol. Of course, that approach would be even better, but we dropped it. It would have required a large investment in cryptography, and we needed to build an MVP, so we struck a balance between security on one side and money and time on the other.

    Naturally, the first thing we looked at was what Amazon had to offer. As it turned out, not much. Sure, the smartest (and most red-eyed) developers can build a chat engine out of AWS building blocks by writing a dozen or two Lambda functions. I even remember a YouTube tutorial on exactly that, a couple of hours long.

    Luckily for us (or so it seemed at the time), Google already had an almost perfect tool for such tasks: Firebase. See for yourself. It’s cross-platform and has SDKs for many languages. Its documentation is fairly good (especially by Google’s standards), its pricing is affordable, and its security rules are flexible (maybe even too flexible). It also offers easy authentication and additional services that turned out to be very useful to us.

    Side note: Firebase Cloud Firestore SDK

    The cooperation of engineers and project managers at Google gave birth to a truly convenient solution for mobile data storage. Of course, Firestore is not perfect and has plenty of restrictions, but it is super convenient for simple tasks like storing your application data and synchronizing it across platforms. You can think of Firestore as a large hierarchical repository with flexible access settings. At the top level, it is just a collection of documents. Each document holds fields of various types and can itself contain sub-collections of documents. Yes, if this concept reminds you of JSON, you’re absolutely right. Your application can read and write data anywhere in this hierarchy, as long as the current security rules allow it.
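To make the hierarchy more concrete, here is a small Python model of how Firestore paths alternate between collections and documents. It is an illustration only, not part of any SDK, and the helper names are hypothetical. It relies on one rule of the real data model: a document path has an even number of segments, and a collection path has an odd number.

```python
def segments(path: str) -> list[str]:
    """Split a slash-separated path into its non-empty segments."""
    return [part for part in path.strip("/").split("/") if part]


def is_collection_path(path: str) -> bool:
    # collection/document/collection/...: collections sit at odd depths
    return len(segments(path)) % 2 == 1


def is_document_path(path: str) -> bool:
    # documents sit at even depths; the empty path is neither
    depth = len(segments(path))
    return depth > 0 and depth % 2 == 0


def describe(path: str) -> list[tuple[str, str]]:
    """Label each segment as a collection name or a document ID."""
    kinds = ("collection", "document")
    return [(kinds[i % 2], name) for i, name in enumerate(segments(path))]
```

For example, `describe("/users/alice/messages/m1")` labels `users` and `messages` as collections and `alice` and `m1` as documents.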

    The ability to “subscribe” to any collection and receive its changes almost in real time is convenient, and, as you will see later, easy to use. Generally speaking, the entire Cloud Firestore SDK is designed to hide network activity from the developer as much as possible. In fact, you work with it as if it were local storage, and it takes care of all the online and offline processing, synchronization, and so on. In most cases, this is really convenient, but as we’ll discuss in one of the following articles, it can sometimes become troublesome.

    Designing

    After we had developed several prototypes, we decided to focus on the following:

    • For each user of our application, we would create a document in Firebase containing a sub-collection of messages. Sending a message simply means adding a document to the recipient’s sub-collection.
    • After startup, each authenticated client would subscribe to its own messages sub-collection, then receive and process incoming messages.
    • Using security rules, we would allow any user to write to any other user’s sub-collection, while only the owner would be able to read it. This is where Firebase’s flexible access control starts to pay off.
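The three design points above can be modeled with a tiny in-memory store. This is a hypothetical Python sketch: the `ChatStore` class and its methods illustrate the idea and are not Firestore’s API. Each user owns a messages “mailbox,” anyone may write into another user’s mailbox, and only the owner may read it. Messages are removed as soon as they are read.

```python
import itertools


class ChatStore:
    """In-memory stand-in for /users/{userId}/messages/{msgId} (hypothetical)."""

    def __init__(self) -> None:
        self._mailboxes: dict[str, dict[str, dict]] = {}
        self._ids = itertools.count(1)

    def send(self, recipient: str, message: dict) -> str:
        # Sending = adding a document to the recipient's sub-collection;
        # any sender is allowed to do this.
        msg_id = f"m{next(self._ids)}"
        self._mailboxes.setdefault(recipient, {})[msg_id] = dict(message)
        return msg_id

    def receive(self, caller: str, owner: str) -> dict[str, dict]:
        # Only the owner may read, and messages are deleted once received.
        if caller != owner:
            raise PermissionError("only the owner can read this sub-collection")
        return self._mailboxes.pop(owner, {})
```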

    Now, time for a small quiz: given that our project had to be cross-platform from the very beginning, what did we have to do first? Who said “Hold a gazillion meetings”? You’re right, but let’s pretend we live in a perfect world. So, first, we needed documentation on data formats so that all three parties (the back end, iOS, and Android) could speak the same language, i.e., use the same format.

    Jumping ahead, I’d like to say that it didn’t make our work 100% worry-free (reading documentation is for the birds), but it still made it easier for us to move forward. Here is a simplified example of the messages our mobile clients exchange.

    
    {
        "timestamp": "message timestamp in UTC",
        "type": "message",
        "sender": {
            "id": "id of user",
            "name": "username, selected by user",
            "profile_image": "user's avatar url"
        },
        "payload": {
            "text": "message's text",
            "attachments": []
        }
    }

     

    I won’t dig deep into the details. In reality, things are more complicated, since we had to support a lot of features, from sending pictures to self-destructing messages. So I’ve shown you the simplest case: plain text. As you can see, we had to be a bit “verbose” so that each message would carry all the information the recipient needs (e.g., the contents of the sender field). It was obvious that we would have different message types even at the initial stage. So we planned the type field in advance and decided that the structure of payload would depend on it. I can’t tell you how many times this saved us later on.
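The idea that a `type` field determines the shape of `payload` can be sketched in Python as a dispatch table. This sketch assumes the simplified format shown above, and the class and function names are hypothetical. It uses one reasonable policy: unknown types are skipped rather than failing the whole message feed. That way, a client can ignore message types it doesn’t understand yet.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class TextPayload:
    text: str
    attachments: list


@dataclass
class Message:
    timestamp: str
    type: str
    sender_id: str
    sender_name: str
    payload: object


def parse_text(payload: dict) -> TextPayload:
    return TextPayload(text=payload["text"], attachments=list(payload.get("attachments", [])))


# One parser per message type; new types are added here without touching the envelope.
PAYLOAD_PARSERS: Dict[str, Callable[[dict], object]] = {
    "message": parse_text,
}


def parse_message(raw: dict) -> Optional[Message]:
    parser = PAYLOAD_PARSERS.get(raw.get("type", ""))
    if parser is None:
        return None  # unknown type: skip it instead of failing
    sender = raw["sender"]
    return Message(
        timestamp=raw["timestamp"],
        type=raw["type"],
        sender_id=sender["id"],
        sender_name=sender["name"],
        payload=parser(raw["payload"]),
    )
```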

    Lesson 2. In a cross-platform team, you need to start with detailed specifications.

    After that, we were able to get down to development, and we started with the…

    Firebase security policy

    Google’s solution offers a very flexible system for setting access levels. In effect, the rule set is a script that is evaluated for every request and decides whether the operation is allowed. Google’s documentation covers it in detail, but here the code is worth a thousand words:

    rules_version = '2';
    
    service cloud.firestore {
        match /databases/{database}/documents {
            // The request comes from a signed-in Firebase user
            function isAuthenticated() {
                return request.auth != null && request.auth.uid != null;
            }
    
            // The caller is the user whose ID was captured by the match pattern
            function isOwner(uid) {
                return request.auth.uid == uid;
            }
    
            // The caller is the sender stored in the message
            // (field path follows the message format above: sender.id)
            function isAuthor() {
                return resource.data.sender.id == request.auth.uid;
            }
    
            match /users/{userId} {
                allow read, write: if isAuthenticated() && isOwner(userId);
                match /messages/{msgId} {
                    allow create: if isAuthenticated();
                    allow read: if isAuthenticated() && isOwner(userId);
                    allow update: if isAuthenticated() && isOwner(userId);
                    allow delete: if isAuthenticated() && (isOwner(userId) || isAuthor());
                }
            }
        }
    }

    First, there are three helper functions that simplify the core policy.

    • isAuthenticated verifies that the user is authenticated (a necessary condition for us). As you can see from the code, the built-in `request` object holds information about the incoming request.
    • isOwner verifies whether the “author” of the current request matches the specified ID. In our case, this function is part of the predicate “I am the recipient of this message”: we compare the userId captured by the match pattern with the ID of the user making the request.
    • isAuthor verifies whether the client executing the request is the author of the message. Note that the built-in `resource.data` property contains the fields of the document being accessed. This lets the rules base their decisions on the document’s contents (here, the sender’s ID).

    After that, everything is pretty straightforward. Each user has access to the /users/{id} document if their identifier matches the ID; in other words, everyone gets full access to their “own” document. By the way, this also came in handy for storing additional user information.

    Things get more complicated with /users/{id}/messages/ sub-collection:

    • Any authenticated user can send messages to other users (create)
    • Only recipients are allowed to read and update messages
    • Recipients and senders can delete a message (the “oops, that was a mistake, I have to delete it right now” function was vital)

    Operation | Any authenticated user | Recipient (message owner) | Sender
    Create    | ✔️                     |                           |
    Read      |                        | ✔️                        |
    Update    |                        | ✔️                        |
    Delete    |                        | ✔️                        | ✔️
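To sanity-check this matrix, the same decisions can be written as a plain Python function. This sketch only mirrors the message rules so you can test your reasoning; it is not how Firestore evaluates rules, and the function name and arguments are hypothetical. `caller` plays the role of `request.auth.uid`, `owner` is the `{userId}` wildcard, and `message` is the stored document (`resource.data`). As in the message format shown earlier, the sender’s ID is assumed to be stored in `sender.id`.

```python
from typing import Optional


def message_access_allowed(op: str, caller: Optional[str], owner: str,
                           message: Optional[dict] = None) -> bool:
    """Mirror of the /users/{userId}/messages/{msgId} rules (hypothetical helper)."""
    if caller is None:  # isAuthenticated()
        return False
    is_owner = caller == owner  # isOwner(userId)
    is_author = bool(message) and message.get("sender", {}).get("id") == caller  # isAuthor()
    if op == "create":
        return True
    if op in ("read", "update"):
        return is_owner
    if op == "delete":
        return is_owner or is_author
    raise ValueError(f"unknown operation: {op}")
```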

    After that, we could safely proceed to the next stage, which was…


    Authentication

    Initially, we planned to use two cloud services at the same time, AWS and Firebase, with AWS Cognito responsible for authentication. We even implemented it, although we later replaced it with our own server. But that is a story for another time.

    Of course, Firebase has its own back end for user authentication, but for now it’s too simple and doesn’t handle password-less sign-in with one-time codes sent via email well. Plus, at one point, we planned not to make the user’s email a required registration field, which also didn’t fit the basic Firebase scenarios. That’s why we implemented authentication ourselves. Overall, it was a routine task that each of us had solved many times before: sessions, their identifiers, tokens, etc.
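For reference, the password-less flow mentioned above comes down to issuing a short-lived, single-use code. Here is a minimal Python sketch based on our own assumptions: 6-digit codes, a 10-minute lifetime, an in-memory store, and hypothetical function names. A production server would persist codes and rate-limit attempts.

```python
import hmac
import secrets
import time
from typing import Dict, Optional, Tuple

CODE_TTL_SECONDS = 600  # assumption: a code is valid for 10 minutes
_pending: Dict[str, Tuple[str, float]] = {}  # email -> (code, expiry); in-memory for the sketch


def issue_code(email: str, now: Optional[float] = None) -> str:
    """Create a 6-digit code to send by email, replacing any previous one."""
    current = time.time() if now is None else now
    code = f"{secrets.randbelow(10**6):06d}"
    _pending[email] = (code, current + CODE_TTL_SECONDS)
    return code


def verify_code(email: str, code: str, now: Optional[float] = None) -> bool:
    """Check a code; every attempt consumes it, so each code is single-use."""
    entry = _pending.pop(email, None)
    if entry is None:
        return False
    expected, expires_at = entry
    current = time.time() if now is None else now
    # constant-time comparison of the submitted code
    return current <= expires_at and hmac.compare_digest(expected.encode(), code.encode())
```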

    Then we realized that users also need to sign in to Firebase, at the same time as they sign in to our application. Fortunately, Google anticipated this scenario, and we were able to use the Custom Authentication System without any problems. It works very simply: on the server, we use the Firebase Admin SDK to create a custom token, which we pass to the client so that it can sign in to Firebase.

    (Figure: authentication sequence diagram)
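Put together, the server side of this sequence is short. The sketch below is hypothetical. It injects the code check, the user lookup, and the token minter as parameters, so it runs without Firebase credentials. In production, the minter would be `firebase_admin.auth.create_custom_token`, which returns the token as bytes in the Python Admin SDK. The `sign_in` function and the `firebase_token` key are our own illustrative names.

```python
from typing import Callable, Dict


def sign_in(email: str, code: str,
            verify_code: Callable[[str, str], bool],
            user_id_for_email: Callable[[str], str],
            mint_token: Callable[[str], bytes]) -> Dict[str, str]:
    """Our own sign-in, followed by minting a Firebase custom token."""
    if not verify_code(email, code):
        raise PermissionError("invalid or expired code")
    uid = user_id_for_email(email)  # the user ID from OUR database...
    token = mint_token(uid)  # ...becomes the Firebase uid, so both systems agree
    # The client then passes firebase_token to Auth.auth().signIn(withCustomToken:)
    return {"user_id": uid, "firebase_token": token.decode("utf-8")}
```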

    Working with this API is pretty simple:

    1. Download the JSON file with the service account keys for your Firebase project. Be careful: just like the One Ring, it gives its holder full access to Firebase, bypassing all rules and security policies.
    2. Specify the location of this file in an environment variable:
    export GOOGLE_APPLICATION_CREDENTIALS="/home/user/Downloads/service-account-file.json"

    3. Now you can create a custom token:

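    import firebase_admin
    from firebase_admin import auth
    
    # No explicit credential: uses the key file from GOOGLE_APPLICATION_CREDENTIALS
    default_app = firebase_admin.initialize_app()
    print(default_app.name)  # "[DEFAULT]"
    
    def create_firebase_token(uid: str) -> str:
        # uid must be the same user ID as in our own database
        return auth.create_custom_token(uid).decode("utf-8")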

    The main thing here is to make sure the user ID in our DB matches the one in Firebase. 

    Using this token on the client is just as simple.

    import FirebaseAuth
    import RxSwift
    
    func logIn(customToken: String) -> Completable {
        return Completable.create { observer in
            Auth.auth().signIn(withCustomToken: customToken) { _, error in
                if let error = error {
                    observer(.error(error))
                    Logger.shared.log(error, withPrefix: "Login error")
                } else {
                    observer(.completed)
                }
            }
            return Disposables.create()
        }
    }

    Everything is pretty straightforward. The only trick is that we use RxSwift, so we wrapped the asynchronous login process and its error handling in a Completable.
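The same trick works outside RxSwift: you turn a one-shot callback API into something that either completes or fails. For comparison, here is a Python asyncio version. `sign_in_with_callback` is a hypothetical callback-style function that stands in for `Auth.auth().signIn(withCustomToken:)`.

```python
import asyncio
from typing import Callable, Optional

Callback = Callable[[Optional[Exception]], None]


async def sign_in(token: str, sign_in_with_callback: Callable[[str, Callback], None]) -> None:
    """Await a callback-based sign-in: returns on success, raises on error."""
    loop = asyncio.get_running_loop()
    done: asyncio.Future = loop.create_future()

    def settle(error: Optional[Exception]) -> None:
        if done.done():  # e.g. the awaiting task was already cancelled
            return
        if error is not None:
            done.set_exception(error)
        else:
            done.set_result(None)

    def on_complete(error: Optional[Exception]) -> None:
        # The SDK may call back on another thread, so hop onto the event loop.
        loop.call_soon_threadsafe(settle, error)

    sign_in_with_callback(token, on_complete)
    await done
```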

    I don’t want to overload the article with code, so I’ll show some functions only as signatures. To sign in and register with our own API, we used the following two functions:

    func login(email: String, code: String) -> Observable<OwnAuthResult>

    and 

    func register(email: String, name: String) -> Observable<OwnAuthResult>

    The returned OwnAuthResult structure simply contains data from our server, including the custom token for signing in to Firebase in the firebaseToken field.

    Now we can easily write a common function that handles both scenarios. I intentionally removed the code unrelated to authentication (logging, analytics, and the other things you can find in every project).

    private func handleSignup(via: Observable<OwnAuthResult>) -> Observable<Void> {
        return via
            .flatMapLatest { [weak self] result -> Observable<Void> in
                guard let strongSelf = self else {
                    return .never()
                }
                return strongSelf.firebaseAuth.logIn(customToken: result.firebaseToken)
                    .andThen(.just(()))
            }
    }
    
    

    The only thing worth explaining here is that we convert the Completable to Observable<Void> using the .andThen() method. The function’s return type could also be changed to Completable, but that refactoring is still waiting in our backlog.

    An existing user sign-in looks like this:

    handleSignup(via: login(email: email, code: code))

    Registration looks like this:

    handleSignup(via: register(email: email, name: name))

    Lesson 3. Custom authentication in Firebase is a flexible mechanism that allows integrating it with any of your solutions.

    Finally, we are ready for the most interesting thing: chat.


    Sending messages

    I’m going to skip the part about converting our message structures into the [String: Any] dictionaries that Firebase works with. Anyone who has ever worked with Codable knows this process well.

    So, let’s get right down to the sending messages function in Firebase:

    func send(message: ExchangeMessage, toUser userId: String, withDocId docId: String) -> Completable {
        return Completable.create { [weak self] completable in
            if let strongSelf = self, let payload = message.toJSONDictionary() {
                strongSelf.database.document("/users/\(userId)/messages/\(docId)").setData(payload) { err in
                    if let error = err {
                        completable(.error(error))
                    } else {
                        completable(.completed)
                    }
                }
            } else {
                Logger.shared.log("Can't generate payload", level: .critical)
                completable(.completed)
            }
            return Disposables.create()
        }
    }

    Note that serialization is performed by the .toJSONDictionary() method, which returns an optional. It’s highly unlikely that a message can’t be serialized, so we don’t treat this as an error: we just log it (this came in handy a couple of times during debugging). We also pass the identifier of the future message separately, since it’s generated by the persistence layer (we want the same message IDs both in the local database and in Firebase). I skip that layer in this article as well.

    All that’s left is to save the message to the local database, get its ID (the save method returns it), and finally call send, all glued together with time-tested RxSwift.
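The save-then-send sequence can be sketched in Python with in-memory fakes. The classes and method names are hypothetical stand-ins for our persistence layer and for Firestore’s `setData`. Generating the ID locally and writing to a fixed document path has a useful side effect: a retried send overwrites the same document instead of creating a duplicate.

```python
import uuid


class LocalDB:
    """Hypothetical persistence layer that owns message IDs."""

    def __init__(self) -> None:
        self.rows: dict[str, dict] = {}

    def save(self, message: dict) -> str:
        msg_id = uuid.uuid4().hex  # the ID is generated locally
        self.rows[msg_id] = message
        return msg_id


class RemoteStore:
    """Hypothetical stand-in for Firestore documents."""

    def __init__(self) -> None:
        self.docs: dict[str, dict] = {}

    def set_data(self, path: str, payload: dict) -> None:
        self.docs[path] = dict(payload)  # like setData: create or overwrite


def save_and_send(message: dict, recipient: str, local: LocalDB, remote: RemoteStore) -> str:
    msg_id = local.save(message)
    # The same ID is used as the Firestore document ID
    remote.set_data(f"users/{recipient}/messages/{msg_id}", message)
    return msg_id
```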

    Lesson 4. Reactive functional programming is an excellent way to describe complex asynchronous interactions. However, the learning curve is pretty steep, since there are many peculiarities you need to understand and “feel”.


    Receiving messages

    This is a little more complicated, but just a tiny bit. First, we need a function that can “listen” to the desired Firebase collection.

    private func listAppends(collection name: String) -> Observable<(String, FirestoreData)> {
        return Observable.create { [database] observer in
            var listener: ListenerRegistration?
            let cancel = Disposables.create {
                listener?.remove()
            }
           
            listener = database.collection(name).addSnapshotListener { querySnapshot, err in
                if let err = err {
                    Logger.shared.log(err, withPrefix: "FirestoreDataFetcher")
                    observer.onError(err)
                } else {
                    if cancel.isDisposed {
                        return
                    }
                    if let changes = querySnapshot?.documentChanges {
                        changes.forEach { change in
                            if change.type == .added {
                                observer.onNext((change.document.documentID, change.document.data()))
                            }
                        }
                    }
                }
            }
    
            return cancel
        }
    }

    I think the function needs no explanation. We set up the .addSnapshotListener() handler and never forget to remove it when our Observable is disposed. Perhaps the only thing worth noting is that we keep only additions to the collection. Firebase also calls this handler when documents are modified or removed, and we don’t need those events.
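The filtering step is easy to model in Python. In this sketch, a change is a hypothetical `(kind, doc_id, data)` tuple that stands in for Firestore’s `DocumentChange`. Note that Firestore reports every document already in the collection as “added” in the first snapshot. So messages that arrived while the client was offline come through this same path.

```python
from typing import Iterable, Iterator, Tuple

# (kind, document ID, data); kind is "added", "modified" or "removed"
Change = Tuple[str, str, dict]


def appended_documents(changes: Iterable[Change]) -> Iterator[Tuple[str, dict]]:
    """Yield (doc_id, data) only for newly added documents."""
    for kind, doc_id, data in changes:
        if kind == "added":
            yield doc_id, data
```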

    Since we have to wait for the user to log in or register, we call it as follows:

    func getCollectionAppends(collection name: String) -> 
    Observable<(String, FirestoreData)> {
        return authManager.isLoggedIn.asObservable()
            .filter { $0 }
            .take(1)
            .flatMapLatest { [weak self] _ in
                return self?.listAppends(collection: name) ?? .never()
            }
    }
    
    

    As you can guess, authManager is responsible for user authentication, and its isLoggedIn property holds the user’s current login status.

    Now, we need to process the results in the following way:

    private let userId = BehaviorSubject<String?>(value: nil)
    
    private(set) lazy var messages: Observable<(String, ExchangeMessage)> = {
        return userId.asObservable()
            .distinctUntilChanged()
            .flatMapLatest { [weak self] userId -> Observable<(String, FirestoreData)> in
                guard let strongSelf = self, let thisUserId = userId  else {
                    return .empty()
                }
    
    
                return strongSelf.getCollectionAppends(collection: "/users/\(thisUserId)/messages")
                    .catchError { error in
                        Logger.shared.log(error)
                        return .empty()
                    }
            }
            .compactMap(deserializeMessage)
            .share()
    }()

    It works like this: when we need to start “listening” to messages, we assign the current user’s ID to userId, and when we need to stop, we push nil into it. Everything else should be clear from the code. This clarity is exactly why I love RxSwift (and FRP as a whole).
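For readers less familiar with `flatMapLatest` and `distinctUntilChanged`, here is what that pipeline does, spelled out imperatively in Python. `subscribe` is a hypothetical function that starts listening to a path and returns a function that stops listening. In Swift, `ListenerRegistration.remove()` plays that role.

```python
from typing import Callable, Optional

Unsubscribe = Callable[[], None]
OnMessage = Callable[[str, dict], None]


class MessageFeed:
    def __init__(self, subscribe: Callable[[str, OnMessage], Unsubscribe],
                 on_message: OnMessage) -> None:
        self._subscribe = subscribe
        self._on_message = on_message
        self._user_id: Optional[str] = None
        self._stop: Optional[Unsubscribe] = None

    def set_user(self, user_id: Optional[str]) -> None:
        if user_id == self._user_id:  # distinctUntilChanged
            return
        self._user_id = user_id
        if self._stop is not None:  # flatMapLatest: drop the previous listener
            self._stop()
            self._stop = None
        if user_id is not None:  # None plays the role of nil: stop listening
            self._stop = self._subscribe(f"/users/{user_id}/messages", self._on_message)
```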

    So, a message has arrived, but we still need to act on it. We need to save it locally, delete it from the server, and notify the UI that the user has been blessed with a message.

    You can probably guess how we chain these asynchronous calls in RxSwift, so I will just provide a ready-made delete function, in case someone decides to reuse part of our code.

    func deleteMessage(withId msgId: String, atUser userId: String) -> Completable {
        return Completable.create { [weak self] completable in
            self?.database.document("/users/\(userId)/messages/\(msgId)").delete { err in
                if let error = err {
                    completable(.error(error))
                } else {
                    completable(.completed)
                }
            }
            return Disposables.create()
        }
    }
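The order of those three steps matters. If the message is deleted from the server before it is safely stored on the device, a crash in between loses it for good. Here is a Python sketch of that ordering. The callables for the local store, the server delete, and the UI notification are hypothetical; this is our illustration of the idea, not the project’s code.

```python
from typing import Callable


def handle_incoming(msg_id: str, message: dict,
                    save_locally: Callable[[str, dict], None],
                    delete_on_server: Callable[[str], None],
                    notify_ui: Callable[[str], None]) -> bool:
    """Persist first, then delete remotely, then notify. Returns True when fully handled."""
    try:
        save_locally(msg_id, message)  # keyed by msg_id, so a redelivery overwrites
    except Exception:
        # Keep the server copy: it shows up as "added" again next time the listener starts
        return False
    delete_on_server(msg_id)  # safe now: the device holds its own copy
    notify_ui(msg_id)
    return True
```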

    Lesson 5. A “spike” is always great, but the real project will be ten times more difficult, and “tiny details” will take up more time than what is considered the “key functionality.”

    Sending and receiving messages was the easiest part of all that we had to do. In the next articles of this series, I will tell you more about our chat’s components: delivery and persistence notifications, connection with the UI, iCloud backup service, and push notifications. We will also talk about the problems created by the “serverless” architecture and our solutions. Stay tuned!

    Book a call with us to discuss your chat architecture.
