At OrangeLoops, we’ve worked on several AI-powered mobile apps, gaining experience in mobile ML and mobile AR projects such as the magic mirror feature for the Loog guitar apps (by the way, buy a Loog guitar, they are really cool) and DrillRoom. The lessons shared in this article come from scenarios where AI execution needed to take place client-side, running machine learning models on edge computing devices, mainly smartphones. Relying on server-side execution of the models may call for different best practices. Our experience is in tackling computer-vision-intensive problems rather than generative AI or ChatGPT-like experiences.
In this article, we share a glimpse of the challenges we faced and some of the lessons we learned, hoping to provide value to teams tackling similar projects.
The top 10 lessons we learned in our journey to implement AI-powered apps are the following:
1. Validate feasibility with a Proof of Concept
These types of projects have a research and development quality. It’s not always obvious that the envisioned solution is technically feasible and can be implemented while providing a reasonable user experience on the target audience’s devices. Before making any hard commitments, we recommend running a proof of concept that evaluates the state of the art in mobile chipsets, ML models, and other constraints before investing significant resources.
2. Be ready for an iterative process
AI-powered mobile app development is an iterative process that naturally loops through three main stages:
- Data collection, curation and annotation
- Model training
- App development and integration
Each stage involves different environments, skills, and maybe even whole teams. Prepare for frequent adjustments and improvements at each stage and use version control and automation tools to streamline the workflow. For data management in the first stage, you will need to rely on different practices and tools than for source code, given the unique challenges of handling large volumes of data.
Evaluate automating as many steps as possible in each phase’s workflow; in the long run this will save you time. As usual, whether this makes sense depends on the available budget and the expected life cycle of the project.
3. Data is king
Get a hold of a good dataset; not being able to can be a showstopper. The data annotation workload can be split and tackled in parallel by multiple people, or outsourced. Double-checking for human errors in the annotation process is important: beware of the old tech adage “garbage in, garbage out”. It has never applied better than in ML projects.
Leverage all the data augmentation techniques that fit your scenario to make the most out of your dataset in training the model.
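As an illustration, a minimal augmentation pass can be sketched with plain NumPy. Real pipelines usually rely on libraries such as albumentations or torchvision transforms; the variants below are just the obvious geometric and brightness ones:

```python
import numpy as np

def augment(image: np.ndarray) -> list:
    """Generate simple augmented variants of an H x W x C image array.

    A minimal sketch using only NumPy; in practice you would use a
    dedicated augmentation library with many more transforms.
    """
    variants = [image]
    variants.append(image[:, ::-1])   # horizontal flip
    variants.append(image[::-1, :])   # vertical flip
    variants.append(np.rot90(image))  # 90-degree rotation (swaps H and W)
    # Brightness shift, clipped to the valid 8-bit range.
    bright = np.clip(image.astype(np.int16) + 30, 0, 255).astype(image.dtype)
    variants.append(bright)
    return variants
```

Even this handful of cheap transforms multiplies the effective size of a small dataset several times over.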
4. Be ready for a wild spread in skills and environments
When embarking on a mobile AI or machine learning project, it’s essential to prepare for the diverse range of skills, tasks, environments, and programming languages involved. This can be particularly challenging for small teams, as the project may encompass several distinct stages, each with its own requirements. To be fair, this may be common to all machine learning projects, not just mobile-first ones.
Among the challenges faced are the following:
- Data Capture: Acquiring the necessary data may require access to specific hardware, such as a guitar, or necessitate traveling to particular locations like a billiard table or soccer field. This stage may demand diverse expertise and equipment to gather the data needed for training. Moving between locations can be taxing. Plan for all the data scenarios you need to collect beforehand. Make the most of each trip.
- Data Annotation: This phase involves labeling the data with appropriate tags and categories, which may require a different set of skills and tools. Some data annotation tasks can be terribly repetitive and low-skill (to be fair others may demand more specialized knowledge). Don’t expect everyone in your team to do everything. If there’s a high mismatch between skill levels sustained over time, this may lead to attrition in your team.
- App Integration: This stage covers both model conversion and model integration. Converting the model to Core ML and TensorFlow Lite formats for iOS and Android, respectively, can be complex, especially when fine-tuning the architecture of the model’s layers. Integrating the converted models into the mobile app, which may involve processing the models’ output, can be particularly challenging for custom models.
- Testing: Depending on the use case, testing may require not only traveling to a specific location or accessing specific hardware, but also domain-specific skills to reproduce edge-case scenarios, such as playing the guitar or being a reasonably skilled billiards or soccer player.
In a nutshell, when it comes to team composition, make sure you have access to talent that can cover all of the tasks that will be part of the project. Be ready to go from the Python math-savvy model trainer, to the mobile app developer, to the business domain expert. Don’t expect one profile to cruise through all activities.
5. Model size matters
The size of a model can have a significant impact on various aspects of the performance of an app. For instance, when it comes to app loading times, larger model sizes can result in slower loading times, especially in Android. Similarly, execution times of the model, which directly affect the frames per second (FPS) and ultimately the user experience (UX) when processing video streams, can also be affected by the size of the model. The total app size is also influenced by the size of the model, which can affect the download time of the app for users.
It is probably a good idea to start with an off-the-shelf model architecture. However, it can pay off to fine-tune models to squeeze extra accuracy out of lightweight layer architectures.
Evaluate implementing lazy downloading of the model in your application, as this can yield UX benefits, particularly reducing the initial download size and improving app loading times. This capacity is supported in both iOS and Android.
6. Device chipset matters
When developing AI-driven mobile apps, it is crucial to consider the device chipset, as it directly impacts the user experience. This is particularly true for machine learning models that run on the mobile device’s chipset, and that require real-time or near-real-time processing. Some of the insights we’ve gained are:
- iOS Chipsets Stand Out. iOS chipsets are known for their ability to handle model execution in almost real-time. Developers can run multiple CoreML models on iOS devices for object detection and classification tasks several times per second. Having said this, know that different iPhone generations have different chipsets, leading to varying execution times, frames per second (FPS) processed, and user experiences. This disparity makes it challenging to provide a uniform experience across all devices.
- Segment Target Devices. The spread in mobile device chipsets out there is way too wide, especially on the Android side. Some older-generation devices just do not have the horsepower to run some of the models your application may need. In some cases it’s better to prevent your app from being installed on those devices than to offer a very poor user experience. To approximate this segmentation on iOS you can use Info.plist keys such as MinimumOSVersion and UIRequiredDeviceCapabilities.
- Dynamic Model Selection. Consider using different models with varying input sizes that can be dynamically instantiated depending on the device’s chipset capabilities. More performant yet less accurate models may provide a satisfactory user experience on slower chipsets, while more powerful and computationally expensive models can offer an optimal experience on the latest devices.
- Android Device Spread. The Android device ecosystem is notoriously diverse, making it difficult to create AI-powered mobile apps that work seamlessly across devices. This diversity has proven to be a challenge for developers, and is one of the main reasons many AI-powered app titles are available only on iOS.
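The dynamic model selection idea above can be sketched as a simple lookup keyed off a quick on-device benchmark. The variant names, file names, and millisecond cut-offs below are purely illustrative, not values from any real app:

```python
# Hypothetical tier-to-model mapping; file names and input sizes are
# illustrative only.
MODEL_VARIANTS = {
    "low":  {"file": "detector_small.tflite",  "input_size": 224},
    "mid":  {"file": "detector_medium.tflite", "input_size": 320},
    "high": {"file": "detector_large.tflite",  "input_size": 512},
}

def pick_model(benchmark_ms: float) -> dict:
    """Choose a model variant from a warm-up inference benchmark.

    benchmark_ms is the measured time of a single inference with the
    smallest model on the current device; the cut-offs are illustrative.
    """
    if benchmark_ms > 80:
        tier = "low"    # slow chipset: favor speed over accuracy
    elif benchmark_ms > 30:
        tier = "mid"
    else:
        tier = "high"   # fast chipset: run the most accurate model
    return MODEL_VARIANTS[tier]
```

Running one throwaway inference at startup and branching on its latency is a cheap way to approximate chipset capability without maintaining a device allowlist.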
7. Model input size matters
Incorporating AI into mobile applications at some point becomes a quest to find all the possible ways to make things run more efficiently on a mobile device. In this quest, one of the levers available to make models performant enough to run in close to real time on edge devices is to reduce their input size. This is especially true for computer vision tasks, where the model takes an image as input, and then usually convolutional networks take it from there.
At the application level it’s possible to configure the captured video stream so that smaller images are fed into the model. Reducing a model’s input size can significantly improve its performance, and therefore the number of frames processed per second. However, it is essential to consider the trade-off between performance and accuracy: decreasing the input size too much may discard critical information, ultimately hurting the model’s ability to perform its intended task.
To find the optimal input size for your AI model, it is recommended to experiment with different input sizes and observe the resulting effects on performance and accuracy. By carefully adjusting the model’s input size, you can achieve a balance that allows for efficient real-time processing on mobile devices without sacrificing the quality of the AI’s output.
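As a rough sketch of the input-size lever, here is a nearest-neighbor downsample in plain NumPy. In a real app the resizing would typically happen in the capture pipeline itself (camera session presets), so the model never sees full-resolution frames:

```python
import numpy as np

def downsample(frame: np.ndarray, target: int) -> np.ndarray:
    """Nearest-neighbor downsample of an H x W x C frame to target x target.

    A sketch only: it picks evenly spaced source rows and columns, which
    is the cheapest possible resize. Production pipelines use proper
    interpolation on the GPU or in the camera stack.
    """
    h, w = frame.shape[:2]
    rows = np.arange(target) * h // target  # source row index per output row
    cols = np.arange(target) * w // target  # source column index per output column
    return frame[rows][:, cols]
```

Swapping the `target` value per experiment makes it easy to chart FPS against accuracy and find the knee of the curve.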
8. Implement a smart debugging workflow
Integrating AI into mobile applications introduces unique challenges to the debugging process. Reproducing specific scenarios in AI-powered mobile apps can be difficult, as it requires capturing and reproducing all the necessary sensory data for the models to run. This becomes even more complex when dealing with multiple sensors and unique hardware or stage conditions. Moreover, debugging can be a time-consuming and expensive process. For example, when processing video feeds in real-time, debugging workflows can become inefficient as the test suite grows to encompass hours of footage. To overcome these challenges and optimize the debugging workflow, it is crucial to implement smart strategies that can significantly reduce the time and resources required for testing.
One effective approach is to take the execution of AI models out of the loop whenever possible. By caching and reusing the results of previously computed test cases, developers can avoid redundant processing by the AI models. This can lead to substantial increases in test-suite execution speed; in our case it yielded a roughly 10x speedup of our test suites.
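A minimal sketch of this caching idea, assuming model outputs are JSON-serializable and a frame can be keyed by a hash of its raw bytes (`InferenceCache` and `model_fn` are illustrative names, not our actual implementation):

```python
import hashlib
import json
from pathlib import Path

class InferenceCache:
    """Cache model outputs on disk, keyed by a hash of the input frame.

    On a cache hit the expensive model execution is skipped entirely,
    which is what makes repeated test-suite runs dramatically faster.
    """

    def __init__(self, cache_dir: str):
        self.dir = Path(cache_dir)
        self.dir.mkdir(parents=True, exist_ok=True)

    def _path(self, frame_bytes: bytes) -> Path:
        digest = hashlib.sha256(frame_bytes).hexdigest()
        return self.dir / (digest + ".json")

    def run(self, frame_bytes: bytes, model_fn):
        path = self._path(frame_bytes)
        if path.exists():                      # cache hit: skip the model
            return json.loads(path.read_text())
        result = model_fn(frame_bytes)         # expensive model execution
        path.write_text(json.dumps(result))
        return result
```

The first run over a video populates the cache; every subsequent run over the same footage only pays for hashing and disk reads.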
9. Not everything needs to be a Machine Learning Model
Not everything that can be implemented with a machine learning model should be. There are cases where a rules-based approach is more effective and provides substantial performance benefits. Some functions that were candidates for an ML model we were able to implement with a rules engine instead, achieving reasonable results while executing in a fraction of the time a complex model would have taken.
Event detection problems may fall into this category. When implementing the event engine for real-time multi-object tracking in DrillRoom, we ended up defining several threshold variables that trigger events. Over time we found we could benefit from applying optimization techniques to find the optimal values for those trigger thresholds, using our test suite as ground truth. This code became a middle ground between the black box of neural network models and the structured code of our application.
If you come across similar scenarios, do check out differentiable programming. It’s a promising research paradigm that proposes applying optimization techniques, such as gradient descent, to structured code, so that it “learns” how to behave best from some ground truth.
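As a sketch of the threshold-tuning idea, a plain grid search over candidate values can serve as a baseline before reaching for gradient-based methods. The threshold names and scoring function here are placeholders, not the real event engine:

```python
import itertools

def tune_thresholds(candidates: dict, score_fn):
    """Grid-search event-trigger thresholds against a labeled test suite.

    candidates maps each threshold name to the values to try; score_fn
    takes one combination as a dict and returns its accuracy against the
    ground truth. Returns the best combination and its score.
    """
    names = list(candidates)
    best_combo, best_score = None, float("-inf")
    # Try every combination of candidate values (the grid).
    for values in itertools.product(*(candidates[n] for n in names)):
        combo = dict(zip(names, values))
        score = score_fn(combo)
        if score > best_score:
            best_combo, best_score = combo, score
    return best_combo, best_score
```

Grid search is exhaustive and easy to reason about; once the grid gets large, smarter search (random search, Bayesian optimization, or the differentiable-programming route above) pays off.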
10. It works well 90+% of the time
When starting a mobile AI project, it is important to recognize that machine learning models approximate a function based on a given ground truth, improving fitness during training. Achieving 100% accuracy is not always feasible or even possible, especially with lightweight versions of models meant for edge computing devices. There’s also the case of achieving near-perfect accuracy against a testing dataset, only to find that the reality out there includes many more scenarios underrepresented in the training data.
Be prepared to deploy models with less than perfect accuracy, understanding that the Pareto principle often applies: 20% of the effort usually goes to achieving 80% accuracy, while it may take 80% of the effort to push accuracy from 80% to the 90+% range.
If the model’s accuracy is far from what your target scenario requires, it may make sense to consider alternative approaches, such as running the models on a server, leveraging more computing power to run larger, more accurate models for the problem.
AI can be leveraged in mobile applications to drive innovation and help create smarter mobile experiences. ML models are valuable tools for building cutting-edge mobile apps, and the state of the art in mobile chipsets makes close-to-real-time execution of ML models possible, especially on the iOS side of things.
In this article we did not cover many challenges in data handling, model training, and ML Ops, which are crucial in delivering successful AI-driven mobile solutions. We may expand on those in future posts.