Same Code, Two Different Worlds: How an Adaptive Algorithm Solved the OCR Problem for Light and Dark Mode
Anyone who has tried to make a computer read text from a photo knows that Optical Character Recognition (OCR) is often a tortuous path. And if that photo is of a computer screen? That’s when you enter a whole new level of challenges. Glare, annoying moiré patterns, and a general lack of sharpness from photographing a pixel grid all come into play.
In one of my recent projects, I was tasked with building the core of a mobile app in Kotlin, where a key feature was to precisely extract text from a photo of a computer screen. I knew that without a solid image “cleaning” mechanism, the entire feature would be useless, and the OCR engine would return digital gibberish, not meaningful text.
The Main Enemy: Unpredictability
I quickly discovered that the biggest problem wasn’t the noise itself, but its variability. The key challenge turned out to be the difference between light and dark mode in operating systems. A set of filters and parameters that worked great for black text on a white background failed completely with white text on a dark background.
I could have tried to find one universal set of settings, but it would have been a compromise that performed poorly in both cases. I knew the solution had to be smarter. It had to be adaptive.
The Solution: An Algorithm That First Looks, Then Acts
Instead of forcing a single solution onto every image, I decided to first teach my code to analyze the problem. I created a function in Kotlin that, as a first step, computes the mean brightness of the grayscale image on the 0 to 255 scale. If that mean falls below a cutoff (100 in my case), the image is treated as a dark mode interface, otherwise as a light mode one. This simple check lets the function determine the mode with high confidence.
Only then, armed with this knowledge, does it dynamically select a different configuration for OpenCV’s adaptive thresholding: a different thresholding mode, plus a different block size and offset constant, the two values that control how much digital “noise” is removed. It’s like a doctor first making a diagnosis and only then prescribing medicine.
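The diagnose-then-prescribe flow can be sketched without any OpenCV dependency. This is only an illustration: the names ThresholdMode, ThresholdStrategy, meanBrightness, isDarkBackground, and chooseStrategy are hypothetical, and the image is assumed to be a flat array of 8-bit grayscale values. Only the cutoff of 100 and the two parameter pairs are taken from the function in this article.

```kotlin
// Hypothetical names for illustration; not part of OpenCV or of OpenCvEnhancer.
enum class ThresholdMode { BINARY, BINARY_INV }

data class ThresholdStrategy(
    val mode: ThresholdMode,
    val blockSize: Int,   // side of the square neighbourhood, must be odd
    val offset: Double    // constant subtracted from the local mean
)

/** Mean of 8-bit grayscale values (0..255); the input must not be empty. */
fun meanBrightness(gray: IntArray): Double {
    require(gray.isNotEmpty()) { "image must contain at least one pixel" }
    return gray.average()
}

/** Diagnose: a mean below the cutoff is treated as a dark mode screen. */
fun isDarkBackground(gray: IntArray, cutoff: Double = 100.0): Boolean =
    meanBrightness(gray) < cutoff

/** Prescribe: pick the parameter set that matches the detected mode. */
fun chooseStrategy(dark: Boolean): ThresholdStrategy =
    if (dark) ThresholdStrategy(ThresholdMode.BINARY_INV, blockSize = 31, offset = 11.0)
    else ThresholdStrategy(ThresholdMode.BINARY, blockSize = 45, offset = 15.0)
```

Calling chooseStrategy(isDarkBackground(pixels)) returns the parameter set for that image. Keeping detection and selection in separate functions makes it easy to tune the cutoff, or to add a third mode later, without touching the thresholding call itself.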
The Heart of the Solution: Code That Adapts to the Background
All this “magic” happens in one well-thought-out function. The following snippet from my OpenCvEnhancer object shows the key decision-making moment where the algorithm chooses a processing strategy depending on the background.
    import org.opencv.core.Core
    import org.opencv.core.Mat
    import org.opencv.imgproc.Imgproc

    fun preprocessForTesseract(srcMat: Mat): Mat? {
        val grayMat = Mat()
        val thresholded = Mat()
        try {
            // Convert to grayscale (this conversion code assumes BGR channel order)
            Imgproc.cvtColor(srcMat, grayMat, Imgproc.COLOR_BGR2GRAY)

            // Key step: I check the average image brightness to detect the mode
            val avgBrightness = Core.mean(grayMat).`val`[0]
            val isDarkBackground = avgBrightness < 100 // My threshold for "dark mode"

            // Dynamic selection of thresholding parameters
            Imgproc.adaptiveThreshold(
                grayMat,
                thresholded,
                255.0,
                Imgproc.ADAPTIVE_THRESH_MEAN_C,
                // If the background is dark (dark mode), I invert the filter's action
                if (isDarkBackground) Imgproc.THRESH_BINARY_INV else Imgproc.THRESH_BINARY,
                // And I select different parameters for "cleaning":
                // the block size (must be odd) and the constant subtracted from the local mean
                if (isDarkBackground) 31 else 45,
                if (isDarkBackground) 11.0 else 15.0
            )
            return thresholded
        } catch (e: Exception) {
            // ... error handling
            thresholded.release() // The result is not returned, so free its native buffer
            return null
        } finally {
            grayMat.release() // Release the intermediate Mat on both success and failure
        }
    }
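To make the roles of the block size and the offset constant more concrete, here is a simplified, pure-Kotlin sketch of adaptive mean thresholding on a small grayscale grid. It illustrates the idea only and is not OpenCV's implementation, so results may differ at the margins. The function name adaptiveMeanThreshold, its parameters, and the choice to replicate edge pixels at the border are assumptions made for this example.

```kotlin
/**
 * Simplified adaptive mean threshold over an 8-bit grayscale grid.
 * Each pixel is compared against the mean of its blockSize x blockSize
 * neighbourhood minus `offset`. Coordinates outside the image are clamped
 * to the nearest edge pixel (an assumption of this sketch).
 */
fun adaptiveMeanThreshold(
    gray: Array<IntArray>,
    blockSize: Int,
    offset: Double,
    invert: Boolean,
    maxValue: Int = 255
): Array<IntArray> {
    require(blockSize >= 3 && blockSize % 2 == 1) { "blockSize must be odd and at least 3" }
    require(gray.isNotEmpty() && gray[0].isNotEmpty()) { "image must not be empty" }
    val rows = gray.size
    val cols = gray[0].size
    val half = blockSize / 2
    return Array(rows) { y ->
        IntArray(cols) { x ->
            // Sum the neighbourhood centred on (x, y)
            var sum = 0L
            for (dy in -half..half) {
                for (dx in -half..half) {
                    val yy = (y + dy).coerceIn(0, rows - 1)
                    val xx = (x + dx).coerceIn(0, cols - 1)
                    sum += gray[yy][xx]
                }
            }
            val localThreshold = sum.toDouble() / (blockSize * blockSize) - offset
            val above = gray[y][x] > localThreshold
            // Normal mode: pixels above the threshold become maxValue.
            // Inverted mode: pixels above the threshold become 0.
            if (above != invert) maxValue else 0
        }
    }
}
```

A larger block size averages over a wider area, so the threshold follows slow changes such as uneven screen brightness, while a larger offset pushes more near-average pixels to the background side. In this sketch, with a positive offset, a pixel in a perfectly flat region is always above its local threshold, so flat areas far from any text come out as maxValue in normal mode and as 0 in inverted mode.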
The Result: From Chaos to Stability
Introducing this adaptive logic brought immediate and measurable results. OCR accuracy increased dramatically, and the application handled a photo of Windows Notepad just as well as a photo of a developer’s terminal on a dark background.
Most importantly, an unpredictable and unreliable feature was transformed into a stable and trustworthy component. My OCR module became a solid foundation on which the rest of the application could be built.
I love challenges like this because I believe that in programming, the devil is in the details. Instead of looking for shortcuts, I prefer to deeply understand the unique problems—like those with photographing screens—and build a solid, flexible solution that will work in the real world.
Are you facing a tough technical challenge in your mobile app where standard libraries are failing? Do you need a solution that can handle specific, unpredictable data?
Contact me—I’d be happy to help you find an effective and robust solution.
