
The Linguist’s Qualm with Lease-Reading AI

The concept of artificial intelligence no longer has the dystopian ring that decades of film tropes gave it. Today, AI is used as a data system that perceives its environment to help analysts with their research, diagnose diseases, and find that perfect post for your social media feed. That’s hardly the blood-crusted robotic boogeyman that we’ve all been led to fear.

The terror of Skynet’s Terminators or the Matrix’s Sentinels hasn’t scared the real estate world away from buying into the technology. Currently, some real estate firms use AI as a glitzy solution for the tedious and time-consuming task of lease abstraction. Enticing as it is to think that software can take all the pain out of finding those needle-in-the-60-page-haystack details and formatting them into an abstract, the technology isn’t reliable enough to operate independently, mainly because today’s AI can’t keep up with the complexities of human language.


When “AI” pops up in the news, it’s usually just “machine learning” under a slick marketing veneer. Machine learning, which is what we mostly use today, is a subset of AI that allows machines to learn from data without being explicitly programmed. AI is the larger idea: producing intelligent machines that can replicate human thinking and behavior.

But let’s keep the AI title for argument’s sake. AI technology for lease abstraction can speed up the process of extracting essential data from a large number of hard-copy lease documents. Many businesses are considering AI as an alternative to manual lease abstraction, which is cumbersome and can take up to five hours per lease to distill all qualitative and quantitative data elements. When you have hundreds or even thousands of leases to abstract, it can quickly add up in terms of time and resources.
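To see how quickly that adds up, here is a back-of-the-envelope sketch. It uses the five-hours-per-lease upper bound mentioned above; the portfolio sizes and the 40-hour work week are illustrative assumptions, not figures from any firm.

```python
# Rough workload estimate for manual lease abstraction.
HOURS_PER_LEASE = 5        # upper bound cited in the article
HOURS_PER_WORK_WEEK = 40   # assumption: one full-time abstractor

def manual_abstraction_weeks(lease_count: int) -> float:
    """Return the person-weeks needed to abstract `lease_count` leases by hand."""
    return lease_count * HOURS_PER_LEASE / HOURS_PER_WORK_WEEK

for leases in (100, 1_000, 5_000):  # hypothetical portfolio sizes
    weeks = manual_abstraction_weeks(leases)
    print(f"{leases:>5} leases: up to {weeks:.1f} person-weeks")
```

Under these assumptions, a 1,000-lease portfolio could take one person up to 125 working weeks.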

But AI is far from a perfect solution when it comes to lease abstraction for several reasons, according to Bart Waldeck, Chief Strategy and Product Officer at Tango, a software provider that applies AI systems to real estate and facilities management. He actually chuckled when the subject came up. “When it comes to AI automating your entire workload, especially in the lease abstraction sector, it’s more of a marketing buzzword than a reality,” he said. According to Waldeck, lease-reading AI systems aren’t accurate enough to operate independently. “It’ll get you about 50-60 percent of the way there. It’s definitely getting better, but we’re far from where it’s necessarily better than doing it the traditional way. That comes down to the fact that natural language processing is generally an evolving science.” 

The “natural language processing,” or NLP, that Waldeck is referring to is yet another branch of AI. NLP aims to let computer systems understand human language almost as well as a human can. It combines computational linguistics, the rule-based modeling of human language (rule-based systems, in which sets of if-then guidelines produce predetermined outcomes, are the most common form of AI), with statistical, machine learning, and deep learning models. Used together, these technologies allow computers to process human language in the form of text or speech data and “understand” its full meaning. But it’s the if-this-then-that rationale that trips up AI’s understanding of language, because human language is often too complex to be captured by a single syntactical principle. Plus, language likes to change, and the rate of language change is something that historical linguists have trouble agreeing on.

The turn of a phrase

Determining exactly how quickly a language evolves on its own has been a contentious problem for historical linguists. In the 1950s, linguist Morris Swadesh cooked up lexicostatistics and its dating offshoot, glottochronology, a standardized method for calculating the rate of language change. Swadesh insisted that all languages, no matter their nuances, replaced words at a constant rate over time, so you could calculate how old a given language is.

If you took a list of 100 basic words (the kind that children learn first and that don’t shift much from generation to generation) and compared it to the same list of words in the descendant language, you should find that some words have been replaced, but most are essentially the same word. Take Old English compared to Modern English: it might take an expert to recognize most of the similarities. Still, it’s not too difficult to see that “fæder” would eventually become “father,” that “grēne” would mature into “green,” or that “and” would evolve into… “and.” According to Swadesh, once you account for shifts in spelling and pronunciation, a language keeps about 86 percent of this basic vocabulary after 1,000 years.
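Swadesh’s constant-rate idea amounts to simple exponential decay. The sketch below assumes the 86 percent per millennium retention figure quoted above and compares a language with its own ancestor (not two sister languages); the function names are hypothetical and chosen only for illustration.

```python
import math

RETENTION_PER_MILLENNIUM = 0.86  # Swadesh's figure for the 100-word list

def expected_retention(millennia: float) -> float:
    """Share of the basic word list expected to survive after `millennia`."""
    return RETENTION_PER_MILLENNIUM ** millennia

def estimated_age(shared_fraction: float) -> float:
    """Millennia elapsed, given the share of words still held in common
    with the ancestor language (inverse of expected_retention)."""
    return math.log(shared_fraction) / math.log(RETENTION_PER_MILLENNIUM)

print(f"Retention after 2,000 years: {expected_retention(2):.2%}")
print(f"Age implied by 74% shared words: {estimated_age(0.74):.2f} millennia")
```

The whole method rests on that single constant, which is why a rate that varies from language to language undermines the calculation.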

Now, if that were the case, then surely AI systems wouldn’t have too much trouble comprehending leases, since automated language processing thrives in a structured environment with a high degree of predictability. Unfortunately, that’s not the case, for two reasons.

The first is that historical linguists accept that the rate of language change can vary wildly depending on a multitude of factors (for instance, look at how quickly new slang replaces old slang). The second problem for lease-reading AI systems is that, because language is so dynamic, the sheer number of leases they have to comb through really exposes the AI linguistics problem. That’s an issue for companies that have thousands of leases. As Waldeck explains, “Each lease has different language and legalese, which is tied to a locale, so the exact terminologies used in one lease can differ from the next one. For example, in a retail setting, you may have a radius restriction or a co-tenancy clause that cites specific geographies, roads, anchor tenants, etc. This is a problem for current AI-based lease abstraction software.”

Different legal terminologies that mean the same thing actually tie back to the first problem because the speed of change in legal language is its own can of worms. Legal language likes to exploit vague phrases like “do business” to offer flexibility (so that the language can be a scapegoat if any legal disagreements occur). That’s another headache for the AI to contend with.

Because AI-based lease abstraction software is far from perfect, your company may need to seek the help of experienced human lease professionals to review what the tool has put together. Some AI systems may simply copy any text that matches a key term, or ignore synonyms for a word. If you rely only on AI-based software to abstract your leases, don’t be surprised when inaccuracies pop up, or if your entire data collection gets derailed. Language is a tricky thing to understand, even for a computer.
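The synonym problem is easy to reproduce. The sketch below is a deliberately naive, hypothetical extractor (not any vendor’s actual method) that only finds clauses containing an exact key term. The sample clauses and the key term are invented for illustration.

```python
# A naive key-term matcher: it only finds clauses that use the exact phrase.
KEY_TERM = "radius restriction"  # hypothetical term the abstractor is looking for

# Invented sample clauses; both limit where the tenant may open another store.
clauses = [
    "Radius Restriction: Tenant shall not operate a similar store within 5 miles.",
    "Tenant agrees not to open a competing location within 5 miles of the Premises.",
]

def find_exact(term: str, texts: list[str]) -> list[str]:
    """Return the clauses that contain `term`, ignoring letter case."""
    return [text for text in texts if term.lower() in text.lower()]

matches = find_exact(KEY_TERM, clauses)
print(f"Found {len(matches)} of {len(clauses)} relevant clauses")
```

The second clause imposes the same restriction without ever using the key term, so it slips through unless a human reviewer catches it.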
