When you feed enough information to a machine learning algorithm, it will eventually find correlations between things that no one previously thought were connected. Sometimes these correlations are hilariously coincidental. For example, the number of people who drowned by falling into a swimming pool correlates with the number of films Nicolas Cage appeared in.
But other times this process can reveal genuinely important knowledge. In the lending industry, there have been many advances in establishing creditworthiness using correlations with factors such as battery life and name capitalization. Whether this is an ethical way to decide whom to lend to is another issue, but there at least seems to be enough evidence for lending institutions to act on.
Unfortunately, getting these insights isn’t as easy as uploading more and more data into a database. Many of the inputs need to be worked into a mathematical model, which means assigning numerical values to things that can’t easily be described with numbers alone. Words fall into this category. They are subjective and can have multiple, sometimes complicated, meanings. But an algorithm created by Google, called Word2vec, does just that: it represents words as numbers.
Or Hiltch, CTO of Skyline AI, explained it to me this way: “When we convert [words into] numbers, it needs to keep its lexical properties. One way of achieving this is by using Google’s Word2vec. The idea is to build a bucket of numbers, or vectors, that represent words, where the words are positioned in space so that words that share common context are located in close proximity to one another.” Imagine the entire dictionary laid out in 3D space. This system graphs every word in a language into a lyrical galaxy. Related words have similar vectors and therefore sit closer together in this nebula. “For instance, if you have words like city and town, they would be located closer to each other than to the word for apple,” Or explained.
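To make “closer together” concrete, here is a minimal Python sketch that measures closeness with cosine similarity, the measure commonly used to compare Word2vec vectors. The three-dimensional vectors are invented for illustration and are not Word2vec output. Real embeddings are learned from large amounts of text, for example with a library such as gensim, and typically have hundreds of dimensions.

```python
import math

# Hypothetical 3-dimensional vectors, hand-picked for illustration only.
# Real Word2vec embeddings are learned from text and are much larger.
TOY_VECTORS = {
    "city":  [0.90, 0.80, 0.10],
    "town":  [0.85, 0.75, 0.20],
    "apple": [0.10, 0.20, 0.95],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def most_similar(word, vectors):
    """Return the other word whose vector points most nearly the same way."""
    target = vectors[word]
    others = (w for w in vectors if w != word)
    return max(others, key=lambda w: cosine_similarity(target, vectors[w]))

print(most_similar("city", TOY_VECTORS))  # town
```

Cosine similarity ignores vector length and compares only direction. That is why it is a common choice for asking whether two words appear in similar contexts.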
This allows data scientists like Or to feed words into their models and run correlation calculations. Or’s company works with some of the world’s biggest asset managers, using AI to enhance their strategies. When the team entered the name of every multifamily building in the U.S. into their models, one of the things they found was that the name itself correlated rather strongly with the asset’s price compared to similar buildings.
Some of the terms that had a positive correlation with price (meaning that they were often associated with buildings valued more than their peers) were New York, Hamilton, Villas, Residences, by Windsor, and lofts. Words on the other end of the spectrum included Spanish, Astoria, Raintree and Pangea.
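The article doesn’t describe how Skyline AI turns names into these rankings, and its approach builds on Word2vec vectors. As a deliberately simpler stand-in, the sketch below treats each word in a building name as a 0/1 flag. It then computes the Pearson correlation between that flag and the building’s price relative to its peers. The buildings and price ratios are hypothetical, chosen only to echo the direction of the lists above. The `min_count` cutoff is also an assumption.

```python
import math
import re
from collections import Counter

# Hypothetical data: (building name, price relative to comparable buildings).
# A ratio of 1.10 means the building was valued 10% above its peers.
BUILDINGS = [
    ("Hamilton Residences", 1.12),
    ("Parkview Residences", 1.08),
    ("Riverside Lofts", 1.10),
    ("Oak Villas", 1.05),
    ("Raintree Apartments", 0.91),
    ("Maple Apartments", 0.97),
    ("Raintree Gardens", 0.93),
    ("Cedar Gardens", 1.00),
]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def token_correlations(buildings, min_count=2):
    """Correlate each name token's presence (0/1) with the price ratio.

    Tokens seen in fewer than `min_count` names are skipped, because a
    correlation estimated from a single building is mostly noise.
    """
    token_sets = [set(re.findall(r"[a-z]+", name.lower())) for name, _ in buildings]
    ratios = [ratio for _, ratio in buildings]
    counts = Counter(t for tokens in token_sets for t in tokens)
    result = {}
    for token, count in counts.items():
        if count < min_count or count == len(buildings):
            continue  # too rare, or present in every name (no variance)
        indicator = [1.0 if token in tokens else 0.0 for tokens in token_sets]
        result[token] = pearson(indicator, ratios)
    # Highest (most positive) correlation first.
    return dict(sorted(result.items(), key=lambda kv: kv[1], reverse=True))

for token, r in token_correlations(BUILDINGS).items():
    print(f"{token:12s} {r:+.2f}")
```

A word-vector approach such as Word2vec would let related words like Residences and Villas share information. This bag-of-words version cannot do that, which is one reason to prefer embeddings on real data.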
Or was quick to point out that correlation does not mean causation. “I would not take this list to mean that if you change the name of an apartment building to include one of the better performing words that you will make it more valuable,” he said. “But if I was thinking about naming a property I would probably stay away from some of the names on the bottom of our list.”
This is only one example of how machine learning can help us understand the nuances of property pricing. Or expects these insights to grow as his team adds more granularity to the data. “Take weather data, for instance,” he said. “If you use average monthly temperatures, then there is a good chance you will not get much value from that. Where it becomes interesting is when you can add new features to that data, for example the number of consecutive days it was raining, and then you can characterize an area.”
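Or’s weather example is easy to sketch. The two hypothetical daily rainfall series below have the same average, so a monthly-average feature can’t tell the areas apart, but a “longest run of rainy days” feature can. Two choices here are assumptions for illustration: the 1 mm threshold for a “rainy” day, and the use of the longest streak rather than some other streak statistic.

```python
# Hypothetical daily rainfall (mm) for two areas over the same 14 days.
# Both have the same average, but very different rain patterns.
AREA_A = [5, 5, 5, 5, 5, 5, 5, 0, 0, 0, 0, 0, 0, 0]
AREA_B = [5, 0, 5, 0, 5, 0, 5, 0, 5, 0, 5, 0, 5, 0]

def average(values):
    return sum(values) / len(values)

def longest_rainy_streak(daily_mm, threshold_mm=1.0):
    """Length of the longest run of consecutive days with rain >= threshold."""
    longest = current = 0
    for mm in daily_mm:
        current = current + 1 if mm >= threshold_mm else 0
        longest = max(longest, current)
    return longest

for name, series in [("A", AREA_A), ("B", AREA_B)]:
    print(name, average(series), longest_rainy_streak(series))
# A 2.5 7
# B 2.5 1
```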
Adding new features to data (or featurization, as data scientists like to call it) takes a level of critical thinking that makes analytics both an art and a science. It also means that it still takes a lot of talented people to create and test these models with more and more features. “Thinking about adding a layer like this is something that machines are not really good at yet,” Or explained.
If something as innocuous as a building’s name can be tied to the price the market gives it, we must ask how many other things are contributing to a building’s real value that no human has ever thought of and no machine has ever identified.
One of the non-mathematical features that we found to be highly correlated with asset value is the asset’s name. Our model will predict different values for the same asset under a different name, because it learned how names correlate with value.
It is important to note that this shows dependence between variables but says nothing about causation. It may simply be more likely that assets named Lakeside are closer to lakes. I wouldn’t buy a property for its name, but I might look at names with a big negative correlation and avoid naming my asset one of those.
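The Lakeside point describes a confounder: a third variable that drives both the name and the price. The minimal sketch below uses an entirely hypothetical dataset in which price depends only on lake proximity. A raw comparison finds a name premium. That premium disappears once buildings are compared within the same lake stratum.

```python
from statistics import mean

# Hypothetical buildings: (name contains "Lakeside", is near a lake, price ratio).
# By construction, price depends only on lake proximity, not on the name.
BUILDINGS = [
    (True, True, 1.25), (True, True, 1.15), (True, True, 1.20), (False, True, 1.20),
    (True, False, 1.00), (False, False, 0.95), (False, False, 1.05), (False, False, 1.00),
]

def name_gap(rows):
    """Mean price of 'Lakeside' buildings minus mean price of the rest."""
    named = [price for lakeside, _, price in rows if lakeside]
    other = [price for lakeside, _, price in rows if not lakeside]
    return mean(named) - mean(other)

def stratified_name_gap(rows):
    """Average of the name gap computed separately near and away from lakes."""
    gaps = [name_gap([r for r in rows if r[1] == near]) for near in (True, False)]
    return mean(gaps)

print(round(name_gap(BUILDINGS), 2))             # 0.1
print(round(stratified_name_gap(BUILDINGS), 2))  # 0.0
```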
The correlation could be non-linear. Changing the name does not have the same impact on every asset; the effect could be big or small depending on the asset’s other features.
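One simple way a model can produce a name effect that varies from asset to asset is an interaction term. In the hypothetical model below, all coefficients are invented and are not Skyline AI’s. The name premium is a percentage scaled by a location score, so the same rename moves two assets by very different amounts.

```python
# Hypothetical valuation model; every number here is invented for illustration.
BASE_VALUE_PER_UNIT = 200_000
NAME_PREMIUM = {"residences": 0.06, "apartments": 0.0}  # hypothetical premiums

def predict_value(units, location_score, name_token):
    """Value = units * base * (1 + premium * location_score)."""
    premium = NAME_PREMIUM[name_token] * location_score  # interaction term
    return units * BASE_VALUE_PER_UNIT * (1 + premium)

def rename_effect(units, location_score, old_token, new_token):
    """Predicted change in value from swapping one name token for another."""
    return (predict_value(units, location_score, new_token)
            - predict_value(units, location_score, old_token))

# Same rename, same size, different locations: very different effects.
print(f"{rename_effect(100, 1.0, 'apartments', 'residences'):,.0f}")  # 1,200,000
print(f"{rename_effect(100, 0.2, 'apartments', 'residences'):,.0f}")  # 240,000
```

A learned model such as a tree ensemble can discover interactions like this from data without a hand-written term. The hand-written version just makes the behaviour visible.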
Words such as Avalon, Havana and Camden are consistently correlated with higher prices. The frequency of a name matters a lot. We are only dealing with apartment buildings in the U.S.
We begin by looking at all historical transactions. The first layer uses other data such as LIBOR, rates, school grades, weather and a lot more. Features can be much more complex. We treat rental values as a signal that will affect asset values, though it might take a few years for that to happen.
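The idea that rents lead values by a few years can be checked with a lagged correlation. You correlate rent growth in year t with value growth in year t + lag, then see which lag fits best. The series below are hypothetical and built so that value growth echoes rent growth two years later. This is one simple diagnostic, not a description of how Skyline AI models the delay.

```python
import math

# Hypothetical annual growth rates (%). By construction, value growth repeats
# rent growth two years later, so the strongest correlation should be at lag 2.
RENT_GROWTH = [2.0, 5.0, 1.0, 4.0, 3.0, 6.0, 2.0, 1.0, 4.0, 5.0]
VALUE_GROWTH = [3.0, 1.0, 2.0, 5.0, 1.0, 4.0, 3.0, 6.0, 2.0, 1.0]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def lagged_correlation(signal, target, lag):
    """Correlate signal[t] with target[t + lag] over the overlapping years."""
    return pearson(signal[: len(signal) - lag], target[lag:])

best_lag = max(range(4), key=lambda lag: lagged_correlation(RENT_GROWTH, VALUE_GROWTH, lag))
print(best_lag)  # 2
```

Each extra year of lag leaves fewer overlapping years to compare. On real data, long lags need correspondingly long histories before the correlation means much.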