"Owning It" is one of the most important values at Zillow, and to keep true to the parody-style of Yillow, this is the page where I "Own It" and tell you just how bad the Yestimate really is. There's so many nuances to the performance of our model compared to the Zestimate that I just had to make a whole subpage devoted to them. I'll be going over how the Zestimate and the Yestimate really aren't even predicting the same thing, the differences in the span of each model, and so much more!
The first and most important point is that the Zestimate is not just trying to predict the price that a house sold at. Taken straight from Zillow, "the Zestimate is Zillow's best estimate of this home's market value," which is oftentimes very different from the price that a house sold at! In contrast, my model assumes that the sold price is the market value of the house. So, if we assume the Zestimate is perfect (which, to be honest, it's probably pretty darn close to being), we should think of the Zestimate's deviation from the sold price as the average difference between a house's market value and the price it actually sold at. This is a super important distinction, because it means that the $30k by which the Zestimate deviates from the sold price isn't the Zestimate's error but rather the Bayes error: the minimum possible error for any model predicting sold prices.
Even with this in mind, this still doesn't take into account that my model only predicts a very, very small proportion of all houses out there. To be specific, the Yestimate is only trained and evaluated on Single Family and Manufactured homes in Washington, sold within about the last year, that have information for the main features discussed on the Acquisition page, and only when the sold price is not an outlier. Change any of those factors, and the Yestimate is not much better than np.random. On the other hand, the Zestimate has to be trained on every single house in its dataset, so the fact that the Zestimate can beat the Yestimate at its own game on this tiny subset of houses is remarkable.
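To make that scope concrete, here is a minimal sketch in Python of what such a filter could look like. The field names (`home_type`, `state`, `days_since_sale`, `sold_price`, and the three "main features") are hypothetical, and the 1.5 × IQR rule is only an assumed stand-in for whatever outlier filter was actually used.

```python
from statistics import quantiles

# Hypothetical field names; the real dataset's columns may differ.
REQUIRED_FEATURES = ("beds", "baths", "sqft")  # stand-ins for the "main features"
ALLOWED_TYPES = {"SingleFamily", "Manufactured"}


def in_scope(home):
    """Return True if a home passes every scope check except the outlier check."""
    days = home.get("days_since_sale")
    return (
        home.get("home_type") in ALLOWED_TYPES
        and home.get("state") == "WA"
        and days is not None
        and days <= 365  # "sold within about the last year"
        and all(home.get(f) is not None for f in REQUIRED_FEATURES)
        and home.get("sold_price") is not None
    )


def drop_price_outliers(homes, k=1.5):
    """Drop homes whose sold price is outside the k * IQR fences (assumed rule)."""
    prices = [h["sold_price"] for h in homes]
    if len(prices) < 4:
        return list(homes)  # too few homes to estimate quartiles sensibly
    q1, _, q3 = quantiles(prices, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [h for h in homes if low <= h["sold_price"] <= high]


def yestimate_scope(homes):
    """Apply the scope checks first, then the outlier filter."""
    return drop_price_outliers([h for h in homes if in_scope(h)])
```

Anything that fails `in_scope` or lands outside the price fences never reaches the model, which is why a result on this subset says little about houses in general.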
There are probably so many other reasons why the Yestimate has an unfair advantage over the Zestimate, but these were just the most prominent ones that came to mind.
Now that we've talked about all the ways the Yestimate has an unfair advantage over the Zestimate, let's talk about all the ways the Zestimate has an unfair advantage over the Yestimate. First and foremost, Zillow has over 100 million houses in its dataset, which puts my 5,000 to shame. Not only that, but Zillow probably has thousands of features about each house in its database, probably takes into account tons of opinions from real estate companies, and has an army of home evaluation experts. On top of all this, Zillow has some of the best data scientists and machine learning engineers in the world, who have been working on improving the Zestimate since 2006.
Yillow, on the other hand, has a single overworked college student living off a diet of ramen and coffee, whose emotional support comes from a hamster named Ghiarrhei, and who gave himself a deadline of three weeks for this project so that he'd have enough time to pass his classes and get a job after graduation. Furthermore, I'm using Zillow's own dataset, which means that the Yestimate is being evaluated against a testing dataset it has never seen before while the Zestimate is evaluated on its training dataset. If you just got shivers, so did I. Comparing how one model performs on a testing dataset with how another model performs on a training dataset is Machine Learning Blasphemy! I could excuse everything else, but comparing testing and training performances crosses the line!
All jokes aside, the Zestimate and the Yestimate are obviously incomparable. However, I still think that a mean absolute error against the sold price of 2.1 times that of the real Zestimate is pretty good for the Yestimate when you take into account all the factors that go into both models. It shows that the Yestimate has an idea of what's going on in the housing market it was trained on about half as well as the real Zestimate does on the data it was trained on. This indicates that the data science and machine learning techniques I used worked, and that's a result I'm more than happy with.
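For reference, the "2.1 times" figure is just a ratio of two mean absolute errors measured against the same sold prices. Below is a minimal sketch of that calculation. The function names are hypothetical and the prices are made-up toy numbers chosen only to land on the same ratio, not real Yillow or Zillow data.

```python
def mean_absolute_error(predicted, actual):
    """Average of |prediction - sold price| over all homes."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def mae_ratio(model_preds, baseline_preds, sold_prices):
    """How many times larger the model's MAE is than the baseline's."""
    return (mean_absolute_error(model_preds, sold_prices)
            / mean_absolute_error(baseline_preds, sold_prices))


# Toy numbers for illustration only.
sold = [500_000, 300_000]
yestimate_like = [563_000, 237_000]   # off by $63k on each home
zestimate_like = [530_000, 270_000]   # off by $30k on each home

print(round(mae_ratio(yestimate_like, zestimate_like, sold), 2))  # 2.1
```

A ratio like this only compares the two models fairly when both are scored on the same homes under the same conditions, which, as covered above, is not quite the case here.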
If I had to summarize all the nuances of the Yestimate's performance compared to the Zestimate, it would be this: the models are essentially incomparable, but since there's nothing else to compare the Yestimate to, we'll say that it's about half as good as the Zestimate on a very, very small subset of houses. This definitely isn't as flashy as saying "I made a model that predicts house prices half as well as Zillow!", but it's the truth. And in the spirit of being a parody of Zillow, we also need to abide by their values, most notably "Owning It." So, I'll own up to it: the Yestimate is an awful predictor of house prices in general, but it's pretty good considering the resources it had. In other words, the fact that it can predict house prices only about 2.1 times worse than the Zestimate for the data it was trained on shows that the data science and machine learning techniques used worked, and in my opinion, that's what matters.
The "next" section of my website talks about data trends, but after this point most of the pages are independent of each other, so feel free to go wherever.
Next Page: Price Trends
Or, if you want, you can jump straight to my interactive maps to see for yourself how the Yestimate predicts house prices. Or go to any page: by this point you have a full understanding of how I got my data and arrived at the Yestimate, so feel free to explore!
Yillow was created by Brandon Bonifacio with the help of a variety of sources which are credited on our References page.
Come check out my personal website or connect with me on LinkedIn!
Disclaimer: Yillow is an independent project, not affiliated with or endorsed by Zillow in any way. It is created for educational purposes and is not intended to infringe on any rights of Zillow.
No rights reserved - whatsoever.