We have reached the end of our journey with Yillow, and I think it would be good to reflect on everything we have learned and accomplished. I hope to take all of these skills and apply them to future projects and endeavors.
This project started with the creation of a dataset of over 5,000 houses with 30 features,
a task that was the culmination of many hours of respectful webscraping, data cleaning, and feature
engineering. I also explored using ConvNets to extract additional features and found
that this improved accuracy on a smaller dataset, but I did not implement it at full scale because of the difficulty
of obtaining images for all 5,000 houses and the limited processing power of my laptop.
I then used this dataset to tune and train nine machine learning models, ensembling several of
them together to create the Yestimate. My Yestimate achieved a testing mean absolute error only
2.1 times that of the real Zestimate on the dataset I made - a performance I thought was far out of reach. Next, I performed a thorough data analysis
to understand what house features were important in determining house prices, and I also explored how these features correlated
with each other.
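As a loose illustration of the ensembling step described above, here is a minimal sketch of a weighted-average ensemble; the stand-in models, weights, and feature values are all hypothetical, not the project's actual configuration:

```python
# Minimal sketch of a weighted-average ensemble, assuming each base model
# exposes a callable predict interface. Names and weights are hypothetical.

def ensemble_predict(models, weights, features):
    """Blend per-model price predictions with a weighted average."""
    assert len(models) == len(weights)
    total_weight = sum(weights)
    blended = sum(w * m(features) for m, w in zip(models, weights))
    return blended / total_weight

# Stand-in "models": simple callables returning a price estimate.
gradient_boost = lambda x: 310_000.0
random_forest = lambda x: 290_000.0
linear_model = lambda x: 300_000.0

price = ensemble_predict(
    [gradient_boost, random_forest, linear_model],
    [0.5, 0.3, 0.2],  # weights might be chosen from validation MAE
    {"sqft": 1800, "beds": 3},
)
```

In practice the weights would be tuned on a validation set, but the blending itself is no more complicated than this.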
After that, I created two interactive maps in the style of Zillow's own map that allow us to visualize the spatial distribution of house prices.
Finally, I wanted to apply the Yestimate to something with real-world use,
so I used it to find houses that were undervalued and represented a great deal. Who knows, maybe I'll use this one
day to find my dream home!
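The Home Hunt idea can be sketched roughly as a screen for listings priced well below the model's estimate; the field names, toy estimator, and 10% threshold here are assumptions for illustration, not the project's actual rules:

```python
# Hedged sketch of the "Home Hunt" idea: flag listings whose asking price
# sits well below the model's estimate. Field names and the 10% threshold
# are illustrative assumptions.

def find_undervalued(listings, estimate, threshold=0.10):
    """Return (address, discount) pairs priced at least `threshold` below estimate."""
    deals = []
    for home in listings:
        yestimate = estimate(home)
        discount = (yestimate - home["price"]) / yestimate
        if discount >= threshold:
            deals.append((home["address"], round(discount, 2)))
    return deals

# Toy estimator: price purely by square footage.
estimate = lambda home: home["sqft"] * 200

listings = [
    {"address": "12 Oak St", "price": 250_000, "sqft": 1500},   # estimate 300k
    {"address": "34 Elm Ave", "price": 295_000, "sqft": 1500},  # estimate 300k
]
deals = find_undervalued(listings, estimate)
```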
On top of all of this, I challenged myself to present everything in a website that is both engaging and informative, in a style that
parodies Zillow's own website - something that probably took as much time as the rest of the project! I did this to become a more
well-rounded programmer, and I feel much more confident in my general programming abilities as a result.
The biggest lesson I learned from this project is just how important it is to have a good dataset. I spent so much time making a dataset, testing
it on the ML models, going back to make it better, and repeating this process over and over again. In the future, I'll be sure to spend
extra time making the data as good as it can be, because it will save me a lot of time in the long run.
I had never webscraped before, but it was fun to learn how to do it. I cemented my abilities to clean and engineer data and to
tune, train, and ensemble machine learning and ConvNet models, and I even learned how to use Optuna for hyperparameter tuning. I also gained
a lot of experience in data analysis and visualization, a very important skill set that taught me a lot about what exactly makes
houses expensive.
On top of this, I created an interactive map using Folium, which I had never used before. In terms of the Home Hunt, I learned that there
really are some hidden gems when it comes to buying houses, and I think that this project has made me much more prepared for when I (fingers crossed)
buy my own home one day. I also learned a lot about web development and how to create a website that is both engaging and informative.
The biggest challenge I faced was that I had to do so much more than just machine learning and data science. Although ML and data science are what I've
been trained to do,
I had to do so much work just to get to the point where I had a CSV of data that I could clean, engineer, and train ML models on! Then, I couldn't just
stop there -- I had to perform in-depth analyses, make an interactive map, draw real-world conclusions, and then present it all on a website! For someone
who thought that this project would be a breeze, I was in for a rude awakening and had to learn so many skills in order to make Yillow a reality.
There were multiple ways this project could have been better; the first that comes to mind is scraping
images of the houses and using ConvNets to extract more features from them. I think this would have improved the accuracy of the model
and also been a really cool addition to the project. In addition, there were more features I could have webscraped if
I had spent more time on it and had better foresight, like walkability scores, elevation, distances from certain landmarks, neighbors' home prices, etc.
Furthermore, the comparison of my Yestimate to Zillow's Zestimate is hardly fair - my model works on a very small, pre-cleaned dataset of houses
whose prices follow nice distributions. Zillow's Zestimate has to work for all houses, so naturally it's at a disadvantage on this small subset
of houses I curated. This makes my Yestimate seem much better than it actually is, and if I were to continue this project,
I would want to expand my dataset greatly so I could go toe-to-toe with Zillow and predict all types of houses along with outliers.
As I wrap up, it's vital to reflect on the purpose of this project - it was an educational endeavor aiming to explore the world of data science, implement machine learning techniques, and apply these to a real-world scenario. While I strived for a robust and accurate model, my primary focus was learning, exploring, and pushing the boundaries of what I can achieve with data.
I hope that Yillow has been insightful, shedding light on the data science workflow and the
potential of machine learning. As I start my full-time career after graduation next spring,
I look forward to unearthing more knowledge and continuing to learn.
Yillow was created by Brandon Bonifacio with the help of a variety of sources, which are credited on our References page.
Come check out my personal website or connect with me on LinkedIn!
Disclaimer: Yillow is an independent project, not affiliated with or endorsed by Zillow in any way. It is created for educational purposes and is not intended to infringe on any rights of Zillow.
No rights reserved - whatsoever.