We have reached the end of our journey with Yillow, and I think it would be good to reflect on everything we have learned and accomplished. I hope to take all of these skills and apply them to future projects and endeavors.
This project started with the creation of a dataset of over 5,000 houses with 30 features,
a task that was the culmination of many hours of respectful webscraping, data cleaning, and feature
engineering. I also explored using ConvNets to extract additional features and found
that this improved accuracy on a smaller dataset, but I did not implement it at full scale because of the difficulty
of obtaining images for all 5,000 houses and the limited processing power of my laptop.
I then used this dataset to tune and train nine machine learning models, ensembling several of
them together to create the Yestimate. My Yestimate achieved a testing mean absolute error only
2.1 times that of the real Zestimate on the dataset I made - a performance I thought was far out of reach. Next, I performed a thorough data analysis
to understand what house features were important in determining house prices, and I also explored how these features correlated
with each other.
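As a loose illustration of the ensembling step described above, here is a minimal sketch of a weighted-average ensemble; the stand-in models, weights, and feature values are all hypothetical, not the project's actual configuration:

```python
# Minimal sketch of a weighted-average ensemble, assuming each base model
# exposes a callable predict interface. Names and weights are hypothetical.

def ensemble_predict(models, weights, features):
    """Blend per-model price predictions with a weighted average."""
    assert len(models) == len(weights)
    total_weight = sum(weights)
    blended = sum(w * m(features) for m, w in zip(models, weights))
    return blended / total_weight

# Stand-in "models": simple callables returning a price estimate.
gradient_boost = lambda x: 310_000.0
random_forest = lambda x: 290_000.0
linear_model = lambda x: 300_000.0

price = ensemble_predict(
    [gradient_boost, random_forest, linear_model],
    [0.5, 0.3, 0.2],  # weights might be chosen from validation MAE
    {"sqft": 1800, "beds": 3},
)
```

In practice the weights would be tuned on a validation set, but the blending itself is no more complicated than this.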
After that, I created two interactive maps in the style of Zillow's own map that allow us to visualize the spatial distribution of house prices.
Finally, I wanted to apply the Yestimate to something with real-world use,
so I used it to find houses that were undervalued and represented a great deal. Who knows, maybe I'll use this one
day to find my dream home!
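The Home Hunt idea can be sketched roughly as a screen for listings priced well below the model's estimate; the field names, toy estimator, and 10% threshold here are assumptions for illustration, not the project's actual rules:

```python
# Hedged sketch of the "Home Hunt" idea: flag listings whose asking price
# sits well below the model's estimate. Field names and the 10% threshold
# are illustrative assumptions.

def find_undervalued(listings, estimate, threshold=0.10):
    """Return (address, discount) pairs priced at least `threshold` below estimate."""
    deals = []
    for home in listings:
        yestimate = estimate(home)
        discount = (yestimate - home["price"]) / yestimate
        if discount >= threshold:
            deals.append((home["address"], round(discount, 2)))
    return deals

# Toy estimator: price purely by square footage.
estimate = lambda home: home["sqft"] * 200

listings = [
    {"address": "12 Oak St", "price": 250_000, "sqft": 1500},   # estimate 300k
    {"address": "34 Elm Ave", "price": 295_000, "sqft": 1500},  # estimate 300k
]
deals = find_undervalued(listings, estimate)
```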
On top of all of this, I challenged myself to present everything in a website that is both engaging and informative, in a style that
parodies Zillow's own website - something that probably took as much time as the rest of the project! I did this to become a more
well-rounded programmer, and I feel much more confident in my general programming abilities as a result.
The biggest lesson I learned from this project is just how important it is to have a good dataset. I spent so much time making a dataset, testing
it on the ML models, going back to make it better, and repeating this process over and over again. In the future, I'll be sure to spend
extra time making the data as good as it can be, because it will save me a lot of time in the long run.
I had never webscraped before, but it was fun to learn how to do it. I cemented my abilities to clean and engineer data and to
tune, train, and ensemble machine learning and ConvNet models, and I even learned how to use Optuna for hyperparameter tuning. I also gained
a lot of experience in data analysis and visualization, a very important skill set that taught me a lot about what exactly makes
houses expensive.
On top of this, I created an interactive map using Folium, which I had never used before. In terms of the Home Hunt, I learned that there
really are some hidden gems when it comes to buying houses, and I think that this project has made me much more prepared for when I (fingers crossed)
buy my own home one day. I also learned a lot about web development and how to create a website that is both engaging and informative.
The biggest challenge I faced was that I had to do so much more than just machine learning and data science. Although ML and data science are what I've
been trained to do,
I had to do so much work just to get to the point where I had a CSV of data that I could clean, engineer, and train ML models on! Then, I couldn't just
stop there -- I had to perform in-depth analyses, make an interactive map, draw real-world conclusions, and then present it all on a website! For someone
who thought that this project would be a breeze, I was in for a rude awakening and had to learn so many skills in order to make Yillow a reality.
There were multiple ways this project could have been better; the first that comes to mind is scraping
images of the houses and using ConvNets to extract more features from them. I think this would have improved the accuracy of the model
and also been a really cool addition to the project. In addition, there were more features I could have webscraped if
I had spent more time on it and had better foresight, like walkability scores, elevation, distances from certain landmarks, neighbors' home prices, etc.
Furthermore, the comparison of my Yestimate to Zillow's Zestimate is hardly fair - my model works on a very small, pre-cleaned dataset of houses
whose prices follow nice distributions. Zillow's Zestimate has to work for all houses, so naturally it's at a disadvantage on this small subset
of houses I curated. This makes my Yestimate seem much better than it actually is, and if I were to continue this project,
I would want to expand my dataset greatly so I could go toe-to-toe with Zillow and predict all types of houses along with outliers.
As I wrap up, it's vital to reflect on the purpose of this project - it was an educational endeavor aiming to explore the world of data science, implement machine learning techniques, and apply these to a real-world scenario. While I strived for a robust and accurate model, my primary focus was learning, exploring, and pushing the boundaries of what I can achieve with data.
I hope that Yillow has been insightful, shedding light on the data science workflow and the
potential of machine learning. As I start my full-time career after graduation next spring,
I look forward to unearthing more knowledge and continuing to learn.
Yillow was created by Brandon Bonifacio with the help of a variety of sources, which are credited on our References page.
Come check out my personal website or connect with me on LinkedIn!
Disclaimer: Yillow is an independent project, not affiliated with or endorsed by Zillow in any way. It is created for educational purposes and is not intended to infringe on any rights of Zillow.
No rights reserved - whatsoever.