AirBnB Recommender App
My first Python programming class was very hands-on and consisted mainly of coursework. Here's the final project I completed using the skills learned in the 21-hour mini course.
Hackwagon Academy - DS101
AirBnB Project
Learning Outcomes:
- Learn how to translate business requirements into workable applications
- Declare variables, and manipulate the variables to perform arithmetic operations
- Create a list, append new elements to a list, remove elements from list, and access elements within a list
- Create a dictionary, access data, and update information within the dictionary
- Be able to aptly make use of if and nested if constructs
- Variable conversion
- Produce visualisations
- Able to come up with insights based on the data
#Before you start, please perform the following 2 steps:
#1. Rename the file to <First_Name>_<Last_Name>_DS101_Lab_1 e.g. john_doe_DS101_Lab_1
#2. Fill in your details here:
#Name: Barbara Yam
#Start of Course Class(Edit accordingly): 15 Jan 2019, 7pm
# FOR TA/INSTRUCTOR
# Total Marks: 100 / 100
# Part 1: 5 / 5
# Part 2: 25 / 25
# Part 3: 10 / 10
# Part 4: 60 / 60
References
Important Collections Functions
Creation
Collection Type | Function | Examples |
---|---|---|
list | [] | new_list = []; new_list = [1,2,3,4] |
dict | {} | new_dict = {}; new_dict = {'a': 1, 'b':2} |
Add / Appending Data
Collection Type | Functions | Examples | Resulting Output |
---|---|---|---|
list | .append() | new_list = [1,2,3]; new_list.append(4) | [1,2,3,4] |
list | .extend() | new_list = [1,2]; new_list.extend([3,4]) | [1,2,3,4] |
dict | dict[key] = value | new_dict = {}; new_dict['a'] = 1; new_dict['b'] = 2 | {'a': 1, 'b':2} |
Updating / Changing Data
Collection Type | Functions | Examples | Resulting Output |
---|---|---|---|
list | list[index] = value | new_list = [1,2,3]; new_list[0] = 5 | [5,2,3] |
dict | dict[key] = value | new_dict = {'a': 1, 'b':2}; new_dict['a'] = 10 | {'a': 10, 'b':2} |
Accessing / Taking Out Data
Collection Type | Functions | x to be | Examples |
---|---|---|---|
list | list[index] | 3 | new_list = [1,2,3]; x = new_list[2] |
list of list | list[index][index] | 3 | new_list = [[1,2],[3,4]]; x = new_list[1][0] |
list of dict | list[index][key] | 2 | new_list = [{'a':1},{'b':2}]; x = new_list[1]['b'] |
dict | dict[key] | 2 | new_dict = {'a': 1, 'b':2}; x = new_dict['b'] |
CITU Framework & Applied Iterations
- What variables do you need to answer this question?
- Create the results container
- Iterate the input data/list
- Take out the variables you needed in step 1
- Test conditions of each value
- Update the results container when condition is fulfilled
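As a minimal sketch of the six steps above, the snippet below counts listings per room type. The sample_listings rows and the room_type_counts name are made up for illustration and are not part of the project data.

```python
# Hypothetical sample rows, shaped like the project's list of dictionaries
sample_listings = [
    {'listing_id': '1', 'room_type': 'Shared room'},
    {'listing_id': '2', 'room_type': 'Private room'},
    {'listing_id': '3', 'room_type': 'Shared room'},
]

# Step 1: the only variable needed to answer the question is room_type
# Step 2: create the results container
room_type_counts = {}

# Step 3: iterate the input list
for row in sample_listings:
    # Step 4: take out the variable identified in step 1
    room_type = row['room_type']
    # Step 5: test the condition (have we seen this room type before?)
    if room_type in room_type_counts:
        # Step 6: update the results container
        room_type_counts[room_type] += 1
    else:
        room_type_counts[room_type] = 1

print(room_type_counts)  # {'Shared room': 2, 'Private room': 1}
```

The same skeleton is reused in most of the questions that follow; only the variable taken out and the condition tested change.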
Sorting Values
x = [10,20,50,2,4]
x.sort()
print(x) # [2,4,10,20,50]
x.sort(reverse=True)
print(x) # [50,20,10,4,2]
Further explore the .sort() function in the documentation
Search up 'list .sort() python 3.0'
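One option worth knowing when exploring .sort() is the key argument, which lets you sort a list of dictionaries by one field. A small sketch follows; the sample_prices rows and the get_price helper are made up for illustration.

```python
# Hypothetical rows, not taken from the project data
sample_prices = [
    {'listing_id': 'a', 'price': 74.0},
    {'listing_id': 'b', 'price': 30.0},
    {'listing_id': 'c', 'price': 120.0},
]

def get_price(row):
    """Return the value that the rows should be ordered by."""
    return row['price']

# .sort() reorders the list in place, cheapest first
sample_prices.sort(key=get_price)
cheapest_first = [row['listing_id'] for row in sample_prices]
print(cheapest_first)  # ['b', 'a', 'c']

# sorted() returns a new list and leaves the original untouched
most_expensive_first = sorted(sample_prices, key=get_price, reverse=True)
print(most_expensive_first[0]['listing_id'])  # c
```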
Welcome to your final project of Hackwagon Academy DS101! You've come a long way since the start of this course and if you've been on track with our exercises, you should find this doable.
Airbnb is an online marketplace and hospitality service, enabling people to lease or rent short-term lodging including vacation rentals, apartment rentals, homestays, hostel beds, or hotel rooms. The company does not own any lodging; it is merely a broker and receives percentage service fees (commissions) from both guests and hosts in conjunction with every booking. In this project, we aim to use algorithms and libraries to mine the reviews people have submitted on Singapore AirBnB rentals in order to provide descriptive analytics.
Load File
Load the airbnb_data.csv as a list of dictionaries into a new variable called airbnb_data. Once you load the data, you should see something like this:
[
{
'listing_id': '1133718',
'survey_id': '1280',
'host_id': '6219420',
'room_type': 'Shared room',
'country': '',
'city': 'Singapore',
'borough': '',
'neighborhood': 'MK03',
'reviews': '9',
'overall_satisfaction': '4.5',
'accommodates': '12',
'bedrooms': '1.0',
'bathrooms': '',
'price': '74.0',
'minstay': '',
'last_modified': '2017-05-17 09:10:25.431659',
'latitude': '1.293354',
'longitude': '103.769226',
'location': '0101000020E6100000E84EB0FF3AF159409C69C2F693B1F43F'
}
...
]
# Read file into a list called airbnb_data
import csv

with open('airbnb_data.csv') as csvfile:
    data = csv.DictReader(csvfile)
    airbnb_data = []
    # each row is read as a dictionary keyed by the CSV header
    for row in data:
        airbnb_data.append(dict(row))
print(airbnb_data[:2])
Data Cleaning (5 marks)
Once this is done correctly, you do not need to change the types again for the remaining parts of your project.
Preview your data and convert the following columns to the appropriate type:
overall_satisfaction
price
longitude
latitude
reviews
Expected Output:
{
'listing_id': '1133718',
'survey_id': '1280',
'host_id': '6219420',
'room_type': 'Shared room',
'country': '',
'city': 'Singapore',
'borough': '',
'neighborhood': 'MK03',
'reviews': 9.0,
'overall_satisfaction': 4.5,
'accommodates': '12',
'bedrooms': '1.0',
'bathrooms': '',
'price': 74.0,
'minstay': '',
'last_modified': '2017-05-17 09:10:25.431659',
'latitude': 1.293354,
'longitude': 103.769226,
'location': '0101000020E6100000E84EB0FF3AF159409C69C2F693B1F43F'
}
#Write code below
# convert the numeric columns from strings to floats, in place
for row in airbnb_data:
    row['overall_satisfaction'] = float(row['overall_satisfaction'])
    row['price'] = float(row['price'])
    row['longitude'] = float(row['longitude'])
    row['latitude'] = float(row['latitude'])
    row['reviews'] = float(row['reviews'])
print(airbnb_data[:1])
# 5 / 5
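The loop above assumes every value in those five columns is a valid number. Other columns in the preview, such as bathrooms, hold empty strings, and float('') raises a ValueError. If you ever need to clean a column like that, a defensive helper along these lines is one option. The to_float name and its default argument are my own, not part of the course material.

```python
def to_float(value, default=None):
    """Convert a CSV string to float, returning `default` for blank or invalid values."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return default

# Hypothetical row with one blank field, modelled on the preview above
sample_row = {'price': '74.0', 'reviews': '9', 'bathrooms': ''}
cleaned = {key: to_float(value) for key, value in sample_row.items()}
print(cleaned)  # {'price': 74.0, 'reviews': 9.0, 'bathrooms': None}
```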
Exploratory Data Analysis (35 marks)
The data team at AirBnB wishes to find out the answers to a few simple questions on the existing listings in Singapore. Your goal is to manipulate the data you have stored in the list of dictionaries and understand some of the basic statistics of your dataset. The following are some of the common first questions asked.
Q1. List each neighborhood and its number of listings (5 marks)
Hint
- Counting with dictionaries
Expected Output:
When you search for ['TS17'], it should give you 342 counts.
#Write code below
neighborhood_listing = {}
for row in airbnb_data:
    neighborhood = row["neighborhood"]
    # increment the count if seen before, otherwise start at 1
    if neighborhood in neighborhood_listing:
        neighborhood_listing[neighborhood] += 1
    else:
        neighborhood_listing[neighborhood] = 1
print("When you search for ['TS17'], it should give you " + str(neighborhood_listing['TS17']) + " counts.")
# 5 / 5
#can use break to stop the iteration after one row
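For comparison, the standard library's collections.Counter does the same counting in one line. This is only an alternative sketch on made-up sample rows; the course expects the dictionary approach shown above.

```python
from collections import Counter

# Hypothetical rows, not taken from the project data
sample_listings = [
    {'neighborhood': 'TS17'},
    {'neighborhood': 'MK03'},
    {'neighborhood': 'TS17'},
]

# Counter tallies how often each neighborhood appears
neighborhood_counts = Counter(row['neighborhood'] for row in sample_listings)
print(neighborhood_counts['TS17'])         # 2
print(neighborhood_counts.most_common(1))  # [('TS17', 2)]
```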
Q2. List out each neighborhood and their average overall_satisfaction (5 marks)
Note: You should filter out listings whose reviews are 0.
Hint
- Create dictionary where key is the neighborhood_id and value is a list of overall_satisfaction
- Create another dictionary to compute the average
Expected Output:
When you search for ['TS17'], it should give you an average score of 2.859447004608295.
#Write code below
# collect the satisfaction scores of reviewed listings, per neighborhood
satisfaction_dictionary = {}
for row in airbnb_data:
    neighborhood_id = row["neighborhood"]
    satisfaction = row["overall_satisfaction"]
    reviews = row["reviews"]
    if neighborhood_id not in satisfaction_dictionary:
        if reviews != 0.0:
            satisfaction_dictionary[neighborhood_id] = [satisfaction]
    else:
        if reviews != 0.0:
            satisfaction_dictionary[neighborhood_id].append(satisfaction)
# average each neighborhood's list of scores
average_satis_dict = {}
for neighborhood_id, list_satis in satisfaction_dictionary.items():
    ave_satis = sum(list_satis) / len(list_satis)
    if neighborhood_id not in average_satis_dict:
        average_satis_dict[neighborhood_id] = ave_satis
print("When you search for ['TS17'], it should give you an average score of", average_satis_dict['TS17'], ".")
# 5 / 5
# can filter for reviews != 0 first, before creating the list
#for key, value in results.items():
#    results[key] = sum(value)/len(value)
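Following the instructor's comment about filtering first, here is one way the two loops could be folded into a reusable helper. The average_by_neighborhood name and the sample rows are my own assumptions, and the sketch always skips listings with 0 reviews, which matches Q2.

```python
def average_by_neighborhood(rows, field):
    """Average `field` per neighborhood, skipping rows with 0 reviews."""
    values = {}
    for row in rows:
        # filter first, then group the value under its neighborhood
        if row['reviews'] != 0:
            values.setdefault(row['neighborhood'], []).append(row[field])
    return {name: sum(nums) / len(nums) for name, nums in values.items()}

# Hypothetical rows, not taken from the project data
sample_listings = [
    {'neighborhood': 'TS17', 'reviews': 3.0, 'overall_satisfaction': 4.0},
    {'neighborhood': 'TS17', 'reviews': 0.0, 'overall_satisfaction': 0.0},
    {'neighborhood': 'TS17', 'reviews': 5.0, 'overall_satisfaction': 5.0},
    {'neighborhood': 'MK03', 'reviews': 1.0, 'overall_satisfaction': 4.5},
]
averages = average_by_neighborhood(sample_listings, 'overall_satisfaction')
print(averages)  # {'TS17': 4.5, 'MK03': 4.5}
```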
Q3. List out each neighborhood and their average price (5 marks)
Hint
- Similar to previous question
Expected Output:
When you search for ['TS17'], it should give you an average price of 95.5672514619883.
#Write code below
# collect the prices of all listings, per neighborhood
price_dictionary = {}
for row in airbnb_data:
    neighborhood_id = row["neighborhood"]
    price = row["price"]
    if neighborhood_id not in price_dictionary:
        price_dictionary[neighborhood_id] = [price]
    else:
        price_dictionary[neighborhood_id].append(price)
# average each neighborhood's list of prices
ave_price_dict = {}
for neighborhood_id, price in price_dictionary.items():
    ave_price = sum(price) / len(price)
    if neighborhood_id not in ave_price_dict:
        ave_price_dict[neighborhood_id] = ave_price
print("When you search for ['TS17'], it should give you an average price of", ave_price_dict['TS17'], ".")
# 5 / 5
Q4. Plot a distribution of counts of the overall_satisfaction (5 marks)
Note: You should filter out listings whose reviews are 0.
Hint
- Counting with dictionaries
- Get a list of tuples with .items()
- Create 2 lists:
  - 1 for all the score labels
  - 1 for all the counts
- Plot with the 2 lists
Expected Output:
# Remember to import the relevant library/libraries!
import matplotlib.pyplot as plt

# Write code below:
# count how many reviewed listings have each satisfaction score
satis_count = {}
for row in airbnb_data:
    reviews = row["reviews"]
    satisfaction1 = row["overall_satisfaction"]
    if reviews != 0:
        if satisfaction1 in satis_count:
            satis_count[satisfaction1] += 1
        else:
            satis_count[satisfaction1] = 1
# split the (score, count) tuples into two lists for plotting
x = tuple(satis_count.items())
satis_list = []
count_list = []
for row in x:
    satis_list.append(row[0])
    count_list.append(row[1])
plt.bar(satis_list, count_list)
plt.title("Distribution of Overall Satisfaction Scores")
plt.xlabel("Overall Satisfaction Scores")
plt.ylabel("Counts")
plt.show()
# 5 / 5
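The "2 lists" step from the hint can also be written with sorted() and zip(). A small sketch on a made-up counts dictionary (satis_count_sample is hypothetical); the resulting tuples can be passed to plt.bar in the same way as lists.

```python
# Hypothetical counts, not taken from the project data
satis_count_sample = {4.5: 10, 5.0: 7, 3.0: 2}

# Sort the (score, count) pairs by score, then unzip them into two tuples
scores, counts = zip(*sorted(satis_count_sample.items()))
print(scores)  # (3.0, 4.5, 5.0)
print(counts)  # (2, 10, 7)
```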
Q5. Plot a geographical representation of all of the listings in Singapore (5 marks)
Hint
- Create a list for latitude
- Create a list for longitude
- Append each listing's latitude and longitude to the lists
- Plot a scatter plot using both lists
Expected Output:
#Write code below
import matplotlib.pyplot as plt

latitude_list = []
longitude_list = []
for row in airbnb_data:
    latitude_list.append(row["latitude"])
    longitude_list.append(row["longitude"])
# longitude on the x-axis and latitude on the y-axis, as on a map
plt.scatter(longitude_list, latitude_list)
plt.title("Geographical Representation of All Airbnb Listings in Singapore")
plt.xlabel("Longitude")
plt.ylabel("Latitude")
plt.show()
# 5 / 5
Interpretation (10 marks)
Answer the following questions to better understand the Airbnb dataset.
You're free to make some assumptions
Q1. Why do you think the overall_satisfaction is in intervals of 0.5 and not otherwise? (5 marks)
Answer:
Overall_satisfaction ranges from 0 to 5 in intervals of 0.5, so the input is limited to 11 possible values, which makes statistical analysis easier. People cannot input numbers like 4.8. Instead, they decide whether to give the listing a 4 or a 5 depending on their level of preference, and if they are ambivalent between 4 and 5, they can input 4.5. This way, the preference of each consumer is more distinct.
# 5 / 5
# it also gives you a nice histogram! (=
Q2. Why was there a need to keep only listings with more than 0 reviews in questions 2 and 4? (5 marks)
Answer:
Those with more than 0 reviews are likely to be more reliable listings as there is evidence that there were actual people who have stayed there.
# 5 / 5
AirBnB Visualisation and Price Recommender App (60 marks)
Attempts to create the functions are awarded 2 marks each.
Scenario: In the earlier EDA, the code was not modular or scalable, so it does not allow the AirBnB team to look into each neighborhood. As such, the AirBnB data team has tasked you with building a simple application to improve on the earlier EDA while serving its 2 users: Guests and Hosts.
Your objective: Develop an app which will serve the 2 main users:
1. Guests - Visualisation tool to recommend them the best listings based on price and overall satisfaction score in a neighborhood
2. Hosts - Recommend a price to set for their listing in a given neighborhood based on better performing listings
!pip install mplleaflet
How do you know if you installed the library correctly? Try running the next cell. If you don't get an error, you are good to go!
import mplleaflet
Building the App
To begin building the App, there are 2 things to do:
- Build the functions
- Test the functions
After we are done building the functions in part 1, we will test them in part 2
def example_function_1(data, x, y, ..):
    for i in data:
        print(i)
# when using it.. notice that airbnb_data is placed first, followed by the other parameters
example_function_1(airbnb_data, some_x, some_y, ...)
There are a total of 5 functions:
get_all_latitudes
get_all_longitudes
listings_recommender
price_recommender
visualise_listings
get_all_latitudes() - Function to get all latitudes given a list of listing_ids (2 marks)
Input: airbnb_data as data, a list of listing_ids
Return: A list of latitudes
#Write code below
def get_all_latitude(data, listing):
    latitude_list = []
    for row in data:
        latitude = row["latitude"]
        for item in listing:
            # compare with ==; `in` between two strings would also match partial IDs
            if item == row['listing_id']:
                latitude_list.append(latitude)
    return latitude_list
# 2 / 2
Tester Cell - To test the above function to see if it's working.
Expected Output:
[1.311147]
get_all_latitude(airbnb_data, ['12367758'])
get_all_longitudes() - Function to get all longitudes given a list of listing_ids (2 marks)
Input: airbnb_data as data, a list of listing_ids
Return: A list of longitudes
#Write code below
def get_all_longitude(data, listing):
    longitude_list = []
    # iterate the data passed in, not the global airbnb_data
    for row in data:
        longitude = row["longitude"]
        for item in listing:
            # compare with ==; `in` between two strings would also match partial IDs
            if item == row['listing_id']:
                longitude_list.append(longitude)
    return longitude_list
# 2 / 2
Tester Cell - To test the above function to see if it's working.
Expected Output:
[103.857933]
get_all_longitude(airbnb_data, ['12367758'])
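The two functions differ only in the field they read, so they could share one helper. The sketch below assumes a hypothetical get_all_values function (not part of the assignment) and uses a set so that listing IDs are matched exactly. The first sample row reuses the coordinates from the tester cells; the second row is made up.

```python
def get_all_values(data, listing_ids, field):
    """Return `field` for every row whose listing_id is in `listing_ids`."""
    wanted = set(listing_ids)  # set membership is an exact match on the whole ID
    return [row[field] for row in data if row['listing_id'] in wanted]

sample_listings = [
    {'listing_id': '12367758', 'latitude': 1.311147, 'longitude': 103.857933},
    {'listing_id': '123', 'latitude': 1.3, 'longitude': 103.8},  # made-up row
]
print(get_all_values(sample_listings, ['12367758'], 'latitude'))   # [1.311147]
print(get_all_values(sample_listings, ['12367758'], 'longitude'))  # [103.857933]
```

Looking up IDs in a set also avoids the inner loop over the listing IDs for every row, which helps once the list of IDs grows.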
listings_recommender() - Function to recommend all listings based on a given price, satisfaction score and neighborhood (2 marks)
Note:
- Less than or equal to that price
- Greater than or equal to that overall satisfaction score
- In that neighborhood
Input: airbnb_data as data, price, overall_satisfaction, neighborhood_id
Return: A list of listing_ids
#Write code below
def listings_recommender(data, price, overall_satisfaction, neighborhood_id):
    list_of_listings = []
    for row in data:
        data_price = row['price']
        data_satisfaction = row['overall_satisfaction']
        data_neighborhood = row['neighborhood']
        data_listing = row['listing_id']
        # keep listings in the neighborhood that are cheap enough and rated highly enough
        if neighborhood_id == data_neighborhood:
            if data_price <= price and data_satisfaction >= overall_satisfaction:
                list_of_listings.append(data_listing)
    return list_of_listings
# 2 / 2
Tester Cell - To test the above function to see if it's working.
Expected Output:
['10350448',
'13507262',
'13642646',
'15099645',
'6451493',
'4696031',
'2898794',
'13181050',
'9022211',
'5200263',
'6529707',
'14433262']
listings_recommender(airbnb_data, 60, 5, 'MK03')
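To see the three filters at work without the full dataset, here is a list-comprehension sketch run on four made-up rows. The recommend_listings name and the sample data are assumptions for illustration only.

```python
def recommend_listings(data, price, overall_satisfaction, neighborhood_id):
    """List-comprehension sketch of the same three filters."""
    return [
        row['listing_id']
        for row in data
        if row['neighborhood'] == neighborhood_id
        and row['price'] <= price
        and row['overall_satisfaction'] >= overall_satisfaction
    ]

# Hypothetical rows, not taken from the project data
sample_listings = [
    {'listing_id': 'a', 'neighborhood': 'MK03', 'price': 55.0, 'overall_satisfaction': 5.0},
    {'listing_id': 'b', 'neighborhood': 'MK03', 'price': 80.0, 'overall_satisfaction': 5.0},
    {'listing_id': 'c', 'neighborhood': 'MK03', 'price': 40.0, 'overall_satisfaction': 4.5},
    {'listing_id': 'd', 'neighborhood': 'TS17', 'price': 50.0, 'overall_satisfaction': 5.0},
]
# 'b' is too expensive, 'c' is rated too low, 'd' is in another neighborhood
print(recommend_listings(sample_listings, 60, 5, 'MK03'))  # ['a']
```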
price_recommender() - Function to recommend a price in a neighborhood based on average price and overall satisfaction (2 marks)
For this function, we want to build a simple price recommendation function that will give a potential host a suggested price.
To build this, these are the requirements:
- Take all listings in that neighborhood and check for listings with at least 1 review and an overall satisfaction score of 4 or more.
- From those filtered listings, calculate the average price and return that as the suggested price rounded to 2 decimal places.
Input: airbnb_data as data, a neighborhood_id
Return: A float of recommended price
#Write code below
def price_recommender(data, neighborhood_id):
    price_dictionary = {}
    for row in data:
        data_price = row['price']
        data_satisfaction = row['overall_satisfaction']
        data_neighborhood = row['neighborhood']
        data_review = row['reviews']
        if neighborhood_id == data_neighborhood:
            # only reviewed listings with a score of 4 or more count towards the average
            if data_review != 0 and data_satisfaction >= 4.0:
                if neighborhood_id not in price_dictionary:
                    price_dictionary[neighborhood_id] = [data_price]
                else:
                    price_dictionary[neighborhood_id].append(data_price)
    for neighborhood, price in price_dictionary.items():
        average = sum(price) / len(price)
        return round(average, 2)
# 2 / 2
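Because only one neighborhood is ever examined, the dictionary is not strictly needed. Below is a sketch of the same rule with a plain list, which also makes explicit what happens when no listing qualifies. The recommend_price name, the None return value and the sample rows are my own choices, not part of the assignment.

```python
def recommend_price(data, neighborhood_id):
    """Average price of reviewed listings scoring 4 or more; None if there are none."""
    prices = [
        row['price']
        for row in data
        if row['neighborhood'] == neighborhood_id
        and row['reviews'] >= 1
        and row['overall_satisfaction'] >= 4
    ]
    if not prices:
        return None  # nothing to base a recommendation on
    return round(sum(prices) / len(prices), 2)

# Hypothetical rows, not taken from the project data
sample_listings = [
    {'neighborhood': 'TS17', 'reviews': 4.0, 'overall_satisfaction': 4.5, 'price': 60.0},
    {'neighborhood': 'TS17', 'reviews': 2.0, 'overall_satisfaction': 5.0, 'price': 75.0},
    {'neighborhood': 'TS17', 'reviews': 0.0, 'overall_satisfaction': 0.0, 'price': 300.0},
    {'neighborhood': 'TS17', 'reviews': 8.0, 'overall_satisfaction': 3.0, 'price': 20.0},
]
print(recommend_price(sample_listings, 'TS17'))  # 67.5
print(recommend_price(sample_listings, 'MK03'))  # None
```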
visualise_listings() - Function to geographically visualise a given list of listings (2 marks)
Input: airbnb_data as data, a list of listing_ids
Output: Visualisation of the locations of the listings (nothing to return)
# Remember to import the relevant library/libraries!
import matplotlib.pyplot as plt
import mplleaflet
#Write code below
def visualise_listings(data, listing):
    # plot each listing as a blue square, then open the result on an interactive map
    plt.plot(get_all_longitude(data, listing), get_all_latitude(data, listing), 'bs')
    mplleaflet.show()
# 2 / 2
Tester Cell - To test the above function to see if it's working.
Expected Output: A visualisation should appear as a new tab in your browser. The listing is between Kitchener Road and Somme Road.
visualise_listings(airbnb_data, ['12367758'])
More functions of your own if you want... (no bonus marks given :'') )
Testing
Here, we will test if your functions are working as they are supposed to.
User - An Airbnb Host
Imagine now you're an Airbnb host and you are going to use the app you've developed to ask for a recommended price to list your place.
Based on your assigned neighborhood, what is the recommended price for your neighborhood? (15 marks)
Expected output: 66.28
neighborhood_to_test = 'TS17'
#Write code below
price_recommender(airbnb_data, neighborhood_to_test)
# 15 / 15
User - An Airbnb Guest
Imagine now you're an Airbnb guest and you are going to use the app to find a list of listings you want based on your search filter/restrictions.
Based on your assigned price, overall_satisfaction and neighborhood, use the functions created above to find the matching listings and plot them on a map (35 marks)
Expected output: Visualisation should show the listings are in the Boon Keng / Farrer Park areas
If it's working, a new tab will pop out. This is normal.
neighborhood_to_test = 'TS17'
price_to_test = 100
overall_satisfaction_to_test = 4
#Write code below
listing_visual = listings_recommender(airbnb_data, price_to_test, overall_satisfaction_to_test, neighborhood_to_test)
get_all_latitude(airbnb_data, listing_visual)
get_all_longitude(airbnb_data, listing_visual)
visualise_listings(airbnb_data, listing_visual)
# 35 / 35