Finding a good flat that is close to your workplace and also close to, for example, your kids' kindergarten or school or your favorite park can be very difficult. Unfortunately, the existing apartment search engines in Germany, such as Immoscout, Immowelt, and Immonet, do not support computing the travel time from an apartment to a set of destinations. Here I want to show you how to use Immospider to do that.
Immospider is a Python program that crawls the Immoscout24 website. It is based on ideas from http://mfcabrera.com/data_science/2015/01/17/ichbineinberliner.html and https://github.com/balzer82/immoscraper , but it is faster and more flexible.
Immospider uses the popular Python framework https://scrapy.org/ . To install it you need Python 3. Then you can clone this repository and install the requirements via
pip3 install -r requirements.txt
This should install scrapy and the googlemaps package for you. To use Immospider you also need an API key for the Google Maps API. Follow the instructions at https://github.com/googlemaps/google-maps-services-python#api-keys to get your API key.
Let's assume you want to move to Berlin. You will work at some fancy startup near Alexanderplatz, but your partner likes to go shopping at the KaDeWe. You are searching for a flat with 2-3 rooms and more than 60 m^2 that costs no more than 1000 Euro. Enter these requirements on the Immoscout24 website and search. If you search the whole of Berlin, you will probably find more than 500 results. Next, copy the URL of your Immoscout search, because Immospider will use it. For the example given here the URL is https://www.immobilienscout24.de/Suche/S-T/Wohnung-Miete/Berlin/Berlin/-/2,50-/60,00-/EURO--1000,00 . With this information you can now start Immospider like this:
scrapy crawl immoscout -o apartments.csv -s GM_KEY=<Google Maps API Key> -a url=https://www.immobilienscout24.de/Suche/S-T/Wohnung-Miete/Berlin/Berlin/-/2,50-/60,00-/EURO--1000,00 -a dest="Alexanderplatz, Berlin" -a mode=transit -a dest2="KaDeWe, Berlin" -L INFO
The option -o apartments.csv specifies the output file. The parameter -s GM_KEY=<Google Maps API Key> sets your Google Maps API key. The argument -a dest="Alexanderplatz, Berlin" -a mode=transit tells Immospider to calculate the travel time from each apartment to Alexanderplatz using public transportation. The argument -a dest2="KaDeWe, Berlin" will additionally compute the travel time by car (the default mode) to KaDeWe. You can have up to three destinations dest1, dest2, dest3 and specify the mode for each destination with mode1, mode2, mode3; note that the example command passes the first destination as dest and mode, without a number. The argument -a url=... must hold the search URL from Immoscout. The optional parameter -L INFO sets the log level to INFO.
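If you want to start the crawl from a script instead of typing the command by hand, the argument list can be assembled in Python. The following is only a minimal sketch: build_crawl_command is a hypothetical helper that is not part of Immospider, YOUR_API_KEY is a placeholder, and the argument names (dest and mode for the first destination, dest2 and mode2 for the second) are assumed from the example command above.

```python
import shlex


def build_crawl_command(search_url, api_key, destinations, output="apartments.csv"):
    """Assemble the scrapy command line as a list of arguments.

    destinations is a list of (address, mode) pairs; pass mode=None to keep
    the default mode. Hypothetical helper, not part of Immospider.
    """
    if len(destinations) > 3:
        raise ValueError("at most three destinations are supported")
    args = ["scrapy", "crawl", "immoscout", "-o", output,
            "-s", f"GM_KEY={api_key}", "-a", f"url={search_url}"]
    for number, (address, mode) in enumerate(destinations, start=1):
        # Assumption taken from the example command: the first destination
        # has no number suffix, the following ones are dest2 and dest3.
        suffix = "" if number == 1 else str(number)
        args += ["-a", f"dest{suffix}={address}"]
        if mode is not None:
            args += ["-a", f"mode{suffix}={mode}"]
    args += ["-L", "INFO"]
    return args


command = build_crawl_command(
    "https://www.immobilienscout24.de/Suche/S-T/Wohnung-Miete/Berlin/Berlin/-/2,50-/60,00-/EURO--1000,00",
    "YOUR_API_KEY",  # placeholder, replace with your own key
    [("Alexanderplatz, Berlin", "transit"), ("KaDeWe, Berlin", None)],
)
# shlex.join quotes the arguments that contain spaces, so the line can be pasted into a shell.
print(shlex.join(command))
```

The list form can be handed directly to subprocess.run(command), which avoids shell quoting problems with the spaces in the destination names.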
If you start Immospider with the parameters given here, it might run for up to 20 minutes, not because the crawler is slow, but because the Google Maps API takes some time to compute the travel time for each of the more than 500 apartments. If that is too slow for you, narrow your search on Immoscout (and copy the new URL again) so that it returns fewer results. With a result set of about 50 apartments, Immospider needs only 1-2 minutes to compute all the travel times.
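To give an idea of what one such travel-time lookup involves, here is a rough sketch based on the Distance Matrix call of the googlemaps package. Immospider's own code is not shown in this post and may differ. The helper travel_minutes and the hand-written sample response are my own illustration, not real API output.

```python
def travel_minutes(response):
    """Return the travel time in minutes from a Distance Matrix response
    for one origin and one destination, or None if no route was found."""
    element = response["rows"][0]["elements"][0]
    if element.get("status") != "OK":
        return None
    # The API reports the duration in seconds.
    return element["duration"]["value"] / 60.0


# A live request would look roughly like this (needs a valid API key):
#   import googlemaps
#   client = googlemaps.Client(key="YOUR_API_KEY")
#   response = client.distance_matrix("Grünauer Straße 129, Berlin",
#                                     "KaDeWe, Berlin", mode="driving")

# Hand-written sample in the shape of a Distance Matrix response.
sample_response = {
    "status": "OK",
    "rows": [{"elements": [{"status": "OK",
                            "duration": {"text": "30 mins", "value": 1823}}]}],
}
no_route_response = {
    "status": "OK",
    "rows": [{"elements": [{"status": "NOT_FOUND"}]}],
}

print(travel_minutes(sample_response))
print(travel_minutes(no_route_response))
```

Each apartment needs one such request per destination, which is why the total runtime grows with the number of search results.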
After Immospider has finished it is time for some data science. In the following we will use https://jupyter.org/ to analyze the result.
import pandas as pd
df = pd.read_csv('apartments.csv')
We remove all the results without location (latitude, longitude).
df.dropna(subset=["lng", "lat"], inplace=True)
df.head(n=10)
| | city | media_count | immo_id | district | title | url | time_dest2 | time_dest3 | time_dest | rent | sqm | address | lat | contact_name | zip_code | lng | rooms |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Berlin | 13 | 91655265 | Köpenick (Köpenick) | Besichtigung am Sonntag, den 05.02. um 16:00 U... | https://www.immobilienscout24.de/expose/91655265 | 30.383333 | NaN | 43.833333 | 746 | 78,83 | Grünauer Straße 129, Köpenick (Köpenick), Berlin | 52.43238 | Herr Clemens Teske | 12557 | 13.57151 | 3 |
1 | Berlin | 10 | 92662753 | Spandau (Spandau) | Spandauer Arkaden schöne Altbauwohnung mit gro... | https://www.immobilienscout24.de/expose/92662753 | 27.116667 | NaN | 32.100000 | 530 | 70 | Pichelsdorfer Straße 139, Spandau (Spandau), B... | 52.52729 | Pierre Olbort | 13595 | 13.19551 | 3 |
2 | Berlin | 8 | 92662740 | Wedding (Wedding) | FREI AB SOFORT * SANIERT * BALKON VORHANDEN * ... | https://www.immobilienscout24.de/expose/92662740 | 25.500000 | NaN | 26.366667 | 753 | 75 | Steegerstraße 61, Wedding (Wedding), Berlin | 52.56246 | Klaudia Jantsch | 13359 | 13.39491 | 3 |
3 | Berlin | 8 | 92662699 | Wedding (Wedding) | FREI AB SOFORT * SEHR GEPFLEGT * SANIERT * WAN... | https://www.immobilienscout24.de/expose/92662699 | 25.550000 | NaN | 26.600000 | 749 | 75 | Steegerstraße 60, Wedding (Wedding), Berlin | 52.56228 | Klaudia Jantsch | 13359 | 13.39506 | 3 |
4 | Berlin | 10 | 93084855 | Spandau (Spandau) | Hoch hinaus mit toller Aussicht! | https://www.immobilienscout24.de/expose/93084855 | 25.833333 | NaN | 39.416667 | 780 | 103 | Falkenseer Chaussee 275b, Spandau (Spandau), B... | 52.54620 | Heike Rohrbach | 13583 | 13.18929 | 3 |
5 | Berlin | 2 | 92998370 | Müggelheim (Köpenick) | FREI AB MÄRZ 2017 * WANNENBAD * BALKON * RUHIG... | https://www.immobilienscout24.de/expose/92998370 | 41.533333 | NaN | 54.500000 | 561 | 70 | Philipp-Jacob-Rauch-Straße 72, Müggelheim (Köp... | 52.41571 | Klaudia Jantsch | 12559 | 13.64953 | 2,5 |
6 | Berlin | 6 | 33037112 | Marienfelde (Tempelhof) | Wohnen im Grünen Nähe Namitzer Damm | https://www.immobilienscout24.de/expose/33037112 | 32.733333 | NaN | 46.116667 | 640 | 75 | Marienfelder Allee 172a, Marienfelde (Tempelho... | 52.41102 | Frau S, Rahmlow | 12279 | 13.36040 | 3 |
7 | Berlin | 14 | 92712979 | Tiergarten (Tiergarten) | Besichtigung: Donnerstag den 02.02.17 um 17.00... | https://www.immobilienscout24.de/expose/92712979 | 9.800000 | NaN | 23.483333 | 998 | 117,41 | Berlichingenstraße 3, Tiergarten (Tiergarten),... | 52.52857 | Herr Methner | 10553 | 13.32520 | 3 |
8 | Berlin | 9 | 92869379 | Alt-Hohenschönhausen (Hohenschönhausen) | "Weiße Taube" Schöne 3 Zimmer Wohnung in ruhig... | https://www.immobilienscout24.de/expose/92869379 | 31.316667 | NaN | 34.416667 | 677 | 79,73 | Plauener Str. 89b, Alt-Hohenschönhausen (Hohen... | 52.53910 | Herr Werk | 13055 | 13.50934 | 3 |
9 | Berlin | 16 | 91448921 | Spandau (Spandau) | ++ Großzügige, lichtdurchflutete 3-Zimmer-Wohn... | https://www.immobilienscout24.de/expose/91448921 | 30.300000 | NaN | 61.700000 | 999 | 139,18 | Hakenfelder Straße 10a, Spandau (Spandau), Berlin | 52.56592 | Herr Oliver Müller | 13587 | 13.20028 | 3 |
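
Note that some values in the rent, sqm and rooms columns use a German decimal comma (for example 78,83 or 2,5), so they have to be converted before you can filter or sort by them. The following is a minimal sketch of such a conversion. parse_german_number is a hypothetical helper, and it assumes that the values contain no thousands separators.

```python
def parse_german_number(value):
    """Convert a value such as '78,83' to a float.

    Numbers that are already int or float are passed through.
    Assumption: the text contains no thousands separators.
    """
    if isinstance(value, (int, float)):
        return float(value)
    return float(value.strip().replace(",", "."))


for raw in ["78,83", "2,5", "759,59", 746]:
    print(raw, "->", parse_german_number(raw))
```

With the DataFrame from above, the helper could be applied per column, for example df["sqm"] = df["sqm"].map(parse_german_number).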
We are looking for the apartments with the lowest average travel time to our two destinations (Alexanderplatz and KaDeWe). To find them, we compute the average travel time (avg_time) for each apartment and sort the list by this value. Then we take the top 10 results.
df["avg_time"] = 0.5*(df.time_dest + df.time_dest2)
df.sort_values("avg_time", inplace=True)
top10=df.head(n=10)
top10
| | city | media_count | immo_id | district | title | url | time_dest2 | time_dest3 | time_dest | rent | sqm | address | lat | contact_name | zip_code | lng | rooms | avg_time |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
732 | Berlin | 13 | 46991344 | Mitte (Mitte) | Helle, sanierte 3-Raum Wohnung mit Ausblick au... | https://www.immobilienscout24.de/expose/46991344 | 16.516667 | NaN | 4.183333 | 770 | 70,03 | Köpenicker Str. 103, Mitte (Mitte), Berlin | 52.51087 | Frau Marta Stellmach | 10179 | 13.41647 | 3 | 10.350000 |
626 | Berlin | 10 | 91193506 | Kreuzberg (Kreuzberg) | ***Top sanierte 2,5-Zimmerwohnung mitten im Ki... | https://www.immobilienscout24.de/expose/91193506 | 14.283333 | NaN | 9.366667 | 980 | 66 | Reichenberger Straße 3, Kreuzberg (Kreuzberg),... | 52.49981 | Provisionsfrei vom Eigentümer | 10999 | 13.41527 | 2,5 | 11.825000 |
568 | Berlin | 9 | 91799446 | Kreuzberg (Kreuzberg) | *** Top sanierte 2,5-Zimmerwohnung mit Südbalk... | https://www.immobilienscout24.de/expose/91799446 | 14.283333 | NaN | 9.366667 | 965 | 64,5 | Reichenbergerstraße 3, Kreuzberg (Kreuzberg), ... | 52.49981 | Provisionsfrei vom Eigentümer | 10999 | 13.41527 | 2,5 | 11.825000 |
340 | Berlin | 7 | 93003687 | Kreuzberg (Kreuzberg) | Hell Und Sonnig In Kreuzberg! | https://www.immobilienscout24.de/expose/93003687 | 15.550000 | NaN | 8.366667 | 759,59 | 114,88 | Admiralstr. 37, Kreuzberg (Kreuzberg), Berlin | 52.49807 | Herr Robin Cramer | 10999 | 13.41752 | 4 | 11.958333 |
682 | Berlin | 3 | 88469173 | Schöneberg (Schöneberg) | Sonnige 3-Zimmer mit Balkon / sozialer Wohnung... | https://www.immobilienscout24.de/expose/88469173 | 4.983333 | NaN | 20.200000 | 592,30 | 87,95 | Schwerinstr. 18, Schöneberg (Schöneberg), Berlin | 52.49718 | Frau Ayten Hennig | 10783 | 13.35897 | 3 | 12.591667 |
31 | Berlin | 9 | 92804746 | Charlottenburg (Charlottenburg) | Schöne drei Zimmer Wohnung in Berlin, Charlott... | https://www.immobilienscout24.de/expose/92804746 | 7.400000 | NaN | 17.816667 | 940 | 89 | Bleibtreustraße 51, Charlottenburg (Charlotten... | 52.50641 | Herr Wolfgang Dr. Groß | 10623 | 13.32039 | 3,5 | 12.608333 |
350 | Berlin | 2 | 92992442 | Kreuzberg (Kreuzberg) | Familienaltbauwohnung am Heinrichplatz | https://www.immobilienscout24.de/expose/92992442 | 16.000000 | NaN | 10.083333 | 602,21 | 74,79 | Oranienstr. 30, Kreuzberg (Kreuzberg), Berlin | 52.50150 | Herr Paul Herrmann | 10999 | 13.41907 | 3 | 13.041667 |
151 | Berlin | 7 | 93067156 | Mitte (Mitte) | Helle 3-Zim. Wohnung in Berlin-Mitte plus Stel... | https://www.immobilienscout24.de/expose/93067156 | 19.766667 | NaN | 7.183333 | 849 | 74 | Kleine Alexanderstr 5-7, Mitte (Mitte), Berlin | 52.52510 | Frau T. Orth | 10178 | 13.41106 | 3 | 13.475000 |
525 | Berlin | 7 | 92399166 | Kreuzberg (Kreuzberg) | Ihr neues zu Hause in der Nähe vom Potsdamer P... | https://www.immobilienscout24.de/expose/92399166 | 8.966667 | NaN | 18.183333 | 871,97 | 79,27 | Schöneberger Str. 6, Kreuzberg (Kreuzberg), Be... | 52.50412 | Frau Diana Wilhelm | 10963 | 13.37929 | 3 | 13.575000 |
465 | Berlin | 20 | 92720306 | Kreuzberg (Kreuzberg) | BSI***MITTENDRIN UND VOLL SIXTIES* RETRO* SONN... | https://www.immobilienscout24.de/expose/92720306 | 12.066667 | NaN | 15.166667 | 780 | 67 | Gitschiner Straße 0, Kreuzberg (Kreuzberg), Be... | 52.49829 | Herr Bernd Sajdok | 10969 | 13.40204 | 3 | 13.616667 |
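
The average above is hard-coded for two destinations, and time_dest3 is NaN because no third destination was given. If you use a different number of destinations, the average has to skip the missing values. Here is a small sketch of that idea in plain Python. average_travel_time is a hypothetical helper, and the sample values are taken from the first row of the table above.

```python
import math


def average_travel_time(times):
    """Mean of the available travel times, skipping None and NaN."""
    valid = [t for t in times if t is not None and not math.isnan(t)]
    if not valid:
        return float("nan")
    return sum(valid) / len(valid)


# time_dest, time_dest2 and time_dest3 of the best apartment in the table
print(average_travel_time([4.183333, 16.516667, float("nan")]))
```

In pandas the same result can be obtained with df[["time_dest", "time_dest2", "time_dest3"]].mean(axis=1), because mean skips NaN values by default.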
For a better overview, we show the results on a map. For this we use the package folium.
# see https://nbviewer.jupyter.org/github/python-visualization/folium/blob/master/examples/Quickstart.ipynb
import folium
print(folium.__version__)
0.2.1
We create a map of Berlin and add a marker cluster to the map. Then we add our top 10 results as markers to the marker cluster. We also add an HTML popup to each result, showing the average travel time and a link to the expose.
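# Create a map centered on Berlin (named berlin_map to avoid shadowing the built-in map).
berlin_map = folium.Map(location=[52.520645, 13.409779])
# see http://deparkes.co.uk/2016/06/24/folium-marker-clusters/
# Written for folium 0.2.1; newer releases provide folium.plugins.MarkerCluster and folium.IFrame instead.
marker_cluster = folium.MarkerCluster("apartments").add_to(berlin_map)
# Number the results starting from 1.
for rank, row in enumerate(top10.itertuples(), start=1):
    html = '''{0}. <a target="_blank" href="{1}">{2}</a> <br>
    {3} <br>
    Average travel time: {4:.2f} min '''.format(rank, row.url, row.title, row.address, row.avg_time)
    # Python 3: replace non-ASCII characters (e.g. umlauts) with XML character references.
    ascii_html = html.encode("ascii", "xmlcharrefreplace").decode("ascii")
    iframe = folium.element.IFrame(html=ascii_html, width=300, height=100)
    popup = folium.Popup(iframe, max_width=300)
    folium.Marker([row.lat, row.lng], popup=popup).add_to(marker_cluster)
# see https://nbviewer.jupyter.org/github/ocefpaf/folium_notebooks/blob/master/test_fit_bounds.ipynb
berlin_map.fit_bounds(berlin_map.get_bounds())
berlin_map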
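The listing titles come straight from the website and may contain characters such as quotes or ampersands, so it can be safer to escape them before putting them into the popup markup. The following is a sketch of how the popup text could be built with the standard library only. popup_html is a hypothetical helper and the sample title is made up for illustration.

```python
import html


def popup_html(rank, url, title, address, avg_time):
    """Build the popup markup for one apartment, escaping the listing text."""
    body = (
        '{0}. <a target="_blank" href="{1}">{2}</a><br>'
        "{3}<br>"
        "Average travel time: {4:.2f} min"
    ).format(rank, html.escape(url), html.escape(title),
             html.escape(address), avg_time)
    # Replace non-ASCII characters (e.g. umlauts) with numeric character references.
    return body.encode("ascii", "xmlcharrefreplace").decode("ascii")


# Made-up title containing a quote, an ampersand and an umlaut.
example = popup_html(
    1,
    "https://www.immobilienscout24.de/expose/46991344",
    'Helle "3-Raum" Wohnung & Balkon in Köpenick',
    "Köpenicker Str. 103, Mitte (Mitte), Berlin",
    10.5,
)
print(example)
```

The returned string is plain ASCII and can be passed as the html argument of the IFrame used for the popup.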