UCIPD Crime Map

UCIPD's crime data is arranged in a multi-page table, which gave me the idea of writing a Python script that converts it into an HTML map for easy display.

Here is the final output from the table published on 1/7/22:

The Process

Parsing PDF and omitting duplicates:

UCIPD publishes the tables as a PDF, so they need to be parsed into a friendlier format before the data can be processed.

I chose to use tabula because I just needed a quick way of parsing tables from a PDF file, and it allows both a local file and a remote one to be processed easily.

Initially, I loaded the data parsed by tabula directly into a Pandas dataframe. This caused problems: because the PDF spans multiple pages, the table's header row is repeated on every page and shows up as duplicate rows in the data.

I solved this by converting the data to a CSV file first, then iterating through the rows and deleting every repeat of the header row except the first.
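
As a rough illustration of the two steps above, here is a minimal sketch. It assumes the tabula-py package (imported as `tabula`) and treats the first row of the CSV as the header; the function names `pdf_to_csv` and `drop_repeated_headers` and the file paths are hypothetical, not taken from the actual script.

```python
import csv


def pdf_to_csv(pdf_source, csv_path):
    """Dump every table in the PDF (local path or URL) into one CSV file."""
    # Assumption: tabula-py is installed. It is imported lazily because it
    # needs a Java runtime, which the cleanup step below does not.
    import tabula

    tabula.convert_into(pdf_source, csv_path, output_format="csv", pages="all")


def drop_repeated_headers(in_path, out_path):
    """Copy a CSV, keeping the first row as the header and skipping repeats.

    Returns the number of repeated header rows that were dropped.
    """
    removed = 0
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader, None)
        if header is None:
            return removed  # empty input, nothing to copy
        writer.writerow(header)
        for row in reader:
            if row == header:
                removed += 1  # header repeated at the top of a later page
                continue
            writer.writerow(row)
    return removed
```

Calling `pdf_to_csv("crime_log.pdf", "raw.csv")` followed by `drop_repeated_headers("raw.csv", "clean.csv")` would leave a CSV with a single header row. Only exact repeats of the header are removed, so genuine data rows are never touched.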

Using Google Maps API:

With the duplicates removed, the CSV file containing the table can now be read into a Pandas dataframe for easy management and processing.

Each address in the table is then converted to geo coordinates through the Google Maps API. To minimize API calls, for both cost and efficiency (each call takes a not insignificant amount of time and potentially money), duplicate addresses are skipped and the geo coordinates are cached locally in JSON for future use.
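
The caching idea can be sketched roughly as follows. This is an illustration under assumptions, not the script's actual code: the names `load_cache`, `geocode_all` and `google_geocoder` are hypothetical, and the Google lookup assumes the `googlemaps` client package. The lookup is passed in as a plain function so the caching logic can be tried without an API key.

```python
import json
import os


def load_cache(path):
    """Return the saved address -> [lat, lng] mapping, or an empty dict."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {}


def geocode_all(addresses, geocode, cache_path):
    """Look up each distinct address once, reusing and updating a JSON cache.

    Returns the full cache and the number of lookups actually performed.
    """
    cache = load_cache(cache_path)
    api_calls = 0
    for address in addresses:
        if address in cache:
            continue  # repeated in the table, or known from an earlier run
        cache[address] = geocode(address)
        api_calls += 1
    with open(cache_path, "w") as f:
        json.dump(cache, f, indent=2)
    return cache, api_calls


def google_geocoder(api_key):
    """Build a geocode(address) function backed by the googlemaps client."""
    # Assumption: the googlemaps package is installed; imported lazily so
    # the caching code above runs without it.
    import googlemaps

    client = googlemaps.Client(key=api_key)

    def geocode(address):
        results = client.geocode(address)
        if not results:
            return None  # address could not be resolved
        location = results[0]["geometry"]["location"]
        return [location["lat"], location["lng"]]

    return geocode
```

With this shape, `geocode_all(addresses, google_geocoder(API_KEY), "geocache.json")` only calls the API for addresses that are not already in the cache file, so a second run over the same table makes no calls at all.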

Generating the HTML Map:

The last step before a map can be generated is to append two new columns to the dataframe: latitude and longitude. We now have all the information needed to generate the map.

Using the folium library, an HTML file is generated and pins are plotted using the new columns of coordinates.
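
A minimal sketch of the plotting step, assuming each row is available as a dict with `latitude` and `longitude` keys. The helper names `map_center` and `build_map`, the `label` key used for the popup text, and the zoom level are all hypothetical choices for illustration.

```python
def map_center(points):
    """Average the (lat, lon) pairs so the map opens over the plotted pins."""
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return [sum(lats) / len(lats), sum(lons) / len(lons)]


def build_map(records, out_path, label_key="label"):
    """Plot one pin per record and save the result as an HTML file."""
    # Assumption: folium is installed; imported lazily so map_center can
    # be used on its own.
    import folium

    if not records:
        raise ValueError("no records to plot")
    points = [(r["latitude"], r["longitude"]) for r in records]
    crime_map = folium.Map(location=map_center(points), zoom_start=14)
    for r in records:
        folium.Marker(
            location=[r["latitude"], r["longitude"]],
            popup=str(r.get(label_key, "")),
        ).add_to(crime_map)
    crime_map.save(out_path)
```

Starting from a Pandas dataframe `df`, the records could be produced with `df.dropna(subset=["latitude", "longitude"]).to_dict("records")`, which also drops any rows whose address could not be geocoded.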