Continuing my series on big data geoanalytics, I wanted to show how to bring in large data sets so that we can start working with them. The data set we’ll use is the NYC taxi data that includes information on pickup and dropoffs. There are about 13 million records in a 2.2GB .csv file. That is not insanely large, but it is large enough for us to start messing around with it (don’t worry, I have a few 20GB+ data sets that I am working with and will eventually show that to you as well).
This video below will walk you through the steps I took to load and prepare the NYC taxi data inside of Manifold Future. My next posts will begin to look at how we can begin interrogating the data source to find meaningful information.
I hope you enjoy the video. Please comment below – I’d love to hear what people think.