Data Science Students Team Up With Marine Biologist
The students are using deep learning and neural networks to create an automated system that classifies plankton for large-scale oceanographic studies.
By Robert Florida
Professor Joaquim Goes studies the effects of climate change on marine biology. His research team travels by ship to different parts of the Atlantic Ocean to collect water samples. The team is currently designing an automated flow-through system in which seawater is drawn into their moving ship and continuously analyzed. This is an advance over the usual method, in which ocean researchers must stop their ships at pre-planned locations to collect samples; the flow-through system lets the ship keep moving.
The team is also gathering data on the diversity of microscopic plant life, particularly phytoplankton, which are critical to the marine ecosystem and to assessing the ocean’s ability to sequester carbon dioxide from the atmosphere. Plankton form the basis of many food chains and are an important indicator of an ocean’s health. When fully functional, the system will provide data required for validating satellite images of the ocean now being developed by NASA, NOAA and other agencies.
Last fall, as part of the Campus Connections program, which connects Data Science Institute students with professors across Columbia University, Goes teamed up with three DSI students (Ankit Peshin, Ziyao Zhang and Paridhi Singh) to develop an automated classification system for phytoplankton types. Previously, his team relied on manual methods to classify plankton images. The new system uses deep learning, specifically convolutional neural networks, to automatically classify phytoplankton types based on their shape, size and other distinct morphological features.
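For readers curious how a convolutional neural network picks up on shape, here is a minimal sketch of its basic building block: a small filter slides over an image, the responses pass through a ReLU, and max pooling shrinks the result. It is not the team's model. The toy image, the hand-set filter and the function names are all illustrative assumptions, and in a real network the filter values would be learned from labeled plankton images rather than written by hand.

```python
# Illustrative sketch only; not the DSI team's code. All names and values are hypothetical.

def conv2d(image, kernel):
    """Slide the kernel over the image (no padding) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def relu(feature_map):
    """Keep positive responses and zero out negative ones."""
    return [[max(0, v) for v in row] for row in feature_map]

def max_pool(feature_map, size=2):
    """Keep the strongest response in each non-overlapping size x size window."""
    return [
        [
            max(feature_map[r + i][c + j]
                for i in range(size) for j in range(size))
            for c in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for r in range(0, len(feature_map) - size + 1, size)
    ]

# Toy 6x6 "image": a bright vertical bar (1s) on a dark background (0s).
image = [[0, 0, 1, 1, 0, 0] for _ in range(6)]

# Hand-set filter that responds where brightness rises from left to right.
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

features = max_pool(relu(conv2d(image, kernel)))
print(features)  # [[3, 0], [3, 0]]
```

The nonzero values on the left show that the filter fired on the bar's left edge. A trained network stacks many such filters, so later layers can respond to whole outlines and textures rather than single edges.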
“Once completed, the system will be a considerable step forward in automating our flow-through system for large-scale oceanographic studies,” says Goes, a research professor at Lamont-Doherty Earth Observatory. “For us at Lamont, DSI’s Campus Connections initiative has provided an opportunity to think outside of the box. Our partnership with this team of extremely bright students will be a considerable step forward in automating our flow-through system for large-scale oceanographic studies.”
One of the students, Ankit Peshin, is using existing plankton images to train a neural network model to automatically classify the different plankton, allowing the team to better understand the biological composition of each water sample.
“With projects like these, the more data (i.e., plankton images) you have, the better,” says Peshin. “A lot of experimentation is involved to see what works, since neural networks essentially act as black boxes,” meaning their internal workings are hard to interpret.
Peshin is delighted to have the chance to help develop the image-classification method. The theoretical knowledge he learned in his DSI classes has proven especially relevant to the project, he says, adding, “I’m glad to have been connected to this opportunity through Campus Connections.”
“My post-graduation plans aren’t set in stone, but having research experience will definitely help me,” he says, “whether I pursue a career in industry or academia.”
A version of this post was originally published by Columbia’s Data Science Institute. Read more about DSI’s Campus Connections projects here.