I’ve always been fascinated by the intricate intersection of hardware and software, and even more so by how complex software algorithms run on ultra-low-power, tiny devices, completing numerous inferences in a matter of milliseconds. How small can a neural network practically get? Small enough to run under 1 milliwatt of power and within 16 KB of RAM? Researchers have long pursued bigger, more complex, yet highly accurate state-of-the-art ML models, but when it comes to practical applications, research should ultimately serve one aim: creating efficient, fast, deployable algorithms and frameworks that scale in practice. For much of my research, I’ve worked on how these computationally expensive machine learning models can be quantized to make them accessible and deployable. Computer vision holds huge prospects for embedded ML. Conventional frameworks alone might never have brought ML to NASA’s Perseverance rover, with its low power budget and limited data needs; to PhiSat-1, where Intel’s edge-AI system enables bandwidth-efficient satellite image classification; or to the Epsilon rocket, where Sony’s Spresense detects anomalous trends in satellites under low power. Exploring these embedded ML applications first-hand through research internships led me to low-power computer vision for wildlife tracking and conservation, and for remote autonomous plant phenotyping, where battery and device constraints had previously limited existing approaches. While much of embedded ML is application-driven, the unexplored questions lie in pruning, quantization, and optimization methods that compress heavy models for real-time inference, which to me is the most exciting part of embedded ML. Exploring these aspects brought me to a famous question in this field.
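The kind of quantization I mean can be sketched minimally. This is an illustrative example of my own, not tied to any particular project or framework: symmetric per-tensor int8 quantization maps each float weight to an 8-bit code plus a single shared scale, shrinking storage roughly 4x while keeping the reconstruction error bounded by half a quantization step.

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: map floats onto [-127, 127]."""
    scale = np.max(np.abs(weights)) / 127.0
    codes = np.round(weights / scale).astype(np.int8)
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate float weights from the int8 codes."""
    return codes.astype(np.float32) * scale

# Toy weight tensor standing in for a real model layer.
w = np.array([0.5, -1.27, 0.02, 1.0], dtype=np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
# Per-element error is at most scale / 2, the rounding half-step.
```

Real deployments refine this sketch with per-channel scales, zero-points for asymmetric ranges, and quantization-aware training, but the storage/accuracy trade-off is already visible here.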
While quantization of larger models is possible, can smaller neural networks be trained small from the start, making training itself computationally efficient as well? How could we reduce the unpruned architectural entropy in these “unidirectional” networks to train them efficiently at small size from the outset? There is so much randomness in these models that understanding how pruning arrives at a particular connection structure, and how to reach that same architecture through a computationally inexpensive approach, remains one of the most perplexing questions. That abundance of curiosity, and of room for research, is yet another aspect that makes this field compelling to me.
Machine learning holds some of the most interesting questions for potential research, and a few of them are rooted in emulating human cognition in the form of computational models. Humans infer many activities without conscious effort: visual perception, haptic feedback, spatial localization, all processed without consulting a set of rules each time or pausing to think. How can we replicate human perception algorithmically in systems, as a form of “embodied intelligence”? How can modalities such as visual perception be integrated efficiently through spatial computation and depth sensing of objects? How do we perceive irregular, rigid, and occluded objects, model them synthetically for navigation, and employ that information in systems to estimate and generate 3D object structures and efficiently understand perceptible environments? These are just a tiny subset of the questions that fascinate me in this domain, all connected by an interdisciplinary goal: embodied intelligence. Through my second research internship I explored how depth estimation with stereo systems and stereo matching algorithms can model perceptible environments as point clouds, but such perception systems cannot replicate the intuitive human understanding of occluded objects. Understanding how human perception learns from incomplete, occluded information could help create better generative models, perhaps a “cognitive” 3D generative adversarial network that models occluded surroundings through an intuitive understanding of its environment. While such networks are already intelligent, can we make them more perceptive through cognitive intelligence and learning from unobserved visual factors?
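At its core, the stereo depth estimation I worked with reduces to triangulation: depth Z = f·B/d for focal length f (in pixels), baseline B, and disparity d. A minimal sketch with made-up values (the `focal_px` and `baseline_m` parameters are illustrative assumptions, not numbers from any real rig):

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_m):
    """Triangulate depth from a disparity map: Z = f * B / d.

    Pixels with zero disparity correspond to points at infinity.
    """
    depth = np.full_like(disparity, np.inf, dtype=np.float32)
    valid = disparity > 0
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth

# Tiny 2x2 disparity map (pixels) from a hypothetical stereo matcher.
d = np.array([[8.0, 16.0],
              [0.0, 4.0]], dtype=np.float32)
Z = disparity_to_depth(d, focal_px=400.0, baseline_m=0.1)
# e.g. 400 * 0.1 / 8 = 5.0 m; the zero-disparity pixel maps to inf
```

Back-projecting each (u, v, Z) triple through the camera intrinsics then yields the point cloud; the hard research problems live in the stereo matching that produces the disparity map, not in this final triangulation step.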
I first developed a fascination with such systems through my earliest encounter with GANs: spatio-temporal GANs that learned better yield-prediction models for plants from raw, unhindered visual information rather than from hand-crafted feature vectors, hinting at the capability and applicability of human-like visual perception. A gradual understanding and application of human cognition in the form of embodied intelligence can yield self-sustaining algorithms that are contextually aware in space and time, and the pursuit of that understanding is what makes this field enthralling.