What happens when a mechanical engineer leaves machines behind for machine learning? The engine’s start and stop buttons become inputs and outputs. Shifting gears turns into writing lines of code. The infotainment system transforms into a command-line interface. What once involved nuts and bolts now revolves around neural networks and algorithms.
A mechanical engineer’s path to computer vision and YOLOv8 is all about discovering how real-world problems are solved through digital solutions. Let’s see how I went from being a passionate learner with a mechanical engineering background to working with artificial intelligence and YOLOv8.
What’s in the article: You are about to see my quest to learn Python, computer vision, YOLOv8, and their real-world applications. I’ll explain my learning process, challenges, aha moments, and my love-hate relationship with coding.
Getting to Know Computer Vision – The First Step
Computer vision, a branch of AI, enables machines to “digitally see” the real world through images and videos. In mechanical terms, computer vision is like giving cars human eyes instead of headlights, enabling them to drive autonomously. Computer vision techniques like image recognition and object detection are gaining significant attention in the mechanical and automobile industries. Almost all car manufacturers are constantly looking for ways to improve the performance of their cars using AI. For example, modern-day Volkswagen cars are integrated with ChatGPT to enhance the in-car experience and provide personal assistance.
A computer vision model is a mathematical representation of the real world. The model takes images or videos as input and outputs information about the objects it detects. Though there are many models available for computer vision, YOLO is one of the most popular and user-friendly options. If computer vision is about enabling machines to see the real world, then YOLO is the magnifying glass that allows these machines to detect specific objects accurately. Seeing the advancements of AI in the mechanical industry, professionals from mechanical, automobile, and other similar backgrounds are transitioning to computer vision.
Learning Python – The Second Step
Python is one of the easiest and most useful programming languages, with a huge ecosystem of libraries. After some quick research, I found out that Python is the preferred language for computer vision applications due to its vast integration capabilities. From a mechanical engineer’s point of view, learning Python from scratch is more of a love-hate relationship. It’s like trying to set speed limits on an F1 racetrack. You don’t know how and when to drive and stop. But once you get used to it, the frustrations turn into gratification.
The first snippet of code that made me smile was a simple weight converter. The problem statement was to obtain a weight and its unit from the user, then display the weight in the alternate unit. Though the problem statement was simple and easy to solve, it was quite challenging for me to understand what happens in the background and how input from the user is processed. Once I executed the snippet without any errors and got the desired output, the feeling was pure bliss.
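The exact snippet isn’t reproduced in this article, so here is a minimal sketch of what such a converter could look like. The function names, the accepted unit spellings, and the choice of kilograms and pounds are my assumptions for illustration.

```python
KG_PER_LB = 0.45359237  # one international pound expressed in kilograms


def convert_weight(weight: float, unit: str) -> tuple[float, str]:
    """Convert kilograms to pounds, or pounds to kilograms."""
    unit = unit.strip().lower()
    if unit == "kg":
        return weight / KG_PER_LB, "lb"
    if unit in ("lb", "lbs"):
        return weight * KG_PER_LB, "kg"
    raise ValueError(f"Unknown unit: {unit!r}")


def ask_and_convert() -> None:
    """Read a weight and its unit from the user, then print the conversion."""
    weight = float(input("Weight: "))  # input() returns text, so convert it to a number
    unit = input("Unit (kg or lb): ")
    converted, new_unit = convert_weight(weight, unit)
    print(f"{weight:g} {unit.strip()} is {converted:.2f} {new_unit}")


# Non-interactive check; call ask_and_convert() to try it with typed input.
converted, new_unit = convert_weight(70, "kg")
print(f"70 kg is {converted:.2f} {new_unit}")
```

Keeping the conversion in its own function, separate from the `input()` calls, makes the arithmetic easy to check without typing anything.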
One aspect of Python that truly left me awestruck is its use of indentation. Back in my school days, I used indentation merely to make my writing look neat and readable. But in Python, indentation is an essential headache. It is more like a traffic signal that guides the code where to go, what to process, and when to stop.
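To make the traffic-signal idea concrete, here is a small sketch. The function name and the speed values are made up for illustration. Each indentation level decides which lines belong to which branch.

```python
def describe_speed(speed_kmh: float, limit_kmh: float = 80) -> str:
    if speed_kmh > limit_kmh:
        # Indented under the `if`: runs only when the condition is true.
        message = "Slow down"
    else:
        # Indented under the `else`: runs only when the condition is false.
        message = "Within the limit"
    # Back at the function's own indentation level: runs in both cases.
    return f"{speed_kmh} km/h: {message}"


print(describe_speed(95))
print(describe_speed(60))
```

Moving the `return` line one level deeper would attach it to the `else` branch only, and the function would then return nothing for speeds above the limit.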
Resources used:
- A YouTube video on Python for Beginners by Mosh Hamedani
- A Medium article on Python for AI by Shaw Talebi
- A YouTube video on Python for AI by Shaw Talebi
This experience reminds me of Nolan Bushnell’s quote, “A good game is easy to learn but hard to master.”
Diving into YOLO – The Third Step
When I first heard about YOLO, I thought of the internet slang “You Only Live Once.” But that is not the case with the YOLO in YOLOv8, which stands for You Only Look Once. YOLO is an object-detection model used for real-time applications like surveillance cameras and self-driving cars. It is a popular computer vision model that offers high accuracy and speed when compared to earlier detectors like R-CNN, Fast R-CNN, and Faster R-CNN.
Learning something new is always exciting, and YOLO is no exception. Being an easy-to-learn model, it gave me that sense of excitement and the love-at-first-sight vibe. But the love soon turned to hate when I encountered errors during execution. Familiarizing myself with technical terms like IoU, Precision, Recall, and Confusion Matrix was quite challenging. But once I understood them, it became a no-brainer. In simpler terms, YOLO is your Sherlock Holmes, observing the scene rather than just seeing it.
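What finally made these terms click for me was working them out by hand. The sketch below uses the standard definitions; the `(x1, y1, x2, y2)` box format and the example numbers are assumptions for illustration. IoU measures how much a predicted box overlaps the true box, while precision and recall summarize the counts of correct, wrong, and missed detections.

```python
def iou(box_a, box_b):
    """Intersection over Union for two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle, if there is one.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    intersection = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def precision_recall(tp, fp, fn):
    """Precision and recall from true positives, false positives, false negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # how many detections were right
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # how many real objects were found
    return precision, recall


predicted = (0, 0, 10, 10)
ground_truth = (5, 5, 15, 15)
print(f"IoU: {iou(predicted, ground_truth):.3f}")

precision, recall = precision_recall(tp=8, fp=2, fn=4)
print(f"Precision: {precision:.2f}, Recall: {recall:.2f}")
```

A detection is usually counted as a true positive only when its IoU with a ground-truth box passes a chosen threshold, which is how IoU feeds into precision and recall.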
Resources used:
- A blog on What is YOLO and its Models by Roboflow
- A YouTube video on What is YOLO algorithm by Louis-François Bouchard
- A blog on YOLO architecture by V7labs
- A blog on What is Mean Average Precision by V7labs
- A blog on real-world working YOLO model by Analytics Vidhya
- A document on different modes in YOLO by Ultralytics
The Leap to YOLOv8 – The Biggest & Final Step
Similar to iPhone and Android upgrades, YOLO has gone through many versions, from YOLOv1 to YOLOv10 (as of September 2024). The differences and upgrades from one version to the next are minimal yet effective in performance. If you compare YOLOv1 and v2, the difference is not that substantial. But when you compare them with the latest models (v8 or v9), the difference is huge with respect to performance and architecture.
It took me around 5 days to fully understand the theory part of Python. Once I got comfortable with the theory, I shifted to writing code. For guidance, I followed the Ultralytics YouTube video on training YOLOv8 models with custom datasets. To challenge myself a bit, I decided not to use the dataset from the video. Instead, I opted for a custom dataset from Roboflow – the Raccoon Dataset.
The coding part was quite challenging, and I made a few errors while executing it. Luckily, I had a wonderful team at Scribe of AI to help me out. I used Google Colab to write and run the computer vision project. The process involved installing the Ultralytics package and loading the YOLOv8 model in Python. Then, I added the custom dataset from Roboflow. After plenty of trial and error, I successfully trained my YOLO model to detect raccoons accurately. I hope Rocket from Guardians of the Galaxy won’t be offended.
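For readers who want to try something similar, the steps above can be sketched roughly as follows. This is not my exact notebook. It assumes the Ultralytics package has been installed with `pip install ultralytics` and that the dataset was exported in YOLOv8 format, which includes a `data.yaml` file. The function name, file paths, model size, and epoch count are all hypothetical choices for illustration.

```python
def train_raccoon_detector(data_yaml: str, epochs: int = 25, imgsz: int = 640):
    """Fine-tune a pretrained YOLOv8 model on a custom dataset (sketch)."""
    # Imported inside the function so the file loads even before
    # `pip install ultralytics` has been run.
    from ultralytics import YOLO

    # "yolov8n.pt" is the smallest pretrained YOLOv8 checkpoint;
    # it is downloaded automatically on first use.
    model = YOLO("yolov8n.pt")

    # Train on the images and labels described by the dataset's YAML file.
    model.train(data=data_yaml, epochs=epochs, imgsz=imgsz)

    # Evaluate on the validation split defined in the same YAML file.
    metrics = model.val()
    return model, metrics


# Example usage (the paths are placeholders, not the ones I used):
# model, metrics = train_raccoon_detector("raccoon/data.yaml")
# results = model.predict("raccoon_test.jpg", conf=0.5)
```

In Colab, switching the runtime to a GPU before training makes a noticeable difference in how long each epoch takes.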
The resources that I used to train my model:
- https://docs.ultralytics.com/quickstart
- https://www.youtube.com/watch?v=LNwODJXcvt4
- https://learnopencv.com/train-yolov8-on-custom-dataset/
- https://docs.ultralytics.com/datasets/
- https://docs.ultralytics.com/modes/predict/
How My Mechanical Engineering Skills Helped Me
I firmly believe in Norton Juster’s words: “What you learn today, for no reason at all, will help you discover all the wonderful secrets of tomorrow.” I won’t say my mechanical background significantly helped me learn computer vision, but it greatly influenced my approach to solving real-world problems. For instance, when working with the custom raccoon dataset, my mechanical engineering mindset prompted me to ensure that the dataset included raccoons from different angles: left side, right side, top view, and bottom view. Training the model on different perspectives is necessary for accurate object detection, just as I used to incorporate different orientations in my CAD diagrams.
Moreover, I believe that a diverse team is essential for developing impactful AI solutions. A team made up only of AI experts might overlook the nuances that are crucial for real-world applications. Adding non-technical perspectives ensures that AI-generated solutions are not only effective but also understandable to a wider audience. This reminds me of the Confucius quote: “The superior man thinks always of virtue; the common man thinks of comfort.”
How Non-Developers Can Learn AI and Computer Vision
Computer vision is a vast field, and it is not exclusively for those with a computer science background. With the right mentor and resources, anyone can master computer vision. Platforms like YouTube and Reddit offer tons of valuable resources to self-learn computer vision and its applications. What sparked my curiosity was the idea of teaching a machine to ‘see’ what humans see and understand visual information in real time. That’s how computer vision became my new passion.
If you are interested in learning computer vision, I’d suggest following the same approach I took. Start with the theory and build a strong foundation. Master Python and its essential libraries. Dive into the technical aspects of computer vision techniques like object detection, segmentation, and classification. Start with simple projects like recreating YouTube tutorials to acquire practical knowledge. Stay up to date with the latest trends in AI and computer vision. And, most importantly, be consistent.
Moving Forward
I never imagined I would become an AI developer, or a computer vision engineer, or whatever the exact title is. I started my professional career as an SEO Content Writer, switched to Technical Writer, and am now learning computer vision. It has been a fantastic journey with a lot of ups and downs.
My personal journey serves as a testament to the power of dedication and a willingness to learn. It shows that anyone, regardless of their educational background, can become a valuable member of the AI community.
I’m grateful to the Scribe of AI team for recognizing the importance of diverse perspectives and valuing creative and problem-solving skills over academic credentials. If you are from a mechanical background and considering a similar transition or are passionate about learning computer vision, I encourage you to share your experiences in the comments section.
Here’s the exciting part: we are always looking for passionate candidates to join our Scribe of AI team. Regardless of your background, we welcome those who are excited about learning and updating their career in the field of AI and computer vision.
This article was contributed to the Scribe of AI blog by Mehavannen MP.
At Scribe of AI, we spend day in and day out creating content to push traffic to your AI company’s website and educate your audience on all things AI. This is a space for our writers to have a little creative freedom and show off their personalities. If you would like to see what we do during our 9 to 5, please check out our services.