You may have heard of EfficientNet as an image classification algorithm. So why am I calling it an approach rather than an algorithm? Which one is it? Let's find out!
First things first. Why EfficientNet? In the paper's transfer learning experiments, EfficientNet models (or the approach) reached new state-of-the-art accuracy on 5 out of 8 datasets, with 9.6 times fewer parameters on average.
In the paper, the authors first study the relationship between a model's accuracy and its scaling (size). They find that performance increases with width (the number of channels in each layer), depth (the number of layers in the model) and resolution (the size of the input image).
We can see that in every case the accuracy initially increases dramatically (along with the computational cost, of course), but after a while the curves almost flatten.
In each of these cases only one dimension is changed. When depth is increased, for example, everything else (width, resolution) is kept fixed. But the performance doesn't saturate so easily if we increase all of them simultaneously, and that's the first key observation of the paper.
Intuitively, this also makes a lot of sense. For a higher-resolution image, a deeper neural network can extract more features. Similarly, a wider model can capture more fine-grained patterns from the larger number of pixels. From this we can sense that coordination and balance between the different scaling dimensions can help us achieve better performance.
Usually this balance is sought through arbitrary changes and manual tuning. But here we will take a more methodical approach. And this approach, the compound scaling method, is the second key idea of the paper.
Without getting into the mathematical details (for which I recommend reading the original paper linked below), we can say that the authors found a way to tie the three dimensions together with a single compound coefficient N: depth, width and resolution are each scaled by their own constant raised to the power N, and the constants are chosen so that the computational cost grows by roughly 2 to the power N. And that is the compound scaling method.
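If you like seeing numbers, here is a minimal sketch of that idea in Python. The three constants (1.2 for depth, 1.1 for width, 1.15 for resolution) are the ones the paper reports for the EfficientNet-B0 baseline; the function names and the range of N values are my own, chosen only for illustration. The cost estimate relies on the paper's rule of thumb that the FLOPs of a ConvNet grow roughly with depth, width squared and resolution squared.

```python
# Constants reported in the paper for the B0 baseline (found by a small grid search).
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # bases for depth, width, resolution


def compound_scale(n):
    """Return (depth, width, resolution) multipliers for compound coefficient n.

    The paper calls this coefficient phi; the function name is hypothetical.
    """
    return ALPHA ** n, BETA ** n, GAMMA ** n


def relative_flops(depth, width, resolution):
    """Rough cost multiplier: FLOPs scale with depth * width^2 * resolution^2."""
    return depth * width ** 2 * resolution ** 2


# Illustrative values of n only; not a claim about how each released model was configured.
for n in range(4):
    d, w, r = compound_scale(n)
    cost = relative_flops(d, w, r)
    print(f"n={n}: depth x{d:.2f}, width x{w:.2f}, resolution x{r:.2f}, "
          f"cost x{cost:.2f} (2**n = {2 ** n})")
```

Because 1.2 * 1.1^2 * 1.15^2 is only approximately 2, the printed cost lands a little under 2 to the power N. That is fine: the constraint in the paper is approximate too.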
By applying this concept and some nitty-gritty math, the authors first scaled up some existing, widely used ConvNets, like MobileNets and ResNets, which showed great improvement over their original performance, as well as over their manually scaled versions.
Then they set out on a journey to develop their own optimized model. Using recently developed neural architecture search methods, they first created a baseline model, EfficientNet-B0. Then they scaled it up with the compound scaling method discussed above, producing the other versions, from EfficientNet-B1 to EfficientNet-B7. And that's what did the magic!!
Here is their performance on the ImageNet dataset, compared with models of similar Top-1/Top-5 accuracy. (If you aren't sure what Top-1/Top-5 accuracy is, just take Top-1 accuracy as the regular, everyday accuracy. Besides, a little Google/YouTube search doesn't hurt.)
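For the curious, Top-k accuracy is easy to sketch in a few lines of plain Python: a prediction counts as correct if the true label is among the k classes the model scored highest. The function name and the tiny score table below are made up for illustration; they are not from the paper.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by score, highest first
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Made-up scores for 3 samples over 4 classes (illustrative only)
scores = [
    [0.1, 0.6, 0.2, 0.1],  # true class 1: the top guess is right
    [0.5, 0.1, 0.3, 0.1],  # true class 2: only the second guess is right
    [0.4, 0.3, 0.2, 0.1],  # true class 3: ranked last
]
labels = [1, 2, 3]

print("Top-1:", top_k_accuracy(scores, labels, k=1))
print("Top-2:", top_k_accuracy(scores, labels, k=2))
```

With k=1 this is the ordinary accuracy; ImageNet results usually report k=1 and k=5.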
And that's not even the whole story; the real magic is in the rightmost column. If the term FLOPs is unfamiliar, take it as the total number of addition and multiplication operations needed for the process (again, a little googling helps; you may search FLOPS vs FLOPs). It is a measure of the computational work needed. Do you notice something amazing??
Yep! So do I. The dramatic difference between the EfficientNet models and their comparable models. Think how computationally efficient these EfficientNet models really are. Are they really comparable within those groups??
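To get a feel for where those operation counts come from, here is a rough sketch that counts the multiply-add operations of a single convolution layer. It assumes a stride-1 convolution with "same" padding and ignores the bias; the function name and the layer sizes are hypothetical, picked only to show how the count reacts to width and resolution.

```python
def conv_mult_adds(height, width, in_channels, out_channels, kernel=3):
    """Multiply-add count of one stride-1, same-padded convolution (bias ignored)."""
    # Every output pixel of every output channel needs kernel*kernel*in_channels multiply-adds
    return height * width * in_channels * out_channels * kernel * kernel


base = conv_mult_adds(224, 224, 32, 32)
wider = conv_mult_adds(224, 224, 64, 64)    # double the channels (width)
larger = conv_mult_adds(448, 448, 32, 32)   # double the input resolution

print("base multiply-adds:", base)
print("2x width costs     x", wider / base)
print("2x resolution costs x", larger / base)
```

Doubling the channels or doubling the image side each multiplies the work by four, while stacking twice as many such layers only doubles it. That is why the cost estimate squares width and resolution but not depth.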
So, that's almost the whole story of one of the most powerful and scalable image classification models of recent years. A lot is left out, like the use of EfficientNet in transfer learning and the inference latency comparison with other models. But let's leave that for your own reading of the paper.
I hope you got the main idea behind this amazing approach. Did you know that our EfficientNet has a detective brother, EfficientDet!! But that's a story for a different day.
Thanks for reading this far. You can read the original paper at https://arxiv.org/pdf/1905.11946
This is actually my first blog post as well as my first paper review. So, if you have any suggestions for me, I'm always open to them. Stay safe!!!