Reaction to Poggio & Ullman (2013) by Richard Thripp
EXP 6506 Section 0002: Fall 2015 – UCF, Dr. Joseph Schmidt
October 13, 2015 [Week 8]
Poggio and Ullman (2013) give us an intriguing overview of many reasons why computers have gotten much better at recognizing objects over the past decade, such as implementing learning via examples rather than design (p. 74), feature databases (pp. 74–75), and hierarchization and related algorithmic improvements (pp. 75–76).
I was surprised that no mention is made of the improvements in computer performance over the past decade. Jeff Preshing (2012), a Canadian computer programmer, created the following graph of the single-threaded floating-point performance of common computer processors from 1995 to 2011, showing an annual performance increase of approximately 21% from 2004 through 2011. While this increase is fairly modest compared to the 64% average annual increase he calculated from 1995 through 2003, multicore computing has also become very common since 2004. Since this graph concerns only single-threaded performance, the overall increase in processing performance over the past decade is likely at least ten-fold.
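The arithmetic behind the ten-fold estimate can be sketched in a few lines of Python. This is only a rough illustration: the annual rates come from Preshing (2012), but the number of compounding years, the core count, and the parallel efficiency are my own assumptions rather than figures from any of the cited sources.

```python
def compound_growth(annual_rate, years):
    """Total growth factor after compounding annual_rate for the given number of years."""
    return (1.0 + annual_rate) ** years


# Annual single-threaded gains reported by Preshing (2012).
RATE_1995_2003 = 0.64
RATE_2004_2011 = 0.21

# ASSUMPTION: seven compounding steps between 2004 and 2011.
YEARS_LATE = 7

# ASSUMPTION (hypothetical values, not from the sources): a typical
# four-core processor whose extra cores are used at 75% efficiency.
ASSUMED_CORES = 4
ASSUMED_PARALLEL_EFFICIENCY = 0.75

single_thread_gain = compound_growth(RATE_2004_2011, YEARS_LATE)
multicore_factor = ASSUMED_CORES * ASSUMED_PARALLEL_EFFICIENCY
combined_gain = single_thread_gain * multicore_factor

print(f"Single-threaded gain, 2004-2011: {single_thread_gain:.1f}x")
print(f"Assumed multicore factor:        {multicore_factor:.1f}x")
print(f"Combined estimate:               {combined_gain:.1f}x")
```

Under these assumptions, the single-threaded gain alone is a little under four-fold, so the ten-fold figure depends on crediting multicore processors with a further speedup of roughly three times. Image categorization workloads that parallelize well could plausibly reach that; ones that do not would fall short.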
Poggio and Ullman (2013) do not discuss CPU (central processing unit) improvements, nor the vast increases in RAM (random access memory) and GPU (graphics processing unit) speed and capacity that have occurred over the past decade. Such improvements have allowed more data to be held in memory and operated upon more quickly. The authors state that “high-performance computer vision systems require millions of labeled examples to achieve good performance,” but do not mention that the computing power required to process those millions of examples is now much cheaper and more readily available. This seems a rather large oversight.
Poggio and Ullman (2013) mention the PASCAL Visual Object Classes challenge (p. 73), whose rules do not address the speed or computing power requirements of the object categorization algorithms entered (Everingham, Gool, Williams, Winn, & Zisserman, 2012). However, the contest ran from 2005 to 2012 and notably became more complex each year. Continued progress on harder tasks does not necessarily mean that computer programmers are simply getting better; they may be aided by better equipment and tools, as well as past experience and emerging cognitive and neuropsychological research.
An image categorization algorithm may achieve incrementally improved accuracy at immense computational cost, and this is perhaps sufficient for a proof-of-concept paper or contest entry. However, deploying the algorithm in the real world, as in Facebook’s 2015 rollout of automatic tagging of people in uploaded photos (Woollaston, 2015), requires not only algorithmic optimization and compromises but also cheaper computing resources. That cheapening has occurred, yet Poggio and Ullman (2013) do not broach it. Deploying image categorization algorithms on a massive scale, as Facebook has done, may provide critical insights into improving object recognition and closing the performance gap that Poggio and Ullman (2013) have identified (pp. 77–78).
Everingham, M., Gool, L., Williams, C., Winn, J., & Zisserman, A. (2012). The PASCAL visual object classes homepage. Retrieved October 13, 2015, from http://host.robots.ox.ac.uk/pascal/VOC/
Poggio, T., & Ullman, S. (2013). Vision: Are models of object recognition catching up with the brain? Annals of the New York Academy of Sciences, 1305, 72–82. doi:10.1111/nyas.12148
Preshing, J. (2012). A look back at single-threaded CPU performance. Preshing on Programming. Retrieved October 13, 2015, from http://preshing.com/20120208/a-look-back-at-single-threaded-cpu-performance/
Woollaston, V. (2015). Facebook can tag you in photos AUTOMATICALLY: Social network starts rolling out DeepFace recognition feature. Daily Mail Online. Retrieved October 13, 2015, from http://www.dailymail.co.uk/sciencetech/article-2946186/Facebook-soon-tag-photos-AUTOMATICALLY-Social-network-starts-rolling-DeepFace-feature.html