


PicHunter

NEC Research Institute, Princeton, NJ, USA.




The current version of the system uses color histograms and color spatial distribution, along with hidden textual annotations. Besides a 64-bin HSV histogram, two other vectors describe the color content of an image: a 256-length HSV color autocorrelogram (CORR) and a 128-length RGB color-coherence vector (CCV). The first 64 components of CORR give, for each color in the 64-quantized HSV color space, the number of pixels of that color having neighbors of the same color at distance 1; the rest of the vector is defined in the same way for distances 3, 5 and 7. CCV consists of 64 coherence pairs, each pair giving the number of coherent and incoherent pixels of a particular discretized color in the RGB space. To classify pixels into one of the two categories, the image is first blurred slightly by replacing each pixel value with the average value of the pixels in its $3\times3$ neighborhood. The pixels are then grouped into connected components, and a pixel is classified as coherent if the connected component it belongs to is larger than a fixed threshold. Keywords, selected from a set of 138 words, are associated with each image during database population and are represented as a boolean vector.
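The coherence classification described above can be sketched as follows. This is a minimal illustration, not the system's implementation: it assumes pixels are already blurred and quantized to color indices, uses 4-connectivity, and the threshold value is a free parameter.

```python
from collections import deque

def coherence_vector(image, n_colors, tau):
    """image: 2-D list of quantized color indices in [0, n_colors).
    Returns one (coherent, incoherent) pixel-count pair per color."""
    h, w = len(image), len(image[0])
    seen = [[False] * w for _ in range(h)]
    pairs = [[0, 0] for _ in range(n_colors)]
    for sy in range(h):
        for sx in range(w):
            if seen[sy][sx]:
                continue
            color = image[sy][sx]
            # flood-fill the connected component of same-colored pixels
            comp, queue = [], deque([(sy, sx)])
            seen[sy][sx] = True
            while queue:
                y, x = queue.popleft()
                comp.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and not seen[ny][nx] \
                            and image[ny][nx] == color:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # a pixel is coherent iff its component exceeds the threshold
            if len(comp) > tau:
                pairs[color][0] += len(comp)
            else:
                pairs[color][1] += len(comp)
    return [tuple(p) for p in pairs]
```

All pixels of one component are counted on the same side of the threshold, which is what makes the pair decomposition sum to the plain color histogram.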


Retrieval is done with query by example.


The distance between individual features (color vectors, or annotation lists represented as binary vectors) is the L1 distance. These distances are scaled and combined into a global distance. The scaling factors are computed by maximizing the likelihood of a training set.
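The combination step amounts to a weighted sum of per-feature L1 distances; a minimal sketch, assuming the scaling factors have already been fit on the training set:

```python
def l1(u, v):
    """L1 (city-block) distance between two feature vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))

def global_distance(feats_a, feats_b, weights):
    """Combine per-feature L1 distances into one global distance.
    feats_a, feats_b: parallel lists of feature vectors (histogram,
    CORR, CCV, keyword vector); weights: one scale factor per feature."""
    return sum(w * l1(u, v) for w, u, v in zip(weights, feats_a, feats_b))
```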

Relevance feedback

PicHunter implements a probabilistic relevance feedback mechanism that tries to predict the target image the user wants based on his actions (the images he selects as similar to the target in each iteration of a query session). A vector retains each image's probability of being the target. This vector is updated at each iteration of the relevance feedback, based on the history of the session (the images displayed by the system and the user's actions in previous iterations). The update formula is based on Bayes' rule. If the n database images are denoted Tj, $j=1,\ldots,n$, and the history of the session through iteration t is denoted $H_t=\{D_1, A_1, D_2, A_2,\ldots,D_t, A_t\}$, with Dj and Aj being the images displayed by the system and, respectively, the action taken by the user at iteration j, then the iterative update of the probability estimate of an image Ti being the target, given the history Ht, is:

\begin{displaymath}
P(T=T_i\vert H_t)=P(T=T_i\vert D_t,A_t,H_{t-1})=
\frac{ P(A_t\vert T=T_i,D_t,H_{t-1})\,P(T=T_i\vert H_{t-1}) }
{ \sum_{j=1}^n P(A_t\vert T=T_j,D_t,H_{t-1})\,P(T=T_j\vert H_{t-1}) }.
\end{displaymath}
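This Bayes update can be sketched in a few lines. The `likelihood` callable stands in for the user model $P(A_t\vert T=T_i,D_t,H_{t-1})$ and is an assumption here; any of the models discussed below could be plugged in.

```python
def bayes_update(prior, action, displayed, likelihood):
    """One relevance-feedback iteration.
    prior: P(T=T_i | H_{t-1}) for each database image i.
    likelihood(action, i, displayed): P(A_t | T=T_i, D_t, H_{t-1}).
    Returns the posterior P(T=T_i | H_t)."""
    unnorm = [likelihood(action, i, displayed) * p for i, p in enumerate(prior)]
    z = sum(unnorm)  # denominator: sum over all candidate targets
    return [u / z for u in unnorm]
```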

In computing the probability that the user takes a certain action At given the history so far and the fact that the target is indeed Ti, namely P(At|T=Ti,Dt,Ht-1), a few models were tested. One approach is to estimate the probability that the user picks an image Xa from the displayed set $X_1,\ldots,X_{n_t}$ by

\begin{displaymath}
p_{softmin}(A=a\vert X_1,\ldots,X_{n_t},T)=
\frac{ \exp(-d(X_a,T)/\sigma)}{\sum_{i=1}^{n_t} \exp(-d(X_i,T)/\sigma)},
\end{displaymath}
and, in the case of choosing any number of images, to assume that each image is selected independently according to $p_{softmin}$.
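The soft-min formula is direct to implement; a sketch, where `d` is any image distance (e.g. the global distance above) and `sigma` a scale parameter whose value is not specified here:

```python
import math

def p_softmin(a, displayed, target, d, sigma=1.0):
    """Probability that the user picks displayed[a] as closest to target,
    under the soft-min model: closer images get exponentially more mass."""
    scores = [math.exp(-d(x, target) / sigma) for x in displayed]
    return scores[a] / sum(scores)
```

Note that as sigma grows the choice tends to uniform over the displayed images, and as it shrinks the model approaches a hard arg-min, which is what makes it a smoothed "pick the closest image" assumption.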

Result presentation

While older versions displayed the nt images with the highest probability of being the target, in the newer version the images selected for display are those that minimize the expected number of future iterations, estimated via entropy.
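One way to read this criterion is: pick the display set whose expected posterior entropy, averaged over the user's possible actions, is smallest. The brute-force sketch below illustrates the idea only; it enumerates all size-k subsets, which is not how a practical system would search, and `likelihood` is again an assumed user model.

```python
import math
from itertools import combinations

def entropy(p):
    """Shannon entropy (nats) of a probability vector."""
    return -sum(q * math.log(q) for q in p if q > 0)

def pick_display(prior, k, likelihood):
    """Choose the k-image display set minimizing expected posterior entropy.
    prior: current P(T=T_i); likelihood(a, i, display): P(A=a | T=i, D)."""
    n = len(prior)
    best, best_h = None, float("inf")
    for display in combinations(range(n), k):
        h = 0.0
        for a in range(k):  # expectation over the user's possible actions
            unnorm = [likelihood(a, i, display) * prior[i] for i in range(n)]
            pa = sum(unnorm)  # marginal probability of action a
            if pa > 0:
                h += pa * entropy([u / pa for u in unnorm])
        if h < best_h:
            best, best_h = display, h
    return best
```

Intuitively, a low-entropy posterior means the user's answer will concentrate the target distribution, so fewer iterations should remain.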


The system was tested on a database gathering images from 45 CDs of Corel stock photographs.

Remco Veltkamp