
The Green Canvas

This project (source code) studied the criteria used to value art, with a specific focus on paintings. We were interested in quantifying aesthetics, an extremely subjective and qualitative feature, and in exploring the middle ground between artistic evaluation and statistics. How do we evaluate paintings? Are there any interesting relationships between price and pixels? Taking into account other factors such as the artist's prominence, the project attempts to answer questions such as: Do paintings with a particular level of contrast tend to be valued higher than those with high saturation? Do paintings with sharp edges, or on bigger canvases, tend to sell for less? The focus is on contemporary paintings, as a twist on the exaggerated prices some of these pieces have reached, especially when many are comparable to children's artwork. While the art market itself is responsible for fluctuations and trends, aesthetics plays an important role in the valuation of art pieces.

Image processing is used to extract pixel information and attempt to correlate it with the pieces' current market value. The plan is therefore to:
A. Use data to explore trends in painting valuation. How does information about the painting, even at the pixel level, influence its valuation?
B. Develop a machine learning framework that can predict pricing. Can we develop an algorithm that could replace expert art consultants and be used by, say, auction houses?

Data was collected from two sources:
A. Internet: Scrape information from websites (example). A Pandas dataframe is constructed containing the columns: Artist's Name, Year, Country, Painting Name, Style, Material, Size, Markings, and Price with time sold.
B. Images of the paintings: Use the OpenCV and PIL libraries to add features to the dataframe, such as: face detection (which tells us whether the painting is a portrait or not), top dominant colors, large/small areas of solid color, edges, contrast, brightness, hue, saturation…

The three design stages were:
A. Explore trends by grouping by artist, location, material and so on. Which material or style costs the most? Are there interesting discoveries, such as which painting size is the most expensive per unit area? These are based on pandas dataframe manipulation and grouping. We also attempt to reduce the dimensionality of the data and explore the principal components.
B. After pixel information is added to the dataframe, repeat stage A to explore additional trends.
C. Build an ML model that is able to predict value. Explore the different ML techniques learned in class and determine which is most appropriate for our application. Which feature has the most influence on the valuation?
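As an illustration of the pixel-level features mentioned above (brightness, contrast, saturation, and the ratio of unique colors), here is a minimal sketch. It is not the project's actual code: the function name `pixel_features`, the Rec. 601 grayscale weights, and the exact feature definitions are assumptions made for this example, and it uses plain NumPy on an image that has already been loaded as an RGB array (for instance with PIL or OpenCV).

```python
import numpy as np

def pixel_features(rgb: np.ndarray) -> dict:
    """Compute simple global features for an H x W x 3 uint8 RGB image.

    Hypothetical helper for illustration; the feature definitions are
    assumptions, not the project's documented method.
    """
    pixels = rgb.reshape(-1, 3).astype(np.float64)

    # Grayscale intensity using Rec. 601 luma weights (an assumed choice).
    gray = pixels @ np.array([0.299, 0.587, 0.114])

    # HSV-style saturation: (max - min) / max, defined as 0 for black pixels.
    cmax = pixels.max(axis=1)
    cmin = pixels.min(axis=1)
    sat = np.where(cmax > 0, (cmax - cmin) / np.maximum(cmax, 1.0), 0.0)

    # Number of distinct RGB triplets relative to the number of pixels.
    n_unique = len(np.unique(rgb.reshape(-1, 3), axis=0))

    return {
        "brightness": float(gray.mean() / 255.0),     # 0 = black, 1 = white
        "contrast": float(gray.std() / 255.0),        # spread of intensities
        "saturation": float(sat.mean()),              # 0 = gray, 1 = vivid
        "unique_color_ratio": n_unique / pixels.shape[0],
    }
```

Each returned value could then be stored as one column of the dataframe, one row per painting, so that the grouping and modeling stages can treat pixel features like any other attribute.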

We analyzed 35,407 paintings with a total valuation of $9,366,754,845. Prices ranged from a minimum of $3 to a maximum of $119,922,500, with an average of $264,545. Some of the trends explored included:
1. Paintings produced in the 1960s recorded the highest sales. This coincides with the many artistic impulses that began to gain momentum during that period, including the explosion of consumerism and popular culture.
2. Paintings with whites, grays and blacks as dominant colors are more likely to have high sale values than paintings dominated by more saturated colors.
3. Paintings in which a low percentage of corners is detected are also more likely to have high sale values.
4. Auctions of valuable pieces tend to coincide with successful exhibitions.

In an attempt to develop a machine learning platform for pricing artwork, we created a linear regression model fitted specifically to paintings by the Spanish painter Pablo Picasso. We used a set of 4,000 paintings for training and another set of equal size for testing. Our model reached a prediction score of 0.32. Applying a single log transform to the price gave the best results. We also built separate regression models that each used a single parameter as the predictor. We noticed that the ratio of unique colors alone produced a relatively high correlation of 0.46 between predicted and actual prices.
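The single-predictor approach described above can be sketched as follows: fit a straight line to the log of the price using one feature, predict on a held-out set, and measure the correlation between predicted and actual prices. This is only a sketch under stated assumptions. The data is synthetic, the function names are hypothetical, and Pearson correlation is an assumed choice, since the write-up does not say how its scores were computed.

```python
import numpy as np

def fit_log_price(feature, price):
    """Least-squares fit of log(price) = slope * feature + intercept."""
    x = np.asarray(feature, dtype=float)
    y = np.log(np.asarray(price, dtype=float))
    slope, intercept = np.polyfit(x, y, deg=1)
    return float(slope), float(intercept)

def predict_price(feature, slope, intercept):
    """Map the fitted log-price line back to the price scale."""
    return np.exp(slope * np.asarray(feature, dtype=float) + intercept)

def pearson(a, b):
    """Pearson correlation between two 1-D sequences."""
    return float(np.corrcoef(a, b)[0, 1])

def holdout_correlation(seed=0, n=200):
    """Train on the first half, test on the second half of SYNTHETIC data.

    The feature stands in for something like the unique-color ratio;
    the generating coefficients below are made up for illustration.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, size=n)
    price = np.exp(8.0 + 3.0 * x + rng.normal(0.0, 0.5, size=n))

    half = n // 2
    slope, intercept = fit_log_price(x[:half], price[:half])
    predicted = predict_price(x[half:], slope, intercept)
    return pearson(predicted, price[half:])

if __name__ == "__main__":
    print(f"held-out correlation: {holdout_correlation():.2f}")
```

Fitting on the log scale keeps a handful of very expensive works from dominating the least-squares fit, which is one plausible reason a log transform helps when prices span from a few dollars to over a hundred million.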
