[ABE-L] Seminar on Statistics, Data Science and Decision - Robert Tibshirani

Hedibert Lopes hedibert at gmail.com
Fri Apr 16 13:37:06 -03 2021


*Data Science*

Academic Seminar

*Cross-validation: what does it estimate and how well does it do it?*

*Speaker:*  Robert Tibshirani <https://statweb.stanford.edu/~tibs/>
*University:* Stanford University <https://www.stanford.edu/>

Cross-validation is a widely used technique to estimate prediction error,
but its behavior is complex and not fully understood. Ideally, one would
like to think that cross-validation estimates the prediction
error for the model at hand, fit to the training data. We prove that this
is not the case for the linear model fit by ordinary least squares; rather
it estimates the average prediction error of models fit on other unseen
training sets drawn from the same population. We further show that this
phenomenon occurs for most popular estimates of prediction error, including
data splitting, bootstrapping, and Mallows’ Cp. Next, the standard
confidence intervals for prediction error derived from cross-validation may
have coverage far below the desired level. Because each data point is used
for both training and testing, there are correlations among the measured
accuracies for each fold, and so the usual estimate of variance is too
small. We introduce a nested cross-validation scheme to estimate this
variance more accurately, and show empirically that this modification leads
to intervals with approximately correct coverage in many examples where
traditional cross-validation intervals fail. Lastly, our analysis also
shows that when producing confidence intervals for prediction accuracy with
simple data splitting, one should not re-fit the model on the combined
data, since this invalidates the confidence intervals.
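To make "the usual estimate of variance" concrete, here is a minimal sketch (our own illustration in Python/NumPy, not code from the speaker or the underlying paper) of K-fold cross-validation for an ordinary least squares fit, together with the naive confidence interval that treats all n per-point errors as independent. The function name `naive_cv_interval`, the squared-error loss, K = 10, z = 1.96 and the simulated data are assumptions chosen for illustration. It deliberately does not implement the nested cross-validation scheme described in the talk; it only shows the standard interval whose coverage the abstract says can fall short.

```python
import numpy as np


def naive_cv_interval(X, y, k=10, z=1.96, seed=0):
    """Hypothetical helper: K-fold CV estimate of squared prediction error
    for OLS, plus the 'usual' interval that assumes independent errors."""
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)
    errors = np.empty(n)
    for test_idx in folds:
        train_mask = np.ones(n, dtype=bool)
        train_mask[test_idx] = False
        # Fit ordinary least squares on the training folds only.
        beta, *_ = np.linalg.lstsq(X[train_mask], y[train_mask], rcond=None)
        # Record the squared error of each held-out point.
        errors[test_idx] = (y[test_idx] - X[test_idx] @ beta) ** 2
    err_hat = errors.mean()
    # Naive standard error: treats the n errors as i.i.d., ignoring the
    # correlation created by reusing each point for training and testing.
    se = errors.std(ddof=1) / np.sqrt(n)
    return err_hat, (err_hat - z * se, err_hat + z * se)


# Usage on simulated data (assumed setup: intercept + 5 Gaussian features,
# all coefficients 1, unit-variance noise).
rng = np.random.default_rng(1)
n, p = 100, 5
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.ones(p + 1) + rng.normal(size=n)
est, (lo, hi) = naive_cv_interval(X, y)
print(f"CV error {est:.3f}, naive 95% CI ({lo:.3f}, {hi:.3f})")
```

Per the abstract, an interval built this way is centered on an estimate of the average error over training sets drawn from the population, and its width is too small because the fold-level errors are correlated.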

*Date:* May 10, 2021

*Time:* 12 pm, São Paulo, Brazil (UTC/GMT -03:00)

*Click here to join <https://zoom.us/j/97971536999>*

-- 
Hedibert Freitas Lopes, PhD
Professor of Statistics and Econometrics
INSPER - Institute of Education and Research
Rua Quatá, 300 - São Paulo, SP 04546-042 Brazil
Phone: +55 11 4504-2343
www.hedibert.org

