http://icml2008.cs.helsinki.fi/detailed_sessions.shtml ICML 2008 http://icml2008.cs.helsinki.fi/papers/587.pdf 155 Bayesian Multiple Instance Learning: Automatic Feature Selection and Inductive Transfer advances aided andrews artificial asuncion axis caruana computer diagnosis dietterich dundar fung hofmann html http information instance intel krishnapuram lathrop learning ligence lozano machine machines mlearn mlrepository multi multiple multipleinstance neural newman parallel perez problem processing rectangles references repository solving support systems task tsochantaridis vector with http://icml2008.cs.helsinki.fi/papers/429.pdf 23 A Semiparametric Statistical Approach to Model-Free Policy Evaluation acknowledgments actor adaptive addition algorithms amari among analysis anonymous appendix applied approximation argument athena authors bartlett barto baxter become bernoul bertsekas bias bickel biometrika bradtke cambridge comments conference constant critic difference differentiating dynamic efficient equality estimates estimating estimation european finite foundations function functions furthermore geometry global godambe gradient greensmith helpful holds horn information introduction inversion iteration johnson journal kawanabe klaassen known lagoudakis learning least lemma linear machine management mannor matrices matrix models must natural neuro obtained only optimal optimality oxford parametric parr peters policy press proceedings processes programming proved recursive reduction references reinforcement research reviewers ritov sample satisfy schaal science scientific semi semiparametric shifts simester similar solution springer squares statistical stochastic suggestions sutton techniques temporal thank that their theorem therefore those timeseries trace tsitsiklis university using value variance verlag vijayakumar well wellner when where wish young http://icml2008.cs.helsinki.fi/papers/484.pdf 122 Expectation-Maximization for Sparse and Non-Negative PCA 
acad advances algorithms analysis applied armstrong aspremont avidan bach boer bounds cadima component components conference correlations distinct distinguishes exact expression full gene genetics ghaoui global golub greedy horst information international interpretation introduction jolliffe kluwer korsmeyer lander learning leukemia loadings machine minden moghaddam nature neural optimization pardalos path pieters principal proceedings processing profile publ references regularization sallan silverman sparse specify spectral statistics staunton systems that thoai translocations unique weiss http://icml2008.cs.helsinki.fi/papers/382.pdf 54 Large Scale Manifold Transduction belkin bennet bottou burges cambridge data decision demiriz dimensionality eigenmaps examples framework from geometric http icml journal labeled laplacian learning leon machine machines manifold nips niyogi press projects reduction references regularization representation research rules semi simplified sindhwani supervised support unlabeled vector http://icml2008.cs.helsinki.fi/papers/544.pdf 1 Hierarchical Model-Based Reinforcement Learning: R-MAX + MAXQ absence abstracted abstraction abstractions action advances agent agents algorithm although approach artificial atkeson autonomous avenues average barto based between both bounds boutilier brafman cambridge chentanez college combines completion complexity composite concept conclusions conference constraining construction convergence cumulative data dearden decomposition demonstrates deterministic dietterich discovery discrete dissertation diuk doctoral does domains drastically during earned effective efficient empirical environment episode episodes evaluation even event experience explicit explicitly exploiting exploration fifteenth fifth finite first formal fourteenth framework function future general given goldszmidt guarantee guarantees guiding hester hierarchical hierarchies hierarchy identify implicit improve improving individual 
information intelligence international intrinsically introduction issue joint jong journal kakade kearns known learning less littman london machine mahadevan maxq mdps mirroring model modelbased moore more motivated multiagent near neural nineteenth novelty optimal performs policy polynomial precup press prioritized proceedings processing real recent reduce reduced references reinforcement relative relatively research retains reward sample seen semi seri seventh simple simsek singh single space special state stone strehl structure suggests sutton sweeping systems tadepalli temporal tennenholtz that theoretical this throughout time twenty university upon useful using utility value with within http://icml2008.cs.helsinki.fi/papers/196.pdf 9 Estimating Local Optimums in EM Algorithm over Gaussian Mixture Model accelerate acceleration algorithm analysis approach architectures asymptotic bound clustering comp computation conclusion convergence data dempster derive discussion efficient elkan estimating experts extensions finite from function gaussian handbook http icdm icml implementation incomplete inequality interscience john jordan journal kanungo krishnan laird likelihood local lower lutkepohl matrices maximum mclachlan means method mixture mixtures model models mount multiple netanyahu networks neural optimum optimums over paper papers peel piatko properties propose rate references restart results robin royal silverman society solution sons statistical this triangle tung upper using wiley with zhang zhangzh http://icml2008.cs.helsinki.fi/papers/449.pdf 51 Robust Matching and Recognition using Context-Dependent Kernels acquisition algorithm alvey amores annual apprentissage approach bahlmann barla based belongie bishop boosting boser boughorbel burkhardt cadre chal challenge chapelle classes classification classifiers combined computational conference constellations context contextual convolution corner cruz cuturi cvpr dans darrell data descriptor descriptors 
detection detector discovery discrete dissertation doctoral eccv edge efficient ensmp etude everingham faculte fast features fifth gartner gool grauman guyon haasdonk haffner handwriting harris hausdorff haussler histogram http image integrating iwfhr ject jmlr kernel kernels learning lenges lncs machine machines malik margin match matching mining multi networks neural nips noyaux objets odone online optimal orsay pascal pascalnetwork pattern pittsburgh pour puzicha pyramid radeva recognition references relational report results santa sebe semigroupe sets shape spatial springer statistique stephens structured structures support survey svms technical theory thesis training transaction vapnik vector verri vision visual williams winn with workshop zisserman http://icml2008.cs.helsinki.fi/papers/461.pdf 37 A Quasi-Newton Approach to Nonsmooth Convex Optimization algorithms analysis andrew belloni bundle center chal conf convex haarala hiriart intl introduction joachims large learning lemar linear machine methods minimization models nonsmooth operation optimization pages proc references regularized report research scalable scale skyl springer svms technical thesis time training university urruty verlag volume york http://icml2008.cs.helsinki.fi/papers/601.pdf 102 Classification using Discriminative Restricted Boltzmann Machines advances artificial barbados bengio bouchard cambridge carreira chap chelle ciety classification classifiers compstat computational contrastive criterion curse deep delalleau discriminative diver druck elle ervised etween functions gence generative greedy highly hinton hotel hybrid iasc information inforo intel intelligence international kernel lamblin laro layer learning ligence lkopf machines mation mccallum networks neural ovici perpinan platt prague press proceedings processing propagation quadratic references roux savannah semi semisup statistics supervised symposium systems tenth tradeoff training triggs variable weiss wise with workshop 
zien http://icml2008.cs.helsinki.fi/papers/562.pdf 125 mStruct: A New Admixture Model for Inference of Population Structure in Light of Both Genetic Admixing and Allele Mutations academy acknowledgements algorithm allele allocation anonymous artificial bayesian bioinformatics blei bodmer bonne cambon cann cavalli cazes cell comment conrad coop correlated data dirichlet disasters disequilibrium diversity donnelly erosheva events excoffier exponential falush families feldman field fienberg forensic freimer frequencies from future generalized genet genetic genetics genome genotype ghahramani grant graphical hamilton haplotype human inference intel interpreting introduction jaakkola jakobsson ject joint jordan journal kaufmann kidd lafferty latent learning legrand ligence line linkage linked loci machine markers mass mean membership methods microsatellite mixed model models morel morgan multilocus mutation myers national panel part past piouffre population populations present pritchard probabilistic proceedings publications publishers recombination references research revisited rosenberg russell samples saul science sciences scientific sforza slatkin sohn spectrum stephens stepwise structure supported survey tamir this thomsen toma uncertainty using valdes variation variational wall weber worldwide xing zhivotovsky http://icml2008.cs.helsinki.fi/papers/179.pdf 106 Query-Level Stability and Generalization in Learning to Rank adapting addison agarwal algorithms baeza bipartite bousquet burges colt deeds descent document elisseeff generalization gradient hamilton huang hullender icml information journal lazier learning machine modern neto niyogi proc rank ranking references renshaw research retrieval ribeiro shaked sigir stability using wesley yates http://icml2008.cs.helsinki.fi/papers/652.pdf 93 An Analysis of Reinforcement Learning with Function Approximation adaptive algorithm algorithms annals applications applied approximate approximation approximations athena 
baird benveniste bertsekas borkar carnegie chains chattering computer conf control coste diaconis difference discretetime dynamic engineering existence farias finite fixed function gordon inequalities informational internal iteration journal learning letters logarithmic machine markov mellon meyn neuro optimization points priouret probability proc programming references reinforcement report residual saloff sarsa scales school science sciences scientific sobolev springer stability stable stochastic systems technical temporal theory time tivier tsitsiklis tweedie university value verlag with http://icml2008.cs.helsinki.fi/papers/665.pdf 49 Composite Kernel Learning absolute account across adapting adaptive addition advances algorithm alignment also alvarado among annals appear applied arbritrary areas arrow artificial bach bartlett beyond biomed birbaumer black blankertz bogdan bonnans bousquet boyd brain braincomputer cambridge campbell canu center channel chapelle choosing circle classifiers color competition composite computer conclusion conference conic considering contribution convex cortex cristianini crossroad curio darker definite detection direction discarded discrimination duality dynamically efficiency electrode electrodes elisseeff enabling ensemble especially estimation eurasip exactly extended factor family feature figure first former formulation framework from frontal function further generalized ghaoui grandvalet group grouped guided guigue guyon hierarchical high higher highlight hill hinterberger icml ieee importance improving information instead interface interfaces international into introduction jordan journal kandola kernel kernels koller lanckriet large lasso lateral latter learning left likely machine machines mahoudeaux mallet maps matrix median mixed model more morizet motor mukherjee muller multiple neat networks neuper neural norm normalization numerous omnipress optimization paper paradigm parameters parametric particularly penalization 
penalties performances perspectives pertubation pfurtscheller platt poggio pontil press primary problem problems proceedings processing programming progress provide rakotomamonjy ratsch references regions regression regularization relevance relevances relevant represents research results review right rkhs robust rocha rosenstiel roweis royal same scale scaling scalp schafer schalk schlogl scholkopf schroder selection semi series sets shapiro shawe show shrinkage siam side signal similar singer single smola smooth society somatosensory sonnenburg spaces speller springer statistical statistics structure subjects support svms systems szafranski take target taylor that they this tibshirani tour trans trials twenty university vandenberghe vapnik variable variables variational vaughan vector very viewpoint visual weston where white with wolpaw works yuan zero zhao http://icml2008.cs.helsinki.fi/papers/437.pdf 25 Active Kernel Learning aaai above active algorithm alignment andd appendix application applications artificial bartlett batch belkin chapelle chen classification clus clustering colt cones conf conference constraints constructe corvallis cristianini definite derivative dhillon diffusion discrete distance duce elisseeff first from function ghahramani ghaoui graph graphs have icml image information intel intl into intro jmlr john jordan kandola kernel kernels koller kondor kulis lafferty lagrangian lanckriet large learning ligence machine matlab matrices matrix medical methods metric multiplier nips niyogi nonparametric olbox optimization other over pairwise pennsylvania pittsburgh press problem proceedings programming proof rank references regularization russell scholkopf second sedumi selection semi setting shawe side simplified smola software sons stanford statistical structures sturm substituting supervised support sustik symmetric target taylor text theorem theory tong transforms twenty using vapnik vector weston wiley with xing zero 
http://icml2008.cs.helsinki.fi/papers/600.pdf 124 Bayesian Probabilistic Matrix Factorization using Markov Chain Monte Carlo aaai about above account accuracy achieve acknowledgments additional advances advantage alberta allowing also amount analysis appears applied approach approximate approximations assumptions available banff bayesian bonn california cambridge camp canada carefully carlo chain collaborative colt compared component computation computational computer conclusion conclusions conference confidence confirmed constrained containing converged convergence cores cost dataset demonstrated department description desired determine diagnose differs discussions distribution easy ecml empirical entirely eration factor factorization fast feature filtering first fransisco from fully gaussian generated generating geoffrey germany ghahramani graphical greatly hard have helpful higher hinton hofmann however hyperparameters hyperpriors icml ilin improvements inference information inspecting instead international into introduction inverting jaakkola jordan just karhunen kaufmann keeping large larger latent leading leads learning length lots machine made making many maptrained margin markov marlin matrix maximum mcmc means mentioned methods million minimizing missing mnih model modeling models monte morgan movie movies much multiple multiplicative neal networks neural nowlan nserc number over parallel parameters perform placing posterior potentially practice prediction predictions predictive presented press principal probabilistic problem problems proceedings process processing profiles provides quantified raiko rank rating ratings reason reasonable recommendations references regularization reject rely rennie report research resources results risk rules salakhutdinov samples sampling saul scale scholkopf science second semantic sharing show significantly simple simplifying single soft speed srebro still structure suboptimal successfully sufficient suggests supported 
systems taken technical thank that there this thrun thumb time tions toronto trained treatment true tuned twentieth twenty typically uncertainty university user users using values variational vectors washington weight weighted weights when which will with within workshop zemel http://icml2008.cs.helsinki.fi/papers/448.pdf 61 Optimizing Estimated Loss Reduction for Active Sampling in Rank Learning active adapting amini brinker burges carbonell conference crisp development document ecml functions gallinari goldstein huang icml information international label lacasse laviolette learning nips proceedings ranking references research retrieval sampling selective sigir solution strategy uniqueness usunier http://icml2008.cs.helsinki.fi/papers/627.pdf 41 Knows What It Knows: A Framework For Self-Aware Learning acknowledgments algorithm analysis angluin bagnell bianchi blum boolean brafman carnegie case cesa classification computer computing darpa decision distribution domain efficient free general gentile ieee information institute journal label learning linear lugosi machine markov mellon minimizing mistakebound models near optimal over pittsburgh polynomial prediction problems provided queries references regret reinforcement report research revisited robotics sampling schneider science selective separating siam solving stoltz support technical tennenholtz theoretical theory time transactions uncertain university with worst zaniboni http://icml2008.cs.helsinki.fi/papers/580.pdf 94 Reinforcement Learning in the Presence of Rare Events adaptive ahamed akarapu algorithm algorithms analysis approach approximation artificial asmussen athena bartlett barto baxter bertsekas bhatnagar bias borkar bucklew chains combinatorial conditioned conf control cross dasgupta difference differences dynamic eligibility entropy ergodic estimates estimation evaluation event events function glynn gradient horizon importance infinite intelligence international introduction journal juneja kroese 
learning machine management mannor markov method methods montecarlo neuro oper optimization policy precup predict press proc programming rare references reinforcement research rubinstein sampling science scientific simester simulation simulationbased singh springer stochastic sutton technique temporal traces tsitsiklis unified using value variance verlag with http://icml2008.cs.helsinki.fi/papers/523.pdf 129 Empirical Bernstein Stopping aaai accelerating achieved acknowledgements action adaboost adaptive addressed advantage after agence alberta algorithm algorithms also american application approximation argue armed artificial asia association asuncion audibert bandit based because begin benefit bernoulli bernstein better boosting bound bounded bounds bradley carlo case certis classification clustering colt come computing conclusions condition conference covertype dagum data dataset datasets decision denoted deviation diagrams discovery distributions domingo domingos each ecole effect empirical enpc environments error estimates estimation even existing expected exploited exploration extending filterboost from function fund future general given graphiques have hoeffding however http hulten iaai icml icore incorporating inequalities influence information ingenuity intel interesting into ject journal kaelbling karp know known large larger largest lazy learner learners learning ledge ligence luby machine madaboost manner mannor mansour markov maron merely method methods mining model modification monte moore most much multi munos nationale near negativity newman nips nonnegative notably number optimal options ortiz outperforms pacific part percentage plications ponts possible presented principled probability problems processes provided question races racing random recherche references regression relative remaining report repository results review ross rule samples sampling sarcos saved scaling schapire search selection should showed shows siam significantly smaller 
smallest standard statistical stochastic stopping sums supported szepesv table taken technical termination than that theoretical this three tuning tweaking unprincipled value variable variables variance watanabe well were where which while with without work would http://icml2008.cs.helsinki.fi/papers/158.pdf 48 Localized Multiple Kernel Learning accuracies adaptive advances algorithm algorithms alpaydin amari analysis annual arabidopsis assigns average bach banana bartlett based benchmark bengio between bioinformatics biology canonical canu classification classifier classifiers collobert combination combined compared comparing comparison comparisons computation computational conclusions conference conic consists cristianini data denmark diego direct duality duin efficiency eukaryotes experts framework from functional functions fusion gating gene genome genomic germannumeric ghaoui grandvalet grundy heart heterogeneous hinton iapr improving information initiation instance intel international introduces ionosphere jacobs jebara joint jordan journal kernel kernels lanckriet large learning lewis ligent liverdisorder lkopf lmkl local localized locally machine manual matrix mixture mixtures model modifying moguerza molecular more mosek multiple munoz network networks neural nielsen nips noble nonstationary nowlan optimization paired parallel pattern pavlidis pedersen percentages performed perspectives pima prediction problems proceedings processing programming proposed rakotomamonjy rank recognition references research revision ringnorm scale semidefinite sets signed sites sonar sonnenburg sources spambase statistical structural supervised support svms syntactic systems table test testing this tools translation tsch vector version versus vertebrates very verzakov wdbc weights weston which wilcoxon with work workshop workshops http://icml2008.cs.helsinki.fi/papers/581.pdf 22 An Analysis of Linear Models, Linear Value-Function Approximation, and Feature Selection for 
Reinforcement Learning aaai algorithms analysis analyzing approximate approximation automatic barto based basis bebf bellman bertsekas blackjack boutilier boyan bradtke chain column computing construction control convergence dean decision decomposition dictionaries difference differences different discovering dynamic ergodic error factored feature figure first framework frequency function functions generation givan icml ieee ijcai introduction iteration jmlr keller koller lagoudakis laplacian learning least lids linear littman machine maggioni mahadevan mallat mannor markov matching mdps methods minimization model models number order painter parr petrik policies policy precup predict press problem problems processes processing programming proto purdue pursuits references reinforcement report representation results reward room sanner second selection signal some squares state structured sutton technical temporal third three time trans university value wakefield with zhang http://icml2008.cs.helsinki.fi/papers/178.pdf 29 Nearest Hyperdisk Methods for High-Dimensional Classification approaches barkana based bennett bredensteiner cevikalp classifiers common computer conference dimensionality discriminant discriminative douze duality face geometry greece icml ieee jurie larlus learning linear local machine margin neamtu nonlinear pami pattern polikar processing recognition reduction references signal society subspace thessaloniki transactions triggs vectors vision visual wilkes workshop http://icml2008.cs.helsinki.fi/papers/317.pdf 116 On-line Discovery of Temporal-Difference Networks acknowledgments advances agent alberta algorithm algorithms also among amount argue blind bowling brown calculate calculation cassandra cause centralized combine comparison complex computation computed computer contribution costs criteria demonstrated developing difference differences dimensional discovery discrete distributed dynamical effectiveness effort efforts eligibility empirical 
environment environments error establishment evaluate example examples expand expands experiments explanation explores file function further future gordon grant hand help history however html http icml ijcai important incremental independency independent index information jaeger james keep kind known kraines larger leaf learning like line linear littman long manner master mccracken memory mext models most network networks neufeld neural node none nonlinear normalize observable online only operations operator other page parallel partially planning policies pomdp prediction predictive presented press probability proc processing prof proposes quite references related repository representation representations required requires research reset result ring rosencrantz rudary scientists series should sigmoidal since singh size small state steps steven stochastic studied substantially summary supported sutton systems tanner temporal temporaldifference thank thanks that their these thesis they this through thrun time tony traces university used using various well when wilkinson with without wolfe work worlds young http://icml2008.cs.helsinki.fi/papers/296.pdf 55 Graph Transduction via Alternating Minimization belkin examples framework from geometric jmlr labeled learning manifold niyogi references regularization sindhwani unlabeled http://icml2008.cs.helsinki.fi/papers/528.pdf 82 The Asymptotics of Semi-Supervised Learning in Discriminative Probabilistic Models active bach chapelle generalized learning linear lkopf misspecified models nips press references semio supervised zien http://icml2008.cs.helsinki.fi/papers/260.pdf 39 On Partial Optimality in Multi-label MRFs agarwala agrawala appeared applied approximate artificial binary boolean boros boykov chain cohen colburn complementation convergent convex curless cuts cvpr dead decomposition desmet digital discrete dontcheva drucker dual duality efficient elimination energy exact fast felzenszwalb fields flow flows 
functions graph hammer hansen hazes huttenlocher iccv intel interactive ishikawa kolmogorov komodakis lasters ligence maeyer markov matching math mathematics message minimization minimized minimizing nature network nonsubmodular operation optimality optimization pami paragios passing persistency photomontage pictorial positioning preprocessing priors problems product programming protein pseudo quadratic random references report research review revisited reweighted roof rother rutcor salesin side simeone solved some structures tavares technical theorem trans tree tziritas uncertainty unconstrained veksler wainwright what with zabih http://icml2008.cs.helsinki.fi/papers/458.pdf 113 Automatic Discovery and Transfer of MAXQ Hierarchies aaai aamas abstraction abstractions acknowledge acknowledgments actions advanced agency agent agents algorithm allow also altered analyzes andre anonymous apply approach artificial automatic automatically barto based behavior between causal challenge compact comparably conclusion consistent convergence creating currently darpa deal decomposition defense density deterministic dietterich direct discovering discovery disjunctive diuk diverse does domains dynamic ecml effectively efficient elty empirical ensure explanation extending factored finally finding framework from function further future given goals grant graph gratefully handle have help hengst hexq hierarchical hierarchies hierarchy icml identify important improve improves indicate induced inducing input intel into investigate investigating jectory jects jonsson journal kaelbling learn learned learning ligence littman long lozano machine macro mannor manuallyengineered marthi maxq mcgovern mdps mehta menache method methods mike models more multi need nips observed optimality order organization partitions perez perform pickett policies policy policyblocks precup presented problem problems programmable pseudo rate recursively references reinforcement related relationships relative 
representations research results reviewers rewards rich russell safe scenario schwartz semi setting shared shimkin show singh solved state strehl structurally structure subgoals subtasks successful support sutton tables tadepalli target temporal thank that their thrun trajectory transfer under useful using value valuefunction where with work working workshop wynkoop zero http://icml2008.cs.helsinki.fi/papers/258.pdf 3 Random Classification Noise Defeats All Convex Potential Boosters adaboost arcing bagging bartlett berkeley boosting bradley breiman california classification comparison consistent constructing datasets decision department dietterich edge ensembles experimental filterboost jmlr large learning machine methods nips randomization references regression report schapire statistics technical three traskin trees university http://icml2008.cs.helsinki.fi/papers/552.pdf 154 Multiple Instance Ranking accepted advances afzelius analysis andrews annual applications arnby artificial axis broo carlsson classification clickthrough comparative computational conference cytochrome data dietterich discovery drug eighth engines future guengerich hofmann information insights instance intel international isaksson joachims journal jurva kjellander know kolmodin lathrop learning ledge ligence linear lozano machine machines mangasarian maron mechanistical metabolism mining multiple multipleinstance natural neural nilsson optimization optimizing parallel perez pharmacology predictions problem proceedings processing programming ratan raubacher rectangles references regulation review reviews role scene search sigkdd site solving state successive support systems theory tools toxicology tsochantaridis using vector weidolf wild with http://icml2008.cs.helsinki.fi/papers/272.pdf 27 Learning from Incomplete Data with Infinite Imputations abbeel account advances analytical approaches argyriou artificial assumptions attribute basic bhattacharyya carin chechik chosen classification 
combinations completion conclusion cone conference continuously convex data decision dependency devised distribution elidan experts feature free freely from function functions gated handling heitz hofmann icml imputations incomplete information intel international into investigated journal kernel kernels koller learning liao ligence logistic machine makes margin matrix means method methods micchelli minor missing mixture multi multiple neural only optimization optimizing order parameter parameterized perspectives pontil problem proceedings processing programming quadratically references regression regularizer research second shivaswamy simultaneously smola space statistics systems taken tenth that their theory uncertain using values variables view views vishwanathan where williams with workshop http://icml2008.cs.helsinki.fi/papers/111.pdf 96 Preconditioned Temporal Difference Learning aaai advances advantage algorithms analysis approaches approximation artificial automatic averaged barto bertsekas bowling boyan bradtke carnegie city cityu comparison complexity computationally conclusion conference control convergence data difference differences discrete domains efficient eligibility errors evaluation event except expensive factors figure first frameworks function general generally geramifard grad have hong ieee ilspe ilstd incremental information initialized intel international introduction iterative jectories journal kaufmann kong learning least leastsquares ligence linear lspe lstd machine mellon memory methods mitchell more morgan national nature near nedi neural nonmarkovian number oral over paper perturbation pittsburgh policy preconditioned predict press proceedings processing proposed proximation rate recursive references reinforcement report research respectively saad scmmcg setting siam sixteenth sparse squares state such sutton systems tadic take tasks technical temp temporal terms than that this traces transactions transitions tsitsiklis twenty 
university using visit weights were whereas which with zinkevich http://icml2008.cs.helsinki.fi/papers/362.pdf 99 Maximum Likelihood Rule Ensembles asuncion causes class cohen consider decided documentation effective experiment fast icml incorrect induction known learning machine multi newman occasionally output references repository results rule says slipper software there version which http://icml2008.cs.helsinki.fi/papers/645.pdf 95 Apprenticeship Learning Using Linear Programming abbeel action advances advantages algorithm algorithms also analysis another appendix applications apprenticeship approach appropriate assume bagnell basis because been before behaved bellman better bowling boyd brevity cambridge cherng choice clearly column combined computing conclusion conditioned conference constraints convex cosumn counter decision define definition derivation diagonal diagonally different direction disadvantages disciplined discrete dominance dominant dual dynamic each equivalently exactly expectations experiment experiments extensions faster feinberg first flow following from functions future game given good grant hall handbook have here horn http implies imply indexed information international introduce inverse johanson john johnson large last learning lemma lemmas line linear lizotte lpal machine make many margin markov matlab matrix maximum measure methods most much mwal negative neural next notation note occupancy omitted optimization original other otherwise page pairs planning plug policy prentice presented press probabilities problems proceeding proceedings processes processing produces programming proof prove puterman puthenpura ratliff ready rearranged references reinforcement results reward robust same satisfied satisfies schapire schuurmans schwartz show showed shows simply since singular situations software solution sons spaces specific springer stable stanford state stationary stochastic strategies strict strictly suited syed system systems than 
that them theorem theoretic theory therefore they this unique university unknown unlike using variables variant variants vector wang where which wiley wise with work write zinkevich http://icml2008.cs.helsinki.fi/papers/216.pdf 6 ν-Support Vector Machine as Conditional Value-at-Risk Minimization accept achieves advances algorithm algorithms also amsterdam another applications apply approaches around assume assumption ball banking bartlett based berlin between boser bound burges cambridge centered chang change changed characterized check classification classifier classifiers clear colt comes computation compute concave conclusions condition conditional conditions constant constraint constraints continuity continuous contradicts convergent corner corners cortes crisp cruz current cvar data decreasing derived deterministic difference discrete distinct distributions does each enough ensured equivalent error essentially establish even exists expected express extension feasible figure finance finds finite first folee from function functions general generality generalization geometric given global globally good gotoh guyon have hence hermann holds homogeneous horst implies implying imposing increase increasing increment inequality interpretation introduced introducing involved involves iteration iterations ject jective journal jthay karush kuhn lack learning least lemma letting linear live locally loss lowing machine margin mathematics methods minimization model models nature negative nels networks neural nips nonconvex norm note number obtain omit only optimal optimality optimization origin pacific perez performance perturbation points positive press probability problem problems program proof proved quasi radius range references regarded reject respect respectively results risk rockafellar same scale scholkopf score second sense setting share show showed sign since sketch slack smaller smola solution solutions solving space springer statistical step strict strictly such 
support suppose takeda terminates terms than that then theorem theory there they this though threshold thus tipliers training tucker unless upper uryasev using value vapnik variables vector verlag weston when where which whinston williamson with within without http://icml2008.cs.helsinki.fi/papers/440.pdf 85 Sequence Kernels for Predicting Protein Essentiality allauzen approach available based bioinformatics brutlag chang chen ciaa cjlin csie detection efficient finite general homology http interactions ismb kernel library libsvm machines methods mohri motif noble openfst openkernel predicting protein references remote riley schalkwyk skut software springer state supplement support transducer understanding vector weighted http://icml2008.cs.helsinki.fi/papers/531.pdf 50 Training SVM with Indefinite Kernels algorithms analysis applications bartlett based bollmann boyd cambridge classification computations convex cristianini data database diffusion discrete edition feature ghaoui golub graepel graphs haasdonk handwritten herbrich hettich hiriart hopkins hull icml ieee indefinite infinite input intelligence interpretation introduction johns jordan journal kernel kernels kondor kortanek lafferty lanckriet learning lemarechal loan machine machines matrix methods minimization nips obermayer optimization other pairwise pattern press programming proximity recognition references research review sdorra semi semidefinite shawe siam space spaces springer support svms taylor text theory transactions university urruty vandenberghe vector with http://icml2008.cs.helsinki.fi/papers/166.pdf 138 A Dual Coordinate Descent Method for Large-scale Linear SVM algorithm algorithms appear available bartlett bordes boser bottou carreras chang cjlin classifiers collins colt conditional coordinate crammer csie descent examples exponentiated fields gallinari globerson gradient guyon hsieh http icml jmlr larank large leon library libsvm linear loss machines margin markov maxmargin method 
multiclass networks online optimal papers problems projects random references report scale singer software solving stochastic support technical training ultraconservative vapnik vector weston with http://icml2008.cs.helsinki.fi/papers/361.pdf 112 Efficient Projections onto the ℓ1-Ball for Learning in High Dimensions about achieve active actually algorithm algorithms also annals appl applied approximate ascent athena because beck benchmark bertsekas best between bigram binary both bound boyd candes case categorization ccat class classification classifier codes collection comm compares comparing components compressed compressive computation computationally compute conference congress constrained control convergence convex coordinate cormen crammer cumulative data dataset decreasing descent design dimension donoho each ecat efficient equations error estimate estimated examples expected experiment experiments exponential exponentiated feasible feature features figure first friedman function gafni game gave generalized give given gives gorinevsky gradient hand hastie have hazan ieee industrial infinitesimal information interior international introduction invariance journal july kivinen label large lasso last learnability learner learning least left leiserson letters lewis linear logistic long loss lustig machine madrid maintained manuscript math mathematics method methods metric million minimal mirror mnist most multiclass nearly network nonlinear norm number online only onto operations optimization outperform outperforming output over pathwise pegasos playing plots point polyhedra prediction predictors press primal problems proc proceedings processing programming projected projecting projection projections pure ranking rate references regression regret regularization regularized report research results right rivest rose rotational royal sampling scale scientific seen selected selection sensing shalev shows shrinkage shwartz siam signal singer sizes society soft 
solution solver space spain sparse sparsest sparsity squares srebro stanford statist statistics stein step stochastic structures subgradient systems takes tarjan task teboulle technical terms test text this tibshirani time times topics total training true twentieth twenty underdetermined university unpublished updates using vector versus warmuth weight weights whatsoever while with yang zero zeroing zinkevich http://icml2008.cs.helsinki.fi/papers/160.pdf 87 Causal Modelling Combining Instantaneous and Lagged Effects: an Identifiable Model Based on Non-Gaussianity analysis approach autoregression bentler bollen causal chang clustering comon component components concept cross demiralp econometric econometrica economics equation equations ernst esposito granger himberg hoover indea independent interscience investigating ject john karhunen latent letin methods modeling models multisub multivariate neuroimage neuroimaging oxford pendent processing references relations rinen searching series signal sons spectral statistics structural structure supplement time unified validating variables vector visualization wiley with http://icml2008.cs.helsinki.fi/papers/538.pdf 33 The Dynamic Hierarchical Dirichlet Process blackwell blei conference dirichlet distributions dynamic ferguson international jordan lafferty learning machine macqueen methods models polya proceedings process references schemes statist topic variational http://icml2008.cs.helsinki.fi/papers/266.pdf 42 SVM Optimization: Inverse Dependence on Training Set Size advances analysis application bartlett bottou bounds bousquet bundle burges chapelle communications complexities conference convergence criteria data decomposition decoste descent discovery estimated examples excess fast formal gaussian gradient http ieee information international joachims karthik kernel knowledge large learn learnable learning lecun leon linear mach machine machines making mendelson methods minimal mining networks neural online 
optimization page pegasos platt practical press primal proceedings processing projects rademacher rates references regularized results risk scale scholkopf sequential shalev shwartz singer smola solver solvers srebro sridharan stochastic stopping structural support svms systems theory time tradeoffs training transactions ttic uchicago using valiant vector vishwanathan weston with http://icml2008.cs.helsinki.fi/papers/327.pdf 36 Efficiently Solving Convex Relaxations for MAP Estimation advanced algorithms analysis approximation barahona boykov brooks calculus cole combinatorics convergent convex cuts energy estimation european extension fitzpatrick goemans graph iccv images improved interactive jects jolly journal karzanov kolmogorov kumar mahjoub mathematical maximum message metrics minimization minimum nips olytop optimal oundary pami passing problems programming references region relaxations reweighted satisfiability segmentation semidefinite thompson torr tree using williamson http://icml2008.cs.helsinki.fi/papers/419.pdf 126 Memory Bounded Inference in Topic Models accelerated advances algorithm allocation american amsterdam analysis annals appear approach association based bayesian beal blei categories collapsed computer cvpr dirichlet distinctive distribution estimating examples features fergus ferguson from gaussian generative gharamani hierarchical hinton ieee image incremental inference information international jordan journal keypoints kurihara latent learning lowe machine minka mixture mixtures model models nakano neural nips nonparametric nunnink object perona problems process processes processing references report research scaleinvariant smem some statistical statistics systems technical tested training ueda university variants variational verbeek vision visual vlassis welling wgmbv workshop http://icml2008.cs.helsinki.fi/papers/542.pdf 136 No-Regret Learning in Convex Games abstracting acknowledgments actions advances agent agents algorithms analyzed 
annual another applicable application appropriate approximate areas ascent authors bandit behavior blum bounds brown carnegie cases centre chiefly chosen class coarse colt compact computational computer conference contending contrast convergence convex coordination correlated cortes could darpa decision department derived design designing developments deviate dimension dimensional directly discrete discussion discussions dissertation doctoral down duality during each early economic economics efficient efficiently equilibria equilibrium equivalence except explicitly extensive external feasible feedback fenchel first fixed followed forges form forms foster foundation from functions further future game games general generalized given gordon gradient grant greenwald guarantee guaranteeing have hazan helpful high however important incentives include including infinitesimal information instead interactions interest interested interesting internal international into jafari journal kalai kale kernel kernelized known large learning leave like line linear london loss lower lsecdam lugosi machine making manageable mansour mapping markov marks martin mathematics matrices mellon minimize most much multi networks neural never nips nonlinear noregret observed ocps online only outcome panel paper part particularly payoffs phase planning plans player points poker political prediction presented previous problem problems proceedings process processes processing program programming programs providence purposes rational real references region regret relationships repeated report represent representations requires resources school science separate sets several shalev shwartz singer sloan smaller solutions some stengel stoltz strategies study such support supported swap systems technical terms thank that their theoretic theory thereby these this time transformation transformations trees types university vapnik vector vempala very vohra were where which with work would write wrote yield 
zinkevich http://icml2008.cs.helsinki.fi/papers/497.pdf 38 Stopping Conditions for Exact Computation of Leave-One-Out Error in Support Vector Machines accuracy adaboost advances algorithm algorithms allow allows alpha analysis appear application applications approach attributes automatic available banana bartlett bhattacharyya bottou bound bounds bousquet boyd brailovskiy breast cambridge cauwenberghs chang chapelle cjlin classifier classifiers colt complexity computation computer computing conclusions conditions conf conference contribution convex cross csie cybernetics data decision decoste decremental dept design determine diabetis disc duan efficiency efficient efficiently empirical engineering error estimating evaluation exact expectation fast fields flare francisco freund gaussian gehl generalization german girosi haussler heart http hyperparamaters icml ieee image implementation improved improvements incremental information jaakkola joachims journal keerthi keerti kereval kernel kernels knowledge kruger large laskov learning leave library libsvm linear lunts machine machines making margin margins martin measures mendelson method methods minimal minimization mining model models morg muller murthy national nature needed networks neural neurocomputing nips obtained onoda optimal optimiza optimization osuna over performance platt poggio practical prec press probabilistic probability proceedings processing proposed ratsch reduce references regression related report research ring risk rules scale scholkopf science seeding selection sequential shevade signal significantly simple smola soft software solution solver speedup splice springer standard statistical statistics stopping support svms systems table taipei taiwan technical theory this thyroid time tion titanic tradeoffs training tuning twono university using validaton vances vandenberghe vapnik vector wagstaff wave with without workshop zhang http://icml2008.cs.helsinki.fi/papers/628.pdf 8 A Rate-Distortion 
One-Class Model and its Applications to Clustering accuracy acknowledgments advanced algorithms allerton allows also among analyzed appearances applications approach associations background based bekkerman best bialek bottleneck bregman bregmanian bubble building burges cast censor chechik chiba class classification clustering coherent colt combines communication complex compression computation computing conclusions conference constraints control convex correspond cover crammer data decisions defense demonstrated dense description different dimensional disambiguating dissertation distortion distribution doctoral document domain duin each elements enclosing esann estimating exemplar extended formulation framework friedman from ghosh given goes golland good gupta haystack hebrew high horn house icdm icml idea identifying illinois information japan jmlr large lashkari learning left lerton likelihood lkopf local locating material maximization maximum mccallum method methods model models more move multiple narrower needle network neural nips noise optimization other over oxford paral part particular people pereira phase platt points press previous problem proposed random range rate ratedistortion references regions regularizing robust scalable sequence sequential shawe showed siegelmann sigir singer size sizes slonim small smola social spheres standard subset support supported task taylor that theory this thomas through tishby tracting trades transitions univ university unsupervised upon used using vapnik variables vector vectors view weiss which wiley williamson with work zenios http://icml2008.cs.helsinki.fi/papers/489.pdf 58 Democratic Approximation of Lexicographic Preference Models approximation athena bertsekas bulletin committing complexity computation conclusions consistent decision democratic distributed dombi economics european fishburn future games given imreh instead journal just learning lexicographic lists lpms machine main management martignon methods 
model negative numerical observations operational orders paper parallel preference presented quesada references research results rivest rules schmitt science scientific strategies survey theory this tsitsiklis utilities vincze with work http://icml2008.cs.helsinki.fi/papers/168.pdf 11 Efficient MultiClass Maximum Margin Clustering aaai according accuracy advances aistats algorithm algorithmic altun analysis applied automating based benchmark better both bound categorization class classification cluster clustering collection component computational conclusions contruction converge convex cora crammer criteria cuts cutting data datasets decreases determine ding directly duda efficiency efficient enough epsilon evaluation evaluations existing experimental fast figure from generalized graph guarantee guaranteed hart higher hofmann however hyperplane icdm icml image implementation increases industrial information interdependent internet involved iterations jasa jective jmlr joachims john journal kelley kernel kowk large larson learning lewis linear linearly loss machine machines made malik margin mathematics maximum mccallum means method methods might mining missing moreover most much muller multi multiclass need neufeld news nigam nips normalized number output pami paper partitioning pattern performs pirical plane plot portals practical preliminary present principal programs propose provided rand rcvi real references related rennie research retrieval rose roughly sample samples scale scales scaling scholkopf schuurmans seconds segmentation semisupervised several seymore should show sigkdd simon singer size small smola society solving sons states stork structured sufficient support svms text than that theorem theoretical this time total training tsang tsochantaridis unsupervised valizadegan variables vector verifies vishwanathan wang webkb where wiley with world yang zhang zhao http://icml2008.cs.helsinki.fi/papers/264.pdf 108 Learning Diverse Rankings with Multi-Armed 
Bandits absorbing active addison advances affinity agichtein aldous algorithm analysis andrzejewski approximation armed auer average baeza bandit based behavior beyond bianchi boundaries brill budgeted burges carbonell carnegie cesa chains characterizing chen cikm classifiers clickthrough cohen computing cost coverage croft data deeds dependencies descent diversity documentation documents dumais ecole engines evaluation exchangeability exploration feedback fewer field finitetime finley fischer fisher flour freund from functions gael gaussian ghahramani goldberg goldstein golovin gradient graepel graph guiver hamilton herbrich horvitz hullender icml implicit improving incorporating independent information joachims journal karger khuller lafferty large lazier learning less letters machine margin markov mathematical maximizing maximum measures mellon method methods metrics metzler minka model models modern more moss multi multiarmed naacl naor nemhauser neto nips nonsmooth obermayer online optimizing ordinal performance personalizing precision press principle probabilistic probabilit probability problem proceedings processes processing producing programming query radlinski ragno random rank ranking rankings references regression related relevance relevant renshaw reordering report reranking research results retrieval retrieving ribeiro robertson saint schapire scholer search shaked siam sigir simple smooth softrank stochastic streeter submodular subtopic summaries support tasks taylor technical teevan term topics turpin university user using value vector versus walks wesley with wolsey wsdm xiii yates york zhai zhang http://icml2008.cs.helsinki.fi/papers/215.pdf 28 Fast Solvers and Efficient Implementations for Distance Metric Learning account accounting acknowledgments adaptive addition advances algorithms also analysis application applications applied approximate areas ball based bengio beyond blitzer borchers both bottou boyd brazil cambridge chopra class classes 
classification clustering comp competitive completely computer conference consistent convex corvallis cross csdp cvpr data davis decker describing dhillon diego different digits dimensional dimensionality discriminant discriminatively discussion distance document does domain each ebased efficient eleventh erger erghe error especially examples expected experiments explored exploring extended extensions face field figure finally former foundation framework from frome full functions further generally globally goldb gradient grant gray great grows hadsell haffner hand handwritten hastie have high highly hinton hofmann hope iccv icml ieee image important improvement information input inputs integrating intel interest international into investigation issue jain janeiro ject jordan kaufman kero knowledge kulis large larger largest last latter learned learning least lecun library ligence linearly lkopf lmnn local lower machine machines mahalanobis malik many margin mateo method methods metric metrics mnist model moore more morgan multiple national nearest neighb nels neural newsgroups number observed onent onents only optimization original other ourhood overfitting pami paper parameters parts pattern platt practical press prevents prior problems procedures proceedings processing programming prove quadratically rate rates recognition reduction references regularization relatively research result retrieval roweis russell salakhutdinov saul scale scales science semidefinite sets several shap should showing shows side significant simard similiarty singer skyrockets smola software solver space speedups spur subspace support supported systems take test testerror testing that theoretic these they this though tibshirani torresani total train training transactions transformation trees under university upon used useful using validation value vandenb vanishing variance vector verification vision ways weinb weiss well when while with work xing yalefaces yang yield 
http://icml2008.cs.helsinki.fi/papers/402.pdf 63 Accurate Max-Margin Training for Structured Output Spaces bordes bottou gallinari icml larank machines multiclass references solving support vector weston with http://icml2008.cs.helsinki.fi/papers/554.pdf 34 Hierarchical Kernel Stick-Breaking Process for Multi-Task Image Analysis advances algorithms analysis annals application approximate bayesian beal biometrika blei breaking cheng clustering college component computational conference correlated ding dirichlet dissertation doctoral dunson ferguson figueiredo gatsby image inference information international jordan kernel knowledge lafferty learning london machine means methods models murino neural neuroscience nonparametric park principal prior problems proc process processing references some statistics stick system topic under unit university variational with http://icml2008.cs.helsinki.fi/papers/367.pdf 117 Rank Minimization via Online Learning achieve address agarwal algorithm algorithms amaldi american analysis another application applications approach approaches approximability approximate approximation arlington arora ascent austin bach barvinok based believe better between boosting bound bounds boyd candes caramanis colorado colt comp comput computer conclusion conference connection constraints control convex convexity correspondence course covering decision decoding decomposition denver dependence dept desirable dhillon dimensionality distance dual efficient equations euclidean factor factorization fast fazel first focs foundations fractional framework freund from further future general generalization generalized gradient guaranteed guarantees hankel hard hardness hazan heuristic highly hindi however html http icml ieee importance improved infinitesimal information interest interesting introduce intuitively jain jordan kalai kale kann kernel kulis learning least line linear littlestone logarithmic majority mathematical matrices matrix meka meta method 
methods minimization minimizing minimum motivated multiplicative mwsurvey negative newton nips nonlinear nonnegative norm notion novel nuclear obtain obtained obtaining online optimization optimizationonline order over packing paper parrilo particular perspective perturbed plotkin polyhedral predictive preprint present press primal princeton prob problem problems programming properties provable provides pubs question rank recht reduction references regret relations relaxed report require rigorous rockafellar satisfied satisfy saul schapire science second seems semidefinite sets seung shmoys show society solution solutions solving squares strong stronger such survey sustik symposium syst system systems tardos technical texas that theorem theorems theoretic theoretical theory there this tighter transactions type unable understanding univ university unsatisfied update used using variables vempala virginia warmuth weighted weights weinberger were whether which with within work would zero zinkevich http://icml2008.cs.helsinki.fi/papers/588.pdf 40 An Asymptotic Analysis of Generative, Discriminative, and Pseudolikelihood Estimators aaai advancement advances agreementbased aistats algorithms allows analysis analyzed approximate artificial association asymptotic bayes belief benefits berkeley besag between bishop bouchard california cambridge characterization classification classifiers clean clustering commonly compare comparison composite computation computational computer conclusion conditional conference considering contemporary cvpr data department different discrimina discriminative distributions druck enables estimating estimation estimators exponential families family fields generative graphical have hybrids icml idea inference information intel international intuitive jaakkola jordan journal klein labeling lafferty lasserre lattice learning liang ligence likelihood likelihoods limited lindsay logistic machine marginal matching mathematics mccallum methods minka 
model models moment multi naive neural nips outcome partitionings pattern pereira piecewise press principled probabilistic processing propagation pseudo random recognition references regression report research segmenting setting space statistical statistician statistics sutton systems technical trade training treereweighted triggs uncertainty undirected university used vaart variances variational varin vision wainwright wang which willsky wrong http://icml2008.cs.helsinki.fi/papers/396.pdf 17 The Skew Spectrum of Graphs accuracy acids activity addison advances alternatives appear applications approach architectures aromatic artificial arxiv available bach based benchmarks best bioinformatics biological biotechnology bispectrum bold bonchev borgwardt breach brenda cambridge cellular chang chem chemical cjlin classes classification clausen collins colt commutative compadre comparison compound compounds comput computation conf conference convolution corelation correlation cross csie cvpr data database datasets debnath department descriptor developments diaconis dimacs discriminability dissertation distributions doctoral duffy each ebeling ectrum efficient eleventh energies enzyme errors evolution explorations fast features feedforward ffts finite flach fold four fourier function functions fundamentals generalized gordon graph graphs gremse group groups guestrin guibas hansch harchaoui hardness heldt heteroaromatic higher homogeneous hong howard http huang huhn hydrophobicity icdm ideker ieee image images indicated inference information instances institute intel international intl introduction invariant irvine james jebara ject kakarala karypis kerber kernel kernels kondor kong kriegel kumar language lecture library libsvm ligence london lopez machinery machines mathematical mathematics mining modeling molecular multiob mutag mutagenic natural nature network networks neural nips nitro nodes novak novel nucleic number online orbital order parentheses path percent 
permutations prediction press probability proc proceedings processing protein publishers random reduced references relationship repetitions representation representations research results retrieval rockmore rotationally rouvray rtner schomburg schonauer science segmentation series sharan shawe shortest shortestpath shusterman sigkdd skew smola social software some spaces spectrum springer standard state statistics structure structured support survey symmetric symmetries systems table taylor their theor theoretic theory through tomkins tracking transactions transforms translationally triple updates validation vector vishwanathan wale walk wesley with workshop wrobel http://icml2008.cs.helsinki.fi/papers/237.pdf 66 Laplace Maximum Margin Markov Networks accuracy achieve adaptive adoption aistats algorithms also altun among andrew application assumed averaging bagnell bartlett bayesian bennett boosting bound carreras case chan class classification classifiers coefficients collins conclusions conditional constants convex crfs data density different dimensional direct discrimination distribution easily effect efficient enjoys entropy error estimation expected exponential exponentiated extragradient features fields figueiredo figure fitting formalism from ganapathi general generalizable generalized gives globerson gradient graphical guestrin hard hidden high hofmann however icml ieee improved inseparable instead interdependent jaakkola jebara jmlr joachims jordan julien kaban koller labeling lacoste lafferty lanckriet langford laplace lapm largmargin lasso lead leads learning lebanon left less letters likelihood linear linearly logistic machine machines magnitudes makes mangasarian margin markov maximum mcallester mccallum megiddo meila method methods minka model modeling models more most much networks nips nonregularized normal online optim output over pami parameter parameters pattern pereira phillips posterior posteriori prediction predictive prior priors 
probabilistic programming properties proposed random rates ratliff ravikumar recognition references regression regularization regularized relaxations respect results right rise robust royal scalable schapire seeger segmenting selection sensitive sensitivity sequence sets show shows shrink shrinkage shrunk small smaller softw spaces spacial sparse sparseness sparsified sparsity species stability stable standard statist structure structured subgradient subsumes supervised support szummer taskar that them they this thresholding tibshirani towards training trans tsochantaridis unlike using vasconcelos vector wainwright weighting weights where which with zero zeros zinkevich http://icml2008.cs.helsinki.fi/papers/167.pdf 105 Listwise Approach to Learning to Rank - Theory and Algorithm addison baeza bartlett bounds classification convexity information jordan mcauliffe modern neto references retrieval ribeiro risk technical wesley yates http://icml2008.cs.helsinki.fi/papers/679.pdf 15 Beam Sampling for the Infinite Hidden Markov Model acknowledgements adaptation adoption agglomerative algorithm allows also alternative american analysis annals anonymous another antoniak applications applied architecture association available bayesian beal beam because believe bengio blei buffet cambridge carlin cavendish century chain choice clustering coalescents comments communications computation computational computing conclusion conjugate construction constructive conveniently converges current currently data datasets definition dempster density department dirichlet draws dynamic edition effectively efficient efficiently embedded encourage enough ensemble escobar estimation exploring extension extensions faster feature finally finite fitzgerald flexible frasconi from gaussian gelman generalization ghahramani gibbs graphical griffiths handles have helpful hidden hierarchical http idea ieee ihmm implement incomplete indian inference inferring infinite input inputs inspeech inspiration 
introduced involves ject jectories jordan journal jurgen laboratory laird large latent learn learning like likelihood linear machine mackay made markov maximum methodological methods microsoft mixture mixtures model models more moreover neal nips nonparametric numerical output paper pomdps possibilities press priors problems proceedings process processes processing programming promising rabiner rasmussen recognition recursive references report research resulting reviewers robust roweis royal ruanaidh rubin sample sampler sampling scholarship scott selected sequences series sethuraman showed signal simulation sinica slice slices society software springerverlag state statistica statistical statistics stern straightforward supported systems technical than thank that their this tutorial underway university used using various very walker west whole williams with work would york http://icml2008.cs.helsinki.fi/papers/145.pdf 81 Pairwise Constraint Propagation by Semidefinite Programming for Semi-Supervised Classification aaai advances aistats american analysis ando application approach background based basu belkin beyond bilenko borchers bousquet boyd cambridge cardie chapelle chen chung classes classification cluster clustering collapsing colt combining computer cones consistency constrained constraints convex cristianini csdp data density design dhillon dissimilarity distance ensembles equivalence examples fields framework from functions gaussian geometric ghahramani ghosh global globerson goldberg graph graphs harmonic hertz hillel iccv icml ijcai information instance integrating jaakkola jordan journal kamvar kernel kernels kero klein knowledge kondor kulis label labeled lafferty learning level library literature lkopf local machine machines madison making manifold manning markov mathematical matlab matrices means methods metric mixed mooney most multiple nels neural niyogi noise nonparametric optimization over pairwise partially partitions pattern press prior 
probabilistic processing programming propagation random references regularization relations report research reuse robust rogers roweis russell schroedl sciences sedumi semi semidefinite semio semisupervised separation shawe shental side sigkdd sindhwani smola society software space spectral strehl sturm supervised support survey symmetric systems szummer tang taylor technical theory tong toolbox university unlabeled using vandenberghe vector wagstaff walks weinshall weston wisconsin with wright xing zhang zhou zien http://icml2008.cs.helsinki.fi/papers/530.pdf 65 Discriminative Structure and Parameter Learning for Markov Logic Networks aaai activity advances alberta alchemy aleph andrew application approach areas artificial banff based bayes bayesian between bfgs bonn boston bottom burnside cambridge canada castro chemical clausal comlab computer computing conf constraints corvallis costa curves cussens data databases davis dehaspe density department dependencies deterministic discovery discriminative distribution domingos dutra dzeroski ecml efficient electrical engineering entailment entropy estimation european examination faculty feature first foil formalisms frasconi from ganapathi generalized generation germany getoor goadrich handling http icml inductive inference information integrated integrating intelligence intl introduction invariance inverse journal kernels kersting kfoil king knowledge koller kramer landwehr large learning limited linear ljubljana logic lowd machine machlearn manual margin markov master mathematic maximum memory method mihalkova modeling models mooney muggleton naive natl network networks neural nocedal noise nutshell omnipress optimization order oucl page passerini phillips pkdd poon practice precision press principles probabilistic proc proceedings processing progol programming programs raedt recall references regularization regularized relating relational relationship report research richardson rotational ruckert rule rules scalable 
scale schapire science selection simple singla sound species springer srinivasan statistical sternberg stochastic structure successes system systems taskar technical thesis training twenty university using verlag washington weight with workshop http://icml2008.cs.helsinki.fi/papers/324.pdf 24 Hierarchical Sampling for Active Learning about active adaptive agnostic algorithm allocation apply atlas balcan based before best between beygelzimer blei both bottom bound bounds broder build castro changes classification cluster clustering clusterings coarse cohn colt committee complexity conference count csail dasgupta data dirichlet distance divergence document documents each effects errors figure four freund general generalization greater hanneke help http icml improve improving indeed infer information initial international into jmlr jordan jrennie know kullback label ladner langford last latent learning ledge leibler length less logistic machine machines management margin method minimax mixture model models monteleoni more neural newsgroup newsgroups nips normalization normalize normalized notion notions nowak observed order outperformed pair pairs pairwise partitioned pedersen people performance plots poor posterior practical preprocessing processing prunings quality query random ratio references regarding regression representation representations represents same sample sampling schohn schutze selective sets seung shamir shows simple specialized speed support systems tasks technique techniques test text these thresholding tishby topic trained training tried unit used using utility various vector vectors velipasaoglu ward were wise with word yielded zhang http://icml2008.cs.helsinki.fi/papers/241.pdf 74 Gaussian Process Product Models for Nonparametric Nonstationarity advances algorithms approximate bayesian berger bernardo bishop cambridge canu classification conference dawid dependent dissertation doctoral family filtering gaussian gibbs goldberg haykin 
heteroscedastic higdon icml inference input institute international john kalman kern learning machine massachusetts minka modeling networks neural noise nonstationary oxford press proceedings process processes processing references regression smith smola sons spatial statistics swall systems technology treatment university wiley williams with york http://icml2008.cs.helsinki.fi/papers/459.pdf 111 Compressed Sensing and Bayesian Experimental Design advances allerton appear approximate baraniuk bayesian bethge bust cand carin chang compressed compressive davenport design domain donoho duarte elad exact expectation freeman frequency from gerwinn highly icml ieee images imaging incomplete inference information jection jections joint journal kelly laska learn learning linear machine macke magazine minka model modeling models neuron nips optimal optimization optimized pixel practical principles prior proceedings processing propagation random reconstruction recovery references relevance research romberg sampling seeger sensing signal simoncelli single snowbird sparse sparsity spie spiking statistics steinke takhar theo tipping trans transactions tsuda uncertainty vector wavelet weiss with workshop http://icml2008.cs.helsinki.fi/papers/638.pdf 104 Training Restricted Boltzmann Machines using Approximations to the Likelihood Gradient adapting advances algorithm algorithms analysis annals application approximation approximations architectures artificial august belief bengio bergstra berkeley besag boltzmann borenstein bottom cambridge carreira collaborative combining computation computer conference connectionist contrastive convergence cortes courville data database decreasing deep delalleau dept digits dimensionality dirty divergence divergences down dynamical empirical ergodicity erhan evaluation experts exponential factors families family fast field filtering foundations gehler generalizing gradient graphical greedy handwritten harmoniums harmony hinton holub icann 
inference information intel international ject jordan journal justifying lamblin larochelle layer learning lecun ligence likeliho machine machines madrid many markovian mathematical mean method minimizing mnih mnist model models monro montr montreal murray neal nets networks neural osindero pattern perpinan pictures poisson popovici press probability problems proceedings processes processing products quantitative rapidly rate rates recognition reducing references report restricted retrieval robbins rosen royal salakhutdinov science segmentation sharon smolensky society spain statistical statistics stochastic stochastics systems technical theory training ullman universit using variation variational vision wainwright welling wise with workshop younes yuille http://icml2008.cs.helsinki.fi/papers/202.pdf 156 Learning to Classify with Missing and Corrupted Features ability algorithms bianchi boyd cambridge carr cesa compact conconi convex exponentialsize generalization gentile ieee information lancia learning line optimization press references relaxations report sandia theory transactions university vandenberghe http://icml2008.cs.helsinki.fi/papers/460.pdf 32 Statistical Models for Partial Membership ailab aistats aleks algorithms allocation alternative analysis approach bayesian beal bezdek biol blei buffet buntine cambridge carlo certainly chain chapter cluster clustering clusters collins component components computational conditional conditioned continuous control coregulation corresponding dasgupta data dirichlet discrete draw each eisen erosheva experts exploring exponential expression family feature fienberg finite function fuzzy gasch gatsby gene generalization genome ghahramani griffiths hall heller hierarchical hinton http icann indian inference infinite info information jakulin jasa jmlr jordan kluwer kmeans kosko lafferty latent learning lncs mackay markov matrix membership memberships methods mixed modeling models monte more neal networks neural 
neuroscience nips nonparametric number objective only overlapping partial pattern pnas point politics prentice press principal probabilistic process processes products promising publications recognition references report represent sample schapire scientific sets springer systems technical that then theory those through toronto unit university using values where which with would yeast zadeh zero http://icml2008.cs.helsinki.fi/papers/341.pdf 20 Online Kernel Selection for Bayesian Reinforcement Learning aaai acknowledgements action adapted adaptive alberta algorithms also anonymous approach approximate approximation asymptotic author automatically award barto bayesian been best bishop bryan build cambridge career cases classes common computation computing conclusion covariance davy details developed difference difficult discussions dissertation doctoral domain draft dynamic ecai edmonton edmunds efficient elements empirical engel estimation even evolutionary exhibit feedback fellowship first friedman from function functions furthermore gaussian generative genton gordon graduate handpicked hastie have hawaii hebrew help helpful higher highly http hyperparameters icml ieee implementation improve improving initial international introduction journal jung kernels knowledge large lasso learned learning least leffler leveraging library littman loth machine mannor meir menlo model models must nips nonlinear novel number online original overall paper parameterizations park part pattern performance perspective poignant polani press preux prior priori proc procedure proceedings process processes processing programming promising providing radar rasmussen recognition references regression reinforcement relocatable representations research results reviewers reward rlai salmond scholkopf seeger selecting selection several signal significant significantly silverthorn simple smith smola some sparse springer squares state statistical statistics stone structured suggest support 
supported sutton symposium task temporal than thanks that these this tibshirani tobias trials tutorial ualberta university using vector very where white whiteson will williams with without work would yaakov yielded york http://icml2008.cs.helsinki.fi/papers/513.pdf 142 Polyhedral Classifier for Target Detection A Case Study: Colorectal Cancer aided algorithm alternating annual approach artificial asymmetric bezdek brew cancer cascade cascaded chen class classification classifiers college comput computer conference convergence cunningham cvpr data decision detection discovery document dublin dundar evaluation feuer fisher fung ghafoor grimaldi hathaway huang hyperplane icip ieee image induction intelligence international jemal joint journal kasif kernel knowledge kubota learning machine manevitz mathe matical mika mining muller murray murth neural nips obligue okada optimization parallel pattern periaswamy philadelphia proceedings programming ratsch recognition references report research retrieval salganicoff salzberg saumuels sigkdd sparse speaker statistics svms system technical techniques thun tiwari trees twelfth university verification vision ward yousef zhou http://icml2008.cs.helsinki.fi/papers/400.pdf 30 Fast Nearest Neighbor Retrieval for Bregman Divergences allocation application banerjee beygelzimer blei bregman clustering common convex cover dhillon dirichlet divergences finding ghosh icml jmlr jordan kakade langford latent merugu method nearest neighbor point references relaxation sets trees with http://icml2008.cs.helsinki.fi/papers/668.pdf 144 Closed-Form Supervised Dimensionality Reduction with Generalized Linear Models acoustics actual algorithms alon alternating alternative although always analysis another applications approach approaches approximate artificial assigning auxiliary bayesian bernoulli between binary blitzer bouchard bound bounds case cecchi center classification clearly closed collins competition component conclusions conference 
converge coordinates correlation dasgupta data dataset datasets decomposition deriving different dimension dimensionality direction distance each easily efficient elastic elasticnet eleventh empirical error experiments exponential extended extension extensions factorization family fast features figure fmri form framework from function future gaussian generalization generalized generalizing gordon grabarnik group guaranteed handle hastie homepage html http hybrid icassp icml include inference instead instructions intel intelligence international investigation iteration jaakkola jama jordan journal kernel labels large learning ligence like linear logistic loss machine margin matrix measured meila members methods metric minimization mixed mixture model models more multiclass multinomial nearest needed negative neighbor ninth nips nonnegative norm obtain omnipress only orlitsky other paper pbaic pereira performance pitt pittsburgh plugging poisson possible potential predicted previous principal problems procedure proceedings processing promising proposed proposes puerto quadratic recent reconstruction reduced reduction references regression regularization remains report research response results rico rish royal rules saul schapire schein schemes selection series seung shen show signal similar simple simply sixth society soft softmax sparse speech statistical statistics such supervised support technical techniques tesauro tested that their this thus type types unconstrained ungar unifying update uses using variable variables variational variety various vector watson weight weighted weinberger with work workshop york http://icml2008.cs.helsinki.fi/papers/632.pdf 131 An Empirical Evaluation of Supervised Learning in High Dimensions active algorithm algorithms bagging bauer boosting bordes bottou breiman caruana classification classifiers comparison empirical ertekin fast forests freund icml jmlr kernel kohavi large learning margin mizil niculescu online perceptron random 
references schapire supervised using variants voting weston with http://icml2008.cs.helsinki.fi/papers/180.pdf 86 Local Likelihood Modeling of Temporal Text Streams addison advances alternatives american appropriate association asymptotic authorship bandwidth becomes benefits bias categorization changing chapman chen class collection concept concepts conducted conference considering contain contrast counterparts data dataset datasets demonstrate disagreements discussion disputed distribution documents domingos drift drifting email emails empirical encer enchmark estimation estimator estimators examine experiments exploiting extension extreme federalist filtering forman foundations framework global goodman hall harvard hastie helmb hulten illustrate including increase increases inductive inference information joint jones journal kernel language large learning lewis likelihood ling loader local long machine manning many methods minimizing mining model models more mosteller mulligan natural neural neville news noticeable number often optimal oral over pang paper petsche phenomenon potential practice press proc processing properties provides rather rating references regression relational relationship relationships removing research resp rivest rose scales scenario schutze seeing sentiment sharan sigir sigkdd size smoothing spam springer stamped standard stars stationary statistical stories streams study such systems tackling technical techniques temp text textual than their them this tibshirani time tracking tradeoff transfer university validity variance varying wallace wand webkdd webpages wesley with workshop yang http://icml2008.cs.helsinki.fi/papers/290.pdf 68 Active Reinforcement Learning abbeel algorithm apprenticeship brafman exploration general icml inaccurate learning models near optimal polynomial quigley references reinforcement tennenholtz time using http://icml2008.cs.helsinki.fi/papers/611.pdf 53 Semi-supervised Learning of Compact Document 
Representations with Deep Networks access acknowledgments adapting algorithm algorithms allocation allows alone also american amount analysis applications applying approximations artificial aspects authors backprop bacterial based because belief bengio between beyond binary blei bottou boureau capture capturing cheap chopra classes clustering codes compact compare complex computation compute contrastive covariance data dataset datasets deep deerwester dependencies digestive dimensional dimensionality dirichlet discriminative divergence documents dumais editors effective efficient energy engines even experts exploited fast faster feature features figure finds four framework from furnas future gaussian gehler graphical greatly greedy hard harshman hashing have highdimensional higher hinton hofmann holub icml important indexing infections information insights intel interested interesting ject jmlr jordan journ kernels labeled labels lamblin landauer larger larochelle latent layer learn learning lecun level leveraging ligence like linear make minimizing model models more most much muller musculoskeletal mycoses neoplasms nets network networks neural nips numerous ohsumed only osindero outperform pages parasitic partially poisson popovici possible press probabilisitc probabilistic proc processes produce produced products propagating proximity ranking ranzato rate rbms recognition reducing references report representation representations results retrieval rich robertson salakhutdinov scaling science search semantic sesms shallow shown sigir similar simple society some sparse springer stats store structure such suggestions system task technical test than thank that these they this through toronto towards trade trained training trains tricks uncertainty unified unlabeled unsupervised using very virus walker weighted welling when which wise with word words work workshop would http://icml2008.cs.helsinki.fi/papers/279.pdf 64 Training Structural SVMs when Exact Inference is 
Intractable accelerated acknowledgments addition algorithms alignment altun ambiguous analyzed anguelov appear appl approx approximate approximations automated avoid award bayes belief benchmark bianchi bonn boolean boros boutell boykov brown carreira categorization center cesa challenge chang change chatalbashev cjlin classes classification clustering collection collins combining compared comparison complementation completely computer concepts conditional conference connected coreference csie culotta cuts cvpr data daum dependencies detection discrete discriminative distinguish elber elisseeff emnlp empirically energy exact experimental experiments explored factor fast fields finite finley first flow focusing fractional francisco from functions gemert gentile germany geusebroek gift gradient graph graphcuts greedy guestrin gupta hammer hansen hebert heitz hidden hierarchical hofmann http icml ieee image images increasingly inference inferior information integer intel intelligent interdependent international intractable jmlr joachims kaufmann kernel known koller kolmogorov kulesza kumar label labeling labelled lafferty langford large learn learning lewis library libsvm ligent linear loopy mach machine machines marcu margin markov math mccallum measures mediamill method methods minimization minimizing modeling models morgan most multi multimedia multiscale multivariate murphy naaclhlt natural networks nips nonsubmodular optimization oracles order output pami paper parsing pattern pearl pennsylvanias perceptron pereira performance perpinan persistency piecewise pillardy pittsburgh plausible practice predictive preserving press probabilistic problem problems proc program programming propagation properties protein pseudo publishers quadratic random reasoning recognition recomb references relaxation relaxed report research resolution resulting retrieval reuters review robust roofduality rose roth rother scan scene schmidt schraudolph search searn segmentation segmenting 
semantic separation sequence shen showing significantly simeone smeulders snoek society software solutions spatial state stochastic structural structured supervised support supported sutton svms synth systems table taskar tech technical tendency test text themselves theoretical theoretically theory this through train training tsochantaridis under used variables vector vishwanathan vision well weston when where wick with work worring yahoo yang yeast yield york zaniboni zemel http://icml2008.cs.helsinki.fi/papers/536.pdf 157 Multi-Classification by Categorical Features via Clustering advantage agreement application applied arxiv asuncion average based bayesian best between blanchard bottleneck bound cases categorical choice classification clear clearly close coefficient colt compare comparing completely computation conclude correla correlation corresponding cover dard data dissertation doctoral elements empirical entropy estimation factorizations fail feature features figure fleuret generalization hammer html http indices information jmlr john langford learning level machine matrix maurer mcallester method mlearn mlrepository mutual neural newman nips normalized note occam occurrence other over paninski parameter performs practical prediction preprint properties provides ranking references repository sabato section seldin sets shalev shamir shwartz significant similarly single slight slonim small some sons srebro stan still subsets success suggested superior test theorem theorems theory there this thomas tion tishby tutorial value where wiley with http://icml2008.cs.helsinki.fi/papers/490.pdf 70 The Many Faces of Optimism: a Unifying Approach algorithms auer bounds brafman learning logarithmic nips online ortner references regret reinforcement tennenholtz undiscounted http://icml2008.cs.helsinki.fi/papers/574.pdf 89 Sparse Bayesian Nonparametric Regression adaptive american analysis applebaum association atomic barndorff basis bayesian calculus cambridge chen 
decomposition distributions donoho feature figueiredo gaussian ghahramani griffiths ieee intel inverse isba journal latent learning ligence likelihood machine meeting modelling models nielsen nonconcave nonparametric normal oracle pattern penalized press proceedings processes properties pursuit references review saunders scandinavian selection siam sollich sparseness statistical statistics stochastic supervised transactions university valencia variable volatility world http://icml2008.cs.helsinki.fi/papers/487.pdf 69 Reinforcement Learning with Limited Reinforcement: Using Bayes Risk for Active Learning in POMDPs aaai about acquires action actions active adjust agent algorithm algorithms also analysis analytic andre anytime application approach approaches approximate approximations asru authors averse avoids based bayesian beetle before brafman cambridge carlo case cassandra conclusion considered continuous converge convergence converges converts correctness criteria criterion data dearden decision decisions delayed determine developed dialog dialogue discrete discusses discussion dissertation distribution doctoral does domains doshi doucet each easily ecml efficient entire environments estimate even experience exploration extended fast figure formal forward framework frequently friedman from fully function gordon graphical guaranteed guarantees handle have heuristic heuristics hidden hoey hong however human icml ieee ijcai incomplete incremental information into inverse isaim iteration iterations jaulmes journal kaelbling kakade knowing kong learner learners learning line littman make management mansour markov mdps medusa metaqueries method millet mistakes model modelbased models modeluncertainty monte moral more multi near needed nonstationary observable occasionally only optimal outlines over parameter parameters partially passive performance peters pineau point policies policy pomdp pomdps possible poupart precup press prior priors probabilistic problem 
proceedings process processes processing provide provided providing queries questions randomly reasoning recent recently references regan reinforcement report representing requesting resamples resets results rewards risk riskbased robustly russell samplers sampling sato scale scaling scenario scoring search selection sequential several shani shimony shows shuttle similar since solution solutions spoken state states stationary strehl strens structures sufficiently summary taking technical that their thinking throughout thrun time true uncertain university update updated updates updating using value variables variational visited vlassis watkins williams with without work workshop young http://icml2008.cs.helsinki.fi/papers/687.pdf 26 Actively Learning Level-Sets of Composite Functions able academic active actively advances aggregates algorithm analyis analysis angluin anisotropy appears application applied approximate arises astrophysical based bennett better between biosurveillance both boundaries boyd bryan cambridge candidate candidates carnegie chapter choosing combined computation computer concept conclusions conference confidence constraints corresponds cosmological counterterrorism cressie current data davis describe described design determination developed different dimensional dissertation doctoral edition efficiently eight emission essence exotic experiments fienberg first fisher foreground from function functions gain galaxies gaussian guestrin have hedges heteroscedastic heuristic heuristics icml identifying including indicate inference information international joint journal kersting learn learning level likely locating london luminous machine mackay many maxvarstraddle mellon meta methods microwave minimizing mining models monitoring moreover most multiple naturally near necessary neural notz number objective observable observations oliver optimal other outperforms parameters perform phenomenon physical placements point potential press probe probes 
problem problems proceedings process processes processing properties queries ramakrishnan random rasmussen references region regression related research review santner scrutinizing sdss selection sensor sequential sets several shmueli showed siam single situations sources spatial specific spergel springer statistical statistics straddle streams supernova supplemental surface synthetic systems target tegmark than that this three threshold typically university using variance weighted were which while wiley wilkinson williams with wmap workers year york http://icml2008.cs.helsinki.fi/papers/355.pdf 119 The Projectron: a Bounded Kernel-Based Perceptron ability advances algorithms best bianchi budget caelli cauwenberghs cesa cheng conconi conference decremental generalization gentile hyperplane ieee implicit incremental information kernels learning line machine neural online perceptron poggio proc processing references schuurmans simple support systems theory tracking trans vector vishwanathan wang with http://icml2008.cs.helsinki.fi/papers/511.pdf 120 Efficient Bandit Algorithms for Online Multiclass Prediction abound advances adversarial aggressive algorithm algorithmic algorithms analysis annual april armed arriaga artificial attributes auer bandit bandits bianchi bounds brain casino cesa class classification computation concepts conference contextual continuumarmed convex crammer dekel descent discrete dual duda elisseeff epoch european exponentiated fink flaxman focs freund gambling gradient greedy hart hypothesis information interclass international irrelevant january jection journal kalai kernel keshet kivinen kleinberg langford large learn learning linear littlestone mach machine machines margin mcmahan method model multi multiarmed multiclass multilabeled nearly networks neural neurocomputing nips online optimization organization pages passive pattern perceptron perspective predictors press primal probabilistic problem problems proceedings processing 
psychological quickly random recognition references reprinted research review rigged robust rosenblatt scene schapire setting seventh shalev sharing shwartz siam singer sixteenth statistical storage support symposium systems theory threshold tight ullman ultraconservative using vapnik vector vempala versus warmuth watkins weston when wiley without zhang http://icml2008.cs.helsinki.fi/papers/121.pdf 109 Autonomous Geometric Precision Error Estimation in Low-Level Computer Vision Tasks acquisition advances amherst analysis asymmetries autonomous beardsley bishop brown burschka cambridge compilation computational computer conference copyright corrada decorrelation department digital eccv elevation elsevier emmanuel estimates european exploiting extended from hager horizontal ieee image improving indexing intel learning lengths ligence machine massachusetts matching measurement model models optical ostapchenko part pattern pinette proceedings recognition references report reserved rights schultz science sequences springer stereo switzerland technical techniques terms torr transactions uncertainties university verlag vision zisserman zurich http://icml2008.cs.helsinki.fi/papers/236.pdf 10 A Decoupled Approach to Exemplar-based Unsupervised Learning anal analysis approach athena banerjee bengio bertsekas boyd bregman cambridge carreira cheng clustering comaniciu components convex cornuejols delalleau dhillon distributions divergences edition feature finance finding gaussian ghosh have ieee intel isotropic journal learning ligence mach machine marcotte mathematics mean meer merugu methods mixture mixtures mode modes more networks neural nips nonlinear optimization pattern perpin press programming references research risk robust roux scientific seeking shift space than toward trans tutuncu university vandenberghe vincent williams with http://icml2008.cs.helsinki.fi/papers/257.pdf 47 Learning All Optimal Policies with Multiple Criteria abeel acting added agents ainslie 
algorithms applications apprentice approach artificial barto bellman bonn breakdown cambridge cassandra clarkson computational constrained criteria criterion decision decomposition different discounted discrete domains dynamic entire expecta feinberg gabor geometric geometry germany hull icml independently intel introduction inverse journal kaelbling kalmar learning ligence littman machine mannor markov massachusetts mathematics maxima maximum models mult multi multicriteria natara negative observable operations order over partially planning preferences press princeton proc programming pull random recover recurrence reinforcement research rewards rewrite russell sampling schwartz sets shimkin shor simplify single stochastic sutton szepesvari tadepalli tion university washington weighted which with zimdars http://icml2008.cs.helsinki.fi/papers/304.pdf 148 Learning to Sportscast: A Test of Grounded Language Acquisition andre bailey barnard binsted blei cogsc commentator development duygulu embodied feldman forsyth freitas herzog ishii lakoff league lexical luke magazine modeling narayanan references rist robocup simulation systems tanaka three http://icml2008.cs.helsinki.fi/papers/399.pdf 77 Bi-Level Path Following for Cross Validated Solution of Kernel Quantile Regression about adapted agarwal algorithm algorithmic american analysis annals appear applicability application applications approach approaches aspects available bennett beyond bilevel both buchinsky cambridge care carry changes chebycheffian closed complete computation computing conference contributions convex cross customer cvpath dashed data davenport details development does draft easily econometrica economics effect efficiency efficient efficiently eide elements error essentially estimates estimation evolving exponential family figure find finding following forests form formulation framework friedman full function functions further generating geometric great gunther hastie have here high hilbert 
however identification implementation improve including insights instead interact interest interesting international interpretation investigation jasa jmlr journal kernel kernels kimeldorf koenker kunapuli lawrence leads learning lecture letters level linear lkopf locations lochovsky loss lowing machine machines main many maps mathematical meinshausen minimal mining model modeling motivation multiple natural neural nonparametric noted notes optima optimization other pang papers parameter parameterizes parameters path paths performance perlich piecewise possible practical preparation presented press problem problems proceedings properties quality quantile quantiles range really references region regression regularization regularized reproducing require results rosset saharon sample schinzel scholkopf school sears selection sequences sharir shortcuts should showalter size smola society solely solid solution solutions solved solving some spaces spirit spline springer statistical statistics structure student such support surface takeuchi techniques that their them there this through tibshirani touched tutorial twelfth twodimensional university uses using validated validation various vector wage wahba wallet wang which whole with work yeung york zadrozny http://icml2008.cs.helsinki.fi/papers/655.pdf 137 Strategy Evaluation in Extensive Games with Importance Sampling approximating billings burch davidson game holte references schaeffer schauenberg szafron http://icml2008.cs.helsinki.fi/papers/277.pdf 5 Nonextensive Entropic Kernels aguiar aistats analysis application applications banerjee based berg berkeley boltzmanngibbs bousquet bregman burbea calif categorization chaos charv chen christensen classification clustering comparison complexity concept conf convexity cover cuturi dagm definite denoising detection dhillon diffusion discrimination distributions divere divergence divergences document dortmund dover edge eisner electronic elements endres entropic entropies 
entropy expositiones features figueiredo foundations fuglede fukumizu functions furuichi gell gences generalization ghosh hamza harmonic havrda hein hilbert hilbertian howard icassp ieee image imaging industrial inequalities informatics information interdisciplinary intern iterative jebara jensen jmlr joachims journal karakos kernel kernels khinchin khudanpur kondor kullbackleibler kybernetika lafferty learning lebanon lkopf machines manifold manifolds mann many martins math mathematicae mathematical measure measures mechanics merugu method metric metrics moreno multimedia multinomial nips nonextensive nonlinearity oxford physics positive possible press priebe prob probability proc processes product properties quantification references related relevant ressel schindelin segmentation semigroup semigroups shannon sigir smith smola some space spirals springer statist statistical statistics stats structural support svms symp symposium text their theoretic theoretical theory thomas tops trans tsallis univ universit university unsupervised using vasconcelos vector vert wiley with xing zhang http://icml2008.cs.helsinki.fi/papers/343.pdf 59 Unsupervised Rank Aggregation with Distance-Based Models about algorithm analysis analyzing applied buhmann busse castro cluster comparing computer conference coste critchlow data dempster diaconis disarray discrete estivill fagin footrule from graham heterogeneous icml incomplete international invariant journal know kumar laird learning lecture likelihood lists machine mannila mathematics maximum measure measures methods metric metrics metropolis notes orbanz partial presortedness proc rank ranked references right royal rubin saloff sciences siam sivakumar society spearman springer statistical statistics system verlag what wood http://icml2008.cs.helsinki.fi/papers/337.pdf 78 Estimating Labels from Label Proportions ability about acknowledgments aggregate altun australia australian backing bartlett bounds burton centre chen choon 
colt complexities convex distribution divergence duality estimation european excellence freitas from funded funding gartner gaussian generalized government group hendrik icde individuals inference jmlr kuck large learning lncs maximum mendelson minimization multiclass musicant network nicta nips programs rademacher ramakrishnan received references regularization results risk scale schapire smola springer statistical statistics structural thank transduction tropy unifying union views vish wanathan with http://icml2008.cs.helsinki.fi/papers/130.pdf 153 Adaptive p-Posterior Mixture-Model Kernels for Multiple Instance Learning alignment andrews bags bunescu cristianini dietterich elisseeff hofmann icml instance kandola kernel lathrop learning lozano machines mooney multiple multipleinstance nips perez positive problem proceedings references shawe solving sparse support target taylor tsochantaridis vector with http://icml2008.cs.helsinki.fi/papers/113.pdf 91 The Group-Lasso for Generalized Linear Models: Uniqueness of Solutions and Efficient Algorithms able acceptor achieve active additional additionally algorithm algorithmic algorithms also ambiguous analysis angle appear application applications applied approximate authors available avoid avoids between bioinformatics biology biometrika bjorkegren blockwise both bounds brenner broad brown buhlmann burge cdna certain chandonia chapman characterization class classification comp complete completeness complexity computational concepts conclude configuration confirmed consistent context contingency costs could counting crooks dahinden data decision demonstrated details detects different dimensions donor dual easily efficient efron elsewhere emerick entropy errors estimates estimation example examples existence exon experiment exponential extensions factor families feature features figure focused from full function fundamentals geer gene generalized generator genome glms graphical gray group grouped groups hall hastie have 
hayworth helped helps here highorder identified identifies ieee implementation important include included inclusion incomplete increases indeed independence information institute interactions interpretations intron issue jaakkola jmlr johnstone kappa keerthi lasso leads least length libraries light likelihood linear logistic logo mathematical maximum mccullaghand meier methods might millions misinterpretation model modeling models motifs much must nelder nilsson noise notable nucleotide observations obtain orders osborne over parmigiani particular partition path pattern pena penalized performance perspective poisson polynomial positions potentially prediction predictive presented presnell previously problems processes proposed real reasonable recent recognition references regarding regression represent request research results richer round selection sequence severe shevade short show showed shrinkage signals simple simpler sinica site sites situations solution solutions space sparse spirit splice splicing stat statistica statistical statistics successfully such synthetic tables tegner test than that theoretical theory these thought thus tibshirani time trans turlach unavoidable uniqueness upper using valid values variables wainwright weblogo wedderburn where while will willsky with within without work world yuan http://icml2008.cs.helsinki.fi/papers/582.pdf 146 Metric Embedding for Kernel Classification Rules acknowledge acknowledgments admits advances algn analysis annals annual appendix application applications applying arbitrary arguments artificial asymptotic banff based batch blitzer bousquet cambridge canada classes classification clustering collapsing comments component components computational conference consider convergence decompose devroye distance each equivalence estimate evgeniou expansions fixed following follows fore form function functional functions generalized globerson goldberger greatly gyorfi herbrich hinton horst improved inference 
information intelligence international izer jordan journal kernel kernels krzyzak large learning lemma lipschitz lugosi luxburg machine machines margin mathematics metric metrics minimized minimizer nearest need neighbor neighbourhood networks neural online optimization orem overview paper parameterized parameters pattern penal penalty planning poggio pontil press probabilistic proc processing programming proof prove proves pseudo ready recognition reduces references regression regularization representation representer reproducing research result reviewers risk rkhs roweis russell salakhutdinov satisfies saul scholkopf shalev shwartz sideinformation simple since singer smola snapp span springer statistical statistics support systems tesauro thank that their then theorem theory therefore thereh thoai thus torresani vector venkatesh verlag weinberger where which wish with xing york http://icml2008.cs.helsinki.fi/papers/413.pdf 14 Modeling Interleaved Hidden Processes altman american annual association bakir batu chains conference data extension guha hidden hofmann inferring information journal kannan learning lkopf longitudinal markov mixed mixtures model models neural predicting press proceedings processing references setting smola statistical structured taskar theory vishwanathan http://icml2008.cs.helsinki.fi/papers/565.pdf 19 Fast Incremental Proximity Search in Large Graphs aldous algorithms amount analysis approach approximate arbitrary authority aware balmin based before bound bounds brand chains chakrabarti cikm clever closest commute computation compute conclusion consider context csalogany databases define derived direction drastically dynamic effective entity equal equals error estimate estimated exact exchanged expansion expected experiments faloutsos fast fill finding fogaras from fully gives graph graphs have haveliwala hence hitting hoeffding hristidis idea index jectrank katz keyword kleinberg koren large length less liben link lower markov 
maximizing measure mining moore neighborhood neighbors networks nodes note nowell offline pagerank papakonstantinou path personalized perspective prediction press probability problem proc proceedings profit proximity psychometrika racz random ranking reduce references relation report resistances reversible sampled samples sarkar sarlos satisfaction scaling schemes search sensitive show siam similarity simrank small smaller social sociometric sparsification spielman srivastava stanford starting status stoc storage structural technical than that theorem this time tong topic towards tractable truncated union university upper using value values variable variables vectors vldb walks want which whose widom will wlog would york http://icml2008.cs.helsinki.fi/papers/318.pdf 84 A Reproducing Kernel Hilbert Space Framework for Pairwise Time Series Distances altun artificial banerjee bregman burges clustering conditional data dhillon discovery divergences exponential families fields ghosh hofmann intel jmlr know ledge ligence machines merugu mining pattern random recognition references smola support tutorial uncertainty vector with http://icml2008.cs.helsinki.fi/papers/197.pdf 115 Efficiently Learning Linear-Linear Exponential Family Predictive Representations of State actor algebra applications approximate artificial bayesian belief brand conference critic dagum decomposition dynamical ecml european exponential family fast gaussian hard information intel learning ligence linear littman luby machine michigan models modifications natural networks neural nips peters predictive probabilistic processing rank reasoning references representations rudary schaal singh singular state stochastic sutton systems thesis thin uncertainty university value vijayakumar wingate http://icml2008.cs.helsinki.fi/papers/667.pdf 127 Nonnegative Matrix Factorization via Rank-One Downdate alberta algebra algorithm american analysis applications arxiv asgarian available barkai based bergmann 
biclusters biggs boutsidis canada classify cohen computing data decompositions deerwester department downdate dumais edmonton expression factorization factorizations furnas gallopoulos gene ghodsi greiner harshman head http ihmels indexing information initialization iterative journal landauer largescale latent linear matrices matrix microarray nonnegative online physical press rank ranks references review rothblum science semantic signature society start university using vavasis http://icml2008.cs.helsinki.fi/papers/209.pdf 110 Multi-Task Compressive Sensing with Dirichlet Process Priors algorithms approximate baraniuk baron bayesian beal blei candes charilaos college compressed compressive computational dirichlet dissertation distributed doctoral donoho duarte exact frequency from gatsby highly icip icml ieee incomplete inference inform information jordan jpeg london methods neuroscience principles proc process reconstruction references robust romberg sarvotham sensing signal theory trans tutorial uncertainty unit univ variational wakin http://icml2008.cs.helsinki.fi/papers/479.pdf 71 Transfer of Samples in Batch Reinforcement Learning aamas action adaptive advances advantages agent agents algorithm applied approximation aspects avoiding barto based batch bonarini both building carlo carroll case change common complete computer conclusions continuous could deal decision defined dietterich difference different effectiveness either ernst exactly experimental fern ferns figure finally finite first five fixed from function functions furthermore future geurts goal goals graph group hand have hierarchical http icml iden ijcai ijcnn improved improving independent inductive information instance instances integrated inter interest introduced iteration jong journal kaelbling know konidaris lazaric learning ledge libraries local machine main mapping mappings markov martin marx matching mcgill measures mechanism mehta method methods metrics mode model models monte multi 
natara negative neural nips number only options order other panangaden paper parameter part partitioning performance phillips planning policies portable possible precup problem problems proceddings proceedings processes processing proposed random range references reinforcement relevance report research restelli results reward rosenstein same sample samples scenario scheduling school science seppi sequential share show significantly similarity skill solution some source space spaces special state stone subgoals sunmola systems tadepalli target task tasks taylor technical temporal that this through thus tifying total transfer transferring transition treebased useful using usrs value variable ward wehenkel when which wide with wolfe works workshop wyatt http://icml2008.cs.helsinki.fi/papers/432.pdf 79 Self-taught Clustering algorithm bach banerjee basu bayesian bilenko caruana clustering conference conic constraints cover data daum davidson dhillon dirichlet discovery duality elements finley first fourth framework information international interscience intractability joachims jordan journal kernel know lanckriet learning ledge machine machines mallela marcu mining model modha mooney multiple multitask nineteenth ninth prior probabilistic proceedings process ravi references research second seeding semi sigkdd supervised support tenth theoretic theory thomas twenty vector wiley with http://icml2008.cs.helsinki.fi/papers/573.pdf 103 On the Quantitative Analysis of Deep Belief Networks adapting advances aistats algorithm algorithms annealed appear approximations artificial bayesian belief bengio binary boltzmann bounds cambridge carlo carreira chain class collaborative computation computer computing conference constants constructing contrastive data deep department dimensionality directed divergence energy engineering entropy estimating experts fast fields filtering fourth free freeman function gehler generalized hierarchy hinton holub human icml ieee image importance 
inference information intelligence international jaakkola kernel large latent learning lecun leroux linked machine machines markov maximum methods minimizing mnih model modeling monte motion neal nested nets networks neural nips normalizing object osindero partition patches perpinan poisson power press probabilistic proceedings proceeedings processing products propagation random rate ratios recognition reducing references report representational restricted retrieval roweis salakhutdinov sampling scale scaling science skilling statistics systems taylor technical theory toronto towards training transactions twenty university upper using variables wainwright weiss welling willsky with workshop yedidia http://icml2008.cs.helsinki.fi/papers/270.pdf 143 A Least Squares Formulation for Canonical Correlation Analysis analysis bach belkin bishop component examples framework from geometric independent jordan kernel labeled learn learning mach machine manifold niyogi pattern recognition references regularization sindhwani unlabeled http://icml2008.cs.helsinki.fi/papers/125.pdf 21 A Worst-Case Comparison between Temporal Difference and Residual Gradient with Linear Function Approximation advances algorithm algorithms analysis approximate approximating approximation athena automatic baird barto bertsekas bianchi bounds boyan cambridge case cesa choose computation conference control converges decision define definite descent difference differences discrete dynamic error exponentiated fact faster follows fourteenth from function functions generalization gradient have horn icml ieee immediately information international interscience introduction iteration johnson journal kivinen known lagoudakis last learning least lemma linear long loss machine markov matrix merke methods moore munos negative networks neural neuro nips obtain parr policy precup predict prediction predictors press proceedings processes processing programming proof provably puterman quadratic references 
reinforcement research residual safely schapire schoknecht scientific similarly squares step stochastic sutton systems temporal than that then transactions tsitsiklis twelfth twentieth university using value versus warmuth where wiley with worst worstcase york http://icml2008.cs.helsinki.fi/papers/278.pdf 83 A Distance Model for Rhythms advances algorithm annealing application assayag audio auditory beat beatroot bejerano bengio bilmes blues cambridge carlo cemgil classification computer data dempster dependencies descent difficult dixon dubnov duda edition estimation evaluation events finding frasconi from gaussian gelatt gentle gradient handel hart hidden ieee improvisation incomplete information interscience introduction journal kappen kirkpatrick laird lartillot learning likelihood listening long lstm machine markov mass maximum methods mixture modeling models monte music musical networks neural number optimization parameter pattern perception press proc processing quantization recurrent references research rhythm royal rubin schmidhuber science second sequential signal simard simulated society statistical stork structure style system systems tempo temporal term tracking transactions tutorial using vecchi wiley with workshop york http://icml2008.cs.helsinki.fi/papers/641.pdf 52 An RKHS for Multi-View Learning and Manifold Co-Regularization about academic advances agnan aistats algorithmic algorithms also ambient analysis approach assess bartlett basic bayes bayesian begin belkin benefits bertinet beyond bient bijection blum bold both boucheron bousquet brefeld brings cauchy check choice chose chosen claimed class classes classification clear closed cloud collect colt column combining comparable complement complete completeness complexities complexity component comr conclude conclusion conditions consider constructed construction context convenience convergence coregularization data datasets define defined definition definitions demonstrated derived details 
deviation distance ducing duct efficient element empirical empirically entries equation equations error erty esaim evaluated exactly examples except experimental farquhar finally find first follow following follows form formula framework from functions generalization generalizes geometric gives goes gram graph grid hardoon hartemink have here hilb hilbert icml immediately implies improvements induces inner integer into intrinsic introducing inversion isomorphic jmlr kernel kernels kluwer krishnapuram labeled laplacian last learners learning least lemma level line localized loss lowing lugosi makes manifold matrix mean mendelson meng methods minimum mitchell more moreover morrisonwoodbury multi multiple natural nearest need neighbors next nips niyogi norm normalized note oneversus orthogonal other over paired paper parameters part performed performs point pointwise practice probability problems product proof prop properties property protocol publishers pythagorean rademacher random rate rates recall recent references referred regression regularised regularization regularizationv regularized report reported repro reproducibility reproducing respect respectively rest restriction results rkhs rkhss rosales rosenberg rtner scheffer semi semisupervised sequence sherman show shown significance significant similarly simplifications since sindhwani single sketch solution some space spaces span splits springer squared squares standard statement statistical statistically statistics steck straightforward strategy subspace supervised survey svms szedmak table tables tabulated taylor test that then theorem theoretical theory these this thomas thus training transductive turned unlabeled unseen used using valid validation validity values varied vector verlag view views well were where which whose williams with workshop writing wrobel zero http://icml2008.cs.helsinki.fi/papers/592.pdf 147 Extracting and Composing Robust Features with Denoising Autoencoders advances aharon 
algorithms analysis architectures associatives balcan bengio bishop bottou cambridge channels chapelle coding cognitiva computation decoste deep denoising dept dictionaries distribuees elad equivalent fogelman gallinari greedy ieee image information kernel lamblin large larochelle layer learned learning lecun lewicki lkopf machines memoires montr networks neural nips noise noisy over overcomplete paris platt popovici population press proceedings processing redundant references regularization report representations retinal robust scale scaling soulie sparse systems technical theoretical theory thiria tikhonov towards training transactions universit villette weiss weston wise with http://icml2008.cs.helsinki.fi/papers/614.pdf 130 Pointwise Exact Bootstrap Distributions of Cost Curves able accuracy also alternative american analysis another appear approximate approximated association avoided bandos bandwidth been better between beyond biometrika bonn bootstrap bounds breaks cases chapman characteristic classifier classifiers close compare comparing computations conclusion conditions conference confidence considerations correlated cost costs coverage curve curves data decisions derived diagnostic difference discovery dissertation distribution distributions doctoral drummond efron empirical errors estimating estimators evaluation exact excellent expected explicitly extrapolate fawcett first from gaussian germany graduate graphs hall have health holte hyndman icml improved interest international intervals introduction journal kernel know laboratories learning ledge letters limited lloyds machine macskassy method methods mining misclassification moments monographs nonparametric normalized notes obtain obtained operating paper performance pittsburgh pointwise possibility practical probability proceedings provost public range receiver references report representation representing researchers results rosset sampled school score scores second selection severe sigkdd 
simulations sixth smoothed statistical statistics summarise summarizes systems table technical than these this thresholds thus tibshirani time true university used values variables visualizing were when within wong workshop http://icml2008.cs.helsinki.fi/papers/452.pdf 43 Learning for Control from Multiple Demonstrations abbeel aerobatic application apprenticeship artificial atkeson based coates control demonstration dynamics flight from ganapathi helicopter helicopters hollerbach icml inaccurate intel inverse learning ligence locally manipulator model modeling models moore nips press proc quigley references reinforcement review robot schaal using vehicular weighted with http://icml2008.cs.helsinki.fi/papers/392.pdf 60 Learning Dissimilarities by Ranking: From SDP to QP adaptive advances agarwal aistats also analysis application applications available bartlett borg boundaries boyd chang classification classifiers clickthrough clustering cohen comparisons components cones construct cristianini csie data define definite defk denote dimensionality discriminant distance easy eigenvalues endix engines framework from function functions gaussian generalized geometric ghaoui global goldberger graepel groenen hastie herbrich hinton http hyperkernel hyperkernels icml idealized ieee information inite jebara joachims jordan journal kernel kernels kimeldorf kondor kronecker kwok lanckriet langford large learning library libsv libsvm machine machines margin mathematical matlab matrices matrix methods metric micchelli modern multidimensional nearest need neighbor neighbourhood nips nonlinear obermayer ones optimization optimizing order ordinal osition ositive over pami pontil positive proc product programming prop prove rank reduction references regression regularization relative research results review roweis russell salakhutdinov scaling schapire schultz science search sedumi semidefb semidefinite semidenite siam side sigkdd silva since singer smola software some spline 
springer strum support symmetric tchebycheffian tenenbaum that their theory things thus tibshirani toolbox trans tsang using valid vandenberghe vector verify wahba where williamson wishart with xing york http://icml2008.cs.helsinki.fi/papers/630.pdf 88 Detecting Statistical Interactions with Additive Groves of Trees additive algorithms although annals anova answers approach approximation attribute autoregressions barry bird black boosting both bratko camacho cambridge camp carroll caruana challenges christensen citizen compare complex contain control data datasets delve demonstrated dependent detecting diagnostics dimensional discovering discussion ecml elhawary elisseeff ensembles feature fink framework friedman function functional functions generalized ghahramani gradient greedy groves guyon high hinton hochachka hooker html http human icml idea identify indicate inducing interaction interactions introduction intuitive jakulin jcgs jmlr kelling kustra layered learning letters liacc linear ltorgo machine main mining models munson neal novel pace performance plane popescu practical predict predictive presented prevalence probability proc questions quite rasmussen real references regression reliably report restricted results revow riedewald rule rulefit ruppert science selection semiparametric sets sigkdd significance significant skills sorokina sparse spatial species springer stanford statistical statistics structure synthetic technical technique tested testing that theory there this tibshirani torgo toronto trees university unrestricted used variable variables wand which wild will with work wwwstat http://icml2008.cs.helsinki.fi/papers/390.pdf 90 Bolasso: Model Consistent Lasso Estimation through the Bootstrap acknowledgements adaptive advanced agence angle appear arcing asuncion asymptotics bach bagging because bentkus berkeley berry boosting bootstrap boucheron bound bousquet breiman buhlmann chapman classifier closed compact concentration consider consistency 
constants constrained convergence cost covariance dantzig data dependence depends desired dibert dimension dimensional discussions distributed efron electronic enough esseen estimator estimators exist first fixed form formly french from fruitful garrotte given grant group hall harchaoui hastie have heuristics high hoeffding inequalities inequality inference instability introduction jean johnstone journal kernel knight large lasso last learn learning least lectures like linear lounici lugosi machine matrix mean meinshausen minimizing model models multiple nationale negative newman noisy norm normalln normally obtain ones oracle planning predictors programming project properties property quadratic rate recherche recovery references regression related remaining replace report repository representations respect selection sharp shrinkage sign sparse sparsity springer stabilization stat statistical statistics such supported tech term terms thank that then there this thresholds thus tibshirani tion trunca type uniog upper used using wainwright where with work would yuan yves zhao http://icml2008.cs.helsinki.fi/papers/340.pdf 31 Deep Learning via Semi-Supervised Embedding acknowledgements adaptive advances ahmed ando artificial belkin bengio bentz bottou bromley cambridge caruana chapelle cluster computation cordero data deep delay dimensionality eccv eigenmaps examples feed figures forward framework from funded geometric gong grant greedy guyon hierarchical information intel international journal kernels labeled lamblin laplacian larochelle layer learning lecun ligence lkopf machine manifold mass models moore multiple multitask network networks neural nips niyogi pattern popovici predictive press processing producing pseudo ratle recognition reduction references regularization representation research sackinger sandra semi semio shah siamese signature sindhwani structures submitted supervised systems tasks thanks time training transfer unlabeled using verification visual 
weston wise zhang zien http://icml2008.cs.helsinki.fi/papers/491.pdf 140 Fast Support Vector Machine Training and Classification on Graphics Processors advances algorithm anal analysis applied approximate asanovi association asuncion baluja based bengio berkeley bhattacharyya bhaya bodik bottou bradski california cambridge cascade catanzaro chang chapelle chen classifier clusters collobert comput computation computing conference cortes cosatto cuda data database dean decoste department design detection discovery document dourdanovic eecs face factorization fast ferreira freund from gebis ghemawat girosi gradient graf haffner handwritten hoffman http hughes hull husbands ieee implementation improved improvements incremental information intel international joachims joint kanade kaskurewicz keerthi kernel keutzer know landscape large learn learning lecun ledge ligence mach machine machines making mapreduce matrix methods minimal mining mixture multicore multiprocessor murthy network networks neural newman nvidia olukotun operating optimization order osdi osuna paral parallel pattern patterson periyathamby platt plishker practical press problems proceedings processing recognition reduce references report repository research rowley saul scale scholkopf second selection sequential serafini shalf shevade sigkdd signal simplified software speeding support svms symposium systems technical text training trans transactions university usenix using vapnik vector very view weiss weston williams working workshop yelick york zanghirati zanni zhang http://icml2008.cs.helsinki.fi/papers/259.pdf 44 Non-Parametric Policy Gradients: A Unified Treatment of Propositional and Relational Domains aaai aberdeen actor actorcritic advanced advances algorithm algorithms annals approach approaching approximate approximation arificial artificial ashenfelter athena australia bagnell banff bartlett barto based batch baux baxter bellman belmont benefits berlin bertsekas blockeel blocks bonn 
boosting boutilier breiman bulatov cambridge canada carnegie chapman classification combines combining comparisions conclusions conditional conference connectionist continuous control coordinated critic data deal decision decisions delayed deroski developing diagrams dietterich directions discretization dissertation doctoral domains down driessens dzeroski edinburgh efficient ernst established estimates estimating estimation european experiences experimental experiments exploiting expressive fast fields finite first fitted following framework freiburg friedman from function future gaussian generalization germany getoor geurts goes gradient gradients graph greedy guestrin guidance gutmann have hidden hilbert horizon hungary hyderabad icml implement improvement incremental india induction inductive infinite information input instance institute integrating intel interesting international into introduced introduction ited iteration jair jmlr joint joshi journal kaelbling kernel kernels kersting khardon konda lagoudakis lational learner learning learns leverage lifence ligence linear logic logical machine mansour mcallester mccallum mdps mellon method methods mode model modelfree models more moreover munos national neural neurodynamic nppg olshen optim order ositional otterlo over parallel parametric parr perception performance perspective pittsburg planning policy portland porto portugal power press problem proceedings processes processing programming prop raedt ramon random reduce references regression reinforcement relational report reproducing research results revise riedmiller robotics robustness rochester rtner sanner schneider scientific scotland search seeks selection selective sequences several show siam significant simple singh slaney solve space speeding state statistical statistics stone straightforward such suggests sutton sydney systems szeged taskar technical through tildecrf time training treatment tree treebased trees tsitsiklis twentyfirst uncertainty 
unified university using uther value variance veloso versions wadsworth wang washington weaver wehenkel williams with world york http://icml2008.cs.helsinki.fi/papers/323.pdf 92 On the Chance Accuracies of Large Collections of Classifiers academy alon america applications approach arrays attribute benjamini berger broad casella clustering colon conference controlling david decision discovery duxbury expression false feller fifteenth frank gene grove hoboken hochberg icml inference international introduction journal kaufmann learning machine methodological morgan multiple nagara national normal oligonucleotide order pacific patterns permutation powerful practical probability probed proceedings publishers rate references revealed royal sciences selection series society states statistical statistics test testing theory tissues trees tumor united using wiley witten york http://icml2008.cs.helsinki.fi/papers/254.pdf 56 Stability of Transductive Regression Algorithms adaptive aistats algorithm algorithms also analysis based belkin berlin bound bounded bounds bousquet cambridge chapelle chicago classio colt combinatorics comes complexity comprehensive conclude conclusion consistency cortes data dependences dependent differences dimension effectiveness elisseeff empirical estimating estimation experiment experiments fication fields from functions gaussian generalization ghahramani global good graphs guarantee harmonic have however icml inference interscience jmlr lafferty large learning lkopf local machine manifold matveeva mcdiarmid measures method methods metric model mohri nips niyogi novel number often pechyony performance presented press previously references regression regularization report risk same schuurmans selection semi semisupervised show showed since sindhwani some southey springer stability stable statistical such supervised surveys technical than that theory they this those thus tighter transductive trivial university using values vapnik weston wiley with 
yaniv york zhou http://icml2008.cs.helsinki.fi/papers/551.pdf 123 ICA and ISA Using Schweizer-Wolff Measure of Dependence acad algorithm amari analysis annals bach belgique beyond blind block cardoso cichocki classe clusters comon complete component components computation concept contrasts copulas czos deconvolution deheuvels dependence dependent diagonalization dimene edition entropy esann estimates fast fastisa fisher fixed fonction fonctions general geodesic gorithms groups high icassp icml ieee indea independent institut into introduction iscas jmlr john joint jordan karhunen kernel learned learning letin leures lrincz marges measures miller multidimensional nelsen networks neural nips nonparametric order param paris partition pendance pendent pirique point proc propri publications random references rinen robust royale schweizer sciences seattle separation series signal signals sions sklar source space spacings spanning springer statistics statistique ster subo subspace szab test theis towards trans trees trique undero universit using variables wiley wolff yang york http://icml2008.cs.helsinki.fi/papers/172.pdf 80 Spectral Clustering with Inconsistent Advice advances advice agarwal algebra algorithms analysis annual application approximation artificial asuncion bansal basis becomes blum california cardie charikar chawla clear clustering coleman columns combinatorial computer computing conference connections constraints correlation corresponding cristianini cuts deletion denote dimension directed distance division eigenspace eigenvalue eigenvector elementary explain fast follows from general generated given goemans graph iapr ieee ijcai image implement implementing inconsistent information inner instance intel international involved joint jordan journal kamvar klein label learning level ligence linear local machine makarychev malik manning mathematical matrix method metric moor neural newman normalized nullspace number optimization original orthonormal other 
pattern problem problems proceedings processing product programming range recognition references relaxation relaxations replace repository research respect respectively russell satisfactory satisfying saunderson science search section segmentation semidefinite seventeenth seventh shows side smallest solution space spectral statistical structural submitted subspace suppose suykens symposium syntactic systems taking that then theory thirty this three thus transactions transduction trick uncut university using version vertices wagstaff want what where which whose wirth with workshops written xing http://icml2008.cs.helsinki.fi/papers/150.pdf 132 Cost-sensitive Multi-class Classification from Probability Estimates academic algorithms anal analysis annals ayer bourke brunk classifiers conference cost data deng detection diego discovery discriminant distribution domingos egan empirical ewing friedman function general icml incomplete information international know learning ledge machine making mathematical metacost method mining multiclass optimizing pittsburgh press proc references regularized reid rocanalysis sampling scott sensitive signal silverman statistics surfaces theory vinodchandran with workshop york http://icml2008.cs.helsinki.fi/papers/681.pdf 18 Message-passing for Graph-structured Linear Programs: Proximal Projections, Convergence and Rounding Schemes agarwal agreement algorithm algorithms anal analysis angles annual appear applications approaches approximate approximation arbitrary artificial athena based belief belmont berkeley bertsekas bertsimas between boston boykov brazil canada censor chekuri codes communication computation computing cone conf control convergence convergent convex cuts cvpr cyclic decoding decomposition department deutsch different disagreement discrete distributed dual edge edges edgewise energies energy estimation exponential families fast feldman field fields figure fixing formulation fraction framework free freeman functional 
globerson graph graphical graphs grids hundal hyper iccv icml ieee inconsistent inexact inference info information intel international introduction iteration iterations iterative iusem jaakkola janeiro jections jordan journal karger khanna kolmogorov komodakis kumar labeling lafferty lerton ligence like linear lprelaxations mach markov mathematics meltzer message messagepassing methods metric minimization models naor neural node nonquadratic note number numerical operations optimality optimization optimum order oxford paragios paral passing pattern plots point press problem proc processing product programming programs projections propagation properties proximal quadratic random rate ratio ravikumar recall recover references relation relaxations report research revisited reweighted rounded rounding schemes scientific second sets siam sizes small solodov solution solutions solving some statistics structured suboptimality svaiter systems teboulle technical that theory torr trans transactions tree trees tsitsikilis tsitsiklis turbo tziritas unified university using vancouver variational veksler wainwright weiss when which willsky with workshop yanover zabih zenios zero zisserman zosin http://icml2008.cs.helsinki.fi/papers/643.pdf 7 A Generalization of Haussler's Convolution Kernel -- Mapping Kernel abstract accurate advances agreement akutsu algorithmica aoki applied barnard based berry between bunke california cambridge canada carbohydrate clarke collins common comparing compatible complexity computer conference convolution correction cristianini cruz data database dept discrete distance document duffy duncan edit efficient evolutionary extended genome graph haussler hein icml informatics information international introduction jansson jiang kanehisa kashima kernel kernels kingston koyanagi language learning letters machine machines mamitsuka matching mathematics maximum methods natural neural nicolas nips okuno ontario other pattern press processing queen query 
recognition references relation report rooted sadakane santa science semistructured shawe structuresucsc subgraph sung supertrees support synthetic systems taylor technical tree trees treeto ueda university vector wang yamaguchi zhang http://icml2008.cs.helsinki.fi/papers/415.pdf 101 Discriminative Parameter Learning for Bayesian Networks algorithm bayesian boosting classifiers conference experiments freund friedman geiger goldszmidt international learning machine network proceedings references schapire thirteenth with http://icml2008.cs.helsinki.fi/papers/673.pdf 152 Structure Compilation: Trading Structure for Features above absolute adapting advances allows another applied apply applying argue associated based because being between bijections both bound bounded bounds bucila caruana cauchyschwartz check class close collins communications compilation complexity comprehensible compression conditional conference consider construct convergence correspond corresponding cover covering crammar craven csiszar data define definition directions discovery discriminative dissertation distribution doctoral each eigenvalues elements estimation exists exponent exponential expression extracting fact family features fields finding first foundations free from function functions grammars holds icml identifiable inequality information inspired international into kearns klein know labeling lafferty latent learning ledge lemma likelihood linear loss losses lower machine madison margin maximum mccallum members method methods mining mizil model models multiple natural networks neural niculescu nips nonzero note noting number numbers originally over parameter parameters parsing particular partition pereira petrov points pollard practice probabilistic probabilities probability problem processes processing proof prove random reducing references relate respect result rewritten same segmenting separately shields show sides similarly since size smallest solution solving some sources springer 
statistical statistics stochastic structure substitute subtracting such suffices superset supremum systems technologies term terms that theorem theory there therefore this thus trading trained trends tutorial twice university upper used using value variables verlag where which will wisconsin with works workshop wortman yields zero zhang http://icml2008.cs.helsinki.fi/papers/311.pdf 151 Fully Distributed EM for Very Large Datasets acknowledgments algorithm algorithms allocation allowing alternative applications asuncion authors benchmark blei both bradski brown categorization clustering clusters collection computational computations conclusion consider contract darpa data dean demonstrated dempster density design dirichlet distributed empirically estimation faculty fellowship from function future general ghemawat graph guestrin have heuristics ieee implementation incomplete inference intel ipto jmlr jordan journal junction kaufman laird large latent learning lewis ligent likelihood linguistics machine mapreduce mathematics maximum mcfadden memory mercer methods microsoft more morgan multicore networks newman nips nowak olukotum operating other parameter paskin pearl pietra probabilistic processing reasoning reduce references research respectively robust rose royal rubin scaling sensor signal significant simplified single sixth smyth society speedup statistical successfully supported symposium system systems text that theoretically this topologies transactions translation tree welling were will work would yang http://icml2008.cs.helsinki.fi/papers/305.pdf 13 An HDP-HMM for Systems with State Persistence amer analysis ancestral appear applications approximate assoc bayes bayesian beal benefits blei century computing considerable constructive database definition demonstrated diarization dirichlet discussion duke dunson eaker eech ered ersistence exact extended gelfand genetic ghahramani have hidden hierarchical http huijbregts iccv icsi ieee inference infinite isds 
ishwaran jordan kivinen learning lids lncs markov methods model modeling models multiscale natural nested nips nist parameter priors proc process processes rabiner rasmussen recognition recursive references representations rich rodriguez scenes scott selected separate sethuraman sinica sohn space speech stat state sudderth system systems temp tests transcriptions tutorial using which willsky with wooters xing zarep http://icml2008.cs.helsinki.fi/papers/163.pdf 121 Uncorrelated Multilinear Principal Component Analysis through Successive Variance Maximization analysis applications approximation best carmel carroll chang component comput conditions conf data decomposition differences dimensionality eckart edition evolving explanatory factor faloutsos foundations generalization graphics harshman higher ieee individual jolliffe journal kolda koren large lathauwer linear matrix mining modal models moor multi multidimensional order papers parafac phonetics principal procedure psychometrika rank reduction references robust scaling second serires siam springer statistics tensor tensors time tools trans tutorial ucla using vandewalle working young http://icml2008.cs.helsinki.fi/papers/571.pdf 0 An Object-Oriented Representation for Efficient Reinforcement Learning aaai artificial assumptions aware barto boutilier computational conference ctit dean decision decomposition dietterich discrete diuk domains dynamic efficient environments factored fifth framework function gearhart generalizing guestrin hanks hierarchical ijcai intelligence international introduction issn john journal kanodia knows koller learning leverage littman machine markov maxq mdps otterlo planning plans press processes programming puterman references reinforcement relational report research self series sons state stochastic strehl structural structure survey sutton technical theoretic twenty value walsh what wiley with york http://icml2008.cs.helsinki.fi/papers/470.pdf 107 Predicting Diverse Subsets Using 
Structural SVMs advances appear based broder budgeted burges carbonell categorization chapelle chen cikm classification conference cost coverage cutting desjardins development diversity document documents eaton exact fewer finley fontoura functions gabrilovich goldstein hierarchical hofmann icml inference information international intractable jects joachims joshi josifovski karger khuller know knowledge large learning ledge less letters machine machines management margin maximum measures models more moss naor neural nips optimization plane preferences probabilistic problem proceedings processing queries ragno rank ranking rare relevant reordering reproducing reranking research retrieval retrieving robust search sets sigir smola smooth structural summaries support svms systems training user using vector wagstaff when with workshop zhang http://icml2008.cs.helsinki.fi/papers/599.pdf 75 Sparse Multiscale Gaussian Process Regression advances biological comp csat cybernetics estimation fast franz gaussian gehler herbrich implicit information informative institute lawrence line machine methods neural opper part planck process processes processing references regularised report seeger series sparse systems technical vector wiener http://icml2008.cs.helsinki.fi/papers/476.pdf 141 Improved Nyström Low-Rank Approximation and Error Analysis academic accelerate achlioptas acknowledgments administrative advances algorithm algorithms analysis annual approximating approximation approximations artificial bach baker based been belkin belongie bengio block boston carlo chung clarendon classifiers clustering component compression computation computations computer computing conference council data decomposition density different digits distribution drinea drineas edition effect efficient eigenmaps eigenvalue elkan embedding equations error errors evaluation experimental fast fastmap finding fine foundations fowlkes frieze gaussian gersho golub gram grants gray greedy grouping 
herbrich hong hopkins huggins ieee implementation improved independent inequality informatics information informative input integral intelligence international johns jordan journal kannan kanungo kernel kluwer kong kwok landmark laplacian lawrence learning least letters loan machine machines mahoney malik matrix mcsherry means method methods metricmap monte montecarlo mount muller netanyahu neural niyogi nonlinear numerical nystrom ouimet ours oxford pairs panhellenic partially pattern piatko platt predictive press problem proceedings process processing quantization quantized rank references representations research scheinberg scholkopf science seeger signal silverman singular smola sparse special spectral speed squares statistics support supported suykens symposium systems table techniques testing theory this training transactions treatment triangular university using usps value vandewalle vector vempala williams workshop zhang http://icml2008.cs.helsinki.fi/papers/520.pdf 134 Multi-Task Learning for HIV Therapy Screening altmann antiretroviral antiviral barrier beerenwinkel combination doumer drug fessel genetic improved kaiser lengauer prediction references resistance response rhee savenkov shafer sing therapy using http://icml2008.cs.helsinki.fi/papers/182.pdf 12 Inverting the Viterbi Algorithm: An Abstract Framework for Structure Design acids acknowledgements advances algorithm algorithms alphabet also analysis andreas andronescu annual applications approach areas arxiv aspect asymptotically authors auxiliary backofen baym benefit bioinformatics biological biology biomolecular biophysics biotechnology blanchette both bound bounds branch breaker busch butterfoss cambridge candidates chemie codes comparison competition computational computer computerbased conclusions context convolution current decoding defined demonstrates design designable developing dowell durbin eddy efficient elizalde emission energy engineered error evaluation exploits explore extending 
extensions fast fields figure find folding form framework free from functions general getting given going grammars graphical hardness have here hmms hofacker ieee inference info information inspired introduced inverse journal krogh kuhlman large length lengths lightweight likely markov math mathematics mathieu michael mitchison model models molecular monatshefte more most negative novel nucleic number only opinion optimal optimum other park particeo particular particulars path paths plays pokala polynomial possible prediction press prints probabilistic probablistic problem protein proteins random references research result results review role running saven scfg scfgs schulz secondary sequence sequences several shown similar simulations size specific states stiegler stochastic structural structure structures such thank that theoretical theory there thermodynamics this time times transactions university unless useful using viterbi waldispuhl were where widespread woods work would yang zuker http://icml2008.cs.helsinki.fi/papers/564.pdf 114 Sample-Based Learning and Search with Permanent and Transient Memories advances analysis armed association auer bandit baxter bianchi buro cesa chess combining computer computers computing conference coulom dahl differences enzenberger evaluation experiments features finite first fischer from functions game games gelly honte international journal learn learning machine machines move multi nets network neural nova offline online parameter patterns play playing problem program ratings references science segmentation silver simple soft sophisticated temporal that time tridgell using weaver workshop http://icml2008.cs.helsinki.fi/papers/129.pdf 128 Dirichlet Component Analysis: Feature Extraction for Compositional Data aitchison analysis biometrika component compositional data journal methodological principal references royal series society statistical http://icml2008.cs.helsinki.fi/papers/411.pdf 139 Optimized Cutting Plane 
Algorithm for Support Vector Machines advances algorithm alto argmink ascending attained axis between bottou bousquet called cambridge chang changes chapelle checking chen cjlin comp compact computation computing condition considerations constantly constrained contains converged convex corresponds cortes cristianini csie cutting data domain dominated efficiently either electronic equal essential estimated every exactly fast fawcett figure first follows found franc fraunhofer function gradient graph graphs greater hence http icml illustrated illustration increasing information institute intersection interval intervals introduce introduction invoked involves iteration jmlr joachims journal jump keerthi kernel laboratories large learning less library libsvm like line linear linesearch logistic machine machines making marized methods minimization minimized minimum mining modular monotonically more must networks neural newton nips notation note notes numbers objective ocas only optimized optimum order palo parallel pegasos plane point points practical press primal problmm procedure proceed publication references region regression regularized report research researchers respect risk scalable scale search second selection semi serafini show shwartz shwawe since sindhwani singer smola software solution solve solved solver solving sonnenburg sorting split srebro step strictly subdifferential subgradient sumi supervised support svms table takes taylor technical term than that then there these this thus time tradeoffs training trust unconstrained using value vapnik vector vishwanathan weng where which whose with working york zanghirati zanni zero http://icml2008.cs.helsinki.fi/papers/398.pdf 150 Modified MMI/MPE: A Direct Evaluation of the Margin in Speech Recognition altun chou conditional conf deng discriminative eisner equivalence expectation fields finite finland flexible fsmnlp gaussian heigold helsinki hidden hofmann icml ieee language learning like machine machines 
magazine markov methods natural optimization oriented pattern proc processing random recognition references review schluter semirings sequential signal speech state support transducers tsochantaridis unifying vector http://icml2008.cs.helsinki.fi/papers/312.pdf 145 Grassmann Discriminant Analysis: a Unifying View on Subspace-Based Learning absil acad academic achieves acta adaptive algorithm algorithmic algorithms also amer anal analysis analyzing angles appearance appl application approach arias baltimore bars based basis bayesian belhumeur best between beveridge beyond binet bousquet cambridge canonical canu categories categorization cauchy chang chellappa chikuse cipolla classes classification cmsm competitively comput computation computations computer cone conf consistently constraints contour correlations cvpr darrell database definite diego differential discriminant discriminative dissimilarity dist distance draper duin edelman eigenfaces ensemble euro evaluated face facial farag fast feature feng figure fisher from fukui fukunaga functions generalized geometry georghiades gesture golub grassmann haasdonk hein hilbert hopkins idiosyncratic ieee illumination image indefinite info intel interpretation introduction ipcv jebara ject jects john johns kernel kernels kero kirby kittler kley know kondor kriegman learn learning lecture ledge left leibe lighting linder lkopf loan london long lugosi mach machines maeda mahony manifolds many margin mary math matrix maximal measures method methods metric models mukawa multi mutual nearesto neighbor nels neural neurosc notes number observations optimization orthogonality over paclik pattern patterns pekalska pentland performs peterson pose positive press principal probabilistic proc professional rates recogn recognition references regularization represent reproducing riemannian right robot robotics robust sakano sample schiele schoenberg search sepulchre sequence sets shakhnarovich shashua siam similarity small smith smola 
society space spaces special springer statistical statistics subspace subspaces support svms symp syst tech temporal term trans turk under university using variable vector vectors view viewpoint vishwanathan vision wang washington where with wolf wong yale yamaguchi york zhou http://icml2008.cs.helsinki.fi/papers/455.pdf 62 Bayes Optimal Classification for Decision Trees advances agrawal angelop association averaging cussens data decision discovery exploiting fast hand icml infor kaufmann know learning ledge machine mannila mining morgan oliver oulos proceedings programs pruning quinlan references rules srikant toivonen trees verkamo http://icml2008.cs.helsinki.fi/papers/377.pdf 4 Tailoring Density Estimation via Reproducing Kernel Moment Matching aistats altun analysis antees applications applied approach approximate bandwidth barndorff bartlett based basford bayesian beating berkeley between borgwardt both bounds carlo chapman clustering colt complexities computer condensed consistency constraint context convex cortes data dekker density distance distributions divergence dolia doucet duality dumouchel dvir dynamic earth embedding entropy error estimation expectation exponential families farias files fixed flat flatter framework freitas from function gaussian girolami gordon graphical greenspan gretton guari guibas hall hilbert howard icpr ieee image images incur inference influence information intl jebara jmlr johnson jordan kernel kernels kondor level linear machines marcel math maximum mclachlan median mendelson method methods metric minimization minka mixture mode modeling models monographs monte mover mukherjee multivariate nielsen nips oper optimally outperforming over parzen performance phillips practice pregibon probability problem product programming propagation rademacher rasch rate references regularized results retrieval risk rubner sample samples sampling schapire scholkopf sequential shawe significance silverman smola song space springer squashing 
stat statistical statistics steinwart structural support taylor tech tested theory thesis times tomasi tpami unifying using value vapnik variational vector verlag vision volinsky wainwright where http://icml2008.cs.helsinki.fi/papers/412.pdf 133 Learning to Learn Implicit Queries from Gaze Patterns academic accurately advances aistat aistats ajanki algorithms approximations artificial beyond bibliography brazdil cambridge carrier chapter clickthrough cohen collaborative combining computer conference cvpr data development drawings drissi dumais easier effective eighth electronic elidan engines evaluating feedback filtering first forum from giraud granka hardoon heitz hembrooke html http ieee images implicit improve inductive inferring information intel interfaces international interpreting issue ject joachims karnawat kaski keenbow kelly kluwer koller learn learning lifelong ligence ligent machine measures meta model movements mydland neural nips norwell okapi order pattern performance perspective poisson pratt precision preference press proactive probabilistic proceedings processing publishers puolam queries radlinski recognition relevance research retrieval review robertson salo savia schapire scholer search second shape shawe sigir simola simple singer society some special start stat statistics survey systems tasks taylor teevan text than that thing things thrun transactions transfer trec turpin tutorial user versus view vilalta vision walker washington weighted white york zaragoza http://icml2008.cs.helsinki.fi/papers/519.pdf 72 Exploration Scavenging ability about action adaptive adserving advances adversarial advertisement algorithm allocation alon analysis annual appear applied approach armed assign associative asymptotically auctions auer automatic bandit bandits based berry bianchi bias borgs casino caveats cesa chapman chayes choices choose click clicks clickthrough coefficients commerce commonly compute computed computer conclusion conditions conference 
considering context contextual control converges cumulated current data derandomized descending described discrete display displayed domains dupret dynamics econometrica edition efficient electronic elimination empirical engine epoch error estimate etesami evaluate evaluation even examples expect experiments explo exploration family finite first fischer foundations freund fristedt from gain gambling gathered generate given gives greedy hall heckman here highest ieee ignoring illustrate immorlica impossible inaccessible information international interscience intl intuition jain jarvelin john journal kaelbling kekalainen keyword kulkarni lahaie langford learning location logged london machine made mahdian mannor mansour matching mathematics method methodology methods million model multi murdock neural note number observations obtained online opens optimization order over past pennock piwowarski policy poor position possibility previously probabilistic problem problems procedures process processing provide quantity quickly quite random randomized ranking rate reasonable receive references reinforcement relative reorder reordered reordering reorders required research results reusing revenue rigged robbins rules same sample scavenging schapire science search second selection sequential series several show shown side simply slate slot small specification spencer stopping strongly study suggests symposium systems techniques test that them then there this those through thus time training transactions turned unbiased unnormalized used user using value variance wang were when where which wide wiley with without work world would yahoo zhang http://icml2008.cs.helsinki.fi/papers/503.pdf 67 Fast Estimation of First-Order Clause Coverage through Randomization and Maximum Likelihood accepted accuracies achieving across algorithm algorithms applied arias artificial automated average avoids based bessi certain classification clause concerned conclusions constrainedness constraint 
constraints coverage crato data dataset datasets designs django domised efficient engineering estimation exploiting expressions fast favorable feature fern first fundamenta garcia generalization giordana gomes have heavy hold horn hypotheses ijcai illustrated improving induction informaticae instances intel introduced issue journal kaufmann kautz kemen khardon king kuelka laborious lavra learned learner learning ligence likelihood logan logic machine maloberti matching maximum measure mining molina morgan muggleton multi muta mutagenicity ndez order page paper performance phase phenomena precision problems product provides proving quality real reasonable reasoning recall recover references regimes regions relational research restarted restarts ries rouveirol runtimes saitta satisfaction satisfiability search sebag sedano selman sets shown spec springer srinivasan statistical sternberg stochastic stratz study subsumption table tailed tails test testing that theories thetasubsumption this tissot tractable transitions unsatisfiable virtual well which while with world zelezny http://icml2008.cs.helsinki.fi/papers/335.pdf 46 Privacy-Preserving Reinforcement Learning accuracy achieves acknowledgments action adaptive agents appear applications approach approximate apte arbitrarily asia author average balancing barto basic because behavior belief between both cambridge carried center centric channel clustering cogill comp comparison computation computational computer conf contrast control cross cryptography cryptology data decentralized delayed dimacs disclosed discovery discussed distributed does dynamic efficiency episode episodes epsilon equivalent essential evaluation exchange fairplay figure first follow foundation foundations from functions gambs generalisation generate global goldreich grant greedy guarantee higher horizontally idrl ieee inferior institute intermediate introduction jagannathan jiang journal jurik kearns kernels kmeans know kobayashi lall large 
learned learning ledge limited lindell lncis load loss machine makedon malkhi marketing means meur mgard mining moallemi moore much national negotiation networks nips nisan nonlinear normalized numb number obtain oosting operative optimality optimization optimized others over pacific paillier pakdd partially partitioned partly party perceptions performance pinkas pprl preservation preserving press privacy privacye privacypreserving probabilistic programming propagation public rdrl references reinforcement rewards riedmiller rotkowitz sakuma same sampling sarsa scale schneider schroko science secrets section secure security selection selfish sella shown sigkdd simplification since some springer started statement steps stmt stochastic successfully supported sutton symposium system systems table task tasks technology that third this time tokyo trials tween under university used usenix user using vaidya value verma visitor volume watkins which while with wong work wortman wright zhang http://icml2008.cs.helsinki.fi/papers/242.pdf 100 Prediction with Exp ert Advice for the Brier Game bianchi brier cesa expressed forecasts freund haussler helmbold monthly probability references review schapire terms verification warmuth weather http://icml2008.cs.helsinki.fi/papers/383.pdf 57 On Multi-View Active Learning and the Combination with Semi-Sup ervised Learning active advances agnostic algorithm analysis annual anthony applications artificial atlas austin balcan ballard banff bartlett based belkin bertinoro beygelzimer blum both bottou bounds bridging broder cambridge canada category characterize chen classification classifier coarse cohn combination combining committee complexity computation computational conclusion conference cotraining dasgupta data dempster diego directed diverse documents edinburgh edition enhancing ensembles ervised expansion experts exponential feedback fields first foundations francisco frank freund from functions gaussian general generalization 
ghahramani goldman graphs harmonic hofmann image improvement improving incomplete information intel intelligence international italy joint jordan journal kalai kaufmann knoblock koller labeled labelled ladner lafferty laird langford large learning ligence likelihood lkopf machine madison margin matveeva maximum mccallum melville miller mining minton mitchell mixture modality monteleoni mooney morgan mozer multi multiple muslea national network neural nigam niyogi paper perceptron petsche pittsburgh platt practical practice press proceedings processing query redundant references relevance research retrieval roweis royal rubin sample sampling saul scotland selective semi semio semisupervised sensing series seung shamir singer society statistical supervised support systems techniques text theoretical theory this through thrun tishby tong tools towards training transactions ularization university unlabeled unlabelled using uyar vector view views washington weiss with witten yang zhang zhou http://icml2008.cs.helsinki.fi/papers/391.pdf 149 A Unified Architecture for Natural Language Pro cessing: Deep Neural Networks with Multitask Learning adaptive algorithms ando annual applications archie architecture bengio bridle cambridge caruana chapelle classification collobert computation data ducharme extraction fast feedforward framework from interpretation jmlr language learning machine mass meeting model multiple multitask nato network neural neurocomputing nips novel outputs pattern predictive press probabilistic proceedings rault recognition references relationships schlkopf semantic semisupervised series souli statistical structures tasks tectures unlabeled using weston with zhang zien http://icml2008.cs.helsinki.fi/papers/229.pdf 135 Manifold Alignment using Pro crustes Analysis acids algebra aligned alignment also american analysis annual application applications approach artificial bank based belkin bengio berman between bhat both bound bourne carried chapman 
clustering coifman computation conclusions conditions conference corpora cross data decision defined developmental diaz difference different diffusion dimensional dimensionality domain dotted each editing eigengap eigenmaps eigenvectors eigenvetors embeddings empirically evaluate everywhere experiment experiments extensions feature feng figure from functions fusion general gillilandand given good graph hall hancock handbook here hogben icml ieee ijcai image included including information intel international introduce isomap jections joint just keller kostrykin lafon laplacian larger learn learning ligence linear lines lingual locality loose machine mahadevan makarov manifold manifolds mapping maps markov matching mathematical maximum method metzler might minimum motovilov mountain multi multidimensional multilingual neural nips niyogi novel nucleic other paper pattern pendulum performance perturbation plot points policy presented preserving press problem proc processes processing procrustes protein proto pseudo pvfs rather reduction references reinforcement representation research respectively results retrieval same sample saul scaling semisupervised sets shindyalov should show similar society spaces spectral statistics study subspace suitable systems task tasks tests than that theorem theoretical this tight training transactions transfer tried true under used using value values various weissig well westbrook when which with workshop http://icml2008.cs.helsinki.fi/papers/502.pdf 35 Data Spectroscopy: Learning Mixture Models using Eigenspaces of Convolution Operators belkin clustering dasgupta eigenmaps embedding focs gaussians laplacian learning mixtures nips niyogi references spectral techniques http://icml2008.cs.helsinki.fi/papers/379.pdf 16 Graph Kernels between Point Clouds adaptive algorithm approach bach barone belongie bioinformatics borgwardt caelli caetano chapelle computation computer conic contexts diestel duality forsyth function graph graphical 
hall icml ieee jordan kernel kernels kriegel lanckriet learning machine malik matching models modern multiple object pami pattern point ponce prediction prentice press proc protein puzicha recognition references scholkopf schonauer schuurmans semi shape smola springer supervised theory trans using verlag vishwanathan vision zien http://icml2008.cs.helsinki.fi/papers/151.pdf 73 Fast Gaussian Process Methods for Point Process Intensity Estimation activity advances analysis barbieri basu boyd brown cambridge computation construction convex cunningham daley dassios economics efficient factorizations firing frank from gaussian gibbs gill golub implementation inferring insurance intensity introduction jones lognormal mackay mathematics matrix methods models modifying murray neural neurosci nips optimization point poisson preprint press process processes quirk rates references response sahani saunders shenoy spike spiking springer stimulus theory trains university using vandenberghe vere wilson with york http://icml2008.cs.helsinki.fi/papers/676.pdf 97 ManifoldBoost: Stagewise Function Approximation for Fully-, Semi- and Un-supervised Learning acad academic approximation bertsekas beyond bickel boost boosting chapelle chen classification complex computing constrained data datasets dept donoho eigenmaps embedding fitting friedman function functions gradient greedy grimes hessian highdimensional icml inference inverse joachims journal lagrange learning lecture levin linear lkopf local locally machine machines manifolds methods monograph multiplier natl networks nips notes numerical optimization polynomial press problems proc procedures radial references regression regularized report rippa scattered scientific semio semisupervised series siam stanford statistical statistics supervised support surface technical techniques text tomography transductive university unknown using vector wang zien http://icml2008.cs.helsinki.fi/papers/682.pdf 2 On the Hardness of Finding Symmetries 
in Markov Decision Processes aaai about actions agents amarel amsterdam analysis booth bowling breaking carnegie colbourn column crawford dean decision dissertation doctoral elsevier equivalent first flener frisch givan graph hnich holland iaai intelligence isomorphism kiziltan learning limitations logic london machine markov matrix mellon michie miguel minimization model models multiagent north order pearson polynomially presence problems processes reasoning references report representations symmetries symmetry technical theoretical university walsh waterloo with york http://icml2008.cs.helsinki.fi/papers/331.pdf 98 Boosting with Incomplete Information abbeel about adaboost additive advances again algorithm algorithms also annals appendix application approach approaches arcing auxiliary bartlett baxter because between bhattacharyya boosting bound bounded brady bregman breiman built carreras categories change chechik class classification classifiers collins combining computation computational computer computing conditional cone conference confidence conll connection consistent contrary contribution cvpr darrell data decision derivative derive derived deriving description details detection difference discovering discriminative distances distinctive dynamic efros elidan emnlp entity estimation evidence explore exponential expression extractor features fergus field fields find first frean freeman freund friedman from function functional functions future games gaussian generalization ghahramani gives google gradient handling harmonic hastie heitz hidden holds hypotheses icml ieee image images improved incomplete inequality information input interesting international invariant jmlr jointly jones journal kadir keypoints koller labeling lafferty large last layout learner learning lebanon level likelihood line location logistic loss lowe machine margin marquez mason maximum mccallum mease minimize missing model modeling models multiple named neural nips nonlinear number 
object objective objects occluded order outputs overview padro parallel parameters partially perona perspective point prediction predictions press proceedings produced programming provided quattoni random rated rather real recognition recognizing references regression related represented representing reranking research respect resultant robust rohanimanesh rules russell saliency same sampling scale scaleinvariant schapire sciences search second segmenting semantics semi sequences sequential setting shivaswamy shotton shown sign similar similarly simple singer sivic smola springer stationary statistical statistics supervised support sutton syntax system takes taking technique techniques than that their then theoretic theories tibshirani time topic uncertain uncertainty unnormalized unsupervised update updates updating upper used using value variable vector vecwhere view viola vision vote weak when which will winn with workshop wyner zero zhang zisserman