http://www.informatik.uni-trier.de/~ley/db/conf/icdm/icdm2004.html ICDM 2004 http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420335abs.htm 44 Matching in Frequent Tree Discovery Björn Bringmann Machine Learning Lab, University of Freiburg Georges-Köhler-Allee, Geb. 079, 79098 Freiburg, Germany bbringma@informatik.uni-freiburg.de able acknowledgments additional adopted aggarwal albrecht algorithm algorithms also amount andreas anonymous appears application applied apriori arikawa arimura asai author based below best better blue both brodley calculate called candidate cases cinq classifier clearly code comments compared complete computation computing conclusions conference considering consisting constraints cost could count data database databases dataset datasets deal definition depends depict depth directly discover discovered discovery dramatically each edges effect effective efficient efficiently embedded embedding enable enabling encouragement european evaluate even expands experiments explorations exponential extend facilitate fanout fast features figure finally find finding first focus forest fourth fragement frasca freqt frequent from furthermore general generated graph graphs grows growth gspan hand have helped hope hypothetical icdm idea ieee ijcai improve improves inclusion incorporate incorporation indicates inokuchi instance international into introduced justified karwath karypis kawasoe kindly kohavi kramer kuramochi labels large less levels levelwise like lines logic lowering many martini mason master matching maximum memory mine minimum mining mohammed molecular more most motoda much necessary nice nodes notion novel onion only order organizers other pages paper pattern patterns peeling pkdd plot post presented proc proceedings process processing project provided providing pruning raedt rather real reduce reduces references regarding related report representation required requirements requires research restricted results reviewers 
right rousset sakamoto scales search sebag semistructured sense sets show shows siam sigkdd significant since slower smaller smart software solid some source space specific speedup step still structural subgraph subgraphs substructure substructures subsumption subtree such support supported surprising synthetic technique termier than thanks that there therefore they this time towards tree treefinder treeminer treeminerv trees unfortunately unique until usage used useful user uses using version vertical visits washio website well when which whose with work world would writing xrules zaki zheng zimmermann http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420249abs.htm 31 Bottom-Up Generalization: A Data Mining Solution to Privacy Protection achieved achieving acknowledgement addressed agrawal algorithm altogether amenable annealing annual anonymity approach argus attribute attributes avoidance based believe better bled bottom bureau census cercone child clark climbs comments company compared conference confidentiality consistent constraints constructive control current data databases determined different disclosure discovery draft elements entirely escaped existing extensions factor figure files finally focused focusing fourth framework fuller further future fuzziness general generalization generalizations generalized generalizing getting good greedily greedy greenberg handling helpful heuristic hierarchy hill housing hundepool icdm ieee incorporating index information international introducing introduction investigate issue issues iterative iyengar ization journal kaufmann knowledge knownledge learning limitation local machine make masking matched methods metrics microdata mining model more morgan much necessarily noise novel novelty numeric official optimum oriented partial plan possibility practical preserving privacy procedures proceedings process programs project projecting proposed protection pruning quality quinlan references report research 
reviewer reviewers satisfy scalability scaleup section seminar several sigkdd sigma sigmod similar simulated software solutions srikant state statistical statistics stochastic structure stuck study suggested suppression survey sweeney systems taken tape thank that therefore these third this tically trade transforming uncertainty using utility value values vldb washington where will willenborg winkler wish with without work xecutie http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420233abs.htm 29 Privacy-Preserving Outlier Detection accounts additional alberta algorithms allowing analysis annual anomaly anonymity another applications appropriations association atallah auctions barbara barnett based bases bayes bayesian bidding bill british cachin california cambridge canada chapter chicago city clifton clustered columbia commun communications communities comparative completeness computational computer computing conference constructing corzine cost council crowds cryptographic cryptography currently data datamining datasets december defense department detecting detection directive discovery distance distancebased distributed done edition edmonton efficient eighth enable enacted engineering environments ertoz estimators european evaluation exchange expert explo ezawa feingold figure first foundations fourth francisco frank fransisco free from game general generate goldreich goldschlag grama honest horizontally icdm ieee illinois implementations implementing important individuals information integrating international into intrusion intrusions ioannidis jajodia java john journal july kantarciogl kaufmann knorr knowledge kumar large lazarevic learning lewis louisiana machine majority management manner mental methods micali mining moratorium morgan movement nelson network networks norton novel oblivious october official onion operations orleans outliers ozgur packages pages parallel parliament partitioned party personal play practical predict preserving press 
privacy private problem proceeding proceedings processing products proposed protection protocol protocols public ramaswamy rastogi rations reed references regard reiter robust routing rubin rule rules schemes science secrets section secure security senate sept sets seventh shim siam sigkdd sigmod software sons space special srivastava statistical study such symposium system syverson techniques telecommunications them theorem theory these third title tools transactions transformation transformations tucakov uncollectible university using vaidya vancouver vertically very viii vldb volume weka wigderson wiley with witten wyden york zamar http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420567abs.htm 102 Learning Weighted Naive Bayes with Accurate Ranking according accuracy accurate achieve achieves addition addresses addressing algorithm also although among another artificial attribute attributes based bayes bayesian below besides best better beyond both calculate choice classification classifier classifiers climbing combined compared conclusions conditions conference consistent continuous contributions data databases dataset datasets decision differences directly discretization discriminating discussions does domingos experimental experiments extensions fact facts fayyad first firstly five four fourth friedman from gain geiger goal goldszmidt good have hence high higher hill huang icdm ieee ijcai illustrated independence indicates induction intelligence interesting international interval irani issue joint just kaufmann knowledge lastly learning ling looking lose loses loss machine major mateo mcmc measure measured merz method methods mining more morgan most multi murphy naive network none observation observe optimality original other outperform outperforms paper pazzani performance performs practice precisely probabilitybased proceedings programs proposed provost quinlan ranking ratio recent references repository results search second should show simple 
slightly some statistically studied suggests systematically table term terms than that them there thirteenth this ties tree used using valued values variants various vector vehicle weight which wins with wnbg wnbs works zhang http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420403abs.htm 61 Evolutionary Algorithms for Clustering Gene-Expression Data* Eduardo R. Hruschka, Leandro N. de Castro, Ricardo J. G. B. Campello Universidade Católica de Santos (UniSantos) {erh,lnunes,campello}@unisantos.br acad accuracy algorithm algorithms analysis annals appear article better bioinformatics biology bumgarner campello castro cluster clustering clusters conclusions conference considering data datasets ebecken efficiency estimate evaluate evolutionary expression finding fourth future gene genes genetic genome going good groups have hruschka iberamia icdm ieee improvements improving informatics intelligent international introduction investigate issue kaufman lnai mainly mathematical means measurements medical medvedovic microarray mining number operators pattern performances performed press probability problem proc proceedings proposed real recognition references repeated results right rousseeuw series several shown similar simulations statistics structural survey tackled techniques terms that this valafar well were wiley with work works world yeung http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420331abs.htm 43 Text Classification by Boosting Weak Learners based on Terms and Concepts addison algorithms also analysis annual application approaches authors automated automatic automatically background barcelona based being bloehdorn boosting boston bozsak canada categorization classification clustering combined commerce comparison computational computed computing concept concepts conclusion conference corpus data date decision development disappointing discovery document does ends enprovence eurocolt european evaluation example experiments extracted 
feature features fourth fragkou france freund from generalization good hofmann hotho however icdm ieee improvement improvements improves informaion information integrating intelligent interest international into journal kaburlasos kaon kehagias knowledge large latent latter lead learning line machine means mining mixed number often ohsumed ontological other particular petridis plsa probabilistic proc proceedings processing publishing quite recently references related reported representation representations research result resulting results retrieval salton scale scattered schapire seattle sebastiani second semantic sense senses settings several sigir sigkdd significant spain staab statistically stumme surveys systems table tasks technologies term terms text that theoretic theory therein toronto towards using very were wesley with word wordnet work workshop http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420419abs.htm 65 Divide and Prosper: Comparing Models of Customer Behavior From Populations to Individuals across adomavicius against aggregate aggregation algorithms allenby also among analysis applications approach arbitrary based bayesian beaver because behaved behavior behaviors best better between build building cadez case ceder center claim class classifiers clearly clustering cohen comm comparable comparative compared comparing computing concave conclusions conducted conference consumer continuous convex cortes could curves customer customers data datasets dependent described differences different dimensions discovery distributions divide dominate dominates dominating doubleday driven econometrics effective equivalent estimating evaluation even eventually expected experimental experiments expert extracting factors farthestfirst fast finer finite first focused found four fourth framework frank from further future gain general generalize giles given granularity grouped hall hancock hand heterogeneity heterogeneous heuristic heuristics high 
hochbaum however icdm icml identified idiosyncratic ieee implementations individual individuals induction influence influencing informs instances insufficient international into introduction irvine issue java jiang john journal knowledge kotler langley language last lead learning level levels like little lowvolume machine management manavoglu mannila marketing mathematics measures mendenhall mentioned method mining mixture mobasher model modeling models monotone more multiple nature need needs never observed occurs operational other outcomes outperforms over padmanabhan paper particular pattern patterns pavlov peppers performance performances performed performing performs personalization phenomenon plotting poor poorly populations possible practical predicting prediction predictive pren presented press primarily principles probabilistic probability problem proceedings profiles programs progressively prosper publishing quality quinlan random reconstruction references research results rogers rossi rule running same second section sections segment segmentation segmenting segments sequential session settings shmoys show showed shows sigkdd signatures significant significantly since smyth sparse sparsity special spiliopoulou statistically statistics stern streams strongly study such sufficient suggest surprising tasks techniques than that theoretical there this thomson three tools tradeoff trans transaction transactional trying tuzhilin types under understanding usage user using validation variables very volume want well what when while will with without witten working worse would yang yielding york http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420122abs.htm 15 Mass Spectrum Labeling: Theory and Practice Z. Huang, L. Chen, J-Y. Cai, D. Gross*, D. Musicant*, R. Ramakrishnan, J. Schauer*, S.J. 
Wright advances aerosol aerosols agrawal algorithms anal analysis arindam artificial association associations atmospheric atofms banerjee based basu benson between birch bound characterization chem chemical chen classifiers clustering communications composition conference constrained correlated cost data databases design development discovery edition efficient eibe environ faloutsos fast fodo fourth frank fung gard groups huang icdm icml ieee imielinski implementation individual integer intelligence international items java jayne john jorge kaufmann knowledge labeling large leard learning limited livny machine madison mangasarian mannila mass massive mayer mccarthy measurement memory method metric mining mooney more morgan networks neural nips noble nocedal nordmeyer numerical optimization particles performance phenomenal portable practical practice prather problems proc proceedings profiles programming quantitative ramakrishnan raymond real references relational report reviews rules salt search seeding semisupervised sequence sets shavlik sigmod similarity size spectra spectrometer spectrometry spectrum springer srikant steven submicron subspaces suess sugato support swami tables tech technical techniques technol theory time tools towell using variable vector very vldb wiley with witten wolsey wright zhang http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420067abs.htm 8 Communication Efficient Construction of Decision Trees Over Heterogeneously Distributed Data aaai advances agnik algorithm algorithmic algorithms analysis applications applied approaches approximate arriaga artificial associates autonomous bagging baltimore boosting bowyer building caragea challenges chan chawla classifier collaborative comparison component computational computer concepts conference constructing context data december decision design dietterich directions distributed editor editors efficient empirical ensemble ensembles erlbaum evolutionary existential experimental 
foundations fourth from future general generation grant hall handbook hecht hershberger heterogeneous honavar icdm ieee imitating intelligence intelligent international isda issues january japan johnson joshi journal kargupta kegelmeyer large lawrence learning lecture life machine maclin maebashi mahwah meaning meta methods mining nasa next nielsen notes october oklahoma opitz organized pages parallel park pleasures popular press principal privacy private proceedings prodromidis projection purpose puttagunta random randomization randomized references report representations research road robust rolling rules sanseverino scalable scale science security self silvescu silvestre sites sivakumar sources springerverlag study systems technical technique theory three tree trees tulsa using vectors vempala volume workshop yesha york zaki zhan http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420027abs.htm 3 Density Connected Clustering with Local Subspace Preferences Christian Böhm, Karin Kailing, Hans-Peter Kriegel, Peer Kröger Institute for Computer Science, University of Munich, Germany {boehm,kailing,kriegel,kroegerp}@dbs.ifi.lmu.de aaai agarwal aggarwal agrawal algorithm algorithms ankerst applications approach architecture automatic based bmbf breunig buena cairo campbell carlo church clustering clusters compliance concepts conf conference consent dallas data databases density densityconnected despite determination dimensional discovering discovery efficient egypt ester expanded fast finding fingerhut fourth gehrke generalized genetic genetics grant gunopulos high hinneburg hughes icdm identify ieee international jones kailing kamber kaufman keim knowledge kriegel kries lake large liebl madison management medicine mining monte morgan multimedia murali nature nearest neighbor nennstiel network newborn noise oger olgem oller optics ordering pages parental philadelphia points portland press preventive proc proceedings procopiuc program projected projective 
raghavan ratzel references roscher sander screening seattle siam sigmod space spaces spatial structure subspace systematic tavazoie techniques very vista vldb what wisconsin with written york zapf http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420447abs.htm 72 Dynamic Daily-living Patterns and Association Analyses in Tele-care Systems B.-S. Lee, T. P. Martin, N. P. Clarke, B. Majeed, and D. Nauck abilities abnormal about access activities adaptive addition advanced affected agrawal algorithms analyses analysis another applicability applied assistance association barnes basis before behavior behaviour being between capture cardiology care carry centre challenging change changing chen cleaner community competence comprehensive computers computing concepts conclusion conf conference considered constantly contact control daejon daily data databases deliver demand demonstration dependent deterioration developed difficult discovering discovery domain dynamically each efficient engineering environment erroneous exact expansion explain explaining fact focus formations fourth frail freedom fridge from functions further fuzzification fuzzy garner given giver going handle have highly home icdm identifying ieee illustration imielinski independence individual information intelligent interactive internal international interpreted intl intrusive items japan journal klootwijk knowledge korea large late less lifestyle living long management meij membership mental mind mining mobile monitor monitoring more morning necessarily necessary nelwan network normal objective observe obtain obtained osaka over paper particularities patient pattern patterns people person persons physical placed preprocessed primarily proceedings programming project provide provision quantitative real references remotely report robot robots rules sensor sensory sets sigmod skubic solution such summary supported swami symposium system systems technology tele telecare tendency term their 
therefore they things this time trend tzeng ubiquitous updated used uses using usually values very volz vulnerable washington wellbeing what whereby which with without http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420359abs.htm 50 An Evaluation of Approaches to Classification Rule Selection Frans Coenen and Paul Leng Department of Computer Science, The University of Liverpool, Liverpool, L69 3BX {frans,phl}@csc.liv.ac.uk aaai accurate addition agrawal algorithms alternative association authors based been best better blake california case classassociation classification cmar coenen conclusion conf conference confidence considered coupled cpar data databases discovery during efficient established evaluation examined experiments factor fast fifth findings first flach follows four fourth generation given good goulbourne have high higher html http icdm ieee inductive integrating international irvine knowledge laplace lavrac learning leng less logic lower machine measures mechanism merz mining mlearn mlrepository more morgan most multiple number ordering overall paper precedence predictive principal proc proceedings process produced produces programming proposed provided pruning references relatively repository result results rule rules satisfaction sets siam significant specific springer srikant strategies strategy structures suggested suited tested than that there this threshold thresholds tree unifying university used using verlag view vldb were where with works workshop zupan http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420387abs.htm 57 LOADED: Link-based Outlier and Anomaly Detection in Evolving Data Sets accuracy admit agrawal algorithm algorithms anomaly applications approach approximate approximation architecture association attacks attribute attributes aware barnett based bolton breunig categorical chan clinical clustering combines comparison conference continuous correlation cost counts daspa data defined density detecting 
detection disk distance evaluated evolving execution fast flexible fourth fraud frequency ghoting good guha hand high icde icdm identifying ieee information integral international intrusions journal knorr laboratory learning lewis limitation linear link loaded local loci mahoney major manku mechanism memory methods metric mining mixed models motwani multivariate near network normal novel ohio operate outlier outliers over papadimitriou paper parthasarathy pass penny performance presented proceedings processing pruning randomization real reduce references report requirements results review rithms robust rock royal rule rules safety schwabacher science sequeira sets several siam sigkdd sigmod similarity simple small society space srikant state stationary statistical statistician statistics stream streams systems technical that this time traffic unifying university uses using vldb with workshop zaki http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420363abs.htm 51 Mining Frequent Closed Patterns in Microarray Data Gao Cong, Kian-Lee Tan, Anthony K.H. 
Tung, Feng Pan School of Computing National University of Singapore 3 Science Drive 2, Singapore {conggao, atung, tankl, panfeng}@comp.nus.edu.sg acmsigmod after algorithm algorithms also among association available axes bastide because besson best biclusters bioengineering bioinformatics biological both boulicaut breast cancer cannot carpenter charm clearly closed closet code compared comparison conclusions conf conference cong consume consuming creighton cremileux data database databases datasets decreased default deterministic differences discover discovering discovery dmkd efficient error executable existing experimental experiments expression faster fastest figure find finding finish five following follows fourth frequent from gene generally genome give graphs grateful hanash have hsiao http icdm icdt ieee including increases interesting international itemsets jianyong jiawei knowledge lakhal least less leukemia linux logarithmic long magnitude make memory microarray minimum mining minsup mohammed more most needs note observe order other paper pasquier pattern patterns performance points prefix proc proceedings proposed publications references relation reporting rept rerii results rioult rule rules running runtime scale schemes searching several show showed showing shown shows siam sigkdd sigmod slow slower slowest some source steepest strategies sults support symposium taouil than that them theory these this three time times transposition tree treebased tung typically unclear usage using usually version wang which while will windows with workshop yang zaki zhang http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420351abs.htm 48 An Adaptive Learning Approach for Noisy Data Streams Fang Chu Yizhou Wang Carlo Zaniolo accuracy achieving acknowledgement adaptability adaptive algorithm also amount application approach bagging bars base better bilmes block blocks boost capable card changes classification classifier classifiers comparable compatible 
computation computed computer concept conf conference correspond credit curve data decision detected detection different discovery drifting drops elements emerging ensemble ensembles ensemfigure ensures estimation evaluation even experiments fast fewer figure foundation fourth friedman from full fullsized further future gaussian gentle goal good grant grown hastie have having hidden highly icdm icsi ieee inference instead integrated international into jumps knowledge large larger leads learning life light logistic lower markov method methods mining mixture model models more most national nodes noise noisy normalized obtained occur other outlier outliers over parameter part perform performance prediction proceedings produces propose ratio real references regression report resulting results robust robustness running samples scale scheme science seconds shown shows sigkdd similar size small smaller sped springer statistical streaming streams street study summary supported technical teradata terminal terms than that these this thus tibshirani time totals transaction trees true tutorial ucla unweighted upper used using verify wang weak weighted weighting when which with work zaniolo http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420327abs.htm 42 Detecting Patterns of Appliances from Total Load Data Using a Dynamic Programming Approach about addison additional affordable algorithm algorithmen already analyse analysed appliance appliances applications approach assimilate association bandemer baranski based been bell berlin bezdek bologna building buildings canada chief civil clamped clustering clusters collected combinations communication computing concordia conference consumer consumption cookers could create creating daily data days december department designed detected detection devices different disaggregate displaced domestic draft each electric electricity elektronischer energy engineering enviromental equipments evaluated evaluation everyiteration 
evolutionsstrategien farinaccio feddersen find finite five flow forney fourth from fsms fuzzymodelsforpattern generated genetic genetische gentic george german germany hart haushaltsz heaters heidelberg heinzmann hidden hler homes house household icdm ieee improve individuals individualsordered information installedelectricity international into intrusive isbn isusedtoupdatetemporarystoredparametersofdetected italy jamesc journal july june kohonen lastenheft lawrence like linda load lower machines major maps markov mathematical mathematischen meterin meters method methods mining models monitoring montreal neburg network nonintrusive number october ofdata offers operator optical ordinary organizing paper parameters pattern patterns performance powertech presented press proceedings quality quebec rabiner radu ratschl real recognition reduces references refrigerators results rough search selected self sensor sequences series shannon show siences simulated simultaneously soft sources spectrum speech springer state stoves structre stuttgart sufficient suitableandrobustparametersettodetectappliances summary system takes technical techniques tested tests teubner that theory three toelaboratea total transitions tuned tutorial typical umgang ungewissheit university upper uses using variety vdew verlag version viterbi voss wesley when wide with zmeureanu http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420186abs.htm 23 SCHISM: A New Approach for Interesting Subspace Mining Karlton Sequeira and Mohammed Zaki about absolute adaptive after agarwal aggarwal agrawal algorithm alimoglu alpaydin also american annals applications applied approach association asymptotic automatic available based between beyer bounded breaking brusco chakrabarti cheng chernoff choudhary classifiers clustering clusters code combining completed concepts conclusions conf conference connected contributions cradit curse data databases define definition density design determine different 
dimension dimensional dimensionality ding distance easy efficiency efficiently entropy european fast faster finding fourth framework francisco frequent functions gehrke generalized goil goldstein gouda grid grids guarantees gunopulos handwriting heuristic high highdimensional hinneburg hoeffding hypothesis icdm icdt ieee imielinski indexing inequalities interesting interestingness international intuitively itemsets itself jones kailing kamber kaufmann keim kriegel kroger kumar large local making massive math maximal meaningful means measure mehrotra methods mining montecarlo morgan multiple murali nagesh nearest neighbors networks neural numerical observations ofitems opposed optimal parameters park patterns penbased pkdd probability problem proceedings process procopiuc projected projective provides prune psychometrika raghavan ramakrishnan random ranking recognition reduction references relatively reported representations requires right rules schism search selecting selection sequek sets shaft siam sigkdd sigmod simon spaces srivastava statistical statistics subspace subspaces sums swami symp systematic techniques tests these towards traditional turkish user variable variables vldb wanka when which wolf zaki zhang http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420463abs.htm 76 Revealing True Subspace Clusters in High Dimensions Jinze Liu, Karl Strohmaier, and Wei Wang Department of Computer Science, University of North Carolina, Chapel Hill, NC 27599 {liuj, strohma, weiwang}@cs.unc.edu acknowledgement adhesion advice after aggarwal agrawal algorithms alok among analysis andrew applications automatic base based between biologists both cell certain cheng choudhary chuck clustering clusters columns common conclusion considering data datasets defined density developing dimension dimensional discussion divided each edition effectiveness efficient embedded entropy erasmus erim experiments expression extended filtering finding flannery framework from 
fuzzy gaussian gehrke gene generated genes given goil grid gunopulos harsha help high highlight hyperrectangular icde icml identifying includes institute interpretable intervals into kaymak large length less levels line logarithms mafia management matrix measure measuring meta mining minutes mixtures moore noise normalized number numerical ongoing only organize overlap overlapping paper park pelleg perou point press process procopiuc projected provide raghavan real reasonable recipes rectangles redundancy references report research result revealing rotterdam scalable selected setnes sets shape show sigkdd sigmod significant similarity simple soft space spaces strength strohmaier subspace subspaces synthetic tailed technical teukolsky than thank that their them then this threshold time took transformed type university using valuable very vetterling wang were which whole with wolf work zhang http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420379abs.htm 55 Decision Tree Evolution Using Limited Number of Labeled Data Items from Drifting Data Streams Wei Fan, Yi-an Huang, Philip S. Yu
accurate active aggarwal also analysis applications approaches april august available babcock babu based between both brand callaghan changes changing chen china chunk class clustering comparable computer concept conclusion conducted conf conference constructed continually current data database datar decision diagnosing dimensional discovery distribution domingos dong drifted either evaluating evolution evolve evolved evolving examples existing expanding expansion experimental extended find focs found foundations fourth framework francisco from guha have higher hongkong however huang hulten icdm ieee immediately important international issues items knolwedge knowledge labels large leaf life loss madison management maximum methods milshra mine mining model models more motawani motwani multi necessary node nodes online order original pages pass pattern pods press principles probability problem problems proc proceedings progress proposed queries real references regression replacement replacing requires science seattle series siam sigkdd sigmo sigmod significance significantly similarity solves spencer statistical stream streaming streams studies suspected symposium synthetic systems test than that time tolerable tree true tutorial using very vldb wang washington were whenever widom wisconsin with http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420475abs.htm 79 Estimation of False Negatives in Classification accuracy accurate actual address adomavicius along also amer analysis another applications ascertainment assessing assn automatic axis been better between biometrics biometrika blake book burke capture case census chapter christensen chronic classified classifier classifiers closed color colton completeness conference contingency continuous cover cross current darroch data databases dataset different diseases distributions edited elements entries entropy epid epidemiological epidemiology estimate estimated estimation evaluation even false features 
fienberg first former fourth francisco frank furthermore gave generation given goldberg goodman having hence higher hook icdm ieee illustrated implementations independence independent information interactions international issues iterative java jozef kantardzic kaufmann knoke learning light likewise limitations linear logistic lower machine mane march math medical mehmed merz method methods miner mining missed missing moddemeijer model models more morgan multinomial multiple mutual need negatives note noted number obtain obtained obtaining opportunity pair plotted population positives possible practical presented proc procedure proceedings publications quasi range real recapture references regal regression relatively repository represents research respectively reviews sage sales screening second shown sidel sources springer srivastava stat subfigure subfigures sufficiently table tables tabulation than that theory thomas though thus tools total training using vayghan verlag versa vice were when wiley will wise with without witten wittes world york zurada http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420487abs.htm 82 Privacy-Sensitive Bayesian Network Parameter Learning acknowledge acknowledgements advances agrawal algorithm almost analysis approximations april assuming august authors automata baltimore based bayesian berlin between binary breaches buena case chen classification clifton colloquium computation computer conclusions conditional conference considerably considered corresponding count county crypto cryptology data dataset datta decomposition department diego different discussion distributed economist electrical engineering entries equations error estimate estimated evfimevski exchange expected experiments explorations feature feigenbaum foundation foundations fourth free from gaussian gehrke generate generated given grants heterogeneous hold icalp icdm identical ieee implementation information inner international involving ishai june 
kantarcioglu kargupta lake languages learning lecture less limiting lindell linear malkin maryland matrix mean melbourne method mining multi multiparty multiple multiplicative multivariate national network nissim noise notes november obtain obtained orthogonal orthogonalized pages parameters parties partitioning party perturbation perturbed pinkas pods preserving privacy probabilities problem proceedings produces product programming projection properties proposed ramakrishnan random references regression relevant report required requires results ryan science seattle secrets section secure sensitive showed siam sigkdd sigmod single sivakumar solution some springer srikant states statistical strauss structure subset supports symposium table tabulated technical techniques tenth that then this tools true unit united university using vaidya valued values variables variance vectors vertical vista volume wang were where whose with wright yang zero http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420483abs.htm 81 Active Feature-Value Acquisition for Classifier Induction academic acapulco access accuracy acknowledgments acquire acquisition acquisitions active addressed addresses also alternate alto applied approach artificial assume assumes atlas attempts attribute austin balaji bayes blake budgeted class classifier classifiers classifying cohn compared complementary complete computationally concept conclusions conf conference consider cost darpa data databases dataset datasets developed different discovery discrete economical effective efficient error estimation etoys examples existing expedia experiments feature featurevalues federov focuses fourth framework francisco frank from general generalization given goda grant greiner have however html http hurt icdm ieee implement implementations improving incomplete individual inducing induction inductive instances intelligence international intl issue ithaca java june kaufmann kimbrough know knowledge labeled 
labels ladner learners learning like lizotte machine madani measures melville merz mexico minimize mining mlearn mlrepository mooney more morgan most naive optimal padmanabhan pages palo personalization policies policy practical prem presented press priceline probability problem proc proceedings propose providing provost quinlan randomly ranking rather raymond recent reduction references related report repository respect results saar sampling select selected sensitive sigkdd significantly simple size some specific superior supported table technical techniques test texas than thank that theory this tools traditional training tsechansky turney types uncertainty university unknown unlabeled usage useful utility value values were what which with within witten work workshop would zheng zhiqiang http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420241abs.htm 30 SUMMARY: Efficiently Summarizing Transactions for Clustering Jianyong Wang and George Karypis accelerating accuracy accurate achieving addition agarwal aggarwal agrawal algorithm algorithms also among antonie approximation association attributes bamboo based basket bastide bayardo beil best between biological block boolean boulicaut brin brute burdick bykowski cactus calimlim candidate carpenter categorical categorization charm chen choose cikm class classification closed closet clustering clusters cmar computing condensed conference confidence cong constraint constraints counting data databases datasets decreasing deeply design developed directly discovering discovery distributed dmkd document dynamic each effective efficient efficiently enabled ester evaluated even explore exploring extremely fast fimi finding force fourth free frequency frequent from fung future gade ganti gehrke generation goethals grahne guha gunopulos hash hicap hierachical hierarchial high hsiao hyper icde icdm icdt ieee imielinski implementations implication integrating international introduction item items itemset itemsets 
iterative journal karypis knowledge kumar lakhal large length lengthdecreasing long lpminer mafia mamoulis mannila market maximal methods mine mines minimum mining most motwani multiple nishio novel number parallel park pasquier pattern patterns performance plan prasad prefix presence preservation proceedings projected projection properties pruning pushing queries querying ramakrishnan randomized rastogi reduce references representation representations rigotti robut rock rule rules runs saluja sampling search searching seno sentences sets shim showed sigmod significantly some space specific srikant steinbach storing strategies structure study summaries summary support supported swami tang taouil term text that toivonen tough transaction transactional transactions tree trees tsur tung ullman uses using variant very vldb wang when which while with without workshop xiong yang zaiane zaki http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420431abs.htm 68 Integrating Multi-Objective Genetic Algorithms into Clustering for Fuzzy Association Rules Mining achieve achieved addition adjusts advantages agrawal algorithm algorithms alhajj analysis application approach approaches appropriate artificial association attribute attributes automatically base based case chan changing chien cikm classical clustering clusters comparative conclusions conference congress cure data database databases demonstrated described desired determined direction discovered duration each effectively efficient evolutionary experiments figure finally first fourth from functions fuzzy gabased genetic given gives guha hong icde icdm ictai ideal ieee ifsa important information intelligence intelligent interesting international interval itemsets kaya large larger lent literature membership method methods miller minimum mining moga more multi nafips number objective obtain obtained optimal optimized optimizes optimum other outperforms over paper pareto possible proc proceedings proposed provide 
quantitative ranges rastogi references relational reports result results rules second sets shim show sigmod solution solutions springer srikant strength study summaries summary support swami systems tables than that these thiele this three through together tuning used using values which widom work world yager yang zhang zitzler http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420495abs.htm 84 A Comparative Study of Linear and Nonlinear Feature Extraction Methods acadamic algorithm algorithms american analysis application applications association based cambridge chen cities classification clustered combining comparative compared complexities complexity computational computer conference costs cristianini data datasets decomposition department difference differences dimension direct discriminant does dramatic edition effective engineering extraction face feature figure first fourth friedman fukunaga generalized gsvd have high highdimensional however howland icdm ieee indicating instead international introduction introductiontostatisticalpatternrecognition jeon journal kernel learning less liao linear lowest machines make matrix method methods mining minnesosta nonlinear null obtained other overall pardalos pares park part pattern performance performances performed preserving press problem problems proceedings recognition reduction references regularized reports rlda sample science second shawe should show siam singular size small solve space statistical step structure study support system table taylor technical text than that therefore third tobe transformation transformed twin undersampled university using value vector which while with yang http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420455abs.htm 74 Classifying Biomedical Citations without Labeled Training Examples Xiaoli Li*, Rohit Joshi, Sreeram Ramachandaran, Tze-Yun Leong* School of Computing, National University of Singapore Computer Science Program, Singapore MIT Alliance* aaai 
accuracy accurate achieves acknowledgments agency agrawal algorithm algorithms alliance applied approach artificial association based baseline bases bayes best better blum both build cancer caner cannot categorization chang citations classification classifier classify cleaning colorectal combining comparison computational concepts conclude conclusion conf conference data database databases dataset decreases dempster directly discovery documents does domain each education engine established event example experiment experimental extract fast figure figures filt filtering first fourth from general gets gives google grant help here higher http hurt icdm ieee important improve improves incomplete increases intelligence international intl iteration iterations iteratively journal keywords know knowledge labeled laird large learning likelihood machine makes maximum mccallum medline mined mining ministry mitchell models moreover mutually ncbi nigam obtains ontological original other page pages paper partially pebl perform performance phrases positive proc proceedings promising propose proposed pubmed qualified query references reinforcing research respectively result results royal rubin rules sars science search second show shows significantly singapore society specific srikant star statistical step supervised supported technique techniques technology terms text than that then theory this three thrun training umls umlsks unlabeled using utilize various very which will with without words workshop http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420075abs.htm 9 Non-Redundant Data Clustering David Gondek Thomas Hofmann Department of Computer Science, Brown University Providence, RI 02912 USA {dcg,th}@cs.brown.edu accounting advances advantage again agree algorithm allerton also american annealing annual application artificial association asymptotic balancing based been below better bialek bottleneck bounded bucila cardie chapman characterization chechik 
classification clus clustering communication compression computing concave conditional conditions conf conference consequence constrained constraintbased constraints control convergence correspond corresponding counterparts craven data database databases depends deterministic differentiating dipasquo discovery distance distribution documents dropped dual dualminer every explicitly exploratory extract extracting fact finally first follows fourth freitag friedman from fulfill function gehrke generalized gets ghosh given global gondek hall have hofmann icdm ieee information instance instancelevel intelligence international intl isisconcave itemsets iteration joint jordan kamvar kifer klein knowledge labeled lakshmanan large learning lemma level linear linearfor logarithm london machine makes making manning maxima maximum mccallum mccullagh method metric mining mitchell model models more moreover mosenzon most multivariate nelder neural nigam nineteenth nips normalization objective observe obtained only optimality optimization optimizations optimize order over pages pang parameters pcxy pereira point precisely prior probabilities problems proc proceedings processing prove pruning readability reduced references regression related relation relevant respect resulting results rewritten rose rule russell saddle scheme separately sets setting seventeenth side sigkdd sigmond simple since slattery slonim soft solutions space stationary straightforward structures subscripts subspaces suggestively symbolic systems tering terms text that their theory there these this thrun tishby tung uncertainty unlabeled update using wagstaff where white wide will with workshop world xing zero zhong http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420571abs.htm 103 Learning Rules from Highly Unbalanced Data Sets aaai achieved addressing algorithm analysis application applications applied artificial attributes based budget challenges chimerge class classifiers comparison conclusion 
conference cost curse data described designed despite detection diego direct discovery discretization distributions domingos enforcement enough exiting fifth figure forth fourteenth fourth general generic high highly holte icdm ieee images imbalance imbalanced inspection inspiration intelligent interlligence international japkowicz jose kaufmann kerber knowledge kubat learning ling machine making marketing matwin metacost method mining morgan national novel numeric other posed press problem problems proceedings radar reasonably recognition references results rlsd rule sampleing satellite sensitive sets several sided sigkdd similar solutions spills stephen study such systematic tenth these this tools unbalanced uncertainty well with york http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420439abs.htm 70 GREW--A Scalable Frequent Subgraph Discovery Algorithm Michihiro Kuramochi and George Karypis Department of Computer Science and Engineering University of Minnesota {kuram, karypis}@cs.umn.edu about according acknowledgment algorithm also although artificial astronomy aviation background based beam benchmark berkeley berthold best borgelt california called clip comments comparison complete computer concept conclusions conference connected cook could created data dataset datasets default dept description directly discovered discovering discovery diverse each efficiency efficient eight either evaluated experiments figure find finding finish four fourth frag frequencies frequency frequent from gave graph graphs grew gspan hand harvard heuristic highly hiroshi holder hour hours huan icdm ieee implementation induction inference inokuchi input intelligence interesting international isomophism karypis kindly knowledge kuramochi large learning least length machine matsuda measured ments mine minimum mining minnesota molecular molecules more most motoda number only operate osaka other pages paper parameters pattern patterns presence presented principle prins proc 
proceedings providing real references related relevant report reported research results runtime scalability scalable science seconds shorter showed shown significantly single size sizes sparse spends structured stsci subdue subgraph subgraphs substructure substructures system table takashi technical tetsuya than thank that their this three times tkde took ucla ucop undirected university useful using various version very wang washio which whose window wise with within yoshida http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420281abs.htm 35 A Polygonal Line Algorithm based Nonlinear Feature Extraction Method Feng Zhang Texas A&M University College Station, Texas 77843 zhangfeng@neo.tamu.edu about accuracy algorithm also american analysers analyzers apley applications approximating approximation arising artificial association avoid banfield based been believed between bishop boundary case chang classification clustering complexity component computation computing conclusion conference corresponding curve curves data dayan decrease design digits dimension dimensional each effect effective estimation experimental extended extensively extract extracted extraction factor feature features figure finite floe fourth from gave ghahramani ghosh handwritten hastie helps hinton icdm identification ieee image images indicate initialization international interpretation jasa journal kambhatla krzyzak learning leen linder line linear local manifolds mathematical method methods mining mixture mixtures model models morphology multivariate networks neural nips noise nonlinear number outperforms paper parameter pattern plot polygonal practical press previous principal probabilistic problems proceedings processing propose proposed raftery recognizing reduced reduction references report results revisited revow satellite sets shown space spie statistical statistics studies stuetzle submitted subspace successfully technical techniques that this tibshirani tipping toronto tpami 
transactions underlying university using utilized variation vector which with zeger zhang http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420209abs.htm 26 Pei Sun University of Sydney School of Information Technologies Sydney, NSW, Australia psun2712@it.usyd.edu.au again algorithm algorithms another applications apply applying approach area around artificial attributes august autocorrelation based bases because bernstein both bound break breunig calculate calculation california captured captures carry causes change chapman chen city clear climate compared computer conference consequences consists constant consumed contemporary current dallas data database databases datasets december density desire detecting detection deterministically directly discover discovery distancebased editors effects elseiver encyclopedia environmental factored feature figure florida fourth francisco from function future geoinformatica global graph hall have hawkins heteroscedasticity icdm ictai identification identifying ieee index instability instead integration intelligence interesting international into kaufmann knorr knowledge known kriegel large leaves less lier like likely local london mainly management mcphadden measure melbourne method methods mining more morgan most motivated multiple naughton nearest neighbor neighborhood neighbors never nina nino novel november number oscillation outlier outliers pages part patterns potentially prentice proceedings proposed quantifies queries reduces references related relative results running sacramento sander science search seventh sharper shashi shekhar shown sigkdd sigmod slom society southern spatial stable standarddeviation state statistical summary surprising system techniques texas than that therefore they time tion tools total tour tree unified useful value values variance very which wilcox with work would york zhang http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420375abs.htm 54 Using Representative-Based 
Clustering for Nearest Neighbor Dataset Editing about accomplished accuracy achieved achieves achieving algorithm algorithms also although another applications approach april asymptotic average based been benchmark california called canada centering classification classifier cluster clustering compare compression computer computing conducted conference consisting cyber cybernetics data dataset datasets decision dept easily edited editing eick eight empirical enhanced evolutionary example examples experimental experiments fourth gains general generation graphs have high houston html http icdm idea ieee importance important improvements initially interface international introduced irving june learning look loss machine master medoid mining misclassified mlearn mlmta mlrepository montreal more moreover nearest neighbor note number only other outperforming paper penrod prediction proceedings progress properties proposed prototypes proximity publication rate rates recent references removing replace repository representatives research results rule rules science several show significant small splitting stress style submitted subset summary supervised surprisingly symposium syst systems tasks technique techniques tested than that thesis this toussaint traditional training trans transactions tree unedited university used using vegas very wagner were wilson with without zeidat zhao http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420035abs.htm 4 On Closed Constrained Frequent Pattern Mining abstract accurate adaptive addressed agrawal algorithm algorithms andl anticipated application applications applied approach april association associations august based bastide bayardo being benefits best between bonchi bonsai both boulicaut bucila candidate cannot characterized charm closed closet computational computing conciseness conclusions condensed conference constrained constraint constraints convertible data database databases deep deeply definition derived 
discovering discovery dual dualminer efficient efficiently eighth engineering examiner exante exploratory extended fast finally finding fourth fragment free frequent from gaining gehrke generation giannotti goethals growing have hsiao icde icdm icdt ideas ieee ijcai imielinski increasing international into issues item items itemsets jeudy kifer knowledge kramer lakhal lakshmanan large level levelwise literature long lossless losslessness mannila mazzanti mining miningfrequent molecular monotone multiple optimizations optimized overlooked page pages pakdd pang paper pasquier pattern patterns pedreschi performance pkdd point possible postprocessing previous problem proc proceedings process provided pruning push pushing qualitative quantitative raedt reduction references representation representations requirements research rules satisfying searching selectivity sense sets shown siam sigkdd sigmod small space srikant strategies swami symposium taouil that this thus toivonen tough trees under uses using version view vldb wang which white whole wise with without works workshop zaki http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420407abs.htm 62 Mining Ratio Rules Via Principal Sparse Non-Negative Matrix Factorization Chenyong Hu, Benyu Zhang, Shuicheng Yan, Qiang Yang, Jun Yan, Zheng Chen, Wei-Ying Ma advances agrawal aims algorithm algorithms appendix argmin association associations aumann auxiliary because between called cell code completes components conclusion conference convergence convex corresponds data databases desired difficult discovery each easily easy emergence experimental factorization faloutsos fast field fixed following fourth from function given gives holds icdm ieee illustrate images imielinski information international items korn kotidis labrinidis large latent leaning learn learning level likewise lindell logvij matrix minimize minimizing mining more multiple multiplicative natural nature negative neural nonnegative objective
olshausen paradigm principal proc proceedings processing proof properties proposed prove proves psnmf quantifiable quantitative ratio receptive references relational representing results rule rules sets setting seung show sigmod simple since solving sparse srikant statistical suited swami systems tables testify than that theory therefore this thus update updated updating verify vldb where which wikhkj will with wklhlj work http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420339abs.htm 45 A biobjective model to select features with good classification quality and low cost Emilio Carrizosa Facultad de Matemáticas. Universidad de Sevilla (Spain) ecarrizosa@us.es Belen Martin-Barragan Facultad de Matemáticas. Universidad de Sevilla (Spain) belmart@us.es Dolores Romero Morales Saïd Business School. University of Oxford (United Kingdom) dolores.romero-morales@sbs.ox.ac.uk adapted adapting allwein approach bayer binary biobjective both call case classifiers computational conference cost data description designed detailed diagnostic eecs efficient find formulations fourth from generate http icdm ieee instance integer international journal july knapsack learning library linear machine margin method mining mixed multiclass norm objective oregon oregonstate pareto phase policies problem problems proceedings reader reducing references research respectively results schapire sensitive singer solutions solving specifically state tackled thesis unifying university using which zubek http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420427abs.htm 67 Orthogonal Decision Trees Hillol Kargupta and Haimonti Dutta Department of Computer Science and Electrical Engineering University of Maryland Baltimore County, 1000 Hilltop Circle Baltimore, MD 21250 hillol, hdutta1 @cs.umbc.edu accuracy acknowledge acknowledgments advances aggregated algorithm also although analysis analyzing approach around associated author authors available award bagging based best
better boosting breiman captured career circuits classificaiton classification clearly coefficient colleagues complexity components concludes conclusions conference considers constant construct constructing convergence cortes data decision depth directly discovery done drucker each earlier efficient eigenvalues eigenvectors engineering ensemble ensembles environments equivalent example exploits explore explored figure first five following forests found fourier fourth francisco free frequently from functionally future generalization gives grant icdm ieee illustrates induction information international introduced issues journal kargupta knowledge large learnability learning like linear linial machine mansour many matrix methodology mining mobile models most nasa networks neural nisan notion offered offers only onto opens orthogonal other pages paper park performed plan possibilities predictors principal proceedings processing produced projected properties proposed provide quinlan random reducing redundancy references represent representation representations research respectively scale section seventh several should sigkdd sigmod significant significantly simpler smaller spectrum stability stacked stacking streaming streams street supports system systems table techniques that them theory these they this thus tools transactions transform tree trees using variance were with wolpert work workshop http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420011abs.htm 1 Subspace Selection for Clustering High-Dimensional Data Christian Baumgartner, Claudia Plant University for Health Sciences, Medical Informatics and Technology, Innsbruck, Austria {christian.baumgartner,claudia.plant}@umit.at Karin Kailing, Hans-Peter Kriegel, Peer Kröger Institute for Computer Science, University of Munich, Germany {kailing,kriegel,kroegerp}@dbs.ifi.lmu.de academic according accurate acknowledgment acmsigmodint agarwal aggarwal agrawal algorithm algorithms anders andh andj ankerst
applications approaches austrian automatic based baumgartner bioinformatics biolology bmbf botstein breunig broad brown called carlo cell cerevisiae certain choi classification clustering clusters comparative complex comprehensive concepts conclusion conf conference contrast cycle dash data databases density dimensional dimensionality discovering discovery disorders does education effectivity efficient eisen empirically ester european evaluation experimental fast favor feature filter finding fourth fund futcher gehrke generalized genes german global grand grant gunopulos high hitt hybridization icdm identification identify ieee industrial inproc input interesting interestingness international introduced iyer jones kailing kamber knowledge kriegel large learning less liebl machine management marini metabolic method methods microarray mining ministry molecular monte more most murali newborns noise oger olgem oller only onmanagement optics ordering outperforms paper parameter parameterless parts pkdd points practice press principles problem proc proceedings procopiuc projected projective promotion raghavan range ranking ranks recent references regulated rely research roscher saccharomyces sander scheuermann science selection settings sherlock showed shows sigmod solution space spatial spellman stable structure subspace subspaces supervised supported surfing techniques technology terms that their this threshold umit wanka weinberger which with work yeast zhang http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420499abs.htm 85 SVM and Graphical Algorithms: a Cooperative Approach François Poulet ESIEA ­ P ECD BP 0339 53003 Laval Cedex - France poulet@esiea-ouest.fr aaai according accuracy accurate advances algorithms allows analysis appear applications approach automated automatic avidan avoiding based becker bengio between cambridge camp caragea classification classifications classifiers cleveland collobert comp comprehensibility computational computer 
conclusion conference cook cooperation cooperative coordinates cristianini data dataset datasets deal decision dedicated detection detective dimensional discovery displaying distance distribution dynamics easy ecml enterprise environment evaluate evaluation example explain fayyad feature figure fine finite finland first fourth frontier fullview fung future graphical graphics hammoudi have help helsinki here high histogram honavar icdm ieee image increase increased information infoviz inselberg institute interactive international introduction journal kernel kind kluwer knowledge lagrangian large last learning line linked machine machines madison mangasarian margin matrices melbourne method methods mining mixture models multidimensional nature needed neurocomputing newton nips obtained only optimization other parallel parameter perform piatetsky piattini pkdd plot poulet presented press problems proc proceedings quality reduce reduced references regression report result results rsvm same scale scatter science selection separating sept shapiro shawe simple smyth springer started statistical step supervised support surface svms systems tasks taylor technical that then theory this time tool tools towards tree tuning type understand univ university used useful user uthurusamy vapnik vector verlag very views visual visualization visualizations when wilks will wisconsin with wong work workshop york http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420043abs.htm 5 Efficient Density-Based Clustering of Complex Objects Stefan Brecheisen, Hans-Peter Kriegel, Martin Pfeifle Institute for Computer Science, University of Munich, Germany {brecheisen, kriegel, pfeifle}@dbs.informatik.uni-muenchen.de accelerated access agrawal algorithm algorithms also ankerst approach baeza based benefit bounding braunm brecheisen breunig broad carried ciaccia cikm clustering clusters compared complex compulsory computations compute computing concept conference content correct dasfaa 
data databases dawak dealing demonstrate demonstrated density dimensional direct discovering distance distances dubes dynamic effective efficient efficiently ester evaluation exact experimental faloutsos feature filter fodo fonseca fourth from full further future gaede guttman hall high icde icdm identify ieee index indexing integration international jain join jorge kriegel large leads ller lower marroqu mashael means method methods metric mining multi multidimensional multiple nauer navarro noise nther object objects only optics ordering other paradigm patella performance pfeifle points prentice proceedings processing queries query range real references representations result retrieval sander scan schubert search searching seidl sequence sets sigmod significant similarity spaces spatial speed stage step structure structured supporting surveys swami table test that they traditional tree trees using vectors vldb voxelized well where will with work world yates zezula http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420371abs.htm 53 Extensible Markov Model Margaret H.
Dunham and Yu Meng able accuracy acknowledge acknowledgements advanced advantage agency agents algorithms applied approach archive area aspect authors autonomous available ayewah behavior being bhat birch black briefly british clearly clustering compares compression computer conference coordinating cormack dani data databases dataset degree department depends designed detail detection dunham dynamic dynamics east edition efficient elements emms environment event examined experiment experiments expressions extensible facial flow forecaster found fourth future goldberg good gratefully gregory group grows hall have helpful higher horspool html http icdm ieee image index information input interaction international intro introduced introductory john journal laboratory large least leeds length level line linear livny local look maja margaret markov mataric method miller mining minnesota miron mndot mobile model modeling models more motion narayan nare nathaniel national nercwallingford network neural nodes north nrfa observed office ongoing other ouse pages parameterized performance predicted prediction prentice proceedings processes profile provided providing raghu ramakrishnan rare rate recognizing references regression reported research ridings river robot seattle sequences services shows sigmod simon size solution sons sophisticated spatiotemporal specifically states stochastic studied sublinear summarize summary system table take tdrl technique testing than that their third this thus tian time tool topics training transportation tuple typical using variable very vision washington water well where wiley will with work yacoob zhang zhigang http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420083abs.htm 10 Fast and Exact Out-of-Core K-Means Clustering Anjan Goswami Ruoming Jin about accelerating accrue accurate adam advances algorithm algorithms analysis analyzed andrew anil annual application applied approaches approximate approximately approximating 
archive assoc attribute attributes august available badoiu bangalore basic believe bengio berkeley berkhin better between bottou bradley braverman callaghan categories categorization centers chandra chapter charikar charles chekuri cherikar classification clearly cluster clustering clusters column compare compromise computation computer computing concepts conclusions conference consider containing continuous convergence core correct cory could counting create created data database databases dataset datasets dayal difference different dimensions discovery disk domingos dubes dynamic ealbaum editor editors eighteenth either elkan engineering entire errors evaluated evaluation exact execution existed experimental experiments explorations extensively farnstrom favors fayyad feature feder fekm fifth final following fourth fredrik frequency frequent from further general geoff geometric getting ghosh guha hall handbook hartigan have high hulten icdm idea ieee images increased incremental india indyk information integer international jain james jiawei joydeep kamber kaufmann kelvin kept kmeans knowledge krithi labeled large lawrence learning leen leon leung lewis liadan like machine macqueen march massive mathematical means method methods meyerson micheline mihai million mining mishra moore morgan moses most motwani multivariate neural nificantly nina nittel none nong normalized noted number observations observed obtained ocallaghan ones original pages panigrahi passes pavel pedro pelad pelleg piotr practice prentice presented presents press probability problems proceedings processing produce produces properties provably publishers quality question rajeev ramamritham randomly real reasoning records reduce references regard reina report reported required requires resident resulting results retrieval reuters revisited richard rina running same sampled sampling sariel scalability scalable scaling selecting sets should shown shows sigkdd significant silvia similar size small 
society software some speeding speedup squared statistics stream streaming streams sudipto super supersampling survey symposium synthetic systems table technical techniques tesauro text that then theory these they this three thus times tomas touretzky transactions typically umeshwar usama used using valued values vectors view vijayaraman volume were what which while with wong words yoshua zero http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420443abs.htm 71 Predicting Density-Based Spatial Clusters Over Time Chih Lai Nga T. Nguyen Graduate Programs in Software Engineering University of St. Thomas St. Paul, MN 55125 clai@stthomas.edu ntnguyen1@stthomas.edu actions airborne algorithm algorithms also among approach appropriate areas associated based because between birch clai cluster clustering clusters computations concentrated concepts conclusion conf conference confirm conflicting contents coot coots cots current data database databases density densitybased detailed determining discarded disconnected discover discovering discovery each effective efficient environment ester european existing experiments filter filtered filtering first focus formulas fourth from further future guide have higher hour http icdm ieee important incremental information inserted international intersect intervalchecking into kamber kaufmann knowledge kriegel large livny long management method mining mobile more morgan most much needed neighborhood never next nguyen noise object occur occurs only other over pairs particular patent paths period periods persist personal places precisely precision predicting prediction predictions prepare present proc proceedings process propose provide ramakrishnan reduce references referred relationship relationships remaining respectively results reveal right sander segment sigmod simple snapshot space spatial specified spws stthomas study such system techniques than that their then this time toward uninteresting unnecessary unwanted used users 
utilizes vehicles very vldb want warehousing when where will wimmer window with zhang http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420162abs.htm 20 Improving Text Classification using Local Latent Semantic Indexing Tao Liu Zheng Chen* Benyu Zhang* Wei-ying Ma* Gongyi Wu Nankai University, China * Microsoft Research Asia Nankai University, China liut@office.nankai.edu.cn *{zhengc,byzhang,wyma}@microsoft.com wgy@nankai.edu.cn acknowledgments actually algebra algorithms always analysis another approach assigned authors automated automatic background berry brien butterworths captures categorization chen cikm classifcation classification classifiers clustering combining comparative compared comparison computing concentrate conclusion conference crucial curve data deerwester descending developed different dimension dimensional discriminant discrimination discussions document documents done drops dumais ecml edition effective efficient evaluating evaluation exact examination experimental experiments feature features feedback found four fourth from fruitful furnas fuzzy gerard gives global greatly gunopulos harman harshman have hearst help higher hirsh hull icdm icml idea ieee important improve improving indexing information institute intelligent international introduced joachims journal keeping landauer latent learning lewis like linear local machine many meachines method methods mining modeling more most much nature nearby network neural noise optimize optimizing original paper pedersen performance phrases pirolli polytechnic predictors presence problem proc proceedings processing projections propose quite reduction references region relevance relevancy relevant report representation representations results retrieval review rijsbergen ringuette rocchio routing salton schutze science sdair sebastiani selection semantic separating several show siam sigir silverstein site slightly smaller smart smooth society space spotting springer stanford state 
statistical structure study succeeds suitable support surveys system task text than thank that theory therefore thesis this topic torkkola tracks trec university using utility vapnic vector verify virginia weigend weighted weights weiguo wensi which whole wiener with work would xerox yang zelikovitz http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420527abs.htm 92 On Ranking Refinements in the Step-by-step Searching through a Product Catalogue Nenad Stojanovic Institute AIFB, University of Karlsruhe, Germany nst@aifb.uni-karlsruhe.de acknowledgement after american approach based behaviour being bmbf buckley calculated calculation carries case characteristic coefficient comprehensive conclusion conference constraint constraints contain converges data defined destination distributed elementary expand failures feedback financed focus follows forgetfulness forgotten fourth frank from function further given have highly however icdm identity ieee impact implicit improving include incorporating infinite information informativeness international into inverse journal kaufmann lalmas learns less line logic matrix mining model moreover morgan nature navigating navigation need next note number ontology operation otherwise over paper partially past path performance preferences present presented press prioritises probability proceedings process project publisher queries query ranking reaching references refinement refinements regarding related relevance relevant requires research retrieval retrieves rijsbergen ruthven salton science search selfimprovement semiport several society sources starting step steps stojanovic stop successes such tailored target technology that then this through thus transitions traversing underlying uniformly user uses using usually very weight where witten http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420547abs.htm 97 A Greedy Algorithm for Selecting Models in Ensembles aaai aarts accuracy algorithm algorithms also analysis 
annealing annual appears applications applied approaches arbiter assignment automated avail averaging bagging base bayesian before between blake boosting both bowyer bradley breiman california called case chan classifications classifiers clustering combination combiner combining compared comparison computation computer computing conceptual conclusion conf conference constructing construction data databases datasets decision delegating department developing dietterich different discovery distributed distribution draper enhances ensemble ensembles estimates experimental experts fayyad fourth frank freund from future greedy hand handwritten heterogeneity heterogeneous hierarchical homogeneous html http huang icdm ieee implementations includes information initialization intelligence international interscience introduced irvine iterative jacobs java jordan kaufmann kegelmeyer knowledge kuncheva laarhoven labeling learning local machine majority mateo mechanisms menlo merz methods michalski mining mixtures mlearn mlrepository model modeling models morgan moving multiple natural neural norwell numerals numerical other outperform outperforms park partitioned partitions pattern portions practical prediction predictors press problems proc proceedings programs quinlan randomization recognition references refinement reflect reidel reina repository resulting scaling scenarios schapire schemas science selecting selection several simulated singer small software specialize springerverlag stepp stolfo suen symposium system taxonomy techniques that theory three tools traditional trans trees unconstrained university using versus voting waikato warmuth weak weighting weka when wiley with witten woods work york http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420519abs.htm 90 Cluster Cores-based Clustering for High Dimensional Data algorithm attributes based categorical clustering clusters computers conf conference connectivity data densities different dimensional 
engineering ertoz finding fourth francisco graph guha hartuv high icdm ieee information international intl jarvis kumar letters measure mining nearest neighbors noisy patrick proc proceedings processing rastogi references robust rock shamir shapes shared shim siam similarity sizes steinbach transactions using http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420423abs.htm 66 Filling-in Missing Objects in Orders Toshihiro Kamishima Shotaro Akaho National Institute of Advanced Industrial Science and Technology (AIST) AIST Tsukuba Central 2, Umezono 1-1-1, Tsukuba, Ibaraki, 305-8568 Japan mail@kamishima.net (http://www.kamishima.net/) s.akaho@aist.go.jp ability according acknowledgments active advantage algorithms analysis analyzing apparently applied architecture arnold arose artificial assuming balakrishnan based baseline baselines becomes bergstrom best better between breese case central changes chapman characteristics clearly clickthrough collaborative common commonly comparative compared comparisons computer conclusions concordant conf conference cooperative correlation correlations could course data decrease default deviations differences disappear discovery effective emprical engines equal equation estimated estimation evaluated except expanded figure fill filtering first fixing fourth from grant grouplens hall have heckerman however iacovou icdm ieee illinois improved inappropriately indicates ineffective intelligence international itself japan joachims john judgment kadie kamishima knowledge larger least lengths list make marden meaning means measured measurement method methods mining missing modeling monographs more mosteller nagaraja nantonac netnews nonpersonalized note notion number objects observed open optimizing order orders original osgood other pages paired paper part performance personalized popular predictive preference preferences press probability proc proceedings promotion proposed provide provided psychological psychometrika quality 
rank ranked recommendation recommendations recommended references relative remarkably remarks resnick response responses results review riedl sample science search seller sets shared sharing shorter shows significant similarities simple since size sizes society solution sons sorted spearman specific squares standard statistically statistics such suchak suci superior supported tannenbaum than that then therefore this those thurstone treated true uncertainty university user users using volume wellpersonalized were when wiley with work worse would http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420305abs.htm 38 Dynamic Classifier Selection for Effective Mining from Noisy Data Streams Xingquan Zhu, Xindong Wu, and Ying Yang Department of Computer Science, University of Vermont, Burlington VT 05405, USA {xqzhu, xwu, yyang}@cs.uvm.edu ability accuracy accurate adopting adult algorithm algorithms almaden among amount analysis anil application approach arbitrating argamon artificial attribute base becomes been belmont best between blake bowyer breiman cardona carries challenged challenges chan characterized chen class classification classifier classifiers classifying classsyndata clear clustering clusters columbia combination combine combining commonly competing concept conclusions conf conference conflictive consequently constructed correspondence cross cyber data dataset datasets description determine disjoint domain domingos dramatic drift drifting dubes dynamic dynamical each efforts eliminating ensemble error especially estimation evaluate evaluation even evolving existing experiment experts explore extensible fact features fisher fourth friedman from fusion generator given gonz hall handwriting handwritten have helps high holte hopefully however html http huang huge hulten icdm icml ieee ignores immune impacts important improving incorporating indicates inductive information instance instances instead intelligence international into intuitive jain 
kaufmann kegelmeyer knowledge kolter koppel krzyak kucheva large learnability learned learning lenz linear local machine majority maloof mateo merit merz meta method methods mining model morgan most multiple mushroom nasraoui need networks neural noise noisy numerals olshen once optimal ortega others overall pami partition pazzani perform performance portion prentice presented press proc proceedings prof programs publisher quantitative quest quinlan raise recognition reduction referees references regression regressions reliable repository research resources results review rojas rules scalable schaffer schapire scheme selecting selection selects sigkdd significant similar simple software solution specific speed springer stacked statistics still stone stream streams strength study subset subsets suen suffers switching syndata synthetic system systems tecnostreams test that their these thesis this through time tracking traditional trans transactions trees ueda unconstrained underlying univ used using validation values verlag very volumes wang washington wdbc weak weighted well will with woods http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420114abs.htm 14 Dependencies between transcription factor binding sites: comparison between ICA, NMF, PLSA and frequent sets Heli Hiisilä and Ella Bingham Neural Networks Research Centre, Laboratory of Computer and Information Science Helsinki University of Technology, P.O. 
Box 5400, FIN-02015 HUT, Finland heli.hiisila@hut.fi, ella@iki.fi acids adnan algorithm algorithms analysis analyzing andre applied arep attributed august bacteria berkeley bioinformatics biology brazma build bussemaker centre charles collado comon component composite computation computational concept conf conference correlation data datasets derti detection dictionary dimensional discovery discrete effective eisen element elements episodes eskin event expression extracting factorization fast fastica feature february finding fixed fourth framework frequencies frequent fricke from geffers gene genes genetics genome genomes genomic grama gyllenberg harvard hehl helden helsinki heubrock high hirsimaki hofmann hornischer hsmmupstream http human hyvari hyvarinen icdm identification ieee implementation independent indexing international intl isbell john jonassen journal karas karhunen keles kloos knowledge koski koyuturk laan labgc land latent learning lewicki ling machine mannila margoulis matlab matrix matys method michael mining molecular mouse mrnas munch nature negative networks neural ninth nips nonin nonuniqueness november nucleic numerical objects october oligonucleotide package pages parts patterns pevzner plsa pnas point potapov predicting presumptive probabilistic probability proc proceedings processing profiles projects proximus references refseq region regulation regulatory reilink research restructuring retrieval reuter rotert saxel scale scheer selection semantic sequences seung siggia sigir sigkdd signal silico sites sons sparse statistical technology teemu thank thiele toivonen transcriptional transfac ukkonen university unsupervised upstream using verkamo verlaan very vides vilo viola wiley wingender with yeast http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420383abs.htm 56 A Machine Learning Approach to Improve Congestion Control over Wireless Computer Networks Pierre Geurts, Ibtissam El Khayat, Guy Leduc University of Liège, Sart 
Tilman, B28 Liège 4000 - Belgium {geurts, elkhayat, leduc}@montefiore.ulg.ac.be access acknowledgment algorithm also application approach areas avoidance bagging bandwidth been behavior belgian belgium berkeley bishop boosting breiman california causes classification classifier communication communications compares computational computer computers conference congestion considerations control data decision designing deterioration doctoral efficiency enhancement ernst european existing extremely favorably fawcett floyd fnrs forests fourth framework freund friedman friendliness generalization gerla geurts graphs have help icdm ieee improve improvement interest international issues journal jsac july khayat laboratory labs lawrence lbnl learning leduc liege liew line links literature loss machine macroscopic magazine mahdavi mahonen mathis mccanne mining motion network networking networks neural notes olsen over oxford packet pages partially pattern performance policy polyzos post practical predictors press proceedings program project proposed protocol random randomized recognition references regression report researcher researchers resulting review rules saaranen sanadidi schapire science second selected semke shows significant simulator stone study submitted supported symposium technical techniques that theoretic theory this thus tion tradeoffs traditional transmission tree trees university usage valla veno very wadsworth wang wehenkel westwood wired wireless work xylomenos http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420491abs.htm 83 MMSS: Multi-modal Story-oriented Video Summarization Jia-Yu Pan, Hyungjeong Yang, Christos Faloutsos Computer Science Department Carnegie Mellon University jypan, hjyang, christos @cs.cmu.edu achieves among anchors applications august auin autodocumentary automatic avoids because best better both callan carbonell carnegie characterization christel cikm combination computer conclusions conference content correlation 
creating creation cross cvpr data deployment digital discovery document does domain duygulu edwards effective encodes evaluating evolution experiments extract fact faloutsos february forsyth fourth from gives goldstein gong graph graphbased hauptmann iccv icdm ieee image information international introduction june kanade keyframes knowledge kraaij language learned learning lessons library linguistic link logos major management matches meaning meaningful mellon mentioned methods mining mittal mmss modal modality modeling moreover multi multimedia naacl namely news ninth november observation obtain october oriented over pages parameter pictures probability proceedings propose random ranking references relevance report restarts retrieval scene sentence shot shots show sigkdd skimming smeaton smith sometimes sophisticated specific stationary stories story summaries summarization teasers technical techniques terabyte terms textual that these those through tomatic towards tracking trec trecvid tuning understanding unfortunately university unlike used video vision wactlar well which white with word words work workshop yang zhang http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420543abs.htm 96 DRYADE: a new approach for discovering closed frequent trees in heterogeneous tree databases Alexandre Termier, Marie-Christine Rousset & Michèle Sebag CNRS & Université Paris-Sud (LRI) - INRIA (Futurs) Building 490, Université Paris-Sud, 91405 Orsay Cedex, France. 
termier, mcr, sebag @lri.fr acknowledge acknowledgments algorithm algorithms along also another applications approach apriori arikawa arimura arlington asai association available avril based bastide belgium borgelt both botta canonical charm chris closed cmtreeminer conference critical dags data databases dehaspe determine discovering discovery eclat efficient efficiently especially extend felkin fimi first forest forms fourth free frequent from further giordana graph graphs help helsinki hsiao http icdm icdt ichi ieee implementations inokuchi international israel itemset itemsets jerusalem journal kawasoe kilpelai kindly labels lakhal large learning leuven lines logic lyon machine made maebashi maloberti mary matching maximal melbourne mining motoda muntz nakano novembre number ongoing opens order other pakdd paradigm parameters pasquier pattern performed perspective perspectives phase phdtermieren pkdd problems proc proceedings promising publis references region relational report research rooted rousset rules saitta sakamoto same sapporo sciences search sebag semi several siam sidney sigkdd step structured studies substructure substructures subtrees taouil technical termier text thanks their thesis this towards transition tree treefinder trees types ucla university unordered using values warmr washio with work yang zaki http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420273abs.htm 34 IRC: An Iterative Reinforcement Categorization Algorithm for Interrelated Web Objects access across advances after agglomerative annual april august available bartlett based beeferman berger between canada categorization chakrabarti chemnitz chen chien chuang class classi classification classifying clickthrough clustering cohn comparative comparisons conference connectivity content context contextual cortes craven data datasets decision describing development different discovering discovery document domains dumain each ecml editors edmonton effectively effectiveness 
eighth engine engineering england enhanced enriching european expansion experiments feature features flake fourteenth fourth friedman from future germany getoor glover grimmett hierarchical hofmann hong html http huang hyperlinks hypertext icdm icml ieee improve incrementally indyk information integrate interactive interesting intermediate international interrelated introducing issue iterative jasist joachims jplatt july june knowledge koller kong large lawrence learning likelihood link links logs machine machines management many margin measure method methods micro minimal mining missing model models multiple multitype myaeng networks neural objects olkopf optimization outputs oxford oyang page pages patterns pedersen pennock platt possible practical press probabilistic probability proceeding proceedings process processes processing queries query random real recom references regularities regularized reinforce reinforcement related relational relationship relationships relevant research results retrieval schuurmans search seattle selection sequential session show sigir sigkdd sigmod significantly similarity simrank sixth slattery smola soft stanford stirzaker structural structure study subject such suggestion support system systems taskar taxonomies tenth term terms test text that their through toronto transaction tsioutsiouliklis type types under university used user using vapnik vector volume wang washington whether wide widom will with world would yang zeng zhang http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420059abs.htm 7 Moment: Maintaining Closed Frequent Itemsets over a Stream Sliding Window acknowledgement adaptively addition advanced agarwal aggarwal agrawal algorithm algorithms also although applications approximate approximately arbitrary arikawa arimura arrived asai association automata average based bases bayardo because best better between both boundary catch chang change charikar charm chen cheung closed closet code colloquium colton 
computing conclusion conf conference contains content counts current dasfaa data database databases dataset decreases developed discover discovered discovery distributed efficient efficiently engineering enumeration even experimental exponentially farach fast fifth figure filtering finding form formance fourth frequency frequent from general generation giannella gouda grow have hidber hsiao icdm ieee incremental incrementally indiana info institute international intervals intl item items itemset itemsets journal kawasoe knowledge kohavi languages large linear long magnitudes maintain maintaining maintenance management manku mason maximal memory mines minimum mining mohammed moment monitors most motwani muntz newly nodes novel number online orders outperforms over paper parallel pattern patterns performance perin polytechnic prasad previous proc proceedings professor programming projection propose providing ratio real recent record references regression relative remains rensselaer report rest robertson rule rules running same samples scheme searching seen semi sets show shown shows siam sigkdd sigmod size sliding smaller source srikant state strategies stream streams structure structured studies suggests support synthetic systems technical technique temporal teng than thank that this time total transaction transactions tree twelfth university update updates updating usage used using very vldb wang webview when which window without wong world zaki zheng http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420539abs.htm 95 Sparse Kernel Least Squares Classifier Ping Sun School of Computer Science The University of Birmingham Birmingham, B15 2TT, U.K. 
P.Sun@cs.bham.ac.uk adaboost advantage advantages algorithm algorithms alkadhimi analysis apart applying artificial aspects based basis been benchmarks berlin billings bring cambridge cases cawley ccit characteristics chen chng choose choudhury classification classifiers compared compares competitive computational conclusions conference connectionist constructing control cost costs cowan cross data december demonstrated different discriminant doesn dual each editor efficient efficiently employed engel error errors esann european evaluation experimental extensively fast fifteenth file first fisher focus fold former fourth framework francisco from function future gammermann gaussian generalization grant greedy have having icdm icml ieee ijcnn ilar implement include includes international journal just kaufmann keane kernel kernels kopf lang large later learning learns least letters ller lukas machine manner mannor margins matrix meir memory mercer mika minimum mining model morgan nair networks neural nonlinear note obtained obvious obviously october only onoda optimal order original orthogonal other over pages parameters parsimious part performance present press proc proceedings processing produce proposed quite radial rank realizations records recursive recusrive reduced references regression regularized report requirement research results ridge saunders schol schools selected selecting sets shavlik similar size sizes sklsc small smola soft some sparse sparsification spent spiral squared squares state steps summer support suykens symposium table technical technion technology tell term test that these thesis time tlabot train training transactions unified university unknown used using validation vandewalle variables vector vectors were which while will witbrock with work zhang http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420154abs.htm 19 Mining Associations by Linear Inequalities Tsay Young ('T. 
Y.') Lin Department of Computer Science San Jose State University San Jose, CA 95192, USA tylin@cs.sjsu.edu aaai about academic advanced agrawal algebraic analysis applied approach april artificial aspects association attribute attributes barr bell between binary birkhuser book boston brualdi china chongqing combinatorics complete completion computational computing conference congress current data database databases december deductive design discovery dunham duong engineering experiments fayad fayard feature feigenbaum foundation fourth frameworks france from gracia granular hall handbook hawaii honolulu icdm ieee image imielinski intelligence intelligent international introduction introductory isbn items japan jean journal july june kaufmann kluwer knowledge large lecture limit lnai louie machine maebashi management march margaret mathematical methods michel mining modeling molina montpellier morel motoda networks neural notes october ohsuga oriented overview paper pawlak piatetsky polkowski power prentice press proceeding proceedings processing prospect publishers reasoning references relational relations richard rough rsfdgrc rules segmentation selection semantics september sergio sets seven sigmod sjapiro skoworn smyth solimini springer subset swami system systems technical theoretical theory topics transformation trends ullman uthurusamy variational verlag washington willam windin with world zhong http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420170abs.htm 21 Dependency Networks for Relational Data Jennifer Neville, David Jensen Computer Science Department University of Massachusetts Amherst Amherst, MA 01003 {jneville|jensen}@cs.umass.edu aaai abbeel ability accuracy accurate acknowledge acknowledgments addition advances advantage afrl aggregating algorithm allow also analysis approach approaches approximation artificial assess assistance authors autocorrelation autocorrelations avoiding been besag better bias both building buntine carlo 
categorization cause ceiling chain chakrabarti characteristic chickering chosen class classification collaborative collective comments compare computer conclusions conditional conference consider constructs contract cornell craven data datasets degree dependencies dependency dept dipasquo discovering discovery discriminative discussion disparity distribution distributions domain domingos dynamic efficiency empirically engines enhanced evaluation existing expect experiments exploit exploring extract fast feature fellowship fields figure filtering form fourth freitag friedland friedman from future gallagher general getoor given graduate graphical have heckerman helpful here hyperlinks hypertext icdm ieee improve improved improvement indicate indyk inference instances intelligence international invaluable jensen joint journal kadie knowledge known koller labeling labels labs lafferty lattice learn learning linkage loiselle machine macskassy management markov mccallum mcgovern meek method methods mining mitchell model models monte multi national neal nearly networks neville nigam numbers observed offer order other over pages paper parameters parsimonious pereira performance pfeffer possible present presented press primary prior probabilistic probability procedures proceedings properties pseudolikelihood quality random range rattigan rdns real reason references related relational relatively rennie report research results retained rounthwaite rpts sanghai science search segmenting selection selective sequence seymore shapira showed sigkdd sigmod significant simple size slattery specific springer statistical statistician structure supported symbolic synthetic taskar technical techniques that this those toronto training trees ubiquitous uncertainty under university used using verlag visualization weld were when which wide will with work workshop world would http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420459abs.htm 75 Improving the Reliability of Decision 
Tree and Naive Bayes Learners accuracy accurate acknowledgements advances alternative analysis annals another applied argue artificial atkeson attribute based bayesian benefit both calibrated calibrating calibration cestnik class classification classifier classifiers clrc come comments comparision comparison conf conference cost could create crucial data dawid decision decrease degroot differing discovery discussion distinct distributions elkan empirical enabling england epsrc error estimates estimating estimation euro evaluation fawcett fayyad feinberg forecasters forecasting forecasts fourth francisco frank from funded gammerman generation grant gross holloway however icdm ieee implementations implemented imprecise improving increases incremental infeasible information intelligence interesting international investigate irani java journal kaufmann know knowing knowledge learner learners learning limitation lindsay locally london machine made meteorology mining moore morgan most murphy naive neural nouretdinov obtaining output outweighs over overlooked partition performance perspective plot possible practical practitioners prediction predictions presented probabilities probability probably problem proc proceedings processing provide provost rate references reliability reliable reveal review royal schaal score selection self shafer should significantly similar slight statistician statistics studentship successfully suppose systems task techniques thank that their therefore this tools tree trees trust under university user vector versions view visualisation visualising vovk weighted whether which will with witten would zadrozny http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420367abs.htm 52 Clustering on Demand for Multiple Data Streams Bi-Ru Dai, Jen-Wei Huang, Mi-Yen Yeh, and Ming-Syan Chen Department of Electrical Engineering National Taiwan University Taipei, Taiwan, ROC E-mail:mschen@cc.ee.ntu.edu.tw, {brdai, jwhuang, miyen}@arbor.ee.ntu.edu.tw 
according acknowledgement adaptive advantages aggarwal aggregate algorithm algorithms appropriate approximate approximations babcock babu based been between bucket bulut callaghan characterizing chen cluster clustering collection compact complex conclusions conference contract council counts coverage covered current data datar deal desired devised dobra domingos dynamic dynamically each efficient efficiently else encapsulate enough entries entry environment evolving exceeds fitting flexible focs fourth framework frequency from future garofalakis gehrke general generate guha hand have hierarchical hierarchies high hulten icde icdm ieee implications increase initially international interval into issues june keogh large latest level levels look lower maintain maintaining major manku maximum meaningless meyerson mining mishra model models most motwani multi multiple namely national networks number obtained offline ofpods ofvldb online order other over pages part pass past pattern performed phase possible precisely proc proceedings processing producing provided providing quality queries range rastogi references regression represents requirements research resolution resolutions results retrieve retrieved scan scheme science series sigkdd sigmod singh single size spencer statistics step stops stream streaming streams subsequences summaries summarization summary support supported swat systems taiwan temporal temporary teng that then these this time timechanging truppel types under various very vldb wang when where while widom window windows with work yang http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420479abs.htm 80 Correlation Preserving Discretization accuracy achieve achieved aggarwal agnostic algorithm also among analysis approach around aspect association attribute attributes averaged based better bing bytes cases catlett caused changing charles charu chen chimerge christopher cisrc classification clear colt component compressed compressibility 
compression conceptual conference considering constrained continuous correlation cretization data datasets deal december deciding demonstrate difference different dimensional dimensionality discrete discretizaion discretization discretize discretizing dougherty easily effectively effectiveness efficient ensures european evaluate even extension factor fashion fayyad features floating four fourth framework frequent from gerhard high highest however hypotheses icdm icml ieee ijcai importance including incomplete information integrating inter interactive international interval into irani issues james jolliffe kaufmann keki kerber knowledge kohavi learning lowest ludl maass machine marcus massively mehran mehta method methods might minimum mining missing morgan most mueller multi multivariate national numbers numeric ordered over pages parthasarathy pkdd points practical prediction preprocessing preserving principal proceedings programs propose publications quinlan randy rather ready real reconstruction reduce references relative represent require requirements resulting results rule rules runs sage sahami sameep scheme schemes section session sets show shows simple simultaneously some springer srinivasan statistical step stephen storage store subramonian such supervised systems table tasks than that them this thus time tkde unsupervised usama used usually validate validates value valued values various venkata verlag visual webkdd well when wherease which widmer with wolfgang working wynne yang yiming http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420559abs.htm 100 AGILE: A General Approach to Detect Transitions in Evolving Data Streams access accuracy after agile alternative amnesia approach assessed automata average avoid aware based becoming been before behavior bejerano build capture changes characterize checking classification cluseq clustering compact compare comply computation computational conclusions conference consequence consumed cost course 
current data decision declines demonstrate detect detected detection detects develop different distinguishable dramatically during each effective efficient effort emission employed ends engineering entire equally even every evolving expensive experiment fails families first follow form found fourth framework from further future general good guarantees high icde icdm icml ieee immediate important incubation interesting international know last learning least length less list longer machine manages markov memory merchant method methods mining mixture model modeling month more objective observed obsolete once online over paper partially period potential power powerful predict probabilistic probability proc proceedings process processes produce promptly protein purchasing quality rebuilt recently recomb recommendation records references related report represented required requirement requires respond result save season second segmentation seldin sequence session shown singer sliding some sources stationary still storage stream streaming streams structure successfully suffix suggests switch symbol takes technical terms test than that third this though three tishby trace transition transitions tree trees underlying unknown unsupervised used useful usefulness user using variable very wang well when whether window with work would yang yona http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420523abs.htm 91 Metric Incremental Clustering of Nominal Data acquisition aime algorithm algorithms almost american amica applicable approach architectures area attributes barthel behavior blake both buffer butterworth california carpenter carrier cepadu character charikar chekuri chemical chosen clustering clusterings clusters combined complete computed computer computing conceptual conclusion conference connaisances contains cornuejo could created currently data databases dependency dept describe detection discrete discretization document dominant dynamic edbt editions 
ensembles environment essentially ester european examination example exception experience explored extraction feder first fisher five flynn follows fourth france from further future general gestion getting giraud good grossberg gunopulos hartigan having hierachical high homogeneity hospitals html http iberamia icdm ieee include incremental independence infectious information initial input instance international investigations irvine iterative jain john keogh knowledge kriegel langford large learning leclerc less machine magee marian math mathematical mathematics maximal median medicine merz metric metrics metriques mining mixed mlearn mlrepository monjardet more mostly motwani murty networks neural next nine nominal note objects obtain obtained order ordered orderings ordinal ordonnes organizing outbreaks pages parially partitioning partitions pattern performed permutation permutations previous procedure proceedings processing produced progress propriet providence provides quality random randomly rather recognition references relative remain remarques repository resulting retrieval review robust roure same sander sciences search self sensitivity series sets show shows similar simovici size snavely society special springer stability stable standard stoc strategy study such supervised survey surveys synthetic systems table talavera than that there they this threshold through time total toulouse transaction transmitters under university using value values various very vlachos vldb warehousing warranted which wiley will wimmer with work york zeros http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420355abs.htm 49 Scalable Multi-Relational Association Mining acad activity agrawal algorithms altos artificial aspects association baeza beaulieu bell blockeel clare compressing compression computer conference data databases declarative dehaspe department derived development discovery documents edition editors efficiency efficient elin engineering evaluation 
exploring farmer fast finland first forum fourth frequent functional genome gigabytes icdm ieee images improving indexes indexing inductive information intelligence international inverted jarv journal katholieke kaufmann king knowledge language languages large lazy learning leuven logic machine managing mining moffat morgan muggleton myaeng nijssen order packs padl pages pattern practical proc proceedings programming publishers query references relationships research retrieval rules scalable scholer science second sigir similarity space srikant srinivasan sternberg structure tampere thesis through transactions universiteit very vldb williams witten yates yeast yiannis zaki zobel http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420575abs.htm 104 Relational Peculiarity Oriented Data Mining Ning Zhong, Muneaki Ohshima acid adcances algorithm also although amino analysis analyzed application applied area attribute based beijing brain building cameron certain cognitive computer computing concepts conference data database databases demonstrate design details discovery disposable dzeroski effective elsevier employee expected fact family fmri foil found fourth from generation granular high house huang human icdm ieee ifbothdisposable iffact ifincomeofstate images income incomeandincomeofthestate incomeandincomeofthestateowned inconclusive indicate induction inductive inter interesting interestingness international jones kamber kaufman kaufmann kerschberg knowledge lavrac limitation lnai logic manufacturing mathematics mechanism method mining mizuhara morgan multi multidatabase multiple murata nakamaru neighborhood notion obtained ohshima ohsuga omit oriented other ownedunitemployee ownedunitemployeeandotherincome pakdd peculiarity people perception pkdd potentially press price proc proceedings process programming programs proposed quinlan references related relational research respectively result results ribeiro rough rules science sets several since
skowron soft space springer studying subgroups symbols systems tables target techniques technology that then this tkde tracking unit university useful using verify wrobel zhong http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420391abs.htm 58 SVD based Term Suggestion and Ranking System advertising also american analysis angles apparent arnoldi arpack artificial become becomes behavior berry bishop brand change clear close closer clus cluster clustering computations computers conclusions conference corresponds curve curves cutoff data dataset decline deerwester demonstrates developed display distances distant does drmac dumais editors effect eigenvalue embedding even examples external feedback figure flower fourth frey from furnas generality golub good guide harshman high histogram hopkins however huang icdm identifying ieee implicitly improving indexing information instrument instruments intelligence international into investigated january jessup john journal landauer large latent lehoucq levels likewise loan market marketing matrices matrix methods mining more musical natural negative ninth novel occur occurs original overture plateau positive precision press problems proceedings projected projection projections query recall references related relevance represents research restarted results retrieval review right scale science scores semantic siam similarity since society solutions sorensen sources spaces spectral statistics steep strategy subspace subspaces suggest suggests system take tering terms that then theorem there these this tool uniform unifying univ users using varying vector vectors were when whose with workshop yang http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420315abs.htm 39 Using Emerging Patterns and Decision Trees in Rare-class Classification Hamad Alhammady and Kotagiri Ramamohanarao Department of Computer Science and Software Engineering The University of Melbourne, Australia Email: {hhammady, rao}@cs.mu.oz.au 
accuracy algorithm alhammady application asia australia blake butterworths california challenge cheng class classification classifier classifiers coil computer conference cost cross data databases department diego differences direct discovering discovery disease domingos dong efficient emerging epdt eprc essential euthyroid experiments explorations fold fourth frank general hatzis hayashi home html http hypothyroid icdm ieee implementations improving information insurance international intrusion irvine january java joshi jumping june kaufmann keogh knowledge krogel learning liacs library ling london machine making marketing mateo measure merz metacost method mining minnesota mlearn mlrepository models morgan morishita network november over pacific page pakdd patterns phenomena pnrule practical precision predicting problems proceedings putten quality ramamohanarao rare recall references report repository results retrieval rijsbergan ruiter sampling scan science sensitive sese sick sigkdd single solutions someren sydney table taipei taiwan techniques test thesis tools trends university validation weighted with witten york http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420395abs.htm 59 The Anatomy of a Hierarchical Clustering Engine ability above academic access achieved achievements activities actually admit adopts after against aids aimed algorithm allergy also alternatives analysis anatomy applications architecture asked attardi authors automatic automatically based been being belonging benefits bill blocks blog books both bringing browsing build builds bush business carried carrot cases categories categorization categorizing checked chen ciirarchies claim classification cluster clustering clusters combining commerce compact comparable compare comparison complement computed computer conference consist construction contain contentbearing contents context contiguous create croft data days depth depths dexa didn different discovery divx document domains 
dong download drawing dumais during dynamic each easily electronic engine ephemeral equity ester etzioni evaluation exceed exceeds execute experimental extending extracted extracting extracts fagin fails fair fantasy fathers feature features ferragina final first firstlevel flat forms fourth framework free frequent from function fung further furthermore gates gather generating giannotti good google greedy grouper gulli have hearst hiearchical hierarchical hierarchies hierarchy highlight hill hits however hypothesis icdm identical ieee implementation improving informatica information instead interesting interface intermediate international introduction investigated investigating iraq issues itemsets itself jiang joshi knowledge krishnapuram kummamuru labeled labels language large larger last lawrie learning level lexical light like link linux list little maarek main managing many mathematical mcgill mcgraw meaningful minimization mining minute model modern module monothetic mooter more moreover most nanni nasa need networks news number objective obtains occurs often online open order other ours page part paths pedersen pedreschi pelleg people performance performed period pisa point polish possible preliminary probabilistic problem proceedings process produce produces proposed proposes provide provided provides quality queries query range ranked recursive reducing reexamining references refining remain rental repeat report reported research researchers response result results retrieval retriever returned running said salton satisfied scatter search searches sebastiani sebd second selected semantic sense sentences several shaul sigchi sigir significantly similar simple since single slower snaket snippets software softwares some sony source stated stefanowski students studies study studying substring subtrees such summaries summarization summary system technical tend tends terms test text than that their them themselves then these theseus they third this those three 
thus together tool topics under understanding unifying university used useful user users using usually variegate vivisimo wang warterlo webcat weiss were what whether which widm with words yitong zamir zeng zhang http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420099abs.htm 12 A Bayesian Framework for Regularized SVM Parameter Estimation Jens Gregor and Zhenqiu Liu University of Tennessee Department of Computer Science Knoxville, TN 37996-3450, USA jgregor@cs.utk.edu, zliu@utk.edu advances agreement algorithm algorithms allows analysis approach artificial bayes bayesian before belgium bers beta blake bound bounds bousquet bruges california called chapelle chine choosing chung circle class classification classifiers cleveland close coincide computation computed computer conclusion conf conference constrained correctness cortes credit data databases datasets discovery discriminant does domain easy efficient error esann estimate estimated estimation european evaluated evidence experiment feature figure final find fisher five fourth framework from function fung galaxy gaussian gestel given graepel group guarantee gunn have heart herbrich heuristic html http hyperparameter hyperparameters icann icdm identify ieee illustrate image implement indians indicates information initial institute integrating intelligent international interpolation interpretation intl ionosphere irvine isis iterative keerthi kernel knowledge kwok lambrechts lanckriet learning least like listed machine machines mackay mangasarian margin marker merz method methods minimum mining mlearn mlrepository moor mukherjee multiple nabney nature netlab networks neural newton note obermayer optimality overall pages parameter parameters partment pattern perform pima practice presented probabilistic problem proc proceedings processes processing programming proposed public quadratic quite radius range rate rates recognition references regression regularized report repository research result results 
rule samples sciences screening seem selection sets several shown shows simple sollich solving southampton speech springer squares standard statistical step subset support suykens symp system systems table technical test that theory these this thus trans transduction tuning unconstrained university using value values vandewalle vapnik vector wang well while wide wisconsin with work http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420399abs.htm 60 Query-Driven Support Pattern Discovery for Classification Learning Yiqiu Han and Wai Lam Department of Systems Engineering and Engineering Management The Chinese University of Hong Kong Shatin, Hong Kong about academic accurate achieves affected algorithm algorithms although amount association attribute attributes based behavior blake break capturing cardinality certain class classification cmar common conference considerable constraints constructs contains contrary data databases decision discover discovered discovery distribution eager editor effect efficient examined figure fourth frank from full global horizon html http icdm ieee implementations induction inductive inherently insert integrating international intersections into investigate java kaufmann keogh knowledge labor largest learned learning leaves logic machine merz mining missing mlearn mlrepository more morgan move muggleton multiple only outline outperforms pages pattern patterns performance possible practical press proceedings produce programming provide quinlan qusup reason references relating relational repository return returns rule rules satisfying simple small space such support techniques than that these this tools training tree trees useful using utilizes utilizing values with witten http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420289abs.htm 36 AVT-NBL: An Algorithm for Learning Compact and Accurate Naïve Bayes Classifiers from Attribute Value Taxonomies and Data Jun Zhang and Vasant Honavar Artificial Intelligence
Research Laboratory Department of Computer Science Iowa State University Ames, Iowa 50011-1040, USA {jzhang, honavar}@cs.iastate.edu abstraction accurate acknowledgments addison aggregate aggregation algorithms ambiguous american analysis annual appear application applications approximation artificial ashburner association attribute automatica avtdtl bayes bayesian becker behaviors berners beyond bias biology bottleneck broad cambridge caragea centric chen chiu clare classes classification classifiers classify classifying clustering clusters colorful commerce compact computational computer concise conference consortium construction daml data databases decision description desjardins detection discovery distributed distributional document documents driven electronic engineering english evaluating evaluation experimental feature first foundation fourth framework friedman from further geiger gene generation genet getoor goldszmidt grants graphics group haussler health hendler heterogeneous hierarchically hierarchies hierarchy honavar hybrid icdm ieee imprecise improving induction inductive information instances institutes intelligence intelligent intelligible international intrusion intrusive issue journal kang king knowledge kohavi koller label langley language large lassila learning lecture levels linguistics machine mani mccallum mcclean meeting merz method mining mitchell modeling multi multiple national natural network notes odbase ontologies ontology operations over part partially pathak pazzani pereira phenotype press proceedings provost quantifying range references reformulation related report research review rissanen rosenfeld rules sahami scale science scientific scotney semantic semantically semantics sets shankle shapcott shortest shrinkage sigir silicon silvescu simple slonim sommerfield special specified statistics sufficient supported symposium systems target taxonomies tech tenth text thirty this thompson tishby tool trans transactions trees tseng 
uncertain undercoffer unification university using valiant value very visualization wesley word words yamazaki zhang http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420003abs.htm 0 Detection of Significant Sets of Episodes in Event Sequences accumulated acknowledgment adopted algorithmica algorithms analysis annual apostolico approach approximated arbitrary asieniec atallah authors average bernoulli bibliographic billingsley boasson candidate case choice chris clifton closely combinatorial compact compactness computation computational compute computer conclusions conditional conference corasick crete data database detection dictated discovery dynamic efficient encouragements episode episodes event exact existence experiments figure flajolet fleischer florida formulas fourth frequency frequent frequentpattern full function generation grateful greece guessarian guivarc gunopulos gwadera hidden higher homes however http icalp icdm ieee implementation including information inserted international john karkk knowledge lecture life linear lncs mannila many markov markovian mart matching matiyasevich measure melbourne mining model notes number occurrences only order pages parallel pattern patterns pods presented probabilities probability problem proc proceedings prof programming purdue real realization recognizers references regni reliable remarks require respective science search sequels sequence sequences serial sets showed significance significant since statistics string subsequence suitability symposium szpankowski techniques that then third threshold through toivonen transactions tree upper using valid vallee valuable verkamo very wiley window without would york http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420347abs.htm 47 Spam Filtering using a Markov Random Field Model with Variable Weighting Schemas abound accuracy acknowledgments algorithm algorithms analysis arnault assis attributes authors berlin besag bigrams binary breyer chhabra 
classifiers clifford clique combination combining conclusion conference controllable data define dependence derived determining direction discriminator discussions ecml extraction feature field filtering finite form fourth fruitful future generalized given graphs grateful hammersley hashing have html http icdm ieee improvements incremental inst interaction interesting international irrelevant journal laird lancaster lattice lattices learning linear littlestone lncs machine markov messages mining minsky mutilator neighborhood nick obtained only optimal orthogonal other pages pairwise papert past paulgraham percent perceptrons pkdd plan plateau polynomial potentials proceedings quickly recently references reflected regex research royal sbph schemas sequences series siefkes significant single size sizes society software sourceforge spam sparse spatial springer statistical subject superincreasing surrounding system systems table test threshold ties trainable university unpublished using variant varying verlag volume weighting weights when where window winnow with word words work yerazunis http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420194abs.htm 24 A Transaction-based Neighbourhood-driven Approach to Quantifying Interestingness of Association Rules aaai about according account accounted action actionable addition adomavicius along also alternative analyzing antecedent anticipation appealing apply approach asia aspects association associations assortment august based basket baskets been beyond boston both brijs brin building canada chen coefficient compare compares complementarity component computed computer conference consequent consider considered considers conviction correlated correlations counting cricket data database databases datasets decision decisions defined degree department dependence dependent directionality discovered discovery distance dong dynamic engineering european evaluating examine existence expectations explorations fashion 
favourably feasibility finding flexible fourth framework freiburg freitas function future fuzzy general generalizing hamilton hand have hierarchy hilderman hsieh icdm identifying ieee implication improve information intelligent interaction interest interesting interestingness international into introduced intuitively item items itemset jaroszewicz june knowledge kumar learning life like lnai machine makes making management market measure measures method mined mining more motwani mutual natarajan negative negatively neighborhood neighbourhood neighbours objective occur occurring omiecinski optimization other others pacific padmanabhan pair part patterns perspective pkdd postprocessing presented principles proceedings product propose ranking real references regina relatedness relationships report rice right roddick rule rules sahar science second seen selecting september sets shekar sigkdd sigmod silberschatz silverstein simovici simultaneous springer srivastava statistically subjective substitutability substitute substitution suggested support survey swinnen symmetric systems takes technical teng terms that them theory third thresholds thus transactions treats tsur tuzhilin ullman unexpectedness university unlike user using values vanhoof verlag were wets what with work workshop http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420323abs.htm 41 Attribute Measurement Policies for Time and Cost Sensitive Classification Andrew Arnt and Shlomo Zilberstein Department of Computer Science University of Massachusetts at Amherst Amherst, MA 01003 [arnt,shlomo]@cs.umass.edu abnormally above account action admissibility algorithms allan allows also amount applications approximated arnt arriving artificial attribute attributes augmented augmenting available average better bottom call cases chakrabarti choosing classification classified classifying combined component composition compounded conclusions conf conference consider considered cost costs current data 
dealing deciding delay dependent desarkar desirable detail dietterich during dynamic each effective effectively estimated examining existing expanding expected experience explore explored figure focused fourth fraction from function further future generated ghose going good greedy handle have heuristic heuristics horvitz icdm ieee improved improves inadmissible incremental incurred indeed information instances instead intelligence intelligent international interval intl introduced journal large learn learning loss machine make many measure measurement memory mining misclassification model more mouaddib must necessary need nodes normally note number often opportunity other over overestimate pages past penalties policies potential prematurely probability problem proc procedures proceedings process processing property prune pruning quantity queue quickly refer references remaining responsiveness resulting retrieval risk rutledge search searches sensitive sequences settings show shown significant similar simple single solely space starting state statistical stream such suppose system systems task tasks techniques terminate terminates testing than that there therefore these they this those time training transition transitioning uncertainty under unexplored units unnecessary used using utility value varying vector waiting when where which while will with work zilberstein zubek http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420225abs.htm 28 Analysis of Consensus Partition in Cluster Ensemble accumulation accuracy again algorithms altincay analysis another applications approach approaches argument assignment assuming bagging bartlett based because better between bioinformatics both buhmann burns cases classifier cluster clustering clusterings clusters combination combining comparing complements components computational computer computing conclusion conference consensus consistent consists convergence converges coordinates correct corresponds cramer current 
data december decreases decreasing define dembo demirekler denote dependent determined deviations difficult dimensional dubes dudoit each empirical ensemble ensembles error estimate estimating event evidence existing expert exponential exponentially fast finding first fischer fourth framework fred fridlyand ghosh give gives goes graphs guaranteed hall have high however icdm ieee improve incorrect increasing indeed independent infinity information infx inner intelligence interest international invoke irregular jain jones journal karypis knowledge kumar large learning letters literature lncs long machine measure meila metric minimum mining mixture model more multi multilevel multiple munkres negative number numbers objects obtain once pages paper partition partitioning partitions path pattern performance plurality positive possibility prentice presented probability problems proc procedure proceedings product proof prove provides publishers punch quality random rate real recognition references research respect result reuse rigorous robust scheme scientific second sets shown siam simske sixteenth smallest society solutions space spect springer states statistically strehl subset such systems techniques than that theorem theoretical theory there this topchy transactions transportation true typical unique using utility values variance variation vector version very vision volume voting weak when while with xinfa yacoub zeitouni zero http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420257abs.htm 32 A Probabilistic Approach for Adapting Information Extraction Wrappers and Discovering New Attributes Tak-Lam Wong and Wai Lam Department of Systems Engineering and Engineering Management The Chinese University of Hong Kong, Shatin Hong Kong {wongtl,wlam}@se.cuhk.edu.hk achieves adaptation adapting agent agents algorithm ambite application applications approach artificial attribute attributes automatic autonomous average bagnell barish bayesian blei both chinese
classification cohen comparison computer conference crescenzi data databases department dependent developed discovering discovery documents doorenbos eighteenth eleventh employ engineering etzioni execution experimental experiments extract extracting extraction features february first flexible forteenth fourth fragments framework from getting headers here heuristics hong html hurst icdm ieee independent information innovative intelligence interactive international invariant jensen knoblock knowledge kong large learning lists management mccallum mecca merialdo mining minton model models muslea networks newp newr optimizing page pages percentages performance planning precision probabilistic proceedings promising real recall refer references refers report respectively results roadrunner scalable scope shopping show siam site sites some system systems table tables technical technique text that their there towards travel twelfth uncertainty university unseen very vldb weld wide with wong world worlds wrapper wrappers wrapping http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420411abs.htm 63 Feature Selection via Supervised Model Construction Y Huang, PJ. McCullagh, ND. 
Black School of Computing and Mathematics, Faculty of Engineering, University of Ulster, Jordanstown, BT37 0QB, Northern Ireland, UK E-mail: yhuang@infj.ulst.ac.uk, {pj.mccullagh, nd.black}@ulst.ac.uk aaai academic accuracy advances algorithms analysis applications approach artificial attributes available bauer been belanche bell benchmark blake california classification classifier common comp computer computers conference continuous coopis data databases decision department discovery discretization electr encouraging estimating estimator european evaluation experimental extension fayyad feature features fourth from fssmc full greer highways icdm ieee induction information intelligence international irvine issek kauderer kluwer knowledge kononenko learning machine mathematical mathematics measures merz methods mining mixed mlearn modelbased molina most mucha nebot odbase overview piatesky practical preserve press proceeding proceedings promise proved publishers qualitative references relief relieff repository results robnik saliency science selected selection sets seventh shapiro show simec smytth springer statistical steppe study supervied survey this trees university using variables verlag wang while with workshop http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420415abs.htm 64 Mining Generalized Substructures from a Set of Labeled Graphs Akihiro Inokuchi Tokyo Research Laboratory, IBM Japan 1623-14, Shimo-tsuruma, Yamato, Kanagawa, 242-8502, Japan inokuchi@jp.ibm.com agrawal algorithm apriori arikawa arimura asai association based bases baskets beyond brin cient ciently conf conference connected correlations data discovery effi european experiment fast figure forest fourth frequent from generalized generalizing graph graphs gspan icdm ieee inokuchi international karypis kawasoe knowledge kuramochi labeled large management market mining motoda motwani nishimura pattern principles proc proceedings references report research rules sakamoto semi siam 
sigmod silverstein srikant structured subgraph subgraphs substructure substructures taxonomy trees used very washio zaki http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420091abs.htm 11 Mining Frequent Itemsets from Secondary Memory Gösta Grahne and Jianfei Zhu Concordia University, Montreal, Canada {grahne, j zhu}@cs.concordia.ca access accurate advances agarwal aggarwal agrawal algorithm algorithms almost also analyzed approach apriori association based between candidate chiang closed conference conquer cubes data databases datastructures depth detailed diffsets dimensional discovery disk divide efficient efficiently episodes estimation event experimental experiments exploring extensions fast fimi first fourth frequent from fully future gave generation goethals gouda grahne guided icde icdm ieee imielinski implementations include indeed instance international introduced introduction items itemset itemsets kamber knowledge large layout limited long magnitude main mannila many maximal memory metarule methods mining multi navathe novel number numerous omiecinski orders outperforms pages pattern patterns prasad prefix proceedings prodeeding recurrences reduces references results rithms rules sampling savasere saves scales secondary sequences sequential sets show sigkdd sigmod since situations some sometimes srikant successfully swami techniques terabytes that then there this toivonen tree trees used uses using validate various verkamo vertical vldb washington well where which will without work workshop zaki http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420551abs.htm 98 Mining web data to create online navigation recommendations Juan D.
Velasquez1 , Alejandro Bassi2 , Hiroshi Yasuda1 and Terumasa Aoki1 University of Tokyo, {jvelasqu,yasuda,aoki}@mpeg.rcast.u-tokyo.ac.jp academic acceptance accepted adaptive after aoki appear apply approximate archive automatic based been behavior beijing berlin between bezdek brusilovsky business cadoli cases china clicks clustering communications compilation complete complex conf conference considered considers contains content cooley created creating creation data dealing directions discovered donini effectiveness engineering examples exchange expert extraction february figure follow fourth from furthermore future have high icdm ieee ieice indexing information intelligence interface international introduced issues journal just knowledge koutroumbas leave links logs many measure mendations mentioned methodology mining mobasher model more navigation nielsen normal november october online operation page pages pattern patterns percentage possible press proceedings processing procs proposed rate real reasoning recognition recommendations references relational remaining results rules runkler salton september should show significant similarity since site sites slightly space special springerverlag srivastava structure successful suggested suggestion survey system systems technologies technology test tested than that them theodoridis thieee this through transactions tutorial understand urls usage user using utilization vector velasquez very visitor visitors want weber were with wong work would yang yasuda zhong http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420297abs.htm 37 Cost-Guided Class Noise Handling for Effective Cost-Sensitive Learning Xingquan Zhu and Xindong Wu Department of Computer Science, University of Vermont, Burlington VT 05405, USA {xqzhu, xwu}@cs.uvm.edu ablex acquisition analysis applications applied archive artificial attribute based bibliographies bibliography blake breiman brooks brunk bureau chan chen class classification 
classifiers conference connection corp cost costs costsensitive data databases datasets dependent dietterich digits distributions domain domingos duda eliminating example experiments filtering fourth friedman from gamberger geibel general groselj hart hettich heuristic html http hume icdm icml ieee impacts improves info intelligence international iterative kaufmann knowledge langford large lavrac learning machine making mateo mathematics medical merz metacost method mining misclassification moller morgan multi murphy national neumann neural noise noisy olshen pattern pazzani perceptron peter press proc proceedings programs proportionate pruning pulishing purl quantitative quinlan random reducing redundant regression repository review robotics rulequest scalable scene search sensitive series sets standards stolfo stone study supervised systems techniques their toward training trees turney uniform used various wadsworth weighting wiley with wysotzki york zadrozny zubek http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420471abs.htm 78 Finding Constrained Frequent Episodes Using Minimal Occurrences Xi MA1 HweeHwa PANG2 Kian-Lee TAN1 ability achieves agrawal algorithm algorithms allow also apriori assocgen based because better between both candidates cannot caused chen cikm collection conclude conclusion conducted conference considerably consistent constraint constraints constraintssupport cope data database databases datasets decreases decreasing default define degrade depth derived despite different discovering documents during edbt effect efficient efficiently entire episode episodes especially evaluate event events execution existing experiments expression extend faster figure figures flexible fourth frequent from future gains garofalakis generalizations generalized generate generated generates generator good growing growth handle hash have here icde icdm ieee improvements included inclusive increasing indicates inefficiency international into introduce 
labelled lack large lavg learning least length less like limit limitations long longer machine mannila many margin memory methodology metric mine minepi minimal minimum mining modified more much number observed occupies occurrences only outperforms over overcome pairs paper parameter parameters pattern patterns performance performs position prefix prefixes prefixspan primary proceedings process projected proposed push rastogi ratio real realdata references regular related report required requires restricting result results runtime safely satisfy satisfying scan search selective selectivity sequence sequences sequential series short show shown shows significant similar single slice sliding slower space spade span spirit srikant ssequence strategy streams study such summarize support synthetic systematic table text than that their this those threshold thresholds time times toivonen total toward tree types unlike using utilizing various varying verkamo version vldb wang well when whereas which while will window with work worst zaki http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420511abs.htm 88 Evaluating Attraction in Spatial Point Patterns with an Application in the Field of Cultural History Marko Salmenkivi Helsinki Institute for Information Technology, Basic Research Unit P.O.Box 68, FIN­University of Helsinki, Finland marko.salmenkivi@cs.helsinki.fi account advances aims algorithm another appear applied applying autocorrelation based beach cavtat cheung class clusters collocation collocations computational computing conf conference confident cressie croatia data databases density discovering discovery dubrovnic effect efficient ester estimation european evaluating fast finding finnish florida folk fourth francisco frequent future goal hiisi huang hypothesis icdm ieee interestingness international into knowledge kriegel large leino location mamoulis mannila measure melbourne methods mining modeling morimoto neighboring noise onomastic oregon pages 
patterns pitkanen plan portland possible practice principles probabilistic proc proceedings proposed pulkkinen redondo references religion results rule rules salmenkivi sander seattle sets shekhar shou sigkdd simple spatial statistics study summary support symposium take technique temenos temporal testing threshold washington what wiley with without xiong york zhang http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420319abs.htm 40 Discovery of Functional Relationships in Multi-relational Data using Inductive Logic Programming Alexessander Alves, Rui Camacho and Eugenio Oliveira LIACC, Rua do Campo Alegre, 823, 4150 Porto, Portugal FEUP, Rua Dr Roberto Frias, 4200-465 Porto, Portugal alves@ieee.org {rcamacho,eco}@fe.up.pt tel: +351 22 508 1849 fax: +351 22 508 1443 accuracy activation algorithms also alves analysis anderson approach bain based blockeel brockwell burnham camacho capable cliffs clustering commun compression conference control customised data davis delay discovered domains down each econometrics edition editor encompassing englewood evaluation forecasting fourth from functions general hall hoover human icdm icml ieee imperfect inducing induction inductive inference international journal july kauffman kaufmann lazy learnable learning like logic machine machines methods mining model models morgan muggleton multi multimodel networks neural numerical oliveira pages perez porto positive possibility prentice proceedings prog programming raedt ramon reasoning reconsidered references regression reinsel relational report rule search selection series shavlik significance skills sleeman specific specification springer srinivasan support switching system technical theory thesis time trees universidade university using valiant varying vector with workshop york zelezny http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420531abs.htm 93 Learning Conditional Independence Tree for Ranking Jiang Su and Harry Zhang Faculty of Computer Science, 
University of New Brunswick P.O. Box 4400, Fredericton, NB, Canada E3B 5A3 hzhang@unb.ca aaai accuracy against algorithm algorithms analysis appear area based bayes better brunk california case citree class classification classifier classifiers column compared comparing comparison conditional conference corresponding cost costs curve data databases datasets decision dept diagnostic discovery distribution domingos entry estimation european experimental fawcett fifteenth fourth frank generalisation hand html http hume hybrid icdm ieee implementation imprecise independence induction international irvine java kaufmann knowledge kohavi learning ling loses machine means measuring merz mining misclassification mlearn mlrepository morgan multiple murphy naive pages pazzani performance practical press probability problems proceedings provost ranking reducing references repository results scaling science second simple springer summary swets systems table techniques that third ties till tools tree trees under university visualization wins with witten zhang http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420265abs.htm 33 Aligning Boundary in Kernel Space for Learning Imbalanced Dataset Gang Wu & Edward Y. 
Chang Department of Electrical & Computer Engineering University of California, Santa Barbara 93106 {gwu@engineering, echang@ece}.ucsb.edu academic adaptive addressing advances algorithms alignment amari annotation application approach area artifical artificial aspects august bagging bakiri based bayes berkeley beyond biased boston boundary bowyer bradley breiman burges calif cambridge campbell categorization chan chang chawla circuits class classification classifier classifiers codes combnet computer conceptual conference conformal content controlling correcting cost cristianini curse curve data dataset department description detection dietterich discovery distributed distribution distributions dynamical ecml edition effect effects elements elisseeff empirical error european evaluation fawcett feature features figure forecasting fourteenth fourth fraud friedman fukunaga functions fusion gangwu geometry hall hastie http icdm ieee ieice image imbalanced improving inference information intelligence international introduction invariance issue iwata january jiao joachims joint journal july kandola karakoulas kegelmeyer kernel kernels knowledge kubat kuhn kuroyanagi learning linear machine machines many mathematical matwin methods mining minority mmdb modifying multi multicamera multiclass multimedia multimodal networks neural ninth nonstandard notes november nugroho optimization optimizing output over pages papers pattern point prediction predictors press probability problem problems proc proceedings processing programming provost recognition references refining regression regularization relevant report research retrieval rutgers sampling scholkopf science security selection sensitivity sequence sets shawe sided situations smola smote soft solution solving space spatio special specificity springer statistical statistics stolfo study support surveillance sychay synthetic syrup systems target taylor technical technique technology temporal tenth text tibshirani training 
trajectory transaction transactions transformation tributed tucker twentieth ucsb under uneven uniform univ university using vector veropoulos video wahba wang weiss with workshop york http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420507abs.htm 87 Quantitative Association Rules Based on Half-Spaces: An Optimization Approach Ulrich Rückert, Lothar Richter, and Stefan Kramer Technische Universität München Institut für Informatik/I12 Boltzmannstr. 3, D-85748 Garching b. München, Germany {rueckert, richter, kramer}@in.tum.de actual agrawal algorithm applied approach association associations attributes aumann based below blake cartographic cell cells column compendium complexity conference containing continuous cover data databases depends describing deviation different discovering discovery discrete each editors experiments expression favorably fourth from function functional give half hughes icdm ieee imented informatik information instances institut intelligent international jagadish journal july knowledge kramer land large learning leaving lindell line machine management mean meersman merz meter mining mumick munchen normalized number numeric optimization pages parameter pentium performed press proc proceedings profiles properties quantitative random references relational remains removed report repository restarts richter ruck rules runtime runtimes scales search searches shows sigkdd sigmod size sizes spaces srikant standard statistical step steps subsets systems table tables technical that then theory therefore thirty total type variables varies various webb were wijsen with zero http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420178abs.htm 22 Hybrid pre-query term expansion using Latent Semantic Analysis Laurence A. F.
Park Kotagiri Ramamohanarao ARC Centre for Perceptive and Intelligent Machines Department of Computer Science The University of Melbourne {lapark,rao}@cs.mu.oz.au able allocation also analysis annual apply approaches average based behaviour benefits blei buckley building cient component computers concept conclusion conference construct data davis decomposition development dirichlet documents does dumais editor effi expanded experimental experiments external fast files findings found fourth fraction from gaithersburg greater harman have high hofmann hyvarinen icdm ieee improving include including incorporates independent indexing information institute instruments international interscience introduced inverted january jordan journal karhunen keep latent learning load machine maintain management mapped mapping maps march mean memory method methods mining mitra moffat more national november obtain only pages pentium performed precision press probabilistic proceedings processing produce producing pruned publication query querying ranking receive references research resulting results retrieval sacks salton same seconds self semantic shown sigir significant singhal singular smart sources space special standards still system systems takes technology terms text that these third this time tois transactions trec users using value values vector when which while wiley with zobel http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420515abs.htm 89 Efficient Relationship Pattern Mining using Multi-relational Iceberg-cubes Dawit Yimam Seid, Sharad Mehrotra Department of Computer Science University of California, Irvine {dseid,sharad}@ics.uci.edu ability abstract achieves acknowledgments across again against aggregates aggregation although applies approach apriori association attribute attributes average avoid based believe beyer borgelt both bottom calif case challenges citeseer close common companions compared compares computation conf conference counting cube cubes
cubing data dehaspe discovery dissemination domingos dzeroski eclat effect efficient explodes explorations extended farmer fast figure fimi finally first found foundation fourth frequent from getoor grant growth haas homepagesearch hpsearch http icdm iceberg ieee implementations increase increases inductive international intl irvine item itemset jensen join joined joins knobbe knowledge krogel lavrac learning length light links logic lower maximum mehrotra melbourne mine mining missing models mricube much multi multiple multirelational national nijssen number order outer pages pakdd path pattern patterns performance performs possess probabilistic proc proceedings program programming propositionalisation prospects pruning query raedt ramakrishnan rate redundant references relational relationship report research resulting rules science seid series shades show showed shows siebes sigkdd sigmod single slower soparkar sparse springer stars starting structure summary supp supported table tables target technical technique that then thenmine this three time toivonen transformation trier under univ used useful using value verlag versus wang while with workshop wrobel http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420555abs.htm 99 Alpha Galois Lattices Véronique Ventos L.R.I., UMR-CNRS 8623, Université Paris-Sud, 91405 Orsay, France Henry Soldano L.I.P.N, UMR-CNRS 7030, Université Paris-Nord, 93430 Villetaneuse, France Thibaut Lamadon L.R.I., UMR-CNRS 8623, Université Paris-Sud, 91405 Orsay, France about above acknowledgments adapt aimed algebraic algorithms alpha alphagalois also american analysis another approximate artificial association based basic basis bastide believe belong besides between birkhoff built canonical class closed clustering colloquium comparing complex computing concept conceptual conclusion conf conference connections consequence construct construction contrary contribution correspond corresponding dague data databases datamining
dealing degree denoted discovery does draft earlier ecml efficient engineering even exceptions experiment experimental expresses extent extents extracted family flexible formal formalization formally fortunate foundations fourth frequent from functions fundamenta galois ganascia ganter generating graphs handle have here however iccs icdm iceberg icml ieee implication inclusion indiscernibility individual individuals informaticae information inherit instance intelligence intent intents interesting international intl investigate investigates island issues itemset itemsets joint kaufmann knowledge kuznetsov lakhal languages large latter lattice lattices learning liquiere lncs logical longer machine many mathematical membership mentioned mining modifies modifying more morgan much nathalie nested nodes note notion obiedkov obtain opportunity pages paper particular partition partitioned partitioning pasquier patient pattern pawlak performance pernelle philippe practical preliminary prerequisite presented priori proceedings projections properties publications reading redundant references referred related relations relative represent respect restricting results rhode rough rousset rules sallantin same sets should shown simply society soldano some springer still structural structure structures stumme such system systems taouil tdis term terms than thanks that their then theoretical theoritical theory there they this those titanic tool transitivity unusual used using usually valuable ventos verlag very view volume waiyamai when which wille with work zaki zoom http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420106abs.htm 13 Unimodal Segmentation of Sequences Niina Haiminen and Aristides Gionis Helsinki Institute for Information Technology, BRU Department of Computer Science University of Helsinki, Finland first.lastname@cs.helsinki.fi acknowledgments additionally again algorithm algorithmica algorithms also alternative annal annals another applied approximation 
archive average ayer based behaves bellman between binary bovik brunk centers cient class communications computation computing conclusions conference constraints curves data datasets described design devise differ discrete discussions distinguish distinguishing distribution does dykstra dynamic eamonn effi empirical error errors evidence evimaria ewing examine exists experimentally experiments figure fourth frisen from function geng gives good greedy guarantee guha hand hardwick have heikki histograms hold http icdm ieee incomplete indicate inference information interesting international into isotonic kari kaufman keogh known koudas laasonen larger letters line locally locating mannila many mathematical mathematics measure median mining monotonic monotonicity more norms obtain open optimal optimality optimizing optimum order orderings original other pages pardalos particular perform performed permutation permutations pieces practice precedence presented press principles problem problems proceedings processing produced programming proved provided quality random real references regression reid remaining response restrepo restricted results robertson same sampling science second seems segmentation segmenting segments sequence sequences series service shim shown signal significantly smyth some standard statistical statistician statistics stem stout streams suggested suggestions symposium tamir terzi test tests thank that theorem theory there they those threshold time transactions tsdma types umbrella unimodal unimodality unimodally used useful using value variables verified very well which with wright yong http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420138abs.htm 17 Semi-Supervised Mixture-of-Experts Classification aaai acting adaptive addition additional advances advantage also although american analysis analytics analyzing annual applies apply approaches area artificial association assumptions based bayesian behavior behaviour benefit bias 
biasvariance bienenstock biological blake blum both canada cases causes center cibc cirelo class classi classification classifier cohen colt combining compared complex complexity components computation computational conference correctness counteracts cozman craven customer data databases datasets december deciding decomposition decompositions degrade degrades depending depends dilemma dipasquo documents doursat econometrica effect effects empirical error estimators evaluation even example exhibits experts explain extend extract fication findings first fourth freitag from functions future geman geoscience ghahramani heavily heckman help heskes hinton html http hughes icdm ieee improves including incomplete incorrect increasing inferring information intelligence international introduce jacobs jordan karakoulas kaufmann knowledge kohavi labeled labelled labels landgrebe largest learner learners learning least less likelihoodbased local loss machine madison many mccallum merz miller mining missing mitchell mitigating mixture mixtures mlearn mlrepository model models more morgan most much multi networks neural nigam nowlan number oles ones other overfit overfitting particularly performance phenomenon plus pointing popular potentially press preventing probability problem problems proceedings processing reducing references regularization regularizer relative remote report repository robust salakhutdinov same sample samples second seem seems seen select selection semi semisupervised sensing shahshahani showed size slattery small space specification streams suggest supervised symbolic systems take technical techniques term text than that theory there they this three thrun thus time toronto training transactions underlying underway unlabeled unlabelled used using uyar value variance variants washington when where which wide with wolpert work world worth york zero zhang http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420467abs.htm 77 An Adaptive Density-Based 
Clustering Algorithm for Spatial Database with Noise Daoying Ma and Aidong Zhang Department of Computer Science and Engineering State University of New York at Buffalo Email: {daoma,azhang}@cse.buffalo.edu access accuracy accurate acmsigmod adaptive adjusted advances advantage affect affection algorithm algorithms analysis ankerst applications approach arbitrary artech automatically avoid axis based because beckmann been berkeley better birch blackman blair breuning cause caused certain chain chameleon choose clarans classification classified cluster clustering clusters collins compare computer conclusions conf conference considers correctly cover cure data database databases dbscan demonstrated density densitybased design detect detecting different discovering discovery distribution dramatically dynamic efficient efficiently engineering enough especially ester even examine example experimental fail fifth figure finding first flexibility formation formations formed forming fourth gain global groups guha have hierarchical hinneburg hofmann house however icdm identify ieee improve improvement include individual instead international into introduction john karypis kaufman keim knowledge kriegel kumar large larger last least lichtenegger likely lineal linear livny make management mathematical mcqueen measurement method methods military mining modeling modern more most much multimedia multisensor multitarget multivariat must neighbor neighboring neighbors noise norwood novel object objects observations observe obtain obviously only optics order ordering organized pages paper parameters partitioning percentage performance points popoli positioning possible practice probability proc proceedings proposed proved provide quality ramakrishnan rang range rastogi rectangles reduce references reflect reflects region results robust rousseeuw same sander schneider second seeger select selection serval sets shalom shim shown sigmod significant similar simulated since size small 
smaller some spatial springerverlag statistics still strategy structure study symposium system sysytem tactical than that theory there therefore this those thus time tougher tracking trans travel tree used users uses using valid very void volume want well wellenhof when wider wien wiley with york zhang http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420343abs.htm 46 Incremental Mining of Frequent XML Query Patterns above acharya algorithm almost also based branch caching call candidates child clark computation conclusion conf conference consider consortium containment countsupp cutting dasfaa data database define derive derose descendant designed details discussion dist distinguished doesn dunham each efficient equivalence evaluate exist experiments follows formed fourth fragment frequent from function gains gerome getocc given have here however icdm ieee implement incremental incrementally increqpminer international investigate language leaf leftmost lemma limitation linearly list maximal method miklau mining more multi next node nodes number obtain occurrences operations other paper path patterns performance performed pods proc proceedings propose queries query recommendation recxpath references report respectively results rightmost rsts running scales sets show space substantial subtrees such suciu supporting supports technical than that their there this through tidlist time transactions update version vldb which wide will with world xiao xpath yang http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420563abs.htm 101 Scalable Construction of Topic Directory with Nonparametric Closed Termset Mining agglomerative agrawal algorithms approach association based beil best bisecting browsing candidate closed closet cluster clustering collections compare comparison conf conference cutting data databases discovery document dubes editors ester experiment fast fourth frequent fung gather generations hall herarchical icdm ieee information 
international itemset itemsets jain karger karypis kmeans knowledge kumar large management method methods mining other pages patterns pedersen prentice proc proceedings recent references retrieval rules scatter searching siam sigir sigkdd sigmod srikant steinbach strategies techiniques term text those tukey upgma using very vldb wang with without workshop http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420535abs.htm 94 Supervised Latent Semantic Indexing for Document Categorization Jian-Tao Sun1, Zheng Chen2, Hua-Jun Zeng2, Yu-Chang Lu1, Chun-Yi Shi1, Wei-Ying Ma2 accuracy acknowledgements algebra algorithm algorithms also american analysis ando annual applications approach approaches approximated automatically background based berry brien categorization category chao cheaper chemnitz classification comments comparable computation conclusions conference cost curves data deerwester development dimension dimensional dimensionality discriminant discussions document drastically dumais ecml editors european evaluation even examination fast features fective figure fourth furnas furthermore future ghani guang harshman heidelberg helpful hirsh hofmann hull hypertext icdm ieee improves improving indexing information intelligent inter international investigated iterative jing joachims journal karypis knowledge landauer larger latent leads learning linear locality lower machine machines macrof management many measurement methods microf mining much nedel neither ninth noise number optimize original outperform over pages paper performance precision presence preserving press probabilistic problem proceedings proposed proves reduce reducing reduction references relevant representation research result results retrieval routing rouveirol sacrificing scaling science semantic shen shows siam sigir similarity slattery slightly slsi society space springer statistical study supervised support systems tasks tenth text than thank that their theoretical this torkkola under 
used using vector verlag very wang when which will with without work workshop yang york zelikovitz http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420217abs.htm 27 MMAC: A New Multi-class, Multi-label Associative Classification Approach accuracy accurate agrawal algorithms amielinski amman anticipate approach approaches april artificial association associative australia austrian based bases been between boostexter boosting boutell brown business canberra carnegie categorisation categorization chakhlevitch clare class classification classifiers cmar collection comp company comparison complexity computation computer conclusions conference conferences conquer conserve consistent contain continuous cowling cpar creating data databases datasets decision department discovering discovery distinguishing download duda eastern editors effective efficient electronic employs ensures european evaluating evaluation evolutionary extending fast features fifteenth fourth francisco frank frequent from furnkranz further generating generation global hart heuristic heuristics high higher html http hyperheuristic hyperheuristics icdm ieee imaging indicated information institute integrates integrating intelligence international introduces items java joachims joint jose july kaufmann king knowledge kodak label labels large learn learning lecture less level machine machines madison management managing many mateo measures mellon method methods mining morgan most multi multilabel multiple notes ogihara ones only optimisation over part parthasarathy pattern peng performance personnel phase phenotype pkdd prediction predictive presents problem proceeding proceedings produces production products programs proposed prunes publishers quinlan raedt ranking rate redundant references relevant report require research ripper rochester rule rules runs runtime scan scene schapire schedule scheduling science semantic separate september sets shavlik shen shih siebes sigmod singer software 
springer srikant state statistical storage strok studies support swami system technical technique techniques tenth text thabtah than that thirtythree three time traditional training treat tree trees university used vector verlag very vienna volume waikato washington weka which wiley wisconsin with without witten work yang york zaki http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420051abs.htm 6 Test-Cost Sensitive Naive Bayes Classification acknowledgments active addition algorithm algorithms also another applications artificial attributes australia average background batch bayes bayesian blake both certain classification classifier classifiers committee comparisons competing complexity concept conclusions conditional conference consider considering cost costs costsensitive csnb data databases decision design designing develop diagnosis dietterich direction discovery disease domingos duda edition editor effective elkan empirical european evaluation example exercise experiments extensions figure finding foundations fourth framework fund future general generalize genetic grant greiner grove hart heart heuristic hill hong hybrid icdm icml ideas ieee ijcai improves including inducing induction inductive innovation instance intelligence intelligently interesting international journal kaufmann knowledge kong learning ling loss machine making markov mathematics mcgraw medical merz metacost method minimal minimize mining misclassification missing mitchell more morgan naive nets neural nunez operations optimality other outperforms pages papadimitriou paper patient pattern pazzani percentage plan principles proc proceedings processes programs proposed pruning publishers quinlan rates references repository research robotics roth search second selected sensitive sequential several show simple sons springer stork strategies stress such supported svms sydney symposium technology test testing that this training tree trees tsitsiklis turney types under unknown values 
varying verlag wang weighting whether which willey with work workshop worth yang york zero zhang zubek http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420146abs.htm 18 Transduction and typicalness for quality assessment of individual classifications in machine learning and data mining Matjaz Kukar University of Ljubljana, Faculty of Computer and Information Science, Tržaška 25, SI-1001 Ljubljana, Slovenia matjaz.kukar@fri.uni-lj.si acknowledgements active adaptive additional advance advances algorithm algorithmic algorithms almost also analysis applications applied approach artif artificial attributes australia based basic bayes bayesian bensuasan berlin besnard bound bounding calculate california cambridge carrier cases chapman characterizing choosing city class classifications classifier classifiers comparable comparison comparisons compensates complex complexity computational computationally compute computed conditional conf conference confidence continuous contrary correct credibility data dataset dean december decision density described diagnosis diagnostics differences different discrimination discussion distributed distributions does domains drawback each easier ecml edition editor editors education efficient elements eleventh elomaa errors estimate estimating estimation estimations european evaluation example experience experimental experiments fayyad figure foundations fourth frameworks francisco frequencies frequency friedman furnkranz future gain gammerman general generality gibbs giraud grading gray halck hall hanks hard hastie heidelberg however huge icdm ieee important improve improved improvement included incorrect incremental inductive integrating intell intelligence intelligent international interpret interpretability into john joint jones kaufmann kernel knearest kodratoff kononenko kukar landmarking langley learners learning leave levels limits london machine machines main makes mannila matter mcclelland mean medical melluish
metalearning metrics mining ministry model modern modified more morgan mostly much myopia naive near nearest needs neighbour networks neural nouretdinov orlando other outperform overcoming pages paired parallel paring particular pattern pazzani perform performance performed performs pfahringer piscataway planning plus pobabilistic portland porto possible potrugal press prieditis probabilities probability problem problems proc proceedings processing proedrou prognostics propose proposed quite reach recognition reduction references regression regressioon relative reliability reliable relieff require requires respect resulting results retrofitting review ridge ripley risk robnik rogers romsdahl rumelhart russell saunders science seewald semi sensitive session several should show significant significantly sikonja simec similarly since slovenian smoothing smyth soybean specht specially springer springerverlag statistical statistics stockholm strict such supported sweden sydney symposium tahoe tailed temporary test testing than that their them theory there they thirteenth this tibisharani toivonen transduction transductive tree true twelvth typicalness uncertainty underlying unmodified used using values various venables verlag volume vovk wand weaknesses wechsler while with work working york http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420202abs.htm 25 Probabilistic Principal Surfaces for Yeast Gene Microarray Data Mining Antonino Staiano, Lara De Vinco, Angelo Ciaramella, Giancarlo Raiconi, Roberto Tagliaferri Dipartimento di Matematica ed Informatica Università di Salerno Via Ponte don Melillo, 84084 Fisciano (Sa), Italy {astaiano, ciaram, gianni, robtag}@unisa.it Roberto Amato, Giuseppe Longo, Ciro Donalek, Gennaro Miele Dipartimento di Scienze Fisiche Università Federico II di Napoli and INFN Napoli Unit Polo delle Scienze e della Tecnologia via Cintia 6, 80136 Napoli, Italy {longo, donalek, miele}@na.infn.it Diego Di Bernardo Telethon Institute
for Genetics and Medicine Via Pietro Castellino 111 I-80131 Napoli, Italy dibernard@tigem.it academic algorithms analysis austin berlin bezdek bishop botstein brown chang cluster computation conference data dimensionality display eisen elements expression fourth friedman fuzzy generative genome ghosh hastie icdm ieee image inference intelligence international keller kluwer kohonen krisnapuram learning machine mapping maps mining model models neural nonlinear organizing pattern patterns pnas prediction principal probabilistic proceedings processing publisher recognition reduction references self spellman springer statistical surfaces svensen texas thesis tibshirani topographic transactions unified university using verlag wide williams http://csdl.computer.org/comp/proceedings/icdm/2004/2142/00/21420130abs.htm 16 Generation of Attribute Value Taxonomies from Data for Data-Driven Construction of Accurate and Compact Classifiers Dae-Ki Kang, Adrian Silvescu, Jun Zhang, and Vasant Honavar Artificial Intelligence Research Laboratory Department of Computer Science Iowa State University, Ames, IA 50011 USA {dkkang, silvescu, junzhang, honavar}@iastate.edu abraham abstract abstraction accurate advanced advances agglomerative aiii algorithm algorithms ambiguous american analysis annual appear applications approach approximation artificial ashburner atramentov attribute attributeoriented automated baker ball based bases bayes bayesian behaviors berners beyond bias biology blake botstein bottleneck butler cactus categorical centric cherry classification classifiers classify clustering clusters colorful commerce compact computer concise conference consortium cstr daml data databases davis decision dept design detection development dhar discov discovery distributional dobbs document dolinski driven duda dwight dynamical edition editors electronic engineering english eppig experiments exploration fayyad fifth finin fourth framework franke friedman from function ganti gehrke 
geiger gene genetics gibson goldszmidt harris hart haussler helmer hendler hierarchies high hill honavar horvath icdm icml identification ieee implementation induction inductive information institute intelligence intelligent intelligible interaction international interscience intrusion intrusive involved isda issel issues january joshi journal kasarskis kaufmann kibler kleinberg knowl knowledge kohavi koppen langley language large lassila learn learning lecture leiva level levels lewis logic mach machine mani maryland matese mccallum meeting merz method miller mining morgan motif multi multiple naive national natural nature network neural nips notes ontology pages partially pattern pazzani pereira performance piatetsky pinkston power predictive press prior proceedings processing programming programs protein provost publishers quantifying quinlan raghavan ramakrishnan references reformulation relational report repository representation research residues retrieval review richardson ringwald role rubin rules schroeder science scientific semantic shankle shapiro sherlock sigir sigkdd sigmod silvescu slonim smyth softw specified springer stoffel stork studies summaries support surface symposium syst systems target tarver taxonomies taylor technical text thompson tishby tool transactions tree trees tuzhilin twentieth undercoffer unification university using uthurusamy valiant value vector verlag very vldb wang washington wiley wong word words workshop yamamoto yamazaki zhang