Email: vitor@cs.cmu.edu
Webpage: http://www.cs.cmu.edu/~vitor
Text and Data Mining, Machine Learning, Information Extraction, Information Retrieval, Email Management.
Ph.D., Computer Science (area: Language Technologies), Carnegie Mellon University, Summer 2008 (expected)
Advisor: William W. Cohen
M.S., Computer Science (area: Language Technologies), Carnegie Mellon University, 2005
M.S., Electrical Engineering (area: Telecommunications), Universidade Estadual de Campinas, Brazil, 2000
B.S., Electrical Engineering, Universidade Federal de Pernambuco, Brazil, 1998
Microsoft Research, Redmond, WA (05/2005 – 08/2005)
As a research intern, worked with Joshua Goodman and Scott Wen-tau Yih developing methods and features for an Implicit Query system for emails (see CEAS 2005 paper) and helped extend similar ideas to the task of finding advertisement keywords in webpages (see WWW 2006 paper).
Ericsson Research & Development Center, Sao Paulo, Brazil (03/2000 – 08/2003)
o Software Maintenance Technical Coordinator and Systems Analyst
Team coordinator responsible for software maintenance activities (code fix assignments, code reviews, controlling release schedules, interface to field problems, etc.) and implementation of new software features for the IS-95 CDMA Cellular Radio Base Stations (RBS). Also responsible for maintenance and bug-fixes for the IS-95 CDMA RBSs.
o Software Engineer at Ericsson Wireless, Boulder, Colorado (04/2002 – 12/2002)
Software design, development and maintenance for the cellular RBSs based on 1x-EVDO and CDMA2000 cellular standards.
Ranking Users for Intelligent Message Addressing, 2008
Vitor R. Carvalho and William W. Cohen
ECIR-2008 (European Conference on Information Retrieval), Glasgow, Scotland.
Fast Learning of Document Ranking Functions with the Committee Perceptron, 2008
Jonathan Elsas, Vitor R. Carvalho and Jaime Carbonell
WSDM-2008 (ACM International Conference on Web Search and Data Mining), Stanford, CA.
Preventing Information Leaks in Email, 2007
Vitor R. Carvalho and William W. Cohen
SDM-2007 (SIAM International Conference on Data Mining), Minneapolis, MN.
Online Stacked Graphical Learning, 2007
Zhenzhen Kou, Vitor R. Carvalho and William W. Cohen
NIPS-2007 Workshop on Efficient Machine Learning, Vancouver, Canada.
Discovering Leadership Roles in Email Workgroups, 2007
Vitor R. Carvalho, Wen Wu and William W. Cohen
CEAS-2007 (Conference on Email and Anti-spam), Mountain View, CA.
Single-Pass Online Learning: Performance, Voting Schemes and Online Feature Selection, 2006
Vitor R. Carvalho and William W. Cohen
KDD-2006 (Knowledge Discovery and Data Mining Conference), Philadelphia, PA.
Finding Advertising Keywords on Web Pages, 2006
Wen-tau Yih, Joshua Goodman and Vitor R. Carvalho
WWW-2006 (World Wide Web Conference), Edinburgh, Scotland.
Improving “Email Speech Act” Analysis via N-gram Selection, 2006
Vitor R. Carvalho and William W. Cohen
HLT-NAACL-2006 (Human Language Technology conference - North American chapter of the Association for Computational Linguistics), ACTS Workshop, New York City, NY.
On the Collective Classification of Email “Speech Acts”, 2005
Vitor R. Carvalho and William W. Cohen
SIGIR-2005 (Conference on Research and Development in Information Retrieval), Salvador, Brazil.
Stacked Sequential Learning, 2005
William W. Cohen and Vitor R. Carvalho
IJCAI-2005 (International Joint Conference on Artificial Intelligence), Edinburgh, Scotland.
Implicit Queries for Email, 2005
Joshua Goodman and Vitor R. Carvalho
CEAS-2005 (Conference on Email and Anti-Spam), Stanford, CA.
Learning to Classify Email into “Speech Acts”, 2004
William W. Cohen, Vitor R. Carvalho and Tom M. Mitchell
EMNLP-2004 (Empirical Methods in Natural Language Processing), Barcelona, Spain.
Inferring Ongoing Activities of Workstation Users by Clustering Email, 2004
Yifen Huang, Dinesh Govindaraju, Tom Mitchell, Vitor Rocha de Carvalho, William Cohen
CEAS-2004 (Conference on Email and Anti-Spam), Mountain View, CA.
Learning to Extract Signature and Reply Lines from Email, 2004
Vitor R. Carvalho and William W. Cohen
CEAS-2004 (Conference on Email and Anti-Spam), Mountain View, CA.
Capacity analysis of an ARQ scheme for multimedia DS-CDMA systems, 2000
V. R. de Carvalho and C. de Almeida
IEE Proceedings – Communications, Vol. 147, Issue 4, pp. 201–204.
References prior to 2000 are available at http://www.cs.cmu.edu/~vitor/publications.html
Organizer:
EMAIL-2008: first Workshop on Enhanced Messaging – AAAI-2008
Information Retrieval Discussion Series, CMU, 2005-2006
Machine Learning Lunch Seminars, CMU, 2007-present
Language Technologies Student Research Symposium, CMU, 2004
Program Committee:
WWW-2008
CEAS-2008
AAAI-2007
CEAS-2007
ACL-COLING-2006
CEAS-2006
CEAS-2005
Reviewer:
TOIS (ACM Transactions on Information Systems), 2007
NESCAI-2007 Colloquium
Journal of Natural Language Engineering, 2006
ACL, 2005
IEEE Transactions on Knowledge and Data Engineering Journal, 2005
Journal of Intelligent Information Systems, 2005
ACM Transactions on Internet Technology Journal, 2005
ACM Transactions on Internet Technology Journal, 2004
Developed Ciranda, a Java package for email-speech-act prediction.
Developed Jangada, a Java package for extraction of signatures (sig files) and reply-to (quotes) lines in email messages. Jangada is currently being used by a few start-ups.
Contributed to Minorthird, a Java library of machine learning, information extraction and annotation tools written mostly by William W. Cohen.
Developed Cut Once, a Mozilla Thunderbird plug-in for email information leak and CC prediction.
US Patent: Implicit Queries for Electronic Documents – Pending (MS313217.02 - MSFTP1013USA)
US Patent: Web Document Keyword and Phrase Extraction – Pending MS318504.01 (MSFTP1543US)
Student Travel Award – International SIAM Data Mining Conference, 2007.
Honorable Mention - Student Research Symposium of the Language Technologies Inst., CMU, 2006
Scholarship for the M.S. program by FAPESP and CAPES (Brazilian funding agencies), 1998 – 2000.
Scholarship “Scientific Initiation” by CNPq (Brazilian funding agency), 1996 to 1997.
Scholarship “Scientific Initiation” by FACEPE (Brazilian funding agency), 1994 to 1995.
Scholarship “Scientist of the Future” by FACEPE (Brazilian funding agency), 1993. Scholarship granted to the top 3 electrical engineering applicants based on college admission exams.
Teaching Assistant, Carnegie Mellon University, Pittsburgh, PA (Fall, 2007)
Machine Learning course taught by Prof. Roni Rosenfeld. Graduate/undergraduate level.
Teaching Assistant, Carnegie Mellon University, Pittsburgh, PA (Spring, 2007)
Information Extraction course taught by Prof. William Cohen. Graduate level.
Research Assistant at Carnegie Mellon University, Pittsburgh, PA (2003 - present)
Performed research on applying machine learning techniques to email communication as part of the CALO/RADAR/Text Learning group headed by Profs. William Cohen and Tom Mitchell.
Research Assistant, Universidade Estadual de Campinas, Sao Paulo, Brazil (1998 - 2000)
Conducted research on adaptations of wireless communications techniques to optical networks. In particular, performed Monte Carlo simulations for different error-correcting codes and Automatic Repeat Request schemes in CDMA systems.
Teaching Assistant, Universidade Estadual de Campinas, Sao Paulo, Brazil (1999)
Laboratory of Digital Circuits II – held office hours, prepared lab manuals and experiments.
Undergraduate Research Assistant, Physics Department, Univ. Federal de Pernambuco (1996-1997)
Performed computational simulations, numerical analysis and physical modeling at the Laboratory of Cold Cesium Atoms and Non-Linear Optics.
Prepared the laboratory tools/experiments at the Laboratory of Digital Signal Processing and at the Laboratory of Magnetic Resonance Imaging.
Portuguese (native), English (fluent), Spanish (advanced)