Yoh Okuno's Resume
Profile
Accomplished, creative, experienced Software Engineer with expertise in natural language processing, machine learning, data mining, C/C++, Python, and Hadoop. Proven track record of applying theoretical knowledge in practice, as demonstrated by academic papers and by the development of both open-source and commercial software. Language skills include Japanese (native) and English (intermediate).
Skills
- Programming Languages (Advanced): C/C++ (10 years), Java, Python, PHP
- Programming Languages (Intermediate): Perl, JavaScript, R, SQL
- Middleware: Hadoop, Thrift, Apache, MySQL, SWIG, memcached, NLTK, MeCab, marisa-trie
- Platforms: Linux, FreeBSD, Mac OS X, Windows, Android
- Natural Languages: English (intermediate; TOEIC IP score of 820), Japanese (native)
Experience
Yahoo Japan Corporation, 2009-Present
Software Engineer, R&D Department
Designed a phrase extraction technique for a predictive input method, including spam filtering, morphological analysis, and pronunciation inference algorithms. Achieved 0.90 precision and 0.81 recall and reduced the size of the predictive dictionary by 80%.
Developed, and published results on, a data processing system for a 1 TB Japanese blog corpus crawled from the Internet, together with an N-gram (N = 1 to 7) counting program built on Hadoop MapReduce. Created an algorithm 2x faster than the naive approach and achieved a cross entropy of 5.65 bits in the best case.
Built a prototype statistical kana-kanji conversion engine in C++, including predictive input and spelling correction components. Developed a fast, memory-efficient dictionary compression and search engine that reduced memory consumption by 40%. Achieved 93% conversion accuracy (character-level F-score), outperforming Google Japanese IME (Mozc).
Exploratory Software Project, 2007-2008
Developer of Social IME: Cloud-Based Japanese Input Method (in Japanese)
Developed, implemented, and published a framework that combines cloud computing with an input method, enabling users to share dictionaries effectively on servers.
Reduced input time by 21% and keystrokes by 26% with a predictive input method. Reached 18 million accesses and over 7 million unique users per month.
Computer Game Development, 2006
Developed a computer game (in Japanese): a bullet-hell ("curtain fire") shooting game with beautiful graphics. Wrote over 20,000 lines of C/C++ implementing the game logic and a general-purpose framework. Achieved 60 FPS even on slow laptops. Sold to over 1,000 customers for total sales of one million yen, an exceptional result for an individually developed product.
Publications (First Author)
- Applying mpaligner to Statistical Machine Transliteration with Japanese-Specific Heuristics, The 4th Named Entities Workshop, in the 50th Annual Meeting of the Association for Computational Linguistics, 2012 (to appear).
- Phrase Extraction for Japanese Predictive Input Method as Post-Processing, Workshop on Advances in Text Input Methods, in the 5th International Joint Conference on Natural Language Processing, 2011.
- Spell Generation based on Edit Distance, Spelling Alteration for Web Search Workshop, 2011.
- Language Model Building and Evaluation using a Large-Scale Japanese Blog Corpus (in Japanese), The 17th Annual Meeting of the Association for Natural Language Processing, 2011.
- Japanese Input Method based on the Internet (in Japanese), Information Processing Society of Japan, Special Interest Group on Natural Language, No. 190, 2009.
Achievements
- 5th place, Microsoft Speller Challenge, 2011.
- Founder, TokyoNLP, Tokyo, Japan, 2010-Present.
- Program Committee, Workshop on Advances in Text Input Methods, 2011.
- Supervisor, Technologies inside Japanese Input Methods (in Japanese), Gihyo, 2012.
- Supervisor, Mining the Social Web (in Japanese), O'Reilly, 2011.
- Supervisor, Natural Language Processing with Python (in Japanese), O'Reilly, 2010.
- Supervisor, Introduction to Machine Learning for Language Processing (in Japanese), Corona, 2010.
- TopCoder rating: 1480, 2011.
Education
Keio University, Faculty of Science and Technology, Department of Information and Computer Science, Hagiwara Laboratory, Tokyo, Japan
- Master of Computer Science (Emphasis in Natural Language Processing), 2009
Thesis: Japanese Input Method based on the Internet
- Bachelor of Computer Science (Emphasis in Machine Learning), 2007
Thesis: Neural Network for Collaborative Filtering
Algorithms
Here is a list of algorithms that I have implemented.
Natural Language Processing
- Decoder: Viterbi Algorithm, N-best Search, Beam Search
- Language Model: N-gram Counting via Hadoop MapReduce, Smoothing such as Kneser-Ney and Witten-Bell, Evaluation via Perplexity
- Structured Learning: Discriminative Models such as Structured Perceptron and Structured SVM
- Kana-Kanji Conversion: Class Bigram Model + Viterbi Algorithm + Compressed Trie
- Spelling Correction: Noisy Channel Model + Edit Distance + Trie Data Structure
- Spam Filter: Spam Blog Detection via SVM and Bag-of-Words Features
- Named Entity Recognition: Person, Location and Organization Name Recognition via CRF
- Transliteration: Monotone Alignment and Structured MIRA
Machine Learning
- SGD (Stochastic Gradient Descent): Perceptron, Logistic Regression, Online SVM
- Supervised Learning: Neural Network, Kernel Regression
- Clustering: K-means, Hierarchical Clustering, Self-Organizing Map
- EM Algorithm: Gaussian Mixture, pLSA (Probabilistic Latent Semantic Analysis), HMM (Hidden Markov Model), IBM Model
- Large-Scale Learning: PageRank, K-means, and Averaged Perceptron via MapReduce
- Optimization and Sampling: Genetic Algorithm, Simulated Annealing, Gibbs Sampling
Contact Information